Profile mehari #430

Open
xiamaz opened this issue Apr 2, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@xiamaz
Contributor

xiamaz commented Apr 2, 2024

Is your feature request related to a problem? Please describe.
We need to understand which code paths influence mehari's performance.

Describe the solution you'd like
Use flamegraph and DHAT to understand hot code paths and allocations. Add a branch with coz support. Look into criterion (https://crates.io/crates/criterion) for benchmarking.

Describe alternatives you've considered
None

Additional context
This is the basis for additional work.

@xiamaz xiamaz added the enhancement label, and added and then removed the documentation label, Apr 2, 2024
@xiamaz xiamaz self-assigned this Apr 2, 2024
@tedil
Contributor

tedil commented Apr 30, 2024

profiling

Profiling was done with a dedicated Cargo profile that inherits from release but keeps debug symbols:

[profile.profiling]
inherits = "release"
debug = true

on a machine with the following specs:

CPU: quad core Intel Xeon E5-1630 v3 (-MT MCP-)
speed/min/max: 1621/1200/3800 MHz Kernel: 6.5.0-28-generic x86_64
Mem: 9386.1/64180.0 MiB (14.6%) Storage: 685.61 GiB (26.7% used) Procs: 337
Drives:
  Local Storage: total: 685.61 GiB used: 182.78 GiB (26.7%)
  ID-1: /dev/sda vendor: Samsung model: MZ7LN256HCHP-00000 size: 238.47 GiB  # hosts /, /home
  ID-2: /dev/sdb model: MZ7KH480HAHQ0D3 size: 447.13 GiB  # hosts mehari DB

flamegraph

Uses only the first 1M records of the input file (NA-12878WGS_dragen.vcf.gz).

invocation

cargo flamegraph --profile profiling --bin mehari -- annotate seqvars --path-db /mnt/data/mehari/0.21.0/db --path-input-vcf tests/data/annotate/seqvars/NA-12878WGS_dragen.first1M.vcf.gz --path-output-vcf /tmp/NA-12878WGS_dragen.first1M.annotated.vcf.gz --path-input-ped data/FAM_BE_10.sup.ped

result

(Flamegraph image; an interactive version is linked from the original comment.)

dhat

Uses only the first 100k records of the input file (NA-12878WGS_dragen.vcf.gz).

invocation

cargo run --profile profiling --features dhat-heap --bin mehari -- annotate seqvars --path-db /mnt/data/mehari/0.21.0/db --path-input-vcf tests/data/annotate/seqvars/NA-12878WGS_dragen.first100k.vcf.gz --path-output-vcf /tmp/NA-12878WGS_dragen.first100k.annotated.vcf.gz --path-input-ped data/FAM_BE_10.sup.ped

result

dhat-heap.json (view in dhat-viewer)
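
For readers unfamiliar with DHAT, a feature such as dhat-heap is usually wired into a binary along the following lines. This is a minimal sketch of the pattern documented by the dhat crate, not mehari's actual main function. It assumes that Cargo.toml declares dhat as an optional dependency enabled by the dhat-heap feature, and annotate_records is a hypothetical stand-in for the real work.

```rust
// Assumption: Cargo.toml has an optional `dhat` dependency and a feature
// `dhat-heap` that enables it. Without the feature, all dhat code is compiled out.
#[cfg(feature = "dhat-heap")]
#[global_allocator]
static ALLOC: dhat::Alloc = dhat::Alloc;

/// Hypothetical stand-in for the annotation work; it allocates one String
/// per record and returns the total number of bytes in all record names.
fn annotate_records(n: usize) -> usize {
    let records: Vec<String> = (0..n).map(|i| format!("record-{i}")).collect();
    records.iter().map(|r| r.len()).sum()
}

fn main() {
    // The heap profiler writes dhat-heap.json when it is dropped at the end of main.
    #[cfg(feature = "dhat-heap")]
    let _profiler = dhat::Profiler::new_heap();

    let total = annotate_records(1_000);
    println!("processed {total} bytes of record names");
}
```

Gating both the allocator and the profiler behind the feature keeps regular builds free of profiling overhead.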

coz

Figure out reasonable scopes before running coz.

@tedil
Contributor

tedil commented Apr 30, 2024

At the moment, my reading of this is roughly:

  • have a look at ConsequencePredictor (predict/build_ann_field)
  • check hgvs Mapper g_to_n and similar
  • cache alignment::Mapper (have looked into that, seems promising, see below)
  • cache RefTranscriptData::new (doesn't seem to have much impact)
  • is using bcf instead of vcf an option? (smaller files, less text parsing, fewer writes)

@tedil
Contributor

tedil commented May 2, 2024

Caching the result of build_alignment_mapper: 103s/1M records → 71s/1M records
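
The idea behind this kind of caching can be sketched as follows. This is an illustration only, not the change made in mehari: AlignmentMapper, MapperCache, get_or_build and the transcript IDs are hypothetical names, and the real build step is replaced by a trivial constructor.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical stand-in for a per-transcript mapper that is expensive to build.
struct AlignmentMapper {
    tx_id: String,
}

/// Hypothetical cache: builds each mapper once and hands out shared handles.
struct MapperCache {
    mappers: HashMap<String, Arc<AlignmentMapper>>,
    builds: usize, // counts how often the expensive build actually ran
}

impl MapperCache {
    fn new() -> Self {
        Self { mappers: HashMap::new(), builds: 0 }
    }

    /// Returns the cached mapper for `tx_id`, building it on first use.
    fn get_or_build(&mut self, tx_id: &str) -> Arc<AlignmentMapper> {
        if let Some(mapper) = self.mappers.get(tx_id) {
            return Arc::clone(mapper);
        }
        self.builds += 1; // the expensive construction would happen here
        let mapper = Arc::new(AlignmentMapper { tx_id: tx_id.to_string() });
        self.mappers.insert(tx_id.to_string(), Arc::clone(&mapper));
        mapper
    }
}

fn main() {
    let mut cache = MapperCache::new();
    // Many variants hit the same transcript, so most lookups are cache hits.
    for tx_id in ["tx-A", "tx-B", "tx-A", "tx-A"] {
        let mapper = cache.get_or_build(tx_id);
        println!("using mapper for {}", mapper.tx_id);
    }
    println!("mappers built: {}", cache.builds);
}
```

Such a cache pays off when consecutive records overlap the same transcripts, which is typical for a position-sorted VCF.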

@tedil
Contributor

tedil commented May 8, 2024

Setting lto = "fat" and codegen-units = 1 shaves another ~7s off the 1M records timing. (64s/1M records)

@tedil
Contributor

tedil commented May 8, 2024

Using noodles Async{Reader, Writer} shaves another ~10s off the 1M records timing. (54s/1M records)

@holtgrewe
Contributor

@tedil impressive 50% time saved!
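
The arithmetic behind that figure can be checked from the timings reported in this thread (103s, 71s, 64s and 54s per 1M records). The sketch below only recomputes those numbers; the helper names percent_saved and records_per_second are made up for illustration.

```rust
/// Percentage of wall-clock time saved when going from `before` to `after` seconds.
fn percent_saved(before: f64, after: f64) -> f64 {
    (before - after) / before * 100.0
}

/// Throughput in records per second.
fn records_per_second(records: f64, seconds: f64) -> f64 {
    records / seconds
}

fn main() {
    // Timings per 1M records, as reported in the comments above.
    let steps = [
        ("baseline", 103.0),
        ("cached alignment mapper", 71.0),
        ("fat LTO, codegen-units = 1", 64.0),
        ("async reader/writer", 54.0),
    ];
    let baseline = steps[0].1;
    for (label, secs) in steps {
        println!(
            "{label:<28} {secs:>4.0} s {:>7.0} records/s {:>5.1} % saved vs. baseline",
            records_per_second(1e6, secs),
            percent_saved(baseline, secs)
        );
    }
}
```

Going from 103s to 54s saves 49s, which is about 48% of the original runtime and nearly doubles the throughput.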
