Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

FlowFeatureMapper speed improvements #8982

Draft
wants to merge 14 commits into
base: master
Choose a base branch
from

Conversation

dror27
Copy link
Contributor

@dror27 dror27 commented Sep 16, 2024

  1. Allow for using separate threads for reading / processing / writing (max 3)
  2. Use NM SAM tag instead of edit distance
  3. Perform likelihood scoring on trimmed read

Additional changes include:

  1. CachingIndexedFastaSequenceFile is enhanced to be thread safe and allow for adjusting its cache size. The change involved synchronizing the main query method (getSubsequenceAt) and deriving the cache size from a settable variable rather than from a constant.

@ilyasoifer ilyasoifer marked this pull request as draft September 17, 2024 09:27
@ilyasoifer ilyasoifer self-requested a review September 17, 2024 09:28
Copy link
Collaborator

@ilyasoifer ilyasoifer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Some minor comments

/**
* do not emit edit distance at all
**/
@Argument(fullName = "no-edit-distance", doc = "do not emit edit distance at all ?", optional = true)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I imagine that these changes are going to raise some questions. Can you please add description in the PR?

@@ -759,5 +886,35 @@ private FeatureMapper buildMapper() {
throw new GATKException("unsupported mappingFeature: " + fmArgs.mappingFeature);
}
}

private FlowBasedRead buildTrimmedFlowBasedRead(final GATKRead read, final TrimInfo trimInfo, final FlowBasedReadUtils.ReadGroupInfo rgInfo) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this code might be duplicated.
HaplotypeCaller routinely trims reads. Could you check if there is a function like that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have refactored this code to be closer in naming convention to its purpose

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really sorry for being a pain, can you explain why functions from ReadClipper (e.g. SoftClipByReadCoordinates) etc. can't be used for this task?

@dror27
Copy link
Contributor Author

dror27 commented Sep 23, 2024

@ilyasoifer I have addressed your comments. Thank you

@dror27
Copy link
Contributor Author

dror27 commented Oct 1, 2024

@ilyasoifer could you please take a look at the two failing CI/tests and see if there is something to be done - or it is a usual/environmental thing? Thanx

@ilyasoifer
Copy link
Collaborator

@ilyasoifer could you please take a look at the two failing CI/tests and see if there is something to be done - or it is a usual/environmental thing? Thanx

@dror27 - looks like an environment thing (because I see that other PRs also fail), but let's wait for a bit with merging

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants