Add BGEN option to explicitly not include sample ids #59
Signed-off-by: Henry D <[email protected]>
Codecov Report
@@            Coverage Diff            @@
##           master     #59     +/-   ##
========================================
+ Coverage    94.1%   94.11%   +<.01%
========================================
  Files          73       74       +1
  Lines        3560     3566       +6
  Branches      331      326       -5
========================================
+ Hits         3350     3356       +6
  Misses        210      210
Continue to review full report at Codecov.
kianfar77
left a comment
One suggestion. If you disagree, let me know and I will approve. The rest looks good.
import io.projectglow.common.logging.{HlsMetricDefinitions, HlsTagDefinitions, HlsTagValues, HlsUsageLogging}
import io.projectglow.common.{GlowLogging, WithUtils}
import io.projectglow.sql.util.{ComDatabricksDataSource, SerializableConfiguration}
import io.projectglow.vcf.VCFOption
It may be a good idea to gather all BGEN options (including this one) in a BGENOption object, as we did for VCFOption. That would be cleaner and would keep the VCF and BGEN options separate. Just a thought. Let me know if you disagree and I will approve.
Signed-off-by: Henry D <[email protected]>
3061ab5 to 6d2e356
Signed-off-by: Henry D <[email protected]>
2db3617 to a649844
kianfar77
left a comment
Looks great.
…ctglow#59)
* name change
  Signed-off-by: Henry Davidge <[email protected]>
* Add BGEN option to explicitly not include sampleIds
  Signed-off-by: Henry D <[email protected]>
* organize datasource options
  Signed-off-by: Henry D <[email protected]>
* add header
  Signed-off-by: Henry D <[email protected]>
Signed-off-by: Henry Davidge <[email protected]>
What changes are proposed in this pull request?
Previously, if a BGEN file contained sampleIds, they were always included in the emitted DataFrame. If a user didn't want them (common when dealing with large cohort sizes), they had to manually drop them.
Now, you can provide the same includeSampleIds option used by the VCF and Plink datasources. If it is false, sample ids are not included in the output.
cc @emaxwell
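For readers who want to see the behavior described above in isolation, here is a minimal, dependency-free Scala sketch. It is not the code from this pull request: the object name `BgenOptionSketch` and both helper methods are hypothetical, and the default of `true` when the option is absent is an assumption based on the previous behavior of always including sample ids. Only the option key `includeSampleIds` comes from the description.

```scala
// Hypothetical sketch; names other than the option key are illustrative only.
object BgenOptionSketch {
  // Option key named in the PR description.
  val IncludeSampleIds: String = "includeSampleIds"

  // Assumption: when the option is absent, sample ids are still included,
  // matching the behavior described as "previously ... always included".
  def includeSampleIds(options: Map[String, String]): Boolean =
    options.get(IncludeSampleIds).forall(_.trim.toBoolean)

  // Returns the sample ids to emit: the ids found in the file (if any)
  // when the option is enabled, and nothing when it is disabled.
  def emittedSampleIds(
      idsInFile: Option[Seq[String]],
      options: Map[String, String]): Option[Seq[String]] =
    if (includeSampleIds(options)) idsInFile else None
}
```

With Glow on the classpath, the option would presumably be passed on the reader, along the lines of `spark.read.format("bgen").option("includeSampleIds", false).load(path)`. Treat that call as an untested illustration, not as part of this patch.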
How is this patch tested?
(Details)