core/src/main/scala/io/projectglow/bgen/BigBgenDatasource.scala (12 changes: 1 addition & 11 deletions)
Expand Up @@ -39,17 +39,7 @@ class ComDatabricksBigBgenDatasource extends BigBgenDatasource with ComDatabrick

object BigBgenDatasource extends HlsUsageLogging {

val BITS_PER_PROB_KEY = "bitsPerProbability"
val BITS_PER_PROB_DEFAULT_VALUE = "16"

val MAX_PLOIDY_KEY = "maximumInferredPloidy"
val MAX_PLOIDY_VALUE = "10"

val DEFAULT_PLOIDY_KEY = "defaultInferredPloidy"
val DEFAULT_PLOIDY_VALUE = "2"

val DEFAULT_PHASING_KEY = "defaultInferredPhasing"
val DEFAULT_PHASING_VALUE = "false"
import io.projectglow.common.BgenOptions._

def serializeDataFrame(options: Map[String, String], data: DataFrame): RDD[Array[Byte]] = {

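The wildcard import lets serializeDataFrame keep referring to the option keys unqualified. As a rough sketch of how the writer might resolve one of these settings from the user-supplied options map (the helper name and body below are illustrative assumptions, not code from this PR):

import io.projectglow.common.BgenOptions._

// Hypothetical helper: look up the bit depth, falling back to the documented default ("16").
def resolveBitsPerProbability(options: Map[String, String]): Int =
  options.getOrElse(BITS_PER_PROB_KEY, BITS_PER_PROB_DEFAULT_VALUE).toInt
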
core/src/main/scala/io/projectglow/common/datasourceOptions.scala (14 changes: 14 additions & 0 deletions)
Expand Up @@ -36,11 +36,25 @@ object VCFOptions {
}

object BgenOptions {
// Reader options
val IGNORE_EXTENSION_KEY = "ignoreExtension"
val USE_INDEX_KEY = "useBgenIndex"
val SAMPLE_FILE_PATH_OPTION_KEY = "sampleFilePath"
val SAMPLE_ID_COLUMN_OPTION_KEY = "sampleIdColumn"
val SAMPLE_ID_COLUMN_OPTION_DEFAULT_VALUE = "ID_2"

// bigbgen write options
val BITS_PER_PROB_KEY = "bitsPerProbability"
val BITS_PER_PROB_DEFAULT_VALUE = "16"

val MAX_PLOIDY_KEY = "maximumInferredPloidy"
val MAX_PLOIDY_VALUE = "10"

val DEFAULT_PLOIDY_KEY = "defaultInferredPloidy"
val DEFAULT_PLOIDY_VALUE = "2"

val DEFAULT_PHASING_KEY = "defaultInferredPhasing"
val DEFAULT_PHASING_VALUE = "false"
}

object PlinkOptions {
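With the reader and bigbgen writer options now living side by side, here is a minimal end-to-end sketch of how these keys surface to users (the datasource short names "bgen"/"bigbgen", the SparkSession setup, and the file paths are assumptions for illustration, not part of this PR):

import org.apache.spark.sql.SparkSession

object BgenOptionsUsageSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("bgen-options-sketch").getOrCreate()

    // Reader options: keys mirror the BgenOptions constants above.
    val df = spark.read
      .format("bgen")
      .option("sampleFilePath", "/data/samples.sample") // SAMPLE_FILE_PATH_OPTION_KEY
      .option("sampleIdColumn", "ID_2")                 // SAMPLE_ID_COLUMN_OPTION_KEY (default value)
      .load("/data/genotypes.bgen")

    // bigbgen write options: keys and values mirror the defaults defined above.
    df.write
      .format("bigbgen")
      .option("bitsPerProbability", "16")        // BITS_PER_PROB_DEFAULT_VALUE
      .option("maximumInferredPloidy", "10")     // MAX_PLOIDY_VALUE
      .option("defaultInferredPloidy", "2")      // DEFAULT_PLOIDY_VALUE
      .option("defaultInferredPhasing", "false") // DEFAULT_PHASING_VALUE
      .save("/data/out.bgen")

    spark.stop()
  }
}
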
Expand Up @@ -84,7 +84,3 @@ object HlsTagValues {

val EVENT_PLINK_READ = "plinkRead"
}

object HlsBlobKeys {
val PIPE_CMD_TOOL = "pipeCmdTool"
}
Expand Up @@ -99,7 +99,7 @@ class PipeTransformer extends DataFrameTransformer with HlsUsageLogging {
// record the pipe event along with tools of interest which maybe called using it.
// TODO: More tools to be added
val toolInPipe = Map(
HlsBlobKeys.PIPE_CMD_TOOL ->
LOGGING_BLOB_KEY ->
pipeToolSet
.foldLeft(Array[String]())(
(a, b: String) =>
Expand Down Expand Up @@ -141,6 +141,7 @@ object PipeTransformer {
private val ENV_PREFIX = "env_"
private val INPUT_FORMATTER_PREFIX = "in_"
private val OUTPUT_FORMATTER_PREFIX = "out_"
val LOGGING_BLOB_KEY = "pipeCmdTool"
Collaborator: Why are we moving this here?

Collaborator (Author): For all other features, like vcf/bgen/plink read/write and the variant normalizer transformer, the options defined inside the respective packages are used as keys for the maps in the logging JSON blobs. I do not want this one to be an exception and be defined outside of the pipe transformer. This also makes the logging code cleaner on the universe side.

Collaborator: I see. Can we move the transformer options to a higher-level object, as we did for the datasource options, for transparency?

Collaborator (Author): Sure, I can. But if it's OK with you, let's do it in another PR, because I want to get the universe PR review moving forward with the data and observability teams, and that depends on this PR. Moving the options to a higher level can be done independently.

Collaborator: Ok, that works.


   private def lookupInputFormatterFactory(name: String): Option[InputFormatterFactory] =
     synchronized {
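To illustrate the point discussed in the thread above, here is a minimal sketch of how a downstream logging consumer could reference the key from the transformer's own companion object rather than a separate HlsBlobKeys holder (the package path and the blob-building code are assumptions, not taken from this PR or the universe code):

import io.projectglow.transformers.pipe.PipeTransformer

// Hypothetical blob assembly: the key string comes from PipeTransformer itself,
// matching how the other features expose their option keys from their own packages.
val toolBlob: Map[String, String] = Map(PipeTransformer.LOGGING_BLOB_KEY -> "saige,plink")
val pipedTools: Option[String] = toolBlob.get(PipeTransformer.LOGGING_BLOB_KEY)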