Add job source bucket to output path #1101
Pull Request Test Coverage Report for Build 7402
💛 - Coveralls |
Reviewable status: 0 of 1 approvals obtained (waiting on @stephen-soltesz)
storage/rowwriter.go
line 181 at r1 (raw file):
// Get implements factory.SinkFactory
func (sf *SinkFactory) Get(ctx context.Context, dp etl.DataPath) (row.Sink, etl.ProcessingError) {
	s, err := NewRowWriter(ctx, sf.client, sf.outputBucket, path.Join(dp.Bucket, dp.Path+".jsonl"))
Why are we changing this from json to jsonl?
Reviewable status: 0 of 1 approvals obtained (waiting on @cristinaleonr)
storage/rowwriter.go
line 181 at r1 (raw file):
Previously, cristinaleonr (Cristina Leon) wrote…
Why are we changing this from json to jsonl?
Because these are JSONL files. The original extension was a misnomer.
Reviewable status: complete! 1 of 1 approvals obtained (waiting on @cristinaleonr)
Thank you!
Today, when we change the archive source bucket for a datatype in the gardener config, the output data mixes results from the two different sources. We can work around this by deleting the bucket directory, but that is slow. It is better for the outputs to remain separate.
This change includes the job source bucket in the output path, so that the gardener system keeps the output for each source bucket in its own directory.
Both the parser and gardener must agree on this path, so this change must be deployed together with its companion change in etl-gardener, m-lab/etl-gardener#407.
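As a rough illustration of the path change, here is a minimal sketch of how prefixing the source bucket keeps outputs apart. It reuses the `path.Join(dp.Bucket, dp.Path+".jsonl")` expression from the diff; the `DataPath` struct, the `outputObjectName` helper, and the bucket and path values are hypothetical stand-ins, not the real `etl.DataPath` or production names.

```go
package main

import (
	"fmt"
	"path"
)

// DataPath is a hypothetical stand-in for etl.DataPath, reduced to the
// two fields used by the expression in the diff.
type DataPath struct {
	Bucket string // source archive bucket of the job (assumed value below)
	Path   string // object path within the source bucket (assumed value below)
}

// outputObjectName mirrors path.Join(dp.Bucket, dp.Path+".jsonl") from the
// diff: the source bucket becomes the leading directory of the output object.
func outputObjectName(dp DataPath) string {
	return path.Join(dp.Bucket, dp.Path+".jsonl")
}

func main() {
	// The same archive path read from two different (made-up) source buckets.
	a := DataPath{Bucket: "archive-a", Path: "ndt/2020/01/01/sample"}
	b := DataPath{Bucket: "archive-b", Path: "ndt/2020/01/01/sample"}

	// Each source lands under its own top-level directory in the output bucket.
	fmt.Println(outputObjectName(a))
	fmt.Println(outputObjectName(b))
}
```

Because the bucket name is the first path element, switching the source bucket in the gardener config yields a new output directory rather than writing into the old one, which is why gardener has to compute the same path.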
Part of: