Add job source bucket to output path #1101

stephen-soltesz · 2022-08-02T00:07:33Z

Today, when we change the archive source bucket for a datatype in the gardener config, the output data mixes results from the two sources. We can work around this by deleting the bucket directory, but that is slow. It is better to keep the outputs separate.

This change includes the job source bucket in the output path so that the gardener system preserves the separation between the two directories.

Both the parser and gardener must agree on this path, so this change must be deployed together with its companion change in etl-gardener: m-lab/etl-gardener#407.
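
To illustrate the path layout described above, here is a minimal sketch of how the source bucket can prefix the output object name. The `DataPath` struct is a hypothetical stand-in for `etl.DataPath`, reduced to the two fields the diff uses (`Bucket` and `Path`); the helper name `OutputObject` and the bucket and path values are assumptions for illustration, not the code in this PR.

```go
package main

import (
	"fmt"
	"path"
)

// DataPath is a hypothetical stand-in for etl.DataPath, reduced to the
// two fields used in this sketch.
type DataPath struct {
	Bucket string // archive source bucket the job reads from
	Path   string // object path within the bucket, without extension
}

// OutputObject returns the output object name with the source bucket as
// the leading path element, so that results from different source
// buckets land in different directories.
func OutputObject(dp DataPath) string {
	return path.Join(dp.Bucket, dp.Path+".jsonl")
}

func main() {
	// Same object path, two different (made-up) source buckets.
	a := DataPath{Bucket: "archive-a", Path: "ndt/ndt7/2022/08/01/file"}
	b := DataPath{Bucket: "archive-b", Path: "ndt/ndt7/2022/08/01/file"}
	fmt.Println(OutputObject(a))
	fmt.Println(OutputObject(b))
}
```

Because the parser writes to this name and the gardener reads from it, both sides would need to derive it the same way, which is why the two changes have to ship together.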

Part of:

Flexible configuration - specify output tables through configuration - versioned tables etl-gardener#349

coveralls · 2022-08-02T00:10:56Z

Pull Request Test Coverage Report for Build 7402

1 of 2 (50.0%) changed or added relevant lines in 2 files are covered.
2 unchanged lines in 1 file lost coverage.
Overall coverage increased (+0.04%) to 67.227%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
storage/rowwriter.go	0	1	0.0%

Files with Coverage Reduction	New Missed Lines	%
active/active.go	2	90.63%

Totals
Change from base Build 7397:	0.04%
Covered Lines:	3321
Relevant Lines:	4940

💛 - Coveralls

cristinaleonr

Reviewable status: 0 of 1 approvals obtained (waiting on @stephen-soltesz)

storage/rowwriter.go line 181 at r1 (raw file):

// Get implements factory.SinkFactory
func (sf *SinkFactory) Get(ctx context.Context, dp etl.DataPath) (row.Sink, etl.ProcessingError) {
	s, err := NewRowWriter(ctx, sf.client, sf.outputBucket, path.Join(dp.Bucket, dp.Path+".jsonl"))

Why are we changing this from json to jsonl?

stephen-soltesz

Reviewable status: 0 of 1 approvals obtained (waiting on @cristinaleonr)

storage/rowwriter.go line 181 at r1 (raw file):

Previously, cristinaleonr (Cristina Leon) wrote…

Why are we changing this from json to jsonl?

Because these are JSONL files. The original extension was a misnomer.
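
For context on the distinction, a JSONL file holds one JSON object per line rather than a single JSON document. The sketch below shows that shape using the standard library's `json.Encoder`, which appends a newline after each encoded value. The `Row` type and `WriteJSONL` helper are hypothetical and are not the row writer in this repository.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Row is a hypothetical record type used only for this illustration.
type Row struct {
	ID    string `json:"id"`
	Value int    `json:"value"`
}

// WriteJSONL encodes each row as one JSON object per line.
// json.Encoder.Encode writes a trailing newline after every value,
// which is what produces the line-delimited layout.
func WriteJSONL(rows []Row) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	for _, r := range rows {
		if err := enc.Encode(r); err != nil {
			return "", err
		}
	}
	return buf.String(), nil
}

func main() {
	out, err := WriteJSONL([]Row{{ID: "a", Value: 1}, {ID: "b", Value: 2}})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
}
```

The result is two lines, each a complete JSON object, so the whole file is not itself a valid single JSON value, which is why a `.json` extension would be misleading.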

cristinaleonr

Reviewable status: complete! 1 of 1 approvals obtained (waiting on @cristinaleonr)

stephen-soltesz · 2022-08-02T17:05:16Z

Thank you!

stephen-soltesz added 3 commits July 29, 2022 20:55

Add bucket to output path

f929169

Include new output path

1f28a97

Include source bucket in output path

5f9891a

stephen-soltesz mentioned this pull request Aug 2, 2022

Add job source bucket to output path m-lab/etl-gardener#407

Merged

stephen-soltesz requested a review from cristinaleonr August 2, 2022 00:08

cristinaleonr reviewed Aug 2, 2022

stephen-soltesz commented Aug 2, 2022

cristinaleonr approved these changes Aug 2, 2022

stephen-soltesz merged commit 1be639d into master Aug 2, 2022

stephen-soltesz deleted the sandbox-soltesz-output-bucket branch August 2, 2022 17:05
