Add job source bucket to output path #1101

Merged: 3 commits, Aug 2, 2022

Contributor

@stephen-soltesz stephen-soltesz commented Aug 2, 2022

Today, when we change the archive source bucket for a datatype in the gardener config, the output data mixes results from two different sources. We can work around this by deleting the bucket directory, but this is slow. It is better for the results from each source to remain separate.

This change includes the job source bucket in the output path so that the gardener system preserves the separation between the two directories.

Both the parser and gardener must agree on this path, so this change must be deployed together with its companion change in etl-gardener, m-lab/etl-gardener#407.
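As a rough illustration of the path change described above, the sketch below places the source bucket name in front of the job path, so that the same job path read from two different archive buckets produces two distinct output objects. The `DataPath` struct is a hypothetical, trimmed-down stand-in for `etl.DataPath` that carries only the `Bucket` and `Path` fields used in this PR, and the bucket and path values are invented for the example.

```go
package main

import (
	"fmt"
	"path"
)

// DataPath is a hypothetical stand-in for etl.DataPath, reduced to the
// two fields this sketch needs.
type DataPath struct {
	Bucket string // archive source bucket the job reads from
	Path   string // job path within that bucket
}

// outputObject builds the output object name with the source bucket as
// the leading path element, mirroring the path.Join call in this PR.
func outputObject(dp DataPath) string {
	return path.Join(dp.Bucket, dp.Path+".jsonl")
}

func main() {
	// Same job path, two different (made-up) source buckets.
	a := DataPath{Bucket: "archive-a", Path: "ndt/ndt7/2022/08/02/job"}
	b := DataPath{Bucket: "archive-b", Path: "ndt/ndt7/2022/08/02/job"}

	fmt.Println(outputObject(a)) // archive-a/ndt/ndt7/2022/08/02/job.jsonl
	fmt.Println(outputObject(b)) // archive-b/ndt/ndt7/2022/08/02/job.jsonl
}
```

Because the bucket is part of the object name, switching the configured source bucket writes to a new directory instead of mixing with the old results.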

Part of:



@coveralls
Collaborator

Pull Request Test Coverage Report for Build 7402

  • 1 of 2 (50.0%) changed or added relevant lines in 2 files are covered.
  • 2 unchanged lines in 1 file lost coverage.
  • Overall coverage increased (+0.04%) to 67.227%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| storage/rowwriter.go | 0 | 1 | 0.0% |

| Files with Coverage Reduction | New Missed Lines | % |
|---|---|---|
| active/active.go | 2 | 90.63% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 7397: | 0.04% |
| Covered Lines: | 3321 |
| Relevant Lines: | 4940 |

💛 - Coveralls

Contributor

@cristinaleonr cristinaleonr left a comment

Reviewable status: 0 of 1 approvals obtained (waiting on @stephen-soltesz)


storage/rowwriter.go line 181 at r1 (raw file):

// Get implements factory.SinkFactory
func (sf *SinkFactory) Get(ctx context.Context, dp etl.DataPath) (row.Sink, etl.ProcessingError) {
	s, err := NewRowWriter(ctx, sf.client, sf.outputBucket, path.Join(dp.Bucket, dp.Path+".jsonl"))

Why are we changing this from json to jsonl?

Contributor Author

@stephen-soltesz stephen-soltesz left a comment

Reviewable status: 0 of 1 approvals obtained (waiting on @cristinaleonr)


storage/rowwriter.go line 181 at r1 (raw file):

Previously, cristinaleonr (Cristina Leon) wrote…

Why are we changing this from json to jsonl?

Because these are JSONL files. The original extension was a misnomer.
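For context on the extension: JSONL (JSON Lines) stores one complete JSON value per line, rather than a single JSON document per file. The sketch below shows that shape using Go's standard `encoding/json`. The `Row` type and its fields are hypothetical and are not the parser's actual schema, and this is not the row writer's implementation.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// Row is a hypothetical record type used only for illustration.
type Row struct {
	ID    string `json:"id"`
	Value int    `json:"value"`
}

// writeJSONL encodes each row as one JSON object per line.
// json.Encoder.Encode writes a newline after every encoded value,
// which is what produces the JSONL layout.
func writeJSONL(rows []Row) (string, error) {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	for _, r := range rows {
		if err := enc.Encode(r); err != nil {
			return "", err
		}
	}
	return buf.String(), nil
}

func main() {
	out, err := writeJSONL([]Row{{ID: "a", Value: 1}, {ID: "b", Value: 2}})
	if err != nil {
		panic(err)
	}
	fmt.Print(out)
	// {"id":"a","value":1}
	// {"id":"b","value":2}
}
```

A file in this layout is not a single valid JSON document, which is why a `.json` extension would be misleading for it.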

Contributor

@cristinaleonr cristinaleonr left a comment

:lgtm:

Reviewable status: :shipit: complete! 1 of 1 approvals obtained (waiting on @cristinaleonr)

@stephen-soltesz
Contributor Author

Thank you!

@stephen-soltesz stephen-soltesz merged commit 1be639d into master Aug 2, 2022
@stephen-soltesz stephen-soltesz deleted the sandbox-soltesz-output-bucket branch August 2, 2022 17:05