Skip to content

Handle create page source for Druid segment on s3#14922

Merged
zhenxiao merged 1 commit intoprestodb:masterfrom
jinyangli34:jinyang-fix_druid_segment_on_s3
Aug 1, 2020
Merged

Handle create page source for Druid segment on s3#14922
zhenxiao merged 1 commit intoprestodb:masterfrom
jinyangli34:jinyang-fix_druid_segment_on_s3

Conversation

@jinyangli34
Copy link
Contributor

Please make sure your submission complies with our Development, Formatting, and Commit Message guidelines.

Fill in the release notes towards the bottom of the PR description.
See Release Notes Guidelines for details.

== RELEASE NOTES ==

Druid Changes
* Fix errors when reading Druid segment file on S3

To address: #14921

@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Jul 29, 2020

CLA Check
The committers are authorized under a signed CLA.

  • ✅ jinyangli34 (0e1a62155e3e66d3d6868a9d5170272fc1e47559)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything else we need escaping in the uri? something like '?' and '='.
I'm thinking if we can use any existing '%' encoding function to do this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to URLEncoder to process special chars in general.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extract s3a, s3n, S2Schema and etc. as constants?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to throw a PrestoException(DRUID_METADATA_ERROR`, ...)? also for line 144

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

@mbasmanova mbasmanova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be possible to add a test?

Copy link
Member

@beinan beinan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me except a couple of minor issue. It would enable the scanning on s3 and gcs.
Nice contribution, thank you @jinyangli34 !
@zhenxiao what do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any chance to switch by the enum you just defined -- DeepStorageType ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any document or github issue we can link here rather than a piece of source code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm removing the link as current code looks self-explained after clean up. Let me know if you prefer a github issue link here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work, I think we can just remove the link. Thanks!

Copy link
Collaborator

@zhenxiao zhenxiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice work, @jinyangli34
mostly good, only a few minor issues

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/S3_ZIP/S3/g
zip is the file format, we just use S3 for storage type

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated enum variable name, while keeping the type name as defined from Druid.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/GOOGLE/GCS/g

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsupported segment filesystem

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove this blank line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Collaborator

@zhenxiao zhenxiao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM
@jinyangli34 could you please merge the 3 commits into 1, and rerun the test?

@jinyangli34 jinyangli34 force-pushed the jinyang-fix_druid_segment_on_s3 branch from e8ebd9c to 680a43f Compare August 1, 2020 00:17
@jinyangli34
Copy link
Contributor Author

@zhenxiao rebased commits and verified with segments on S3.

@zhenxiao zhenxiao merged commit a4f14e4 into prestodb:master Aug 1, 2020
@caithagoras caithagoras mentioned this pull request Aug 14, 2020
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants