storehaus-s3 #231

Open
alexanderdean opened this issue Apr 22, 2014 · 7 comments

Comments

@alexanderdean

Sometimes I like to blur my eyes and think of Amazon S3 as a key-value store. If I added storehaus-s3 to this project, would anyone object?

I was thinking:

  • bucketName as a piece of configuration, equivalent to columnFamily for HBase
  • key maps to S3 object key, aka "folder path" plus filename
  • store the value as a string in a UTF-8 file with mimetype text/plain
  • implement using http://www.jets3t.org/
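
A minimal sketch of how the first three bullets might fit together, written in plain Java rather than against the storehaus `Store` trait. Everything named here (`BlobBackend`, `InMemoryBlobBackend`, `S3StringStoreSketch`, the bucket and key strings) is hypothetical and for illustration only; the in-memory backend stands in for a real S3 client so the example runs without AWS credentials. The `put` signature, where an empty value means delete, is an assumption loosely modelled on how storehaus expresses deletion.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for an S3 client; not an AWS SDK or jets3t type.
interface BlobBackend {
    void putObject(String bucket, String objectKey, byte[] body, String contentType);
    Optional<byte[]> getObject(String bucket, String objectKey);
    void deleteObject(String bucket, String objectKey);
}

// In-memory backend so the sketch runs without AWS credentials.
final class InMemoryBlobBackend implements BlobBackend {
    private final Map<String, byte[]> bodies = new HashMap<>();
    private final Map<String, String> contentTypes = new HashMap<>();

    private static String id(String bucket, String objectKey) {
        return bucket + "/" + objectKey;
    }

    @Override
    public void putObject(String bucket, String objectKey, byte[] body, String contentType) {
        bodies.put(id(bucket, objectKey), body.clone());
        contentTypes.put(id(bucket, objectKey), contentType);
    }

    @Override
    public Optional<byte[]> getObject(String bucket, String objectKey) {
        return Optional.ofNullable(bodies.get(id(bucket, objectKey)));
    }

    @Override
    public void deleteObject(String bucket, String objectKey) {
        bodies.remove(id(bucket, objectKey));
        contentTypes.remove(id(bucket, objectKey));
    }

    // Exposed only so callers can inspect which content type was recorded.
    Optional<String> contentTypeOf(String bucket, String objectKey) {
        return Optional.ofNullable(contentTypes.get(id(bucket, objectKey)));
    }
}

// String-to-string store over a single bucket.
final class S3StringStoreSketch {
    private static final String CONTENT_TYPE = "text/plain";

    private final BlobBackend backend;
    private final String bucketName; // fixed configuration, like columnFamily for HBase

    S3StringStoreSketch(BlobBackend backend, String bucketName) {
        this.backend = backend;
        this.bucketName = bucketName;
    }

    // The store key is used directly as the object key ("folder path" plus filename).
    Optional<String> get(String key) {
        return backend.getObject(bucketName, key)
                .map(bytes -> new String(bytes, StandardCharsets.UTF_8));
    }

    // A present value is written as UTF-8 text; an empty value deletes the key.
    void put(String key, Optional<String> value) {
        if (value.isPresent()) {
            byte[] body = value.get().getBytes(StandardCharsets.UTF_8);
            backend.putObject(bucketName, key, body, CONTENT_TYPE);
        } else {
            backend.deleteObject(bucketName, key);
        }
    }
}

final class S3StringStoreSketchDemo {
    public static void main(String[] args) {
        InMemoryBlobBackend backend = new InMemoryBlobBackend();
        S3StringStoreSketch store = new S3StringStoreSketch(backend, "example-bucket");

        store.put("config/app.conf", Optional.of("threshold = 10"));
        System.out.println(store.get("config/app.conf"));
        System.out.println(backend.contentTypeOf("example-bucket", "config/app.conf"));

        store.put("config/app.conf", Optional.empty());
        System.out.println(store.get("config/app.conf").isPresent());
    }
}
```

Keeping the client behind a small interface leaves the choice of S3 library open; a real implementation would also need to make these calls asynchronous to fit storehaus.
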
@rubanm
Contributor

rubanm commented Apr 23, 2014

@alexanderdean Sounds good to me. S3 is indeed a kv store I think, with blobs as values.

I'm not completely familiar with this, but why not use Amazon's own Java SDK instead of jets3t?

@alexanderdean
Author

Agreed - let's use Amazon's own Java SDK to keep dependencies down. I will have a go at this soon.

@rubanm
Contributor

rubanm commented Apr 23, 2014

Great :) I'm also curious to learn what use-case you have in mind.

@alexanderdean
Author

I'm thinking of using it as the configuration layer for the Snowplow Scalding + Kinesis flows. Currently we push config into Scalding as command line args [1], and Kinesis config through a HOCON config file. I'm thinking of harmonizing on an S3 path holding the config, then using storehaus-s3 to pull down the config. Using storehaus means it should be relatively simple to store the application config in DynamoDB et al. instead later on.

[1] https://github.com/snowplow/snowplow/blob/master/3-enrich/scala-hadoop-enrich/src/main/scala/com.snowplowanalytics.snowplow.enrich.hadoop/EtlJob.scala#L130
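
To illustrate why a key-value abstraction makes the backend swappable, here is a rough Java sketch in which the config loader only sees a lookup function. `ConfigLoaderSketch`, the key `config/app.conf`, and the two map-backed lookups are all hypothetical; the maps merely stand in for an S3-backed and a DynamoDB-backed store, and no real Snowplow or storehaus API is shown.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Loads raw config text through a key-value lookup; it never names the backend.
final class ConfigLoaderSketch {
    private final Function<String, Optional<String>> lookup;

    ConfigLoaderSketch(Function<String, Optional<String>> lookup) {
        this.lookup = lookup;
    }

    // Fails loudly when the key is absent, since the application cannot start without config.
    String load(String configKey) {
        return lookup.apply(configKey).orElseThrow(
                () -> new IllegalStateException("No config found at key: " + configKey));
    }
}

final class ConfigLoaderSketchDemo {
    public static void main(String[] args) {
        // Two interchangeable stand-in backends holding the same config text.
        Map<String, String> s3Like = new HashMap<>();
        s3Like.put("config/app.conf", "stream = example");
        Map<String, String> dynamoLike = new HashMap<>(s3Like);

        ConfigLoaderSketch fromS3 =
                new ConfigLoaderSketch(key -> Optional.ofNullable(s3Like.get(key)));
        ConfigLoaderSketch fromDynamo =
                new ConfigLoaderSketch(key -> Optional.ofNullable(dynamoLike.get(key)));

        // The calling code is identical regardless of which backend was wired in.
        System.out.println(fromS3.load("config/app.conf"));
        System.out.println(fromDynamo.load("config/app.conf"));
    }
}
```
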

@cfregly

cfregly commented Apr 23, 2014

hey guys-

keep an eye on the Amazon license agreements. For the AWS Java SDK, i
think you're ok as it looks like the SDK is covered under the Apache 2.0
license: https://github.com/aws/aws-sdk-java/blob/master/LICENSE.txt

we're dealing with this on the Spark project - specifically for the AWS
Kinesis Client Library from the AWS Labs repo.

we're doing some special package/build work to support the Amazon-specific
license:
https://github.com/awslabs/amazon-kinesis-client/blob/master/LICENSE.txt

here's the relevant Apache JIRA in case you're interested:
https://issues.apache.org/jira/browse/LEGAL-198

again, i think you're ok, but something to keep in mind.

thanks!

-chris

@rubanm
Contributor

rubanm commented Apr 23, 2014

@cfregly Thanks for the note.

@caniszczyk Does AWS Java SDK look okay to you from a license perspective? Thanks.

@alexanderdean
Author

Thanks @cfregly - yes I've been following that on the Spark side of things...
