@electrum
Contributor

This is a new proof-of-concept S3 FileSystem implementation that uses the AWS SDK. It passes the basic unit tests but has not been tested otherwise.

@zhenxiao
Collaborator

Hi David,

This is great for running Presto on S3, and it relieves us of the packaging work.

I tried this on an 11-node cluster and it is running OK. I just have a few questions:

#1. I still could not figure out a way to disable logging. The extensive logging makes Presto's performance on S3 noticeably worse than reading from HDFS. Here is my etc/log.properties:

com.facebook.presto=DEBUG
org.apache.hadoop=WARN
org.apache.http.wire=WARN
org.apache.http.headers=WARN
org.jets3t.service=WARN

But I still get tens of lines like these:

2014-01-30T22:13:03.945+0000 INFO 20140130_221259_00004_56i98.1.6-0-81 stdout 22:13:03.945 [20140130_221259_00004_56i98.1.6-0-81] DEBUG org.apache.http.wire - << "[0xbd][0x14]5[0xe2]b[0xe6][0xf3][0xc9][0x9]s[0xbc][0x9a][0xd9][0xb6][0xa4]y9[0xc9][0x7][0xdf][0xe9][0x13][0xc7][0xfd][0xa2][0x13][0x88][0xfe][0xfa][0xbf]z[0x9][0xea][0xcc][0x7][0xd3][0x96][0x11][0xd0][0xa6]A7[0x1d]q[0xac][0xb5]`[0x83][0x2][0x84])h[0xca]rq[0xde][0xa9]3[0xa0]Yo[\n]"

2014-01-30T22:13:03.946+0000 INFO 20140130_221259_00004_56i98.1.6-0-81 stdout 22:13:03.946 [20140130_221259_00004_56i98.1.6-0-81] DEBUG org.apache.http.wire - << "[0xa4][0xda]=[0x18][0x95]gb[0xbd]2[0xc0][0xe4]T[0xc7][0xc8][0xc9][0x6]][0xc3][0x1c]1:[0x9b]:[0x9f]w`[0xaf]K[0x9d]cp[0x2][0xb4][\r][0x8c]?)[0xf5]<[0x1b][0xcb][0xa8]u[0x1a]+I[0xbb][0xf]:

This not only makes Presto run slowly, but also consumes a lot of disk space. Do you have any hints on how to disable this logging (the way it is when Presto runs on HDFS)?
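
For context, here is a minimal sketch of what per-logger level settings like the ones above amount to, assuming they are applied to java.util.logging loggers. The class `LogLevelSketch`, the method `applyLevels`, and the mapping from names such as `WARN` to java.util.logging levels are hypothetical and only illustrate the idea; they are not taken from the patch.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;
import java.util.logging.Level;
import java.util.logging.Logger;

class LogLevelSketch {
    // Assumed mapping from the level names used in the properties file
    // to java.util.logging levels (hypothetical, for illustration only).
    static Level toJulLevel(String name) {
        switch (name.trim().toUpperCase(Locale.ROOT)) {
            case "DEBUG": return Level.FINE;
            case "INFO":  return Level.INFO;
            case "WARN":  return Level.WARNING;
            case "ERROR": return Level.SEVERE;
            default: throw new IllegalArgumentException("Unknown level: " + name);
        }
    }

    // Applies each "logger.name=LEVEL" entry and returns the loggers,
    // so the caller keeps strong references to the configured instances.
    static Map<String, Logger> applyLevels(Properties levels) {
        Map<String, Logger> configured = new LinkedHashMap<>();
        for (String loggerName : levels.stringPropertyNames()) {
            Logger logger = Logger.getLogger(loggerName);
            logger.setLevel(toJulLevel(levels.getProperty(loggerName)));
            configured.put(loggerName, logger);
        }
        return configured;
    }

    public static void main(String[] args) {
        Properties levels = new Properties();
        levels.setProperty("org.apache.http.wire", "WARN");
        levels.setProperty("org.apache.http.headers", "WARN");

        Map<String, Logger> loggers = applyLevels(levels);
        Logger wire = loggers.get("org.apache.http.wire");

        // With the level at WARNING, DEBUG-equivalent (FINE) records are dropped.
        System.out.println(wire.isLoggable(Level.FINE));    // false
        System.out.println(wire.isLoggable(Level.WARNING)); // true
    }
}
```

Settings of this kind only affect records that pass through java.util.logging. If the DEBUG lines reach stdout through a different logging backend, that backend would need its own configuration.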

#2. The httpclient deadlock problem is still there. I found a way to fix it and will post my patch after this one.

This is really great work for getting started with Presto on S3. I ran some experiments (not using this patch, but using EMR's Hadoop jars), and they showed that Presto on S3 could be as fast as Presto on HDFS. I think we should put all S3-related work in this packaging structure and make its performance as fast as with EMR's Hadoop.

Thanks,
Zhenxiao

Contributor

Is this a bug we should pull out into a separate commit?

@dain
Contributor

dain commented Jan 31, 2014

This seems like a reasonable start. Looks good.

@electrum
Contributor Author

Updated to address review comments. Will push after release.

@electrum merged commit 1541c8d into prestodb:master on Feb 13, 2014
@electrum deleted the s3 branch on February 13, 2014 05:53
3 participants