Support connecting to local s3 object stores in datafusion-cli #10072

Closed
alamb opened this issue Apr 13, 2024 · 2 comments · Fixed by #10080
Labels
enhancement New feature or request

Comments

@alamb
Contributor

alamb commented Apr 13, 2024

Is your feature request related to a problem or challenge?

I am trying to use Sprox locally to query Parquet files.

Sprox currently proxies requests to an actual S3 instance or local file cache.

I would like to be able to create an EXTERNAL table to read from this instance. Here is how it works in DuckDB:

CREATE SECRET (
    TYPE S3,
    PROVIDER CREDENTIAL_CHAIN,
    ENDPOINT 'localhost:8080',
    USE_SSL false,
    URL_STYLE path
);

select * from read_parquet('s3://sprox/sample.parquet');

Describe the solution you'd like

I would like to do something like this in datafusion-cli:

-- Create external table
CREATE EXTERNAL TABLE sample
STORED AS PARQUET
OPTIONS(
    'aws.access_key_id' 'A',
    'aws.secret_access_key' 'B',
    'aws.endpoint' 'http://localhost:8080',
)
LOCATION 's3://sprox/sample.parquet';

When I run that today, here is the error I get:

datafusion-cli -f sprox.sql
DataFusion CLI v37.0.0
Internal error: Config value "" not found on AwsOptions.
This was likely caused by a bug in DataFusion's code and we would welcome that you file an bug report in our issue tracker
Error during planning: table 'datafusion.public.sample' not found

I think this particular error is related to the fact that the config provider doesn't check for aws.endpoint. However, even once I fixed that locally I still couldn't make the external table -- I get an error about scheme not allowed.
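
To make the two failure modes above concrete, here is a minimal sketch in Rust. It is not DataFusion's actual code: `S3Options`, `OptionError`, `build_options`, and the `aws.allow_http` key are hypothetical names used only for illustration. It assumes the options are applied by matching on key names (so an unhandled key such as `aws.endpoint` is rejected), and that a plain `http://` endpoint is refused unless HTTP is explicitly allowed, which is what the `AWS_ALLOW_HTTP=true` environment variable in the workaround suggests.

```rust
/// Hypothetical stand-in for an S3 options struct (not DataFusion's real type).
#[derive(Debug, Default, PartialEq)]
struct S3Options {
    access_key_id: Option<String>,
    secret_access_key: Option<String>,
    endpoint: Option<String>,
    allow_http: bool,
}

/// Hypothetical error type covering the two failures described in the issue.
#[derive(Debug, PartialEq)]
enum OptionError {
    /// The key is not handled by the options provider.
    UnknownKey(String),
    /// The value could not be parsed for the given key.
    InvalidValue { key: String, value: String },
    /// A plain-HTTP endpoint was given without allowing HTTP.
    SchemeNotAllowed(String),
}

impl S3Options {
    /// Apply one `key value` pair from an OPTIONS(...) clause.
    fn set(&mut self, key: &str, value: &str) -> Result<(), OptionError> {
        match key {
            "aws.access_key_id" => self.access_key_id = Some(value.to_string()),
            "aws.secret_access_key" => self.secret_access_key = Some(value.to_string()),
            // Without this arm, 'aws.endpoint' would fall through to UnknownKey.
            "aws.endpoint" => self.endpoint = Some(value.to_string()),
            // Assumed key name, mirroring the AWS_ALLOW_HTTP environment variable.
            "aws.allow_http" => {
                self.allow_http = value.parse::<bool>().map_err(|_| OptionError::InvalidValue {
                    key: key.to_string(),
                    value: value.to_string(),
                })?;
            }
            other => return Err(OptionError::UnknownKey(other.to_string())),
        }
        Ok(())
    }

    /// Reject an http:// endpoint unless plain HTTP was explicitly allowed.
    fn validate(&self) -> Result<(), OptionError> {
        match &self.endpoint {
            Some(url) if url.starts_with("http://") && !self.allow_http => {
                Err(OptionError::SchemeNotAllowed(url.clone()))
            }
            _ => Ok(()),
        }
    }
}

/// Build options from key/value pairs, then validate the combination.
fn build_options(pairs: &[(&str, &str)]) -> Result<S3Options, OptionError> {
    let mut opts = S3Options::default();
    for (key, value) in pairs {
        opts.set(key, value)?;
    }
    opts.validate()?;
    Ok(opts)
}
```

In this sketch, passing only `aws.endpoint` with an `http://` URL fails validation with `SchemeNotAllowed`, while also passing `aws.allow_http` set to `true` succeeds. Handling the endpoint key alone would therefore not be enough for a local, non-TLS store.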

Describe alternatives you've considered

Note that this workflow already works if you use environment variables:

$ (venv) andrewlamb@Andrews-MacBook-Pro:~/Software/arrow-datafusion2/datafusion-cli$ AWS_ALLOW_HTTP=true AWS_ACCESS_KEY_ID=A AWS_SECRET_ACCESS_KEY=B AWS_ENDPOINT=http://localhost:8080  datafusion-cli
DataFusion CLI v37.0.0
> CREATE EXTERNAL TABLE sample
STORED AS PARQUET
LOCATION 's3://sprox/sample.parquet';
0 row(s) fetched.
Elapsed 2.266 seconds.

Additional context

No response

@alamb alamb added the enhancement New feature or request label Apr 13, 2024
@Lordworms
Contributor

Lordworms commented Apr 14, 2024

take this one since I've been doing a related one #9964

@alamb
Contributor Author

alamb commented Apr 14, 2024

Thanks @Lordworms -- I actually have already made a PR for this one; I just hadn't had a chance to push it yet. Sorry about that.

Update: #10080

This issue was closed.