support for reading data from stdin for datafusion-cli #9430

Open
13minutes-yt opened this issue Mar 2, 2024 · 5 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@13minutes-yt

13minutes-yt commented Mar 2, 2024

Support for reading from stdin or from memory would make datafusion-cli much easier to use in Linux pipelines and similar workflows, so that DataFusion could be chosen over other binaries. See #9409 (comment).

@13minutes-yt 13minutes-yt changed the title from "I don't think you are missing anything -- it does not have support for reading from stdin. I think it would be a good feature." to "support for reading from stdin" Mar 2, 2024
@alamb alamb changed the title from "support for reading from stdin" to "support for reading data from stdin for datafusion-cli" Mar 2, 2024
@alamb alamb added the "help wanted" (Extra attention is needed) and "enhancement" (New feature or request) labels Mar 2, 2024
@alamb
Contributor

alamb commented Mar 2, 2024

Note that there are some good examples in #9409.

@SteveLauC
Contributor

Copying my comment here:

I tried this with JSON:

$ cat sql
create external table test stored as json location '/dev/stdin';
select * from test;

$ cat 1.json
{ "age": 2, "name": "steve" }

$ cat 1.json | datafusion-cli -f sql
DataFusion CLI v36.0.0
0 rows in set. Query took 0.001 seconds.

0 rows in set. Query took 0.000 seconds.

It does not report an error, but it does not print the rows either.

For parquet, datafusion-cli gives you an error because the file size of a pipe is 0:

$ cat src/main.rs
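fn main() {
    // File descriptor 0 is stdin; fstat reports its metadata.
    let stat = nix::sys::stat::fstat(0).unwrap();
    // For a pipe, st_size is reported as 0.
    println!("{}", stat.st_size);
}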

$ cargo b
$ cat 1.json | ./target/debug/rust
0
$ cat sql
create external table test stored as parquet location '/dev/stdin';
select * from test;

$ pqrs cat 2.parquet

###############
File: 2.parquet
###############

{Age: 2}

$ cat 2.parquet | datafusion-cli -f sql
DataFusion CLI v36.0.0
Execution error: file size of 0 is less than footer
Error during planning: table 'datafusion.public.test' not found
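
To illustrate why a reported size of 0 is fatal for Parquet: a Parquet file ends with a 4-byte little-endian metadata length followed by the magic bytes PAR1, so a reader has to know the total size to locate that footer. The following is a minimal std-only sketch of that kind of check. The helper name `footer_metadata_len` and the second error string are hypothetical, and this is not the parquet crate's actual code.

```rust
/// Trailing bytes of a Parquet file: 4-byte little-endian metadata length + magic.
const FOOTER_SIZE: usize = 8;
const MAGIC: &[u8; 4] = b"PAR1";

/// Hypothetical helper: returns the metadata length stored in the footer,
/// or an error if the input is too short or does not end with the magic bytes.
fn footer_metadata_len(bytes: &[u8]) -> Result<u32, String> {
    if bytes.len() < FOOTER_SIZE {
        return Err(format!("file size of {} is less than footer", bytes.len()));
    }
    let tail = &bytes[bytes.len() - FOOTER_SIZE..];
    if &tail[4..] != MAGIC.as_slice() {
        return Err("missing PAR1 magic at end of file".to_string());
    }
    Ok(u32::from_le_bytes([tail[0], tail[1], tail[2], tail[3]]))
}

fn main() {
    // A pipe reports size 0, so a size-based reader effectively sees an empty file.
    println!("{:?}", footer_metadata_len(&[]));

    // Fake input for illustration only: 4 placeholder bytes, length 4, then the magic.
    let mut fake = vec![0u8; 4];
    fake.extend_from_slice(&4u32.to_le_bytes());
    fake.extend_from_slice(MAGIC);
    println!("{:?}", footer_metadata_len(&fake));
}
```

The point of the sketch is only that the footer sits at the end of the data, so the whole stream has to be available (or its size known) before any Parquet metadata can be read.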

For CSV, it fails with "Object Store error: Generic LocalFileSystem error: Error seeking file /dev/stdin: Illegal seek (os error 29)" because a pipe is not seekable:

$ cat sql
create external table test stored as csv location '/dev/stdin';
select * from test;

$ cat username.csv
Username; Identifier;First name;Last name
booker12;9012;Rachel;Booker
grey07;2070;Laura;Grey
johnson81;4081;Craig;Johnson
jenkins46;9346;Mary;Jenkins
smith79;5079;Jamie;Smith


$ cat username.csv | datafusion-cli -f sql
DataFusion CLI v36.0.0
Object Store error: Generic LocalFileSystem error: Error seeking file /dev/stdin: Illegal seek (os error 29)
Error during planning: table 'datafusion.public.test' not found
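
One generic way around a non-seekable input, sketched here with the standard library only, is to drain the stream into memory once and then seek within the buffer. The function names `spool_to_memory` and `read_at` are hypothetical, and this is not how datafusion-cli or the object store layer works today. It also assumes the piped data fits in memory.

```rust
use std::io::{self, Cursor, Read, Seek, SeekFrom};

/// Drains a sequential reader (such as a pipe) into memory and returns
/// a cursor over the bytes, which supports seeking.
fn spool_to_memory<R: Read>(mut reader: R) -> io::Result<Cursor<Vec<u8>>> {
    let mut buf = Vec::new();
    reader.read_to_end(&mut buf)?;
    Ok(Cursor::new(buf))
}

/// Reads exactly `len` bytes starting at byte `offset`.
fn read_at<S: Read + Seek>(src: &mut S, offset: u64, len: usize) -> io::Result<Vec<u8>> {
    src.seek(SeekFrom::Start(offset))?;
    let mut out = vec![0u8; len];
    src.read_exact(&mut out)?;
    Ok(out)
}

fn main() -> io::Result<()> {
    // A byte slice stands in for stdin here; `io::stdin().lock()` could be
    // passed instead, since both implement `Read`.
    let input: &[u8] = b"booker12;9012;Rachel;Booker\n";
    let mut spooled = spool_to_memory(input)?;
    println!("buffered {} bytes", spooled.get_ref().len());

    // Seeking now works because the cursor is backed by memory, not a pipe.
    let field = read_at(&mut spooled, 9, 4)?;
    println!("{}", String::from_utf8_lossy(&field));
    Ok(())
}
```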

I am interested in this, though I am not sure I can pull it off, since the fix would need to work with those arrow-xx crates. But anyway.

I think the first thing I am gonna do is to take a look at the above JSON example, and figure out why it does not print the data even on success.

@SteveLauC
Contributor

take

@alamb
Contributor

alamb commented Mar 3, 2024

I am interested in this, though I am not sure I can pull it off, since the fix would need to work with those arrow-xx crates. But anyway.

You may be able to do something like implementing a StdinObjectStore and registering it in datafusion-cli.

Then there is a separate question of handling urls like /dev/stdin 🤔
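
As a rough illustration of the two pieces mentioned above, here is a std-only sketch. `StdinStore` and `is_stdin_location` are hypothetical names; the struct does not implement the real `object_store::ObjectStore` trait, it only mimics the two things a reader typically asks a store for (an object's size and a byte range). Treating `/dev/stdin` and `/dev/fd/0` as stdin locations is also an assumption, not an agreed design.

```rust
use std::io::{self, Read};
use std::ops::Range;

/// Hypothetical stand-in for a `StdinObjectStore`. It is NOT the real
/// object_store API; it only models "size" and "byte range" lookups.
struct StdinStore {
    data: Vec<u8>,
}

impl StdinStore {
    /// Drains the reader once, since a pipe can only be consumed a single time.
    fn from_reader<R: Read>(mut reader: R) -> io::Result<Self> {
        let mut data = Vec::new();
        reader.read_to_end(&mut data)?;
        Ok(Self { data })
    }

    /// Real size of the buffered data, unlike the 0 reported for a pipe.
    fn size(&self) -> usize {
        self.data.len()
    }

    /// Returns the requested byte range, or `None` if it is out of bounds.
    fn get_range(&self, range: Range<usize>) -> Option<&[u8]> {
        self.data.get(range)
    }
}

/// Hypothetical location check: which LOCATION strings should mean "read stdin".
fn is_stdin_location(location: &str) -> bool {
    matches!(location, "/dev/stdin" | "/dev/fd/0")
}

fn main() -> io::Result<()> {
    let location = "/dev/stdin";
    if is_stdin_location(location) {
        // A byte slice stands in for `io::stdin().lock()` in this sketch.
        let store = StdinStore::from_reader(&b"{ \"age\": 2, \"name\": \"steve\" }"[..])?;
        println!("size = {}", store.size());
        println!("range = {:?}", store.get_range(0..7));
    }
    Ok(())
}
```

A real implementation would additionally have to be registered with the session under some URL scheme, which is the open question about handling locations like /dev/stdin.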

@SteveLauC
Contributor

You may be able to do something like implementing a StdinObjectStore and registering it in datafusion-cli.

Then there is a separate question of handling urls like /dev/stdin 🤔

Thanks for the tip, will give it a think:)
