Commit 1e05bea

marvelshan and alamb authored
datafusion-cli: document reading partitioned parquet (#15505)
* datafusion-cli: document reading partitioned parquet
* change .slt to the origin
* docs: clarify usage and remove wildcard examples
* Update docs/source/user-guide/cli/datasources.md

Co-authored-by: Andrew Lamb <[email protected]>

---------

Co-authored-by: Andrew Lamb <[email protected]>
1 parent 7b2d704 commit 1e05bea

1 file changed: +27 -10 lines changed

docs/source/user-guide/cli/datasources.md

Lines changed: 27 additions & 10 deletions
@@ -95,8 +95,7 @@ additional configuration options.
 # `CREATE EXTERNAL TABLE`
 
 It is also possible to create a table backed by files or remote locations via
-`CREATE EXTERNAL TABLE` as shown below. Note that wildcards (e.g. `*`) are also
-supported
+`CREATE EXTERNAL TABLE` as shown below. Note that DataFusion does not support wildcards (e.g. `*`) in file paths; instead, specify the directory path directly to read all compatible files in that directory.
 
 For example, to create a table `hits` backed by a local parquet file, use:
 
@@ -126,6 +125,32 @@ select count(*) from hits;
 1 row in set. Query took 0.344 seconds.
 ```
 
+**Why Wildcards Are Not Supported**
+
+Although wildcards (e.g., _.parquet or \*\*/_.parquet) may work for local filesystems in some cases, they are not officially supported by DataFusion. This is because wildcards are not universally applicable across all storage backends (e.g., S3, GCS). Instead, DataFusion expects the user to specify the directory path, and it will automatically read all compatible files within that directory.
+
+For example, the following usage is not supported:
+
+```sql
+CREATE EXTERNAL TABLE test (
+message TEXT,
+day DATE
+)
+STORED AS PARQUET
+LOCATION 'gs://bucket/*.parquet';
+```
+
+Instead, you should use:
+
+```sql
+CREATE EXTERNAL TABLE test (
+message TEXT,
+day DATE
+)
+STORED AS PARQUET
+LOCATION 'gs://bucket/my_table';
+```
+
 # Formats
 
 ## Parquet
@@ -149,14 +174,6 @@ STORED AS PARQUET
 LOCATION '/mnt/nyctaxi/';
 ```
 
-Register a single folder parquet datasource by specifying a wildcard for files to read
-
-```sql
-CREATE EXTERNAL TABLE taxi
-STORED AS PARQUET
-LOCATION '/mnt/nyctaxi/*.parquet';
-```
-
 ## CSV
 
 DataFusion will infer the CSV schema automatically or you can provide it explicitly.
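
Note for readers updating existing statements: the change above replaces wildcard locations such as `/mnt/nyctaxi/*.parquet` with the containing directory. The sketch below shows one way such a rewrite could be scripted. `directory_location` is a hypothetical helper written for this note, not part of DataFusion or its CLI, and it assumes that any path segment containing `*`, `?`, or `[` is a wildcard. Because DataFusion reads all compatible files in the directory, the result may cover more files than the original pattern matched.

```python
# Characters treated as wildcard markers (an assumption made for this sketch).
GLOB_CHARS = frozenset("*?[")


def directory_location(location: str) -> str:
    """Return the leading directory part of `location` that has no wildcard.

    Hypothetical helper for illustration only; not a DataFusion API.
    """
    scheme, sep, rest = location.partition("://")
    if not sep:
        # No URL scheme: treat the whole string as a filesystem path.
        scheme, rest = "", location

    kept = []
    for segment in rest.split("/"):
        if any(ch in GLOB_CHARS for ch in segment):
            break
        kept.append(segment)
    else:
        # No wildcard in any segment: leave the location unchanged.
        return location

    if not kept:
        if sep:
            # e.g. "gs://*.parquet": there is no bucket or directory to fall back to.
            raise ValueError(f"no directory prefix in {location!r}")
        # A relative pattern such as "*.parquet": use the current directory.
        return "./"
    return f"{scheme}{sep}{'/'.join(kept)}/"


print(directory_location("/mnt/nyctaxi/*.parquet"))  # /mnt/nyctaxi/
print(directory_location("gs://bucket/*.parquet"))   # gs://bucket/
```

The returned path can then be used as the `LOCATION` of a `CREATE EXTERNAL TABLE` statement, as in the `/mnt/nyctaxi/` example kept by this commit.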
