Select count(*) fails on large amounts of data in Spark #58
Comments
Can you please provide the output of the following commands? Please note that aggregate push-down is not available in hdfs_fdw 2.0.3.

```sql
-- On the Spark side:
explain extended select count(*) from test;
-- == Analyzed Logical Plan ==
-- == Optimized Logical Plan ==
-- == Physical Plan ==

-- On the PostgreSQL side:
EXPLAIN VERBOSE select count(*) from spark_table;
```
In the case of beeline, a map-reduce job is initiated to perform the hash aggregate. In the hdfs_fdw case, all rows are first selected, which triggers the OOM error. This will work once we provide support for pushing down aggregates to the Spark/Hive server in the remote query.
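To illustrate the difference described above, here is a hedged sketch of the two plan shapes. The plan text is illustrative only, not actual output from this thread; `spark_table` is the table name used earlier in the discussion.

```sql
-- Without aggregate push-down (hdfs_fdw 2.0.3), the remote query is a
-- plain scan and PostgreSQL counts rows locally, so all 100M rows are
-- streamed across the wire -- this is what exhausts memory:
EXPLAIN VERBOSE SELECT count(*) FROM spark_table;
--  Aggregate
--    ->  Foreign Scan on public.spark_table
--          Remote SQL: SELECT * FROM spark_table

-- With aggregate push-down (the planned support), the whole aggregate
-- would run on the Spark/Hive server and only a single row returns:
--  Foreign Scan
--    Remote SQL: SELECT count(*) FROM spark_table
```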
Thanks. Are you planning to provide this support soon?
I tested hdfs_fdw with Spark. In Spark I created a table from a local file with 100M rows. In Spark beeline I can run select count(*), but the PG server gets an OOM error, and the Spark Thrift Server fails too. PG 9.6, Spark 2.2.0, hdfs_fdw 2.0.3.
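For reference, a minimal sketch of a setup like the reporter's. The server, mapping, and table names are placeholders, and the options shown (`host`, `port`, `client_type`, `dbname`, `table_name`, `username`, `password`) are assumptions based on the hdfs_fdw documentation, not details taken from this thread.

```sql
CREATE EXTENSION hdfs_fdw;

-- Point the FDW at the Spark Thrift Server (host/port are placeholders).
CREATE SERVER hdfs_server
  FOREIGN DATA WRAPPER hdfs_fdw
  OPTIONS (host '127.0.0.1', port '10000', client_type 'spark');

CREATE USER MAPPING FOR postgres SERVER hdfs_server
  OPTIONS (username 'spark_user', password 'spark_pass');

-- Mirror the 100M-row Spark table as a foreign table.
CREATE FOREIGN TABLE spark_table (id int)
  SERVER hdfs_server
  OPTIONS (dbname 'default', table_name 'test');

-- The query that triggers the OOM in hdfs_fdw 2.0.3:
SELECT count(*) FROM spark_table;
```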