Hello Team,

Thanks for creating this amazing library. Is there any way to use the library for PySpark and SQL instead of Pandas?
nithinreddyy changed the title from "This library is amazing. Is there any way to use the library for PySpark instead of Pandas?" to "This library is amazing. Is there any way to use the library for PySpark and SQL instead of Pandas?" on Jan 17, 2023
Right now this relies on being able to run aggregations quickly to summarize the data that gets added to the prompt, so it only really works when the data is already in memory.

For things like dask, pyspark, modin, etc. (remote and likely "big" data): this would require updating the aggregation code. It is theoretically possible, since the datasketch aggregations this is intended to work off of are O(N), parallelizable, and mergeable (see the sketch below for what that pattern looks like). That said, it isn't supported right now.
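For concreteness, here is a minimal, illustrative sketch of a mergeable aggregation using the `datasketch` library's `HyperLogLog`. This is not this library's actual aggregation code, just an example of the per-partition build-then-merge pattern that would make pyspark/dask support feasible; the partition data is made up:

```python
# Per-partition sketching: each summary is O(N) to build, can be built
# independently on every partition, and merged into one global summary.
from datasketch import HyperLogLog

partitions = [
    ["alice", "bob", "carol"],           # pretend each list is one data partition
    ["bob", "dave", "erin", "frank"],
]

# Build one sketch per partition (this step is embarrassingly parallel).
partial_sketches = []
for rows in partitions:
    hll = HyperLogLog(p=12)
    for value in rows:
        hll.update(value.encode("utf-8"))
    partial_sketches.append(hll)

# Merge the partial sketches into a single global estimate.
merged = HyperLogLog(p=12)
for sketch in partial_sketches:
    merged.merge(sketch)

print(f"approximate distinct count: {merged.count():.0f}")
```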
For systems like "SQL" (e.g. remote databases: snowflake, clickhouse, postgres, sqlite): this cannot be used directly right now without downloading the table first (e.g. with pd.read_sql into a pandas DataFrame).
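In the meantime, a workaround sketch for both cases: materialize the remote data as a pandas DataFrame first, then use the pandas-based workflow as usual. This assumes the (possibly limited or sampled) data fits in memory; the connection string, table name, and row limit below are placeholders:

```python
# Pull remote data into an in-memory pandas DataFrame, then hand that
# DataFrame to the pandas-based workflow as usual.
import pandas as pd
import sqlalchemy
from pyspark.sql import SparkSession

# SQL case: download the table (or a bounded query result) with pd.read_sql.
engine = sqlalchemy.create_engine("postgresql://user:password@host:5432/dbname")
df_from_sql = pd.read_sql("SELECT * FROM my_table LIMIT 100000", engine)

# PySpark case: convert a (small enough) Spark DataFrame to pandas.
spark = SparkSession.builder.getOrCreate()
df_from_spark = spark.table("my_table").limit(100_000).toPandas()
```

Either DataFrame can then be passed to the library the same way any other in-memory pandas DataFrame would be.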