Out of memory when sorting #157
I filed an issue against the core DataFusion repo: apache/datafusion#5108
@djouallah please check the advice in apache/datafusion#5108 on how to enable a memory pool with spilling, so that DataFusion uses only the allocated memory instead of all available memory.
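To illustrate what a memory pool with spilling does, here is a toy Rust model of a greedy pool. It is a hypothetical sketch, not DataFusion's `GreedyMemoryPool`; the `ToyGreedyPool` type and its `try_grow`/`shrink`/`reserved` methods are made up for illustration. Consumers reserve bytes from a fixed budget on a first-come, first-served basis, and a refused reservation is the signal for an operator such as a sort to spill its buffered data to disk and release its reservation.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical, simplified greedy memory pool: one fixed byte budget shared
/// by all consumers, handed out first come, first served.
/// Illustration only; this is NOT DataFusion's `GreedyMemoryPool`.
pub struct ToyGreedyPool {
    limit: usize,
    used: AtomicUsize,
}

impl ToyGreedyPool {
    pub fn new(limit: usize) -> Self {
        Self { limit, used: AtomicUsize::new(0) }
    }

    /// Try to reserve `bytes`. Returns false if the budget would be exceeded,
    /// which tells the caller (e.g. a sort operator) to spill instead.
    pub fn try_grow(&self, bytes: usize) -> bool {
        let mut current = self.used.load(Ordering::Relaxed);
        loop {
            // Reject on overflow or when the new total exceeds the limit.
            let next = match current.checked_add(bytes) {
                Some(n) if n <= self.limit => n,
                _ => return false,
            };
            // Retry if another consumer changed `used` concurrently.
            match self
                .used
                .compare_exchange_weak(current, next, Ordering::SeqCst, Ordering::Relaxed)
            {
                Ok(_) => return true,
                Err(actual) => current = actual,
            }
        }
    }

    /// Release `bytes` previously reserved, e.g. after spilling a buffer to disk.
    pub fn shrink(&self, bytes: usize) {
        self.used.fetch_sub(bytes, Ordering::SeqCst);
    }

    /// Bytes currently reserved across all consumers.
    pub fn reserved(&self) -> usize {
        self.used.load(Ordering::SeqCst)
    }
}
```

For example, with a 1024-byte budget a 600-byte reservation succeeds, a second 600-byte reservation is refused, and the second request succeeds once the first 600 bytes have been released.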
We need to expose the memory and disk configuration in the Python bindings so that they can be set when creating the `SessionContext`. Here is Rust code for reference. I will try to look at this over the weekend unless someone beats me to it.

```rust
use std::sync::Arc;

// Cap execution memory at 1 GiB and spill to the given directory when the pool is exhausted.
let runtime_config = crate::execution::runtime_env::RuntimeConfig::new()
    .with_memory_pool(Arc::new(crate::execution::memory_pool::GreedyMemoryPool::new(1024 * 1024 * 1024)))
    .with_disk_manager(crate::execution::disk_manager::DiskManagerConfig::new_specified(vec!["/Users/a/spill/".into()]));
let runtime = Arc::new(crate::execution::runtime_env::RuntimeEnv::new(runtime_config).unwrap());
let ctx = SessionContext::with_config_rt(SessionConfig::new(), runtime);
```
@djouallah Hopefully this helps: #165
Describe the bug
Sorting a table and exporting it to a Parquet file in Colab fails with an out-of-memory error.
To Reproduce
Expected behavior
I expected the sort to use only the available memory.
Here is a notebook comparing the same workload in Polars and DuckDB:
https://colab.research.google.com/drive/1pfAPpIG7jpvGB_aHj-PXX66vRaRT0xlj#scrollTo=O8-lyg1y6RT2
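The expected behavior corresponds to a textbook external merge sort: buffer values up to a memory budget, sort and spill each full buffer as a run, then k-way merge the runs. The Rust sketch below is a hypothetical illustration, not DataFusion's sort operator; it assumes `i64` values, counts the budget in values rather than bytes, and keeps the spilled runs in a `Vec` as a stand-in for spill files on disk.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical external merge sort: hold at most `max_in_memory` values in
/// the working buffer. Each full buffer is sorted and "spilled" as a run;
/// the runs are then merged. Runs live in a Vec here instead of spill files.
pub fn external_sort(input: impl IntoIterator<Item = i64>, max_in_memory: usize) -> Vec<i64> {
    assert!(max_in_memory > 0, "buffer must hold at least one value");
    let mut runs: Vec<Vec<i64>> = Vec::new();
    let mut buffer: Vec<i64> = Vec::with_capacity(max_in_memory);

    for value in input {
        buffer.push(value);
        if buffer.len() == max_in_memory {
            buffer.sort_unstable();
            // "Spill": move the sorted run out and start a fresh, empty buffer.
            runs.push(std::mem::take(&mut buffer));
        }
    }
    // Spill whatever is left over as a final, shorter run.
    if !buffer.is_empty() {
        buffer.sort_unstable();
        runs.push(buffer);
    }

    // k-way merge: the min-heap holds one (value, run index, position) per run.
    let mut heap = BinaryHeap::new();
    for (i, run) in runs.iter().enumerate() {
        if let Some(&first) = run.first() {
            heap.push(Reverse((first, i, 0usize)));
        }
    }
    let mut out = Vec::new();
    while let Some(Reverse((value, run_idx, pos))) = heap.pop() {
        out.push(value);
        // Advance within the run the smallest value came from.
        if let Some(&next) = runs[run_idx].get(pos + 1) {
            heap.push(Reverse((next, run_idx, pos + 1)));
        }
    }
    out
}
```

A real engine would write each run to disk and stream the merged output to the Parquet writer instead of collecting it into a `Vec`, so only one buffer plus one value per run needs to be resident in memory at a time.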