Please write here what feature pandarallel is missing: I would like to control the number of workers being spawned on different machines without touching the code. Example: A new pandas API which is not (yet) supported by pandarallel.
I'd prefer to keep the code in the core package minimal to make things easier to maintain.
Wouldn't you be able to maintain the same functionality by reading in the environment variables with os.environ and passing them to pandarallel.initialize?
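A minimal sketch of that suggestion, assuming the calling code is yours to change. The variable name `PANDARALLEL_NB_WORKERS` and both helper functions are hypothetical, made up for illustration; pandarallel itself does not read any such variable. Only `pandarallel.initialize(nb_workers=...)` is the library's own call.

```python
import os


def workers_from_env(var_name="PANDARALLEL_NB_WORKERS", default=None):
    """Return the worker count from an environment variable, or `default`.

    `var_name` is a hypothetical name chosen for this sketch.
    """
    raw = os.environ.get(var_name)
    if raw is None or not raw.strip():
        return default
    value = int(raw)  # raises ValueError if the value is not an integer
    if value < 1:
        raise ValueError(f"{var_name} must be >= 1, got {value}")
    return value


def initialize_pandarallel():
    """Initialize pandarallel, honouring the environment variable if set."""
    # Imported lazily so the helper above works without pandarallel installed.
    from pandarallel import pandarallel

    nb_workers = workers_from_env()
    if nb_workers is None:
        pandarallel.initialize()  # fall back to the library default
    else:
        pandarallel.initialize(nb_workers=nb_workers)
```

On a given machine you would then run something like `PANDARALLEL_NB_WORKERS=4 python your_script.py`, and the script would call `initialize_pandarallel()` in place of `pandarallel.initialize()`.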
IMO this is a wontfix issue, unless a compelling reason is given.
One of the tools I use, that uses pandarallel, fails consistently in a cluster environment with out-of-memory errors.
According to vladr on SO:
Memory-wise, we already know that subprocess.Popen uses fork/clone under the hood, meaning that every time you call it you're requesting once more as much memory as Python is already eating up, i.e. in the hundreds of additional MB, all in order to then exec a puny 10kB executable such as free or ps. In the case of an unfavourable overcommit policy, you'll soon see ENOMEM.
This wouldn't be a problem in the general case, but memory overcommitting is disabled on the cluster. Since the cluster comes with a lot of cores, and therefore a lot of workers, this easily eats up the entire RAM, even for processes that would be fine with 10GB of memory.
I've run the tool with the exact same commands on a working machine with only a few cores and overcommitting enabled, and it worked fine.
If I could just limit the number of workers/subprocesses this problem wouldn't occur.
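To make the arithmetic concrete, here is a rough sketch of how a worker cap could be derived from a memory budget. It assumes the pessimistic model from the quote above: with overcommit disabled, every forked worker is charged roughly as much memory as the parent process already holds. The function name and the numbers are illustrative only, not measurements from the cluster.

```python
def max_workers_for_memory(parent_mb, limit_mb, cpu_count):
    """Estimate how many forked workers fit into a memory budget.

    Assumes each fork is charged about `parent_mb` on top of the parent
    itself, which is a simplification for strict overcommit settings.
    """
    if parent_mb <= 0:
        raise ValueError("parent_mb must be positive")
    # Memory left for workers once the parent itself is accounted for.
    available_mb = limit_mb - parent_mb
    by_memory = available_mb // parent_mb
    # Never exceed the core count, and always allow at least one worker.
    return max(1, min(cpu_count, by_memory))


# Hypothetical figures: a 500 MB parent, a 10 GB limit, and 64 cores.
print(max_workers_for_memory(parent_mb=500, limit_mb=10 * 1024, cpu_count=64))
```

Under these assumed figures the memory budget, not the core count, is the limiting factor, which is why one worker per core can run out of memory while a lower cap would not.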
Edit: Also note that I cannot just edit the pandarallel.initialize call, since I'm using someone else's code.