Skip to content

{tools}[foss/2024a] modin v0.32.0#21667

Merged
smoors merged 19 commits intoeasybuilders:developfrom
lcniel:20241015192634_new_pr_modin0320
Nov 3, 2024
Merged

{tools}[foss/2024a] modin v0.32.0#21667
smoors merged 19 commits intoeasybuilders:developfrom
lcniel:20241015192634_new_pr_modin0320

Conversation

@lcniel
Copy link
Copy Markdown
Contributor

@lcniel lcniel commented Oct 15, 2024

@Micket Micket added the new label Oct 16, 2024
@Micket Micket added this to the release after 4.9.4 milestone Oct 17, 2024
'checksums': ['9f68557add5e92617d006eb98bfd93d298e7d4d9932a0cd48a5e38ee4f30a134'],
}),
('unidist', '0.7.1', {
'source_tmpl': '%(namelower)s-%(version)s-cp312-cp312-manylinux_2_17_%(arch)s.manylinux2014_%(arch)s.whl',
Copy link
Copy Markdown
Contributor

@Micket Micket Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason to prefer the wheel for this one? We usually go for the source.

I was also tempted to suggest breaking this out into it's own package, but i see so far modin is the only usecase, so maybe not..? Things might change though, so i wouldn't be against putting this into it's own thing, and just have modin depend on it instead of mpi+dask+ray
https://unidist.readthedocs.io/en/latest/using_unidist/index.html#libraries-powered-by-unidist

I'm also confused as to why we need the extras here specific to mpi? it supposed dask, ray, mpi, and we have all of those as dependencies, so presumably, this should be used here?

unidist automatically detects which execution backends are installed and uses that for scheduling computation.

wouldn't mpi4py be enough then?

Copy link
Copy Markdown
Contributor Author

@lcniel lcniel Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was some error related to copying/finding files during Cython compilation when I tried to build from source. I have absolutely no idea how one would go about solving it, it was trying to do some "include source_file.cpp" thing and not finding source_file.cpp even though that's clearly in that directory...

About the pip install extras, I'm just following the requirements in the modin installation files which specify "unidist[mpi]". I guess that only amounts to a dependency check?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, Modin basically supports all three frameworks, but unidist seems to be specifically for MPI, from their setup.py:

https://github.com/modin-project/modin/blob/main/setup.py

Their readme also says that you must set both MODIN_BACKEND and UNIDIST_BACKEND to MPI if you use unidist. So it's a bit of a special thing.

Comment on lines +41 to +42
sanity_check_commands = ['python -c "import modin"']

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is already done by default for all extensions

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@Micket
Copy link
Copy Markdown
Contributor

Micket commented Oct 17, 2024

For reference: issue with builds stem from modin-project/unidist#428

@lcniel
Copy link
Copy Markdown
Contributor Author

lcniel commented Oct 17, 2024

Test report by @lcniel
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
vera22-2 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, Python 3.6.8
See https://gist.github.com/lcniel/f74cb8d96277931b4193d3a7730cdab8 for a full test report.

@lcniel
Copy link
Copy Markdown
Contributor Author

lcniel commented Oct 17, 2024

Test report by @lcniel
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
vera22-2 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, Python 3.6.8
See https://gist.github.com/lcniel/7a32ad8e9219a9e596effbc33aa7327b for a full test report.

@lcniel
Copy link
Copy Markdown
Contributor Author

lcniel commented Oct 17, 2024

Test report by @lcniel
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
vera22-2 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, Python 3.6.8
See https://gist.github.com/lcniel/d21bc09a2b697ad1dea2d4b1b778d3b2 for a full test report.

@lcniel
Copy link
Copy Markdown
Contributor Author

lcniel commented Oct 17, 2024

Test report by @lcniel
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
vera30-1 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, Python 3.6.8
See https://gist.github.com/lcniel/1f0584fa312034f8d6e0375c87ffcf3c for a full test report.

@lcniel
Copy link
Copy Markdown
Contributor Author

lcniel commented Oct 17, 2024

Test report by @lcniel
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
vera22-2 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, Python 3.6.8
See https://gist.github.com/lcniel/36ad19489e639867276b780859a4c341 for a full test report.

@lcniel
Copy link
Copy Markdown
Contributor Author

lcniel commented Oct 17, 2024

Test report by @lcniel
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
vera22-2 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, Python 3.6.8
See https://gist.github.com/lcniel/3affc0936894a0a8886a15f680c70860 for a full test report.

@lcniel
Copy link
Copy Markdown
Contributor Author

lcniel commented Oct 17, 2024

Test report by @lcniel
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
vera22-2 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, Python 3.6.8
See https://gist.github.com/lcniel/42e4dcc36028e86dbb5757369085c4eb for a full test report.

@lcniel
Copy link
Copy Markdown
Contributor Author

lcniel commented Oct 17, 2024

Test report by @lcniel
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
vera41-3 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, Python 3.6.8
See https://gist.github.com/lcniel/8252ad97f9e01975f1ffca8a83373968 for a full test report.

@lcniel
Copy link
Copy Markdown
Contributor Author

lcniel commented Oct 17, 2024

OK, the failure now is from ray-project/ray#7724 when running the test suite.

Long story short, ray relies on AF_UNIX sockets which have hard limits on path length, so it cannot handle complicated tempdir paths. It is a baffling design decision to build the software in this way, because the only way to fix this temporarily would be to overwrite TMPDIR with some short path during the test, which can hardly be done dynamically within an easyconfig. So I'm going to have to remove the test for the Ray backend.

@lcniel
Copy link
Copy Markdown
Contributor Author

lcniel commented Oct 17, 2024

Test report by @lcniel
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
vera41-3 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, Python 3.6.8
See https://gist.github.com/lcniel/b8e1f6677fbaa0f0f7b69922321a9fce for a full test report.

@lcniel
Copy link
Copy Markdown
Contributor Author

lcniel commented Oct 17, 2024

Test report by @lcniel
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
vera41-3 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, Python 3.6.8
See https://gist.github.com/lcniel/0f959a8bb68e495f216171bbcf223ebe for a full test report.

@lcniel
Copy link
Copy Markdown
Contributor Author

lcniel commented Oct 17, 2024

Test report by @lcniel
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
vera26-2 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, Python 3.6.8
See https://gist.github.com/lcniel/e6158ec92c422e1a4486e77901b79c19 for a full test report.

@lcniel
Copy link
Copy Markdown
Contributor Author

lcniel commented Oct 17, 2024

Test report by @lcniel
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
vera11-2 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, Python 3.6.8
See https://gist.github.com/lcniel/cc157d6cf1055fdb0baf2954421460b1 for a full test report.

@lcniel
Copy link
Copy Markdown
Contributor Author

lcniel commented Oct 17, 2024

Test report by @lcniel
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
vera12-3 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, Python 3.6.8
See https://gist.github.com/lcniel/5f214d8c2a3bb1461f14adacbc5592d8 for a full test report.

@lcniel
Copy link
Copy Markdown
Contributor Author

lcniel commented Oct 17, 2024

Test report by @lcniel
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
vera12-3 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, Python 3.6.8
See https://gist.github.com/lcniel/55c5d3390995dee49d2255e3f0e571e5 for a full test report.

@lcniel
Copy link
Copy Markdown
Contributor Author

lcniel commented Oct 17, 2024

OK, so my conclusion is that Unidist does not work at the moment, I tried it in several contexts and it just gets stuck. Dask works fine, Ray works if you don't have a complicated tempdir path specified.

@lcniel
Copy link
Copy Markdown
Contributor Author

lcniel commented Oct 17, 2024

Test report by @lcniel
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
vera18-4 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, Python 3.6.8
See https://gist.github.com/lcniel/3843b0b52d22f769aaf7505abbf96a7d for a full test report.

use_pip = True

exts_list = [
# modin has another backend, "unidist", for MPI, but it seems broken currently.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a bit more info why it's broken?
what's the error message?
maybe open a github issue that you can link to?

Copy link
Copy Markdown
Contributor Author

@lcniel lcniel Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no error message, trying to build it and running tests with unidist as the backend, or even just importing it and trying to use anything, e.g. modin.pandas.DataFrame(), results in all your nodes running some kind of unidist init code and firing up all the nodes to 50%. I could not find any documentation of this behaviour.

I have tried the usual MPI remedies to attempt to debug it but I can't get it to work at all. Not sure if anyone else wants to dig into it, but I have no idea how one would go about debugging it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to add a little more info, but there is really not much that I can say about it.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I found a way to resolve it. Maybe!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@smoors I think it's fixed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great, thanks a lot!

@YarShev
Copy link
Copy Markdown

YarShev commented Oct 21, 2024

For reference: issue with builds stem from modin-project/unidist#428

We just released unidist 0.7.2, which adds missing files in the PyPI tarball. Thanks for raising the issue.

@lcniel
Copy link
Copy Markdown
Contributor Author

lcniel commented Oct 21, 2024

Test report by @lcniel
FAILED
Build succeeded for 0 out of 1 (1 easyconfigs in total)
vera02-2 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz, Python 3.6.8
See https://gist.github.com/lcniel/6c4866a015654cf2b7c705ee97a8d5a7 for a full test report.

@lcniel
Copy link
Copy Markdown
Contributor Author

lcniel commented Oct 22, 2024

Test report by @lcniel
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
vera-r08-05 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz, Python 3.6.8
See https://gist.github.com/lcniel/1b461fec19725db550a6ffccb960b683 for a full test report.

@lcniel
Copy link
Copy Markdown
Contributor Author

lcniel commented Oct 22, 2024

OK, I figured out a way to make unidist work by trial and error as well as looking at the way tests are run in the unidist repo: https://github.com/modin-project/unidist/blob/master/.github/workflows/ci.yml

Looks a little bit hackish to me (some implementation choices like using multiprocessing.cpu_count() to get the number of tasks are a bit questionable as well) but I guess they wrote it that way for a reason.

I also had to change UCX_TLS: export UCX_TLS=dc,ud,sm,self, but I didn't put this in the easyconfig as it might be related to our local configuration.

@boegel
Copy link
Copy Markdown
Member

boegel commented Oct 25, 2024

@boegelbot please test @ generoso

@boegelbot
Copy link
Copy Markdown
Collaborator

@boegel: Request for testing this PR well received on login1

PR test command 'EB_PR=21667 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs /opt/software/slurm/bin/sbatch --job-name test_PR_21667 --ntasks=4 ~/boegelbot/eb_from_pr_upload_generoso.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 14577

Test results coming soon (I hope)...

Details

- notification for comment with ID 2437725407 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Copy link
Copy Markdown
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
cns1 - Linux Rocky Linux 8.9, x86_64, Intel(R) Xeon(R) CPU E5-2667 v3 @ 3.20GHz (haswell), Python 3.6.8
See https://gist.github.com/boegelbot/27f94917c02f28e29ef68bf4df815ee6 for a full test report.

@boegel
Copy link
Copy Markdown
Member

boegel commented Oct 25, 2024

@boegelbot please test @ jsc-zen3

@boegelbot
Copy link
Copy Markdown
Collaborator

@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=21667 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_21667 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 5161

Test results coming soon (I hope)...

Details

- notification for comment with ID 2437760280 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Copy link
Copy Markdown
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.4, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.18
See https://gist.github.com/boegelbot/3a4a1f24a458df85c49fd3778011d6ec for a full test report.

@lcniel lcniel requested review from Micket and smoors October 28, 2024 21:59
Copy link
Copy Markdown
Contributor

@smoors smoors left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@smoors
Copy link
Copy Markdown
Contributor

smoors commented Nov 3, 2024

Going in, thanks @lcniel!

@smoors smoors merged commit d99fd3d into easybuilders:develop Nov 3, 2024
@boegel boegel modified the milestones: release after 4.9.4, 5.0.0 Mar 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants