WIP: improve build script; enable openmp #67
base: master
Conversation
TIL: trailing comma for **kwargs is not allowed in old python versions
Thanks for your interest in making the project faster! It will be interesting to see if we can enable multiprocessing without breaking the project’s ability to build on some platforms. (Like: what if folks are using clang instead of gcc for their Python? Would the option still work? What about clang on macOS? And so forth.)
Good question! Check out the
Parallel processing would only work with n satellites. It cannot work, so far as I can see, with one satellite and m times, because the process of generating a position updates (I think?) internal fields in the
Good point. After all, this is what this PR is about. I should have mentioned that the following platforms should be supported by the new method (needs testing of course):
I didn't think about other architectures though. (Is ARM something you support?) However, as you define the C extension to be optional at this point, it should still install (falling back to the slower Python implementation) if the compilation fails for some reason.
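On the question above of whether a flag like -fopenmp would work with clang as well as gcc, one common pattern (not necessarily what this PR does; the helper name below is made up) is to probe the compiler from setup.py before adding the flag. A minimal sketch, assuming distutils' default compiler matches the one setuptools will use (and noting that distutils is deprecated in recent Python versions):

```python
# Hypothetical helper, not part of this PR: probe whether the active C
# compiler accepts an OpenMP flag before adding it to the extension build.
import os
import tempfile
from distutils.ccompiler import new_compiler
from distutils.errors import CompileError, LinkError
from distutils.sysconfig import customize_compiler

OPENMP_TEST_C = """
#include <omp.h>
int main(void) {
    #pragma omp parallel
    { }
    return 0;
}
"""

def compiler_accepts_flag(flag="-fopenmp"):
    """Return True if a trivial OpenMP program compiles with the given flag."""
    with tempfile.TemporaryDirectory() as tmpdir:
        src = os.path.join(tmpdir, "openmp_check.c")
        with open(src, "w") as handle:
            handle.write(OPENMP_TEST_C)
        compiler = new_compiler()
        customize_compiler(compiler)  # pick up the flags Python itself was built with
        try:
            compiler.compile([src], output_dir=tmpdir, extra_postargs=[flag])
        except (CompileError, LinkError):
            return False
    return True
```

On MSVC the flag to test would be "/openmp" instead, and Apple clang typically needs a separately installed libomp, which is why a probe like this is safer than hardcoding the flag.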
Thanks for the hint. I wasn't aware that there is a special release branch. What would be the preferred way to test this PR against the wheel building? If I make the PR against the release branch, the Travis VM would try to upload wheels (if I'm not mistaken), which you surely wouldn't appreciate 😉. Shall I add some other architectures to the
That is correct. I overlooked the
Turns out I was mistaken. Travis does not expose secrets to PRs from forks.
A little update. I worked on the wheel building. The good news is that it works on both Linux and Windows without problems. I encountered some issues with the tests though (I guess these slipped through because you only test against Linux). One is that on Windows the line ending differs, so

```python
if 'xx' in actual_line:
    similar = (actual_line == expected_line)
```

should be replaced with

```python
if 'xx' in actual_line:
    similar = (actual_line.strip() == expected_line.strip())
```

Another issue is with the accuracy of some tests, see e.g. the output of the Linux test suite. Perhaps one should compare float results only up to a given accuracy? I often use a tolerance-based comparison for this. Note also that both clang and LLVM complain about linking against a wrong architecture (x64 vs. i386). I'm not sure if this problem is caused by my changes or was hidden in your own Travis logs.

I added

```yaml
env:
  global:
    - CIBW_TEST_COMMAND="python -m sgp4.tests"
    - CIBW_BUILD_VERBOSITY=1
    - CIBW_BUILD="cp36-*"  # can restrict to certain python versions for testing
```

in which
Neat, I don’t remember seeing that one before. This week I will hopefully have time to take a look at your branch and read over its changes. I wonder if I should think about switching to cibuildwheel over on the main branch, just to run tests? It seemed very heavyweight compared to the current setup.
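Picking up the point above about comparing float results only up to a given accuracy, here is a minimal sketch of what a tolerance-based line comparison could look like; the helper name and tolerances are illustrative, not taken from the sgp4 test suite:

```python
import math

def lines_roughly_equal(actual_line, expected_line, rel_tol=1e-9, abs_tol=1e-12):
    """Compare two whitespace-separated lines field by field, using a
    float tolerance where both fields parse as numbers."""
    actual_fields = actual_line.split()
    expected_fields = expected_line.split()
    if len(actual_fields) != len(expected_fields):
        return False
    for a, e in zip(actual_fields, expected_fields):
        try:
            if not math.isclose(float(a), float(e), rel_tol=rel_tol, abs_tol=abs_tol):
                return False
        except ValueError:
            # Non-numeric fields fall back to exact string comparison.
            if a != e:
                return False
    return True
```

Fields that do not parse as floats are still compared verbatim, so header or label lines keep their exact-match behaviour.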
My experience with this is that CI/CD issues take a lot of the development time. Even more so if one has a lot of dependencies (which is not really the case for
There is an interesting conversation and a possible workaround in a different project, FYI in case it is helpful.
@bwinkel — Since we last discussed CI, I've simplified the approach ("release" is now simply a tag that lives somewhere on the master branch), and have migrated away from Travis CI and over to GitHub Actions.

If I recall, you found that turning this compile option on didn't actually seem to make the Linux-built library start using multiple cores? Were you ever able to get behavior that felt multi-core?

What are your experiences around libraries that use multiple cores without users asking? Someone opened an issue on Skyfield the other day because, without the user's asking, it tried to use all their CPU cores, such that (a) the program didn't really go any faster, but (b) everything else on the system slowed down. It turns out that it was NumPy's fault, and that needing to turn off its assault on multi-CPU machines is common:

https://stackoverflow.com/questions/30791550/limit-number-of-threads-in-numpy

Given that experience, I wonder if multiple CPUs is something that users should have to ask for, rather than something that the library would attempt unilaterally without asking? Do you have an opinion either way?
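For reference, the workaround usually suggested in threads like the one linked above is to cap the thread pools through environment variables before the numerical library is imported; which variables actually matter depends on the BLAS/OpenMP backend, so this is only a sketch:

```python
import os

# These must be set before importing numpy (or any library that links
# OpenMP or a threaded BLAS), otherwise the thread pools already exist.
os.environ.setdefault("OMP_NUM_THREADS", "1")
os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")
os.environ.setdefault("MKL_NUM_THREADS", "1")

import numpy as np  # now limited to a single thread for BLAS/OpenMP work
```

The same OMP_NUM_THREADS variable would also govern an OpenMP-enabled sgp4 extension, which is one argument for documenting it if the flag is ever turned on.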
I'm probably the wrong person to ask, because most of my own packages are much less used and few people would get back to me. Personally, I'm fine if a package, especially if it does number crunching such as numpy, is using all the resources it can get. For GUI software, or stuff that is running in the "background" (e.g., realtime displays), I would agree that single-core use can be better. Although in my own (number-crunching) packages I use all the resources I can get by default, I added a function to change the number of OMP cores that are used, e.g. in cysgp4. That only works with Cython's
Thanks for sharing your thoughts! When I finally have time to circle back and do more investigation here, I'll specifically look into how the OpenMP pragmas might be turned on or off, or parameterized at runtime with a number of CPUs. Maybe after figuring out why large arrays aren't any faster when it's turned on.
@bwinkel Why hardcode
@bwinkel — I have finally found a bit of time to return to sgp4 and look over its open issues and pull requests. I'm still undecided on whether sgp4 should hammer user CPUs by default by turning on OpenMP, but I wanted to check in about the status of this pull request.

Do we know whether there's a way for tests to know if a computation was done in parallel? If we do add compile options like this, it would be nice for the test suite to be able to verify that the compile options were, in fact, producing in-parallel computations when array operations take place.
Honestly, I have no clue how to check this in a test, especially if it should work platform-independently. Regarding "hammering user CPUs", one could compile with OpenMP support but set the number of cores used to 1 by default, and expose a function to the user to increase the number at runtime. With Cython one has several options to achieve this, which might give you some ideas.
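One rough, platform-independent heuristic that a test could try (an assumption on my part, not something this PR implements): compare CPU time to wall-clock time around an array computation; if the summed CPU time clearly exceeds the elapsed time, more than one core was doing work.

```python
import time

def ran_on_multiple_cores(fn, *args, factor=1.5, **kwargs):
    """Heuristic check: return True if the process's total CPU time
    (summed over all threads) noticeably exceeds wall-clock time,
    which implies the call executed in parallel."""
    cpu_start, wall_start = time.process_time(), time.perf_counter()
    fn(*args, **kwargs)
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used > factor * wall_used
```

On a single-core run the two clocks track each other closely, so the ratio stays near 1; the check only becomes meaningful when the workload is large enough to dominate thread start-up overhead.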
Hi,

I use sgp4 in some of my projects and realized that the new accelerated wrapper is OpenMP enabled. However, at least for the conda packages, which I usually use, parallelization seems to be inactive. I noticed in your setup.py that the OpenMP-related compiler flag is commented out, and that there is a comment asking whether there would be a way to find proper compiler arguments for the different platforms. In several of my own Cython projects, I do this based on the platform module from the standard library. This PR is a work-in-progress proposal to add OpenMP support in your setup.py.
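For context, the platform-based flag selection mentioned above usually looks something like the following; this is a simplified sketch, not the exact code of this PR, and the macOS branch in particular assumes a separately installed libomp:

```python
import platform

def openmp_flags():
    """Pick plausible OpenMP compile and link flags for the host platform.
    A robust setup.py would additionally probe the compiler, since e.g.
    Apple clang rejects -fopenmp unless libomp is available."""
    system = platform.system()
    if system == "Windows":
        return ["/openmp"], []                      # MSVC
    if system == "Darwin":
        return ["-Xpreprocessor", "-fopenmp"], ["-lomp"]  # Apple clang + libomp
    return ["-fopenmp"], ["-fopenmp"]               # gcc / clang on Linux

compile_args, link_args = openmp_flags()
```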
Unfortunately, I could not test whether it really works on all platforms, as I have no idea how you build, e.g., your PyPI wheels. It doesn't seem to happen in your .travis.yml - do you do that manually?

I should also add that on my Linux machine the compiler flag works properly (the build-time warning that the omp pragma is ignored went away), but testing with a large array still seems to be bound to a single CPU core. Maybe I'm missing something?

In any case, perhaps we can use this as a base to improve OpenMP handling in sgp4.