-
Notifications
You must be signed in to change notification settings - Fork 70
Generalize netcdf build systems #136
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Generalizes compiler selection based on compiler used to build netcdf library
|
cc @mgduda, thanks for all the help last time. This PR is in response to challenges @josephzhang8 had getting these tools to compile. A companion PR should provide a way to build netcdf in a consistent manner although https://github.com/pwolfram/homebrew-mpas and https://github.com/pwolfram/mpas-conda-recipes/tree/parallel_netcdf suggest some ways to do this as tested on edison (thanks @xylar and @mark-petersen for hints on how to do this). A complete netcdf / parallel-netcdf / pio implementation is still in the works because there are challenges with environment settings on edison that impede a clean build via either homebrew or conda. |
|
This looks very helpful. I'd like to make sure I can build with it (on my laptop, mustang, wolf and edison) as well before it gets merged. |
|
|
||
| NCINCDIR=${NETCDF}/include | ||
| NCLIBDIR=${NETCDF}/lib | ||
| NETCDFLIB = $(shell nc-config --flibs) -lnetcdf_c++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--flibs should be --libs here. I can't get this to build on Edison without this change because there is not a version of NetCDF there that includes both Fortran and C++ libraries.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good @xylar and thanks for catching this typo. By the way, if you build with either https://github.com/pwolfram/homebrew-mpas or https://github.com/pwolfram/mpas-conda-recipes/tree/parallel_netcdf that version should have both the C++ and Fortran bindings to netcdf. As far as I can tell only the netcdf library and its bindings are correctly installed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI @xylar, this is fixed but github hasn't updated the comment for whatever reason...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pwolfram, Thanks for the tips on building the libraries. I appreciate the work you've put into those and I'll give them a try at NERSC as soon as I have the time.
|
@pwolfram, I only tested the I had difficulty on Edison because the NetCDF libraries I usually use ( module purge
module load PrgEnv-gnu # which complain about a lot of modules it can't find
module load netcdf/4.4.0
makeThis produces the executables but may be impractical for use in MPAS runs if the NetCDF libraries are dynamically linked, given that these are different libraries from those that can be used to build/run the model. Obviously this concern is outside the scope of this PR, I just wanted to mention what I had had to do to test the PR. |
|
@pwolfram, I realized that things are probably also not configured correctly on wolf or mustang. If I do: This is definitely not the correct version of NetCDF to be using. I think the problem is that, when loading the following NetCDF modules on IC machines we are not adding the NetCDF This really needs to be fixed on IC before you merge this PR so we make sure no one's tools suddenly stop working. |
|
@pwolfram, okay, I've made sure the Whoever reviews this should go through all the tools and make sure 1) they build and 2) they actually work. That's more than I have time to take on at the moment but if you don't find anyone willing to review it, check back with me later. |
|
@pwolfram: I can build and run the mask_creator tool, if that helps. I could do it on edison, wolf, and mustang (and maybe OLCF?). |
|
@milenaveneziani, to clarify, you're offering to try that? Rather than that you've already done it? |
Removes new c++ features that are inessential to operation of code.
a1da856 to
85c438c
Compare
|
Thanks for the comments @xylar, the |
|
Quick question, do these changes prevent $NETCDF from being used as the netcdf library location? If so is there a way to override the new behaviour and still have $NETCDF the netcdf library location? |
|
@xylar: sorry, after re-reading my comment I realize it's confusing.. Yes, I was offering to try that. I already ran the mask_creator on wolf, just by using the gnu libraries (not the setup of this PR). |
|
@akturner and @pwolfram, maybe the best way to deal with this would be to first look for @pwolfram, how feasible would that be? |
|
@akturner, thanks for catching this. After talking with you in person, we decided that we should check if @xylar, the goal here is to not break the current system and I think this approach should work-- do you agree? |
|
@pwolfram, I think you should still look for |
|
@pwolfram, let me amend the steps I would suggest:
Would that work? |
3f1b6c7 to
b7facea
Compare
|
@pwolfram, the logic seems right. It should be tested, though, and I don't have time just now. I'll put it on my list. |
|
Thanks @xylar. I tested this with my custom-built versions of netcdf with and without the
should fail I think because there is no information to build against netcdf. We may want to issue a warning for this-- would you agree? |
|
@pwolfram, I agree that case should probably fail. It would presumably fail with the current Makefile, right? |
|
@xylar, yes. |
|
@pwolfram Thank you for working on this. Compiling on many machines with various compilers and libraries is not fun, so we should all pitch it to ease the pain. I did a quick test of LANL IC, wolf gnu I get that warning, but all compiles and runs OK. I tested here: LANL IC, wolf intel and tested as before. Everything worked fine. |
|
NERSC edison, cray compiler in Makefile, use cray compile command It does not find this netcdf header file: I know I probably don't need PNETCDF and the compile flags might not be right for cray. Still, I spent 20 minutes on it and couldn't figure out how to compile, so this points out the difficulty of working on all these machines. |
|
Thanks, @xylar I see now. I wonder if we could just ask NERSC support to rebuild NETCDF with c++ support. It sounds like a reasonable request, and there is no time pressure. |
|
@mark-petersen, that might be a really good idea. At the very least they could tell us what they would recommend. Maybe submit a ticket about it? |
|
Well, with a few emails to NERSC consultants, we now have a netcdf library with c++ support. I am able to build and run on edison as follows: A few notes:
This is a good test to generalize the build system, if we can simply type "module load XXX" and "make" |
b7facea to
bcf2db0
Compare
|
@pwolfram, it might be a good idea to try to assign someone to review this PR. Otherwise it's not likely to move forward. Not that I'm volunteering.... |
|
As an FYI: I tried this on Edison and was able to compile the mask-creator tools using the workflow (and notes) that @mark-petersen indicated above. |
|
@xylar and @milenaveneziani, this has been a very controversial PR and to be honest, I'm not sure it is finished yet. Some good progress is made but there is likely more to do before it is ready to merge. The risk is introduction of bugs into the build systems if we move too quickly on this merge. As such, it is less clear if this will be a true added value merge if we prematurely merge. Could either of you take a look at this to see if it breaks anything for you? Also, it would be good if we can get @mgduda to test that this does not break his code for his applications. Once we know more fully that this doesn't cause problems I would be more comfortable about the merge. We don't want to break existing workflows. For example, I know the first cut of this broke @akturner's workflow, and we should have his sign-off before merging. |
|
@pwolfram: I am really not sure what to test.. |
|
Thanks @milenaveneziani, this is what makes this PR tricky. I'm guessing one way to approach this is to do the testing we can and fix things later if they get broken but I would prefer to use a better strategy if it exists. The minimal test is essentially that you can build without changing the makefile on every system you use for development. |
|
@pwolfram, I think the problem is that these various tools get used with quite varying frequency, by different users, and at different stages in our workflow. It might be best to apply these changes to a single set of tools (e.g. the mask creator, cell culler and mesh converter) and use that as the test bed. Other tools could follow, one at a time, so it would be clearer to the users of each tool what to test and when. It's unrealistic to expect everyone involved in all these tools to test this particular branch, even though it probably made sense to implement all these similar changes in one branch. |
|
@xylar, this is a great idea. If we need to split this out sooner than later @xylar and @milenaveneziani, which key parts need to follow this update for sub-PRs in the short term? |
|
@pwolfram, in my opinion, the mesh conversion tools would be the first priority. |
|
@xylar, @milenaveneziani, @mark-petersen and @akturner, I'm happy to return to this PR sometime soon. Do you all still agree that the best approach here is to split this into several smaller PRs, e.g., beginning with the mesh conversion tools? |
|
@pwolfram, yes, I still think it needs to be done one tool at a time because it's next to impossible to find everyone involved with every tool and have them test them all at once. I agree that mesh conversion tools should be a priority. |
|
#158 is the mesh conversion section of this PR. |
|
@pwolfram, the next subset of this PR that you might want to peel off would potentially be |
|
The changes to the global_scvt Makefile break the usability of this code for anyone who doesn't have |
|
The changes to grid_rotate also break builds when the netCDF library wasn't compiled with C++ support: |
|
@pwolfram, can we close the PR and (if you're so inclined) open PRs for individual tools one by one? I don't think this PR is going to happen as it is. |
|
I would agree that we should probably close this PR and create PRs to update individual tools. As I noted in earlier comments, the changes as proposed seem to break certain tools or generally, builds on certain platforms. What we could do is to discuss a general strategy to be used for detecting the location of the netCDF library and the necessary include paths. We could then adopt this strategy on a per-tool basis. @xylar and @pwolfram would there be any interest to discuss this via e-mail, a GitHub Issue, or on a future developers's telecon? |
|
@mgduda, I'm not clear on whether there is enough broad-spectrum need / interest here. The risk of breaking someones' workflow appear to outweigh potential benefits, hence its "zombie" state... do you see a clear path toward integration / cleanup? |
|
I think if there were some agreement on the "best" way to detect and use netCDF libraries in a Makefile, we could adopt that method in individual tools. Likely, whoever was the largest user of a particular tool -- or whoever had available time -- could handle this on a per-tool basis. However, I think we've all got quite a few items on our to-do lists, so I'm fine with simply closing this PR. |
|
I'm going to go ahead and make the executive decision to close this PR. |
This PR improves Makefiles and existing code for more robust compilation.