
LANL machine files: add badger, remove wolf, pinto#240

Merged
eclare108213 merged 2 commits into
CICE-Consortium:master from
eclare108213:mach
Nov 29, 2018

@eclare108213

Contributor

Updated machine files for LANL institutional computing.
This is not ready to merge yet -- see test failures below.

  • Developer(s): E. Hunke

  • Please suggest code Pull Request reviewers in the column at right.

  • Are the code changes bit for bit, different at roundoff level, or more substantial? BFB

  • To verify that this PR passes the initial QC tests, please include the link to test results
    or paste in below the summary block from the bottom of the testing output.

  • Does this PR create or have dependencies on CICE or any other models?

  • Is the documentation being updated with this PR? (Y/N) no
    If not, does the documentation need to be updated separately at a later time? (Y/N) no?

I don't think we list specific machines in the documentation, other than as examples for how to run various tests, right?

Note: "Documentation" includes information on the wiki and .rst files in doc/source/,
which are used to create the online technical docs at https://readthedocs.org/projects/cice-consortium-cice/.

  • Other Relevant Details:
    Unfortunately I don't have a pinto or wolf baseline using the code after the latest nonBFB mods, so there's no comparison across machines. I'm getting 3 failures in the Icepack base_suite. Is this failing for anyone else, or is it due to this new machine I'm using? These are all restart failures, and not all restarts are failing.

#---
PASS badger_intel_restart_col_1x1_pondcesm build
PASS badger_intel_restart_col_1x1_pondcesm initialrun
PASS badger_intel_restart_col_1x1_pondcesm run
FAIL badger_intel_restart_col_1x1_pondcesm test
#---
PASS badger_intel_restart_col_1x1_pondtopo build
PASS badger_intel_restart_col_1x1_pondtopo initialrun
PASS badger_intel_restart_col_1x1_pondtopo run
FAIL badger_intel_restart_col_1x1_pondtopo test
#---
PASS badger_intel_restart_col_1x1_dyn build
PASS badger_intel_restart_col_1x1_dyn initialrun
PASS badger_intel_restart_col_1x1_dyn run
FAIL badger_intel_restart_col_1x1_dyn test

88 of 91 tests PASSED
3 of 91 tests FAILED
0 of 91 tests PENDING
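
A listing like the one above can be scanned for failing cases with a few lines of script. This is only a minimal sketch, assuming each result line has the form `STATUS test_name phase`; the function name `summarize` is hypothetical and is not part of the Icepack scripts.

```python
def summarize(results_text):
    """Map each test case name to the status of its final 'test' phase."""
    outcomes = {}
    for line in results_text.splitlines():
        parts = line.split()
        # Expect lines such as: "FAIL badger_intel_restart_col_1x1_dyn test"
        if len(parts) >= 3 and parts[0] in ("PASS", "FAIL", "PEND"):
            status, name, phase = parts[0], parts[1], parts[2]
            if phase == "test":
                outcomes[name] = status
    return outcomes


# Sample lines copied from the results listed above
sample = """\
#---
PASS badger_intel_restart_col_1x1_pondcesm run
FAIL badger_intel_restart_col_1x1_pondcesm test
#---
PASS badger_intel_restart_col_1x1_dyn run
FAIL badger_intel_restart_col_1x1_dyn test
"""

failed = sorted(name for name, status in summarize(sample).items() if status == "FAIL")
print(failed)
```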

@apcraig

apcraig commented Nov 29, 2018

Contributor

You can see recent test results here,

https://github.com/CICE-Consortium/Test-Results/wiki/icepack_by_hash

for comparison. I don't think those are failing elsewhere. Looks like maybe something related to "1x1" build/run or something in serial infrastructure?

@eclare108213

Contributor Author

All of the Icepack tests are serial, so it's not that. The question is whether I try to debug this now, or if we just go with it and I'll try to figure out what's going on later. My CICE base_suite appears to have run fine.

@apcraig

apcraig commented Nov 29, 2018

Contributor

Of course, everything is 1x1. So, are both the initial and restart runs completing, and then the restart comparison is failing? Is it worth trying those tests again to see if it was just a hiccup of some kind? I'm OK moving forward with the Icepack release as is. I think the CICE testing is more important in some ways. But we should figure this out if it's repeatable. Also OK holding things up a bit if you want to look into it.

@eclare108213

Contributor Author

The runs completed and they really are different. For sanity's sake, could you re-run these 3 tests? I'll do the same.

@apcraig

apcraig commented Nov 29, 2018

Contributor

I'm running these tests now on conrad, more soon.

@eclare108213

Contributor Author

It might be an initialization issue. I turned on the debug option for the tests, and they all passed. I think debug sets -g, which initializes everything to 0. This issue could have appeared when I removed the initializations from the tracer index query routine in #230, and maybe the other machines initialize stuff differently from what badger is doing. Pesky little beast (I saw one here in NM a few weeks ago, actually a pretty amusing critter to watch). Maybe I can find the right place to initialize these indices...

@apcraig

apcraig commented Nov 29, 2018

Contributor

I ran the three tests on conrad on four compilers, all pass.


PASS conrad_intel_restart_col_1x1_pondcesm test 
PASS conrad_intel_restart_col_1x1_pondtopo test 
PASS conrad_intel_restart_col_1x1_dyn test 
PASS conrad_pgi_restart_col_1x1_pondcesm test 
PASS conrad_pgi_restart_col_1x1_pondtopo test 
PASS conrad_pgi_restart_col_1x1_dyn test 
PASS conrad_gnu_restart_col_1x1_pondcesm test 
PASS conrad_gnu_restart_col_1x1_pondtopo test 
PASS conrad_gnu_restart_col_1x1_dyn test 
PASS conrad_cray_restart_col_1x1_pondcesm test 
PASS conrad_cray_restart_col_1x1_pondtopo test 
PASS conrad_cray_restart_col_1x1_dyn test 

@eclare108213

Contributor Author

One of the flags was not initialized with all the rest, but that wasn't the problem. I'll run another base_suite to make sure that change is okay. The test failures were occurring due to -O2, so I backed off to -O1. This is devilishly difficult to debug, so I'll make an issue and maybe we'll figure it out eventually.
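
For context, a restart test of this kind checks that a run continued from a restart file ends in the same state as an uninterrupted run. Below is a minimal sketch of such a check, assuming the comparison is byte-for-byte on the final output files. The function name `bit_for_bit` and the file names are hypothetical stand-ins, and this is not the Icepack test script.

```python
import filecmp
import os
import tempfile


def bit_for_bit(path_a, path_b):
    """Return True if the two files contain identical bytes."""
    # shallow=False forces a content comparison instead of relying on os.stat()
    return filecmp.cmp(path_a, path_b, shallow=False)


# Demo with small stand-in files (hypothetical names, not real restart files)
with tempfile.TemporaryDirectory() as tmp:
    continuous = os.path.join(tmp, "restart_continuous.bin")
    restarted = os.path.join(tmp, "restart_restarted.bin")
    perturbed = os.path.join(tmp, "restart_perturbed.bin")
    payloads = (
        (continuous, b"\x00\x01\x02"),
        (restarted, b"\x00\x01\x02"),
        (perturbed, b"\x00\x01\x03"),  # differs in the last byte
    )
    for path, payload in payloads:
        with open(path, "wb") as handle:
            handle.write(payload)
    same = bit_for_bit(continuous, restarted)
    differs = not bit_for_bit(continuous, perturbed)

print(same, differs)
```

A single differing byte is enough to fail a check like this, which is why optimization-dependent differences show up as restart failures even when every run completes.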

@eclare108213

Contributor Author

@apcraig

apcraig commented Nov 29, 2018

Contributor

Great!

@apcraig

apcraig commented Nov 29, 2018

Contributor

Feel free to merge when you're ready.

@eclare108213 eclare108213 merged commit fcf58ec into CICE-Consortium:master Nov 29, 2018
lettie-roach pushed a commit to lettie-roach/Icepack that referenced this pull request Oct 18, 2022
* move write stmts, cleanup unused vars

* replace sil_data_type and nit_data_type with bgc_data_type

* documentation

* use ocn_data_type for SSS