Use parallel processing to speed up obs processing #733

Merged
CoryMartin-NOAA merged 3 commits into feature/gdas-validation from feature/gdas-validation-parallel
Nov 16, 2023

@CoryMartin-NOAA

Uses Python multiprocessing to generate obs in parallel. Processing the current list of obs drops from 10+ minutes to ~5.5 minutes.

# rm_p(yaml_output_file)

# run all bufr2ioda yamls in parallel
with mp.Pool(num_cores) as pool:

Note: the easiest way to do this was to split the jobs into Python and YAML+executable groups. If the two groups are roughly equal in size, that is probably okay, but we may want to combine them all into the same pool.
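For readers following along, the two-group layout described above might look roughly like the sketch below. It is not the code from this PR: `run_command`, `run_in_two_pools`, and the placeholder commands are hypothetical names, and `multiprocessing.pool.ThreadPool` is swapped in for `mp.Pool` so the snippet needs no `__main__` guard. It exposes the same `map` interface, and threads are enough when each task only waits on a child process.

```python
import subprocess
import sys
from multiprocessing.pool import ThreadPool


def run_command(cmd):
    """Run one converter command and return its exit code (hypothetical helper)."""
    return subprocess.run(cmd, check=False).returncode


def run_in_two_pools(python_cmds, yaml_cmds, num_cores):
    """Run the Python group, then the YAML+executable group, each in its own pool."""
    with ThreadPool(num_cores) as pool:
        python_codes = pool.map(run_command, python_cmds)
    # The second pool starts only after every task in the first has finished,
    # so an imbalance between the groups can leave workers idle.
    with ThreadPool(num_cores) as pool:
        yaml_codes = pool.map(run_command, yaml_cmds)
    return python_codes, yaml_codes


# Placeholder commands standing in for the real converter invocations.
python_cmds = [[sys.executable, "-c", "pass"] for _ in range(3)]
yaml_cmds = [[sys.executable, "-c", "pass"] for _ in range(2)]
print(run_in_two_pools(python_cmds, yaml_cmds, num_cores=2))
```

With this layout the total wall time is roughly the slowest job of the first group plus the slowest job of the second, which is why the relative size of the groups matters.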

@RussTreadon-NOAA left a comment

Not familiar with python multiprocessing. Is running in parallel from python the most efficient approach?

Copied ush/ioda/bufr2ioda/run_bufr2ioda.py to a working copy of gdas-validation. Ran gdasprepatmiodaobs. Log file indicates jobs ran in parallel.

2023-11-16 00:54:20,329 - INFO     - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_gpsro_bufr_combined.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/gpsro_bufr_combined_2021080100.json
2023-11-16 00:54:20,329 - INFO     - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_satwind_scat.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/satwind_scat_2021080100.json
2023-11-16 00:54:20,330 - INFO     - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_adpsfc_prepbufr.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/adpsfc_prepbufr_2021080100.json
2023-11-16 00:54:20,330 - INFO     - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_adpupa_prepbufr.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/adpupa_prepbufr_2021080100.json
2023-11-16 00:54:20,330 - INFO     - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_conventional_prepbufr_ps.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/conventional_prepbufr_ps_2021080100.json
2023-11-16 00:54:20,330 - INFO     - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_sfcshp_prepbufr.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/sfcshp_prepbufr_2021080100.json
2023-11-16 00:54:20,330 - INFO     - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_satwind_amv_goes.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/satwind_amv_goes_2021080100.json
2023-11-16 00:54:20,331 - INFO     - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_acft_profiles_prepbufr.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/acft_profiles_prepbufr_2021080100.json

The total run time for the parallel prepatmiodaobs job was 06:28 (mm:ss).

The previous serial job took 11:06.

Nice reduction!
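To put a number on the reduction, the two wall-clock times quoted above can be converted to seconds and compared. The helper name `to_seconds` is hypothetical; the only inputs are the 11:06 and 06:28 figures from this comment.

```python
def to_seconds(mmss):
    """Convert an 'mm:ss' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


serial = to_seconds("11:06")    # 666 s
parallel = to_seconds("06:28")  # 388 s

speedup = serial / parallel
reduction = 1 - parallel / serial
print(f"speedup: {speedup:.2f}x, wall-clock reduction: {reduction:.0%}")
# prints: speedup: 1.72x, wall-clock reduction: 42%
```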

@CoryMartin-NOAA

@RussTreadon-NOAA I'm not sure if it's the most efficient, but I think this will work fine provided we don't need to run on multiple nodes. This was the fastest way to speed everything up. Next, we may wish to combine the pools so that the bufr2ioda.x threads run concurrently with the python ones. But that can be in a subsequent PR.
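As a rough illustration of the follow-up idea, a single combined pool could take both kinds of job from one task list, so a free worker picks up whichever job is next regardless of type. This is only a sketch under assumptions: `run_command`, `run_combined`, and the placeholder commands are hypothetical, and `multiprocessing.pool.ThreadPool` stands in for `mp.Pool` since each task just waits on a child process. Like the approach in this PR, it stays within one Python process on a single node.

```python
import subprocess
import sys
from multiprocessing.pool import ThreadPool


def run_command(cmd):
    """Run one command and return its exit code (hypothetical helper)."""
    return subprocess.run(cmd, check=False).returncode


def run_combined(python_cmds, yaml_cmds, num_cores):
    """Run both groups from a single task list in one pool."""
    all_cmds = list(python_cmds) + list(yaml_cmds)
    with ThreadPool(num_cores) as pool:
        codes = pool.map(run_command, all_cmds)
    # Collect the commands that exited non-zero so the caller can report them.
    failed = [cmd for cmd, code in zip(all_cmds, codes) if code != 0]
    return codes, failed


# Placeholder commands standing in for the Python converters and bufr2ioda.x runs.
python_cmds = [[sys.executable, "-c", "pass"] for _ in range(3)]
yaml_cmds = [[sys.executable, "-c", "pass"] for _ in range(2)]
codes, failed = run_combined(python_cmds, yaml_cmds, num_cores=2)
print(codes, failed)
```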

@CoryMartin-NOAA merged commit 9fc4d73 into feature/gdas-validation Nov 16, 2023
@CoryMartin-NOAA deleted the feature/gdas-validation-parallel branch November 16, 2023 13:34