Use parallel processing to speed up obs processing #733
# rm_p(yaml_output_file)

# run all bufr2ioda yamls in parallel
with mp.Pool(num_cores) as pool:
Note: the easiest way to do this was to split the converters into Python and YAML+executable groups, each run in its own pool. If the two groups are roughly equal in size, that is probably okay, but we may want to combine them all into the same pool?
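A minimal sketch of the single-pool idea, assuming each converter (Python script or YAML+executable) can be expressed as a command line. `run_command` and `run_all` are hypothetical names, not functions from run_bufr2ioda.py, and the commands below are placeholders. It uses `multiprocessing.pool.ThreadPool` rather than the `mp.Pool` in the diff, because each task only launches a child process and waits for it; `mp.Pool` exposes the same `map` interface.

```python
import subprocess
import sys
from multiprocessing.pool import ThreadPool


def run_command(cmd):
    """Run one converter command and return (cmd, return code)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return cmd, result.returncode


def run_all(python_cmds, exe_cmds, num_workers):
    """Feed both groups to one pool so workers stay busy until all are done."""
    all_cmds = list(python_cmds) + list(exe_cmds)
    with ThreadPool(num_workers) as pool:
        # map() returns results in the same order as all_cmds
        return pool.map(run_command, all_cmds)


if __name__ == "__main__":
    # Placeholder commands standing in for the real converters (assumption)
    python_cmds = [[sys.executable, "-c", "print('python converter')"]]
    exe_cmds = [[sys.executable, "-c", "print('yaml + executable converter')"]]
    for cmd, rc in run_all(python_cmds, exe_cmds, num_workers=2):
        print(rc, " ".join(cmd))
```

With one pool, a worker that finishes a short job picks up the next one regardless of group, so unequal group sizes matter less.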
RussTreadon-NOAA left a comment
I'm not familiar with Python multiprocessing. Is running in parallel from Python the most efficient approach?
Copied ush/ioda/bufr2ioda/run_bufr2ioda.py to a working copy of gdas-validation. Ran gdasprepatmiodaobs. Log file indicates jobs ran in parallel.
2023-11-16 00:54:20,329 - INFO - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_gpsro_bufr_combined.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/gpsro_bufr_combined_2021080100.json
2023-11-16 00:54:20,329 - INFO - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_satwind_scat.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/satwind_scat_2021080100.json
2023-11-16 00:54:20,330 - INFO - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_adpsfc_prepbufr.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/adpsfc_prepbufr_2021080100.json
2023-11-16 00:54:20,330 - INFO - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_adpupa_prepbufr.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/adpupa_prepbufr_2021080100.json
2023-11-16 00:54:20,330 - INFO - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_conventional_prepbufr_ps.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/conventional_prepbufr_ps_2021080100.json
2023-11-16 00:54:20,330 - INFO - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_sfcshp_prepbufr.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/sfcshp_prepbufr_2021080100.json
2023-11-16 00:54:20,330 - INFO - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_satwind_amv_goes.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/satwind_amv_goes_2021080100.json
2023-11-16 00:54:20,331 - INFO - run_bufr2ioda.py: Executing /work2/noaa/da/rtreadon/gdas-validation/global-workflow/sorc/gdas.cd/ush/ioda/bufr2ioda/bufr2ioda_acft_profiles_prepbufr.py -c /work2/noaa/da/rtreadon/gdas-validation/comrot/gdas_eval_satwind_JEDI/gdas.20210801/00/obs/acft_profiles_prepbufr_2021080100.json
The total run time for the parallel prepatmiodaobs job was 06:28 (mm:ss).
The previous serial job took 11:06.
Nice reduction!
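For reference, the two wall-clock times above work out as follows. This is only arithmetic on the reported numbers; `to_seconds` is a hypothetical helper, not part of the job scripts.

```python
def to_seconds(mmss):
    """Convert an 'mm:ss' string to a number of seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


serial = to_seconds("11:06")    # previous serial job
parallel = to_seconds("06:28")  # parallel prepatmiodaobs run
speedup = serial / parallel
saved = serial - parallel

print(f"speedup: {speedup:.2f}x, saved: {saved} s ({saved / serial:.0%})")
# prints "speedup: 1.72x, saved: 278 s (42%)"
```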
@RussTreadon-NOAA I'm not sure if it's the most efficient, but I think this will work fine provided we don't need to run on multiple nodes. This was the fastest way to speed everything up. Next, we may wish to combine the pools so that the
Uses Python multiprocessing to generate obs in parallel. Processing the current list of obs drops from 10+ minutes to ~5.5 minutes.