
Improve potential bottlenecks in compute_Sv #1200

Closed
lsetiawan opened this issue Oct 26, 2023 · 7 comments · Fixed by #1285
Comments

@lsetiawan (Member):

Below are the two potential places where the calibration code (compute_Sv) can have bottlenecks:

  • in get_vend_cal_params_power, where broadcasting is likely used to get a data variable into the "right" shape based on indexing:

    # Get param dataarray into correct shape
    da_param = (
        vend[param]
        .expand_dims(dim={"ping_time": idxmin["ping_time"]})  # expand dims for direct indexing
        .sortby(idxmin.channel)  # sortby in case channel sequence differs in vend and beam
    )
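As a toy illustration of that pattern (the names, shapes, and values below are made up for this sketch, not echopype's actual vendor dataset), expand_dims broadcasts the parameter along ping_time and sortby realigns the channel order:

```python
import numpy as np
import xarray as xr

# Hypothetical stand-ins for the vendor parameter and the index dataset
vend_param = xr.DataArray(
    [[1.0, 2.0], [3.0, 4.0]],
    dims=["channel", "pulse_length_bin"],
    coords={"channel": ["ch2", "ch1"]},  # note: different order than idxmin
)
idxmin = xr.Dataset(
    coords={
        "ping_time": np.array(
            ["2023-01-01T00:00:00", "2023-01-01T00:00:01"], dtype="datetime64[ns]"
        ),
        "channel": ["ch1", "ch2"],
    }
)

# Same pattern as the snippet above: broadcast the parameter along
# ping_time, then sort channels to match the order in idxmin
da_param = (
    vend_param
    .expand_dims(dim={"ping_time": idxmin["ping_time"].values})
    .sortby(idxmin.channel)
)
print(da_param.dims)  # ('ping_time', 'channel', 'pulse_length_bin')
```

Note that expand_dims here replicates the parameter across every ping, which is where the memory/compute concern in this issue comes from.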

  • in harmonize_env_param_time, where there is either a check over all the timestamps or an interpolation:

    # If there's only 1 time1 value,
    # or if after dropping NaN there's only 1 time1 value
    if p["time1"].size == 1 or p.dropna(dim="time1").size == 1:
        return p.dropna(dim="time1").squeeze(dim="time1").drop("time1")
    # Direct assignment if all timestamps are identical (EK60 data)
    elif np.all(p["time1"].values == ping_time.values):
        return p.rename({"time1": "ping_time"})
    elif ping_time is None:
        raise ValueError(f"ping_time needs to be provided for interpolating {p.name}")
    else:
        return p.dropna(dim="time1").interp(time1=ping_time)
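A minimal sketch of the first two branches with made-up data (using drop_vars, the non-deprecated spelling of .drop for coordinates):

```python
import numpy as np
import xarray as xr

# Toy walkthrough of the first two branches above; all data are made up
ping_time = np.array(
    ["2023-01-01T00:00:00", "2023-01-01T00:00:01"], dtype="datetime64[ns]"
)

# Branch 1: a single time1 value is squeezed down to a scalar parameter
p1 = xr.DataArray(
    [1500.0], dims=["time1"], coords={"time1": ping_time[:1]}, name="sound_speed"
)
out1 = p1.dropna(dim="time1").squeeze(dim="time1").drop_vars("time1")
print(out1.ndim)  # 0 -- scalar, no time dependence left

# Branch 2: timestamps identical to ping_time (the EK60 case), so a
# rename is enough and no interpolation is needed
p2 = xr.DataArray(
    [1500.0, 1501.0], dims=["time1"], coords={"time1": ping_time}, name="sound_speed"
)
assert np.all(p2["time1"].values == ping_time)
out2 = p2.rename({"time1": "ping_time"})
print(out2.dims)  # ('ping_time',)
```

The final branch, which falls through to .interp, is the one flagged below as potentially slow on long time axes.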

Originally posted by @leewujung in #1165 (comment)

@anantmittal (Contributor):

pytest -vvrP echopype/tests/calibrate/test_cal_params.py::test_get_vend_cal_params_power
pytest -vvrP echopype/tests/calibrate/test_env_params.py::test_harmonize_env_param_time

@anantmittal (Contributor):

@leewujung Could you explain a bit more about the potential bottleneck in get_vend_cal_params_power? Do we want to parallelize specific operations using dask, or do we want to refactor the code using numpy/xarray itself to improve runtime?

@anantmittal (Contributor):

@leewujung Similar question for the improvement in harmonize_env_param_time: are we looking into parallelizing specific operations (e.g., np.all)?

@leewujung (Member):

> Do we want to parallelize specific operations using dask? Or do we want to refactor the code using numpy/xarray itself to improve runtime?

I think it is both. We want to enable handling data as lazy-loaded arrays where that is not already the case, and ensure that the operations are distributed efficiently when dask supports them (and if not, see where dask is heading on those ops, since they are not very specialized).

In get_vend_cal_params_power, I worry that expand_dims and, to a lesser degree, sortby are expensive.

In harmonize_env_param_time, the np.all in one of the conditions may be slow if the time dimension is large, and .interp may also be slow.
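As one hedged sketch of what a dask-friendly version of that timestamp check could look like (the array sizes, chunking, and use of dask.array here are illustrative assumptions, not echopype code), the elementwise comparison and the all-reduction can both stay lazy and run chunk by chunk instead of materializing the full time axis:

```python
import numpy as np
import dask.array as da

# Two identical, chunked timestamp axes (sizes are arbitrary)
t1 = np.arange(1_000_000).astype("datetime64[s]")
t2 = t1.copy()
lazy1 = da.from_array(t1, chunks=100_000)
lazy2 = da.from_array(t2, chunks=100_000)

# Nothing is computed until .compute(); the reduction then runs per chunk
same = da.all(lazy1 == lazy2)
print(bool(same.compute()))  # True
```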

@anantmittal (Contributor):

For harmonize_env_param_time:

After running some experiments, np.all appears to be the fastest option when using just numpy. This link also reaches the same conclusion.
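A minimal timing sketch of that kind of comparison (array size and repeat count are arbitrary, and absolute numbers depend on hardware):

```python
import timeit
import numpy as np

# Compare np.all against np.array_equal for an elementwise equality
# test over a large array; both build the boolean mask before reducing
a = np.arange(5_000_000)
b = a.copy()

t_all = timeit.timeit(lambda: np.all(a == b), number=10)
t_equal = timeit.timeit(lambda: np.array_equal(a, b), number=10)

print(f"np.all(a == b):       {t_all:.4f} s")
print(f"np.array_equal(a, b): {t_equal:.4f} s")
```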

@leewujung (Member):

This harmonize_env_param_time component was addressed in #1235.

@leewujung (Member):

We can close this now with #1235 and #1285 merged.
