
Did a package ever come out of this hackathon? #13

Open
hdrake opened this issue Feb 17, 2022 · 11 comments

Comments

@hdrake

hdrake commented Feb 17, 2022

Hi! I am writing some code to do similar things, i.e. subsample a high-resolution MITgcm simulation, and don't want to reinvent the wheel. Two questions:

  1. Did this ever result in a standalone package being developed (or is there a similar package that does this already)?
  2. If not, is it alright if I use some code from here to make one?
@dhruvbalwada
Collaborator

This never made it to a standalone package. Part of the reason is that we never thought of it.

Also, if you look through the notebooks (e.g. https://github.com/oceanhackweek/ohw21-proj-model-subsampling/blob/main/Application2LLC.ipynb), you will see that xarray's interp function does almost everything we need (cell 128). Most of the extra code is really just there to take care of the mapping between lat/lon and this particular MITgcm simulation's index-based dimensions.

Do you think this still requires a package of its own? If so, will the package just work for MITgcm?
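The core trick the notebooks rely on can be sketched with synthetic data (this is an illustrative example, not code from the hackathon repo): passing coordinates that share a new dimension to interp gives pointwise interpolation along a track rather than a full grid product.

```python
# Illustrative sketch (synthetic data): xarray's interp does the core
# of the subsampling when the indexers share a new dimension.
import numpy as np
import xarray as xr

# Fake gridded model output: THETA(time, lat, lon)
theta = xr.DataArray(
    np.random.rand(4, 21, 21),
    coords={'time': np.arange(4.0),
            'lat': np.linspace(-10, 10, 21),
            'lon': np.linspace(0, 20, 21)},
    dims=('time', 'lat', 'lon'),
    name='THETA',
)

# A "survey" track: coordinate arrays sharing the new dimension 'casts'
track_lon = xr.DataArray([1.0, 5.0, 9.0], dims='casts')
track_lat = xr.DataArray([-2.0, 0.0, 3.0], dims='casts')
track_time = xr.DataArray([0.5, 1.5, 2.5], dims='casts')

# Pointwise linear interpolation: the result has dims ('casts',),
# one value per cast, not a (lon, lat, time) cube.
sampled = theta.interp(lon=track_lon, lat=track_lat, time=track_time)
print(sampled.dims)  # ('casts',)
```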

@kdrushka
Collaborator

kdrushka commented Feb 17, 2022

Quick answer: I have been developing it here: https://github.com/kdrushka/oceanliner
Would love to collaborate on this if you're interested. Or you can just poach code if you find it useful :)
I'm headed out of the office for the long weekend but happy to discuss next week!

(most up-to-date and functional code is in the testing folder ...)

@dhruvbalwada
Collaborator

This looks nice; I had not realized that you were working on this.
I have starred it to see how it goes. :)

@kdrushka
Collaborator

kdrushka commented Feb 17, 2022

Thanks! Indeed, the xarray interp function does all the heavy lifting. I have mainly been working on the user inputs and outputs.

It is written for model output that has been converted to netCDF (while retaining the original variable names) but could easily be adapted to other formats (e.g., binary files).

@hdrake
Author

hdrake commented Feb 18, 2022

Thanks @dhruvbalwada and @kdrushka! What I had in mind was something in between both of your suggestions, i.e. a very light-weight wrapper for

survey = xr.Dataset(
    dict(
        lon=xr.DataArray(lon, dims='casts'),
        lat=xr.DataArray(lat, dims='casts'),
        time=xr.DataArray(time, dims='casts'),
    )
)
ds['THETA'].interp(survey)

that would robustly handle various use cases and provide some basic utilities.

Even if this code snippet isn't complicated, it's 5 lines longer than it should be and will have to be modified for every use case! I imagine something flexible enough to handle various permutations of common arguments like:

from xsample import sample
sample(ds['THETA'], ds_survey['THETA'], inner=['XC', 'YC', 'time'])  # for xr.Dataset with dimension
sample(ds[['THETA', 'SALT']], df_survey, inner=['XC', 'YC', 'Z', 'time'], outer='point')  # for pd.DataFrame
sample(ds['UVEL'], {'XG': XG, 'YC': YC}, outer='mooring')  # for np.array

where outer just refers to the name of the broadcast dimension; it would default to the dimension of the xr.Dataset or, if the input is a pd.DataFrame, a list, or a dict, it would broadcast to a new dimension cast with integer indices.
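The proposed wrapper could be sketched roughly as follows. Note that xsample does not exist; sample, inner, and outer are the hypothetical names from the comment above, and the synthetic field stands in for real model output.

```python
# Illustrative sketch of the hypothetical sample(da, targets, inner, outer)
# wrapper proposed above. Not a real package.
import numpy as np
import xarray as xr


def sample(da, targets, inner, outer='casts'):
    """Interpolate da at the coordinate values named in inner, taken from
    targets (an xr.Dataset, pd.DataFrame, or dict of array-likes), and
    broadcast the results along a single new dimension named outer."""
    coords = {
        name: xr.DataArray(np.asarray(targets[name]), dims=outer)
        for name in inner
    }
    return da.interp(coords)


# Synthetic gridded field to exercise the wrapper
da = xr.DataArray(
    np.random.rand(5, 11, 11),
    coords={'time': np.arange(5.0),
            'YC': np.linspace(-5, 5, 11),
            'XC': np.linspace(0, 10, 11)},
    dims=('time', 'YC', 'XC'),
)

# A dict of array-likes broadcasts to the new 'point' dimension
result = sample(
    da,
    {'XC': [1.5, 2.5], 'YC': [0.0, 1.0], 'time': [0.5, 1.5]},
    inner=['XC', 'YC', 'time'],
    outer='point',
)
print(result.dims)  # ('point',)
```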

@hdrake
Author

hdrake commented Feb 18, 2022

Anecdotally, I am even having trouble getting this to compute() for my large MITgcm dataset, which I have to chunk in both time and depth. The performance is much worse than it should be, I think because of an inefficient dask task graph under the hood... Did you run into any of these issues?
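One common mitigation for this kind of slowdown (sketched here on synthetic in-memory data; whether it helps the real chunked dataset is untested) is to subset the model field to a bounding box around the survey before calling interp, so the task graph only touches the needed region.

```python
# Sketch of a workaround: slice a padded bounding box around the track
# before interpolating, so interp only sees the relevant region.
import numpy as np
import xarray as xr

theta = xr.DataArray(
    np.random.rand(10, 100, 100),
    coords={'time': np.arange(10.0),
            'lat': np.linspace(-50, 50, 100),
            'lon': np.linspace(0, 99, 100)},
    dims=('time', 'lat', 'lon'),
)

track_lon = xr.DataArray([10.0, 12.0, 14.0], dims='casts')
track_lat = xr.DataArray([0.0, 1.0, 2.0], dims='casts')
track_time = xr.DataArray([2.5, 3.5, 4.5], dims='casts')

pad = 1.0  # margin so linear interpolation has neighbors on both sides
subset = theta.sel(
    lon=slice(float(track_lon.min()) - pad, float(track_lon.max()) + pad),
    lat=slice(float(track_lat.min()) - pad, float(track_lat.max()) + pad),
    time=slice(float(track_time.min()) - pad, float(track_time.max()) + pad),
)
sampled = subset.interp(lon=track_lon, lat=track_lat, time=track_time)
```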

@dhruvbalwada
Collaborator

I personally did not dig into the optimal behavior much when we were working on this. But I do remember wondering about it, especially for datasets from the LLC4320 run, though I don't remember what the chunking was. Maybe @kdrushka knows more, since she has been working with the SWOT crossover cutouts?

Do you have a notebook with what you are trying to do? If you post a notebook as a gist showing the chunking etc. for your particular example, I can take a look if I get some time, or ask people around me who are much better at task graphs.

@iuryt
Member

iuryt commented Feb 19, 2022

> Thanks @dhruvbalwada and @kdrushka! What I had in mind was something in between both of your suggestions, i.e. a very light-weight wrapper [...] that would robustly handle various use cases and provide some basic utilities.

I like your suggestion and believe it could be a nice solution. @kdrushka and @dhruvbalwada, should we create a separate issue to tackle this implementation?
Depending on how it works out, we could also think about publishing it on pip and conda-forge.

@kdrushka
Collaborator

@hdrake, I like your suggestion - this is much simpler than what I'm doing.

I am also struggling with memory, in part because I am still new to Python/dask, so I am mostly guessing at how to optimize the code. That said, it runs (albeit slowly) on the LLC4320 cutouts I'm working with.

@hdrake
Author

hdrake commented Feb 26, 2022

> I like your suggestion and believe that this could be a nice solution for that. @kdrushka and @dhruvbalwada Should we create a separate issue to tackle this implementation? Depending on how this works out, we could also think on publishing it on pip and conda-forge.

I agree it could be worth putting on pip / conda-forge if it works out, but in that case it should be extremely lightweight: as few dependencies as possible, no notebooks or figures in the repo to keep it small, etc.

I'm a bit swamped with OSM this next week but will return to this in March.

@kdrushka
Collaborator

> I agree it could be worth putting on pip / conda-forge if it works out, but in that case it should be extremely lightweight. As few dependencies as possible, no notebooks or figures in the Repo to keep it small, etc. I'm a bit swamped with OSM this next week but will return to this in March.

Sure! Let me know if/how I can help.
