[use case demonstration] Kvikio Direct-to-gpu -> xarray -> xbatcher -> ml model #87

jhamman · 2022-08-25T15:57:39Z

What is your issue?

Recent developments by @NVIDIA and @dcherian are opening the door for direct-to-GPU data loading in Xarray. Combined with Xbatcher and the TensorFlow or PyTorch data loaders, this could mean a complete workflow from Zarr all the way to ML model training without ever handling data on a CPU.

Here's a short illustration of the potential workflow:

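import xarray as xr
import xbatcher
import xbatcher.loaders.keras  # imported explicitly; `import xbatcher` alone may not load the loaders

# `store`, `xvars`, `yvars`, and `model` are placeholders to be defined by the user
ds = xr.open_dataset(store, engine="kvikio", consolidated=False)

# One generator per variable group, each yielding windows of 10 time steps
x_gen = xbatcher.BatchGenerator(ds[xvars], {'time': 10})
y_gen = xbatcher.BatchGenerator(ds[yvars], {'time': 10})

tf_dataset = xbatcher.loaders.keras.CustomTFDataset(x_gen, y_gen)

model.fit(tf_dataset, ...)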

This would be awesome to demonstrate in a single example. Perhaps as a second tutorial on Xbatcher's documentation site.
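
For anyone following along without a GPU, the batching step in the snippet above can be pictured with plain Python. This is only a conceptual sketch, not xbatcher's implementation: `paired_time_batches` is a hypothetical helper, the data are made-up lists, and dropping a trailing partial window is an assumption made here to keep every batch the same length.

```python
from typing import Iterator, Sequence, Tuple


def paired_time_batches(
    x: Sequence[float], y: Sequence[float], size: int
) -> Iterator[Tuple[Sequence[float], Sequence[float]]]:
    """Yield aligned (x, y) windows of `size` consecutive time steps.

    Hypothetical helper for illustration only. A trailing partial window
    is dropped (an assumption, so that every batch has the same length).
    """
    if len(x) != len(y):
        raise ValueError("x and y must share the same time axis length")
    if size <= 0:
        raise ValueError("size must be a positive integer")
    for start in range(0, len(x) - size + 1, size):
        yield x[start:start + size], y[start:start + size]


# 25 fake time steps split into windows of 10, mirroring {'time': 10}
x_series = list(range(25))
y_series = [2 * v for v in x_series]
batches = list(paired_time_batches(x_series, y_series, size=10))
print(len(batches))  # number of full windows
```

Pairing the x and y windows in one generator keeps inputs and targets aligned on the same time slice, which is the property the two `BatchGenerator` objects above rely on when they are built with identical `{'time': 10}` arguments.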

xref: xarray-contrib/cupy-xarray#10

cc @dcherian, @negin513, and @weiji14

dcherian · 2022-08-25T17:02:02Z

I like how you tagged NVIDIA hahaha.

The RAPIDS folks (@jakirkham, @madsbk, @jacobtomlinson) were really interested in a blogpost about this stuff

weiji14 · 2022-08-25T17:12:52Z

👍 for a blog post. I'd be happy to contribute to a draft blog post as @dcherian suggested at a recent Pangeo meeting for https://medium.com/pangeo (or https://medium.com/rapids-ai), but probably need to wait for pydata/xarray#6874 and zarr-developers/zarr-python#934 to get merged and new xarray and Zarr releases first.

One issue with having this kvikio tutorial on xbatcher's documentation though is that we don't have GPUs in GitHub Actions CI or Readthedocs, so it can't be built dynamically 🙂 We'll either need to cache the outputs, or find another way or place to host the tutorial.

jhamman · 2022-08-25T18:25:40Z

I love the idea of a blog post here. Perhaps we publish the post in a few places at once (xarray's blog would also work).

> One issue with having this kvikio tutorial on xbatcher's documentation though is that we don't have GPUs in GitHub Actions CI or Readthedocs, so it can't be built dynamically 🙂 We'll either need to cache the outputs, or find another way or place to host the tutorial.

I think it's probably worth publishing a "cached" notebook here even though it won't be run by most folks. A strong disclaimer at the top stating the purpose will probably be sufficient to avoid confusion in the future.
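
One way the "cached" notebook could be wired up, sketched under the assumption that the docs are built with Sphinx and render notebooks through either nbsphinx or MyST-NB (which of these xbatcher's docs actually use is not stated in this thread): switch off notebook execution so the outputs saved on a GPU machine are rendered as-is.

```python
# docs/conf.py fragment (the file location is an assumption)

# If notebooks are rendered with nbsphinx: never execute at build time,
# so the outputs already stored in the .ipynb file are what readers see.
nbsphinx_execute = "never"

# If notebooks are rendered with MyST-NB instead: the equivalent switch.
nb_execution_mode = "off"
```

Only the setting for the extension in use is needed; the other line is ignored by Sphinx.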

dcherian · 2022-08-25T20:26:03Z

OK thanks for the prompt. I added a super brief intro blogpost here: xarray-contrib/xarray.dev#308 to get the word out. This blogpost could then just link to that for extra details.

weiji14 · 2022-09-02T12:41:16Z

> One issue with having this kvikio tutorial on xbatcher's documentation though is that we don't have GPUs in GitHub Actions CI or Readthedocs, so it can't be built dynamically 🙂 We'll either need to cache the outputs, or find another way or place to host the tutorial.

> I think it's probably worth publishing a "cached" notebook here even though it won't be run by most folks. A strong disclaimer at the top stating the purpose will probably be sufficient to avoid confusion in the future.

At https://discourse.pangeo.io/t/statement-of-need-integrating-jupyterbook-and-jupyterhubs-via-ci/2705, there are some ideas on how to run 'expensive' (read: GPU required) notebooks via the Pangeo Binder JupyterHub. It'll be more work than the caching solution, but it probably allows for easier long-term reproducibility for the wider community, especially if the GPU direct storage/kvikIO technology gets updated in the future and we need to re-run things for newer versions. Thoughts?

maxrjones · 2022-09-02T20:15:02Z

> One issue with having this kvikio tutorial on xbatcher's documentation though is that we don't have GPUs in GitHub Actions CI or Readthedocs, so it can't be built dynamically 🙂 We'll either need to cache the outputs, or find another way or place to host the tutorial.

> I think it's probably worth publishing a "cached" notebook here even though it won't be run by most folks. A strong disclaimer at the top stating the purpose will probably be sufficient to avoid confusion in the future.

> At https://discourse.pangeo.io/t/statement-of-need-integrating-jupyterbook-and-jupyterhubs-via-ci/2705, there are some ideas on how to run 'expensive' (read: GPU required) notebooks via the Pangeo Binder JupyterHub. It'll be more work than the caching solution, but it probably allows for easier long-term reproducibility for the wider community, especially if the GPU direct storage/kvikIO technology gets updated in the future and we need to re-run things for newer versions. Thoughts?

I think the eventual goal should be to build the examples that are 'expensive' and cross-cutting in terms of software (e.g., Kvikio Direct-to-gpu -> xarray -> xbatcher -> ml model) as part of the Project Pythia cookbooks, and to link to those cookbooks from the individual package docs (e.g., xbatcher). But, as discussed on that thread, some infrastructure developments are required before Project Pythia can support those examples. The notebook discussed here could be a great test case for the integration between JupyterHubs and JupyterBook, and could be "cached" in the xbatcher docs while that development happens.

weiji14 · 2022-09-05T15:26:52Z

Just on the infrastructure point, I noticed that GPU-enabled GitHub Actions is on the roadmap (github/roadmap#505), but I'm unsure whether this will be limited to Teams/Enterprise plans only, as with https://github.blog/changelog/2022-09-01-github-actions-larger-runners-are-now-in-public-beta. In theory, this would allow us to store an uncached version of the notebook and run it from time to time (though it will probably cost some $$).

Still, I think the Project Pythia cookbook method is worth pursuing, as the close integration with Pangeo Binder would allow users to actually run the example kvikIO notebook on the cloud. In practical terms, we could:

1. Wait for the PRs mentioned in xarray-contrib/cupy-xarray#10 (Add Kvikio backend entrypoint) to be merged, and releases made for xarray/cupy-xarray/zarr
2. Have a 'cached' kvikIO notebook
3. Have an un-cached kvikIO notebook using either
   1. GitHub Actions GPU (if it becomes available)
   2. Project Pythia infrastructure

joshmoore · 2022-09-08T09:31:13Z

> @weiji14 commented 14 days ago
> but probably need to wait for ... zarr-developers/zarr-python#934 to get merged and new xarray and Zarr releases first.

Now available in zarr-python 2.13.0a2 for testing.

dcherian · 2022-09-09T18:15:14Z

Is there a cloud provider that has the necessary GDS stuff set up?

weiji14 · 2022-09-10T20:53:22Z

> Is there a cloud provider that has the necessary GDS stuff set up?

Tried running on Microsoft Planetary Computer (gpu-pytorch container). GPU direct storage doesn't work there yet, but compatibility mode does. Below are the results from python single-node-io.py (script from https://github.com/rapidsai/kvikio/blob/29c52f76035002d91f301895250c0ff14f18f50a/python/benchmarks/single-node-io.py):

----------------------------------
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
   WARNING - KvikIO compat mode   
      libcufile.so not used       
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
GPU               | Unknown (install pynvml)
GPU Memory Total  | Unknown (install pynvml)
BAR1 Memory Total | Unknown (install pynvml)
GDS driver        | N/A (Compatibility Mode)
GDS config.json   | /etc/cufile.json
----------------------------------
nbytes            | 10485760 bytes (10.00 MiB)
4K aligned        | True
pre-reg-buf       | True
diretory          | /tmp/tmp9a8nd5kz
nthreads          | 1
nruns             | 1
==================================
cufile read       |   4.28 GiB/s
cufile write      |  92.59 MiB/s
posix read        |   1.23 GiB/s
posix write       |   1.24 GiB/s

Could perhaps try to get a PR in to install the necessary GPU direct storage and kvikIO packages; they're usually pretty responsive. Edit: opened issue at microsoft/planetary-computer-containers#51.
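
To compare the throughput lines in the output above on a common scale, here is a small sketch that parses them and reports cufile/posix ratios. The helper name `parse_throughput` and the unit table are made up for illustration, and the figures are copied from the single run above, so the ratios should be read as indicative only.

```python
import re

# Throughput lines copied from the benchmark output above
REPORT = """\
cufile read       |   4.28 GiB/s
cufile write      |  92.59 MiB/s
posix read        |   1.23 GiB/s
posix write       |   1.24 GiB/s
"""

# Conversion factors to MiB/s (only the units seen in this output)
UNIT_TO_MIB = {"MiB/s": 1.0, "GiB/s": 1024.0}


def parse_throughput(report: str) -> dict:
    """Return {label: throughput in MiB/s} for each 'label | value unit' line."""
    rates = {}
    for line in report.splitlines():
        match = re.match(r"\s*(.+?)\s*\|\s*([\d.]+)\s*(MiB/s|GiB/s)\s*$", line)
        if match:
            label, value, unit = match.groups()
            rates[label] = float(value) * UNIT_TO_MIB[unit]
    return rates


rates = parse_throughput(REPORT)
read_ratio = rates["cufile read"] / rates["posix read"]
write_ratio = rates["cufile write"] / rates["posix write"]
print(f"cufile/posix read:  {read_ratio:.2f}x")
print(f"cufile/posix write: {write_ratio:.2f}x")
```

With nruns set to 1 and a 10 MiB payload, a gap this large between the read and write ratios may well reflect caching or warm-up effects rather than steady-state behaviour, so repeating the benchmark with more runs would be needed before drawing conclusions.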

weiji14 · 2022-09-10T21:06:29Z

Oh, and if we do get GPU direct storage set up on Microsoft Planetary Computer (on Azure West Europe), I have an idea to get a demo working with the https://github.com/carbonplan/cmip6-downscaling dataset (since it's also on Azure West Europe?). This may or may not require the multi-resolution issue at #93 to be resolved, but it looked like a good Zarr machine learning dataset to play with.

As a start, I did try this quickly:

xr.open_dataset(
    "https://cpdataeuwest.blob.core.windows.net/cp-cmip/version1/data/DeepSD/ScenarioMIP.CCCma.CanESM5.ssp245.r1i1p1f1.day.DeepSD.pr.zarr",
    engine="kvikio",
    consolidated=False,
)

but got a strange GroupNotFoundError: group not found at path '' (Using xr.open_zarr worked fine though). So realistically, still a few things to iron out on cupy-xarray and xarray perhaps, maybe a month or two's worth of work?

weiji14 · 2023-08-01T20:23:48Z

Ok, looks like I've severely underestimated how long this is going to take 😅 Hoping to get some time to work on this in October 2023 🤞, but just gonna make a TODO list on things that need to happen:

1. Documentation. Right now everything is in a blog post. There's been some related work at https://github.com/negin513/cupy-xarray-tutorials (not direct GPU, but CPU->GPU), which we could build on top of.
2. Cloud infrastructure. Maybe start with one cloud provider (AWS?), and ensure that the disk partition, network connections and all that are set up properly to ensure low I/O latency.

Longer term, we'll also look into:

1. Non-Zarr file formats. There may be a way to get this to work via kerchunk (see failed attempt at https://discourse.pangeo.io/t/accessing-nested-hdf5-file-from-http-via-kerchunk/3432/6); could maybe look into NetCDF, Cloud-Optimized GeoTIFFs, and others next.
2. More cloud providers. Document how to set things up on AWS/Azure/GCP/etc.

dcherian · 2023-08-01T21:46:34Z

> Maybe start with one cloud provider (AWS?), and ensure that the disk partition, network connections and all that are set up properly to ensure low I/O latency.

It may be a lot easier to experiment on NCAR systems once they can do it. @negin513 seems very interested in this kind of thing :)

maxrjones · 2023-08-02T00:12:46Z

Thanks for creating the to-do list @weiji14! As we discussed earlier today, I'll also have some time in October to contribute and am particularly interested in the kerchunk connections.

jakirkham · 2023-08-02T01:34:10Z

Starting with the name brand CSPs is a reasonable first step

While lesser known, CoreWeave has been putting good effort into configuring hardware optimally

Though if you have your own system that you are planning to use long term, setting up there sounds good

weiji14 · 2023-08-02T02:47:34Z

Cool, the idea is to enable more people to run kvikIO/NVIDIA GPUDirect Storage, either on a local GPU, or in the cloud if they don't have one. That's why I'd like to start with the documentation, and we could experiment on NCAR first to understand how involved the configuration would be. Once we've figured out the config settings, we can then expand to other HPC or commercial cloud systems. That CoreWeave offering does look nice, though I can't see on their webpage if they do support NVIDIA GDS (would like to hope that they do)!

weiji14 · 2023-10-13T04:43:38Z

Have managed to run some benchmark experiments on a WeatherBench2/ERA5 subset comparing kvikIO (GPU-based) and zarr (CPU-based) engines at zarr-developers/zarr-benchmark#14. See also related discussion at zarr-developers/zarr-benchmark#14 where I describe the technical stuff in more detail. And yes, the benchmark code uses xbatcher too 😉

Initial results are that kvikIO takes ~25% less time to load data than zarr (though I'm not confident with that number yet, because the numbers change drastically between subsequent runs due to some strange factors like caching). I'll be giving a talk next week at FOSS4G SotM Oceania 2023 to get people excited about this, and hope that things can move forward a bit more 😄
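
Since the timings vary a lot between runs, one way to make a figure like "~25% less time" more robust is to repeat each measurement and compare medians. The sketch below uses only the standard library; `time_runs`, `relative_saving`, and the two stand-in loaders are hypothetical names for illustration and are not taken from the benchmark code linked above.

```python
import statistics
import time
from typing import Callable, List


def time_runs(load: Callable[[], object], repeats: int = 5, warmup: int = 1) -> List[float]:
    """Time `load` several times, after untimed warm-up calls."""
    for _ in range(warmup):
        load()
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        load()
        timings.append(time.perf_counter() - start)
    return timings


def relative_saving(baseline: List[float], candidate: List[float]) -> float:
    """Fraction of time saved by `candidate`, comparing median timings."""
    base = statistics.median(baseline)
    return (base - statistics.median(candidate)) / base


# Stand-in workloads (hypothetical); swap in real data-loading callables.
def slow_loader() -> int:
    return sum(range(200_000))


def fast_loader() -> int:
    return sum(range(100_000))


baseline_times = time_runs(slow_loader)
candidate_times = time_runs(fast_loader)
print(f"median saving: {relative_saving(baseline_times, candidate_times):.0%}")
print(f"baseline spread (stdev): {statistics.stdev(baseline_times):.6f} s")
```

The warm-up calls only make the timed runs consistently "warm"; measuring cold-cache performance would require clearing caches between runs, which this sketch does not attempt.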

KiranModukuri · 2023-10-25T21:26:05Z

@weiji14 can you please describe where these tests were run: on a local machine or in a cloud environment?

weiji14 · 2023-10-25T21:52:00Z

Hi @KiranModukuri, yes, these tests were run locally (using an NVIDIA RTX A2000 8GB GPU). I did try to set up a GCP container to run the benchmarks (WeatherBench2's ERA5 is at https://console.cloud.google.com/storage/browser/weatherbench2/datasets/era5), but was running into quota issues allocating GPUs on us-central1 where the dataset is stored.

jhamman added the use case label Aug 25, 2022

dcherian mentioned this issue Aug 25, 2022

Add kvikio blogpost xarray-contrib/xarray.dev#308

Merged

joshmoore mentioned this issue Sep 8, 2022

Add Kvikio backend entrypoint xarray-contrib/cupy-xarray#10

Draft

8 tasks

This was referenced Sep 10, 2022

NVIDIA GPU direct storage microsoft/planetary-computer-containers#51

Open

Add cupy to ml notebooks pangeo-data/pangeo-docker-images#322

Open

srib mentioned this issue Sep 23, 2022

GPU direct storage backend for mdio TGSAI/mdio-python#64

Closed
