Distributed open_rasterio read error when URL contains permissions #3489

Closed
system123 opened this issue Nov 6, 2019 · 1 comment · Fixed by #7671

Comments

@system123

MCVE Code Sample

I have a GeoTIFF stored in an S3 bucket and accessible via a URL that contains authentication parameters. When I open the file with xarray.open_rasterio, I can read its metadata and run computations as expected. However, if I try to run the same computations on a Dask LocalCluster or KubeCluster, I can only read the metadata; the computations fail with a 403 error.

from dask.distributed import Client, LocalCluster
import xarray as xr

url = "https://dataset.s3.us-west-2.amazonaws.com/mosaic-dir/SAR-mosaic.tif?AWSAccessKeyId=XXXXXXXX&Expires=1573079289&Signature=XXXXXXXXXX&x-amz-security-token=XXXXX....."

client = Client(LocalCluster())
ds = xr.open_rasterio(url, chunks=5000)
ds.mean().compute()

Output

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock)
    197             try:
--> 198                 file = self._cache[self._key]
    199             except KeyError:

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/backends/lru_cache.py in __getitem__(self, key)
     52         with self._lock:
---> 53             value = self._cache[key]
     54             self._cache.move_to_end(key)

KeyError: [<function open at 0x7f18da7ba488>, ('MY URL',), 'r', ()]

During handling of the above exception, another exception occurred:

CPLE_HttpResponseError                    Traceback (most recent call last)
rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()

rasterio/_shim.pyx in rasterio._shim.open_dataset()

rasterio/_err.pyx in rasterio._err.exc_wrap_pointer()

CPLE_HttpResponseError: HTTP response code: 403

During handling of the above exception, another exception occurred:

RasterioIOError                           Traceback (most recent call last)
<ipython-input-2-ee58a2575c0a> in <module>
      1 client = Client(LocalCluster())
      2 url = "MY URL"
----> 3 ds = xr.open_rasterio(url, chunks=5000)
      4 ds.mean().compute()

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/backends/rasterio_.py in open_rasterio(filename, parse_coordinates, chunks, cache, lock)
    237 
    238     manager = CachingFileManager(rasterio.open, filename, lock=lock, mode="r")
--> 239     riods = manager.acquire()
    240     if vrt_params is not None:
    241         riods = WarpedVRT(riods, **vrt_params)

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/backends/file_manager.py in acquire(self, needs_lock)
    178         An open file object, as returned by ``opener(*args, **kwargs)``.
    179         """
--> 180         file, _ = self._acquire_with_cache_info(needs_lock)
    181         return file
    182 

/srv/conda/envs/notebook/lib/python3.7/site-packages/xarray/backends/file_manager.py in _acquire_with_cache_info(self, needs_lock)
    202                     kwargs = kwargs.copy()
    203                     kwargs["mode"] = self._mode
--> 204                 file = self._opener(*self._args, **kwargs)
    205                 if self._mode == "w":
    206                     # ensure file doesn't get overriden when opened again

/srv/conda/envs/notebook/lib/python3.7/site-packages/rasterio/env.py in wrapper(*args, **kwds)
    443 
    444         with env_ctor(session=session):
--> 445             return f(*args, **kwds)
    446 
    447     return wrapper

/srv/conda/envs/notebook/lib/python3.7/site-packages/rasterio/__init__.py in open(fp, mode, driver, width, height, count, crs, transform, dtype, nodata, sharing, **kwargs)
    214         # None.
    215         if mode == 'r':
--> 216             s = DatasetReader(path, driver=driver, sharing=sharing, **kwargs)
    217         elif mode == 'r+':
    218             s = get_writer_for_path(path)(path, mode, driver=driver, sharing=sharing, **kwargs)

rasterio/_base.pyx in rasterio._base.DatasetBase.__init__()

RasterioIOError: HTTP response code: 403
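
One thing worth ruling out when a signed URL returns a 403 is that the signature has simply expired, since the URL above carries an `Expires` query parameter. The following is a minimal sketch, not a diagnosis of this issue: it assumes `Expires` holds seconds since the Unix epoch (as in AWS query-string authentication), and the helper names `presigned_url_expiry` and `is_expired` are made up for illustration.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def presigned_url_expiry(url: str) -> Optional[datetime]:
    """Return the time encoded in the URL's `Expires` parameter, or None."""
    query = parse_qs(urlsplit(url).query)
    values = query.get("Expires")
    if not values:
        return None
    # Assumption: `Expires` is a Unix timestamp in seconds (UTC).
    return datetime.fromtimestamp(int(values[0]), tz=timezone.utc)


def is_expired(url: str, now: Optional[datetime] = None) -> bool:
    """Return True if the URL has an `Expires` time that is not in the future."""
    expiry = presigned_url_expiry(url)
    if expiry is None:
        # No expiry information in the URL, so nothing to check.
        return False
    current = now if now is not None else datetime.now(timezone.utc)
    return current >= expiry
```

Calling `is_expired(url)` just before `ds.mean().compute()` would show whether the workers are being handed a URL that is already stale. If it returns `False`, the 403 has some other cause.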

Expected Output

<xarray.DataArray 'Band1' ()>
array(2681.77006093)
Coordinates:
    pol      <U2 'VV'

Output of xr.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21) [GCC 7.3.0]
python-bits: 64
OS: Linux
OS-release: 4.4.0-1079-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
libhdf5: 1.10.5
libnetcdf: 4.6.2

xarray: 0.14.0
pandas: 0.25.2
numpy: 1.17.3
scipy: 1.3.1
netCDF4: 1.5.1.2
pydap: installed
h5netcdf: 0.7.4
h5py: 2.10.0
Nio: None
zarr: 2.3.2
cftime: 1.0.4.2
nc_time_axis: None
PseudoNetCDF: None
rasterio: 1.0.25
cfgrib: None
iris: 2.2.0
bottleneck: 1.2.1
dask: 2.6.0
distributed: 2.6.0
matplotlib: 3.1.1
cartopy: 0.17.0
seaborn: 0.9.0
numbagg: None
setuptools: 41.6.0.post20191029
pip: 19.3.1
conda: None
pytest: None
IPython: 7.9.0
sphinx: None

@dcherian
Contributor

We've deleted the internal rasterio backend in favor of rioxarray. If this issue is still relevant, please migrate the discussion to the rioxarray repo.
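
For readers arriving here after the backend was removed, the call from the original report might translate to rioxarray roughly as follows. This is only a sketch: it assumes rioxarray is installed, the wrapper name `open_signed_geotiff` is hypothetical, and nothing here is known to resolve the 403 described above.

```python
def open_signed_geotiff(url: str, chunks: int = 5000):
    """Open a remote GeoTIFF lazily with rioxarray (hypothetical helper)."""
    # Imported lazily so the module can be loaded without rioxarray present.
    import rioxarray

    # rioxarray.open_rasterio takes the place of the removed xr.open_rasterio;
    # `chunks` requests a Dask-backed array, as in the original report.
    return rioxarray.open_rasterio(url, chunks=chunks)
```

With a Dask client running, `open_signed_geotiff(url).mean().compute()` would be the counterpart of the failing snippet in the report.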
