DGGS

managing multiple CRS (local projections)

solution: DGGS (https://vimeo.com/935616988/d9de7a97e9 min 49:00)
The UTM grid is not native to the satellite
advantage of DGGS: one global CRS and still no loss in resolution
tiles of multiple local projections need to overlap. No comparison between different zones -> DGGS is one global CRS
DGGS cells wrap around faces -> seamless

data cube

vector world: sparse data, few points from a couple of sites, tables with columns lon, lat and feature values, traditional databases such as PostGIS
raster world: dense data, images
Data Cube: https://esd.copernicus.org/articles/11/201/2020/
ESA and ECMWF using zarr for ARD
level 4 data: ARD
data cubes allow fast loading of data, crucial for AI

zarr

petabytes of data
like COG and STAC but more flexible chunking
scalable
parallel reads and writes
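
A minimal sketch of why chunking matters for loading speed: for a bounding box query, only the chunks that overlap the box have to be read, and each of them can be fetched in parallel. The function name `chunks_for_bbox` and the chunk shape are assumptions for illustration; this is plain index arithmetic, not the zarr API.

```python
from itertools import product


def chunks_for_bbox(start, stop, chunk_shape):
    """Return the grid indices of all chunks that overlap a half-open box.

    start, stop: per-dimension array indices (stop is exclusive).
    chunk_shape: per-dimension chunk length (hypothetical values below).
    """
    ranges = []
    for lo, hi, size in zip(start, stop, chunk_shape):
        first = lo // size        # chunk holding the first requested element
        last = (hi - 1) // size   # chunk holding the last requested element
        ranges.append(range(first, last + 1))
    return list(product(*ranges))


# A (time, y, x) array chunked as 1 x 256 x 256: a small spatial window
# at a single time step touches only four chunks.
touched = chunks_for_bbox(start=(5, 200, 500), stop=(6, 300, 600),
                          chunk_shape=(1, 256, 256))
print(touched)
```

Smaller spatial chunks reduce the amount of unused data per request, at the cost of more objects to manage.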

Abstract

Satellite images are traditionally stored and accessed in rectangular grids. To this end, the surface of the earth is flattened onto a 2D grid using a single global projection. This introduces distortions of shape and area, especially at high latitudes. Discrete Global Grid Systems (DGGS) tessellate the surface of the earth with hierarchical cells of equal area, minimizing distortion and loading time of large geospatial datasets, which is crucial in spatial statistics and for building machine learning models. Previous works focused on the creation of the grids themselves, as well as on translating geographical coordinates to DGGS cell ids. Here we present a framework to store and access DGGS data.

Current state

Current specification: OGC Topic 21 / ISO 19170
Planned in future parts of the OGC specifications
- DGGS registry analogous to EPSG.io
- DGGS data storage format

Definition of a DGGS

Related works

60 projections, one for each UTM zone
7 projections, one for each continent Equi7Grid (Bauer-Marschallinger et al., 2014)
20 projections: icosahedron
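
As a small illustration of the zone problem with UTM: each of the 60 zones is 6 degrees of longitude wide, so two points a few kilometres apart can fall into different projections. The sketch below uses the standard zone formula and deliberately ignores the special cases around Norway and Svalbard; the function name is an assumption.

```python
def utm_zone(lon_deg):
    """Standard UTM zone number (1..60) for a longitude in degrees.

    Simplification: the exceptions around Norway and Svalbard are ignored.
    """
    lon = (lon_deg + 180.0) % 360.0   # shift to [0, 360)
    return int(lon // 6) + 1


# Two nearby longitudes on either side of a zone boundary (12 degrees east)
print(utm_zone(11.9), utm_zone(12.1))
```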

Google S2

cube with fixed orientation
Hilbert curve like index
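
A sketch of the textbook Hilbert curve on an n x n grid (n a power of two), to show what a Hilbert-like index does: consecutive 1D indices are always neighboring cells in 2D. This is the classic algorithm, not the actual S2 implementation; all function names are assumptions.

```python
def _rot(n, x, y, rx, ry):
    """Rotate/flip a quadrant so the curve pattern repeats at every scale."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def xy2d(n, x, y):
    """Map cell (x, y) of an n x n grid to its position d along the curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d


def d2xy(n, d):
    """Inverse mapping: position d along the curve to cell (x, y)."""
    t = d
    x = y = 0
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rot(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


# The first four cells of the curve on a 2 x 2 grid
print([d2xy(2, d) for d in range(4)])
```

The reverse is not true: two cells that are neighbors in 2D can be far apart along the curve, which is the limitation discussed under Indices.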

Uber H3

hexagonal cells, fixed aperture of 7
icosahedron with fixed orientation
Gnomonic projection: Fast, distortions in both shape and area
Hierarchical index

DGGRID

DGGRID (Sahr et al.)

rHEALPix

(Gibb 2016)
FITS file format (Hivon et al. 2020)

openEAGER

openEAGER
has plugins for Elasticsearch and PostGIS

Database management systems

One can store the data in a traditional relational database, e.g. PostGIS or ClickHouse (column based)
database management system: Combines storage and analysis (e.g. SQL). We want to use all kinds of analysis languages instead
Storing DGGS cell data in such a database produces an enormous overhead in space
report
PostGIS vs ClickHouse
PostGIS
- is an all-purpose db -> Not optimized
- row-wise (all properties of an element together). This is unsuitable for e.g. temperature mean, where we want to have all temperature values for all locations together in memory
- does not scale well (we need billions of rows)
- primarily single threaded
ClickHouse
- column based db like n-dimensional array
- Good for immutable data (e.g. log files), terrible at mutation
- Multi threaded
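
The difference between the two layouts can be sketched with plain Python containers. The records and field names below are made up for illustration; the point is only that the column layout keeps all temperature values together, so a mean touches one contiguous list instead of every record.

```python
from statistics import mean

# Row-wise layout: all properties of one cell are stored together.
rows = [
    {"cell_id": 0, "temperature": 280.0, "precipitation": 1.0},
    {"cell_id": 1, "temperature": 282.0, "precipitation": 0.0},
    {"cell_id": 2, "temperature": 284.0, "precipitation": 2.5},
]


def to_columns(records):
    """Convert row-wise records into one list per property (column layout)."""
    return {key: [rec[key] for rec in records] for key in records[0]}


columns = to_columns(rows)

# Row layout: every record must be visited to pick out one field.
mean_from_rows = mean(rec["temperature"] for rec in rows)
# Column layout: the values of interest are already one contiguous list.
mean_from_columns = mean(columns["temperature"])
print(mean_from_rows, mean_from_columns)
```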

Cloud Optimized GeoTIFF (COG)

Uses pyramid
Tile id analogous to cell id in DGGS
Area and shape distortion depend on the projection. For instance, MODIS tiles use the sinusoidal projection, resulting in tiles of equal area
https://developers.planet.com/docs/planetschool/an-introduction-to-cloud-optimized-geotiffs-cogs-part-1-overview/

DWD / MPI-M ICON grid

ICOsahedral Nonhydrostatic model for weather forecasting
developed by MPI for Meteorology and DWD since 2001
triangular grid on an icosahedron
equal area, no issue of meridian convergence e.g. at the poles
triangles have few neighbors: small discretization stencil, less communication, easier to parallelize when computing the differential equations
triangles are the simplest polygon: Used to describe 3D meshes in vis, games, ...
triangles are always flat in 3D: Easier to reason about, faster computations
triangles have perfect sub-division and perfect nesting
- Think about a low res cloud terrain model and a high res soil model. Perfect nesting is required for integration to conserve mass, energy and momentum
hexagons have an undesirable geostrophic mode in modelling winds affected by the Coriolis force (Ničković et al. 2002)
Wan et al. 2013
Weather models are a good example for DGGS (actual grid system where parent cells need to talk to child cells), satellite image ML is more a DGG (multi res is not that important unless multiple products of different resolutions are integrated)
No projection e.g. Gnomonic or ISEA, just subdivide great circles. In a normal DGGS, the cell boundaries are re-projected to the sphere. ICON just re-projects the 20 base triangles of the icosahedron.
ICON as a DGGS (Jubair et al. 2016)
Grids for models from Germany (ICON), US (MPAS) and Japan (NICAM) are very similar (Jubair et al. 2016)
ICON stores cell data in an unstructured way
ICON data is designed for simulation, not for visualization
climate modelling is limited by CPU/GPU and not IO (high performance and not high throughput), so this data storage format might not be the first priority
Neighbors of a triangle vertex are the vertices of a hexagon centered around that triangle vertex. Hex traversal to get neighbors [(Jubair et al. 2016)](https://diglib.eg.org/xmlui/bitstream/handle/10.2312/vmv20161355/161-168.pdf?sequence=1)
rotate the pentagon outside the Himalaya (high velocity due to orography -> error in numerical simulations)
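
The idea of refining a triangle by subdividing great circles, without any map projection, can be sketched as follows: each edge is bisected at the midpoint of its great-circle arc, which splits one spherical triangle into four children that exactly cover the parent. This is a simplified illustration and not the ICON grid generator; for readable coordinates it starts from one face of an octahedron rather than an icosahedron, and all names are assumptions.

```python
import math


def normalize(v):
    """Scale a 3D vector to unit length, i.e. put it on the unit sphere."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def arc_midpoint(a, b):
    """Midpoint of the great-circle arc between unit vectors a and b."""
    return normalize(tuple(x + y for x, y in zip(a, b)))


def subdivide(tri):
    """Split one spherical triangle into 4 nested children (aperture 4)."""
    a, b, c = tri
    ab, bc, ca = arc_midpoint(a, b), arc_midpoint(b, c), arc_midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]


def refine(tri, levels):
    """Apply the subdivision repeatedly and return all leaf triangles."""
    tris = [tri]
    for _ in range(levels):
        tris = [child for t in tris for child in subdivide(t)]
    return tris


# One octahedron face as the base triangle (illustrative choice)
base = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
cells = refine(base, 3)
print(len(cells))  # 4**3 = 64 triangles
```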

Shapes

all shapes can be created from the same base grid (composing triangles into hexagons)
triangle
- only aperture 4
- most simple
- perfect nesting
diamond
- only aperture 4
- perfect nesting
hexagon
- apertures 3,4 and 7
- inconsistent coriolis force (See ICON)
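
A sketch of how the number of cells grows with resolution for the three shapes on an icosahedron. It assumes the convention used by ISEA grids in DGGRID, where resolution 0 consists of the 20 triangles, 10 diamonds, or 12 vertex-centered cells of the icosahedron. H3 starts from 122 base cells, so its counts differ. The function name is an assumption.

```python
def n_cells(shape, aperture, resolution):
    """Number of cells on an icosahedral grid under the assumptions above."""
    refinement = aperture ** resolution
    if shape == "triangle":
        return 20 * refinement        # 20 icosahedron faces
    if shape == "diamond":
        return 10 * refinement        # 10 pairs of adjacent faces
    if shape == "hexagon":
        return 10 * refinement + 2    # includes the 12 pentagons
    raise ValueError(f"unknown shape: {shape}")


for res in range(4):
    print(res,
          n_cells("triangle", 4, res),
          n_cells("diamond", 4, res),
          n_cells("hexagon", 3, res))
```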

Indices

(Most common packages, OGC definition): 1D string
Issue: Need to squeeze 2D surface to 1D index using space filling curves
These curves are streets restricting our movements! (Convolutions, BBox queries etc.)
Hilbert curve: Google S2
Z-order (Morton) curve, called Peano curve in some of the literature: Geohash
https://dl.acm.org/doi/pdf/10.1145/93605.98742
https://www.youtube.com/watch?v=z3PaGIQTFSE
Coordinate transformations introduce errors in physical floating point geo coordinates, which is especially a problem at global scale: https://discourse.pangeo.io/t/example-which-highlights-the-limitations-of-netcdf-style-coordinates-for-large-geospatial-rasters/4140 DGGS is discrete: no floating point errors

Dimensionality

Ways of indexing DGGS data:

1D indices
- space filling curves
  - Google S2: Similar to Hilbert curves
- prefix codes
  - fast: get parents and distance (how many bits are shared)
  - slow: BBox (need polyfill, still better than point in any polygon queries)
  - examples
    - Uber H3: 3 bits per resolution
    - Generalized place index based on generalized balanced ternary, (Sahr)
    - Microsoft Bing Quadkeys
- this is an embedding that reduces 2D space to just one dimension while preserving neighborhood well. This is helpful in Deep Learning (Similar to embedding)
- good for points: The binary search tree is a fractal as well
2D indices
- geographical grids using lon/lat
- (x, y) in a plane of a foldable figure of the polyhedron (e.g. DGGRID projtri)
- Good for polygons: Just need to check the boundaries, i.e. 2 intervals with 4 points for a bounding box. Every index between the interval borders is contained as well.
3D indices
- no projection at all :)
1D index of HEALPix can be ordered in two different ways
- RING: Good for global patterns, Fourier Transformation
- NESTED: Good for local patterns, Wavelet Transformation, Neighbor search
2D index for hexagonal convolutions
- HexagDLy uses offset coordinates
- Uber uses axial coordinates
- HEXACONV gives theoretical background about this
  - group convolution instead of just translational convolution: Rotational equivariance
  - higher degree of symmetry: Fewer parameters needed to train the network
  - uses offset coords for most efficient data storage [Foo et al.](https://www.sciencedirect.com/science/article/pii/S2352711018302723#b12)
- downside of axial coordinates: Space inefficient due to parallelogram. We don't care here because we store global data and zarr allows empty chunks
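
The properties of prefix codes listed above (cheap parent lookup, closeness by shared prefix) can be sketched with quadkeys, following the publicly documented Bing Maps tile scheme: one base-4 digit per level, built from one bit of the tile x and y coordinates. The helper names `parent` and `shared_levels` are assumptions for illustration.

```python
def tile_to_quadkey(x, y, level):
    """Interleave the bits of tile (x, y) into one base-4 digit per level."""
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if x & mask:
            digit += 1
        if y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


def parent(quadkey):
    """The parent cell is obtained by dropping the last digit."""
    return quadkey[:-1]


def shared_levels(a, b):
    """Number of leading digits two keys share (their common ancestor level)."""
    n = 0
    for ca, cb in zip(a, b):
        if ca != cb:
            break
        n += 1
    return n


key = tile_to_quadkey(3, 5, 3)
neighbor = tile_to_quadkey(2, 5, 3)
print(key, parent(key), shared_levels(key, neighbor))
```

A bounding box, in contrast, does not map to a single prefix or index interval, which is why such queries need a polyfill step.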

Storage index

should be 2D to be optimized for ANN kernels (Bounding box queries) and chunking (allowing tiling), index interval is always a continuous interval in geo space as well
PROJTRI: (face, x, y) triangles that need further processing to be stored in a matrix
Q2DI: (quad, i, j)
- hexagons alternate between pointy top (Class I, i axis horizontal) and flat top (Class II, j axis vertical)
- integers are lossless to compute
- files from DGGRID are small
- But some points are on a 3rd face
- Already a rectangle (Rhombus shear done by DGGRID)
Q2DD: (quad, x, y) floats are big and lossy, only 2 faces per map, rhombus

Spatiotemporal DGGS

Natural coordinate dimensions are both space and time
Both can be aggregated and can be stored in different resolutions
Important use case: Simulation outputs where all time and space points are available
Save a separate dataset for each combination of time and space resolution in GEMS DKRZ healpix data
DGGS: https://docs.ogc.org/as/20-040r3/20-040r3.html
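
A sketch of building one dataset per combination of space and time resolution by averaging. It assumes that the cells are ordered so that the children of a parent are consecutive (as in HEALPix NESTED ordering with 4 children per parent) and that the time steps are regular. Function names and the toy data are assumptions.

```python
from statistics import mean


def coarsen_space(values, aperture=4):
    """Average each group of `aperture` consecutive child cells."""
    return [mean(values[k:k + aperture])
            for k in range(0, len(values), aperture)]


def coarsen_time(series, factor):
    """Average each group of `factor` consecutive time steps, per cell."""
    out = []
    for t in range(0, len(series), factor):
        block = series[t:t + factor]
        out.append([mean(cell_values) for cell_values in zip(*block)])
    return out


# Toy data: 4 time steps x 8 cells
data = [[float(t * 8 + c) for c in range(8)] for t in range(4)]

# One dataset per (space level, time level); level 0 is the full resolution
pyramid = {}
for space_level in range(2):
    for time_level in range(2):
        cube = data
        if space_level:
            cube = [coarsen_space(step) for step in cube]
        if time_level:
            cube = coarsen_time(cube, factor=2)
        pyramid[(space_level, time_level)] = cube

print(pyramid[(1, 1)])
```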

File format

based on zarr: Cloud optimized, multi-dimensional, flexible chunking

Map storage

data is stored in a list of maps
each map is a rectangular data cube
each map may contain multiple neighboring polyhedron faces (e.g. 2 triangles of an icosahedron)
each map has a matrix as locations. This allows more local chunking i.e. tiling. One could also make an array of arrays with incremental lengths for a triangular face on an icosahedron, but this would only allow stripes but not tiles.

Grid definition

The polyhedron to be used (One of the 5 platonic solids)
The orientation of the polyhedron relative to the sphere
The radius of the sphere (e.g. authalic earth)
list of faces
mappings from faces to maps
grid for each map: origin vector, unit vectors (grid lengths), n grid points for each direction, list of undef points
Program to be used for grid construction (e.g. DGGRID v 7.8)
Other meta data as Climate and Forecast (CF) attributes
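
One possible way to hold the items listed above in a single metadata record is sketched below. All class names, field names, and values are hypothetical and only illustrate the structure; in particular the orientation, origins, and unit vectors are placeholders, not values of a real grid.

```python
from dataclasses import dataclass, field


@dataclass
class MapGrid:
    """Grid of one rectangular map (hypothetical structure)."""
    faces: list                   # polyhedron faces covered by this map
    origin: tuple                 # origin vector of the map
    unit_vectors: tuple           # grid lengths in both directions
    shape: tuple                  # number of grid points per direction
    undefined_points: list = field(default_factory=list)


@dataclass
class GridDefinition:
    """Metadata describing a DGGS for storage (hypothetical structure)."""
    polyhedron: str               # one of the 5 platonic solids
    orientation: dict             # orientation relative to the sphere
    sphere_radius_m: float        # e.g. authalic earth radius
    maps: list                    # list of MapGrid
    construction_program: str     # e.g. "DGGRID v 7.8"
    attributes: dict = field(default_factory=dict)  # e.g. CF attributes

    def face_to_map(self):
        """Mapping from each face to the index of the map that stores it."""
        return {face: idx
                for idx, m in enumerate(self.maps)
                for face in m.faces}


# Icosahedron with 10 maps of 2 neighboring faces each (placeholder values)
grid = GridDefinition(
    polyhedron="icosahedron",
    orientation={"vertex_lat": 0.0, "vertex_lon": 0.0, "azimuth": 0.0},
    sphere_radius_m=6371007.18,
    maps=[MapGrid(faces=[2 * k, 2 * k + 1], origin=(0.0, 0.0),
                  unit_vectors=((1.0, 0.0), (0.0, 1.0)), shape=(64, 64))
          for k in range(10)],
    construction_program="DGGRID v 7.8",
)
print(grid.face_to_map()[3])
```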

Extending DGGS

Up to now: The grid defines only the 2D surface of the earth.

Depth

Need altitude e.g. to represent ocean tides
Now: Equal volume of cells
Depth: Put a vector of values at cell center: https://www.mdpi.com/2220-9964/9/4/233

Flux

Want to describe movement from one cell to a neighbor.
Need to index not cells but edges between two cells
Useful to describe streams of water, wind and molecules
Uber H3 provides indexes for directed edges between neighboring cells

Vector fields

Any works on representing speed and force relative to cells?
We have a vector (x,y,z) in Cartesian coordinates for every point in Cartesian 3D space. Do we need to reproject the vector to the axes of the DGGS? Which axis in a hexagonal grid if data is stored in offset coords?

Time pyramids

What about providing cached aggregations not only over space (resolution) but also time?

Rectangular data

Data should be shaped in tensors with 2 spatial dimensions
- To save it in an n-dimensional array
- To use it in CNN
- To view it using bbox queries

Neural Networks

quasi hexagonal kernel combining multiple rectangular kernels
Hexaconv just one rectangular kernel, axial coords, some edges have weight 0

Staggering

Arakawa and Lamb 1977
Collins et al. 2013
A (unstaggered): Just store the variable at the center of a cell
B to E: also store variables at corners or midpoints of edges
C: e.g. store masses at cell centers and wind components at the midpoints of edges, becomes more and more popular
vertical staggering: e.g. no flux at top or bottom
Can we just overlay ISEA4H, ISEA4D and ISEA4T to achieve staggering?
