Pandas interface for observation timeseries #4

Open
lkangas opened this issue Sep 18, 2020 · 2 comments

lkangas commented Sep 18, 2020

Having the data available as a flat pandas DataFrame would greatly improve the usability and versatility of the open data output.

Example of turning a weather observations multipointcoverage result into a DataFrame:

import pandas as pd

rows = []
# obs.data is a nested dict: {time: {location: {parameter: {'value': ..., ...}}}}
for date, locations in obs.data.items():
    for loc, params in locations.items():
        # Keep only the numeric value of each parameter
        row = {key: entry['value'] for key, entry in params.items()}
        row['date'] = date
        row['location'] = loc
        rows.append(row)

df = pd.DataFrame(rows)
@pnuu pnuu self-assigned this Oct 6, 2020
@pnuu pnuu added the enhancement New feature or request label Oct 6, 2020
@pnuu pnuu added this to the v0.4.0 milestone Oct 6, 2020
adriennn commented Mar 2, 2021

The above doesn't work as such: the 'times' entry is a list, so you'll need to check whether the entry being processed has the 'values' key.
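
A rough sketch of that check (assuming the timeseries layout where each place maps to a 'times' list plus per-parameter dicts holding a 'values' list, as in the snippet further below; place stands for one key of obs.data):

# Keep only the entries that actually carry a 'values' list; 'times' itself is a plain list
params = {k: v['values'] for k, v in obs.data[place].items()
          if isinstance(v, dict) and 'values' in v}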

mikaelhg commented Aug 4, 2022

Here's what I've come up with:

import pandas as pd

from fmiopendata.wfs import download_stored_query

# query, start_time and end_time are defined elsewhere
args = ['timeseries=True', f"starttime={start_time}", f"endtime={end_time}"]
obs = download_stored_query(query, args=args)

# Collect the parameter names seen for any place; 'times' is the shared time axis
cols = {v for p in obs.data for v in obs.data[p]}
cols.discard('times')
cols = sorted(cols)  # sets are unordered, so fix a stable column order

dfs = []
for name in obs.data:
    data = {k: obs.data[name][k]['values'] for k in cols}
    idx = pd.DatetimeIndex(name='hour', data=obs.data[name]['times'])
    idx0 = pd.CategoricalIndex(name='place', data=[name] * idx.size)
    df = pd.DataFrame(data=data, index=[idx0, idx], columns=cols, dtype='float64')
    dfs.append(df)

df = pd.concat(dfs)

df.attrs.update({'location_metadata': obs.location_metadata})

df.to_parquet('data/airquality.parquet')

If you want to compress this: unfortunately pandas and NumPy don't offer compressed fixed-point formats, but you can cast to int16 and multiply the values by 10 to get a similar effect.
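A minimal sketch of that trick (assuming one decimal of precision is enough, the scaled values fit into int16, and missing values are replaced by a sentinel; the output path is just illustrative):

# Multiply by 10, round, and store as int16; divide by 10 on read to recover one decimal.
# int16 cannot hold NaN, so missing values are replaced with a sentinel first.
scaled = (df * 10).round().fillna(-32768).astype('int16')
scaled.to_parquet('data/airquality_int16.parquet')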

Alternative indexing:

    # Inside the loop above, instead of building idx and idx0 separately:
    mi = pd.MultiIndex.from_product([[name], obs.data[name]['times']], names=['place', 'hour'])
    df = pd.DataFrame(data=data, index=mi, columns=cols, dtype='float64')
