Speed up PandasDataset further #2441

Merged 4 commits on Nov 23, 2022

@lostella (Contributor) commented Nov 16, 2022

Description of changes: Further speedup following #2435. Tested on Python 3.8.13 with the data from #2363, using the following code:

from pathlib import Path
from time import time
import pandas as pd
from tqdm import tqdm

from gluonts.dataset.pandas import PandasDataset

df = pd.read_parquet(Path(__file__).resolve().parent / "long_df_sample.parquet")

t0 = time()
ds = PandasDataset.from_long_dataframe(
    dataframe=df,
    item_id="item_id",
    timestamp="timestamp",
    freq="M",
)
t1 = time()
print(f"construction time: {t1 - t0}")

N = 3

t0 = time()
for _ in range(N):
    for entry in tqdm(ds):
        pass
t1 = time()
print(f"average iteration time: {(t1 - t0)/N}")

Before the PR:

construction time: 0.3739509582519531
100%|█████████████████████████████████████████████| 25000/25000 [00:10<00:00, 2294.39it/s]
100%|█████████████████████████████████████████████| 25000/25000 [00:06<00:00, 3660.79it/s]
100%|█████████████████████████████████████████████| 25000/25000 [00:06<00:00, 3618.71it/s]
average iteration time: 8.413320620854696

After the PR:

construction time: 0.3727140426635742
100%|█████████████████████████████████████████████| 25000/25000 [00:08<00:00, 2825.26it/s]
100%|█████████████████████████████████████████████| 25000/25000 [00:05<00:00, 4383.24it/s]
100%|█████████████████████████████████████████████| 25000/25000 [00:05<00:00, 4380.71it/s]
average iteration time: 6.958552281061809
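
To put the two runs side by side, here is a minimal sketch that derives the relative improvement from the figures reported above. The helper names (`relative_reduction`, `speedup`) are made up for illustration and are not part of GluonTS; the steady-state comparison assumes the first pass in each run is a warm-up and uses only the second and third tqdm rates.

```python
# Figures copied from the "Before the PR" / "After the PR" output above.
before_avg_s = 8.413320620854696
after_avg_s = 6.958552281061809

# tqdm rates (it/s) for the second and third passes; the first pass is
# treated as warm-up here, which is an assumption, not a measured fact.
before_rates = [3660.79, 3618.71]
after_rates = [4383.24, 4380.71]


def relative_reduction(before: float, after: float) -> float:
    """Fraction of time saved when going from `before` to `after`."""
    return (before - after) / before


def speedup(before: float, after: float) -> float:
    """How many times faster `after` is compared to `before`."""
    return before / after


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


# Throughput ratio: higher rate is better, so divide after by before.
steady_state_gain = mean(after_rates) / mean(before_rates)

print(f"iteration time reduction: {relative_reduction(before_avg_s, after_avg_s):.1%}")
print(f"overall speedup: {speedup(before_avg_s, after_avg_s):.2f}x")
print(f"steady-state throughput gain: {steady_state_gain:.2f}x")
```

On these numbers the average iteration time drops by roughly 17%, i.e. about a 1.2x speedup, while construction time is essentially unchanged.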

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Please tag this PR with at least one of these labels to make our release process faster: BREAKING, new feature, bug fix, other change, dev setup

@lostella changed the title from "Speed up PandasDataset" to "Speed up PandasDataset further" Nov 16, 2022
Review thread on test/dataset/test_pandas.py (outdated, resolved)
@lostella lostella requested a review from jaheba November 16, 2022 15:43
@lostella (Contributor, Author) commented

I think this requires improving tests before it can be merged: for each use case for PandasDataset, we should assert that each dataset element is exactly what we expect (field names, types, shapes).
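
As a rough illustration of the kind of check meant here, the sketch below asserts field names, types, and shapes for a single dataset element. The helper `assert_entry_matches`, the expected-spec format, and the example entry are all hypothetical; the real fields and types produced by PandasDataset should be taken from the library, not from this example.

```python
import numpy as np


def assert_entry_matches(entry, expected):
    """Check one dataset element against an expected spec.

    `expected` maps field name -> (type, shape); use shape=None for
    fields that have no shape (e.g. strings). Hypothetical helper.
    """
    # Field names must match exactly: no missing and no extra fields.
    assert set(entry.keys()) == set(expected.keys()), (
        f"fields differ: {sorted(entry.keys())} != {sorted(expected.keys())}"
    )
    for name, (expected_type, expected_shape) in expected.items():
        value = entry[name]
        assert isinstance(value, expected_type), (
            f"{name}: expected {expected_type.__name__}, got {type(value).__name__}"
        )
        if expected_shape is not None:
            assert value.shape == expected_shape, (
                f"{name}: expected shape {expected_shape}, got {value.shape}"
            )


# Made-up entry standing in for one element yielded by a dataset.
entry = {
    "item_id": "A",
    "target": np.zeros(12, dtype=np.float32),
}

# Made-up spec: the field names and shapes here are assumptions.
expected = {
    "item_id": (str, None),
    "target": (np.ndarray, (12,)),
}

assert_entry_matches(entry, expected)
```

In an actual test this would run over every element of the dataset, once per use case (long dataframe, dict of dataframes, and so on), with the spec adjusted for each case.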

@lostella added the BREAKING label Nov 16, 2022
@lostella changed the title from "Speed up PandasDataset further" to "Speed up `PandasDataset` further" Nov 17, 2022
@lostella added the enhancement and pending v0.11.x backport labels and removed the BREAKING label Nov 17, 2022
@lostella lostella marked this pull request as draft November 17, 2022 09:07
@lostella lostella marked this pull request as ready for review November 23, 2022 10:26
@lostella lostella enabled auto-merge (squash) November 23, 2022 12:35
@lostella lostella merged commit 89bc84c into awslabs:dev Nov 23, 2022
@lostella lostella deleted the even-faster-pandas branch November 24, 2022 08:58
lostella added a commit to lostella/gluonts that referenced this pull request Nov 24, 2022
@lostella lostella mentioned this pull request Nov 24, 2022
lostella added a commit that referenced this pull request Nov 24, 2022
* Add test cases for `PandasDataset`, fix missing assertion (#2453)

* Speed up `PandasDataset` further (#2441)

* Fix MANIFEST.in (#2456)

* fix path for backport in MANIFEST.in
@lostella removed the pending v0.11.x backport label Nov 25, 2022