This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

build(deps): update datasets requirement from <=2.12.0,>=2.0.0 to >=2.0.0,<=2.13.0 in /requirements #1609

Merged

Conversation

dependabot[bot]
Contributor

@dependabot dependabot bot commented on behalf of github Jun 19, 2023

Updates the requirements on datasets to permit the latest version.
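The effect of widening the upper bound can be sketched as below. This is only a simplified stand-in for pip's specifier matching, assuming plain dotted numeric versions; the helper names `parse_version` and `in_range` and the candidate versions are hypothetical and chosen for illustration.

```python
# Sketch only: a simplified stand-in for pip's specifier matching,
# valid for plain dotted numeric versions such as "2.13.0".


def parse_version(version: str) -> tuple[int, ...]:
    """Turn '2.13.0' into (2, 13, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def in_range(version: str, lower: str, upper: str) -> bool:
    """Return True when lower <= version <= upper (both bounds inclusive)."""
    return parse_version(lower) <= parse_version(version) <= parse_version(upper)


OLD_RANGE = ("2.0.0", "2.12.0")  # datasets >=2.0.0,<=2.12.0
NEW_RANGE = ("2.0.0", "2.13.0")  # datasets >=2.0.0,<=2.13.0

# Illustrative candidates: print whether each fits the old and the new range.
for candidate in ("2.12.0", "2.13.0", "2.13.1"):
    print(candidate, in_range(candidate, *OLD_RANGE), in_range(candidate, *NEW_RANGE))
```

Under these assumptions, 2.13.0 is rejected by the old range and accepted by the new one, while a later patch release such as 2.13.1 would still fall outside the upper bound.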

Release notes

Sourced from datasets's releases.

2.13.0

Dataset Features

  • Add IterableDataset.from_spark by @maddiedawson in huggingface/datasets#5770

    • Stream the data from your Spark DataFrame directly to your training pipeline

    from datasets import IterableDataset
    from torch.utils.data import DataLoader
    ids = IterableDataset.from_spark(df)
    ids = ids.map(...).filter(...).with_format("torch")
    for batch in DataLoader(ids, batch_size=16, num_workers=4):
        ...

  • IterableDataset formatting for PyTorch, TensorFlow, Jax, NumPy and Arrow:

    from datasets import load_dataset
    ids = load_dataset("c4", "en", split="train", streaming=True)
    ids = ids.map(...).with_format("torch")  # to get PyTorch tensors - also works with tf, np, jax etc.

  • Add IterableDataset.from_file to load local dataset as iterable by @mariusz-jachimowicz-83 in huggingface/datasets#5893

    from datasets import IterableDataset
    ids = IterableDataset.from_file("path/to/data.arrow")

  • Arrow dataset builder to be able to load and stream Arrow datasets by @mariusz-jachimowicz-83 in huggingface/datasets#5944

    from datasets import load_dataset
    ds = load_dataset("arrow", data_files={"train": "train.arrow", "test": "test.arrow"})

Experimental

General improvements and bug fixes

... (truncated)

Commits

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot dependabot bot requested review from ethanwharris and Borda as code owners June 19, 2023 12:04
@dependabot dependabot bot added the enhancement New feature or request label Jun 19, 2023
@dependabot dependabot bot requested a review from a team June 19, 2023 12:04
@Borda Borda enabled auto-merge (squash) June 19, 2023 12:27

codecov bot commented Jun 19, 2023

Codecov Report

Merging #1609 (4d45ad7) into master (61ba676) will increase coverage by 22%.
The diff coverage is n/a.

Additional details and impacted files
@@           Coverage Diff            @@
##           master   #1609     +/-   ##
========================================
+ Coverage      62%     84%    +22%     
========================================
  Files         291     291             
  Lines       12876   12876             
========================================
+ Hits         7972   10792   +2820     
+ Misses       4904    2084   -2820     
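The percentages in the table can be reproduced from the raw hit and line counts. The sketch below assumes coverage is hits divided by lines, rounded to a whole percent; that rounding rule is an assumption that matches the figures shown, and `coverage_percent` is a hypothetical helper.

```python
def coverage_percent(hits: int, lines: int) -> int:
    """Coverage as a whole-number percentage (assumed rounding rule)."""
    return round(100 * hits / lines)


LINES = 12876                       # total lines, unchanged by the PR
BASE_HITS, HEAD_HITS = 7972, 10792  # hits on master and on #1609

base = coverage_percent(BASE_HITS, LINES)
head = coverage_percent(HEAD_HITS, LINES)

print(f"master: {base}%  #1609: {head}%  change: {head - base:+d}%")
print(f"hits: {HEAD_HITS - BASE_HITS:+d}  misses on #1609: {LINES - HEAD_HITS}")
```

Because the line count is the same on both sides, the 2820 extra hits are exactly the 2820 fewer misses.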

@dependabot dependabot bot force-pushed the dependabot-pip-requirements-datasets-gte-2.0.0-and-lte-2.13.0 branch from 3aad2f5 to 4263d3f Compare June 19, 2023 14:25
Updates the requirements on [datasets](https://github.com/huggingface/datasets) to permit the latest version.
- [Release notes](https://github.com/huggingface/datasets/releases)
- [Commits](huggingface/datasets@2.0.0...2.13.0)

---
updated-dependencies:
- dependency-name: datasets
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>
@dependabot dependabot bot force-pushed the dependabot-pip-requirements-datasets-gte-2.0.0-and-lte-2.13.0 branch from 4263d3f to 4d45ad7 Compare June 19, 2023 15:44
@mergify mergify bot removed the has conflicts label Jun 19, 2023
@Borda Borda merged commit 18ff71e into master Jun 19, 2023
@Borda Borda deleted the dependabot-pip-requirements-datasets-gte-2.0.0-and-lte-2.13.0 branch June 19, 2023 16:38
Labels
enhancement New feature or request