
Fallback to arrow defaults when loading dataset with custom features that aren't registered locally #7223

Open
alex-hh opened this issue Oct 12, 2024 · 0 comments

alex-hh (Contributor) commented Oct 12, 2024

Describe the bug

The datasets library lets users define custom feature types and register them with register_feature.

However, once a dataset using such a feature is pushed to the Hub, anyone who calls load_dataset on it without first registering the same custom feature classes as the dataset creator gets an error, because the stored feature metadata refers to a type that is not registered in their session.

It would be nice to offer a fallback in this case, e.g. loading such columns with the default features implied by their underlying Arrow types.

Steps to reproduce the bug

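from datasets import load_dataset

# Fails unless NewFeature has been registered in the current session
load_dataset("alex-hh/custom-features-example")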

Dataset creation (must be run in a separate session, so that NewFeature is not registered in the session where the download is attempted):

from dataclasses import dataclass, field

import pyarrow as pa
from datasets import Dataset, Features
from datasets import Feature
from datasets.features.features import register_feature


@dataclass
class NewFeature(Feature):
    # _type is the name written to the dataset's feature metadata
    _type: str = field(default="NewFeature", init=False, repr=False)

    def __call__(self):
        # Arrow storage type used for this feature's column
        return pa.int32()


# Register the class under its _type name before it is used, so the features
# can be resolved again when they are deserialized in this session
register_feature(NewFeature, "NewFeature")


def examples_generator():
    for i in range(5):
        yield {"feature": i}


ds = Dataset.from_generator(examples_generator, features=Features(feature=NewFeature()))
ds.push_to_hub("alex-hh/custom-features-example")
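To see why a fresh session cannot rebuild the feature, here is a stdlib-only stand-in (no datasets or pyarrow import) for what the script above serializes. The {"info": {"features": ...}} nesting is an assumed approximation of the metadata datasets attaches to the Arrow schema, not a guaranteed format; the point is that only the _type name is stored, while the column's Arrow type (int32 here) lives in the Arrow schema itself, which is what makes an Arrow-based fallback plausible.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class NewFeature:  # stand-in for the issue's NewFeature, without the datasets base class
    _type: str = field(default="NewFeature", init=False, repr=False)


# asdict() includes init=False fields, so only {"_type": "NewFeature"} is kept
features = {"feature": asdict(NewFeature())}

# Assumed metadata layout: the class body (and its __call__ returning
# pa.int32()) never leaves the creator's session, only its name does
metadata = json.dumps({"info": {"features": features}})
print(metadata)  # {"info": {"features": {"feature": {"_type": "NewFeature"}}}}
```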

Expected behavior

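It would be nice, and would make custom features more extensible, to have a graceful fallback for cases where user-defined features are stored in the dataset but not registered locally, for example loading those columns with the default features inferred from their Arrow types instead of raising an error.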
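One possible fallback policy can be sketched with the same simplified registry model: when the stored _type is unknown, warn and return a plain Value built from the column's Arrow storage type instead of raising. The function from_dict_with_fallback, its arrow_dtype parameter and the stand-in Value class are all hypothetical; in a real implementation the storage type would come from the Arrow schema of the loaded table.

```python
import warnings
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Value:
    dtype: str
    _type: str = field(default="Value", init=False, repr=False)


_FEATURE_TYPES: Dict[str, type] = {"Value": Value}  # hypothetical registry


def from_dict_with_fallback(obj: Dict[str, Any], arrow_dtype: str) -> Any:
    """Rebuild a serialized feature; if its _type is not registered, fall back
    to a plain Value using the column's Arrow storage type (hypothetical policy)."""
    obj = dict(obj)  # copy so the caller's dict is not mutated
    type_name = obj.pop("_type")
    cls = _FEATURE_TYPES.get(type_name)
    if cls is None:
        warnings.warn(
            f"Unknown feature type {type_name!r}; falling back to Value({arrow_dtype!r})"
        )
        return Value(arrow_dtype)
    return cls(**obj)


# NewFeature is not registered here, so the int32 column is loaded as a Value
print(from_dict_with_fallback({"_type": "NewFeature"}, arrow_dtype="int32"))  # Value(dtype='int32')
```

Warning instead of failing keeps the data loadable, while still telling the user that the custom behaviour of the feature class has been lost.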

Environment info

datasets version: 3.0.2
