Skip to content

Conversation

Abdul-Omira
Copy link

Add a new 'Streaming best practices' page with practical patterns and pitfalls for large-scale/production use of IterableDataset. Includes examples for batched map with remove_columns, deterministic shuffling with set_epoch, multi-worker sharding, checkpoint/resume, and persistence to Parquet/Hub. Linked from How-to > General usage, next to Stream.

Abdulwahab Omira and others added 2 commits August 22, 2025 19:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant