Support controllable CutSet.mux weights in multiprocess dataloading #1266

Merged: pzelasko merged 10 commits into master from feature/controllable-weights on Jan 23, 2024

pzelasko (Collaborator) commented Jan 22, 2024

Several changes:

  • as in the title, mux weights can now be supplied as a torch shared-memory tensor; updating that tensor changes the weights in a way that stays in sync across dataloading processes
  • simplified the DurationBatcher sampling logic (no change in sampling behavior)
  • fixed an inconsistency between TimeConstraint's exceeded() and close_to_exceeding() (I think it was reported in some issue)
  • using dill for CutSet/Sampler inter-process serialization now has to be enabled explicitly with LHOTSE_DILL_ENABLED=1; the library depends less on dill to make CutSet transforms work across the main and dataloading processes (you only need it if you, as the user, pass lambdas instead of global functions / partials to map/filter-style functions)
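The last point comes down to what the standard pickle module can serialize. The following is a minimal sketch using only the Python standard library, not lhotse itself: FakeCut, has_min_duration, and is_picklable are hypothetical names introduced for illustration. It shows that a module-level function wrapped in functools.partial pickles fine, while an equivalent lambda does not, which is the case where LHOTSE_DILL_ENABLED=1 would be needed.

```python
import pickle
from dataclasses import dataclass
from functools import partial


@dataclass
class FakeCut:
    """Hypothetical stand-in for a cut; only the duration matters here."""
    duration: float


def has_min_duration(cut, min_duration):
    """Module-level predicate: pickle stores it by its importable name."""
    return cut.duration >= min_duration


def is_picklable(fn):
    """Return True if the standard pickle module can serialize fn."""
    try:
        pickle.dumps(fn)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A partial over a global function carries its arguments along and
# can be sent to a worker process with plain pickle.
keep_long = partial(has_min_duration, min_duration=5.0)

# The equivalent lambda has no importable name, so plain pickle rejects it.
keep_long_lambda = lambda cut: cut.duration >= 5.0

partial_ok = is_picklable(keep_long)
lambda_ok = is_picklable(keep_long_lambda)
```

Under these assumptions, passing something like keep_long to a filter-style function should work without dill, whereas keep_long_lambda would need dill-based serialization to reach the dataloading processes.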

pzelasko added this to the v1.20.0 milestone Jan 22, 2024

pzelasko (Collaborator, Author) commented:

Merging; the test that failed is flaky (I'll have to figure out how to fix it separately, it apparently relies on downloading Silero VAD...)

pzelasko merged commit c678849 into master Jan 23, 2024
7 of 8 checks passed
pzelasko deleted the feature/controllable-weights branch January 23, 2024 16:58