Skip to content

DataLoader transducers reducible API #75

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jun 16, 2022

Conversation

ancapdev
Copy link
Contributor

With the set of runtime options available to DataLoader, the iteration protocol is not type stable.

This PR adds Transducers reducible support to DataLoader, to invert control flow, and type stabilize over iteration, as an alternative API.

Not directly related, I've added some inbounds propagation where I found it missing.


dloader = DataLoader(1:1000; batchsize = 2, shuffle = true)
@test copy(Map(x -> x[1]), Vector{Int}, dloader) != collect(1:2:1000)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So now the @inferred tests in this file are passing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They don't, I've not made any changes to the behavior of the iterator API. Good point on inference in general though, I changed __foldl__ to infer correctly and added a test for it.

@codecov-commenter
Copy link

codecov-commenter commented Apr 15, 2022

Codecov Report

Merging #75 (0746521) into main (4243d09) will decrease coverage by 1.19%.
The diff coverage is 76.92%.

@@            Coverage Diff             @@
##             main      #75      +/-   ##
==========================================
- Coverage   90.36%   89.16%   -1.20%     
==========================================
  Files          13       13              
  Lines         498      526      +28     
==========================================
+ Hits          450      469      +19     
- Misses         48       57       +9     
Impacted Files Coverage Δ
src/eachobs.jl 83.33% <67.85%> (-13.55%) ⬇️
src/batchview.jl 81.25% <100.00%> (ø)
src/observation.jl 86.95% <100.00%> (ø)
src/obsview.jl 65.78% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4243d09...0746521. Read the comment docs.

@ancapdev
Copy link
Contributor Author

Found remaining allocations during iteration was due to the _check_numobs() call. Eliminated that from _batchrange() and made it elide for @inbounds on getobs().

Any chance for final review and merge soon?

@ToucheSir
Copy link
Contributor

I don't understand the Fold-iverse very well, but could this help simplify some of the code in https://github.com/JuliaML/MLUtils.jl/blob/main/src/parallel.jl as well?

@ancapdev
Copy link
Contributor Author

I don't understand the Fold-iverse very well, but could this help simplify some of the code in https://github.com/JuliaML/MLUtils.jl/blob/main/src/parallel.jl as well?

I suspect it may, but I haven't used the parallel transducers API myself yet

@CarloLucibello
Copy link
Member

I don't quite have the feeling of how often in practice the transducer protocol would be used and if that it is worth carrying another package dependence

@lorenzoh
Copy link
Contributor

lorenzoh commented May 1, 2022

Haven't followed this closely, just wanted to point out that the code in parallel.jl already uses the foldiverse (just a more imperative frontend in FLoops.jl) and as such we also already have a transitive dependency on Transducers.jl.

@ancapdev
Copy link
Contributor Author

ancapdev commented Jun 4, 2022

Any chance to consider merging this, please?

@CarloLucibello
Copy link
Member

Since we already indirectly depend on Transducers.jl it would be ok to also support its API, so we can do this. But the dataloader implementation has changed a bit in the meanwhile, there are merge conflicts. Also depending on the outcome of the discussion in #90 the DataLoader could be moved out of this repo.

@CarloLucibello
Copy link
Member

@ancapdev we decided to keep the Transducer dependency, if you want to rebase this PR we can merge it

@CarloLucibello CarloLucibello merged commit 74f7d4e into JuliaML:main Jun 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants