WIP: Improve readers by parallelizing I/O and compute operations #5401

Draft
wants to merge 37 commits into
base: dev
from

Conversation

ypatia
Member

@ypatia ypatia commented Dec 6, 2024

TODO

[sc-59605]

TYPE: NO_HISTORY | FEATURE | BUG | IMPROVEMENT | DEPRECATION | C_API | CPP_API | BREAKING_BEHAVIOR | BREAKING_API | FORMAT
DESC:

ypatia and others added 30 commits November 29, 2024 15:28
This stops the read from waiting on all I/O operations up front and instead
moves the I/O task to be owned by the datablock itself. If the I/O
threadpool task is valid, we block on data access. This lets I/O and
compute be interleaved by blocking on data only when it is ready to be
processed, and allows for better background data loading.
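To illustrate the idea of a data block owning its I/O task, here is a minimal sketch that uses `std::future` from the standard library as a stand-in for the thread pool task. The names `DataBlock` and `start_reads` are hypothetical and the read is simulated with a sleep; this is not the code in this PR.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a data block that owns its own I/O task.
class DataBlock {
 public:
  explicit DataBlock(std::future<std::vector<char>> io_task)
      : io_task_(std::move(io_task)) {
    assert(io_task_.valid());  // a block is always created with a pending read
  }

  // Blocks only while the I/O task is still valid, i.e. not yet consumed.
  const std::vector<char>& data() {
    if (io_task_.valid()) {
      data_ = io_task_.get();  // waits for the read, then invalidates the future
    }
    return data_;
  }

 private:
  std::future<std::vector<char>> io_task_;
  std::vector<char> data_;
};

// Hypothetical reader: starts one background read per block and returns
// immediately, without waiting for any of the reads to finish.
inline std::vector<DataBlock> start_reads(int num_blocks) {
  std::vector<DataBlock> blocks;
  for (int i = 0; i < num_blocks; ++i) {
    blocks.emplace_back(std::async(std::launch::async, [i] {
      std::this_thread::sleep_for(std::chrono::milliseconds(10));  // simulated I/O
      return std::vector<char>(4, static_cast<char>('a' + i));
    }));
  }
  return blocks;
}
```

A caller can process `blocks[0].data()` while the reads for the later blocks are still in flight; each call to `data()` waits only for that block's own read.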
This allows for copying the task/future and enables multiple threads to
check the status of the task in a thread-safe manner.
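The copy-per-thread pattern can be sketched with `std::shared_future`, which is the standard library analogue assumed here rather than the task type used in this PR. The function `sum_from_threads` is a hypothetical name for illustration.

```cpp
#include <future>
#include <numeric>
#include <thread>
#include <vector>

// Every worker gets its own copy of the shared future. Copies refer to the
// same shared state, and waiting on distinct copies from different threads
// is safe.
inline int sum_from_threads(int num_threads) {
  std::promise<std::vector<int>> io;  // stands in for the pending I/O task
  std::shared_future<std::vector<int>> task = io.get_future().share();

  std::vector<int> partial(num_threads, 0);
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    // `task` is captured by value, so each thread holds a private copy.
    workers.emplace_back([task, &partial, t] {
      const std::vector<int>& data = task.get();  // blocks until the value is set
      partial[t] = std::accumulate(data.begin(), data.end(), 0);
    });
  }

  io.set_value(std::vector<int>{1, 2, 3});  // the "read" completes
  for (auto& w : workers) {
    w.join();
  }
  return std::accumulate(partial.begin(), partial.end(), 0);
}
```

Sharing a single future object between threads by reference would not give the same guarantee, which is why each thread works on its own copy.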
…checking.

While the ThreadPool::SharedTask is designed to be used by multiple
threads, it's designed for copying. The data structure itself is not
thread safe.

A recursive mutex is needed because some functions like load_chunk_data
call back into filtered_data() and would deadlock. This could be handled by
also releasing the lock in load_chunk_data(), but a recursive_mutex is
used for better safety against deadlocks.
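The re-entrancy problem can be sketched as follows. The method names echo the commit message, but the class `Tile` and the method bodies are invented for illustration and do not reflect the actual implementation.

```cpp
#include <mutex>
#include <vector>

// Hypothetical tile-like class guarding its state with a recursive mutex.
class Tile {
 public:
  const std::vector<int>& filtered_data() {
    std::lock_guard<std::recursive_mutex> lock(mtx_);
    return filtered_;
  }

  int load_chunk_data() {
    std::lock_guard<std::recursive_mutex> lock(mtx_);
    // Re-entrant call: this thread already holds mtx_. With a plain
    // std::mutex, locking it again here would be undefined behavior and
    // in practice usually a deadlock.
    const std::vector<int>& data = filtered_data();
    return static_cast<int>(data.size());
  }

 private:
  std::recursive_mutex mtx_;
  std::vector<int> filtered_{1, 2, 3};
};
```

The alternative mentioned above, releasing the lock before calling back into `filtered_data()`, avoids the recursive mutex but has to be repeated correctly in every function that makes such a call.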
This is needed because we need to access the data buffer from inside the
unfiltering task in order to unfilter into it. We can't block on unfiltering
being done from inside the unfiltering task, so we need different accessors
which let us bypass the check on whether the unfiltering task has completed.
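A minimal sketch of the two kinds of accessor, assuming `std::future` in place of the thread pool task. `ResultTile`, `start_unfilter`, `data`, and `data_unchecked` are hypothetical names, and the "unfiltering" is a placeholder loop.

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical tile whose buffer is filled by a background unfiltering task.
class ResultTile {
 public:
  explicit ResultTile(std::size_t n) : buffer_(n, 0) {}

  // Launches the unfiltering task. The task writes through the unchecked
  // accessor; going through data() instead would make it wait on itself.
  void start_unfilter() {
    unfilter_task_ = std::async(std::launch::async, [this] {
      std::vector<int>& out = data_unchecked();
      for (std::size_t i = 0; i < out.size(); ++i) {
        out[i] = static_cast<int>(i) * 2;  // placeholder for real unfiltering
      }
    });
  }

  // Checked accessor for consumers: waits until unfiltering has finished.
  const std::vector<int>& data() {
    if (unfilter_task_.valid()) {
      unfilter_task_.wait();
    }
    return buffer_;
  }

 private:
  // Unchecked accessor: skips the wait, for use inside the task only.
  std::vector<int>& data_unchecked() { return buffer_; }

  // buffer_ is declared before the future so that it is destroyed after the
  // future's destructor has waited for the task to finish.
  std::vector<int> buffer_;
  std::future<void> unfilter_task_;
};
```

Keeping the unchecked accessor private (or otherwise clearly marked) limits the risk of a consumer reading the buffer before unfiltering is complete.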
This is needed because zip_coordinates is called from the unfilter task
itself.
@ypatia ypatia marked this pull request as ready for review December 7, 2024 15:05
@ypatia ypatia closed this Dec 7, 2024
@ypatia ypatia reopened this Dec 7, 2024
@ypatia ypatia marked this pull request as draft December 7, 2024 15:09
@ypatia ypatia marked this pull request as ready for review December 9, 2024 08:32
Base automatically changed from yt/sc-59606/threadpool_with_tasks to dev December 9, 2024 08:36
@ypatia ypatia marked this pull request as draft December 9, 2024 10:06
2 participants