-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Async write for decompression #2975
Conversation
This should probably support compression too. And be overridden if the number of threads is explicitly set with |
That's a good point. |
Yeah, I think we should combine the functionality with We can default compression to Currently, Point being, we should think carefully about the interactions between |
This seems strange to me, likely an overlook. Another candidate policy could be "last one wins", meaning that Lastly, a last policy could be to refuse to work in presence of a discrepancy. I tend to prefer the first policy ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this is looking great!
Could you also add some benchmark figures to the PR summary?
programs/fileio.c
Outdated
POOL_ctx* writerPool; | ||
ZSTD_pthread_mutex_t writeJobsMutex; | ||
void* jobs[DECOMPRESSION_MAX_WRITE_JOBS]; | ||
volatile int availableWriteJobs; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
availableWriteJobs
does not need to be volatile.
But can you add a newline before writeJobsMutex
and after availableWriteJobs
, and a comment explaining that the mutex protects jobs
and availableWriteJobs
?
programs/fileio.c
Outdated
@@ -2147,6 +2244,82 @@ FIO_fwriteSparseEnd(const FIO_prefs_t* const prefs, FILE* file, unsigned storedS | |||
} } | |||
} | |||
|
|||
static void FIO_writePoolSetDstFile(write_pool_ctx_t *ctx, FILE* dstFile) { | |||
assert(ctx!=NULL); | |||
// We can change the dst file only if we have finished writing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: These need to be /* */
comments. //
comments are C99.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
programs/fileio.c
Outdated
return job; | ||
} | ||
|
||
static void FIO_writePoolCreateThreadPool(write_pool_ctx_t *ctx, const FIO_prefs_t *prefs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: naming: Can you name the functions FIO_WritePool_createThreadPool()
? Or just WritePool_createThreadPool()
since they're all static
. That makes it clear its the function createThreadPool()
for the object/concept/group WritePool
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
programs/fileio.c
Outdated
} | ||
} | ||
|
||
static write_pool_ctx_t* FIO_writePoolCreate(FIO_prefs_t* const prefs) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WritePool_create()
programs/fileio.c
Outdated
@@ -2147,6 +2244,82 @@ FIO_fwriteSparseEnd(const FIO_prefs_t* const prefs, FILE* file, unsigned storedS | |||
} } | |||
} | |||
|
|||
static void FIO_writePoolSetDstFile(write_pool_ctx_t *ctx, FILE* dstFile) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
WritePool_setDstFile()
, etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
programs/fileio.c
Outdated
FIO_WritePoolWriteJobExecute(job); | ||
} | ||
|
||
static void FIO_writePoolQueueWriteJobAndGetNextAvailable(write_job_t **job) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For all the functions you intend to be "public", e.g. not the helper functions only used by the other WritePool
functions, could you add documentation? You could also add docs for the helpers, but that is less necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
I agree that If we decide that |
If we decide that There is a catch though. Thoughts ? |
Could you add a test featuring multi-frames files ? Bonus point for files with multiple heterogeneous frames (also currently supported). |
I've actually tested this manually, but it makes sense to add an automated test. Will do it. |
Updating here that we've spoken about
|
tests/playTests.sh
Outdated
testAsyncIO lz4 | ||
addTwoFrames lz4 | ||
fi | ||
cat tmp_uncompressed | shasum > tmp2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use $MD5SUM
to designate the checksum program,
which generally translates into md5sum
, but not always (freeBSD uses gmd5sum
, macos uses md5 -r
, etc.).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, pushed a fix.
7d64b55
to
2ce5c4c
Compare
I like the new code path. The last issue is about the It may seem a unrelated secondary topic, but we'll nonetheless need a solution to this situation before the merge. |
c3817cc
to
bfc188d
Compare
|
- Added --[no-]asyncio flag for CLI decompression. - Replaced dstBuffer in decompression with a pool of write jobs. - Added an ability to execute write jobs in a separate thread. - Added an ability to wait (join) on all jobs in a thread pool (queue and running).
bfc188d
to
30152d4
Compare
Changes:
--[no-]asyncio
flag for CLI decompression.Benchmarked total runtime (real time) of decompression on 3 machines, these are the runtime improvements in %:
Notes: