Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fuzz large offsets through sequence compression api #3447

Merged

Conversation

daniellerozenblit
Copy link
Contributor

@daniellerozenblit daniellerozenblit commented Jan 23, 2023

This PR introduces a few changes to the sequence compression api fuzzer in order to better test cases with large offsets without generating many MB of input data.

  • Rather than generating a new dictionary of max size 256KB with each fuzzer call, we now generate a huge global dictionary of size 1 << ZSTD_WINDOWLOG_MAX_32 that is reused between calls. The dictionary is calloc'd in order to fit the memory constraints of the oss-fuzz environment.
  • We now use a maximum window size of ZSTD_WINDOWLOG_MAX, rather than ZSTD_WINDOWLOG_MAX_32.
  • We now generate the literalBuffer using a randomly generated seed, rather than using the same seed (0) each time.

This PR also fixes a bug exposed by #3439 and ensures that the seqStore bounds check is accurate forZSTD_copySequencesToSeqStoreNoBlockDelim().

  • We now increment the sequence idx after checking whether or not we have room to store an additional sequence.

Comment on lines 334 to 342
FILE* dictFile;
ZSTD_compressionParameters cParams;

/* Generate a large dictionary file and mmap to buffer */
generateDictFile(ZSTD_FUZZ_GENERATED_DICT_MAXSIZE, producer);
dictFile = fopen(ZSTD_FUZZ_DICT_FILE, "r");
dictBuffer = mmap(NULL, ZSTD_FUZZ_GENERATED_DICT_MAXSIZE, PROT_READ, MAP_PRIVATE, fileno(dictFile), 0);
FUZZ_ASSERT(dictBuffer);
fclose(dictFile);
Copy link
Contributor

@terrelln terrelln Jan 23, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@daniellerozenblit lets simplify this a little bit and just calloc() the dictBuffer.

Suggested change
FILE* dictFile;
ZSTD_compressionParameters cParams;
/* Generate a large dictionary file and mmap to buffer */
generateDictFile(ZSTD_FUZZ_GENERATED_DICT_MAXSIZE, producer);
dictFile = fopen(ZSTD_FUZZ_DICT_FILE, "r");
dictBuffer = mmap(NULL, ZSTD_FUZZ_GENERATED_DICT_MAXSIZE, PROT_READ, MAP_PRIVATE, fileno(dictFile), 0);
FUZZ_ASSERT(dictBuffer);
fclose(dictFile);
dictBuffer = calloc(ZSTD_FUZZ_GENERATED_DICT_MAXSIZE, 1);

This should drastically reduce the memory usage, and should make the initialization basically free, as long as we never write to this memory (which we don't).

Any generality we lose through this simplification should be worth it for the simplicity. And we can always extend it later if we need to.

@daniellerozenblit daniellerozenblit force-pushed the fuzz-sequence-compression branch from 260dc7e to 7fc00c1 Compare January 23, 2023 18:42
@daniellerozenblit daniellerozenblit force-pushed the fuzz-sequence-compression branch from eeb6bd5 to fe06ffa Compare January 23, 2023 21:44
@@ -6327,7 +6327,7 @@ ZSTD_copySequencesToSeqStoreNoBlockDelim(ZSTD_CCtx* cctx, ZSTD_sequencePosition*
/* Move to the next sequence */
endPosInSequence -= currSeq.litLength + currSeq.matchLength;
startPosInSequence = 0;
idx++;
idx++; /* Next Sequence */
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this check is right when the else branch is taken and we split the sequence (we set finalMatchSplit = 1; on line 6352. Since we don't increment idx but still store a sequence.

You could simplify this a bit by moving the idx++ to the bottom of the loop and doing:

if (!finalMatchSplit)
    ++idx;

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh thanks, great catch!

@daniellerozenblit daniellerozenblit force-pushed the fuzz-sequence-compression branch from fe06ffa to 7d600c6 Compare January 24, 2023 14:41
@daniellerozenblit daniellerozenblit marked this pull request as ready for review January 24, 2023 16:29
@daniellerozenblit daniellerozenblit marked this pull request as draft January 24, 2023 16:43
@daniellerozenblit daniellerozenblit marked this pull request as ready for review January 24, 2023 16:59
Copy link
Contributor

@terrelln terrelln left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome!

@daniellerozenblit daniellerozenblit merged commit f3255bf into facebook:dev Jan 25, 2023
@daniellerozenblit daniellerozenblit deleted the fuzz-sequence-compression branch March 8, 2023 15:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants