Skip to content

LibGfx/JBIG2Writer+Tests/LibGfx: Refinement of symbol strip refines#26510

Merged
nico merged 1 commit intoSerenityOS:masterfrom
nico:jbig2-refine-strip
Dec 29, 2025
Merged

LibGfx/JBIG2Writer+Tests/LibGfx: Refinement of symbol strip refines#26510
nico merged 1 commit intoSerenityOS:masterfrom
nico:jbig2-refine-strip

Conversation

@nico
Copy link
Contributor

@nico nico commented Dec 29, 2025

This adds writer support for refinement of a symbol dictionary entry that is itself a strip refinement.

This case is a bit awkward for the writer: To write the refinement, we need the bitmap that's being refined. But since that's a strip refine, we don't have that bitmap, we only have the parts it consists of.

As #26411 explains, the approach of calling into the loader to do the strip decoding doesn't work here: We need the decoded strip refinement while writing the strip itself. So we'd have to send a partially encoded strip to the loader, which isn't something we can currently do.

But since strip refinements only use a limited subset of full text strips (always a background color of 0, always a reference corner of topleft, always a composition operator of "or"), reimplementing the loader's logic for this case, while conceptually not nice, is not a lot of code. So let's just do that.

Using this, also add a test for this feature. I believe this tests the last missing refinement case :^)

The test uses regular symbols for the eyes (symbols 0 and 1) and the mouth (symbol 6).

The nose, 100x100 pixels, is conceptually split into 60x60, 40x60, 60x40, 40x40 tiles. Since symbols are sorted by increasing height, the latter two are symbols 2 and 3, the former are symbols 4 and 5.

Symbol 3 isn't actually the bottom right part of the nose, but contains the pixel data of an eye. This is for testing refinement below.

To test strip background color filling, the 60x60 tile is actually 59x60 pixels. And to test refinement, the top left tile is actually solid white instead of the top left part of the nose.

The first refinement symbol (id 7) refines the top left tile to a 59x60 tile that's the top left corner of the nose, but shifted several pixels to the right.

The second refinement symbol (id 8) is a strip refinement. It puts symbols 7 and 5 next to each other, to produce the top half of the nose (with the left part being shifted to the side).

The third refinement symbol (id 9) is a simple refinement of symbol 8 refines it to the actual top of the nose (to fix the shift, and to test refinement of strips in a symbol dictionary).

The fourth refinement symbol (id 10) is a strip refinement that puts symbol 9 in the first strip, with an offset of 1, and symbols 2 and 3 in a second strip. This means symbol 10 is almost the nose, but its bottom right contains the pixel data of an eye, since that's what symbol 3 contained.

Finally, the text region refines symbol 10 to the actual nose pixels, to test refinement of strips from a text region.

Whew!

Since the text region refines to the nose, it's useful to locally remove that refinement to test that the reference bitmap looks as expected. Similarly, it's useful to locally make the final strip refinement put symbol 8 instead of symbol 9 in the first strip, to make sure the input to symbol 9 looks as expected.

The test file decodes fine in all viewers I've tried (PDFium, Preview.app, pdf.js, mutool). (This case is much more awkward for the writer than for the loader, after all.)

@github-actions github-actions bot added the 👀 pr-needs-review PR needs review from a maintainer or community member label Dec 29, 2025
@BuggieBot
Copy link
Member

Hello!

One or more of the commit messages in this PR do not match the SerenityOS code submission policy, please check the lint_commits CI job for more details on which commits were flagged and why.
Please do not close this PR and open another, instead modify your commit message(s) with git commit --amend and force push those changes to update this PR.

This adds writer support for refinement of a symbol dictionary entry
that is itself a strip refinement.

This case is a bit awkward for the writer: To write the refinement, we
need the bitmap that's being refined. But since that's a strip refine,
we don't have that bitmap, we only have the parts it consists of.

As SerenityOS#26411 explains, the approach of calling into the loader to do
the strip decoding doesn't work here: We need the decoded strip
refinement while writing the strip itself. So we'd have to send
a partially encoded strip to the loader, which isn't something we
can currently do.

But since strip refinements only use a limited subset of full text
strips (always a background color of 0, always a reference corner
of topleft, always a composition operator of "or"), reimplementing
the loader's logic for this case, while conceptually not nice, is
not a lot of code. So let's just do that.

Using this, also add a test for this feature. I believe this tests
the last missing refinement case :^)

The test uses regular symbols for the eyes (symbols 0 and 1) and the
mouth (symbol 6).

The nose, 100x100 pixels, is conceptually split into 60x60, 40x60,
60x40, 40x40 tiles. Since symbols are sorted by increasing height, the
latter two are symbols 2 and 3, the former are symbols 4 and 5.

Symbol 3 isn't actually the bottom right part of the nose, but
contains the pixel data of an eye. This is for testing refinement
below.

To test strip background color filling, the 60x60 tile is actually
59x60 pixels. And to test refinement, the top left tile is actually
solid white instead of the top left part of the nose.

The first refinement symbol (id 7) refines the top left tile to a 59x60
tile that's the top left corner of the nose, but shifted several pixels
to the right.

The second refinement symbol (id 8) is a strip refinement. It puts
symbols 7 and 5 next to each other, to produce the top half of the nose
(with the left part being shifted to the side).

The third refinement symbol (id 9) is a simple refinement of symbol 8
refines it to the actual top of the nose (to fix the shift, and to
test refinement of strips in a symbol dictionary).

The fourth refinement symbol (id 10) is a strip refinement that puts
symbol 9 in the first strip, with an offset of 1, and symbols 2 and 3
in a second strip. This means symbol 10 is almost the nose, but its
bottom right contains the pixel data of an eye, since that's what
symbol 3 contained.

Finally, the text region refines symbol 10 to the actual nose pixels,
to test refinement of strips from a text region.

Whew!

Since the text region refines to the nose, it's useful to locally remove
that refinement to test that the reference bitmap looks as expected.
Similarly, it's useful to locally make the final strip refinement put
symbol 8 instead of symbol 9 in the first strip, to make sure the input
to symbol 9 looks as expected.

The test file decodes fine in all viewers I've tried (PDFium,
Preview.app, pdf.js, mutool). (This case is much more awkward for the
writer than for the loader, after all.)
@nico nico force-pushed the jbig2-refine-strip branch from 7cc7bad to 1b82acc Compare December 29, 2025 00:23
@nico nico merged commit d15f574 into SerenityOS:master Dec 29, 2025
20 of 21 checks passed
@nico nico deleted the jbig2-refine-strip branch December 29, 2025 12:28
@github-actions github-actions bot removed the 👀 pr-needs-review PR needs review from a maintainer or community member label Dec 29, 2025
nico added a commit to nico/serenity that referenced this pull request Jan 27, 2026
When we don't export some symbols, symbol indices are different in
different symbol dictionary sections.

For example, in the included test, segment 1 exports 7 symbols, and
segment 2 refers to it, but only exports 3 of these 7 symbols, and it
only exports 1 of the 4 symbols itself defines. So within segment 2,
indices 0-10 are valid.  But in segment 3, which refers to segment 2,
only indices 0-3 are valid, and they refer to the symbols that have
indices 0, 1, 6, and 10 in segment 2.

This is mostly a problem for the writer (see SerenityOS#26510 / d15f574).

When the text region refines its symbol 3, the code used to look up the
symbol ids that it referred to. But those indices were in the namespace
of segment 2, but we used the symbol list from segment 3.

As a fix, don't store symbol dictionary results as a
SymbolDictionarySegmentData::HeightClass::Symbol but just as a
BilevelImage.  For regular symbols and regular refined symbols, it
doesn't make a difference.  Strip-refined symbols are converted to
bitmaps eagerly instead of only when they are refined, but in return
this needs less code, it handles indices correctly, and chains of
strip-refined symbols are no longer O(n^2) (...not that these happen in
practice).

The new test is the same as bitmap-symbol-symbolrefine-textrefine.json
(or bitmap-symbol-symhuffrefine-textrefine.json), but the last
symbol_region sets `export_flags_for_referred_to_symbols` to not export
temporary symbols from its referred-to symbol dictionaries, and it sets
`"exported"` to false for its internal symbols (and symbol_ids are
adjusted to match).
nico added a commit to nico/serenity that referenced this pull request Jan 27, 2026
When we don't export some symbols, symbol indices are different in
different symbol dictionary sections.

For example, in the included test, segment 1 exports 7 symbols, and
segment 2 refers to it, but only exports 3 of these 7 symbols, and it
only exports 1 of the 4 symbols itself defines. So within segment 2,
indices 0-10 are valid.  But in segment 3, which refers to segment 2,
only indices 0-3 are valid, and they refer to the symbols that have
indices 0, 1, 6, and 10 in segment 2.

This is mostly a problem for the writer (see SerenityOS#26510 / d15f574).

When the text region refines its symbol 3, the code used to look up the
symbol ids that it referred to. But those indices were in the namespace
of segment 2, but we used the symbol list from segment 3.

As a fix, don't store symbol dictionary results as a
SymbolDictionarySegmentData::HeightClass::Symbol but just as a
BilevelImage.  For regular symbols and regular refined symbols, it
doesn't make a difference.  Strip-refined symbols are converted to
bitmaps eagerly instead of only when they are refined, but in return
this needs less code, it handles indices correctly, and chains of
strip-refined symbols are no longer O(n^2) (...not that these happen in
practice).

The new test is the same as bitmap-symbol-symbolrefine-textrefine.json
(or bitmap-symbol-symhuffrefine-textrefine.json), but the last
symbol_region sets `export_flags_for_referred_to_symbols` to not export
temporary symbols from its referred-to symbol dictionaries, and it sets
`"exported"` to false for its internal symbols (and symbol_ids are
adjusted to match).
nico added a commit to nico/serenity that referenced this pull request Jan 27, 2026
When we don't export some symbols, symbol indices are different in
different symbol dictionary sections.

For example, in the included test, segment 1 exports 7 symbols, and
segment 2 refers to it, but only exports 3 of these 7 symbols, and it
only exports 1 of the 4 symbols itself defines. So within segment 2,
indices 0-10 are valid.  But in segment 3, which refers to segment 2,
only indices 0-3 are valid, and they refer to the symbols that have
indices 0, 1, 6, and 10 in segment 2.

This is mostly a problem for the writer (see SerenityOS#26510 / d15f574).

When the text region refines its symbol 3, the code used to look up the
symbol ids that it referred to. But those indices were in the namespace
of segment 2, but we used the symbol list from segment 3.

As a fix, don't store symbol dictionary results as a
SymbolDictionarySegmentData::HeightClass::Symbol but just as a
BilevelImage.  For regular symbols and regular refined symbols, it
doesn't make a difference.  Strip-refined symbols are converted to
bitmaps eagerly instead of only when they are refined, but in return
this needs less code, it handles indices correctly, and chains of
strip-refined symbols are no longer O(n^2) (...not that these happen in
practice).

The new test is the same as bitmap-symbol-symbolrefine-textrefine.json
(or bitmap-symbol-symhuffrefine-textrefine.json), but the last
symbol_region sets `export_flags_for_referred_to_symbols` to not export
temporary symbols from its referred-to symbol dictionaries, and it sets
`"exported"` to false for its internal symbols (and symbol_ids are
adjusted to match).
nico added a commit to nico/serenity that referenced this pull request Jan 27, 2026
When we don't export some symbols, symbol indices are different in
different symbol dictionary sections.

For example, in the included test, segment 1 exports 7 symbols, and
segment 2 refers to it, but only exports 3 of these 7 symbols, and it
only exports 1 of the 4 symbols itself defines. So within segment 2,
indices 0-10 are valid.  But in segment 3, which refers to segment 2,
only indices 0-3 are valid, and they refer to the symbols that have
indices 0, 1, 6, and 10 in segment 2.

This is mostly a problem for the writer (see SerenityOS#26510 / d15f574).

When the text region refines its symbol 3, the code used to look up the
symbol ids that it referred to. But those indices were in the namespace
of segment 2, but we used the symbol list from segment 3.

As a fix, don't store symbol dictionary results as a
SymbolDictionarySegmentData::HeightClass::Symbol but just as a
BilevelImage.  For regular symbols and regular refined symbols, it
doesn't make a difference.  Strip-refined symbols are converted to
bitmaps eagerly instead of only when they are refined, but in return
this needs less code, it handles indices correctly, and chains of
strip-refined symbols are no longer O(n^2) (...not that these happen in
practice).

The new test is the same as bitmap-symbol-symbolrefine-textrefine.json
(or bitmap-symbol-symhuffrefine-textrefine.json), but the last
symbol_region sets `export_flags_for_referred_to_symbols` to not export
temporary symbols from its referred-to symbol dictionaries, and it sets
`"exported"` to false for its internal symbols (and symbol_ids are
adjusted to match).
nico added a commit that referenced this pull request Jan 27, 2026
When we don't export some symbols, symbol indices are different in
different symbol dictionary sections.

For example, in the included test, segment 1 exports 7 symbols, and
segment 2 refers to it, but only exports 3 of these 7 symbols, and it
only exports 1 of the 4 symbols itself defines. So within segment 2,
indices 0-10 are valid.  But in segment 3, which refers to segment 2,
only indices 0-3 are valid, and they refer to the symbols that have
indices 0, 1, 6, and 10 in segment 2.

This is mostly a problem for the writer (see #26510 / d15f574).

When the text region refines its symbol 3, the code used to look up the
symbol ids that it referred to. But those indices were in the namespace
of segment 2, but we used the symbol list from segment 3.

As a fix, don't store symbol dictionary results as a
SymbolDictionarySegmentData::HeightClass::Symbol but just as a
BilevelImage.  For regular symbols and regular refined symbols, it
doesn't make a difference.  Strip-refined symbols are converted to
bitmaps eagerly instead of only when they are refined, but in return
this needs less code, it handles indices correctly, and chains of
strip-refined symbols are no longer O(n^2) (...not that these happen in
practice).

The new test is the same as bitmap-symbol-symbolrefine-textrefine.json
(or bitmap-symbol-symhuffrefine-textrefine.json), but the last
symbol_region sets `export_flags_for_referred_to_symbols` to not export
temporary symbols from its referred-to symbol dictionaries, and it sets
`"exported"` to false for its internal symbols (and symbol_ids are
adjusted to match).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants