Xref table expose #1292

Merged
BobLd merged 10 commits into UglyToad:master from vafle228:xref-table-expose
May 18, 2026
@vafle228 vafle228 commented May 12, 2026

Contributor

Description

This PR addresses a long-standing limitation for developers who need to implement incremental PDF updates (e.g., for digital signatures, form filling, or appending objects without a full re-save). Currently, PdfDocument.Structure exposes only limited low-level access: the CrossReferenceTable property it used to have is missing, and the core token writer cannot write a custom xref table whose trailer contains /Prev.

The changes in this PR restore and extend the low-level API to fully support reading and writing both normal xref tables and xref streams, as well as working with trailer dictionaries. This makes it possible to:

  • Access the raw CrossReferenceTablePart entries (object number, offset, generation, type).
  • Read/write the trailer dictionary (including /Prev, /Size, /Root, /Info).
  • Append new objects while keeping the original xref section intact.
  • Implement full incremental PDF update workflows.

Changes Made

  1. CrossReferenceTable is now fully exposed via PdfDocument.Structure.CrossReferenceTable
    (previously it was removed). It provides:
    • ObjectOffsets – mapping from IndirectReference to XrefLocation.
    • Parts – the individual xref table parts, each a single xref table instance.
  2. TrailerDictionary now exposes /Prev, /Size, /Root, /Info (and all raw dictionary entries).
  3. TokenWriter has been extended with a new method:
    public void WriteCrossReferenceTable(
        IReadOnlyDictionary<IndirectReference, long> objectOffsets,
        IndirectReference catalogToken,
        Stream outputStream,
        IndirectReference? documentInformationReference,
        long? prevTableLocation);   // 👈 new parameter for incremental save
  4. Added an AdvancedMerge.cs example showing how to use the low-level tools.
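
For readers unfamiliar with what a /Prev trailer looks like on disk, here is a minimal, library-independent sketch of the classic xref section an incremental update appends. It does not use PdfPig and is not how the PR's TokenWriter is implemented; `BuildIncrementalXref` and all of its parameters are hypothetical names chosen for illustration, and generation 0 is assumed for every object.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper, NOT part of PdfPig: formats one classic xref section for an
// incremental update, followed by a trailer whose /Prev points at the older section.
static string BuildIncrementalXref(
    IReadOnlyDictionary<int, long> newObjectOffsets, // object number -> byte offset (generation 0 assumed)
    int size,              // /Size: highest object number in the whole file plus one
    int rootObjectNumber,  // object number of the catalog
    long prevXrefOffset,   // byte offset of the previous xref section
    long thisXrefOffset)   // byte offset at which this section will be written
{
    var sb = new StringBuilder();
    sb.Append("xref\n");
    foreach (var entry in newObjectOffsets.OrderBy(e => e.Key))
    {
        // One single-entry subsection per object keeps the sketch simple.
        sb.Append($"{entry.Key} 1\n");
        // Each entry is exactly 20 bytes: 10-digit offset, space, 5-digit generation,
        // space, 'n' (in use), then a two-byte end-of-line (space + LF).
        sb.Append($"{entry.Value:D10} 00000 n \n");
    }

    sb.Append("trailer\n");
    sb.Append($"<< /Size {size} /Root {rootObjectNumber} 0 R /Prev {prevXrefOffset} >>\n");
    sb.Append($"startxref\n{thisXrefOffset}\n%%EOF\n");
    return sb.ToString();
}

// Example: object 7 was appended at byte 1234; the original xref section sits at byte 416.
var section = BuildIncrementalXref(new Dictionary<int, long> { [7] = 1234 }, 8, 1, 416, 1300);
Console.Write(section);
```

A real incremental update would also carry over the other entries of the previous trailer (such as /Info and /ID); they are omitted here to keep the /Prev chaining visible.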

@BobLd BobLd left a comment

Collaborator

you can also un-comment the asserts in PdfMergerTests

Comment thread src/UglyToad.PdfPig.Tokens/DictionaryToken.cs Outdated
public IReadOnlyDictionary<IndirectReference, long> ObjectOffsets { get; }
public IReadOnlyDictionary<IndirectReference, XrefLocation> ObjectOffsets { get; }

public long Offset { get; private set; }

Collaborator

no need for private set;, read only

Comment thread src/UglyToad.PdfPig/CrossReference/CrossReferenceTablePart.cs Outdated
public class CrossReferenceTable
{
private readonly Dictionary<IndirectReference, long> objectOffsets;
private readonly List<CrossReferenceTablePart> parts;

Collaborator

Change to CrossReferenceTablePart[], no need for a list from what I see

foreach (var objectOffset in objectOffsets)
this.objectOffsets = objectOffsets.ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
var result = new List<CrossReferenceOffset>();

Collaborator

Use a CrossReferenceOffset[] instead

for (var i = 1; i < parts.Count; i++)
{
result[objectOffset.Key] = objectOffset.Value;
result.Add(new CrossReferenceOffset(parts[i].Offset, parts[i - 1].Previous));

Collaborator

can you double check that it should not be the below instead:

foreach (var part in parts)
{
      result.Add(new CrossReferenceOffset(part.Offset, part.Previous));
}

Contributor Author

Thanks for that
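
To make the difference between the two loops in this thread concrete, here is a small sketch. It uses plain `(Offset, Previous)` tuples as hypothetical stand-ins for CrossReferenceTablePart and CrossReferenceOffset, with made-up offsets; it only illustrates what each loop shape produces and does not reproduce the PR's actual types.

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for CrossReferenceTablePart: Previous models the /Prev value
// read from that part's trailer (null for the oldest part in the chain).
var parts = new (long Offset, long? Previous)[]
{
    (Offset: 900, Previous: 400),
    (Offset: 400, Previous: null),
};

// Suggested form: every part contributes its own offset and its own /Prev.
var perPart = parts.Select(p => (p.Offset, p.Previous)).ToArray();

// Original form: starts at index 1 and pairs part i with the /Prev of part i - 1,
// so the first part is skipped and the pairs are shifted by one.
var shifted = Enumerable.Range(1, parts.Length - 1)
    .Select(i => (parts[i].Offset, parts[i - 1].Previous))
    .ToArray();

Console.WriteLine($"per-part: {perPart.Length} entries, shifted: {shifted.Length} entries");
```

With this sample data the per-part loop yields two entries, while the index-based loop yields a single entry that pairs offset 400 with a /Prev of 400.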

BobLd commented May 13, 2026

Collaborator

I have reviewed your PR; let me know if you have any questions. Great job!

BobLd commented May 13, 2026

Collaborator

Also, review AdvancedMerge and make sure it's referenced in Program.cs

@vafle228 vafle228 requested a review from BobLd May 14, 2026 08:02
@vafle228

Contributor Author

@BobLd Thank you for the review! I’ve applied all the suggested changes:

  • Replaced Empty() method with static readonly Empty field.
  • Removed unnecessary private set; from read‑only properties.
  • Changed List<CrossReferenceTablePart> to array.
  • Simplified CrossReferenceOffset creation logic using foreach.
  • Verified AdvancedMerge is now referenced in Program.cs for the samples.

Could you please take another look when you have time? Thanks again!

@BobLd BobLd left a comment

Collaborator

Can you also review the AdvancedMerge example? It produces a blank page. Can you also add this example as an integration test, if possible?

Comment thread examples/Program.cs Outdated
using var output = new FileStream(Path.Combine(filesDirectory, "AdvancedMergeResult.pdf"), FileMode.Create);
using var input = File.Open(Path.Combine(filesDirectory, "Single Page Simple - from google drive.pdf"), FileMode.Open);

input2.CopyToAsync(output);

Collaborator

input2.CopyToAsync(output); is not awaited, input2.CopyTo(output); should be enough
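
As a small illustration of the point above: an un-awaited `CopyToAsync` returns a `Task` that may still be running when the following code writes to the output or when the streams are disposed. The sketch below uses in-memory streams as stand-ins for the example's files (an assumption made so the snippet is self-contained) and shows the synchronous form.

```csharp
using System;
using System.IO;
using System.Text;

// In-memory stand-ins for the input and output files used by the example.
using var input = new MemoryStream(Encoding.ASCII.GetBytes("%PDF-1.7\n"));
using var output = new MemoryStream();

// Synchronous copy: by the time this call returns, every byte has been written,
// so later writes to `output` cannot interleave with an in-flight copy.
input.CopyTo(output);

Console.WriteLine(output.Length);
```

Inside an `async` method, `await input.CopyToAsync(output);` would give the same ordering guarantee.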

/// <param name="catalogToken">The object representing the catalog dictionary which is referenced from the trailer dictionary.</param>
/// <param name="outputStream">The output stream to write to.</param>
/// <param name="documentInformationReference">The object reference for the document information dictionary if present.</param>
/// /// <param name="prevTableLocation">The offset to the previous xref table if present</param>

Collaborator

duplicate ///

Comment thread examples/Program.cs Outdated
() =>
{
using var input2 = File.Open(Path.Combine(filesDirectory, "EmptyPdf.pdf"), FileMode.Open);
using var output = new FileStream(Path.Combine(filesDirectory, "AdvancedMergeResult.pdf"), FileMode.Create);

Collaborator

let's not write the output document to the integration tests folder. new FileStream("AdvancedMergeResult.pdf") should be enough

BobLd commented May 16, 2026

Collaborator

@vafle228 I've added some more comments; let me know if you have any questions. Can you also add an integration test based on AdvancedMerge? Thanks again

vafle228 added 2 commits May 18, 2026 10:41
- Also added nice property for calculating Prev
@vafle228 vafle228 requested a review from BobLd May 18, 2026 07:40
@vafle228

Contributor Author

@BobLd Thank you for the new comments! My bad with the AdvancedMerge.cs example: I forgot to link resources, so readers couldn't render the content properly. I fixed this in the last two commits. I also added the integration test, as you asked. Let me know if there are any issues left.

BobLd commented May 18, 2026

Collaborator

note to self:

Trailer-merging semantics changed silently. The deleted builder special-cased /Size so a linearized PDF's first trailer wins. The new FirstPassParser walks all parts and prefers the dictionary that has /Root. This is reasonable but is a behavior change, worth a targeted test on a linearized file to confirm /Size hasn't regressed.
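
A rough sketch of the kind of check this note has in mind, under loudly stated assumptions: trailers are modelled as plain name-to-value dictionaries listed in file order, the sample values are invented, and the selection rule is only a guess at "prefers the dictionary that has /Root". It is not the actual FirstPassParser logic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical model of a linearized file's two trailers, in file order.
// Assumption for this sample: the first trailer carries /Root and the total /Size.
var trailers = new List<Dictionary<string, object>>
{
    new Dictionary<string, object> { ["Size"] = 20, ["Root"] = "1 0 R", ["Prev"] = 5000L },
    new Dictionary<string, object> { ["Size"] = 12 },
};

// "Prefer the dictionary that has /Root", falling back to the first trailer.
var chosen = trailers.FirstOrDefault(t => t.ContainsKey("Root")) ?? trailers[0];

// The regression check: /Size should still come from the first trailer.
Console.WriteLine(chosen["Size"]);
```

For this sample both rules agree on /Size; a targeted test on a real linearized file would show whether that holds when /Root appears in a different part than the first.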
