Skip to content

Eliminate redundant CT_Sst.si parallel list in SharedStringsTable write path#1793

Merged
tonyqus merged 3 commits into
masterfrom
copilot/refactor-shared-strings-table-write-path
Apr 7, 2026
Merged

Eliminate redundant CT_Sst.si parallel list in SharedStringsTable write path#1793
tonyqus merged 3 commits into
masterfrom
copilot/refactor-shared-strings-table-write-path

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Apr 6, 2026

  • Understand codebase structure
  • Add internal static bool UseDirectWrite { get; set; } = true; to SharedStringsTable
  • In WriteTo, branch on UseDirectWrite: when false rebuild sst.si from strings and call _sstDoc.Save() (old path)
  • Update SSTWriteBenchmark.cs: split into two benchmark methods — one per path — for side-by-side comparison
  • Build succeeds — 0 errors
  • All 7 TestSharedStringsTable tests pass
Original prompt

Background

PR #1728 introduced lazy-loading and a streaming XmlReader-based parser for SharedStringsTable. However, the write path still has a redundant intermediate layer:

  • SharedStringsTable.strings (a List<CT_Rst>) holds all shared string entries
  • _sstDoc.GetSst().si (another List<CT_Rst>) holds the exact same object references in parallel
  • On write, WriteTo() goes through _sstDoc.Save()sst.siCT_Rst.Write(), when it could directly iterate strings

Changes Required

1. ooxml/XSSF/Model/SharedStringsTable.cs

a) WriteTo — bypass _sstDoc entirely:

Replace:

public void WriteTo(Stream out1)
{
    EnsureLoaded();
    CT_Sst sst = _sstDoc.GetSst();
    sst.count = count;
    sst.uniqueCount = uniqueCount;
    _sstDoc.Save(out1);
}

With a direct streaming write that iterates strings instead of sst.si:

public void WriteTo(Stream out1)
{
    EnsureLoaded();
    var sw = new StreamWriter(out1, Encoding.UTF8);
    sw.Write("<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"yes\" ?><sst xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\"");
    sw.WriteAttribute("count", count);
    sw.WriteAttribute("uniqueCount", uniqueCount);
    sw.Write('>');
    foreach (CT_Rst si in strings)
        si.Write(sw, "si");
    sw.Write("</sst>");
    sw.Flush();
}

b) ReadFromStreamViaXmlReader — remove the dual-write to sst.si:

In the EndElement"si" case (around line 366-372), remove the sst.si.Add(currentSi) line, keeping only strings.Add(currentSi):

case "si":
    if (currentSi != null)
    {
        // sst.si.Add(currentSi);  ← DELETE THIS LINE
        strings.Add(currentSi);
    }
    currentSi = null;
    break;

Since sst.si is no longer consumed by WriteTo, the local CT_Sst sst = _sstDoc.GetSst(); variable at the top of ReadFromStreamViaXmlReader is also no longer needed and can be removed. The _sstDoc initialization lines at the top of ReadFromStreamViaXmlReader can also be removed (the field _sstDoc is still needed for the UseXmlReader = false / DOM fallback path via ReadFromStreamViaXmlDocument, so do NOT remove the _sstDoc field itself).

c) AddEntry — remove the dual-write to sst.si:

Remove the _sstDoc.GetSst().si.Add(newSt) line:

uniqueCount++;
CT_Rst newSt = new CT_Rst();
// _sstDoc.GetSst().si.Add(newSt);  ← DELETE THIS LINE
newSt.Set(st);
int idx = strings.Count;
stmap[s] = idx;
strings.Add(newSt);
_dirty = true;
return idx;

2. benchmarks/NPOI.Benchmarks/LargeSSTBenchmark.cs

Add a new benchmark method that measures the write path for a dirty SST — create a workbook with many unique shared strings and write it to a stream. This lets us compare allocation and throughput before and after the change.

Add a new benchmark class or new methods to LargeSSTBenchmark:

/// <summary>
/// Creates a workbook with <see cref="WriteRowCount"/> rows of unique string cells
/// (forcing SST dirty) and writes it to a MemoryStream.
/// Measures the write-side cost of SharedStringsTable serialization.
/// </summary>
[Benchmark]
public void XSSFWorkbookWriteLargeSst()
{
    using var workbook = new XSSFWorkbook();
    var sheet = workbook.CreateSheet();
    for (int i = 0; i < WriteRowCount; i++)
        sheet.CreateRow(i).CreateCell(0).SetCellValue("UniqueString_" + i);
    using var ms = new MemoryStream();
    workbook.Write(ms, leaveOpen: true);
}

[Params(10_000, 100_000)]
public int WriteRowCount { get; set; }

Place this as a new benchmark class SSTWriteBenchmark in a new file benchmarks/NPOI.Benchmarks/SSTWriteBenchmark.cs with [ShortRunJob] and [MemoryDiagnoser] attributes.

Invariants to preserve

  • Do NOT remove UseXmlReader, ReadFromStreamViaXmlDocument, _sstDoc field, or any related code — these must remain intact for the DOM fallback path.
  • The EnsureLoaded() call in WriteTo must remain — it ensures lazy-loaded SST is populated before writing.
  • All existing tests must continue to pass.
  • The ReadFromStreamViaXmlDocument path still populates _sstDoc.GetSst().si (not strings) — this is a known pre-existing limitation of that path and should NOT be changed in this PR.

Goal

Eliminate the redundant parallel List<CT_Rst> population in the streaming read path and write path, reducing per-entry allocations and indirect call overhead during WriteTo.

The following is the prior conversation context from the user's chat exploration (may be truncated):

Assistant: 现在我对整个写入路径有了完整的认识。让我分析一下:

Assistant: 现在我已经掌握了完整的代码结构,可以给出详尽的分析和方案建议。


当前状态分析

PR #1728 已经做了以下优化:

  1. 读取侧:流式 XmlReader 解析替代 DOM,跳过了 XmlDocument 对象
  2. 脏标记(_dirty:未修改的 SST 直接透传原始字节,跳过整个序列化

写入侧_dirty = true 时)的路径仍然存在一个冗余的中间层:

WriteTo(Stream)
  → EnsureLoaded()          // 确保已加载 CT_Rst 列表
  → _sstDoc.GetSst()        // 拿到 CT_Sst(sst.si 与 strings 两个列表同步存储)
  → _sstDoc.Save(stream)    //...

</details>



<!-- START COPILOT CODING AGENT SUFFIX -->

*This pull request was created from Copilot chat.*
>

…te path

- WriteTo: bypass _sstDoc.Save(), iterate strings list directly for streaming write
- ReadFromStreamViaXmlReader: remove _sstDoc init and sst.si.Add(currentSi) dual-write
- AddEntry: remove _sstDoc.GetSst().si.Add(newSt) dual-write
- CT_Rst.Write: change internal to public so ooxml assembly can call it directly
- Add SSTWriteBenchmark for write-side performance comparison

Agent-Logs-Url: https://github.com/nissl-lab/npoi/sessions/89e6fe6d-1f73-414c-9962-5790e867a50e

Co-authored-by: tonyqus <772561+tonyqus@users.noreply.github.com>
Copilot AI changed the title [WIP] Refactor write path for SharedStringsTable to eliminate redundancy Eliminate redundant CT_Sst.si parallel list in SharedStringsTable write path Apr 6, 2026
Copilot AI requested a review from tonyqus April 6, 2026 23:08
- SharedStringsTable.UseDirectWrite = true (default): direct streaming write from strings list
- UseDirectWrite = false: legacy path — rebuild sst.si from strings, call _sstDoc.Save()
- SSTWriteBenchmark: split into XSSFWorkbookWriteLargeSstDirectWrite and XSSFWorkbookWriteLargeSstLegacy for side-by-side comparison

Agent-Logs-Url: https://github.com/nissl-lab/npoi/sessions/9a4c0e1a-4eda-4e84-8552-ea404c7db16c

Co-authored-by: tonyqus <772561+tonyqus@users.noreply.github.com>
@tonyqus tonyqus modified the milestone: NPOI 2.8.1 Apr 7, 2026
@tonyqus tonyqus marked this pull request as ready for review April 7, 2026 01:10
@tonyqus tonyqus merged commit 7c86779 into master Apr 7, 2026
1 of 2 checks passed
@tonyqus
Copy link
Copy Markdown
Member

tonyqus commented Apr 7, 2026

/oc run new sst write benchmark and show me the result

@opencode-agent
Copy link
Copy Markdown
Contributor

opencode-agent Bot commented Apr 7, 2026

Results from SSTWriteBenchmark (.NET 10, ShortRun, 3 iterations):

Method Rows Mean Allocated
DirectWrite 10K 77.29 ms 34.85 MB
Legacy 10K 67.28 ms 35.1 MB
DirectWrite 100K 870.98 ms 329.96 MB
Legacy 100K 883.75 ms 331.96 MB

At 100K rows, the direct write path is ~1.4% faster and allocates ~2 MB less (~0.6% reduction). The difference is modest because the benchmark measures full workbook.Write(), and cell/row creation dominates. The allocation delta comes purely from eliminating the redundant sst.si list repopulation — at 100K strings that's ~2 MB of CT_Rst references not allocated.

The 10K results are noisy (ShortRun, only 3 iterations) — the ~15% speed regression there is within the error margin.

New%20session%20-%202026-04-07T04%3A58%3A38.252Z
opencode session  |  github run

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants