oleibman
Collaborator

Fix #4537. PhpSpreadsheet currently escapes apostrophes in text values as a character entity. This is perfectly valid XML. The issue was opened because R does not handle this correctly; this is unquestionably a bug on R's part, so I was not inclined to do anything about it. However ...

The user suggested a change to how `htmlspecialchars` was called. Investigating the use of that routine in PhpSpreadsheet, I found some double escaping for cells whose type was set to `TYPE_INLINE`: `htmlspecialchars` escaped the string correctly, but it was later written as XML using a method which escaped the data a second time. So, a real bug in PhpSpreadsheet after all.
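The double escaping described above can be reproduced outside PhpSpreadsheet with PHP's built-in `XMLWriter`. This is only a sketch: the sample string, the `<t>` element name, and the `ENT_QUOTES` flag are assumptions for illustration, not the library's actual code path.

```php
<?php
// Hypothetical cell text; not taken from the PR or the issue.
$cellText = 'Fish & Chips';

// First pass: the string is escaped by hand.
$escapedOnce = htmlspecialchars($cellText, ENT_QUOTES, 'UTF-8');

// Second pass: XMLWriter::text() escapes its argument again,
// so the already-escaped ampersand is escaped a second time.
$writer = new XMLWriter();
$writer->openMemory();
$writer->startElement('t');
$writer->text($escapedOnce);
$writer->endElement();
$doubleEscaped = $writer->outputMemory();

// Correct path: hand the raw string to text() and let it escape once.
$writer = new XMLWriter();
$writer->openMemory();
$writer->startElement('t');
$writer->text($cellText);
$writer->endElement();
$singleEscaped = $writer->outputMemory();
```

A reader parsing `$doubleEscaped` would get back the literal text `Fish &amp; Chips` instead of `Fish & Chips`, which is the kind of corruption the fix removes.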

There was one call to `htmlspecialchars` in `Shared\XmlWriter`, where I replaced `writeRaw(htmlspecialchars(...))` with `text(...)`. There was also one call in `Writer\Xlsx\Worksheet`, the source of the double escaping bug above; the call to `htmlspecialchars` can simply be eliminated there.
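To illustrate the difference between the two styles of call, here is a rough comparison using PHP's built-in `XMLWriter` directly. The helper `renderElement`, the `<t>` element, the sample string, and the `ENT_QUOTES` flag are all hypothetical; the real PhpSpreadsheet code may pass different flags.

```php
<?php
// Hypothetical helper: writes $value inside a <t> element using either
// text() or the older writeRaw(htmlspecialchars(...)) pattern.
function renderElement(string $value, bool $useText): string
{
    $writer = new XMLWriter();
    $writer->openMemory();
    $writer->startElement('t');
    if ($useText) {
        // text() escapes the characters XML requires in element content.
        $writer->text($value);
    } else {
        // writeRaw() writes its argument verbatim, so escaping is manual.
        // ENT_QUOTES is an assumption made for this sketch.
        $writer->writeRaw(htmlspecialchars($value, ENT_QUOTES, 'UTF-8'));
    }
    $writer->endElement();

    return $writer->outputMemory();
}

$sample = "It's A & B"; // hypothetical cell text
$viaText = renderElement($sample, true);
$viaRaw = renderElement($sample, false);
```

Both results are well-formed XML with the ampersand escaped. The difference is that `$viaText` keeps the apostrophe as a literal character, while `$viaRaw` replaces it with an entity.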

With those changes, the only remaining calls to `htmlspecialchars` are in `Writer\Html`, where they belong. As a bonus, apostrophes now wind up unescaped, so R will be satisfied (even though they should fix their bug).

This is:

  • a bugfix
  • a new feature
  • refactoring
  • additional unit tests

@oleibman oleibman enabled auto-merge July 22, 2025 03:52
@oleibman oleibman added this pull request to the merge queue Jul 22, 2025
Merged via the queue into PHPOffice:master with commit 994b072 Jul 22, 2025
13 of 14 checks passed
@oleibman oleibman deleted the issue4537 branch July 22, 2025 04:18
Development

Successfully merging this pull request may close these issues.

Saving XLSX: apros is changed to '