Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HTML search: make the search index immutable at load-time #13098

Open
jayaddison opened this issue Nov 3, 2024 · 6 comments
Open

HTML search: make the search index immutable at load-time #13098

jayaddison opened this issue Nov 3, 2024 · 6 comments
Labels
html search type:proposal a feature suggestion

Comments

@jayaddison
Copy link
Contributor

jayaddison commented Nov 3, 2024

Is your feature request related to a problem? Please describe.
When Sphinx projects are built as HTML, they generally include a search index file loaded by the included JavaScript search functionality.

Search is by-nature a feature that accepts user-controlled input. I have been considering this recently during #13096, and also as part of a related (not-yet-published) security report I've filed that fortunately I've subsequently concluded contains no actual vulnerability.

I think it could be worthwhile to make the contents of the HTML search index immutable and to remove references to the JavaScript Object prototype from them, with the objective of reducing the possibility that user-controlled input could affect the index and therefore the integrity/quality of results.

Describe the solution you'd like

  • When loading the contents of the searchindex.js file:
    • Remove prototypes from objects (with the exception of Array objects (?)).
    • Freeze the entries in the index so that they become immutable/unmodifiable.

Describe alternatives you've considered

  • Adjusting the format of the searchindex.js file from JavaScript script to a JSON file.
    • JSON.parse emits objects that include JavaScript object prototype references and that are mutable (non-frozen), so in isolation this appears not to be a complete solution
    • The JSON.rawJSON function was also considered; this resolves the concerns with JSON.parse. However: it appears not to be widely-deployed yet (it is not available in Firefox or Safari, for example. It became available in Chrome 114 and Edge 114, both released ~Jun 2023).
  • Prototype-removal / freezing could be relocated to occur within the searchindex.js file itself.
    • This could be implemented as an alternative, or in conjunction with the feature proposed here.

Additional context
I plan to offer a pull request to implement this feature alongside this feature request.

Also, I'm aware that other JavaScript on the page could potentially bypass a mechanism like this; and contrastingly, perhaps there are valid use-cases for modifying the search index at runtime. Despite those, I feel that it could be worth removing the prototypes and making the index read-only, but this could be debatable.

Edit: add a question-mark asking whether allowing Array prototype references to be retained is OK or not.

@jayaddison
Copy link
Contributor Author

...
Describe alternatives you've considered
...
* The JSON.rawJSON function was also considered; this resolves the concerns with JSON.parse. However: it appears not to be widely-deployed yet (it is not available in Firefox or Safari, for example. It became available in Chrome 114 and Edge 114, both released ~Jun 2023).

Self-correction: JSON.rawJSON appears to be limited to primitive types; it cannot construct arrays or objects (in fact, this is highlighted by some bold text in the introduction on that page, too. I missed both of those previously).

@jayaddison
Copy link
Contributor Author

...
Describe alternatives you've considered
...

  • The JSON.rawJSON function was also considered; this resolves the concerns with JSON.parse. However: it appears not to be widely-deployed yet (it is not available in Firefox or Safari, for example. It became available in Chrome 114 and Edge 114, both released ~Jun 2023).

Self-correction: JSON.rawJSON appears to be limited to primitive types; it cannot construct arrays or objects (in fact, this is highlighted by some bold text in the introduction on that page, too. I missed both of those previously).

There is a tc39 (ECMAscript technical committee) proposal for a JSON.parseImmutable(...) function that appears relevant, though: https://github.com/tc39/proposal-json-parseimmutable

@jayaddison
Copy link
Contributor Author

Remove prototypes from objects (with the exception of Array objects).

I'm having second thoughts about this exception case. Perhaps we should in fact remove the prototype from Array objects too (this would require some other small changes). I'll wait to gather feedback about that.

@jayaddison
Copy link
Contributor Author

There is a tc39 (ECMAscript technical committee) proposal for a JSON.parseImmutable(...) function that appears relevant, though: https://github.com/tc39/proposal-json-parseimmutable

A challenge with this approach would be that a record-and-tuple based representation would no longer be a subset of JSON, so we'd require a parser to adapt our loading of a searchindex from file. I've raised a somewhat-related question/proposal at tc39/proposal-record-tuple#391 (tl;dr - would a single # character be enough to annote a many-object datastructure as immutable? if so, we could simply adjust our PREFIX value to Search.setIndex(#).

@wlach
Copy link
Contributor

wlach commented Nov 14, 2024

FWIW I'm skeptical about this. As you noted, any other JavaScript loaded on the page could still mess with the index (including overwriting it altogether with whatever), no matter how we "freeze" it. I don't really see the use case here?

@jayaddison
Copy link
Contributor Author

Reducing the likelihood with which the search index content could be modified or provide access to other parts of the application, either accidentally or maliciously, is essentially what I'm requesting.

User input here necessarily follows code paths that access the datastructure -- and I believe that's safe as-currently-implemented -- but I'd have greater confidence in that if I knew (again, to a reasonable degree; I understand it's not guaranteed) that the search index contents were immutable and absent of prototype references.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
html search type:proposal a feature suggestion
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants