Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Highlighting appears to be lost for code-blocks from markdown sources with Sphinx v8.2.0+/e65bbb96a and myst_parser #13207

Closed
jfbu opened this issue Jan 4, 2025 · 11 comments

Comments

@jfbu
Copy link
Contributor

jfbu commented Jan 4, 2025

Describe the bug

After updating Sphinx on Jan 4, 2025, it seems highlighting for code-blocks originating in markdown sources is not applied (myst_parser is used). I first observed this in a more complicated way in a big project with extra hacks to customize rendering, but could reduce it to a fairly minimal one using only myst_parser and highlight_language (the latter because due to original code-blocks use latex but the problem is not related to the latex builder).

How to Reproduce

Unzip the small attached project (see #13207 (comment)) and execute make html O="-E" with current release Sphinx 8.1.0 or a checkout of current dev at at least Sphinx v8.2.0+/e65bbb96a. In the latter case, highlighting is lost for the code in markdown file. With parent commit Sphinx v8.2.0+/c7eaf175e, the highlighting is there.

Environment Information

Sphinx version: 8.2.0+/e65bbb96a
Python version: 3.12.3 (CPython)
Docutils version: 0.21.2 
Pygments version: 2.18.0
Jinja2 version: 3.1.4
MySTparser version: 3.0.1

UPDATE: the problem also shows with myst-parser 4.0.0.

Sphinx extensions

extensions = [
    'myst_parser',
]

Additional context

Relates #13151

@jfbu jfbu added the type:bug label Jan 4, 2025
@jfbu
Copy link
Contributor Author

jfbu commented Jan 4, 2025

Sorry I forgot to attach zip file here it is
foo.zip

@jfbu
Copy link
Contributor Author

jfbu commented Jan 4, 2025

I am adding a simpler mwe not using highlight_language, to show problem arises also with default Python as highlight.
13207_b.zip

@jfbu jfbu changed the title Highlighting appears to be lost for code-blocks from markdown sources with Sphinx v8.2.0+/e65bbb96a and myst_parser 3.0.1 Highlighting appears to be lost for code-blocks from markdown sources with Sphinx v8.2.0+/e65bbb96a and myst_parser Jan 4, 2025
@electric-coder
Copy link

electric-coder commented Jan 4, 2025

I looked at the MRE but I didn't build it because I'm not a myst user. Having read the conf.py and the testhl.md I notice that no language code is specified in the source fence (the myst documentation always includes the language code at the opening of the source fence)... So how would Sphinx or myst know for what specific language to apply syntax highlight in that block? IOW:

    # test highlighting for code-block in markdown source

    This is a `markdown` file.

    Is this code-block highlighted?

    ```
    def foo(x):
        return x * x
    ```

usually wouldn't this have to be written as(?):

    ```python
    def foo(x):
        return x * x
    ```

Just pointing this out as its worth mentioning.

@jfbu
Copy link
Contributor Author

jfbu commented Jan 4, 2025

@electric-coder for info, with python added explicitly as you suggest, indeed the highlighting appears. But it was not needed prior to merge of #13151, and furthermore, the value of highlight_language was formerly obeyed. To provide the context, in the big project with this html output where the problem first showed, a specific custom lexer was used, and some extras added to the highlight code an additional "output rendition" using a distant LaTeX rendering server. I originally suspected somehow our usage of setup(app: Sphinx) in our big conf.py and support files had gone awry, but was relieved in the end to reduce to the MRE. To be honest I have not tried at all yet to understand what is the root cause, not knowing if one should look on Sphinx or MyST-parser side first.

@electric-coder
Copy link

electric-coder commented Jan 4, 2025

with python added explicitly as you suggest, indeed the highlighting appears.

(...)

prior to merge of PR 13151 (...) the value of highlight_language was formerly obeyed.

Standard MD source fences aren't part of the usual reST options for Showing code examples. I'm making a further educated guess that PR 13151 caused something to break on the myst side, likely the mechanism that reads and applies the more general Sphinx highlight_language config value to their MD source fences.

not knowing if one should look on Sphinx or MyST-parser side first.

Given how massive PR #13151 and the underlying issue #13072 are, it might be better to cross-post this at the myst repo as they'll want a heads up and in all likelihood it'll be easier for them to apply a fix/update on their side.

but was relieved in the end to reduce to the MRE

I know that feeling :|

@AA-Turner
Copy link
Member

AA-Turner commented Jan 5, 2025

The simplest reproducer is thus:

from sphinx.environment import _CurrentDocument
temp_data = {}
current_document = _CurrentDocument()
assert temp_data.get('highlight_language', ...) is ...
assert current_document.get('highlight_language', ...) == ''  # not ...!

#13151 changed a few instances of temp_data.get('name', config.fallback) to current_document.name or config.fallback, as the defaults are falsy. What I didn't account for is that get will now always return a value, instead of the fallback.

-node['language'] = self.env.temp_data.get(
-    'highlight_language', self.config.highlight_language
-)
+node['language'] = (
+    self.env.current_document.highlight_language
+    or self.config.highlight_language
+)

A

@AA-Turner
Copy link
Member

cc @chrisjsewell @picnixz

Myst could apply the following change:

-            name = self.sphinx_env.temp_data.get(
-                "highlight_language", self.sphinx_env.config.highlight_language
-            )
+            name = (
+                self.sphinx_env.temp_data.get("highlight_language")
+                or self.sphinx_env.config.highlight_language
+            )

I'm struggling initially to think of a nice way to change _CurrentDocument to preserve the get behaviour. Perhaps we could maintain an internal set of attributes that have been set via __set{attr,item}__? This is a bit annoying, as I had wanted _CurrentDocument to be transparently backwards-compatible to the temp_data dictionary.

Let me know any thoughts/ideas.

A

@picnixz
Copy link
Member

picnixz commented Jan 5, 2025

Let me know any thoughts/ideas.

How about reverting the _CurrentDocument PR for now until we find a good alternative?

Perhaps we could maintain an internal set of attributes that have been set via set{attr,item}

It will probably defeat the purpose of the new class. However, what you can do is to have an _UNSET("default") to be used. If you do doc.attr you'll get the default. If you do doc.get("attr", "other") and it's _UNSET, you would get "other" instead of the default. And if you do doc.get("attr") I think we should still return None to be backward compatible =/

At least, you won't need to maintain a separate set of attributes.

@AA-Turner
Copy link
Member

How about reverting the _CurrentDocument PR for now until we find a good alternative?

I'd prefer not to take drastic action...

Note that temp_data has never been documented, so we can rely on the 'not public API' defence. I know that Chris disagrees with me on this one, though.

I haven't yet found an instance of 'breakage' (failure to build), for example the problem here with myst is a quality-of-life one rather than critical. But it is less than ideal.

A

@chrisjsewell
Copy link
Member

Heya guys, FYI myst only uses it once, so I'm ok to put in backport logic there for this

@AA-Turner
Copy link
Member

Closing as MyST will fix.

A

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants