Allow optional namespace before tag name. #648
base: master
This especially makes me inclined to adjust the spec; at the very least we should be consistent, and I imagine it would be rare indeed that CommonMark users are including text like

@jgm Thoughts? This is a very small and low-impact change.
I've been trying to find some direct examples that might support this proposal. The current spec mentions DocBook, and it turns out the latest iteration of DocBook uses namespaces: https://docbook.org/docs/howto/howto.html#introduction-ns

In addition, I can think of use cases combining things like
As your babelmark example shows, CommonMark-conforming parsers treat
Good catch! Technically, it should be possible to disambiguate links from tags, because tags should either be self-closing or have a matching closing tag. That being said, I think it's an important consideration. Playing around with the issue you raise, it appears this issue exists even without namespaced tags: https://babelmark.github.io/?text=%3Csamuel%40oriontransfer.net%3E requires the parser to have some basic level of disambiguation (i.e. as defined by the spec).
Maybe additional disambiguation rules are required for https://babelmark.github.io/?text=%3Cht%3A%2F%2Fwww.google.com%3E vs https://babelmark.github.io/?text=%3Ch%3A%2F%2Fwww.google.com%3E

I'm not sure what logic is being applied here.
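To make the two babelmark cases concrete, here is a minimal Python sketch that tests a string against simplified regex transcriptions of three CommonMark definitions (URI autolink, email autolink, open tag). The regexes are an approximation written for this discussion, not code taken from cmark or any other implementation. Under them, a URI autolink's scheme must be 2 to 32 characters long, so `<ht://www.google.com>` is an autolink while `<h://www.google.com>` matches nothing and stays literal text. `<samuel@oriontransfer.net>` is an email autolink and not a tag, because a tag name must be followed by whitespace, `/`, or `>`.

```python
import re

# Simplified transcriptions of three CommonMark definitions (an approximation,
# not any implementation's actual code).
URI_AUTOLINK = re.compile(r"<[A-Za-z][A-Za-z0-9+.\-]{1,31}:[^\x00-\x20\x7f<>]*>")
EMAIL_AUTOLINK = re.compile(
    r"<[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*>"
)
ATTR_VALUE = r"""(?:[^\s"'=<>`]+|'[^']*'|"[^"]*")"""
ATTRS = r"(?:\s+[A-Za-z_:][A-Za-z0-9_.:-]*(?:\s*=\s*" + ATTR_VALUE + r")?)*"
OPEN_TAG = re.compile(r"<[A-Za-z][A-Za-z0-9-]*" + ATTRS + r"\s*/?>")


def constructs(text: str) -> set:
    """Every inline construct whose definition the whole string satisfies."""
    found = set()
    if URI_AUTOLINK.fullmatch(text):
        found.add("uri autolink")
    if EMAIL_AUTOLINK.fullmatch(text):
        found.add("email autolink")
    if OPEN_TAG.fullmatch(text):
        found.add("open tag")
    return found


for sample in ["<samuel@oriontransfer.net>", "<ht://www.google.com>", "<h://www.google.com>"]:
    print(sample, sorted(constructs(sample)) or "literal text")
```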
The
Do you think this takes precedence over autolinks?
Maybe one solution, as we have a list of block-level HTML tags, is to have a list of recognised schemes for
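A minimal sketch of the scheme allow-list idea, assuming a hypothetical `KNOWN_SCHEMES` set (the comment does not say which schemes the list would contain). A URI-shaped string only counts as an autolink when its scheme, compared case-insensitively, is on the list, which leaves `<content:youtube-video>` free to be read as a tag.

```python
import re

# Simplified URI-autolink pattern with the scheme captured for the allow-list check.
URI_AUTOLINK = re.compile(r"<([A-Za-z][A-Za-z0-9+.\-]{1,31}):[^\x00-\x20\x7f<>]*>")

# Hypothetical allow-list: the actual contents would have to be specified.
KNOWN_SCHEMES = frozenset({"http", "https", "ftp", "mailto"})


def is_autolink(text: str, known_schemes: frozenset = KNOWN_SCHEMES) -> bool:
    """Autolink only if the string is URI-shaped and its scheme is on the list."""
    m = URI_AUTOLINK.fullmatch(text)
    return m is not None and m.group(1).lower() in known_schemes


print(is_autolink("<https://commonmark.org>"))  # listed scheme
print(is_autolink("<content:youtube-video>"))   # unlisted, so available as a tag
```

The trade-off is the one raised in the next comment: every conforming parser would need to carry, and agree on, this extra table.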
I personally don’t like adding another list of supported values to CM, as it increases the minimum memory/size footprint of all conforming Markdown parsers. For DocBook: having a namespace (such as given with
I would personally rather remove constructs that aren’t used in HTML anymore, such as CDATA and processing instructions, than start supporting more XML.
Generally I'm very sympathetic to the request; however, there's the issue of autolinks. See https://talk.commonmark.org/t/what-is-the-point-of-limiting-uri-schemes-in-autolinks/555/13
Thanks for everyone's feedback.
My tag structure needs to use more elaborate namespaces, e.g.

    <content:youtube-video id="..." />
    <gallery:photos path="..." />

The namespaces are "mounted" by the rendering engine and cause tag expansion before generating the final HTML. So naturally, I want to write some markdown like this:

    ## Introduction Video
    <content:youtube-video id="..." />

I'm okay with the following, completely unambiguous forms:
and
However, to disambiguate the latter, the block parser would need to scan ahead and the inline parser would need to look for a matching closing tag. The need for a space in the self-closing form is a bit of a hack, but acceptable, because in most cases, if you provide attributes, it's not needed.
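The inline look-ahead described above can be sketched in a few lines of Python. The rule is hypothetical: a bare `<prefix:name>` counts as a tag only when a matching `</prefix:name>` occurs later in the same text. The sketch also makes the cost visible, since the decision for one token depends on text arbitrarily far ahead.

```python
import re

# Hypothetical rule: a bare <prefix:name> is a tag only if </prefix:name> follows later.
BARE_NS_OPEN = re.compile(r"<([A-Za-z][A-Za-z0-9-]*:[A-Za-z][A-Za-z0-9-]*)>")


def tags_with_matching_close(text: str) -> list:
    """Qualified names of bare namespaced openings that are closed later on."""
    names = []
    for m in BARE_NS_OPEN.finditer(text):
        closing = f"</{m.group(1)}>"
        # This scan ahead is the extra work the inline parser would have to do.
        if text.find(closing, m.end()) != -1:
            names.append(m.group(1))
    return names


print(tags_with_matching_close("<content:note>Hello</content:note>"))
print(tags_with_matching_close("<content:note> is never closed"))
```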
One other option is that we could expect users to provide a list of namespaces, which could be available as metadata, e.g.
This would completely disambiguate the parse.
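A sketch of how a document-supplied namespace list could keep the two definitions apart, assuming the list has already been extracted (how it is declared in the document is not shown here, so it is passed in as a plain set). Each definition is tested on its own: a namespaced tag requires a declared prefix and an autolink requires an undeclared scheme, so no string can satisfy both. Attribute syntax is simplified to "whitespace followed by anything except `<` or `>`".

```python
import re

URI_AUTOLINK = re.compile(r"<([A-Za-z][A-Za-z0-9+.\-]{1,31}):[^\x00-\x20\x7f<>]*>")
# Simplified namespaced tag: prefix ":" name, optional whitespace-led attributes, optional "/".
NS_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9-]*):[A-Za-z][A-Za-z0-9-]*(?:\s[^<>]*)?/?>")


def constructs(text: str, declared: set) -> set:
    """Check each definition independently; declared prefixes make them disjoint."""
    found = set()
    tag = NS_TAG.fullmatch(text)
    if tag and tag.group(1) in declared:
        found.add("html")
    link = URI_AUTOLINK.fullmatch(text)
    if link and link.group(1) not in declared:
        found.add("autolink")
    return found


declared = {"content", "gallery"}  # hypothetical metadata from the document
print(constructs("<content:youtube-video>", declared))
print(constructs("<https://example.com>", declared))
```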
@jgm, can I get your feedback on the above proposal?
The spec has a declarative style: it says, "such and such is an X." Not: first, try parsing as an X, and if that doesn't work, try as a Y. So the ambiguity of

So I don't yet see a proposal that would work with the current spec. We could easily modify things to allow one-character namespaces, but you don't seem satisfied with that. Parameterizing parsing on a list of namespaces provided by the user would also break the style of the spec (currently it's self-contained and doesn't depend on externally provided lists), unless the list is defined in the document itself. I take it that's what you were gesturing at with
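The ambiguity, and why one-character namespaces escape it, can be checked mechanically. This sketch builds the open-tag pattern with either an any-length or a single-letter namespace prefix (both hypothetical variants of the proposal) and tests strings against it and a simplified URI-autolink pattern. With any-length prefixes, `<content:youtube-video>` satisfies both definitions; with single-letter prefixes no tag can, because an autolink scheme needs at least two characters before the colon.

```python
import re

NAME = r"[A-Za-z][A-Za-z0-9-]*"
ATTR_VALUE = r"""(?:[^\s"'=<>`]+|'[^']*'|"[^"]*")"""
ATTRS = r"(?:\s+[A-Za-z_:][A-Za-z0-9_.:-]*(?:\s*=\s*" + ATTR_VALUE + r")?)*"
URI_AUTOLINK = re.compile(r"<[A-Za-z][A-Za-z0-9+.\-]{1,31}:[^\x00-\x20\x7f<>]*>")


def open_tag(namespace: str) -> re.Pattern:
    """Simplified open-tag pattern with an optional namespace prefix before the name."""
    return re.compile("<" + namespace + NAME + ATTRS + r"\s*/?>")


ANY_LENGTH_NS = open_tag(r"(?:" + NAME + r":)?")  # hypothetical variant
ONE_CHAR_NS = open_tag(r"(?:[A-Za-z]:)?")          # hypothetical variant


def constructs(text: str, tag_pattern: re.Pattern) -> set:
    """Which definitions the whole string satisfies; two results means ambiguity."""
    found = set()
    if tag_pattern.fullmatch(text):
        found.add("open tag")
    if URI_AUTOLINK.fullmatch(text):
        found.add("autolink")
    return found


print(sorted(constructs("<content:youtube-video>", ANY_LENGTH_NS)))
print(sorted(constructs("<c:youtube-video>", ONE_CHAR_NS)))
```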
but of course this already has a clearly defined meaning in CommonMark (reference link definitions). I wonder whether your needs might be met in other ways? For example,
are already parsed as raw HTML.
It means one standard way for everything else and a different way for Markdown. I cannot copy code from an existing document and insert it into Markdown without rewriting the namespaces.
Fair enough, it makes sense. I assume you mean that the parser should not need to backtrack? I was under the (maybe wrong) impression that autolinks already have to deal with this, i.e. if there is whitespace, it cannot be an autolink. As I suggested, I'd be happy with this, i.e.
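The whitespace observation can be illustrated directly. A URI autolink may not contain spaces, so a namespaced tag with attributes, or with a space before `/>`, can never also be an autolink, and a closing tag starts with `/`, which no scheme can. The sketch uses a simplified transcription of the autolink definition (not any implementation's code) to test the candidate forms; only the forms with no whitespace at all collide.

```python
import re

# Simplified CommonMark URI autolink: "<", a 2-32 character scheme, ":", then no
# ASCII control characters, spaces, "<" or ">", then ">".
URI_AUTOLINK = re.compile(r"<[A-Za-z][A-Za-z0-9+.\-]{1,31}:[^\x00-\x20\x7f<>]*>")


def could_be_autolink(text: str) -> bool:
    """True if the whole string also satisfies the autolink definition."""
    return URI_AUTOLINK.fullmatch(text) is not None


forms = [
    '<content:youtube-video id="..." />',  # attributes bring whitespace
    "<content:youtube-video />",           # space before "/>"
    "</content:youtube-video>",            # closing tag starts with "/"
    "<content:youtube-video/>",            # no whitespace at all
    "<content:youtube-video>",             # no whitespace at all
]
for form in forms:
    print(f"{form:36} also_autolink={could_be_autolink(form)}")
```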
Yes, namespaces are a kind of link, so I think this usage would be reasonable. Yes, it would need to be self-contained and precede the usage, IMHO. Maybe a different syntax would be okay, e.g. following the
Generators that understand XML could use these links when generating the output document.

Nested HTML

One solution which appears to work is something like this:
However I would like to avoid inserting additional divs. It also doesn't work for inline elements:
Taking your example, why not allow something like this:
or more specifically
It should extract one entire HTML block. I don't know how you could do it for inline elements. I think using

Summary

Here are the possible options:
For 2c, here is what I think:
Is Markdown an exclusively HTML output format, or do we want to support other kinds of mappings? I.e. should we be able to generate other formats (DocBook, SVG, MathML, etc.)? These formats have provisions for, and in practice include, XMLNS as part of their specifications. So either we go all in with "HTML5" as the baseline and reject the above formats, or we try to figure out how to be compliant with XML/XMLNS so that the above formats fit naturally and without "markdown specific" adaptations (i.e. using PI "tags").
I reviewed in my own code what would be required to use 1-character namespaces. Firstly, we'd need some way in the markdown document to hook this up, e.g.
Otherwise there is no way to attach the logic without making assumptions. Internally, the code would look like:
So we would need to expose enough bits (e.g.
I've made a fork to experiment with the changes and, since this is blocking downstream work, I'll use this fork in my own projects so I can gain some experience with the potential changes and report back. https://github.com/ioquatix/markly Only two specs failed, and they both relate to the use of single-character namespaces. So either the current spec is under-specified, my implementation is not working as expected, or the changes (allowing
@ioquatix Could you solve this by doing the inverse? Right now, you have XML embedded in Markdown. What if you have a proper XML document, parse it, and then treat some elements inside it as Markdown?
True! I feel the spec is currently torn between XML and HTML. In some cases, XML things are supported and not all HTML is, but in others, HTML semantics are applied and not all XML is supported. I think it would make sense to go in a clear direction. And the reason I prefer HTML is that, like Markdown, it is a format made for authors, in which nothing is an error, whereas XML is strict and treats malformed input as an error.
Yes, all of this is allowed, though many implementations (including the reference implementations) are "safe by default" and only pass this through as a link if you specify the "unsafe" option. Forbidding a scheme in autolinks is pretty weak security: you could always use it in a regular link. Forbidding a scheme in any links is also pretty weak security; if you're worried about security, there are many other things you need to consider as well.
It's not about parsing, it's about specifying. The spec says: such-and-such counts as inline HTML. So we have an ambiguity problem if the same string also counts as an autolink. We could solve that by specifying a precedence explicitly in the spec (ugly, as it goes against the declarative style of the spec), or by making the spec for inline HTML more complex so that
I agree with all your points, but I still have the same problem I started with. So, let's explore with the fork for now. It seems that making this single change didn't break a single spec unexpectedly, so at least we are in a situation where we should either add more specs to break my fork, or add more specs to support its existence. There is one spec that fails somewhat expectedly:
But this seems to be exactly the single-character namespace case outlined above. Is it currently explicitly disallowed? Was that the intention?
So far in my fork I have not encountered any unexpected behaviour, and I feel as if it's working as I'd expect w.r.t. namespaced tags.
@ioquatix Do you need namespaces to work for HTML blocks or inline HTML (or both)? If HTML blocks are enough, I think the ambiguity problem goes away. Block parsing happens before inline parsing, so if the block is parsed as an HTML block, no inline parsing will happen, and content in the block cannot be confused with autolinks. It would mean that the definition of an open tag differs between HTML blocks and inline HTML, but maybe that's acceptable.
I guess I want both. But I understand where you are coming from, and maybe it's a good first step. That being said, my modified parser doesn't seem to have any issues thus far with "real world" markdown.
I am still using this in production on an admittedly small dataset and haven't run into any issues. I think limiting autolinking to a well-known (perhaps configurable per parser) subset of schemes is a logical approach. The only change required was adding
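A rough sketch of the blocks-only variant, assuming namespaces are added to HTML block start condition 7 alone: up to three spaces of indentation, a single complete open or closing tag (not `pre`, `script`, `style`, or `textarea`), and nothing else on the line. The `QNAME` pattern and the function name are hypothetical, attributes are a simplified transcription, and the rule that condition 7 cannot interrupt a paragraph is not modeled. Because inline parsing never runs on an HTML block, a bare `<content:youtube-video>` on its own line is unproblematic here.

```python
import re

NAME = r"[A-Za-z][A-Za-z0-9-]*"
QNAME = r"(?:" + NAME + r":)?" + NAME  # hypothetical: namespace allowed at block level only
ATTR_VALUE = r"""(?:[^\s"'=<>`]+|'[^']*'|"[^"]*")"""
ATTRS = r"(?:\s+[A-Za-z_:][A-Za-z0-9_.:-]*(?:\s*=\s*" + ATTR_VALUE + r")?)*"
OPEN = r"<(?P<open_name>" + QNAME + r")" + ATTRS + r"\s*/?>"
CLOSE = r"</(?P<close_name>" + QNAME + r")\s*>"
# Up to three spaces, one complete open or closing tag, then only spaces/tabs.
BLOCK_START_7 = re.compile(r" {0,3}(?:" + OPEN + "|" + CLOSE + r")[ \t]*")
EXCLUDED = {"pre", "script", "style", "textarea"}


def starts_html_block_7(line: str) -> bool:
    """Would this line start an HTML block under the namespaced condition 7?"""
    m = BLOCK_START_7.fullmatch(line)
    if m is None:
        return False
    name = m.group("open_name") or m.group("close_name")
    return name.lower() not in EXCLUDED


for line in ['<content:youtube-video id="..." />', "<content:youtube-video> text"]:
    print(repr(line), starts_html_block_7(line))
```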
This is a proposed change to the markdown spec to allow namespaces within HTML tags.
It should be noted:
The motivation for this is to allow XML to pass through markdown documents unchanged. This can include things like SVG, MathML, etc. In my particular case, I have a post processing step which expands namespaced tags into templates.
An alternative to this proposal could be:
That proposal is more similar to how attributes are defined, and easier to parse, but also not completely accurate w.r.t. the XMLNS specification.
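A sketch contrasting the two tag-name rules, under the assumption that the main proposal means an optional `prefix:` before the existing tag name, and that the alternative means admitting `:` among tag-name characters the way attribute names already do. Both readings are interpretations of the prose here, not the PR's actual spec text. The alternative also accepts names such as `a:b:c` or `math:`, which are not valid XML qualified names; that is one concrete form of the XMLNS inaccuracy mentioned above.

```python
import re

# Reading 1 (assumed): an optional "prefix:" before the existing tag-name rule.
PROPOSED = re.compile(r"(?:[A-Za-z][A-Za-z0-9-]*:)?[A-Za-z][A-Za-z0-9-]*")
# Reading 2 (assumed): ':' allowed anywhere after the first letter, as in attribute names.
ALTERNATIVE = re.compile(r"[A-Za-z][A-Za-z0-9:-]*")

for name in ["div", "svg:circle", "content:youtube-video", "a:b:c", "math:"]:
    print(
        f"{name:24}",
        "proposed" if PROPOSED.fullmatch(name) else "-",
        "alternative" if ALTERNATIVE.fullmatch(name) else "-",
    )
```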
See gjtorikian/commonmarker#123 for a potential implementation.
cc @kivikakk