Validate that description is single line #30
Comments
@FFY00 what is the rationale here? This breaks all our project installs via pdm-backend (which was updated yesterday to 0.8.0rc1). Since when are multiline descriptions illegal? |
The spec says they are one line: https://packaging.python.org/en/latest/specifications/core-metadata/#summary And putting multiple lines originally broke the format: mesonbuild/meson-python#67 (comment) But it looks like we were joining the newlines correctly since 0.6 in c74f4c5. @frostming, thoughts? |
A multiline "description" would be |
You are correct regarding the single line summary/description field However due to the previous behavior it is effectively a breaking change |
@frostming can decide here (also @dnicolodi or @rgommers for meson-python). If it should be warning for a release, then error, or have an opt-in/out by backends, etc, happy to leave it to them. If I were deciding, I'd look at the numbers on PyPI of multiline |
Since there is already a report of breaking even with a pre-release, a deprecation rather than a hard error seems to be warranted. For |
I've just created a milestone for |
This is vendored into pdm-backend, so it's actually a normal release of pdm-backend that found this, not a pre-release. Scikit-build-core will be vendoring in the next release too: scikit-build/scikit-build-core#703 - partially because we guarantee reproducible SDists, and those can be affected by changes in pyproject-metadata. Also I want to ship METADATA 2.2 soon, and don't want to wait for all distros to update. |
I'm not sure I follow. Is this about how the field is serialized in the |
It's also interesting that only the core-metadata summary field is specified to be single-line, not the TOML description field - it is only specified to be a TOML string. TOML strings are generally allowed to be multiline. Personally I also find this documentation way too obscure for something that is user-facing -- how is the average pyproject.toml user supposed to find a core-metadata specification documentation snippet (which is not even clear to be associated with the pyproject.toml description field to the uninitiated). |
I think normalizing the description field in a way similar to description = '\n'.join(description.splitlines()) is the best way forward. The multi-line value support currently in pyproject-metadata (linked by @henryiii a few comments above) does not seem right according to the metadata specification (summary should be a single line). Is |
This is validating that a user does not enter a multiline string in It is not clear, however, that So currently description = """a
b""" Becomes:
Which is Doing this hides the single-line nature of Summary, though, and can confuse a user into thinking that the new lines are preserved. It also encourages much longer Summaries. If it was done earlier, I don't think anyone would complain about I'm rather curious as to how other backends (hatchling, poetry, setuptools, for example) handle this. By the way, I'm not sure
|
I think you meant: description = ' '.join(description.splitlines()) Technically, new lines are allowed in strings in toml. So you could put them in all the fields. IMO, that should be validated and error out, but it could be done. Many of them (like name) can't have spaces, so that rules out new lines too. license is free-form, so it probably can have newlines. If I understand the email-format correctly (which I probably don't), new lines are not possible without workarounds like the seven spaces + pipe one, which is only implemented for |
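To make the difference between the two one-liners quoted in this thread concrete, here is a minimal sketch. The helper name `normalize_summary` is hypothetical and is not part of pyproject-metadata; it only illustrates the space-joining variant suggested above.

```python
def normalize_summary(description: str) -> str:
    """Hypothetical helper: collapse a multiline description to one line."""
    # Joining with a space removes the line breaks.
    return " ".join(description.splitlines())


multiline = "a\nb"

# Joining with "\n" puts the line break straight back, so nothing changes.
rejoined_with_newline = "\n".join(multiline.splitlines())

# Joining with " " yields a single-line value.
single_line = normalize_summary(multiline)

print(repr(rejoined_with_newline))
print(repr(single_line))
```

Note that `str.splitlines()` also splits on `\r\n` and a few other line boundaries, so this normalizes more than a bare `\n`.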
@henryiii When reading the core metadata specification, it is very hard for me to distinguish which requirements concern the format and which concern validation. The specification says that the format is RFC822-like. And RFC822 does not preserve newlines in any field (except the message body). Thus it is very weird to say that Summary is not allowed to contain newlines: the serialization format does not allow it. Therefore, I get that this is not a content validation requirement, but a serialization format requirement (a restriction of what RFC822 would otherwise allow). This is where my conclusion that the current pyproject-metadata behavior is not correct comes from. But obviously there is quite a lot of speculation regarding interpretation of the specification. |
Oops. Yes, indeed. Too much multitasking... |
I agree that the main issue here is the relatively late introduction of the new validation. Poetry's [tool.poetry.description] does not complain about multiline descriptions. The serialized metadata summary is single line. |
IMO, this is a standardization question. The guidelines should either be updated to describe what to do with multiline descriptions or to require a single line. If the latter, validate-pyproject and schemastore could validate that these are a single line; that might be a good first step. |
It's been a decade since I concerned myself with these kinds of details, but my understanding is the same. Newlines are not preserved in email message headers. |
I think easing back on this is fine, the question really is should it be a warning with a future error, or do we keep the newline behavior (and should we use the string join over trying to preserve them in the metadata's structure like we do now). I think a post on the Python discourse would be in order to ask. @pfmoore might know? |
I would not introduce a warning at this point. Normalizing newlines to spaces seems natural enough to make it happen without warning, unless newlines are explicitly forbidden. Either replacing newlines with spaces or preserving newlines in the serialization format as in the current code (knowing that they would be substituted with spaces when the METADATA file is read) are fine for me. The latter may be a violation of the specification, but it seems that nothing breaks on this violation, thus it is inconsequential. |
In my view, the current specifications are clear. As to whether this project should validate the field, or whether it should deprecate multi-line values, with or without a warning, that is a project decision. Projects specifying a multi-line description are technically in violation of the standards, and tools could fail as a result, but in practical terms the violation seems to do little harm. So I’m not sure that I personally see the benefit of being strict here. I agree that there should be better user facing documentation describing how to write your |
@pfmoore As indicated in previous comments, the format used for core metadata does not allow encoding a value that contains newline characters. Newlines must be followed by indentation and are treated as a continuation of the previous line. This kind of encoding is used when the value to be recorded becomes too long (I haven't checked where If the core metadata specification refers to the value before serialization, specifying that it should not contain newlines seems superfluous, as newlines cannot be encoded and thus must be rejected a priori. If it refers to the encoded value, should it also be interpreted as an implicit limit on the length of the value? |
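The continuation-line behaviour described above can be observed with Python's standard `email` package. This is only a sketch under stated assumptions: the metadata fragment is made up, and it reads the header with `email.policy.default`. Other policies (such as the legacy `compat32` one) may hand back the folded text differently, and real metadata consumers may not use this module at all.

```python
import email
import email.policy

# Hypothetical METADATA fragment: the Summary value is folded onto a
# continuation line (a newline followed by indentation).
raw_metadata = (
    "Metadata-Version: 2.1\n"
    "Name: example\n"
    "Summary: a\n"
    "        b\n"
    "\n"
)

msg = email.message_from_string(raw_metadata, policy=email.policy.default)
summary = str(msg["Summary"])

# With this policy the line break belongs to the folding, not to the
# value that is read back.
print(repr(summary))
print("\n" in summary)
```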
The core metadata spec refers to the (abstract) value of the metadata, independent of serialisation. Specifying that The email serialisation format is regrettably somewhat under-specified, as noted in the spec itself, precisely because the email format was never intended for this use and therefore doesn't cover all the nuances we need. That's a shame, but simply a fact of life that we have to deal with. The formats the standards are based on were defined in simpler times, when nobody felt the need for this level of precision. |
IMO, if implementations are allowed to reject multiline The problem with allowing multiple line descriptions is that then there is no assistance at all in keeping this short, and you end up with things like this (selecting the three longest These projects obviously confused the |
On PyPI a couple of weeks ago, here's a count of the new lines:
(-1 means the description is empty) That means 0.2% of all projects use newlines in the description field. |
Out of curiosity, here's a randomly selected one new line project (I can list the projects for each of those numbers of newlines): And the PyPI description is cut in half, as expected for setuptools: https://pypi.org/project/truelearn You can verify that setuptools just deletes everything after a new line: |
That example suggests that the newline is somewhat incidental, included purely to wrap the text somewhere convenient. So for that case, replacing But of course, that may not always be true, and "in the face of ambiguity, refuse the temptation to guess"... With regard to "if one project rejects newlines, all should otherwise interoperability suffers" - yes, in theory. In practice, though, changing backends is never simply changing the |
I’m not referring to changing backends, but tools that can read the pyproject.toml as well. A lot of work went into making sure that arbitrary tools can read this, and it has enabled some great things (like front ends allowing any backend, GitHub dependency graphs, Ruff and UV pull information here, validate-pyproject, etc.). Tools like that cannot know what a multiline description will do: error, join (current behavior here), or truncate (setuptools). It may not be all that common for them to care about the description, but the fact remains that they can’t tell exactly what it will be. I think that pyproject-metadata should not be opinionated, unless the specs are opinionated. Personally, I will check this field and throw a clear error inside scikit-build-core if it’s multiline, as that will help users understand this field. |
OK. I think we're talking past each other here - my apologies. As far as standards are concerned, a project with a newline in the Tools have to make their own decisions as to how they handle invalid projects. Most are likely to take the stance that they will reject them, simply because that's easier. However, some tools might prefer (for backward compatibility reasons, or otherwise) to handle invalid projects. Yes, that means that if your project (or a project you're using) is invalid, some tools will fail. That's what happens once you stray outside of the standards. As far as
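The three outcomes named in the comment above (error, join, truncate) can be put side by side in a short sketch. All three functions are hypothetical illustrations; none of them is the actual code of scikit-build-core, pyproject-metadata, or setuptools, and the error message is invented.

```python
def reject_multiline(description: str) -> str:
    """Strict policy: refuse any description that contains a newline."""
    if "\n" in description:
        raise ValueError("description must be a single line")
    return description


def join_lines(description: str) -> str:
    """Lenient policy: replace line breaks with spaces."""
    return " ".join(description.splitlines())


def truncate_at_newline(description: str) -> str:
    """Truncating policy: keep only the text before the first newline."""
    return description.split("\n", 1)[0]


description = "a\nb"
results = {
    "join": join_lines(description),
    "truncate": truncate_at_newline(description),
}
try:
    reject_multiline(description)
except ValueError as exc:
    results["error"] = str(exc)

print(results)
```

The same input yields three different results, which is why a tool that only reads pyproject.toml cannot predict the final Summary without knowing which backend will build the project.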
|
Three build backends that I know of use pyproject-metadata: scikit-build-core, meson-python, and pdm-backend. I think that’s our major users. Others might use it directly for building utilities of some sort, but I think it’s mostly used through those build backends. I’m happy to go with whatever the other backend maintainers like if there’s no standard suggestion: no warning, warning, or error. I’ll make it an error for scikit-build-core regardless. The current standards don’t say much about this field at all, could that at least be updated to say it’s a single line summary? |
They say it's mapped to |
The specification document opens with a description of the serialization format, so I was under the impression that it covers both the data specification and the serialization format, and thus that the data does not exist in isolation from the serialization format. |
Major thanks for the user-friendly fix in 0.8! |
No description provided.