define handling of plus and space #261
Conversation
|
@matt-phylum this is more than nice research and work you did there! 🙇 Let me review this in detail and come back with feedback! |
|
I found links to three more implementations in the readme file of this repository and updated the description to include them. packageurl-java merged some PRs that fixed most of the problems with spaces and plus signs so it's been updated here. |
|
I've published a slightly cleaned up version of the code used to collect the data for the above table in https://github.com/phylum-dev/purl-survey. Hopefully it's useful for future cross-implementation testing like this. |
|
+1 to this. I just discovered that syft will not encode I think you could probably add this https://github.com/anchore/packageurl-go library to this list since that's what syft is using. |
|
Hi @matt-phylum. Thank you for your PR. When you have the chance, could you please resolve the conflicts referred to below? |
|
@matt-phylum Hey, now that we clarified the encoding, this is still relevant... do you want to take a quick stab at updating this PR to the latest spec? Otherwise @johnmhoran or I can handle it. |
|
@matt-phylum Thanks for your PR and thoughtful analysis. It looks like your proposed character-encoding changes concerning the plus '+' and the space ' ' have been addressed by PRs we've merged in the last month or so. I'm going to go ahead and close this PR, but please let me know if you spot anything that still needs to be addressed. Two of your related proposed changes concern PURL-TYPES.rst and test-suite-data.json. We've split our work between the PURL standard -- the core -- and the larger PURL specification, of which the standard is a part. Since we're handling work on the types and the test suite as part of the spec, not the standard, please open a new PR for each of these so we can address them individually. [edit] I forgot to mention -- I'll take care of the issue @pombredanne opened concerning the plus '+' in the |
This is a big change (in terms of impact, not lines), and I'm not entirely sure it's the correct change, but something needs to be done in this area.
Problem
The PURL spec describes qualifiers as being an '&'-delimited sequence of '='-delimited key-value pairs where the value is percent-encoded, and the section on encoding describes a minimal set of characters that are supposed to be percent-encoded in different contexts. This looks a lot like x-www-form-urlencoded, but x-www-form-urlencoded encodes almost all characters besides the ASCII alphanumeric set, and has a special behavior where ' ' is encoded as '+'.
I am aware of thirteen PURL implementations:
That means if you're working with qualifiers that have spaces or plus signs in their values, it's fairly likely that software using a different implementation of PURL will interpret the PURLs differently. This may also happen if your PURLs have plus signs in their versions (deb) or spaces in their names (swid) and the implementation incorrectly decodes the name and version as if they were qualifier values (see below).
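To make the divergence concrete, here is a minimal Python sketch of the two decoding styles applied to the same qualifier string. The helper `parse_qualifiers` and the sample qualifier `tag=1.0+build%201` are hypothetical and are not taken from any PURL implementation; the sketch only relies on the standard library's `unquote` (plain percent-decoding) and `unquote_plus` (form-style decoding, where '+' becomes ' ').

```python
from urllib.parse import unquote, unquote_plus


def parse_qualifiers(raw: str, plus_as_space: bool) -> dict[str, str]:
    """Split an '&'/'=' delimited qualifier string and decode each value.

    Hypothetical helper for illustration only; real implementations also
    validate and normalize keys, which is skipped here.
    """
    # unquote_plus treats '+' as ' ' (form style); unquote leaves '+' alone.
    decode = unquote_plus if plus_as_space else unquote
    pairs = (item.split("=", 1) for item in raw.split("&") if item)
    return {key: decode(value) for key, value in pairs}


# Assumed sample input: a literal '+' plus a percent-encoded space.
raw = "tag=1.0+build%201"

percent_only = parse_qualifiers(raw, plus_as_space=False)
form_style = parse_qualifiers(raw, plus_as_space=True)

print(percent_only)  # {'tag': '1.0+build 1'}
print(form_style)    # {'tag': '1.0 build 1'}
```

The same input yields two different values depending only on which decoder the implementation picked, which is the interoperability problem described above.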
6/14 of the implementations are decoding '+' as ' ', so here's why I think it's better for the spec to specify '+' is '+':
Proposal
The spec is updated to be specific, new tests are added to the test suite, and incorrectly escaped examples in the package types spec are updated to be consistent.
Unfortunately, this requires changes to at least 7/14 implementations to get everything aligned, but they should be minor changes.
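As a rough illustration of why the writer-side change can be small, the sketch below percent-encodes both ' ' and '+' when producing a value, so that either decoding style recovers the original. This is an assumption-laden example, not the spec's rule: `encode_value` is a hypothetical helper, and `quote(value, safe="")` escapes more characters than the minimal set the spec describes.

```python
from urllib.parse import quote, unquote, unquote_plus


def encode_value(value: str) -> str:
    """Percent-encode a value so it contains no literal ' ' or '+'.

    Hypothetical helper: safe="" makes quote() escape everything except
    ASCII letters, digits and "_.-~", which is broader than the minimal
    set the PURL spec describes.
    """
    return quote(value, safe="")


original = "1.0+build 1"
encoded = encode_value(original)
print(encoded)  # 1.0%2Bbuild%201

# With no literal '+' left in the output, both decoding styles agree.
assert unquote(encoded) == original
assert unquote_plus(encoded) == original
```

Output written this way is read identically by implementations that treat '+' as '+' and by those that still treat it as ' ', which may ease the transition while implementations are being aligned.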