diff --git a/PURL-SPECIFICATION.rst b/PURL-SPECIFICATION.rst
index 32aa48f8..490dac4c 100644
--- a/PURL-SPECIFICATION.rst
+++ b/PURL-SPECIFICATION.rst
@@ -132,7 +132,7 @@ The rules for each component are:
 - **type**:
 
   - The package ``type`` MUST be composed only of ASCII letters and numbers,
-    '.', '+' and '-' (period, plus, and dash).
+    period '.', plus '+', and dash '-'.
   - The ``type`` MUST start with an ASCII letter.
   - The ``type`` MUST NOT be percent-encoded.
   - The ``type`` is case insensitive. The canonical form is lowercase.
@@ -176,25 +176,30 @@ The rules for each component are:
 
 - **qualifiers**:
 
-  - The ``qualifiers`` string is prefixed by a '?' separator when not empty
-  - This '?' is not part of the ``qualifiers``
-  - This is a query string composed of zero or more ``key=value`` pairs each
-    separated by a '&' ampersand. A ``key`` and ``value`` are separated by the equal
-    '=' character
-  - These '&' are not part of the ``key=value`` pairs.
-  - ``key`` must be unique within the keys of the ``qualifiers`` string
-  - ``value`` cannot be an empty string: a ``key=value`` pair with an empty ``value``
-    is the same as no key/value at all for this key
-  - For each pair of ``key`` = ``value``:
-
-    - The ``key`` must be composed only of ASCII letters and numbers, '.', '-' and
-      '_' (period, dash and underscore)
-    - A ``key`` cannot start with a number
-    - A ``key`` must NOT be percent-encoded
-    - A ``key`` is case insensitive. The canonical form is lowercase
-    - A ``key`` cannot contain spaces
-    - A ``value`` must be a percent-encoded string
-    - The '=' separator is neither part of the ``key`` nor of the ``value``
+  - The ``qualifiers`` component MUST be prefixed by an unencoded question
+    mark '?' separator when not empty. This '?' separator is not part of the
+    ``qualifiers`` component.
+  - The ``qualifiers`` component is composed of one or more ``key=value``
+    pairs. Multiple ``key=value`` pairs MUST be separated by an
+    unencoded ampersand '&'. This '&' separator is not part of an
+    individual ``qualifier``.
+
+  - A ``key`` and ``value`` MUST be separated by the unencoded equal sign '='
+    character. This '=' separator is not part of the ``key`` or ``value``.
+  - A ``value`` MUST NOT be an empty string: a ``key=value`` pair with an
+    empty ``value`` is the same as if no ``key=value`` pair exists for this
+    ``key``.
+
+  - For each ``key=value`` pair:
+
+    - The ``key`` MUST be composed only of lowercase ASCII letters and numbers,
+      period '.', dash '-' and underscore '_'.
+    - A ``key`` MUST start with an ASCII letter.
+    - A ``key`` MUST NOT be percent-encoded.
+    - Each ``key`` MUST be unique among all the keys of the ``qualifiers``
+      component.
+    - A ``value`` MAY be composed of any character and all characters MUST be
+      encoded as described in the "Character encoding" section.
 
 - **subpath**:
 
@@ -206,9 +211,11 @@ The rules for each component are:
     in the canonical form
   - Each ``subpath`` segment MUST be a percent-encoded string
   - When percent-decoded, a segment:
+
     - MUST NOT contain a '/'
     - MUST NOT be any of '..' or '.'
     - MUST NOT be empty
+
   - The ``subpath`` MUST be interpreted as relative to the root of the package
 
 
@@ -486,3 +493,12 @@ License
 ~~~~~~~
 
 This document is licensed under the MIT license
+
+Definitions
+~~~~~~~~~~~
+
+[ASCII] See, e.g.,
+
+  - American National Standards Institute, "Coded Character Set -- 7-bit
+    American Standard Code for Information Interchange", ANSI X3.4, 1986.
+  - https://en.wikipedia.org/wiki/ASCII.
diff --git a/faq.rst b/faq.rst
index 41307f76..23989d23 100644
--- a/faq.rst
+++ b/faq.rst
@@ -6,7 +6,7 @@ Scheme
 
 **QUESTION**: Can the ``scheme`` component be followed by a colon and two slashes, like a URI?
 
-No. Since a ``purl`` never contains a URL Authority, its ``scheme`` should not be suffixed with double slash as in 'pkg://' and should use 'pkg:' instead. Otherwise this would be an invalid URI per RFC 3986 at https://tools.ietf.org/html/rfc3986#section-3.3::
+**ANSWER**: No. Since a ``purl`` never contains a URL Authority, its ``scheme`` should not be suffixed with double slash as in 'pkg://' and should use 'pkg:' instead. Otherwise this would be an invalid URI per RFC 3986 at https://tools.ietf.org/html/rfc3986#section-3.3::
 
     If a URI does not contain an authority component, then the path cannot begin with two slash characters ("//").
 
@@ -24,9 +24,10 @@ For example, although these two purls are strictly equivalent, the first is in c
 
     pkg://gem/ruby-advisory-db-check@0.12.4
 
+
 **QUESTION**: Is the colon between ``scheme`` and ``type`` encoded? Can it be encoded? If yes, how?
 
-The "Rules for each ``purl`` component" section provides that "[t]he ``scheme`` MUST be followed by an unencoded colon ':'.
+**ANSWER**: The "Rules for each ``purl`` component" section provides that the ``scheme`` MUST be followed by an unencoded colon ':'.
 
 In this case, the colon ':' between ``scheme`` and ``type`` is being used as a separator, and consequently should be used as-is, never encoded and never requiring any decoding. Moreover, it should be a parsing error if the colon ':' does not come directly after 'pkg'. Tools are welcome to recover from this error to help with malformed purls, but that's not a requirement.
 
@@ -37,10 +38,11 @@ Type
 
 **QUESTION**: What behavior is expected from a purl spec implementation if a ``type`` contains a character like a slash '/' or a colon ':'?
 
-The "Rules for each purl component" section provides that
+**ANSWER**: The "Rules for each purl component" section provides that the
+package ``type``
 
-    [t]he package ``type`` MUST be composed only of ASCII letters and numbers,
-    '.', '+' and '-' (period, plus, and dash)
+    MUST be composed only of ASCII letters and numbers, period '.', plus '+',
+    and dash '-'.
 
 As a result, a purl spec implementation must return an error when encountering a ``type`` that contains a prohibited character.
 
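
Reviewer note: the revised ``type`` and ``qualifiers`` rules in this diff can be sketched as a small validator. The Python below is a minimal, non-authoritative sketch that assumes the rules exactly as worded above. The names ``validate_type`` and ``parse_qualifiers`` and the two regular expressions are hypothetical and are not part of the spec or of any existing purl library. The sketch takes the strict reading that an uppercase ``key`` is an error rather than something to normalize, and it uses plain percent-decoding as a stand-in for the "Character encoding" section, which is not shown in this diff.

```python
import re
from urllib.parse import unquote

# Assumption: patterns transcribed from the rules quoted in the diff.
# type: ASCII letters and numbers, '.', '+', '-', starting with a letter.
TYPE_RE = re.compile(r"[A-Za-z][A-Za-z0-9.+-]*")
# key: lowercase ASCII letters and numbers, '.', '-', '_', starting with a letter.
KEY_RE = re.compile(r"[a-z][a-z0-9._-]*")


def validate_type(purl_type):
    """Return the canonical (lowercase) type, or raise ValueError."""
    if not TYPE_RE.fullmatch(purl_type):
        raise ValueError(f"invalid purl type: {purl_type!r}")
    return purl_type.lower()


def parse_qualifiers(qualifiers):
    """Parse the text after the '?' separator into a {key: decoded value} dict."""
    result = {}
    if not qualifiers:
        return result
    for pair in qualifiers.split("&"):
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"missing '=' separator in {pair!r}")
        if not KEY_RE.fullmatch(key):
            raise ValueError(f"invalid qualifier key: {key!r}")
        if value == "":
            # An empty value is the same as no key=value pair for this key.
            continue
        if key in result:
            raise ValueError(f"duplicate qualifier key: {key!r}")
        # Simplification: plain percent-decoding of the value.
        result[key] = unquote(value)
    return result
```

With this sketch, ``validate_type("a/b")`` raises ``ValueError``, matching the FAQ answer that a prohibited character in ``type`` is an error, and ``parse_qualifiers("arch=x86_64&os=")`` drops the empty ``os`` pair and returns only the ``arch`` entry. A more lenient tool could choose to lowercase keys before validating them; that is a design choice this diff does not settle.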