[breaking] Add support for ISO standards, add JPEG XL ISO/IEC spec (#1192)

To add ISO/IEC standards related to JPEG XL (see #1089), there needs to be a
way to add an entry in browser-specs that has a canonical URL but no actual
nightly URL, because ISO standards are not public (see also #1191).

This update amends the code to allow and create entries without a `nightly`
property when needed (only for ISO standards for now). The code also retrieves
the name of the group that develops an ISO standard.

This is a BREAKING CHANGE because the `nightly` property used to be mandatory.
Projects that expect to find a `nightly.url` property need to be updated to
only look at the root `url` property or to skip the entry altogether.
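
For consumers of the list, the two options named above could be sketched as follows. This is a minimal sketch, not code from browser-specs: the `specs` array is a made-up sample (the second entry and its URLs are hypothetical), and falling back from `nightly.url` to the root `url` is only one possible way to apply the advice.

```javascript
// Made-up sample of index entries: one ISO entry without `nightly`,
// and one hypothetical entry that has a `nightly` object.
const specs = [
  {
    shortname: "iso18181-2",
    url: "https://www.iso.org/standard/85253.html"
  },
  {
    shortname: "example-spec",
    url: "https://example.org/TR/example-spec/",
    nightly: { url: "https://example.org/drafts/example-spec/" }
  }
];

// Option 1: skip entries that do not have a `nightly` property.
const withNightly = specs.filter(spec => spec.nightly);

// Option 2: prefer `nightly.url` when it exists, fall back to the root `url`.
// Optional chaining avoids a TypeError on entries without `nightly`.
const urls = specs.map(spec => spec.nightly?.url ?? spec.url);

console.log(withNightly.map(spec => spec.shortname));
console.log(urls);
```

Option 1 suits tools that need to fetch the spec itself, since the root `url` of an ISO entry only points to a page that describes the spec.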

The first ISO spec added to the list is JPEG XL, which will appear as:

```json
{
  "url": "https://www.iso.org/standard/85253.html",
  "seriesComposition": "full",
  "shortname": "iso18181-2",
  "series": {
    "shortname": "iso18181-2",
    "currentSpecification": "iso18181-2",
    "title": "Information technology — JPEG XL image coding system — Part 2: File format",
    "shortTitle": "JPEG XL: File Format"
  },
  "shortTitle": "JPEG XL: File Format",
  "organization": "ISO/IEC",
  "groups": [
    {
      "name": "ISO/IEC JTC 1/SC 29",
      "url": "https://www.iso.org/committee/45316.html"
    }
  ],
  "title": "Information technology — JPEG XL image coding system — Part 2: File format",
  "source": "specref",
  "categories": [
    "browser"
  ],
  "standing": "good"
}
```
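
As a quick sanity check, the entry above can be compared against the list of required properties in the updated `schema/index.json`. The sketch below is only an illustration: `findMissingProperties` is a hypothetical helper, not a function of browser-specs, and real validation would rather go through a JSON Schema validator.

```javascript
// Required properties, copied from the updated schema/index.json in this
// commit (note that "nightly" is no longer in the list).
const requiredProperties = [
  "url", "shortname", "series", "seriesComposition",
  "title", "shortTitle", "source", "organization", "groups", "categories",
  "standing"
];

// Hypothetical helper: returns the names of the required properties that
// an entry lacks.
function findMissingProperties(entry) {
  return requiredProperties.filter(prop => !(prop in entry));
}

// The JPEG XL entry shown above ("\u2014" is the dash used in the title).
const title = "Information technology \u2014 JPEG XL image coding system \u2014 Part 2: File format";
const jpegXL = {
  url: "https://www.iso.org/standard/85253.html",
  seriesComposition: "full",
  shortname: "iso18181-2",
  series: {
    shortname: "iso18181-2",
    currentSpecification: "iso18181-2",
    title,
    shortTitle: "JPEG XL: File Format"
  },
  shortTitle: "JPEG XL: File Format",
  organization: "ISO/IEC",
  groups: [
    { name: "ISO/IEC JTC 1/SC 29", url: "https://www.iso.org/committee/45316.html" }
  ],
  title,
  source: "specref",
  categories: ["browser"],
  standing: "good"
};

console.log(findMissingProperties(jpegXL)); // prints []
console.log("nightly" in jpegXL);           // prints false
```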

Worth noting: without a `nightly` property, there is no place to store the
status of the spec, which could in theory be "Under development" or "Published"
for ISO specs (the ISO process has additional stages, but they are probably not
worth capturing in any case). An alternative would be to have a
`nightly.status` property without a `nightly.url`, but that seems clumsy.
tidoust authored Feb 6, 2024
1 parent 0b11216 commit 003c904
Showing 18 changed files with 208 additions and 46 deletions.
12 changes: 9 additions & 3 deletions README.md
@@ -145,7 +145,10 @@ Each specification in the list comes with the following properties:

The versioned (but not dated) URL for the spec. For W3C specs published as
TR documents, this is the TR URL. For WHATWG specs, this is the URL of the
-living standard. In other cases, this is the URL of the latest Editor's Draft.
+living standard. For specs developed by an organization that does not provide
+a public version of the spec such as ISO, this is the URL of the page that
+describes the spec on the organization's web site. In other cases, this is the
+URL of the latest Editor's Draft.

The URL should be relatively stable but may still change over time. See
[Spec identifiers](#spec-identifiers) for details.
@@ -339,7 +342,8 @@ for all specifications in the CSS Fonts series.
For specs that are not part of a series of specs, this matches the
[`nightly.url`](#nightlyurl) property.

-The `nightlyUrl` property is always set.
+The `nightlyUrl` property is always set when the [`nightly`](#nightly) property
+is set.


### `seriesVersion`
@@ -466,7 +470,9 @@ The `pages` property is only set for specs identified as multipage specs.
An object that represents the latest Editor's Draft of the spec, or the living
standard when the concept of Editor's Draft does not exist.

-The `nightly` property is always set.
+The `nightly` property is always set unless the spec does not have a public
+version available through a URL. For instance, ISO specs are not publicly
+available.


#### `nightly.url`
2 changes: 1 addition & 1 deletion schema/index.json
@@ -29,7 +29,7 @@
"formerNames": { "$ref": "definitions.json#/$defs/formerNames" }
},
"required": [
-"url", "shortname", "series", "seriesComposition", "nightly",
+"url", "shortname", "series", "seriesComposition",
"title", "shortTitle", "source", "organization", "groups", "categories",
"standing"
],
5 changes: 5 additions & 0 deletions specs.json
@@ -664,6 +664,11 @@
"https://wicg.github.io/webpackage/loading.html",
"https://wicg.github.io/webusb/",
"https://wicg.github.io/window-controls-overlay/",
+{
+"url": "https://www.iso.org/standard/85253.html",
+"shortname": "iso18181-2",
+"shortTitle": "JPEG XL: File Format"
+},
"https://www.rfc-editor.org/rfc/rfc2397",
{
"url": "https://www.rfc-editor.org/rfc/rfc4120",
21 changes: 13 additions & 8 deletions src/build-index.js
@@ -240,6 +240,7 @@ async function runInfo(specs) {
// level to the nightly URL when it's not already there (note the resulting
// URL should always exist given the way the CSS drafts server is setup)
if (res.seriesVersion &&
+res.nightly &&
res.nightly.url.match(/\/drafts\.(?:csswg|fxtf|css-houdini)\.org/) &&
!res.nightly.url.match(/\d+\/$/)) {
res.nightly.url = res.nightly.url.replace(/\/$/, `-${res.seriesVersion}/`);
@@ -278,14 +279,16 @@ async function runInfo(specs) {

// If we're reusing last published discontinued info,
// forget alternate URLs and rebuild them from scratch.
-if (res.__last?.standing === 'discontinued' &&
-(!res.standing || res.standing === 'discontinued')) {
-res.nightly.alternateUrls = [];
-}
-else if (!res.nightly.alternateUrls) {
-res.nightly.alternateUrls = [];
+if (res.nightly) {
+if (res.__last?.standing === 'discontinued' &&
+(!res.standing || res.standing === 'discontinued')) {
+res.nightly.alternateUrls = [];
+}
+else if (!res.nightly.alternateUrls) {
+res.nightly.alternateUrls = [];
+}
+res.nightly.alternateUrls = res.nightly.alternateUrls.concat(computeAlternateUrls(res));
}
-res.nightly.alternateUrls = res.nightly.alternateUrls.concat(computeAlternateUrls(res));

return res;
});
@@ -350,7 +353,9 @@ async function runFilename(index, { previousIndex, log }) {

async function checkSpec(spec) {
log(`- find filenames for ${spec.shortname}`);
-spec.nightly.filename = spec.nightly.filename ?? await determineSpecFilename(spec, "nightly");
+if (spec.nightly) {
+spec.nightly.filename = spec.nightly.filename ?? await determineSpecFilename(spec, "nightly");
+}
if (spec.release) {
spec.release.filename = spec.release.filename ?? await determineSpecFilename(spec, "release");
}
3 changes: 2 additions & 1 deletion src/check-base-url.js
@@ -17,7 +17,8 @@ const problems = specs
// A subset of the IETF RFCs are crawled from their httpwg.org rendering
// see https://github.com/tobie/specref/issues/672 and
// https://github.com/w3c/browser-specs/issues/280
-.filter(s => !s.nightly.url.startsWith('https://httpwg.org') &&
+.filter(s => s.nightly &&
+!s.nightly.url.startsWith('https://httpwg.org') &&
!s.nightly.url.startsWith('https://www.ietf.org/') &&
!s.nightly.url.startsWith('https://dcthetall.github.io/CHIPS-spec/'))
.filter(s => (s.release && s.url !== s.release.url) || (!s.release && s.url !== s.nightly.url))
8 changes: 5 additions & 3 deletions src/compute-repository.js
@@ -39,7 +39,7 @@ function getFirstFoundInTree(paths, ...items) {
* info won't include the source file).
*/
module.exports = async function (specs, options) {
-if (!specs || specs.find(spec => !spec.nightly || !spec.nightly.url)) {
+if (!specs) {
throw "Invalid list of specifications passed as parameter";
}
options = options || {};
@@ -177,7 +177,9 @@ module.exports = async function (specs, options) {
}

// Compute GitHub repositories with lowercase owner names
-const repos = specs.map(spec => parseSpecUrl(spec.nightly.repository ?? spec.nightly.url));
+const repos = specs.map(spec => spec.nightly ?
+parseSpecUrl(spec.nightly.repository ?? spec.nightly.url) :
+null);

if (options.githubToken) {
// Fetch the real name of repository owners (preserving case)
@@ -201,7 +203,7 @@ module.exports = async function (specs, options) {
}
}
}
-else if (spec.nightly.url.match(/\/httpwg\.org\//)) {
+else if (spec.nightly?.url.match(/\/httpwg\.org\//)) {
const draftName = spec.nightly.url.match(/\/(draft-ietf-(.+))\.html$/);
spec.nightly.repository = 'https://github.com/httpwg/http-extensions';
spec.nightly.sourcePath = `${draftName[1]}.md`;
6 changes: 5 additions & 1 deletion src/compute-shortname.js
@@ -175,7 +175,11 @@ function completeWithSeriesAndLevel(shortname, url, forkOf) {
// Shortnames of WebGL extensions sometimes end up with digits which are *not*
// to be interpreted as level numbers. Similarly, shortnames of ECMA specs
// typically have the form "ecma-ddd", and "ddd" is *not* a level number.
-if (seriesBasename.match(/^ecma-/) || url.match(/^https:\/\/registry\.khronos\.org\/webgl\/extensions\//)) {
+// And that's the same for ISO standards which end with plenty of non-level
+// digits, as in "iso18181-2".
+if (seriesBasename.match(/^ecma-/) ||
+seriesBasename.startsWith("iso") ||
+url.match(/^https:\/\/registry\.khronos\.org\/webgl\/extensions\//)) {
return {
shortname: specShortname,
series: { shortname: seriesBasename }
4 changes: 2 additions & 2 deletions src/compute-standing.js
@@ -20,7 +20,7 @@ const unofficialStatuses = [
* the spec.
*/
module.exports = function (spec) {
-if (!spec || !spec.nightly?.status) {
+if (!spec) {
throw "Invalid spec object passed as parameter";
}

@@ -29,7 +29,7 @@ module.exports = function (spec) {
return spec.standing;
}

-const status = spec.release?.status ?? spec.nightly.status;
+const status = spec.release?.status ?? spec.nightly?.status;
if (status === "Discontinued Draft") {
return "discontinued";
}
2 changes: 1 addition & 1 deletion src/determine-testpath.js
@@ -151,7 +151,7 @@ module.exports = async function (specs, options) {
return info;
}

-if (spec.url.startsWith("https://tc39.es/proposal-")) {
+if (spec.url.startsWith("https://tc39.es/proposal-") || !spec.nightly) {
// TODO: proposals may or may not have tests under tc39/test262, it would
// be good to have that info here. However, that seems hard to assess
// automatically and tedious to handle as exceptions in specs.json.
66 changes: 66 additions & 0 deletions src/fetch-groups.js
@@ -10,6 +10,62 @@ const Octokit = require("./octokit");
const parseSpecUrl = require("./parse-spec-url.js");


+/**
+* Retrieve the information about the exact organization and the group that
+* develops an ISO specification from the description page on the ISO web site.
+*
+* The group's name and URL appear in the page under the General Information
+* heading.
+*
+* Note: It would be better to use Puppeteer to parse the HTML, but that seems
+* a bit overkill (and would not be future proof either since the HTML page
+* does not contain good non-text anchors to extract the info we need.
+*/
+async function setISOGroupFromPage(spec, options) {
+const res = await fetch(spec.url, options);
+if (res.status !== 200) {
+throw new Error(`Could not retrieve ISO page ${spec.url}, status code is ${res.status}`);
+}
+const html = await res.text();
+let startPos = html.indexOf('<h3>General information</h3>');
+if (startPos === -1) {
+throw new Error(`Cannot find General information heading in ISO page ${spec.url}`);
+}
+startPos = html.indexOf('Technical Committee&nbsp;:', startPos);
+if (startPos === -1) {
+throw new Error(`Cannot find technical committee information in ISO page ${spec.url}`);
+}
+startPos = html.indexOf('<a ', startPos);
+if (startPos === -1) {
+throw new Error(`Cannot find technical committee anchor in ISO page ${spec.url}`);
+}
+startPos = html.indexOf('href="', startPos);
+let endPos = html.indexOf('"', startPos + 'href="'.length);
+if (startPos === -1 || endPos === -1) {
+throw new Error(`Cannot find technical committee href in ISO page ${spec.url}`);
+}
+const groupUrl = html.substring(startPos + 'href="'.length, endPos);
+startPos = html.indexOf('>', endPos);
+endPos = html.indexOf('<', startPos);
+if (startPos === -1 || endPos === -1) {
+throw new Error(`Cannot find technical committee name in ISO page ${spec.url}`);
+}
+const groupName = html.substring(startPos + 1, endPos).trim();
+
+if (groupName.startsWith('ISO/IEC')) {
+spec.organization = 'ISO/IEC';
+}
+else {
+spec.organization = 'ISO';
+}
+
+spec.groups = [{
+name: groupName,
+url: (new URL(groupUrl, 'https://www.iso.org')).href
+}];
+}
+
+
/**
* Exports main function that takes a list of specs (with a url property)
* as input, completes entries with an "organization" property that contains the
@@ -82,6 +138,14 @@ module.exports = async function (specs, options) {
Unknown group type found in https://datatracker.ietf.org/doc/${ietfName[1]}/doc.json`);
}
}
+
+// For ISO documents, retrieve the group info from the HTML page
+// (NB: it would be cleaner to use Puppeteer here)
+const isoName = spec.url.match(/https:\/\/www\.iso\.org\//);
+if (isoName) {
+await setISOGroupFromPage(spec, options);
+}
+
if (!spec.groups) {
throw new Error(`Cannot extract any useful info from ${spec.url}`);
}
@@ -137,6 +201,8 @@ module.exports = async function (specs, options) {
}]
}

+
+
// All specs that remain should be developed by some W3C group.
spec.organization = spec.organization ?? "W3C";

28 changes: 24 additions & 4 deletions src/fetch-info.js
@@ -265,10 +265,19 @@ async function fetchInfoFromSpecref(specs, options) {
specrefStatusMapping[info.status] ??
info.status ??
"Editor's Draft";
-results[name] = {
-nightly: { url: nightly, status },
-title: info.title
-};
+if (nightly?.startsWith("https://www.iso.org/")) {
+// The URL is to a page that describes the spec, not to the spec
+// itself (ISO specs are not public).
+results[name] = {
+title: info.title
+}
+}
+else {
+results[name] = {
+nightly: { url: nightly, status },
+title: info.title
+};
+}
}
});
});
@@ -530,6 +539,17 @@ async function fetchInfoFromSpecs(specs, options) {
};
}
}
+else if (spec.url.startsWith("https://www.iso.org/")) {
+const isoTitle = await page.evaluate(_ => {
+const meta = document.querySelector('head meta[property="og:description"]');
+return meta ? meta.getAttribute('content').trim() : null;
+});
+if (isoTitle) {
+return {
+title: isoTitle
+};
+}
+}

const titleAndStatus = await page.evaluate(_ => {
// Extract first heading when set
2 changes: 1 addition & 1 deletion src/find-specs.js
@@ -71,7 +71,7 @@ const hasMoreRecentLevel = (s, url, loose) => {
return false;
}
};
-const hasUntrackedURL = ({spec: url}) => !specs.find(s => s.nightly.url.startsWith(trimSlash(url))
+const hasUntrackedURL = ({spec: url}) => !specs.find(s => s.nightly?.url.startsWith(trimSlash(url))
|| (s.release && trimSlash(s.release.url) === trimSlash(url)))
&& !specs.find(s => hasMoreRecentLevel(s, url, url.match(/\/drafts\./) && !url.match(/\/w3\.org/) // Because CSS specs have editors draft with and without levels, we look loosely for more recent levels when checking with editors draft
));
6 changes: 6 additions & 0 deletions test/compute-repository.js
@@ -80,6 +80,12 @@ describe("compute-repository module", async () => {
"https://github.com/httpwg/http-extensions");
});

+it("handles specs without nightly URLs", async () => {
+const spec = { url: "https://www.iso.org/standard/85253.html" };
+const result = await computeRepo([spec]);
+assert.equal(result[0].nightly, undefined);
+});
+
it("returns null when repository cannot be derived from URL", async () => {
assert.equal(
await computeSingleRepo("https://example.net/repoless"),
8 changes: 8 additions & 0 deletions test/compute-shortname.js
@@ -178,6 +178,10 @@ describe("compute-shortname module", () => {
assertSeries("ecma-402", "ecma-402");
});

+it("preserves ISO spec numbers", () => {
+assertSeries("iso18181-2", "iso18181-2");
+});
+
it("preserves digits at the end of WebGL extension names", () => {
assertSeries("https://registry.khronos.org/webgl/extensions/EXT_wow32/", "EXT_wow32");
});
@@ -242,6 +246,10 @@ describe("compute-shortname module", () => {
assertNoSeriesVersion("ecma-402");
});

+it("does not confuse an ISO spec number with a series version", () => {
+assertNoSeriesVersion("iso18181-2");
+});
+
it("does not confuse digits at the end of a WebGL extension spec with a series version", () => {
assertNoSeriesVersion("https://registry.khronos.org/webgl/extensions/EXT_wow32/");
});
17 changes: 5 additions & 12 deletions test/compute-standing.js
@@ -43,23 +43,16 @@ describe("compute-standing module", () => {
assert.strictEqual(computeStanding(spec), "discontinued");
});

+it("returns `good` for an ISO spec", function () {
+const spec = { url: "https://www.iso.org/standard/85253.html" };
+assert.strictEqual(computeStanding(spec), "good");
+});
+
it("returns the standing that the spec says it has", function () {
const spec = {
standing: "good",
nightly: { status: "Unofficial Proposal Draft" }
};
assert.strictEqual(computeStanding(spec), "good");
});
-
-it("throws if spec object is empty", () => {
-assert.throws(
-() => computeStanding({}),
-/^Invalid spec object passed as parameter$/);
-});
-
-it("throws if spec object does not have a nightly.status property", () => {
-assert.throws(
-() => computeStanding({ url: "https://example.org/" }),
-/^Invalid spec object passed as parameter$/);
-});
});