schema/ListedLicense: Punt license text into <text> #452

wking · 2017-10-19T22:32:10Z

As it stands, there's not an easy way to extract the license text from the XML format without removing <crossRefs>, <notes>, <standardLicenseHeader>, and other siblings. With this change (which, if accepted, I still have to propagate into src/) we collect it in <text>, just like we already collect the header text in <standardLicenseHeader>.

This is a WIP, because of the pending src/ migration. If this PR gets a green light, I'll go ahead and migrate src/ before we merge it.
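
To illustrate the motivation, here is a minimal sketch of how a consumer could pull the license body out once it is wrapped in <text>. The sample document is hypothetical and simplified (real files carry a namespace and more metadata), and `extract_license_text` is an illustrative helper, not part of any SPDX tool.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified license document. Only the element names
# (<crossRefs>, <notes>, <standardLicenseHeader>, <text>) come from this PR.
SAMPLE = """\
<SPDXLicenseCollection>
  <license licenseId="Example-1.0" name="Example License">
    <crossRefs><crossRef>https://example.com/license</crossRef></crossRefs>
    <notes>Not part of the license text.</notes>
    <standardLicenseHeader>Licensed under Example-1.0.</standardLicenseHeader>
    <text>
      <p>Permission is granted to use this software.</p>
    </text>
  </license>
</SPDXLicenseCollection>
"""


def extract_license_text(xml_source: str) -> str:
    """Return the whitespace-normalized content of <license><text>."""
    root = ET.fromstring(xml_source)
    text = root.find("./license/text")
    if text is None:
        raise ValueError("no <text> element found")
    # itertext() walks only the descendants of <text>, so sibling
    # elements such as <notes> never need to be removed first.
    return " ".join("".join(text.itertext()).split())


license_text = extract_license_text(SAMPLE)
```

With the wrapper in place, the siblings are skipped by selecting a single element rather than by deleting everything else.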

wking · 2017-10-19T23:19:55Z

I've added a WIP commit showing the changes to Apache-2.0 and u-boot-exception-2.0. If you remove all the other files you can use gulp validate to confirm the changes pass the new schema.

The schema change differentiates between formattedAltText..., which may contain <alt> and <optional>, and formattedFixedText..., which may not.

The optionalType uses formattedAltTextGroup, because nesting variable content inside an optional element can be useful [1,2]. The <alt> element, on the other hand, specifies all the variable options in its match attribute, so there's no need for nesting variable elements inside the alt body.

It also requires fixed text for <notes>, since those aren't templates, and drops the unused notesType, since notes are vanilla formatted text.

We almost certainly want to drop titleText and copyrightText from the formatted*TextGroup choices, but I'm leaving that for [3].

[1]: spdx#393
[2]: spdx#462
[3]: spdx#452
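
As a rough illustration of the nesting rule described above, the following sketch flags variable elements (<alt>, <optional>) that appear inside elements meant to hold fixed text only (<alt>, <notes>). It is not the schema itself: the function name is hypothetical, and treating every descendant (not just direct children) as a violation is an assumption.

```python
import xml.etree.ElementTree as ET

VARIABLE_TAGS = {"alt", "optional"}       # template (variable) elements
FIXED_ONLY_PARENTS = {"alt", "notes"}     # bodies that must be fixed text


def find_nesting_violations(xml_source: str) -> list:
    """Return (parent_tag, offending_tag) pairs for disallowed nesting."""
    root = ET.fromstring(xml_source)
    violations = []
    for parent in root.iter():
        if parent.tag not in FIXED_ONLY_PARENTS:
            continue
        for descendant in parent.iter():
            # iter() yields the element itself first; skip it.
            if descendant is not parent and descendant.tag in VARIABLE_TAGS:
                violations.append((parent.tag, descendant.tag))
    return violations


# <optional> may contain <alt>: no violation expected.
ALLOWED = (
    "<text><optional>Copyright "
    "<alt name='year' match='.+'>2017</alt></optional></text>"
)
# <alt> may not contain <optional>: one violation expected.
REJECTED = (
    "<text><alt name='owner' match='.+'>someone "
    "<optional>else</optional></alt></text>"
)

allowed_result = find_nesting_violations(ALLOWED)
rejected_result = find_nesting_violations(REJECTED)
```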

wking · 2017-11-08T05:05:47Z

Rebased around #475 with 619e2f7 → 3eee08f.

goneall · 2017-11-15T04:40:08Z

I think this is a good improvement (it will simplify some of the code in the tools).

One consideration is the name of the field. In the JSON and RDF/XML formats of the license data, the license text uses a property name of licenseText (see the Accessing SPDX License Document for reference). The exceptions use licenseExceptionText.

For consistency, we could use the same terms.

wking · 2017-11-15T05:05:02Z

> One consideration is the name of the field. In the JSON and RDF/XML formats of the license data, the license text uses a property name of licenseText (see the Accessing SPDX License Document for reference). The exceptions use licenseExceptionText.

I'm generally in favor of simple names (like text) where the full context is available via the element context. So <license><text>…</text></license> would be license text and <exception><text>…</text></exception> would be exception text. But I'm not going to block on that preference; if you don't find my reasoning convincing and really prefer licenseText and licenseExceptionText, let me know, and I'll update this PR.

goneall · 2017-11-15T05:13:58Z

> I'm generally in favor of simple names (like text)

I was also thinking of the advantages of a simpler name. It would actually be easier to implement with a simple name since there is common code for parsing the license and exception XML.

The history on the longer names comes from the use in tag/value which doesn't always have a hierarchical structure to pull context from. Since we don't really have that issue here, I'm leaning toward text myself. Let's just confirm on the legal call on 21 Nov. to make sure no one has any heartburn over this approach. We can also close on adding this in before the release.

bradleeedmondson · 2017-11-15T14:50:07Z

> I'm generally in favor of simple names (like text) where the full context is available via the element context. So … would be license text and … would be exception text.

+1 for this approach, as long as we're sure this doesn't introduce ambiguity with other formats downstream from the XML. In other words, the RDF/JSON/etc. still has the complete context.

wking · 2017-11-15T15:03:08Z

> ...as long as we're sure this doesn't introduce ambiguity with other formats downstream from the XML.

Whatever tool is converting from XML can translate tag names too, so I don't think this will be a problem.
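
A minimal sketch of that translation, assuming a converter that maps the short XML tag back to the longer property names mentioned earlier in this thread (licenseText and licenseExceptionText). The function and mapping names are hypothetical, and the sample elements are simplified.

```python
import xml.etree.ElementTree as ET

# The parent element supplies the context that the short <text> name omits.
TEXT_PROPERTY = {
    "license": "licenseText",
    "exception": "licenseExceptionText",
}


def to_json_record(element: ET.Element) -> dict:
    """Map <license>/<exception> with a <text> child to a JSON-style dict."""
    text = element.find("text")
    if text is None:
        raise ValueError(f"<{element.tag}> has no <text> child")
    body = " ".join("".join(text.itertext()).split())
    return {TEXT_PROPERTY[element.tag]: body}


license_record = to_json_record(
    ET.fromstring("<license><text>Some license body</text></license>")
)
exception_record = to_json_record(
    ET.fromstring("<exception><text>Some exception body</text></exception>")
)
```

Because the parent tag is always known while walking the XML, the downstream property names stay unambiguous even though both elements use the same child name.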

goneall · 2017-11-15T17:31:46Z

> as long as we're sure this doesn't introduce ambiguity with other formats downstream from the XML.

I'll make sure the tool retains the same format when translating to the JSON and other formats.

wking · 2017-12-29T18:37:06Z

Rebased around #519's required addition with 3eee08f → a31d982.

bradleeedmondson · 2018-01-12T22:47:13Z

Marking for 3.1 per @wking and @goneall request

wking · 2018-01-12T22:57:32Z

I can rebase this and inject the remaining <text> tags whenever. Let me know when would be easiest for review.

goneall · 2018-01-13T22:06:46Z

> Let me know when would be easiest for review.

Two considerations on timing I can think of - possible merge conflicts and impact on the tooling.

I'm thinking we start rebasing and reviewing soon.

For the possible merge conflicts with existing PRs: most of the PRs are @wking's, and I can take care of any conflict with #570, so those won't be an issue. The other PRs are #587, #551 and #489. This seems a small enough set to be manageable.

For the tooling: the tool which tests the license texts will need to be updated for this XML schema change. I suggest we review and merge these changes before we enable the license text testing in PR #593. This way, I can update the tool before we start using it.

As it stands, there's not an easy way to extract the license text from the XML format without removing <crossRefs>, <notes>, <standardLicenseHeader>, and other siblings. With this change (which, if accepted, I still have to propagate into src/) we collect it in <text>, just like we already collect the header text in <standardLicenseHeader>.

Also use <all> instead of <choice> for <LicenseType> children. With <choice>, <crossRefs> could have occurred multiple times, etc. With <all>, it can only occur once, although the children can still appear in any order (we'd use <sequence> if we cared about child order).
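
The effect of switching from <choice> to <all> can be approximated outside the schema with a short check: each child may appear at most once, in any order. This is only a sketch of that constraint, not a replacement for schema validation, and `repeated_children` is a hypothetical helper.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def repeated_children(license_element: ET.Element) -> list:
    """Return the sorted tags that occur more than once as direct children."""
    counts = Counter(child.tag for child in license_element)
    return sorted(tag for tag, count in counts.items() if count > 1)


# Any order is fine as long as nothing repeats.
any_order = ET.fromstring(
    "<license><text>body</text><notes>n</notes><crossRefs/></license>"
)
# A second <crossRefs> would be rejected under <all>.
duplicated = ET.fromstring(
    "<license><crossRefs/><text>body</text><crossRefs/></license>"
)

any_order_result = repeated_children(any_order)
duplicated_result = repeated_children(duplicated)
```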


wking · 2018-01-15T18:58:23Z

Rebased onto master and added <text> to all the other licenses and exceptions with a31d982 → c478ba1. I've removed the “WIP” from the PR subject, and this is ready for review and merging.

goneall

Whew! All looks good.

goneall · 2018-01-16T03:10:35Z

Thanks @wking !

wking force-pushed the license-text-schema branch from 403d643 to c8a395d October 19, 2017 22:38

wking force-pushed the license-text-schema branch from c8a395d to 619e2f7 October 19, 2017 23:20

wking mentioned this pull request Oct 21, 2017

OFL-1.1 standard license header #451

Closed

wking mentioned this pull request Nov 6, 2017

schema/ListedLicense: Do not allow <alt> or <optional> inside <alt> #475

Merged

wking force-pushed the license-text-schema branch from 619e2f7 to 3eee08f November 8, 2017 05:05

jlovejoy added the technical issue label Dec 19, 2017

jlovejoy added this to the Later Release milestone Dec 21, 2017

zvr added the XML schema change label Dec 21, 2017

This was referenced Dec 26, 2017

New GNU identifiers #553

Merged

*: Collapse to a single note entry #566

Merged

wking force-pushed the license-text-schema branch from 3eee08f to a31d982 December 29, 2017 18:35

wking mentioned this pull request Dec 29, 2017

Allow standardLicenseHeader under text #581

Merged

This was referenced Jan 9, 2018

WIP: SPDX template matching licensee/licensee#254

Closed

GPL-3.0*: Copy the standard header from the license body #578

Merged

bradleeedmondson modified the milestones: Later Release, 3.1 release Jan 12, 2018

src: Port to <text>

c478ba1

Catch up with the previous commit's schema change.

wking force-pushed the license-text-schema branch from a31d982 to c478ba1 January 15, 2018 18:57

wking changed the title ~~WIP: schema/ListedLicense: Punt license text into <text>~~ schema/ListedLicense: Punt license text into <text> Jan 15, 2018

goneall approved these changes Jan 16, 2018

goneall merged commit dc18392 into spdx:master Jan 16, 2018

wking deleted the license-text-schema branch January 16, 2018 04:21

This was referenced Jan 16, 2018

Add back LLVM exception for release 3.1 #570

Merged

Makefile: Use Travis to run canonical-matching tests #593

Merged

This was referenced Jan 16, 2018

schema/ListedLicense: Drop isDeprecated #583

Closed

schema/ListedLicense: Document our XML semantics #586

Merged

AGPL-1.0: Deprecate AGPL-1.0 in favor of -only and -or-later forms #599

Merged
