Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

schema/ListedLicense: Punt license text into <text> #452

Merged
merged 2 commits into from
Jan 16, 2018

Conversation

wking
Copy link
Contributor

@wking wking commented Oct 19, 2017

As it stands, there's not an easy way to extract the license text from the XML format without removing <crossRefs>, <notes>, <standardLicenseHeader>, and other siblings. With this change (which, if accepted, I still have to propagate into src/) we collect it in <text>, just like we already collect the header text in <standardLicenseHeader>.

This is a WIP, because of the pending src/ migration. If this PR gets a green light, I'll go ahead and migrate src/ before we merge it.

@wking
Copy link
Contributor Author

wking commented Oct 19, 2017

I've added a WIP commit showing the changes to Apache-2.0 and u-boot-exception-2.0. If you remove all the other files you can use gulp validate to confirm the changes pass the new schema.

wking added a commit to wking/license-list-XML that referenced this pull request Nov 6, 2017
By differentiating between formattedAltText..., which may contain
<alt> and <optional>, and formattedFixedText..., which may not.  The
optionalType is using formattedAltTextGroup, because nesting variable
content inside an optional element can be useful [1,2].  The <alt>
element, on the other hand, specifies all the variable options in its
match attribute, so there's no need for nesting variable elements
inside the alt body.

Also require fixed text for <notes>, since those aren't templates.
And drop the unused notesType, since they're vanilla formatted text.

We almost certainly want to drop titleText and copyrightText fro the
formatted*TextGroup choices, but I'm leaving that for [3].

[1]: spdx#393
[2]: spdx#462
[3]: spdx#452
@wking
Copy link
Contributor Author

wking commented Nov 8, 2017

Rebased around #475 with 619e2f73eee08f.

@goneall
Copy link
Member

goneall commented Nov 15, 2017

I think this is a good improvement (it will simplify some of the code in the tools).

One consideration is the name of the field. In the JSON and RDF/XML formats of the license data, the license text uses a property name of licenseText (see the Accessing SPDX License Document for reference). The exceptions use licenseExceptionText.

For consistency, we could use the same terms.

@wking
Copy link
Contributor Author

wking commented Nov 15, 2017

One consideration is the name of the field. In the JSON and RDF/XML formats of the license data, the license text uses a property name of licenseText (see the Accessing SPDX License Document for reference). The exceptions use licenseExceptionText.

I'm generally in favor of simple names (like text) where the full context is available via the element context. So <license><text>…</text></license> would be license text and <exception><text>…</text></exception> would be exception text. But I'm not going to block on that preference; if you don't find my reasoning convincing and really prefer licenseText and licenseExceptionText, let me know, and I'll update this PR.

@goneall
Copy link
Member

goneall commented Nov 15, 2017

I'm generally in favor of simple names (like text)

I was also thinking of the advantages of a simpler name. It would actually be easier to implement with a simple name since there is common code for parsing the license and exception XML.

The history on the longer names comes from the use in tag/value which doesn't always have a hierarchical structure to pull context from. Since we don't really have that issue here, I'm leaning toward text myself. Let's just confirm on the legal call on 21 Nov. to make sure no one has any heartburn over this approach. We can also close on adding this in before the release.

@bradleeedmondson
Copy link
Contributor

I'm generally in favor of simple names (like text) where the full context is available via the element context. So … would be license text and … would be exception text.

+1 for this approach, as long as we're sure this doesn't introduce ambiguity with other formats downstream from the XML. In other words, the RDF/JSON/etc. still has the complete context.

@wking
Copy link
Contributor Author

wking commented Nov 15, 2017

...as long as we're sure this doesn't introduce ambiguity with other formats downstream from the XML.

Whatever tool is converting from XML can translate tag names too, so I don't think this will be a problem.

@goneall
Copy link
Member

goneall commented Nov 15, 2017

as long as we're sure this doesn't introduce ambiguity with other formats downstream from the XML.

I'll make sure the tool retains the same format when translating to the JSON and other formats.

@wking
Copy link
Contributor Author

wking commented Dec 29, 2017

Rebased around #519's required addition with 3eee08fa31d982.

@bradleeedmondson
Copy link
Contributor

Marking for 3.1 per @wking and @goneall request

@wking
Copy link
Contributor Author

wking commented Jan 12, 2018

I can rebase this and inject the remaining <text> tags whenever. Let me know when would be easiest for review.

@goneall
Copy link
Member

goneall commented Jan 13, 2018

Let me know when would be easiest for review.

Two considerations on timing I can think of - possible merge conflicts and impact on the tooling.

I'm thinking we start rebasing and reviewing soon.

For the possible merge conflicts with existing PR's: Most of the PR's are @wking and I can take care of any conflict with #570 so those won't be an issue. The other PR's are #587 , #551 and #489. This seems a small enough set to be manageable.

For the tooling: The tool which tests the licenses texts will need to be updated with this XML schema change. I suggest we review and merge these changes before we enable the license text testing in PR #593 This way, I can update the tool before we start using it.

As it stands, there's not an easy way to extract the license text from
the XML format without removing <crossRefs>, <notes>,
<standardLicenseHeader>, and other siblings.  With this change (which,
if accepted, I still have to propagate into src/) we collect it in
<text>, just like we already collect the header text in
<standardLicenseHeader>.

Also use <all> instead of <choice> for <LicenseType> children.  With
that, <crossRefs> could have occured multiple times, etc.  With this
change, it can only occur once, although the children can still appear
in any order (we'd use <sequence> if we cared about child order).
Catch up with the previous commit's schema change.
@wking wking changed the title WIP: schema/ListedLicense: Punt license text into <text> schema/ListedLicense: Punt license text into <text> Jan 15, 2018
@wking
Copy link
Contributor Author

wking commented Jan 15, 2018

Rebased onto master and added <text> to all the other licenses and exceptions with a31d982c478ba1. I've removed the “WIP” from the PR subject, and this is ready for review and merging.

Copy link
Member

@goneall goneall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whew! All looks good.

@goneall
Copy link
Member

goneall commented Jan 16, 2018

Thanks @wking !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants