This repository has been archived by the owner on Jun 18, 2024. It is now read-only.

Deprecate or clarify use of accessURL/format outside of the distribution array #217

Closed
JoshData opened this issue Dec 5, 2013 · 11 comments

Comments

@JoshData
Contributor

JoshData commented Dec 5, 2013

If you have two accessURLs, should you do this:

{
   "accessURL": "url1",
   "distributions": [
     { "accessURL": "url2" }
   ]
}

or

{
   "accessURL": "url1",
   "distributions": [
     { "accessURL": "url1" },
     { "accessURL": "url2" }
   ]
}
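
To make the difference between the two forms concrete, here is a minimal Python sketch that turns the first form into the second by repeating the top-level accessURL inside the list. The function name is hypothetical, and the key defaults to "distributions" only because that is what the examples above use; the schema excerpt quoted later in this thread spells it "distribution".

```python
import copy


def repeat_top_level_access_url(dataset, key="distributions"):
    """Return a copy of `dataset` whose top-level accessURL also appears in the list under `key`.

    Hypothetical helper for illustration only; not part of any schema tooling.
    """
    result = copy.deepcopy(dataset)  # leave the caller's dict untouched
    top_url = result.get("accessURL")
    if top_url is None:
        return result
    entries = result.setdefault(key, [])
    # Only add the top-level URL if no entry already carries it.
    if not any(entry.get("accessURL") == top_url for entry in entries):
        entries.insert(0, {"accessURL": top_url})
    return result


first = {"accessURL": "url1", "distributions": [{"accessURL": "url2"}]}
second = repeat_top_level_access_url(first)
print(second)
```

Applied to the first example, this yields the second one; applied to the second, it changes nothing.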
@kvuppala

kvuppala commented Dec 5, 2013

I would assume distribution to have everything, and the primary one can be used at the root level.

Below are the usage instructions for the field from the POD schema page.

Field: distribution
Cardinality: (0,n)
Required: No
Accepted Values: See Usage Notes
Usage Notes: Distribution is a concatenation, as appropriate, of the following elements: accessURL and format. If an entry has only one dataset, enter details for that one; if it has multiple datasets (such as a bulk download and an API), separate entries as seen below:

"distribution": [
   {
      "accessURL": "https://explore.data.gov/views/ykv5-fn9t/rows.csv?accessType=DOWNLOAD",
      "format": "text/csv"
   },
   {
      "accessURL": "https://explore.data.gov/views/ykv5-fn9t/rows.json?accessType=DOWNLOAD",
      "format": "application/json"
   },
   {
      "accessURL": "https://explore.data.gov/views/ykv5-fn9t/rows.xml?accessType=DOWNLOAD",
      "format": "text/xml"
   }
]

@mhogeweg
Contributor

mhogeweg commented Dec 5, 2013

+1 on approach 2. One thought on the distribution element: if you look at the results in data.gov, many items have 10 or more links, many of which point not to a downloadable file but to a web page, an application (another web page), or a web service. It would help client app developers tremendously if the accessURL could be qualified: download, online access, info page, organization page, ... (some vocabulary to be defined).

@gbinal
Contributor

gbinal commented Dec 10, 2013

@JoshData - Approach 2.

I would actually think that the top level accessURL should be removed, though, or do you think that is a mistake?

@mhogeweg - What you're referencing is addressed in the schema guidance and is being affirmed to agencies in feedback about their data.json files. Documentation is not a guarantee against bad metadata, whether it's flawed implementation now or just left over from an earlier era. My suggestion is that any time you find bad metadata at agency.gov/data.json, you use the feedback mechanism that should be at agency.gov/data to report the problem. Data.gov is undergoing a major overhaul this month that will migrate it to presenting solely results of harvested data.json files, so any bad metadata you're finding at data.gov should go away soon, or at least be traceable to its source.

@JoshData
Contributor Author

@gbinal Thanks.

If I remember, I'll open a PR to mention this in the schema doc. Can we keep this issue open until it's clarified in the docs?

Agreed about removing the top-level accessURL. There is something nice, perhaps, about having a "primary" distribution, but if that's a goal it should be done inside distributions.

@philipashlock
Contributor

My preference would be to not use accessURL outside of distribution whatsoever. Obviously it's too late to define that in the schema, but would it be too extreme to provide documentation that states that it's recommended (but not required) for accessURL to always be defined in a distribution? Either way, I would prefer accessURL to not be used on its own if distribution is used.

@philipashlock philipashlock added this to the Next Version of Common Core Metadata Schema milestone May 8, 2014
@philipashlock philipashlock changed the title For multiple accessURLs, does the top-level accessURL repeat a distribution or is it another distribution? Deprecate or clarify use of accessURL/format outside of the distribution array Jul 17, 2014
@philipashlock
Contributor

Just to reiterate, my recommendation is for accessURL and format to always be used within the distribution array for future versions of the schema.
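
As a rough illustration of what that recommendation would mean for someone checking a data.json entry, a minimal Python sketch follows. The function name is hypothetical, and the sample entry is made up from the placeholder values used earlier in this thread.

```python
def fields_outside_distribution(dataset, fields=("accessURL", "format")):
    """List which of `fields` appear at the top level of a dataset entry.

    Hypothetical helper: under the recommendation above, the result
    would be an empty list for a conforming entry.
    """
    return [name for name in fields if name in dataset]


entry = {
    "accessURL": "url1",  # top-level use that the recommendation would phase out
    "distribution": [{"accessURL": "url1", "format": "text/csv"}],
}
print(fields_outside_distribution(entry))
```

The check only looks at top-level keys; it says nothing about whether the distribution array itself is complete.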

@smrgeoinfo
Contributor

+1 on that. I also think that for machine-actionable metadata, the distribution also needs something like the Atom 'rel' and the target attributes along the lines described in section 5.4 of IETF RFC 5988 and ContentModelForLinks (sorry about the docx format!)

@lilybradley

currently, considering Issue #291 in conjunction with this one: #291

@philipashlock philipashlock modified the milestone: Next Version of Common Core Metadata Schema (1.0 -> 1.1.) Jul 24, 2014
@gbinal
Contributor

gbinal commented Aug 25, 2014

gbinal added a commit that referenced this issue Sep 8, 2014
In response to #217, #248

I still need to update the expanded guidance
gbinal added a commit that referenced this issue Sep 8, 2014
@gbinal
Contributor

gbinal commented Sep 8, 2014

This is addressed by baa0178 and by cd7a527

rebeccawilliams pushed a commit that referenced this issue Oct 2, 2014
Changes that still need to be addressed are changes in structure and should we add usage notes additions here or no?:

* Adds optional describedByType field at the dataset and distribution level (#291, #332)
* Changes contactPoint field to an object that contains the name (fn) and email address (hasEmail) (#358)
* Adds fn field as part of contactPoint replacing earlier use of contactPoint (#358)
* Changes publisher field to an object that allows multiple levels of organizations (#296)
* Changes accessURL field to represent indirect access and to exist only within distribution (#217, #335) 
* Changes format field to a human readable description and to exist only within distribution (#272, #293)
* Adds optional description field for use within distribution (#248)
* Adds optional title field for use within distribution (#248)
* Changes accrualPeriodicity field to use ISO 8601 date syntax (#292)
* Changes distribution field to become required-if-applicable and to always contain the accessURL or downloadURL fields (#217)
* Changes license field to be a URL (#196)
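
The distribution rule in the list above (each entry always contains accessURL or downloadURL) could be checked with something like the following Python sketch. This is an assumption-laden illustration, not the project's validator: the function name and message wording are hypothetical, and only the field names come from the list.

```python
def distribution_entries_missing_url(dataset):
    """Return a message for each distribution entry with neither accessURL nor downloadURL.

    Hypothetical check; an absent "distribution" key is treated as an empty list.
    """
    problems = []
    for index, entry in enumerate(dataset.get("distribution", [])):
        if "accessURL" not in entry and "downloadURL" not in entry:
            problems.append(
                f"distribution[{index}] has neither accessURL nor downloadURL"
            )
    return problems


sample = {
    "distribution": [
        {"downloadURL": "url1", "format": "text/csv"},
        {"title": "entry with no URL"},
    ]
}
print(distribution_entries_missing_url(sample))
```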
@gbinal
Contributor

gbinal commented Nov 7, 2014

Thank you for driving the conversation around this issue and helping to assemble the v1.1 metadata update.

There appears to be strong consensus around this issue, which has been accepted in the v1.1 update and merged into Project Open Data. Project Open Data is a living project though. Please continue any conversations around how the schema can be improved with new issues and pull requests!

It's important for government staff as well as the public to continue to collaborate to make the Open Data Policy ever better. Though the v1.1 update is a substantial update, future iterations do not have to be, so whatever your ideas - big or small - please continue to work with this community to improve how government manages and opens its data.

@gbinal gbinal closed this as completed Nov 7, 2014

7 participants