Updates to core budget block #378
Comments
I'm not sure I'm clear on the motivation for this. A lot of other IDs in OCDS can be string or integer.
We can't actually make the change to the schema and stay backwards compatible, because any data that uses an integer would previously have been valid, but would now be invalid. We could possibly think about deprecating the use of integers here though. |
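To illustrate the compatibility point, here is a minimal sketch that uses a simplified, hand-written stand-in for a JSON Schema "type" check. The two type lists and the helper name `matches_type` are assumptions made for this example and are not copied from the OCDS schema files.

```python
# Hypothetical, simplified stand-in for a JSON Schema "type" check.
# The two type lists mirror the change under discussion; they are
# illustrative and not taken from the OCDS schema files.
CURRENT_ID_TYPES = ["string", "integer"]
PROPOSED_ID_TYPES = ["string"]


def matches_type(value, allowed_types):
    """Return True if value satisfies any of the allowed type names."""
    checks = {
        "string": lambda v: isinstance(v, str),
        # bool is a subclass of int in Python, so exclude it explicitly.
        "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    }
    return any(checks[name](value) for name in allowed_types)


# One string identifier and one integer identifier, as existing data might hold.
existing_ids = ["GB-1-107171-101", 42]

for value in existing_ids:
    print(
        repr(value),
        "current:", matches_type(value, CURRENT_ID_TYPES),
        "proposed:", matches_type(value, PROPOSED_ID_TYPES),
    )
```

The integer identifier passes the current check and fails the string-only one, which is why deprecation is the gentler route than an outright schema change.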
The schema also makes reference to the budget data package in contract.implementation.transactions.id and contract.implementation.transactions.source; we should also update the text for these fields. |
@Bjwebb I had understood that the mixing of strings and integers as possible field values was an issue for tools like flatten-tool. However, if not, happy to leave this / just deprecate use of integers to slowly move to string-only to avoid placing extra requirements on future tools to handle both. |
I think this is possibly more of an issue for tabulate, and any other attempt to import OCDS into a database, than flatten-tool specifically. I count 12 identifiers in OCDS that can be string or integer, and this issue only addresses 2 of them. As far as I can tell, they all have the same problem, so should any proposed change apply to all of them? |
Hi @timgdavies Please do @pwalsh or @akariv for the FDP team to jump in here - I was not aware of this discussion. We can easily add a transaction ID to Fiscal Data Package. We do have a very strong preference to implementing data models that reflect the data we actually see. I can't remember ever seeing a budget document with transaction identifiers. I have seen spending data with transaction identifiers. Have you got some examples of budgets with transaction identifiers, so we could have a look and understand how this linkage will work with real data? About URIs, the data package specifications don't assume that all data is available on persistent, immutable, publicly accessible URIs. While that would be desirable, we don't want to enforce it at the level of the specification. I don't see this as a blocker, however - a transaction id could be a string that is a URI, or not. |
Thanks for exploring this. Our approach has always been that open data specifications should seek to balance publisher and user needs. Just because data has never been represented a particular way to serve internal government needs does not mean that representation will not be important for consumers of the open data version of that dataset - and so specification development offers an opportunity for a conversation that connects data from inside governments with user needs outside. Of course, care should be taken not to invent new data requirements without being clear on their feasibility - but including identifiers is, I would argue, a very important part of creating an ecosystem of distributed open data.
On budgets
My understanding (from conversations with budget specialists... not direct experience) is that it should be possible to construct identifiers for budget lines from the various budget-line components and classifications. I.e. budget items may not have an existing ID, but a composite ID could be created for them. I'm not certain that this would yield unique budget line identifiers (e.g. an identifier might span multiple budget lines), but even in that case it could be useful to help connect contracts back to their budget sources.
On URIs
Agreed that immutable URIs cannot be enforced at the level of the specification - but unless there is guidance on this we can point to, it makes it hard to reference FDP specifically. |
Agreed on including identifiers in theory. It is just that, in the absence of actual identifiers from the source, one goes down other paths which may or may not have the desired effect. e.g.: using the OpenSpending internal identifier as the transaction ID for a budget line - one could wonder if this is a good thing. From your perspective, I might guess that it is a good thing - OpenSpending is designed as a persistent, web-accessible service, and therefore can be a URI provider. However, what then is the relationship of this representation of the data to the source of it? Composite keys: It is actually extremely complex, and I've explored this quite deeply. Even hashing all the values of a budget line does not guarantee uniqueness in a single data source (I've had real examples from UK govt. where multiple transactions for a single department in a given month are identical, and definitely different transactions), let alone globally. One can even add the row number in a source file to the hash, which would give source-level uniqueness - then, one is confronted with the problem of updates at the source - if the row number of the same line of data changes, is it still the same line of data? These possibilities alone make it close to impossible to uniquely identify any budget or spending line if the source does not provide an internal transaction id. |
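The composite-key problem described above can be sketched in a few lines. The rows, the field names and the helper `line_hash` are made up for illustration; they are not real UK government data and not part of any specification.

```python
import hashlib
import json


def line_hash(row, row_number=None):
    """Hash every value of a line, optionally mixing in its row number."""
    payload = dict(row)
    if row_number is not None:
        payload["_row"] = row_number
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


# Made-up rows: the first two are distinct transactions with identical values.
rows = [
    {"department": "Dept A", "month": "2016-01", "amount": "100.00"},
    {"department": "Dept A", "month": "2016-01", "amount": "100.00"},
    {"department": "Dept B", "month": "2016-01", "amount": "250.00"},
]

# Content-only hashes: the two identical lines collide.
content_ids = [line_hash(row) for row in rows]

# Adding the row number gives uniqueness within this one file...
positional_ids = [line_hash(row, i) for i, row in enumerate(rows)]

# ...but the same line gets a different id once the source is reordered.
reordered = [rows[2], rows[0], rows[1]]
reordered_ids = [line_hash(row, i) for i, row in enumerate(reordered)]

print("content collision:", content_ids[0] == content_ids[1])
print("positional ids unique:", len(set(positional_ids)) == len(rows))
print("Dept B id stable after reorder:", positional_ids[2] == reordered_ids[0])
```

In this sketch neither variant gives an identifier that is both unique and stable, which is the difficulty when the source provides no transaction id of its own.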
Thanks @pwalsh In the context of OCDS, we ask publishers to link out to additional contextual budget information. In the update proposed for 1.1, we would have a situation where:
As I understand, because of the way FDP has evolved, we would still need to remove the formal reference to it from OCDS, but could include a link out to some guidance / blog posts / other content showing the different potential ways (or ideally examples of practice) for people making this linkage work. Agree there is complexity here - but there are also important use-cases of being able to track between contracts and budgets that can't be served without encouraging publishers to find an approach to creating identifiers. We face the same challenges with the concept of a 'procurement process', where user needs call for the ability to link tenders, contracts and awards - but many prior systems don't clearly link these. The role of the spec in this case is to show what users need and to encourage publishers to find approaches to meet that need. |
On this, I 100% agree on the use cases. However, if the source data can't meet this promise, a spec can't either. And, that is why composite keys in this regard are actually dangerous in terms of the goals we'd want to achieve - they can't even guarantee uniqueness in a single dataset, and therefore encouraging their usage could be misleading or worse.
Either sounds fine to me (formal reference, or not), but, if we made an official "semantic type" of transaction ID, not as a |
Thanks @pwalsh On the second point first - a On the first point - I think it's important to understand specifications as part of a dynamic system of data production: the underlying data and its features are not static. We've seen how people adapt systems and data to meet specifications, and so a spec (in the context of |
Great @timgdavies I'll talk with @akariv on getting a transaction id type added. @akariv anything to add here in this context? |
Flagging that Budget Data Package is also referenced in the
Based on the discussion above I think this just needs updating to read "Fiscal Data Package" rather than "Budget Data Package" which I will include in the updates to the transaction block in #372 |
During peer-review there was a request for a minor revision:
However, the schema states for
As human readable documents can be included in the planning.documents block, we don't propose to update the schema guidance. |
This issue is under consideration for updates to the core OCDS standard in 1.1
It sits alongside proposals for a substantial extension to budgets which would allow multi-year and multi-source budgets to be captured. See #377
It builds upon past discussion in #345
The issue
In the current version of OCDS we have a very simple budget block which talks of linking out to the 'Budget Data Package' for more in-depth information.
However, the Budget Data Package has been superseded by the Fiscal Data Package which does not currently have the concept of a transaction identifier and for which current publication approaches do not focus on providing data at a stable URI.
This makes cross-linking between OCDS and FDP challenging.
We also see budget.source, intended as a link to the BDP in which an identified budget line item would exist, commonly mis-used to provide a named budget line or name of a department providing budget.
The schema allows both string and uri formats for budget.source which may be the source of this confusion. Changing budget.source to only a URI would not be backwards compatible.
The proposal
We propose
budget.source field;
We will also update the valid types for id and projectID to strings only (they are currently able to be string or integer).
Suggested draft documentation updates for the uri field are below
Current text:
Proposed update:
Discussion
We will explore guidance to the effect that URIs should include a # component indicating the identifier of the particular budget line item.
For example, if a contract is funded through the DFID aid project with the IATI Identifier 'GB-1-107171-101' then a contract process planning record could cross-reference this by:
planning.budget block
http://iati.dfid.gov.uk/iati_files/Country/DFID-Afghanistan-AF.xml#GB-1-107171-101 in the uri field
An application would need to be 'IATI aware' to understand that #GB-1-107171-101 refers to an entry in the XML file found at that URL with GB-1-107171-101 as the value of //iati-activities/iati-activity/iati-identifier
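As a rough sketch of what 'IATI aware' could mean in practice, the snippet below splits the fragment off the URI and looks for an activity whose iati-identifier matches it. The inline XML is a hypothetical stand-in for the file at that URL (nothing is fetched), and the function name `find_activity` is an assumption made for this example.

```python
from urllib.parse import urldefrag
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the XML file found at the URL; not real DFID data.
SAMPLE_XML = """<iati-activities>
  <iati-activity>
    <iati-identifier>GB-1-107171-101</iati-identifier>
  </iati-activity>
  <iati-activity>
    <iati-identifier>GB-1-000000-000</iati-identifier>
  </iati-activity>
</iati-activities>"""


def find_activity(budget_uri, xml_text):
    """Return the iati-activity element named by the URI fragment, or None."""
    _, fragment = urldefrag(budget_uri)
    if not fragment:
        return None
    root = ET.fromstring(xml_text)
    # Equivalent to matching //iati-activities/iati-activity/iati-identifier
    for activity in root.iter("iati-activity"):
        identifier = activity.findtext("iati-identifier")
        if identifier is not None and identifier.strip() == fragment:
            return activity
    return None


uri = ("http://iati.dfid.gov.uk/iati_files/Country/"
       "DFID-Afghanistan-AF.xml#GB-1-107171-101")
match = find_activity(uri, SAMPLE_XML)
print("found:", match is not None)
```

The fragment has no meaning to a generic HTTP client, so a lookup of this kind has to be written per format; that is the cost of the approach.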
The updated approach in the Fiscal Data Package does not currently appear to offer either:
(a) Stable URIs for packages;
(b) Line-item identifiers;
which makes this approach very difficult.
See also
Budget breakdown in #377
Questions
uri might also link to project information, and makes this budget specific.
Engagement
Please indicate support or opposition for this proposal using the +1 / -1 buttons or a comment. If opposing the proposal, please give clear justifications and, where possible, make an alternative proposal.
Views on the discussion points are welcome.