Empty versions and object stores #535

Closed
pwinckles opened this issue Mar 23, 2021 · 8 comments

@pwinckles

The spec states that:

  1. Version directories do not need to include a copy of the inventory
  2. Version directories must not contain files other than the inventory and sidecar
  3. The version content directory should not exist if the version has no content
  4. Every version must be represented as a directory within the object root

It would seem that this behavior can only be supported by filesystem-based implementations. That is to say, it is not possible to represent a version in an object store such as S3 if that version does not include any content changes and does not store a copy of the version's inventory.
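
To illustrate the point, here is a minimal Python sketch. It models an object store as a flat set of keys, which is an assumption about how stores like S3 behave: a "directory" exists only as a prefix shared by at least one key. The function name `version_dirs`, the object root `obj1`, and the sample keys are hypothetical.

```python
import re


def version_dirs(keys, object_root):
    """Return the version directories implied by the keys under object_root.

    In a flat keyspace, a directory exists only if at least one key
    has it as a prefix; there is no way to record an empty one here.
    """
    prefix = object_root.rstrip("/") + "/"
    found = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        parts = key[len(prefix):].split("/", 1)
        # A segment counts as a directory only if something follows it.
        if len(parts) == 2 and re.fullmatch(r"v\d+", parts[0]):
            found.add(parts[0])
    return found


# Hypothetical object with a single version (assumed key layout).
keys = {
    "obj1/0=ocfl_object_1.0",
    "obj1/inventory.json",
    "obj1/inventory.json.sha512",
    "obj1/v1/inventory.json",
    "obj1/v1/inventory.json.sha512",
    "obj1/v1/content/file.txt",
}
print(version_dirs(keys, "obj1"))
```

In this model, a version with no content and no inventory copy contributes no keys, so it never shows up in the result.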

@julianmorley
Contributor

You're right, but the "inventories are optional in version dirs" allowance only exists to enable backporting of Moab versions (which are presumed to be WORM). We would expect a new implementation to create an inventory in every version directory. The spec says:

Additionally, every version directory SHOULD include an inventory file that is an Inventory of all content for versions up to and including that particular version.

The definition of SHOULD from [https://tools.ietf.org/html/bcp14] that applies is:

SHOULD   This word, or the adjective "RECOMMENDED", mean that there
   may exist valid reasons in particular circumstances to ignore a
   particular item, but the full implications must be understood and
   carefully weighed before choosing a different course.

So I feel that 'caveat emptor' is fine here. In Moab's case, we have a good reason and understand the implications. But if you don't have a good reason, you should always have an inventory file in your version dir.

@pwinckles
Author

pwinckles commented Mar 23, 2021

I completely agree that versions should contain an inventory copy. I was just pointing this out because not following this SHOULD could produce invalid OCFL objects, unlike most, if not all, of the other SHOULDs in the spec (for example, using sha256 instead of sha512).

@julianmorley
Contributor

I don't see how it can produce invalid objects? You'd end up with an object that'd raise a WARN for not conforming with the SHOULD in the spec, but that's not a failure.

@pwinckles
Author

The following will produce an invalid object:

  1. Store objects in S3
  2. Do not write a copy of the inventory in the version directory
  3. Create an object version that does not include any content changes

Let's say that this creates v2 of the object. The root inventory now references the head version as v2, but there is no v2 directory in the OCFL object as required by:

Each object version is stored in a version directory under the object root. The sequence of version numbers is the sequence of positive, base-ten integers: 1, 2, 3, etc., and the version directory name is constructed by adding the prefix v. The version number sequence MUST start at 1 and MUST be continuous without missing integers.
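
The three steps above can be sketched as a check in Python. This is a simplified illustration, not a real validator: the store is modelled as a flat set of keys (an assumption), the `head` argument stands in for the head value in the root inventory, and `missing_version_dirs` is a hypothetical helper. It ignores zero-padded version directory names.

```python
def missing_version_dirs(keys, object_root, head):
    """List the version directories v1..head that no key in the store implies."""
    head_number = int(head.lstrip("v"))
    missing = []
    for i in range(1, head_number + 1):
        prefix = f"{object_root}/v{i}/"
        # The directory "exists" only if some key lives underneath it.
        if not any(key.startswith(prefix) for key in keys):
            missing.append(f"v{i}")
    return missing


# v2 has no content changes and no inventory copy, so nothing is
# written under obj1/v2/ (hypothetical key layout).
keys = {
    "obj1/0=ocfl_object_1.0",
    "obj1/inventory.json",
    "obj1/inventory.json.sha512",
    "obj1/v1/content/file.txt",
}
print(missing_version_dirs(keys, "obj1", "v2"))  # ['v2']
```

Writing the inventory copy under `obj1/v2/`, as the SHOULD recommends, would make the list empty in this model.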

@julianmorley
Contributor

Well, I think in those circumstances, a couple of things will happen:

  1. The implementors will have deliberately chosen to use an object constructor that ignores the SHOULD
  2. The implementors did not fully consider or anticipate the implications of their object creation workflow with their chosen backing storage (see: 1.)
  3. The implementors should realize they made a mistake and pick a better constructor for their particular circumstances.

Such an implementing organization is welcome to drive past a caution sign; we're assuming they know what they're doing. If they don't, I don't think that's a problem with the spec.

In the particular S3 situation described above, I strongly suspect the constructor will error out before it declares the object as finished. At some point during serialization it's going to be asking an S3 API to create an empty dir, and the S3 API is going to error out. If it chooses to ignore that error and continue on, that's also not a problem with the spec.

@zimeon
Contributor

zimeon commented Mar 24, 2021

I think the key thing is that any implementation following the SHOULD to have an inventory with each version will create objects that are fine with filesystems and current object stores (including S3). As @julianmorley pointed out, the key reason for that SHOULD rather than MUST was to allow for legacy content. I hope/trust it will be standard practice for any new implementations to have an inventory with each version.

@pwinckles
Author

Sure, I agree that every version should have an inventory. I only brought it up because it seemed substantively different from the other SHOULDs in the spec, in that not following it can result in invalid objects. No problem if this is not of concern.

@julianmorley
Contributor

Closing this based on @pwinckles's comment; I don't think there's anything actionable here.


4 participants