Remove batch operations from the specification #76

acoburn · 2017-03-17T22:12:10Z

No description provided.

ajs6f · 2017-03-18T12:01:59Z

We put batch operations in because people do have real use cases for them. I agree that they don't make sense in a REST architecture and never will. But if we want to discard them, I think we need to talk about how those needs get satisfied.

…

On Mar 17, 2017, at 9:29 PM, Aaron Coburn ***@***.***> wrote:

Two of the main issues I have with batch operations are:

1). If a mandatory section of the specification is extremely difficult to implement (and in a distributed context, I can assure you that it is extremely difficult to implement), then keeping the section will weaken the entire spec because it makes it harder to implement the full spec in the first place.

2). The interaction model for batch operations is defined, but the actual behavior of batch operations is left entirely undefined, especially in the context of interleaving operations.

For example, it should be clear from the specification how the following sequence of events turn out (and I have no idea what the "right" answer to this is):

Starting with the resource /foo and two triples:

</foo> dc:title "Foo" .
</foo> dc:alternate "Foobar" .

In a single node context (assuming all interactions use the appropriate If-Match or If-Unmodified-Since headers), given the following sequence of events, what happens?

Scenario 1:

Client 1: (in the context of a batch operation):
PATCH /foo 16:00:00
DELETE WHERE { </foo> dc:alternate ?o . }
(commit) 16:00:02

Client 2 (not in the context of a batch):
PUT /foo 16:00:01
</foo> dc:title "Bar" .
</foo> dc:alternate "BarFoo" .

Does the commit operation succeed? What is the state of the resource at 16:00:03?

Scenario 2:

PATCH /foo 16:00:00
DELETE WHERE { </foo> dc:alternate "Bar" . }
(commit) 16:00:02

Non-Batch:
PUT /foo 16:00:01
</foo> dc:title "Bar" .
</foo> dc:alternate "Foo" .

Again, does the commit operation succeed? And what is the state of the resource at 16:00:03?

In a multi-node context, this becomes arbitrarily more complex. What about four PUT operations on the same resource:

Client 1 (batch):
PUT 16:00:00
</foo> dc:title "Title1" .
(commit) 16:00:02

Client 2 (batch):
PUT 16:00:00
</foo> dc:title "Title2" .
(commit) 16:00:03

Client 3 (non-batch):
PUT 16:01:00
</foo> dc:title "Title 3" .

Client 4 (operating on a node with an incorrect clock time -- delta ~ 2 minutes):
PUT 16:02:00
</foo> dc:title "Title 4" .

Even with a "last-write-wins" policy, how do you define the "last write"? Is it the greatest clock time? Is it the time the last commit was applied?

And what if you add the deletion of a node into this?

Client 1 (batch):
PUT 16:00:00
</foo> dc:title "Title1" .
(commit) 16:00:02

Client 2 (non-batch):
DELETE 16:00:01 /foo

I can go on with these examples, but here is one more:

Client 1 (batch):
PATCH 16:00:00
INSERT { </foo> dc:subject <http://example.org/subj/1> . } WHERE {}

Client 2 (batch):
PATCH 16:00:00
INSERT { </foo> skos:prefLabel "something" . } WHERE {}

Client 3 (non-batch):
PUT 16:00:01
dc:title "title5" .

Client 4 (non-batch):
PATCH 16:00:02
INSERT { skos:prefLabel "something else" . } WHERE {}

If each client issues a GET request at 16:00:02, what are the responses?

Given this level of ambiguity, I have no idea how the existing specification of "batch operations" would make behavior even remotely consistent across different applications.

As described above, I am in favor of dropping batch operations from the spec entirely. I think the only reasonable alternative is to be so specific in the definition of a batch operation that all of the above examples are clearly answerable in a consistent manner. If we cannot have a consistent definition of "batch operation" behavior, then how can one define a "passing test"?
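The "last write" question in the four-PUT example can be made concrete with a small sketch. It is not taken from the specification or from any Fedora implementation. The `Write` record, the two policy functions, and the reading of "delta ~ 2 minutes" as "Client 4's clock runs two minutes fast" are all assumptions made for illustration; the quoted message does not say in which direction the clock is off.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    client: str
    title: str
    reported: int  # seconds after 16:00:00 according to the writer's own clock
    applied: int   # seconds after 16:00:00 when the write actually took effect (assumed)


# The four PUTs from the example. For batch writes, "applied" is the commit time.
writes = [
    Write("client1-batch", "Title1", reported=0, applied=2),      # PUT 16:00:00, commit 16:00:02
    Write("client2-batch", "Title2", reported=0, applied=3),      # PUT 16:00:00, commit 16:00:03
    Write("client3", "Title 3", reported=60, applied=60),         # PUT 16:01:00
    # Assumption: this node's clock is two minutes fast, so 16:02:00 is really 16:00:00.
    Write("client4-skewed", "Title 4", reported=120, applied=0),
]


def winner_by_reported_clock(ws):
    """Policy A: the write with the greatest self-reported clock time wins."""
    return max(ws, key=lambda w: w.reported).title


def winner_by_applied_time(ws):
    """Policy B: the write that took effect last wins."""
    return max(ws, key=lambda w: w.applied).title


print(winner_by_reported_clock(writes))
print(winner_by_applied_time(writes))
```

Under these assumptions the two policies pick different titles for the same history, which is the ambiguity the message describes: a specification that only says "last write wins" does not determine the final state.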

ajs6f · 2017-03-18T13:46:31Z

https://wiki.duraspace.org/label/FF/uc-batch

birkland · 2017-03-18T20:33:38Z

At this point, the spec is ambiguous as to the purpose of batch operations. The use cases on the wiki look out of date, or only minimally related to batch operations. It's hard to tell.

I will note that any use cases for Fedora 4 transactions I've had in the past have all been related to rollback functionality; i.e., if an unrecoverable error is encountered while performing a related sequence of operations, it is convenient to just throw away everything that just occurred and leave the repository in a clean state.
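As a rough illustration of the rollback behavior described above, here is a minimal single-process sketch. It is not how Fedora implements transactions; `StagedBatch` and `run_batch` are hypothetical names, and the repository is modeled as a plain dictionary from paths to lists of triples. Operations go to a private copy, and the copy is published only if every step succeeds.

```python
import copy


class StagedBatch:
    """Apply operations to a private copy; publish it only on commit."""

    def __init__(self, repository: dict):
        self._repository = repository
        self._staged = copy.deepcopy(repository)

    def put(self, path, triples):
        self._staged[path] = list(triples)

    def delete(self, path):
        del self._staged[path]  # raises KeyError if the resource does not exist

    def commit(self):
        # Replace the live contents with the staged contents in one step.
        self._repository.clear()
        self._repository.update(self._staged)


def run_batch(repository, steps):
    """Run every step; on any error, discard the staged copy and report failure."""
    batch = StagedBatch(repository)
    try:
        for step in steps:
            step(batch)
    except Exception:
        return False  # nothing was published, so the repository is unchanged
    batch.commit()
    return True


repo = {"/foo": ['dc:title "Foo"']}
ok = run_batch(repo, [
    lambda b: b.put("/bar", ['dc:title "Bar"']),
    lambda b: b.delete("/missing"),  # fails, so the earlier put is thrown away too
])
print(ok, repo)
```

The sketch deliberately ignores concurrent writers and multiple nodes. Those are the cases raised earlier in this thread as the hard part, and nothing here addresses them.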

There's a good discussion here from January 2016 on fedora-community and fedora-tech, which I think led to the consensus weakening of "transactions" to "batch atomic operations" in the first place. For some reason, "transactions optional" and "memento for versioning" messages are intermingled with one another in the thread:
https://groups.google.com/forum/#!topic/fedora-community/LAsEn46N5N0

There were some use cases from ICPSR from @ummerh and from UVa

ruebot · 2017-03-20T13:03:37Z

👍 for moving batch operations to MAY

dannylamb · 2017-03-20T13:16:51Z

Being one of the people that originally provided a use case for batch operations, I must confess that I've had a change of heart over the course of Islandora development. I've significantly altered my approach to avoid them, and will rely on transactions either at the JMS level or some other Camel tricks to ensure I can restart failed multi-step workflows safely. I lose the ability to rollback (e.g. the half-baked structures will still exist until I fix my problem), but find that trade-off acceptable if it eases scaling.

BUT, I think they are still of merit to others, and am sure that many Fedora users are currently relying on them. So 👍 for MAY, since those using the reference implementation right now will still want them.

ajs6f · 2017-03-20T13:30:53Z

-1 to using MAY for any entire section of the spec. Either remove it (possibly creating a separate spec for it) or don't. MAY should be reserved to qualify or explicate behavior within a particular section of this spec.

zimeon · 2017-03-20T13:50:17Z

Batch/atomic transactions are certainly a high bar which should be justified by strong use cases (and it seems that isn't the case). Whether made MAY or removed to a separate spec, support is already quite testable (send request Atomic-Start, see if you get Atomic-ID back. If not, then not supported).
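The support check described above (send `Atomic-Start`, look for `Atomic-ID`) could be sketched roughly as follows. Only the two header names come from this thread. The header value `"true"`, the status codes, and the `send` callable, which stands in for whatever HTTP client a compliance tool would use, are assumptions for illustration.

```python
from typing import Callable, Mapping, Tuple

# (status code, response headers), as returned by a hypothetical transport
Response = Tuple[int, Mapping[str, str]]


def supports_atomic(send: Callable[[Mapping[str, str]], Response]) -> bool:
    """Send a request carrying Atomic-Start and report whether Atomic-ID comes back."""
    _status, headers = send({"Atomic-Start": "true"})  # header value is an assumption
    # HTTP header names are case-insensitive, so compare in lower case.
    names = {name.lower() for name in headers}
    return "atomic-id" in names


# Two fake transports standing in for real servers.
def atomic_server(request_headers):
    return 201, {"Atomic-ID": "tx:1234"}


def plain_server(request_headers):
    return 200, {"Content-Type": "text/turtle"}


print(supports_atomic(atomic_server))
print(supports_atomic(plain_server))
```

Passing the transport in as a function keeps the check independent of any particular HTTP library, so the same logic could sit in a compliance tool or in a client's start-up probe.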

bseeger · 2017-03-20T14:11:52Z

Following this and am -1 to using MAY for an entire section of the spec as well.

ajs6f · 2017-03-20T14:42:49Z

@zimeon The problem is not test-ability. It is replace-ability. If the test is negative, the client is now on the hook to operate in some (non-obvious, case-specific) manner that obviates the need for the facility. If that is the practical upshot, what is the value of an "optional" facility? Clients that expect to act against a generic Fedora API will have to work out their own "atomicity" facilities for their use cases. No assumption is possible.

I'm not disagreeing with your claim about test-ability. I am claiming that "made MAY" and "removed" are not both reasonable choices.

zimeon · 2017-03-20T14:53:13Z

@ajs6f - I was just looking at whether a compliance tool (or client) could tell cleanly whether the facility for atomic operations was supported. I think it is OK, and I think that the same logic applies whether or not this is MAY (in which case one could consider a stronger requirement that non-supporting implementations give 501 in response to any request with an Atomic-Start header) or whether it should be a separate spec (in which case it isn't reasonable to demand anything extra for an implementation that implements only "core" and not "atomic extension"). I agree that a "work around" to provide atomicity over a non-atomic Fedora is no easy thing.

IMO, the MAY vs separate spec choice is really a style issue. However, I lean toward removing it to retain a cleaner notion of what complying with the Fedora API means.

whikloj · 2017-03-20T15:00:13Z

I'll shed a tear for the time spent trying to help with this part of the specification, but it seems to me that removing it entirely is the best choice.
So burn it down!!!

birkland · 2017-03-21T00:33:31Z

It would be really nice if it were possible to narrow the scope or functionality enough such that batch atomic ops weren't such a burden. From a client perspective, I think atomic batch ops help to reduce the cognitive load of building applications against the repository. Absent sufficient narrowing, then removal is the next logical option.

ajs6f · 2017-03-21T14:58:49Z

The alternatives appear to be keep it or remove it entirely. I have put the question on the next tech meeting agenda.

awoods · 2017-03-22T14:06:06Z

@no-reply ? @barmintor ? from the perspective of your own Fedora implementations, it would be valuable to get your thoughts on maintaining or removing the Batch Atomic Operations element of the specification.

barmintor · 2017-03-22T14:45:45Z

This is why I am more interested in specifying conformant behaviors for messaging, batch, versioning, etc. and their advertising than I am in talking about whether a particular behavior is required or not, which I think is the client's business.

barmintor · 2017-03-22T14:52:33Z

For my part, using a Blazegraph backend gives you some tx support that makes this spec look pretty achievable, tho like MODE the binary component is a problem. On the other hand, if it's not being used in Hydra I would not put a high priority on its implementation.

peichman-umd · 2017-03-23T14:27:28Z

At UMD, we have relied on transactions in our batch loading process to make the logic in the client simpler and easier to follow. To that end, I echo what @birkland said above about atomic operations significantly easing the cognitive load of client development, especially long-running batch processes which may be running unattended for hours.

However, I do understand @acoburn's point about transactions being difficult or impossible to do in distributed or horizontally-scaled implementations. I certainly don't want the Fedora spec to stand in the way of these sorts of implementations.

I don't share @ajs6f's strong objection to make atomic operations a MAY level requirement, though I can see the potential for "optionality creep" eventually muddying the spec.

My proposal: If we remove atomic operations from the Fedora spec, move the atomic operations section into its own "microspec" (e.g., "Atomic-LDP"). Or, find a way to model transactions that fits with an existing REST/LDP spec. That way, implementations that want to support atomic operations can have a standard that describes how to do it, but it is not part of the Fedora API so implementations like @acoburn's would not be burdened with implementing it.

(Unfortunately, I cannot be at today's tech call, as I am at the DC FUG right now.)

barmintor · 2017-03-23T15:32:49Z

@acoburn has provided a good example of difficulties in specifying merge behavior here.

ruebot · 2017-03-23T15:36:45Z

Based on today's discussion in the Fedora Tech Call, I'll put in a PR later today that pulls out Atomic Batch Operations. @awoods, can you create a new repo for "Atomic-LDP", and I'll do the initial PR there with what was removed, along with respec boilerplate.

* Resolves fcrepo#76

ruebot · 2017-03-23T15:44:16Z

PR: #79

awoods · 2017-03-23T16:33:32Z

@ruebot : does it need to be a new repo? or just a new document in:
https://github.com/fcrepo/fcrepo-specification ?

ruebot · 2017-03-23T18:45:37Z

@awoods I think it should be a new repo, since the idea -- at least my interpretation of the meeting and @peichman-umd's above comment -- is that this is a separate specification.

awoods · 2017-03-23T18:54:33Z

@ruebot : here it is: https://github.com/fcrepo/fcrepo-specification-atomic-operations

* Partially resolves fcrepo/fcrepo-specification#76

ruebot · 2017-03-23T19:15:36Z

Second PR: fcrepo/fcrepo-specification-atomic-operations#1

* Remove "Atomic Batch Operations"
* Resolves #76

whikloj mentioned this issue Mar 18, 2017

Do we need Atomic Operations? Islandora/documentation#557

Closed

acoburn mentioned this issue Mar 21, 2017

Revisit how batch atomic operations are terminated. #75

Closed

ruebot added a commit to yorkulibraries/fcrepo-specification that referenced this issue Mar 23, 2017

Remove "Atomic Batch Operations"

764ac1d

* Resolves fcrepo#76

ruebot mentioned this issue Mar 23, 2017

Remove "Atomic Batch Operations" #79

Merged

ruebot added a commit to yorkulibraries/fcrepo-specification-atomic-operations that referenced this issue Mar 23, 2017

Move Atomic Batch Operations into its own specification.

880816b

* Partially resolves fcrepo/fcrepo-specification#76

ruebot mentioned this issue Mar 23, 2017

Move Atomic Batch Operations into its own specification. fcrepo/fcrepo-specification-atomic-operations#1

Merged

awoods closed this as completed in #79 Mar 24, 2017

awoods pushed a commit that referenced this issue Mar 24, 2017

Remove "Atomic Batch Operations" (#79)

4053b68

* Remove "Atomic Batch Operations"
* Resolves #76

zimeon mentioned this issue Jul 2, 2018

Possible future support for compliance levels/profiles for the Fedora API? #388

Open
