Remove batch operations from the specification #76
We put batch operations in because people do have real use cases for them. I agree that they don't make sense in a REST architecture and never will.
But if we want to discard them, I think we need to talk about how those needs get satisfied.
… On Mar 17, 2017, at 9:29 PM, Aaron Coburn ***@***.***> wrote:
Two of the main issues I have with batch operations are:
1). If a mandatory section of the specification is extremely difficult to implement (and in a distributed context, I can assure you that it is extremely difficult to implement), then keeping the section will weaken the entire spec because it makes it harder to implement the full spec in the first place.
2). The interaction model for batch operations is defined, but the actual behavior of batch operations is left entirely undefined, especially in the context of interleaving operations.
For example, it should be clear from the specification how the following sequence of events turns out (and I have no idea what the "right" answer to this is):
Starting with the resource /foo and two triples:
</foo> dc:title "Foo" .
</foo> dc:alternate "Foobar" .
In a single node context (assuming all interactions use the appropriate If-Match or If-Unmodified-Since headers), given the following sequence of events, what happens?
Scenario 1:
Client 1: (in the context of a batch operation):
PATCH /foo 16:00:00
DELETE WHERE {
</foo> dc:alternate ?o .
}
(commit) 16:00:02
Client 2 (not in the context of a batch):
PUT /foo 16:00:01
</foo> dc:title "Bar" .
</foo> dc:alternate "BarFoo" .
Does the commit operation succeed? What is the state of the resource at 16:00:03?
Scenario 2:
PATCH /foo 16:00:00
DELETE WHERE {
</foo> dc:alternate "Bar" .
}
(commit) 16:00:02
Non-Batch:
PUT /foo 16:00:01
</foo> dc:title "Bar" .
</foo> dc:alternate "Foo" .
Again, does the commit operation succeed? And what is the state of the resource at 16:00:03?
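To make the ambiguity in Scenario 1 concrete, here is a minimal single-node sketch in Python. It is not taken from the spec or from any Fedora implementation: the `Resource` class, the hash-based ETag, and the two policy names (`validate-at-commit`, `replay-at-commit`) are all hypothetical, chosen only to show that two plausible readings of "commit" give different answers.

```python
import hashlib


class Resource:
    """In-memory stand-in for /foo, holding (predicate, object) pairs."""

    def __init__(self, triples):
        self.triples = set(triples)

    def etag(self):
        # Hypothetical ETag: a hash over the sorted triples.
        return hashlib.sha1(repr(sorted(self.triples)).encode()).hexdigest()


def delete_alternates(triples):
    """Equivalent of: DELETE WHERE { </foo> dc:alternate ?o . }"""
    return {t for t in triples if t[0] != "dc:alternate"}


def run_scenario_1(policy):
    foo = Resource({("dc:title", "Foo"), ("dc:alternate", "Foobar")})

    # 16:00:00 Client 1 stages the PATCH in a batch and notes the ETag it saw.
    seen_etag = foo.etag()

    # 16:00:01 Client 2 PUTs outside the batch, replacing the resource.
    foo.triples = {("dc:title", "Bar"), ("dc:alternate", "BarFoo")}

    # 16:00:02 Client 1 commits.
    if policy == "validate-at-commit" and foo.etag() != seen_etag:
        # Assumed policy A: the precondition is re-checked at commit time.
        return "commit rejected", foo.triples
    # Assumed policy B: the staged PATCH is replayed on whatever is there now.
    foo.triples = delete_alternates(foo.triples)
    return "commit applied", foo.triples


for policy in ("validate-at-commit", "replay-at-commit"):
    print(policy, run_scenario_1(policy))
```

Under the first policy the commit fails and Client 2's PUT stands untouched; under the second the commit succeeds and silently strips "BarFoo" from Client 2's PUT. Both behaviors seem compatible with the current text, which is the problem.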
In a multi-node context, this becomes arbitrarily more complex. What about four PUT operations on the same resource:
Client 1 (batch):
PUT 16:00:00
</foo> dc:title "Title1" .
(commit) 16:00:02
Client 2 (batch):
PUT 16:00:00
</foo> dc:title "Title2" .
(commit) 16:00:03
Client 3 (non-batch):
PUT 16:01:00
</foo> dc:title "Title 3" .
Client 4 (operating on a node with an incorrect clock time -- delta ~ 2 minutes):
PUT 16:02:00
</foo> dc:title "Title 4" .
Even with a "last-write-wins" policy, how do you define the "last write"? Is it the greatest clock time? Is it the time the last commit was applied?
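The four-PUT example can be sketched the same way. The snippet below is only an illustration: the `Write` dataclass and the `applied` values are assumptions, and in particular it assumes Client 4's clock runs about two minutes fast, so its write really arrived around 16:00:00. Times are seconds after 16:00:00.

```python
from dataclasses import dataclass


@dataclass
class Write:
    client: str
    title: str
    reported: int  # timestamp as stamped by the node's own clock
    applied: int   # when the write actually took effect (assumed values)


writes = [
    Write("Client 1", "Title1", reported=0, applied=2),   # batch, commit 16:00:02
    Write("Client 2", "Title2", reported=0, applied=3),   # batch, commit 16:00:03
    Write("Client 3", "Title 3", reported=60, applied=60),
    # Assumption: this node's clock is ~2 minutes ahead of real time.
    Write("Client 4", "Title 4", reported=120, applied=0),
]


def last_write(key):
    """Return the title that survives under a given definition of 'last'."""
    return max(writes, key=key).title


by_reported_clock = last_write(lambda w: w.reported)
by_applied_time = last_write(lambda w: w.applied)

print("greatest clock time wins:", by_reported_clock)
print("latest applied write wins:", by_applied_time)
```

With these assumed values, "greatest clock time" keeps "Title 4" while "latest applied" keeps "Title 3", so the two definitions of "last write" disagree on the final state of /foo.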
And what if you add the deletion of a node into this?
Client 1 (batch):
PUT 16:00:00
</foo> dc:title "Title1" .
(commit) 16:00:02
Client 2 (non-batch):
DELETE 16:00:01 /foo
I can go on with these examples, but here is one more:
Client 1 (batch):
PATCH 16:00:00
INSERT {
</foo> dc:subject <http://example.org/subj/1> .
} WHERE {}
Client 2 (batch):
PATCH 16:00:00
INSERT {
</foo> skos:prefLabel "something" .
} WHERE {}
Client 3 (non-batch):
PUT 16:00:01
</foo> dc:title "title5" .
Client 4 (non-batch):
PATCH 16:00:02
INSERT {
</foo> skos:prefLabel "something else" .
} WHERE {}
Client
If each client issues a GET request at 16:00:02, what are the responses?
Given this level of ambiguity, I have no idea how the existing specification of "batch operations" would make behavior even remotely consistent across different applications.
As described above, I am in favor of dropping batch operations from the spec entirely. I think the only reasonable alternative is to be so specific in the definition of a batch operation that all of the above examples are clearly answerable in a consistent manner. If we cannot have a consistent definition of "batch operation" behavior, then how can one define a "passing test"?
At this point, the spec is ambiguous as to the purpose of batch operations. The use cases on the wiki look out of date, or only minimally related to batch operations. It's hard to tell. I will note that any use cases for Fedora 4 transactions I've had in the past have all been related to rollback functionality; i.e. if an unrecoverable error is encountered while performing a related sequence of operations, it is convenient to just throw away everything that just occurred and leave the repository in a clean state. There's a good discussion here from January 2016 on fedora-community and fedora-tech, which I think led to the consensus weakening of "transactions" to "batch atomic operations" in the first place. For some reason, "transactions optional" and "memento for versioning" messages are intermingled with one another in the thread: There were some use cases from ICPSR from @ummerh and from UVa |
👍 for moving batch operations to |
Being one of the people that originally provided a use case for batch operations, I must confess that I've had a change of heart over the course of Islandora development. I've significantly altered my approach to avoid them, and will rely on transactions at the JMS level or some other Camel tricks to ensure I can restart failed multi-step workflows safely. I lose the ability to roll back (e.g. the half-baked structures will still exist until I fix my problem), but find that trade-off acceptable if it eases scaling. BUT, I think they are still of merit to others, and am sure that many Fedora users are currently relying on them. So 👍 for |
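As a rough illustration of the trade-off described in the comment above (restartable workflows instead of server-side rollback), here is a small Python sketch. It does not use JMS or Camel and is not Islandora's code; `run_workflow`, the step names, and the in-memory `repo` list are hypothetical stand-ins.

```python
def run_workflow(steps, completed):
    """Run named steps in order, skipping any already recorded as done."""
    for name, action in steps:
        if name in completed:
            continue
        action()             # may raise; earlier steps stay applied (no rollback)
        completed.add(name)
    return completed


repo = []                    # stands in for repository state
attempts = {"add_binary": 0}


def create_container():
    repo.append("container")


def add_binary():
    # Fails on the first attempt to simulate a transient error.
    attempts["add_binary"] += 1
    if attempts["add_binary"] == 1:
        raise RuntimeError("transient failure")
    repo.append("binary")


steps = [("create_container", create_container), ("add_binary", add_binary)]
completed = set()

try:
    run_workflow(steps, completed)
except RuntimeError:
    after_failure = list(repo)   # the half-built structure is still visible

run_workflow(steps, completed)   # restart: only the failed step is re-run
print(after_failure, repo)
```

Between the failure and the restart the repository holds only the container, which is exactly the "half-baked structure" that a rollback would have removed.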
-1 to using |
Batch/atomic transactions are certainly a high bar which should be justified by strong use cases (and it seems that isn't the case). Whether made |
Following this and am -1 to using MAY for an entire section of the spec as well. |
@zimeon The problem is not test-ability. It is replace-ability. If the test is negative, the client is now on the hook to operate in some (non-obvious, case-specific) manner that obviates the need for the facility. If that is the practical upshot, what is the value of an "optional" facility? Clients that expect to act against a generic Fedora API will have to work out their own "atomicity" facilities for their use cases. No assumption is possible. I'm not disagreeing with your claim about test-ability. I am claiming that "made |
@ajs6f - I was just looking at whether a compliance tool (or client) could tell cleanly whether the facility for atomic operations were supported. I think it is OK, and I think that the same logic applies whether or not this is IMO, the |
It would be really nice if it were possible to narrow the scope or functionality enough such that batch atomic ops weren't such a burden. From a client perspective, I think atomic batch ops help to reduce the cognitive load of building applications against the repository. Absent sufficient narrowing, then removal is the next logical option. |
The alternatives appear to be keep it or remove it entirely. I have put the question on the next tech meeting agenda. |
@no-reply ? @barmintor ? from the perspective of your own Fedora implementations, it would be valuable to get your thoughts on maintaining or removing the Batch Atomic Operations element of the specification. |
This is why I am more interested in specifying conformant behaviors for messaging, batch, versioning, etc. and their advertising than I am in talking about whether a particular behavior is required or not, which I think is the client's business. |
For my part, using a Blazegraph backend gives you some tx support that makes this spec look pretty achievable, tho like MODE the binary component is a problem. On the other hand, if it's not being used in Hydra I would not put a high priority on its implementation. |
At UMD, we have relied on transactions in our batch loading process to make the logic in the client simpler and easier to follow. To that end, I echo what @birkland said above about atomic operations significantly easing the cognitive load of client development, especially long-running batch processes which may be running unattended for hours.
However, I do understand @acoburn's point about transactions being difficult or impossible to do in distributed or horizontally-scaled implementations. I certainly don't want the Fedora spec to stand in the way of these sorts of implementations. I don't share @ajs6f's strong objection to make atomic operations a
My proposal: If we remove atomic operations from the Fedora spec, move the atomic operations section into its own "microspec" (e.g., "Atomic-LDP"). Or, find a way to model transactions that fits with an existing REST/LDP spec. That way, implementations that want to support atomic operations can have a standard that describes how to do it, but it is not part of the Fedora API so implementations like @acoburn's would not be burdened with implementing it.
(Unfortunately, I cannot be at today's tech call, as I am at the DC FUG right now.) |
Based on today's discussion in the Fedora Tech Call, I'll put in a PR later today that pulls out Atomic Batch Operations. @awoods, can you create a new repo for "Atomic-LDP", and I'll do the initial PR there with what was removed, along with respec boilerplate. |
* Resolves fcrepo#76
PR: #79 |
@ruebot : does it need to be a new repo? or just a new document in: |
@awoods I think it should be a new repo, since the idea -- at least my interpretation of the meeting and @peichman-umd's above comment -- is that this is a separate specification. |
* Remove "Atomic Batch Operations" * Resolves #76