Change new OOProc orchestration trigger protocol from JSON to protobuf #2125
Not sure why the commit history for this PR looks so weird. I may try to recreate my clone of this repo after this PR to see if that resolves it. I don't think it should cause any problems as long as I'm doing a squash-merge.
cgillum left a comment
Adding some context comments.
<ItemGroup>
  <PackageReference Include="Microsoft.Azure.DurableTask.AzureStorage" Version="1.10.*" />
  <PackageReference Include="Microsoft.Azure.DurableTask.Core" Version="2.8.0" />
This package version is currently available only on myget.org. We'll need to publish it to nuget.org before releasing this extension version.
- Assert.Equal("World", status?.Input);
- Assert.Equal("Hello, World!", status?.Output);
+ Assert.Equal("World", (string)status?.Input);
+ Assert.Equal("Hello, World!", (string)status?.Output);
I ran into an issue in an earlier iteration that caused this test to fail. Unfortunately, xunit doesn't know how to display JValue objects in its error messages (it always just shows "[]"), so the test output was useless. Casting the JValue properties to string fixes the display problem.
I noticed some of the tests here are failing. Should I start reviewing this, or would you prefer I wait until the tests are "green"?

@davidmrdavid feel free to start reviewing. The previous iteration was green and my latest commit was pretty small. I'll investigate what the failures are in the meantime.

Looks like the CI failed because of an Azure Storage SDK deadlock timeout expiration 😬. Re-running it got things green.
davidmrdavid left a comment
Made a first pass through this PR, left some questions. Thanks!
// The legacy behavior where the DTFx orchestration context schedules the activity results in an input that's
// wrapped in an array. This is unfortunately quite inefficient because it requires us to do an extra serialization
// round-trip in order to correctly interpret the input data.
Can we do anything to deprecate this legacy inputsAreArrays behavior?
This is unfortunately embedded deep into DurableTask.Core, which makes changing it more difficult since it impacts all languages. If we remove the DurableTask.Core dependency from the .NET Isolated SDK, then we can remove this unnecessary overhead (similar to how Java can bypass this stuff).
Thanks for the explanation. My 2 cents is that it would be great if we could document this blocker in the code, as I don't think the impact is immediately obvious :)
davidmrdavid left a comment
Thanks for the answers so far @cgillum. My only blockers at this point are:
(1) documenting why inputsAsArray is hard to deprecate in code
(2) looking into fixing the \nStack-parsing edge case
Thanks!
Force-pushed from 3393784 to e3a907c.
I somehow messed up my local branch history and had to do a force-push of the latest iteration. The only notable change was related to this comment:
I changed Regarding this:
I don't recall exactly what the state of the code was when you requested this documentation, but I feel the current set of comments is good enough. I don't think it's appropriate to go as far as explaining why something is or isn't hard to deprecate in DTFx; that kind of comment belongs in DTFx itself. I hope you're good with this latest update, assuming all tests pass!
Overview
This PR changes the orchestration trigger payload protocol from JSON strings to protobuf strings. This includes both the request that gets sent to the language worker and the response that comes back. This only applies to the newer OOProc languages, .NET Isolated and Java.
The other big change in this PR is in the way we interpret errors from the OOProc worker. Previously, we made no attempt to interpret error messages. With this PR, we make a point of parsing out the error message and the stack trace details from the RpcExceptions that we get when an activity function fails.
Why are we making this JSON --> protobuf change?
The old JSON schema for the orchestration trigger payload was problematic from the perspective of SDK code reuse. The newer OOProc SDKs are designed to work with a sidecar over gRPC. Having JSON as the protocol for Functions would then require us to either translate the JSON into protobuf or translate the JSON into the SDK objects directly. This translation would be a lot of one-off code in each language SDK just to support Functions. I deemed it much more economical to change the Durable extension to use the protobuf format instead, so that language SDKs wouldn't need to do any protocol translation.
This change may also improve efficiency, since protobuf is much more compact than JSON and faster to serialize and deserialize. The downside is that it's much more difficult to debug if unexpected data is received at either end. This tradeoff seemed acceptable, however.
Why are we making this error parsing change?
One of the features I'm working on for .NET Isolated and Java is the ability to create custom retry handlers. In order to create a reasonable experience, the error information from an activity needs to be in a format that a developer can reason about. This was not previously possible since the RpcException given to the extension contained unstructured error information. As part of Azure/durabletask#681, we updated DT.Core to support structured failure details. In this PR, we parse the exception to construct a new TaskFailureDetails payload, which is a structure that defines an error in a language-neutral way. Having this payload available to orchestrations enables developers to write much simpler error handling logic.
Pull request checklist
pending_docs.md
release_notes.md
/src/Worker.Extensions.DurableTask/AssemblyInfo.cs
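For readers who want a concrete picture of the error parsing described in the overview, here is a minimal C# sketch of the general idea: split an unstructured error string into an error type, a message, and a stack trace. Everything in it is an assumption made for illustration. The `FailureInfo` record is a hypothetical stand-in (not the real TaskFailureDetails type), `WorkerErrorParser` is a made-up name, and the `"Type: message\nStack: trace"` layout is an assumed wire format, not necessarily what the extension receives or how this PR parses it.

```csharp
using System;

// Hypothetical stand-in for a language-neutral failure payload.
// This is NOT the real TaskFailureDetails type from DurableTask.Core.
public sealed record FailureInfo(string ErrorType, string ErrorMessage, string StackTrace);

public static class WorkerErrorParser
{
    // Assumed delimiter between the message and the stack trace.
    private const string StackMarker = "\nStack: ";

    public static FailureInfo Parse(string rawError)
    {
        if (rawError is null)
        {
            throw new ArgumentNullException(nameof(rawError));
        }

        // Split off the stack trace at the first marker, if one is present.
        string head = rawError;
        string stack = null;
        int stackIndex = rawError.IndexOf(StackMarker, StringComparison.Ordinal);
        if (stackIndex >= 0)
        {
            head = rawError.Substring(0, stackIndex);
            stack = rawError.Substring(stackIndex + StackMarker.Length);
        }

        // Assumed "Type: message" shape for the part before the stack trace.
        int colon = head.IndexOf(": ", StringComparison.Ordinal);
        if (colon > 0)
        {
            return new FailureInfo(head.Substring(0, colon), head.Substring(colon + 2), stack);
        }

        // Fall back to an unknown type when no "Type: " prefix is found.
        return new FailureInfo("(unknown)", head, stack);
    }
}
```

Note that this sketch splits at the first occurrence of the marker, so an error message that itself contains the marker text would be cut short. A real parser needs to decide how to handle that case.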