docs: add language sdk specification #369
| Invocation 1: | ||
| - Load state: [] | ||
| - Start STEP(id="step1") | ||
| - Checkpoint: START step1 |
I think we need more guidance here on:
- at-least-once vs. at-most-once semantics
- batching/optimizations
Yeah - that would be a good idea.
| - Be checkpointed and resumed | ||
| - Maintain execution state across interruptions | ||
|
|
||
| The two core durable operation primitives are: |
There are 5 primitives (if you ignore the EXECUTION operation which is only used to complete the execution):
- CALLBACK
- CHAINED_INVOKE
- CONTEXT
- STEP
- WAIT
|
|
||
| For correct replay behavior, **user code MUST be deterministic**: | ||
|
|
||
| 1. Non-durable code (code outside operations) MUST execute identically on each replay |
One thing we should note is that this may require re-implementing, or providing alternatives for, certain language constructs that are inherently nondeterministic.
For example, in Java the iteration order of a HashMap is not guaranteed to be the same across multiple creations of the same map unless you use a LinkedHashMap instead, and in Go map iteration order is purposefully randomized.
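To illustrate the same concern in another language, here is a minimal Python sketch. It relies on the fact that string hashing is randomized per process by default, so iterating a set of strings can yield a different order on each invocation. The function name `plan_operations` and the `process-` naming scheme are hypothetical, not part of the spec or any SDK.

```python
def plan_operations(tags):
    """Derive step names from a collection of tags in a replay-stable order.

    `tags` may be a set; its iteration order can differ between Python
    processes, so sort before deriving anything that affects operation order.
    (Hypothetical helper for illustration only.)
    """
    return [f"process-{tag}" for tag in sorted(tags)]
```

Sorting (or using an insertion-ordered container) before deriving operation names keeps the operation sequence identical across replays.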
| "CheckpointToken": "string", | ||
| "InitialExecutionState": { | ||
| "Operations": [ | ||
| /* Operation objects */ |
Should we link to the Lambda API docs sections where appropriate in this doc? E.g. https://docs.aws.amazon.com/lambda/latest/api/API_Operation.html
LANGUAGE_SDK_SPECIFICATION.md
Outdated
|
|
||
| The SDK CANNOT: | ||
|
|
||
| - Prevent users from writing non-deterministic code |
I mean if it is somehow able to, it should 😄 - the spec shouldn't prevent it from doing so
Maybe this should say "The SDK is not responsible for:"
LANGUAGE_SDK_SPECIFICATION.md
Outdated
| "Error": { | ||
| "ErrorType": "string", | ||
| "ErrorMessage": "string", | ||
| "StackTrace": ["string"] // OPTIONAL |
All the fields are actually optional
(There is also a fourth field, ErrorData, for additional machine-readable error data.)
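A minimal Python sketch of an error object where every field is optional, as described above. The class name `ErrorObject`, the method `to_wire`, and the choice of a string type for `ErrorData` are assumptions made for illustration; check the service API reference for the authoritative shape.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ErrorObject:
    """Hypothetical SDK-side model; all four fields are optional."""
    error_type: Optional[str] = None
    error_message: Optional[str] = None
    error_data: Optional[str] = None  # assumption: carried as a string payload
    stack_trace: Optional[List[str]] = None

    def to_wire(self) -> dict:
        """Serialize to the wire field names, omitting unset fields."""
        fields = {
            "ErrorType": self.error_type,
            "ErrorMessage": self.error_message,
            "ErrorData": self.error_data,
            "StackTrace": self.stack_trace,
        }
        return {key: value for key, value in fields.items() if value is not None}
```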
|
|
||
| - Maximum execution duration: 1 year | ||
| - Maximum response payload: 6MB | ||
| - Maximum history size: Limited by service quotas |
Maximum number of durable operations (including retries)? The limit is not directly on history.
but we also have a history limit (100MB), added both
| - Load state: [step1: SUCCEEDED, step2: STARTED] | ||
| - Replay STEP(id="step1") - return cached "result1" | ||
| - Resume STEP(id="step2") | ||
| - Checkpoint: START step2 (same ID, continues) |
You wouldn't checkpoint START again - it's already started. Depends on semantics but can either run it again then checkpoint success/failure/retry or decide to immediately checkpoint failure, or retry, etc.
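One way to picture this is the in-memory Python sketch below. The `InMemoryExecutor` class and its state layout are hypothetical and not an API of any real SDK. It shows only one of the possible policies mentioned above: a step found in STARTED state is re-run and then checkpointed as succeeded, without emitting START a second time.

```python
class InMemoryExecutor:
    """Hypothetical replay harness for illustration only."""

    def __init__(self, state=None):
        # Operation id -> {"status": ..., "result": ...}, as loaded at invocation start.
        self.state = dict(state or {})
        # (action, operation id) pairs in the order they were emitted.
        self.checkpoints = []

    def step(self, op_id, fn):
        op = self.state.get(op_id)
        if op is not None and op["status"] == "SUCCEEDED":
            return op["result"]  # replay: return the cached result, run nothing
        if op is None:
            # First attempt only: record START before running user code.
            self.checkpoints.append(("START", op_id))
            self.state[op_id] = {"status": "STARTED"}
        # If the operation was already STARTED in a previous invocation, START
        # is not emitted again. Policy assumed here: re-run the step.
        result = fn()
        self.checkpoints.append(("SUCCEED", op_id))
        self.state[op_id] = {"status": "SUCCEEDED", "result": result}
        return result
```

An at-most-once policy would replace the re-run with an immediate failure or retry checkpoint for operations found in STARTED state.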
LANGUAGE_SDK_SPECIFICATION.md
Outdated
|
|
||
| ``` | ||
| [callback_promise, callback_id] = await context.create_callback("approval") | ||
| await send_approval_email(callback_id) |
You probably want to wrap the send_approval_email call in a context.step.
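A rough Python sketch of that suggestion follows. `FakeContext` is a hypothetical stand-in for a durable context and only mimics the shape of `create_callback` and `step` from the snippet under review; the step name and callback id format are invented for the example.

```python
import asyncio


class FakeContext:
    """Hypothetical durable context that remembers completed steps by name."""

    def __init__(self, completed=None):
        self.completed = dict(completed or {})

    async def create_callback(self, name):
        # Returns (promise, id); the id format is an assumption.
        return asyncio.get_running_loop().create_future(), f"cb-{name}"

    async def step(self, name, fn):
        if name in self.completed:
            return self.completed[name]  # replay: skip the side effect
        self.completed[name] = await fn()
        return self.completed[name]


sent = []  # records every email actually sent


async def send_approval_email(callback_id):
    sent.append(callback_id)


async def handler(context):
    callback_promise, callback_id = await context.create_callback("approval")
    # Wrapping the side effect in a step means a replay does not send it again.
    await context.step("send-approval-email", lambda: send_approval_email(callback_id))
    return callback_promise, callback_id


async def demo():
    first = FakeContext()
    await handler(first)                           # initial invocation
    await handler(FakeContext(first.completed))    # simulated replay
    return list(sent)
```

Without the step wrapper, the simulated replay in `demo` would send the email a second time.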
LANGUAGE_SDK_SPECIFICATION.md
Outdated
| │ START action | ||
| ▼ | ||
| ┌─────────┐ | ||
| │ STARTED │◄──────┐ |
the arrow here should be coming from READY
Signed-off-by: Michael Gasch <[email protected]>
|
@jriecken shall I add a section on testing (in-memory local executor)? |
|
Just connected with @maschnetwork and noticed that this spec currently has no guidance on how to handle concurrent durable operations when waits/suspension are involved (simple waits, durable invokes, callbacks, including timeouts). For example, you want to use cc/ @ParidelPooya |
|
|
||
| 1. Non-durable code (code outside operations) MUST execute identically on each replay | ||
| 2. User code MUST NOT use non-deterministic values (e.g., `Date.now()`, `Math.random()`) outside durable operations | ||
| 3. User code MUST NOT perform side effects (e.g., API calls, database writes) outside durable operations |
They can perform side-effects if they want as long as they don't use the results to affect operation order.
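The distinction can be sketched in Python as follows. The `run` harness and both handlers are hypothetical; they only record the order in which steps are requested, which is what replay depends on.

```python
import random


def run(handler):
    """Run a handler and return the order in which it requested steps."""
    order = []

    def step(name, fn):
        order.append(name)
        return fn()

    handler(step)
    return order


def safe_handler(step):
    # Side effect outside a step; its result is never used to pick operations.
    print("starting workflow")
    value = step("fetch", lambda: 1)
    step("store", lambda: value + 1)


def unsafe_handler(step):
    # A nondeterministic value decides which operations run, so a replay may
    # see a different operation sequence than the original invocation.
    if random.random() < 0.5:
        step("fetch", lambda: 1)
    step("store", lambda: 2)
```

`safe_handler` always requests the same sequence of steps, while `unsafe_handler` may not.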
| ``` | ||
| [New] → START → STARTED → (time passes) → SUCCEEDED [Done] | ||
| ↓ | ||
| CANCEL → CANCELLED [Done] | ||
| ``` |
flowchart LR
New[Customer calls ctx.wait] --> START
START --> |Started| Delay{Wait}
Delay --> |Succeeded| Success[ctx.wait completes]
Delay --> CANCEL
CANCEL --> |Cancelled| Cancelled[ctx.wait completes]
| └→ (external failure) → FAILED [Done] | ||
| └→ (timeout) → TIMED_OUT [Done] | ||
| ``` | ||
|
|
flowchart LR
New[Customer calls ctx.createCallback] --> START
START --> |Started| Delay{Wait}
SUCCEED --> |Succeeded| Success[ctx.createCallback completes successfully]
FAIL --> |Failed| Failure[ctx.createCallback completes with error]
TIMEOUT --> |TimedOut| Failure
Delay .-> SendDurableExecutionCallbackSuccess
Delay .-> SendDurableExecutionCallbackFailure
Delay .-> TIMEOUT
SendDurableExecutionCallbackSuccess --> SUCCEED
SendDurableExecutionCallbackFailure --> FAIL
subgraph External System
SendDurableExecutionCallbackSuccess
SendDurableExecutionCallbackFailure
end
| └→ (invoke timeout) → TIMED_OUT [Done] | ||
| └→ (invoke stopped) → STOPPED [Done] | ||
| ``` | ||
|
|
flowchart LR
New[Customer calls ctx.invoke] --> START
START --> |Started| Delay{Wait}
SUCCEED --> |Succeeded| Success[ctx.invoke completes successfully]
FAIL --> |Failed| Failure[ctx.invoke completes with error]
TIMEOUT --> |TimedOut| Failure
STOP[StopDurableExecution] --> |Stopped| Failure
Delay .-> External
Delay .-> TIMEOUT
subgraph External System
External@{ shape: fork }
External .-> Invoked[Invoked Function]
External .-> STOP
end
Invoked .-> SUCCEED
Invoked .-> FAIL
| ↓ | ||
| └→ FAIL → FAILED [Done] | ||
| ``` | ||
|
|
flowchart LR
New[Customer calls operation] --> START
START --> |Started| SUCCEED
SUCCEED --> |Succeeded| Success[Completes successfully]
START --> |Started| FAIL
FAIL --> |Failed| Failure[Completes with error]
| ### 11.2 Async Patterns | ||
|
|
||
| The SDK MUST integrate with the language's asynchronous programming model: | ||
|
|
What do you mean by 'integrate'? The Python SDK doesn't integrate with asyncio.
"MUST integrate": this could be needlessly restrictive.
There can be different ways of implementing asynchronous or concurrent work even within a language, and some opinionated views over which is "better".
Other than that, it could be that someone wants to make a deliberately simplified synchronous or light-weight version of the SDK that eschews concurrency?
Issue #, if available: n/a
Description of changes:
Provide a language SDK specification for developers to build their own SDKs and establish conformance testing. This is just a first start to iterate on the SDK and provide builders guidance given the large interest in additional SDKs (Go, Rust, Java, Swift, .NET). The file should then be extracted into its own repository to create conformance tests for officially supported SDKs.
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.