Skip to content

IWF-232: Add disabling sticky cache option #473

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Nov 4, 2024
8 changes: 7 additions & 1 deletion .github/workflows/ci-cadence-integ-test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -9,12 +9,18 @@ jobs:
tests:
name: "Integration testing"
runs-on: ubuntu-latest
strategy:
fail-fast: false
matrix:
test-subset:
- "a-m"
- "n-z"
steps:
- uses: actions/checkout@v3
- name: "Set up cadence environment"
run: docker compose -f docker-compose/ci-cadence-dependencies.yml up -d
- name: "Test against cadence"
run: make ci-cadence-integ-test
run: make ci-cadence-integ-test startsWith=${{ matrix['test-subset'] }}
- name: Dump docker logs
if: always()
uses: jwalton/gh-docker-logs@v2
9 changes: 8 additions & 1 deletion .github/workflows/ci-temporal-integ-test.yml
Original file line number Diff line number Diff line change
Expand Up @@ -13,12 +13,19 @@ jobs:
# Give the default GITHUB_TOKEN write permission to commit and push the
# added or changed files to the repository.
contents: write
strategy:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought we wanted have one action to run with stickyCache, and another run without, right? (two actions run in parallel so that it won't slow down)
So I expect we need new yml files like:

  • ci-temporal-integ-test-disable-sticky.yml
  • ci-cadence-integ-test-disable-sticky.yml

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I thought we would only do tests with no sticky cache, but it makes sense to keep both scenarios 👍

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

fail-fast: false
matrix:
test-subset:
- "a-m"
- "n-z"

steps:
- uses: actions/checkout@v3
- name: "Set up temporal environment"
run: docker compose -f docker-compose/ci-temporal-dependencies.yml up -d
- name: "Test against temporal"
run: make ci-temporal-integ-test
run: make ci-temporal-integ-test startsWith=${{ matrix['test-subset'] }}
- name: Dump docker logs
if: always()
uses: jwalton/gh-docker-logs@v2
12 changes: 10 additions & 2 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -177,10 +177,18 @@ cadenceIntegTests:
$Q go test -v ./integ -temporal=false

ci-cadence-integ-test:
$Q go test -v ./integ -search=false -temporal=false -dependencyWaitSeconds=180
ifeq (${startsWith},)
$Q go test -v ./integ -search=false -temporal=false -dependencyWaitSeconds=180 -disableStickyCache=true
else
$Q go test -v ./integ -run "(?i)^Test[${startsWith}]" -search=false -temporal=false -dependencyWaitSeconds=180 -disableStickyCache=true
endif

ci-temporal-integ-test:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you test does -disableStickyCache=true work? I thought it has to be -disableStickyCache because it's a bool flag

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does work, but I can simplify it to disableStickyCache

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

$Q go test -v ./integ -cover -coverprofile coverage.out -coverpkg ./service/... -search=false -cadence=false -dependencyWaitSeconds=60
ifeq (${startsWith},)
$Q go test -v ./integ -cover -coverprofile coverage.out -coverpkg ./service/... -search=false -cadence=false -dependencyWaitSeconds=60 -disableStickyCache=true
else
$Q go test -v ./integ -run "(?i)^Test[${startsWith}]" -cover -coverprofile coverage.out -coverpkg ./service/... -search=false -cadence=false -dependencyWaitSeconds=60 -disableStickyCache=true
endif

ci-all-tests:
# Fails CI when used with -coverprofile flag due to tests that panic; see https://go.dev/doc/build-cover#panicprof
Expand Down
2 changes: 2 additions & 0 deletions integ/flag.go
Original file line number Diff line number Diff line change
Expand Up @@ -19,3 +19,5 @@ var searchWaitTimeIntegTest = flag.Int("searchWaitMs", 2000, "the amount of time
var temporalHostPort = flag.String("temporalHostPort", "", "temporal host port")

var dependencyWaitSeconds = flag.Int("dependencyWaitSeconds", 60, "the number of seconds waiting for dependencies to be up(Cadence/Temporal)")

var disableStickyCache = flag.Bool("disableStickyCache", false, "disable Temporal/Cadence sticky execution")
4 changes: 2 additions & 2 deletions integ/util.go
Original file line number Diff line number Diff line change
Expand Up @@ -124,7 +124,7 @@ func doStartIwfServiceWithClient(config IwfServiceTestConfig) (uclient uclient.U

// start iwf interpreter worker
interpreter := temporal.NewInterpreterWorker(createTestConfig(config), temporalClient, service.TaskQueue, config.MemoEncryption, dataConverter, uclient)
interpreter.Start()
interpreter.StartWithOptions(temporal.StartOptions{DisableStickyCache: *disableStickyCache})
return uclient, func() {
iwfServer.Close()
interpreter.Close()
Expand Down Expand Up @@ -155,7 +155,7 @@ func doStartIwfServiceWithClient(config IwfServiceTestConfig) (uclient uclient.U

// start iwf interpreter worker
interpreter := cadence.NewInterpreterWorker(createTestConfig(config), serviceClient, iwf.DefaultCadenceDomain, service.TaskQueue, closeFunc, uclient)
interpreter.Start()
interpreter.StartWithOptions(cadence.StartOptions{DisableStickyCache: *disableStickyCache})
return uclient, func() {
iwfServer.Close()
interpreter.Close()
Expand Down
19 changes: 19 additions & 0 deletions service/interpreter/cadence/worker.go
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,11 @@ type InterpreterWorker struct {
tasklist string
}

type StartOptions struct {
// When DisableStickyCache is true it can harm performance; should not be used in production environment
DisableStickyCache bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know we have a specific use case for this start option to disable sticky cache, but I'm wondering if we should add a warning comment about the performance impacts of doing this for other reasons. (My understanding is that this would have a not insignificant impact on performance in a non-test/real-world use case. Is that right?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right. I'll add a comment pointing it out

}

func NewInterpreterWorker(
config config.Config, service workflowserviceclient.Interface, domain, tasklist string, closeFunc func(),
unifiedClient uclient.UnifiedClient,
Expand All @@ -38,11 +43,25 @@ func (iw *InterpreterWorker) Close() {
}

func (iw *InterpreterWorker) Start() {
var options StartOptions

// default options
options.DisableStickyCache = false

iw.StartWithOptions(options)
}

func (iw *InterpreterWorker) StartWithOptions(startOptions StartOptions) {
config := env.GetSharedConfig()
options := worker.Options{
MaxConcurrentActivityTaskPollers: 10,
MaxConcurrentDecisionTaskPollers: 10,
}

if startOptions.DisableStickyCache {
options.DisableStickyExecution = true
}

if config.Interpreter.Cadence != nil && config.Interpreter.Cadence.WorkerOptions != nil {
options = *config.Interpreter.Cadence.WorkerOptions
}
Expand Down
18 changes: 18 additions & 0 deletions service/interpreter/temporal/worker.go
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,11 @@ type InterpreterWorker struct {
taskQueue string
}

type StartOptions struct {
// When DisableStickyCache is true it can harm performance; should not be used in production environment
DisableStickyCache bool
}

func NewInterpreterWorker(
config config.Config, temporalClient client.Client, taskQueue string, memoEncryption bool,
memoEncryptionConverter converter.DataConverter, unifiedClient uclient.UnifiedClient,
Expand All @@ -36,6 +41,15 @@ func (iw *InterpreterWorker) Close() {
}

func (iw *InterpreterWorker) Start() {
var options StartOptions

// default options
options.DisableStickyCache = false

iw.StartWithOptions(options)
}

func (iw *InterpreterWorker) StartWithOptions(startOptions StartOptions) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To even better address @mixydubs concerns, I feel like it's even better to have this StartWithStickyCacheDisabledForTest() so that no one would use this in production.

StartWithOptions is also a little confusing with the workerOptions that we are passing through config config.Config in NewInterpreterWorker. If we really need to pass options, they can be passed from there for consistency.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

config := env.GetSharedConfig()
options := worker.Options{
MaxConcurrentActivityTaskPollers: 10,
Expand All @@ -49,6 +63,10 @@ func (iw *InterpreterWorker) Start() {
iw.worker = worker.New(iw.temporalClient, iw.taskQueue, options)
worker.EnableVerboseLogging(config.Interpreter.VerboseDebug)

if startOptions.DisableStickyCache {
worker.SetStickyWorkflowCacheSize(0)
}

iw.worker.RegisterWorkflow(Interpreter)
iw.worker.RegisterWorkflow(WaitforStateCompletionWorkflow)
iw.worker.RegisterActivity(interpreter.StateStart) // TODO: remove in next release
Expand Down
Loading