Skip to content
This repository has been archived by the owner on Dec 13, 2023. It is now read-only.

Remove tasks from Elasticsearch when workflow is removed #3300

Conversation

RemcoBuddelmeijer
Copy link
Contributor

@RemcoBuddelmeijer RemcoBuddelmeijer commented Oct 17, 2022

Pull Request type

  • Bugfix
  • Feature
  • Refactoring (no functional changes, no api changes)
  • Build related changes (Please run ./gradlew generateLock saveLock to refresh dependencies)
  • WHOSUSING.md
  • Other (please describe):

NOTE: Please remember to run ./gradlew spotlessApply to fix any format violations.

Changes in this PR

  • Implement removal of tasks when calling /workflow/remove

  • Written tests for the new behaviour

    • Introduced tests for WorkflowServiceTest
    • Introduced tests for WorkflowResourceTest
    • Introduced tests for ExecutionDAOFacadeTest
    • Introduced tests for QueueResilienceSpec.groovy
    • Introduced tests for TestElasticSearchRestDAOV6
    • Introduced tests for TestElasticSearchDAOV6
  • Updated Javadocs where necessary to reflect on the new changes

Describe the new behavior from this PR, and why it's needed
Issue #1505

Tasks were not removed from Elasticsearch once the workflow was removed. This is necessary to not leave hanging objects in Elasticsearch that are not connected to any existing instance of a workflow. This change allows you to specify to also remove the tasks.

Alternatives considered

None

Additional notes

Options for removing tasks and archiving tasks have both been considered. In the final draft only the option to remove tasks was introduced. This to keep the changes clear and communicate to the user that tasks are removed now. On top of this all of this was done as it's unknown if the users are using this undefined behaviour of tasks not being removed. This was later on removed in the PR as it can be trusted that users do not rely on unidentified behaviour.

Right now to minimize the impact of the change the option to toggle on removing tasks or not is present. Only methods being called by the WorkflowResource class for workflow removal allows you to delete tasks, any other class such as WorkflowServiceImpl does not implement this yet. This can be done in a future PR.

* Add removeTask method for IndexDAO

Implement removeTask and asyncRemoveTask methods to the IndexDAO
interface.
Implemented in ElasticSearchDAOV6, ElasticSearchRestDAOV6, and
NoopIndexDAO.

* Implement removal of tasks for removeWorkflow

- Update ExecutionDAOFacade#removeWorkflow with two new parameters:
removeTasks and archiveTasks
- Added ExecutionDAOFacade#removeTaskIndex
- Added methods updateTask and asyncUpdateWorkflow to IndexDAO
- Added methods updateTask and asyncUpdateWorkflow to ElasticSearchDAOV6
  and ElasticSearchRestDAOV6

* Add tests for removing workflow through /workflow/remove

- Introduced tests for WorkflowServiceTest
- Introduced tests for WorkflowResourceTest
- Introduced test for ExecutionDAOFacadeTest
- Introduced test for QueueResilienceSpec.groovy

* Update Javadoc to include task archival when workflow archived and removeTasks true
@RemcoBuddelmeijer
Copy link
Contributor Author

Once reviewed I will resolve the merge conflicts.

- Now tasks are search and associated with workflows through a search
  query directly embedded rather than constructed
- If a task can not be removed it will error and return
- Monitors are updated automatically
- Updated TestElasticSearchRestDAOV6 and TestElasticSearchDAOV6
@RemcoBuddelmeijer
Copy link
Contributor Author

RemcoBuddelmeijer commented Oct 17, 2022

I have gone ahead and resolved the merge conflicts @jxu-nflx @apanicker-nflx @ghoshabhi
Please allow the workflow to be ran so it can verify if all is still working as expected.

@apanicker-nflx
Copy link
Collaborator

@RemcoBuddelmeijer Thanks for creating the PR and 💯 on adding a wide slew of tests.

While I understand the reasoning for adding a flag removeTasks to the API and service layers, it should be the default behavior. The current implementation where tasks are left orphaned when workflow is removed is really a bug and your PR directly addresses this. Hence, I would suggest removing the flag and making it the default behavior.

@RemcoBuddelmeijer
Copy link
Contributor Author

RemcoBuddelmeijer commented Oct 17, 2022

@RemcoBuddelmeijer Thanks for creating the PR and 💯 on adding a wide slew of tests.

While I understand the reasoning for adding a flag removeTasks to the API and service layers, it should be the default behavior. The current implementation where tasks are left orphaned when workflow is removed is really a bug and your PR directly addresses this. Hence, I would suggest removing the flag and making it the default behavior.

@apanicker-nflx Okay that sounds good. I'll go ahead and remove it and trust on people not relying on undefined behaviour.

@RemcoBuddelmeijer RemcoBuddelmeijer requested review from apanicker-nflx and ghoshabhi and removed request for ghoshabhi and apanicker-nflx October 17, 2022 23:09
Currently ExecutionDAOFacade would call removeTask double the amount of
times, this has been fixed.
Now the only bug still present is that tasks are not terminated once removed
and with that the index may also not be removed.
@RemcoBuddelmeijer
Copy link
Contributor Author

RemcoBuddelmeijer commented Oct 18, 2022

@apanicker-nflx After updating the QueueResilienceSpec test to also include archiving workflows it notices that the workflow or the tasks associated with the workflow are not terminated in the same execution as this is done through the ExecutionService.
I would like to propose to keep this out of the testing plan since the QueueResiliencySpec is not to test the termination of workflows. We could always reintroduce it but force the status of the tasks and workflow. Right now I don't have the knowledge to determine if it is possible and then possibly be a bug once the termination of the workflow is not executed prior to the archival of the index.

At least for now I will push the change removing the archival test of QueueResiliencySpec.

I have pushed this as part of this PR to show the execution of both scenarios.

try {
removeWorkflowIndex(workflow, archiveWorkflow);
} catch (JsonProcessingException e) {
throw new TransientException("Workflow can not be serialized to json", e);
}

executionDAO.removeWorkflow(workflowId);
tasks.forEach(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why loop through the tasks twice? A regular for loop might be easier to read here, and couldn't you do both operations through a single loop of the tasks?

Copy link
Contributor Author

@RemcoBuddelmeijer RemcoBuddelmeijer Oct 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Personally I found it more readable and easier to have two separate loops with separate error handling where queue removal can be controlled separately from index removal.
When it comes to a traditional for loop: I wanted to keep in line with the style of the file. But this can always be changed.

Either way since it’s unnecessary to loop over something twice and now that the order of method has been changed allowing the loops to be merged, I’ll go ahead and merge the loops.

@james-deee
Copy link
Contributor

The only other thing I would call out is that the Community Version will need to be updated as well for the V7 ElasticSearch IndexDAO. So whoever releases this fix (thanks for fixing btw!!!!!), will need to be aware when they bump the version over in Community @apanicker-nflx

@RemcoBuddelmeijer
Copy link
Contributor Author

The only other thing I would call out is that the Community Version will need to be updated as well for the V7 ElasticSearch IndexDAO. So whoever releases this fix (thanks for fixing btw!!!!!), will need to be aware when they bump the version over in Community @apanicker-nflx

@james-deee I can always merge this logic into the community edition as well.

@james-deee
Copy link
Contributor

@apanicker-nflx sorry for the ping, just wanted to see the status of this PR. Thanks!

Copy link

@ghoshabhi ghoshabhi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@james-deee
Copy link
Contributor

@ghoshabhi and @apanicker-nflx when is an ETA that we can expect this to get merged and for a new release of Conudctor to go out?

I know there is also some batch polling that was added to the Java client that we'd also like to take advantage of. Thanks!

@james-deee
Copy link
Contributor

@ghoshabhi or @apanicker-nflx sorry to ping again, just wanted to see if there was an update on this. Thanks!

@apanicker-nflx apanicker-nflx merged commit 0c30109 into Netflix:main Jan 23, 2023
@RemcoBuddelmeijer RemcoBuddelmeijer deleted the feature/remove_tasks_through_workflow_resource branch January 26, 2023 12:48
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants