This repository has been archived by the owner on Jan 10, 2019. It is now read-only.

decision_context not set during signal-induced DecisionTask #110

Open
ben-mays opened this issue Jan 11, 2016 · 7 comments

Comments

@ben-mays

Running the sample code with an added reference to decision_context causes the DecisionTask to fail. The execution history shows DecisionTaskScheduled and DecisionTaskStarted but never DecisionTaskCompleted, and the workflow eventually times out. The cause is decision_context resolving to nil.

Here is the modified code to reproduce:

require_relative '../../recipe_activities'
class WaitForSignalWorkflow
  extend AWS::Flow::Workflows

  workflow :place_order do
    {
      version: "1.0",
      task_list: "wait_for_signal_workflow",
      execution_start_to_close_timeout: 60,
      task_start_to_close_timeout: 20,
    }
  end
  activity_client(:client) { { from_class: "RecipeActivity" } }
  signal :change_order

  def initialize
    @change_order_period = 30
    @signal_received = Future.new
  end

  def place_order(original_amount)
    timer = create_timer_async(@change_order_period)
    wait_for_any(timer, @signal_received)
    client.process(amount)
  end

  def change_order(amount)
    puts workflow_id # raises exception, workflow_id calls decision_context.workflow_context..
    @signal_received.set(amount) unless @signal_received.set?
  end
end
@ben-mays
Author

Additionally, the workflow executor does not log the failure anywhere; it silently swallows failures in signal-induced DecisionTasks.
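
The failure mode described in these two comments can be sketched in plain Ruby. This is only an illustration: SketchWorkflow, SketchContext, run_silently, and run_with_logging are hypothetical names, and none of this is AWS Flow framework code. It assumes the error is a NoMethodError raised when a call is delegated to a nil context, and it contrasts an executor that rescues without logging against one that logs and re-raises.

```ruby
require 'logger'
require 'stringio'

# Hypothetical stand-in for the context object; not an AWS Flow class.
SketchContext = Struct.new(:workflow_id)

# Hypothetical workflow whose context is only populated for some tasks.
class SketchWorkflow
  attr_accessor :decision_context

  # Delegates through the context; raises NoMethodError when it is nil.
  def workflow_id
    decision_context.workflow_id
  end

  # Stand-in for a signal handler that touches the context.
  def on_signal
    workflow_id
  end
end

# Executor that swallows the error: the caller sees nil and no trace.
def run_silently(workflow)
  workflow.on_signal
rescue StandardError
  nil
end

# Executor that records the error before re-raising it.
def run_with_logging(workflow, logger)
  workflow.on_signal
rescue StandardError => e
  logger.error("signal handler failed: #{e.class}: #{e.message}")
  raise
end

log_output = StringIO.new
workflow = SketchWorkflow.new # decision_context is left as nil

run_silently(workflow) # returns nil; nothing is reported

begin
  run_with_logging(workflow, Logger.new(log_output))
rescue NoMethodError
  # The error is now visible to the caller and recorded in log_output.
end
```

With the swallowing executor the only symptom is a task that never completes, which matches the history described above; the logging variant at least leaves a record of the NoMethodError.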

@mustafashabib

👍

@runjoerun

🙏

@pheuter

pheuter commented Jan 19, 2016

+1

@mjsteger
Contributor

@ben-mays Can you provide the code you are using to run the worker/activity_worker/starter? I was hitting a similar issue: I'd get DecisionTaskStarted but never DecisionTaskCompleted, and the workflow would apparently swallow the error and time out. Bumping to 3.1.0 (the newest release, which for some reason is not in the Gemfile for the samples repo) allowed it to properly raise the exception so I could see my error, and after I added a value to start_execution it went through correctly. (I still get an error, but that's because amount is not defined in the code snippet given.)

@ben-mays
Author

@mjsteger Sorry, we're actively moving functionality off of SWF as a result of this and numerous other issues that manifested themselves: long polling causing tasks to be scheduled on dead sockets, the decision/activity context not being set, and a memory leak that won't go away. I'll leave the issue open for others who may have the same problem.

@jpfuentes2

@ben-mays Have you written anything up about these issues? Did you also use the JVM Flow framework, or are these experiences based solely on the Ruby version? Can you speak to what you've switched to (I'm assuming custom-grown workflow management on top of a message bus)?
