Question: Concurrent Flows? #214

Open
seanhess opened this issue Jul 24, 2023 · 3 comments

@seanhess

If step 1 produces 100 pieces of data and I need to run a flow for each one, does funflow support running these flows concurrently?

Or would I need to write a separate program that kicks off 100 flows?

How would you handle this?

@dorranh
Contributor

dorranh commented Jul 25, 2023

Hi @seanhess, the flow runners provided with the project (runFlow or runFlowWithConfig) do run independent branches of the flow's DAG in parallel (src). Under the hood we use kernmantle's performP function, which is ultimately built on top of the async package. The main limitation is that funflow supports only multithreaded execution, not distributed execution, so those 100 tasks would all need to be processed on a single machine.

@seanhess
Author

That's great, thanks! Multithreaded execution should provide plenty of performance. I suspect that by the time I need a whole cluster, it might make sense to switch to an event bus anyway.

In real life, my process needs to run a long flow for each item, and these per-item flows never converge. So the real flow is something that happens once per item. In a real-world application, would you include the first step (collecting the items) in the flow, so that the huge fan-out is part of it? Or would you make collection a separate function or program, and start the flow with the long per-item process?

@dorranh
Contributor

dorranh commented Sep 5, 2023

Hi @seanhess, apologies for the delayed response. I am not actually sure whether one approach has an advantage over the other. With one flow that has many parallel DAG branches, the branches should be processed in parallel as noted above. With many flows launched from a driver program, I think the main concern would be how the driver program executes each flow, and in particular whether it launches them in parallel.
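
For the second option, a driver program that starts one flow per item, the fan-out itself could look roughly like the minimal sketch below. It uses only `base`. `processItem` and `mapConcurrentlyBase` are hypothetical names, not funflow API: `processItem` stands in for whatever would build and run the long per-item flow (for example via `runFlow`), and `mapConcurrentlyBase` is a stripped-down imitation of the `async` package's `mapConcurrently`. Unlike `mapConcurrently`, this sketch does not cancel the remaining threads when one item fails.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Exception (SomeException, evaluate, throwIO, try)

-- Hypothetical stand-in for "run the long flow for this item".
-- A real driver would build and run a funflow flow here instead.
processItem :: Int -> IO Int
processItem n = evaluate (n * n)

-- Catch any exception so a worker thread always reports back.
tryAny :: IO a -> IO (Either SomeException a)
tryAny = try

-- Fork one thread per item, then wait for every result in input order.
-- An exception raised in a worker is re-thrown in the calling thread.
mapConcurrentlyBase :: (a -> IO b) -> [a] -> IO [b]
mapConcurrentlyBase f xs = do
  vars <- mapM spawn xs
  mapM collect vars
  where
    spawn x = do
      var <- newEmptyMVar
      _ <- forkIO (tryAny (f x) >>= putMVar var)
      pure var
    collect var = takeMVar var >>= either throwIO pure
```

A driver could then run all 100 items with `main = mapConcurrentlyBase processItem [1 .. 100] >>= print . sum`. To actually spread the threads across cores, a GHC program must be compiled with `-threaded` and run with `+RTS -N`. In practice, `mapConcurrently` or `forConcurrently` from the `async` package is the more robust choice.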

My intuition would be to start by trying the first option since that is the primary way we have used funflow and should hopefully work for your use case out of the box. Please feel free to open an issue if you run into any problems with that approach.
