Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GetParentsNum function #18

Open
IgalRozen1 opened this issue Jun 3, 2022 · 10 comments
Open

GetParentsNum function #18

IgalRozen1 opened this issue Jun 3, 2022 · 10 comments

Comments

@IgalRozen1
Copy link

HI :) First of all, it's a great library, thank you for putting your effort into it.

I want to use DAG for ETL workflow purposes, with our architecture, we need to be able to proceed to vertex only if all of the vertex's parents finished their "job".

This is why running over the graph with BFS/DFS would not work for us.
To do so, I can think about 2 options with this lib:

  1. GetChildren, GetRoots, DeleteVertex:
    After the vertex completes its mission, we will find all of its children and we can delete the vertex from the graph. Now we will search again for roots, if a new root appears, a new vertex will start to run.
    Complexity: with this approach, we need to save and track all of the roots (hash table).
    The problem here is that GetRoots() returns all of the roots and we will have to iterate over all of them each time to find the new root.
    Let’s take an example of 2 layers graph with 2n vertices, n roots, n leaves, each root connected to one leaf => O(n^2)

  2. GetVertices, GetChildren, GetParents, DeleteVertex:
    Pre-processing step addition. Iterating over all of the vertices to add “dependencies” meta-data (inbound edge, part of the vertex struct).
    Complexity: we have to use GetVertices() → O(n) and iterate over them to get the parents num with GetParents for each vertex. Let’s take an example of 2 layers graph with 2n vertices, n roots, n leaves, each root connected to all n leaves => O(n^2)

We can improve the complexity of the second option if we could get the parents num (len(inboundEdge)) => O(n) instead of the actual edges.
Please current me if I'm wrong or if there is any other way.
Thanks a lot!

@vtolstov
Copy link

vtolstov commented Jun 3, 2022

Why not get ordered descendants/ancestors ? So you can travers in order to you needed vertex after all jobs are completed?

@IgalRozen1
Copy link
Author

Good question, GetOrderedDescendants returns in a BFS order.
Let's take the next example
DAG-Page-4 drawio
:

What will GetOrderedDescendants with BFS order return for v0?
The first layer will be v1 & v3 (rand order), and the second layer contains v2 & v4 (rand order).
This means v4 can run before v2 is finished (but v4 is dependent on v2).

@vtolstov
Copy link

vtolstov commented Jun 3, 2022

Ah yes, if you have two branches and number of vertexes are not equal you have problems =)
I'm agree with 2 suggestion =)
How dag processing will be in this case?
Create dag => preprocess (add metadata) ?
Mostly i want to the same as you suggest (i'm process independent vertexes in parallel)

@sebogh
Copy link
Collaborator

sebogh commented Jun 6, 2022

@vtolstov, @IgalRozen1 unfortunately I'm short of time thus the following is not thought through:

Have an orchestrator/broker that spins up a worker for every required node (BFS/DFS doen't matter) passing in

  • a notification channel and
  • the list/map of parents

Whenever a worker finishes, the worker terminates its notification channel to let the orchestrator know it's done. The orchestrator then sends this node's "name" onto all remaining nodification channels. Which wakes them up. Whenever a worker received the names of all its parents (i.e. all dependencies are satisfied), this worker may start the actual work...

The orchestrator and worker impl. should be only a matter of a couple of lines.

@IgalRozen1
Copy link
Author

@sebogh Thank you for the response, I've done something similar to your suggestion.
But, the preprocessing step will be the same (manage the workers through the orchestrator).

@sebogh
Copy link
Collaborator

sebogh commented Jun 26, 2022

@vtolstov, @IgalRozen1 see: https://pkg.go.dev/github.com/heimdalr/[email protected]#DAG.DescendantsFlow

(DescendantsFlow() also provides for passing results downstream.)

@vtolstov
Copy link

what about walking in reverse order ? =)

@vtolstov
Copy link

but thanks this is very usable

@vtolstov
Copy link

vtolstov commented Jul 1, 2022

~@sebogh in case of DescendantsFlow how can i get [][]string of steps?

like where first index contains order of execution and []string contains slice of steps on this order level?~

not need to answer =) i found my mistake

@vtolstov
Copy link

vtolstov commented Jul 1, 2022

@IgalRozen1 or how you try to use ? i need to have all steps slice and know what steps i can run in parallel to speedup execution.

so i found that i have big mistake and not all graph stuff can be represented with two dimensional array

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants