Skip to content

Parallelize the Rounding Stage of GCS #22222

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Dec 3, 2024

Conversation

cohnt
Copy link
Contributor

@cohnt cohnt commented Nov 20, 2024

This ended up being pretty simple -- solve all the restrictions in a parallel for loop, and override options.parallelism to set the number of threads to one if any not-thread-safe costs or constraints are included in the GCS.

Gonna test this with the various debug builds before I send it up for review.


This change is Reviewable

@cohnt
Copy link
Contributor Author

cohnt commented Nov 21, 2024

@drake-jenkins-bot linux-jammy-clang-bazel-experimental-thread-sanitizer please
@drake-jenkins-bot linux-jammy-clang-bazel-experimental-valgrind-memcheck please
@drake-jenkins-bot linux-jammy-clang-bazel-experimental-undefined-behavior-sanitizer please
@drake-jenkins-bot linux-jammy-clang-bazel-experimental-address-sanitizer please

Copy link
Contributor

@AlexandreAmice AlexandreAmice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 unresolved discussion, needs platform reviewer assigned, needs at least two assigned reviewers, labeled "do not merge", commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes (waiting on @cohnt)


geometry/optimization/graph_of_convex_sets.cc line 1618 at r1 (raw file):

    // If any costs or constraints aren't thread-safe, we can't parallelize.
    bool is_thread_safe = true;

Don't forget to set options.kMaxThreads = 1 and make sure to warn if is_thread_safe == false && parallelism.num_threads() > 1, otherwise users might be surprised if the extra threads aren't used.

…ts or constraints. Also make sure only one thread is used per solver.
Copy link
Contributor Author

@cohnt cohnt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 1 unresolved discussion, needs platform reviewer assigned, needs at least two assigned reviewers, labeled "do not merge", commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes (waiting on @cohnt)


geometry/optimization/graph_of_convex_sets.cc line 1618 at r1 (raw file):

Previously, AlexandreAmice (Alexandre Amice) wrote…

Don't forget to set options.kMaxThreads = 1 and make sure to warn if is_thread_safe == false && parallelism.num_threads() > 1, otherwise users might be surprised if the extra threads aren't used.

Done.

@cohnt
Copy link
Contributor Author

cohnt commented Nov 30, 2024

+@RussTedrake for feature review, if you have the time?

Copy link
Contributor

@RussTedrake RussTedrake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we'll need some test coverage.

Reviewed 1 of 2 files at r1, 1 of 1 files at r2, all commit messages.
Reviewable status: 4 unresolved discussions, LGTM missing from assignee RussTedrake(platform), needs at least two assigned reviewers, labeled "do not merge", commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes (waiting on @cohnt)


geometry/optimization/graph_of_convex_sets.cc line 1662 at r2 (raw file):

    if (n_threads_to_use < options.parallelism.num_threads()) {
      log()->warn(
          "GCS restriction has costs or constraints which are not thread-safe, "

perhaps "not declared thread-safe"? i suspect most are thread safe, but might need to be marked appropriately since we have to assume the worst.


geometry/optimization/graph_of_convex_sets.cc line 1690 at r2 (raw file):

    using common_robotics_utilities::parallelism::DegreeOfParallelism;
    using common_robotics_utilities::parallelism::DynamicParallelForIndexLoop;
    using common_robotics_utilities::parallelism::ParallelForBackend;

btw -- i guess i thought that drake/common/parallelism.h was supposed to shield us from using CRU parallelism directly. I guess you need more?


geometry/optimization/graph_of_convex_sets.cc line 1696 at r2 (raw file):

    DynamicParallelForIndexLoop(
        DegreeOfParallelism(n_threads_to_use), 0, ssize(candidate_paths),
        solve_ith_convex_restriction, ParallelForBackend::BEST_AVAILABLE);

this whole glob of code is conceptually separable from the rest of SolveShortestPath, and it's adding a LOT of lines to an already enormous method. What do you think about pulling it out into it's own method? It might even be useful by itself? (e.g. SamplePaths got pulled out for this reason).

Copy link
Contributor Author

@cohnt cohnt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would test coverage look like? Since the default is to use parallelism, we already have a ton of test cases that verify that parallelism works. I guess I could pick one of the test cases to run with and without parallelism, and verify that the solution is the same?

Reviewable status: 4 unresolved discussions, LGTM missing from assignee RussTedrake(platform), needs at least two assigned reviewers, labeled "do not merge", commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes (waiting on @cohnt)


geometry/optimization/graph_of_convex_sets.cc line 1662 at r2 (raw file):

Previously, RussTedrake (Russ Tedrake) wrote…

perhaps "not declared thread-safe"? i suspect most are thread safe, but might need to be marked appropriately since we have to assume the worst.

Done.


geometry/optimization/graph_of_convex_sets.cc line 1690 at r2 (raw file):

Previously, RussTedrake (Russ Tedrake) wrote…

btw -- i guess i thought that drake/common/parallelism.h was supposed to shield us from using CRU parallelism directly. I guess you need more?

I was trying to work within the current program structure, but now that I'm refactoring, I've been able to remove the CRU usage.


geometry/optimization/graph_of_convex_sets.cc line 1696 at r2 (raw file):

Previously, RussTedrake (Russ Tedrake) wrote…

this whole glob of code is conceptually separable from the rest of SolveShortestPath, and it's adding a LOT of lines to an already enormous method. What do you think about pulling it out into it's own method? It might even be useful by itself? (e.g. SamplePaths got pulled out for this reason).

I've done some refactoring to clean this up. Do you like this better?

Copy link
Contributor

@RussTedrake RussTedrake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that seems reasonable. Philosophically, if you are adding a new feature (especially one that changes this much code), then we want to make sure we have coverage.

Reviewed 3 of 3 files at r3, all commit messages.
Reviewable status: 4 unresolved discussions, LGTM missing from assignee RussTedrake(platform), needs at least two assigned reviewers, labeled "do not merge", commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes (waiting on @cohnt)


geometry/optimization/graph_of_convex_sets.cc line 1696 at r2 (raw file):

Previously, cohnt (Thomas Cohn) wrote…

I've done some refactoring to clean this up. Do you like this better?

yes. thank you!


geometry/optimization/graph_of_convex_sets.cc line 1662 at r3 (raw file):

    }

    std::optional<solvers::SolverId> maybe_solver_id = std::nullopt;

btw -- do you worry about calling ChooseBestSolver for every solve if solver_id is not set here? I guess it's necessary in the current workflow, since there could be one program in progs that e.g. visits a vertex whom no other program visits which has a different (e.g. nonconvex) program.


geometry/optimization/graph_of_convex_sets.h line 883 at r3 (raw file):

  // Add results for additional variables in `result` to make it comparable with
  // other transcriptions.
  void SubstituteAdditionalVariables(

The prefix "Substitute" here confused me, since you have placeholder variable substitution happening in various places. Can we find a more specific name? This is only to make the convex restriction result look more like the MIP result?

fysa, in PRs like #21833, I've actually been trying to do less of this writing of extra stuff into the solution results, and instead making GetSolutionCost and friends more robust to missing information.

Copy link
Contributor Author

@cohnt cohnt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added this for one of the test cases that does rounding.

Dismissed @RussTedrake from a discussion.
Reviewable status: 3 unresolved discussions, LGTM missing from assignee RussTedrake(platform), needs at least two assigned reviewers, labeled "do not merge", commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes (waiting on @cohnt)


geometry/optimization/graph_of_convex_sets.h line 883 at r3 (raw file):

Previously, RussTedrake (Russ Tedrake) wrote…

The prefix "Substitute" here confused me, since you have placeholder variable substitution happening in various places. Can we find a more specific name? This is only to make the convex restriction result look more like the MIP result?

fysa, in PRs like #21833, I've actually been trying to do less of this writing of extra stuff into the solution results, and instead making GetSolutionCost and friends more robust to missing information.

How about this? A little verbose, but very specific.


geometry/optimization/graph_of_convex_sets.cc line 1662 at r3 (raw file):

Previously, RussTedrake (Russ Tedrake) wrote…

btw -- do you worry about calling ChooseBestSolver for every solve if solver_id is not set here? I guess it's necessary in the current workflow, since there could be one program in progs that e.g. visits a vertex whom no other program visits which has a different (e.g. nonconvex) program.

For the reason you described, I think we have to call ChooseBestSolver each time. Not sure what the overhead looks like.

Presumably, the best practice is to set this solver if it's known -- do you want me to document that in GraphOfConvexSetsOptions?

Copy link
Contributor

@AlexandreAmice AlexandreAmice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 unresolved discussions, LGTM missing from assignee RussTedrake(platform), needs at least two assigned reviewers, labeled "do not merge", commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes (waiting on @cohnt)


geometry/optimization/graph_of_convex_sets.cc line 1662 at r3 (raw file):

Previously, cohnt (Thomas Cohn) wrote…

For the reason you described, I think we have to call ChooseBestSolver each time. Not sure what the overhead looks like.

Presumably, the best practice is to set this solver if it's known -- do you want me to document that in GraphOfConvexSetsOptions?

FWIW, ChooseBestSolver is pretty fast since it is just a big lookup table. If you're worried about the cost of repeatedly constructing the best solver, then SolveInParallel already takes care of that by caching the constructed solvers as it solves.


geometry/optimization/graph_of_convex_sets.cc line 1653 at r4 (raw file):

          "thread-safe. Any paths being considered in the rounding process "
          "that contain these constraints will be solved sequentially.");
    }

Maybe only warn once in case the GCS is reused to avoid console spam.

Code quote:

    if (!is_thread_safe) {
      log()->warn(
          "GCS restriction has costs or constraints which are not declared "
          "thread-safe. Any paths being considered in the rounding process "
          "that contain these constraints will be solved sequentially.");
    }

Copy link
Contributor Author

@cohnt cohnt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 3 unresolved discussions, LGTM missing from assignee RussTedrake(platform), needs at least two assigned reviewers, labeled "do not merge", commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes


geometry/optimization/graph_of_convex_sets.cc line 1653 at r4 (raw file):

Previously, AlexandreAmice (Alexandre Amice) wrote…

Maybe only warn once in case the GCS is reused to avoid console spam.

Done.

Copy link
Contributor

@RussTedrake RussTedrake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

+@rpoyner-tri for platform review, please.

Reviewed 2 of 3 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: 1 unresolved discussion, LGTM missing from assignee rpoyner-tri(platform), labeled "do not merge", commits need curation (https://drake.mit.edu/reviewable.html#curated-commits), missing label for release notes (waiting on @cohnt)


geometry/optimization/graph_of_convex_sets.h line 883 at r3 (raw file):

Previously, cohnt (Thomas Cohn) wrote…

How about this? A little verbose, but very specific.

I like it.


geometry/optimization/graph_of_convex_sets.cc line 1662 at r3 (raw file):

Previously, AlexandreAmice (Alexandre Amice) wrote…

FWIW, ChooseBestSolver is pretty fast since it is just a big lookup table. If you're worried about the cost of repeatedly constructing the best solver, then SolveInParallel already takes care of that by caching the constructed solvers as it solves.

Nice.

@rpoyner-tri rpoyner-tri added the release notes: fix This pull request contains fixes (no new features) label Dec 3, 2024
Copy link
Contributor

@rpoyner-tri rpoyner-tri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+(release notes: fix)
:lgtm:

Reviewed 1 of 2 files at r1, 1 of 3 files at r3, 2 of 3 files at r4, 1 of 1 files at r5, all commit messages.
Reviewable status: 1 unresolved discussion, labeled "do not merge", commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @cohnt)

@cohnt cohnt added release notes: feature This pull request contains a new feature and removed release notes: fix This pull request contains fixes (no new features) labels Dec 3, 2024
Copy link
Contributor Author

@cohnt cohnt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

-(release notes: fix)
+(release notes: feature)

Dismissed @RussTedrake from a discussion.
Reviewable status: labeled "do not merge", commits need curation (https://drake.mit.edu/reviewable.html#curated-commits) (waiting on @cohnt)

@cohnt cohnt added status: squashing now https://drake.mit.edu/reviewable.html#curated-commits and removed status: do not merge labels Dec 3, 2024
@cohnt cohnt merged commit e03af33 into RobotLocomotion:master Dec 3, 2024
10 checks passed
@cohnt cohnt deleted the gcs-rounding-parallel branch December 3, 2024 21:10
RussTedrake pushed a commit to RussTedrake/drake that referenced this pull request Dec 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release notes: feature This pull request contains a new feature status: squashing now https://drake.mit.edu/reviewable.html#curated-commits
Projects
Development

Successfully merging this pull request may close these issues.

4 participants