
Rust Integration: Perform Max Flow in Rust instead of Python #1

Merged · 43 commits · Nov 8, 2024

Conversation

@ohjuny (Collaborator) commented Oct 18, 2024

The overarching goal is to improve performance by using Rust instead of Python to perform intense computations, while maintaining the Python API.

We implement Python-Rust interop using pyo3. We use the pathfinding crate for the Rust-side implementation of the Edmonds-Karp max flow algorithm.

Through initial profiling, we identified the call to nx.maximum_flow as the overwhelming bottleneck (51-74% of total time). This PR replaces the calls to nx.maximum_flow with Rust-side bindings; initial profiling shows a total speedup of 1.6-2.6x.
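For intuition about the computation being moved across the language boundary, here is a minimal pure-Python sketch of Edmonds-Karp (BFS over residual capacities). It is illustrative only, not the pathfinding crate's implementation:

```python
from collections import defaultdict, deque

def edmonds_karp(edges, source, sink):
    """Max flow via shortest augmenting paths on a residual capacity map."""
    cap = defaultdict(lambda: defaultdict(float))
    for (u, v), c in edges:
        cap[u][v] += c
    flow = 0.0
    while True:
        # BFS for the shortest augmenting path in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow  # no augmenting path left: flow is maximal
        # Find the bottleneck capacity along the path.
        bottleneck, v = float("inf"), sink
        while parent[v] is not None:
            bottleneck = min(bottleneck, cap[parent[v]][v])
            v = parent[v]
        # Push the bottleneck, updating forward and reverse residual edges.
        v = sink
        while parent[v] is not None:
            u = parent[v]
            cap[u][v] -= bottleneck
            cap[v][u] += bottleneck
            v = u
        flow += bottleneck

edges = [((0, 1), 3.0), ((0, 2), 2.0), ((1, 3), 2.0), ((2, 3), 3.0)]
print(edmonds_karp(edges, 0, 3))  # 4.0
```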


Results

Based on profiling runs of 5 models on A40 and A100 GPUs, we found:

  • 3.83~6.99x speedup on the first call to Rust-side max_flow
  • 1.89~3.56x speedup on the second call to Rust-side max_flow
  • 1.31~3.46x speedup in total runtime

Build and Run

Assume you have some project that is using lowtime, and you have a local clone of this lowtime branch.

$ ls
project/    lowtime/

Then run the following to set up a virtual environment with this version of lowtime.

$ cd project/
$ python -m venv .env        # create virtual environment
$ source .env/bin/activate   # activate virtual environment
$ cd ../lowtime/
$ pip install -e .           # (editable) install lowtime
$ cd ../project/
$ #work on project, lowtime is now installed

If you modify lib.rs and want those changes reflected:

$ pwd
lowtime
$ pip uninstall lowtime
$ pip install -e .

Design Decisions

SparseCapacity vs DenseCapacity

The Rust-side code uses the pathfinding::directed::edmonds_karp::edmonds_karp implementation for max flow. The function takes an EK type parameter that determines whether to represent the graph with SparseCapacity (a BTreeMap of BTreeMaps) or DenseCapacity (an adjacency matrix). Due to the nature of computation graphs of parallelized training workloads, SparseCapacity is more efficient. This was verified through profiling, which showed SparseCapacity to be up to 10x faster than DenseCapacity.
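The storage trade-off can be illustrated in Python (the Rust types use a BTreeMap of BTreeMaps and a matrix respectively; this sketch only shows the asymptotics on a sparse graph):

```python
# A path graph on 1,000 nodes: very sparse (999 edges out of ~10^6 possible).
n = 1000
edges = {(i, i + 1): 1.0 for i in range(n - 1)}

# Sparse representation: nested maps, storage proportional to the edge count.
sparse = {}
for (u, v), cap in edges.items():
    sparse.setdefault(u, {})[v] = cap
sparse_entries = sum(len(row) for row in sparse.values())

# Dense representation: an n x n matrix, storage proportional to n**2.
dense = [[0.0] * n for _ in range(n)]
for (u, v), cap in edges.items():
    dense[u][v] = cap
dense_entries = n * n

print(sparse_entries, dense_entries)  # 999 1000000
```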

OrderedFloat<f64> vs f32 vs i64

The edmonds_karp implementation requires capacity values to implement the Zero, Bounded, Signed, Ord, and Copy traits.

  • Raw floats in Rust do not implement Ord, so they cannot be used; OrderedFloat does.
  • OrderedFloat<f32> resulted in incorrect output due to lack of precision.

We considered an implementation where we use integers, and convert to/from floats by multiplying by 1e9.

  • Using i64 resulted in incorrect output. Furthermore, it resulted in a similar runtime (no significant gain).
  • u64 cannot be used as it does not implement Signed, which is needed to represent negative residual flows in edmonds_karp.

Therefore OrderedFloat<f64> was chosen to represent capacity.
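A Python sketch of why the two rejected options fail, using struct to emulate f32 storage and the 1e9 scaled-integer scheme (illustrative only, not the Rust code):

```python
import struct

def as_f32(x: float) -> float:
    # Round-trip a Python float (an f64) through 32-bit float storage.
    return struct.unpack("f", struct.pack("f", x))[0]

cap = 0.1
print(as_f32(cap))  # 0.10000000149011612 -- f32 cannot represent 0.1 exactly

# Scaled-integer scheme: any capacity below 1e-9 collapses to zero.
scaled = int(round(1e-10 * 1e9))
print(scaled)  # 0
```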

pyo3 Data Transfer Reduction with "pimpl" idiom

The _lowtime_rs::PhillipsDessouky constructor returns the object itself, which is very large as it contains all the nodes, edges, and associated capacities. Due to a misconception that pyo3 would have to create a "Python copy" of the Rust object, we experimented with using the pimpl idiom to implement a wrapper PhillipsDessouky class that contained a pointer to an internal _PhillipsDessouky class, which contained the actual data and implementations. However, pyo3 seems to have a clever way of associating Rust objects with Python objects without creating a "Python copy". In fact, profiling the "pimpl version" resulted in slightly slower runtimes, probably due to the added indirection and use of the heap.

Handling Rust-side Errors

Before this PR, the lowtime codebase performed exception handling on the call to nx.maximum_flow, since the nx library is designed to raise well-defined exceptions (e.g. nx.NetworkXUnbounded). However, the Rust-side pathfinding library does not define well-defined errors; instead, the edmonds_karp function treats the following situations as unrecoverable:

  • panic!s if the source and sink nodes are not found in vertices.
  • unreachable! if there is no flow to cancel in its internal representation of node capacities. One way this is caused is by providing a negative capacity as input (this caused a very hard-to-debug error when some flows were very small negative values due to floating point imprecision).
  • assert!s many invariants of the internal graph representation (assert! will panic! if the condition is false).

If a Rust-side library uses well-defined errors, pyo3 has integration for propagating those errors as Python exceptions. The "recommended way" when using a third-party crate that does not define well-defined errors is, if possible, to modify the third-party crate itself to add them.

pathfinding chooses to use unrecoverable macros over well-defined errors. Assuming we do not intend on changing the pathfinding codebase itself, we have 2 options:

  1. Use catch_unwind to catch panic!s in Rust. Once caught, convert the error to a custom Error type that is propagated to Python as an exception, which can be caught and handled in Python.
  2. Do not handle panic!s, letting the program crash gracefully. The panic! and unreachable! macros are intended as graceful exit points for unrecoverable states, so we should not treat them like Python exceptions.
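To illustrate option 2 from the Python side: pyo3 surfaces an uncaught panic! as pyo3_runtime.PanicException, which deliberately derives from BaseException rather than Exception, so ordinary handlers do not swallow it. A stand-in sketch (the real class is provided by pyo3, not defined by the user):

```python
# Stand-in for pyo3_runtime.PanicException; the real one comes from pyo3.
class PanicException(BaseException):
    pass

def rust_max_flow():
    # Simulates a Rust panic! crossing the FFI boundary.
    raise PanicException("panicked at 'source not found in vertices'")

try:
    rust_max_flow()
except Exception:
    outcome = "caught by an ordinary handler"
except BaseException:
    outcome = "escaped 'except Exception', as intended for panics"

print(outcome)  # escaped 'except Exception', as intended for panics
```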

There is significant debate on this topic; for example, this discussion based on the polars library. In the polars discussion, the debate was closed with:

A Rust panic really is intended to be treated as a (graceful) crash, not as a user-space error. It sounds from the polars thread that the maintainers there are willing to treat panics as bugs and fix them.

one does not need to manually extend error catching conditions when using a cython-wrapped library, as one would have to do when including a pyo3-wrapped library.

The point is that a panic isn't a typical error condition (returning Result is). A Python user isn't expected to interact with panics, the choice to not extend from Exception is deliberate so that panics are not accidentally caught.

Will close this one, if there is significant pressure from the community we can revisit.

We make the same decision here: we do not attempt to catch or handle panic!s and unreachable! crashes. If needed in the future, we can add invariant checks before the call to edmonds_karp.
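As an example of such an invariant check, a hypothetical Python-side guard (names are illustrative) could clamp tiny negative capacities produced by floating-point rounding before they reach edmonds_karp:

```python
EPS = 1e-9  # below this magnitude, a negative capacity is treated as rounding noise

def sanitize_capacities(edges):
    clean = []
    for (u, v), cap in edges:
        if cap < -EPS:
            # A genuinely negative capacity is a caller bug: fail loudly here,
            # in Python, instead of hitting unreachable! deep inside Rust.
            raise ValueError(f"negative capacity on edge ({u}, {v}): {cap}")
        clean.append(((u, v), max(cap, 0.0)))  # clamp rounding noise to zero
    return clean

residual = 0.3 - 0.2 - 0.1          # a tiny negative value, not exactly 0.0
edges = sanitize_capacities([((0, 1), residual)])
print(edges)  # [((0, 1), 0.0)]
```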

Profiling Set Up

Throughout development, we used profiling data to decide whether an idea was worth pursuing. Since many ideas depended on each other, we needed to build profiling infrastructure fast (even if scrappy) before we could start experimenting with lowtime. As a result, the current profiling infrastructure was built to:

  1. Measure times between intervals of interest (eg. the call to find_min_cut).
  2. Log the time.
  3. Run a separate script after lowtime has completed that parses the job.log for profiling-related logs and groups relevant intervals together.

While fast, this approach leaves many profiling-related logs in the lowtime codebase and requires a separate script to parse the logs after lowtime completes. An ideal solution would be profiling infrastructure integrated into lowtime itself that can be turned on/off through a command-line argument. However, this PR chooses the "scrappy" option because:

  1. The need for profiling data was blocking progress on the "core" idea (Rust interop).
  2. Building an integrated profiling infrastructure seems like an independent addition that could be its own PR.

We will need to remove the profiling logs at the end of this chain of commits (i.e. when we eventually merge lowtime-rust with main), but we keep them for now as we will need to use the existing profiling infrastructure for future commits in this chain.

@ohjuny ohjuny marked this pull request as draft October 18, 2024 21:16
@ohjuny ohjuny self-assigned this Oct 18, 2024
@ohjuny ohjuny added the enhancement New feature or request label Oct 18, 2024
Review thread on lowtime/solver.py (resolved)
@jaywonchung (Member) commented:

Could you also quickly document how to build and run this?

@ohjuny (Collaborator, Author) commented Nov 5, 2024

Not 100% knowledgeable on Python packaging and distribution, but here's what I tried:

  • Started with src layout that you linked before. I got confused about how to reconcile the 2 different pyproject.tomls that used different build-backends.
  • Found an example (deltalake-python) that had mixed Python and Rust. The organization looked clean, and used a single maturin build-backend for both Rust and Python files.
  • Read through maturin README for more info as well. Main takeaway was that (1) the deltalake-python organization works and (2) it is good convention to prepend _ to the Rust module if the intention is that it is internal.

That's how I ended up with the current structure.

I'm trying to fix the last ci.yaml pyright test, where it seems to say that _lowtime_rs does not exist. When I test locally, just running "pip install lowtime" seems to be sufficient to build both the Rust and Python libraries, so I'm not sure why the automated test is failing. Will have to look deeper into this, but let me know if you spot anything.

@jaywonchung (Member) commented Nov 6, 2024

Have a pyi file? lowtime/_lowtime_rs.pyi.

from __future__ import annotations

class PhillipsDessouky:
    node_ids: list[int]
    
    def __init__(
        self,
        node_ids: list[int],
        source_node_id: int,
        sink_node_id: int,
        edges_raw: list[tuple[tuple[int, int], float]]
    ) -> None: ...

    def max_flow(self) -> list[tuple[tuple[int, int], float]]: ...

@ohjuny (Collaborator, Author) commented Nov 7, 2024

> Have a pyi file? lowtime/_lowtime_rs.pyi.

Didn't seem to work :( Will look further into this.

Edit: needed to also add an empty py.typed file, documented in the maturin docs. Fun fact: there is an open PR on pyo3 to generate these automatically.

Fixing some other pyright issues now.

@ohjuny ohjuny marked this pull request as ready for review November 7, 2024 18:00
@jaywonchung (Member) left a comment:
This is absolutely great work, thank you! I've left some suggestions and comments.

Review threads on: .github/workflows/CI.yml, pyproject.toml, src/phillips_dessouky.rs, Cargo.toml, lowtime/.gitignore, lowtime/_lowtime_rs.pyi (all resolved)
@jaywonchung (Member) left a comment:

LGTM!

One thing that would be nice to have is an automated e2e test for result consistency.

@jaywonchung jaywonchung merged commit 3f086b9 into lowtime-rust Nov 8, 2024
17 checks passed
Labels: enhancement (New feature or request)

2 participants