Releases: flexflow/FlexFlow
Release 22.07
This is the last stable release of FlexFlow before the Unity merge. Unity enables joint optimization of algebraic transformations and parallelization, and generally achieves better performance and scalability than the original FlexFlow. The Unity merge introduces the following major changes to FlexFlow.
- With Unity, we now use parallel computation graphs (PCGs) to represent a DNN model. A PCG is a unified representation of distributed DNN training that simultaneously expresses computation, parallelism, and data movement. A detailed description of PCG is available here.
- We add support for Unity's additional forms of parallelism, including reduction parallelism and other operator-specific parallelization strategies.
- We replace FlexFlow's MCMC search with a three-layer hierarchical search algorithm that jointly optimizes algebraic transformations and parallelization, achieving better performance and scalability than the MCMC search.
Starting from this release, Unity's changes will be available in the master branch of the FlexFlow repository.
Release 22.05
This is a stable release of FlexFlow in preparation for the Unity merge.
Frontend support:
- FlexFlow now supports training HuggingFace models through PyTorch's `torch.fx` interface (sketched below). An example of training HuggingFace MT5 in FlexFlow is available at https://github.com/flexflow/FlexFlow/tree/master/examples/python/pytorch/mt5
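As a hedged illustration of the fx-based flow, the sketch below traces a small PyTorch module with `torch.fx`, producing the operator graph that FlexFlow translates; the conversion entry point in the trailing comment is an assumption, so refer to the MT5 example above for the actual API.

```python
# Sketch: trace a PyTorch module with torch.fx; FlexFlow translates the
# resulting operator graph. The conversion call at the end is hypothetical.
import torch
import torch.fx

class TinyNet(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(16, 4)

    def forward(self, x):
        return torch.relu(self.fc(x))

traced = torch.fx.symbolic_trace(TinyNet())  # standard torch.fx tracing
print(traced.graph)  # the operator graph that FlexFlow consumes

# Hypothetical FlexFlow conversion step (name is an assumption):
# from flexflow.torch.fx import torch_to_flexflow
# torch_to_flexflow(TinyNet(), "tinynet.ff")
```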
PyTorch Alignment:
- Added unit tests for aligning FlexFlow's operators with PyTorch's. For each operator, the unit test checks whether FlexFlow and PyTorch return identical activations and gradients when given the same inputs (see the sketch below). More details of the PyTorch alignment are available at https://github.com/flexflow/FlexFlow/tree/master/align
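As a hedged sketch of that test pattern, the snippet below computes an operator's activation and input gradient in PyTorch and compares them against tensors from the FlexFlow run; `ff_out` and `ff_grad` are stand-ins for values produced by the corresponding FlexFlow operator.

```python
# Sketch of an alignment check: compute an operator's activation and input
# gradient in PyTorch, then compare against the FlexFlow run's tensors.
import torch

def pytorch_linear(x, w):
    x = x.clone().requires_grad_(True)
    out = torch.nn.functional.linear(x, w)
    out.sum().backward()                 # backprop a ones-like upstream gradient
    return out.detach(), x.grad

def assert_aligned(ff_out, ff_grad, x, w, tol=1e-5):
    torch_out, torch_grad = pytorch_linear(x, w)
    assert torch.allclose(ff_out, torch_out, atol=tol), "activations diverge"
    assert torch.allclose(ff_grad, torch_grad, atol=tol), "gradients diverge"
```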
Documentation:
- Initial documentation support added: https://github.com/flexflow/FlexFlow/tree/master/docs
Operators:
- Multiple bug fixes for FlexFlow operators
Broadcast:
- FlexFlow now supports broadcasting for a subset of operators, including elementwise unary and elementwise binary operators. The broadcasting semantics are identical to NumPy's.
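Because the semantics match NumPy's, NumPy itself is the quickest runnable reference for what these operators now accept:

```python
# NumPy broadcasting rules, which FlexFlow's elementwise operators now follow:
# trailing dimensions must be equal or 1; size-1 dimensions are stretched.
import numpy as np

a = np.ones((4, 1, 3))              # shape (4, 1, 3)
b = np.arange(2 * 3).reshape(2, 3)  # shape (2, 3)
c = a + b                           # elementwise binary op broadcasts to (4, 2, 3)
print(c.shape)                      # (4, 2, 3)

d = -a                              # elementwise unary ops keep the input shape
print(d.shape)                      # (4, 1, 3)
```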
Release 21.09 (September 30, 2021)
Frontend Supports
- PyBind11 is now the default Python frontend in FlexFlow.
Control Replication
- FlexFlow now enables Legion's dynamic control replication by default
Distributed training
- FlexFlow now uses NCCL AllReduce for gradient synchronization by default. To switch to the distributed parameter server, set `FF_USE_NCCL=OFF` in cmake.
Distributed inference
- Passing `comp_mode = CompMode::INFERENCE` as an additional argument to `model.compile` will run a DNN model in inference mode (a minimal sketch follows below).
- Various bug fixes and performance improvements for distributed inference in FlexFlow.
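A minimal sketch of the inference-mode flow, assuming the Python bindings expose `CompMode` and a `comp_mode` keyword mirroring the C++-style snippet above (exact names may differ across versions):

```python
# Hedged sketch: compile a FlexFlow model for inference instead of training.
from flexflow.core import *  # assumed to provide FFConfig, FFModel, CompMode

ffconfig = FFConfig()
ffmodel = FFModel(ffconfig)
# ... build the DNN here, e.g. ffmodel.dense(...), ffmodel.conv2d(...) ...

# comp_mode=CompMode.INFERENCE (assumed keyword) skips training-only setup
# and lets the performance tuner optimize the model for inference.
ffmodel.compile(comp_mode=CompMode.INFERENCE)
```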
Operators
- Additional operators include AggregateSpec and Multi-Head Attention.
Machine Model
- FlexFlow now supports a new machine model that more precisely models network topology and simulates traffic at the granularity of individual packets.
Release 21.03 (March 31, 2021)
- Build
  - FlexFlow now uses a CMake build by default; the Makefile build will be deprecated soon.
- Frontend Supports
  - In addition to CFFI, FlexFlow now also supports a Python interface via PyBind11. To use PyBind11, please set `FF_USE_PYBIND=ON` in cmake.
- Distributed inference
  - FlexFlow supports automated performance tuning for both distributed training and inference. For optimizing and performing distributed inference, simply pass `comp_mode = CompMode::INFERENCE` as an additional argument to `model.compile`. An example can be found at https://github.com/flexflow/FlexFlow/blob/master/examples/python/native/bert_proxy_native.py.
- Runtime
  - FlexFlow now supports gradient updates via either Parameter Server or NCCL AllReduce. To enable NCCL, please set `FF_USE_NCCL=ON` in cmake.
- Operators
  - New operators include Aggregate, Multi-Head Attention, Scalar Multiply, Scalar Add, Scalar Sub, Scalar Divide, and Top-K.
  - Conv2D now supports group convolutions (see the sketch after this list).
- Examples
  - Unit tests of all operators have been added to the tests/ops folder.
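Group-convolution semantics are easiest to pin down against PyTorch's equivalent operator; the runnable sketch below uses `torch.nn.Conv2d` purely as a reference for what FlexFlow's Conv2D now computes.

```python
# Reference for group-convolution semantics, shown with PyTorch's Conv2d.
import torch

x = torch.randn(1, 8, 16, 16)  # NCHW input with 8 channels
conv = torch.nn.Conv2d(8, 8, kernel_size=3, padding=1, groups=4)
# With groups=4, each group of 2 input channels drives 2 output channels,
# so the weight tensor is 4x smaller than a dense 8->8 convolution.
print(conv(x).shape)      # torch.Size([1, 8, 16, 16])
print(conv.weight.shape)  # torch.Size([8, 2, 3, 3])
```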
Release 20.12 (December 21, 2020)
- Build
  - FlexFlow now supports both Makefile and CMake builds. More details are available in the build instructions.
- Frontend Supports
  - PyTorch. FlexFlow now supports training existing PyTorch models with minimal changes to the source code. To run PyTorch models in FlexFlow, users can first export a model to the ONNX format using `torch.onnx` and then load the ONNX model in FlexFlow for distributed training (see the sketch below). More examples: https://github.com/flexflow/FlexFlow/tree/master/examples/python/pytorch
  - ONNX. FlexFlow supports training existing ONNX models through `flexflow.onnx.model`. More examples: https://github.com/flexflow/FlexFlow/tree/master/examples/python/onnx
  - TensorFlow Keras. Similar to the PyTorch support, `flexflow.keras` enables distributed training of existing TensorFlow Keras models. See this bootcamp talk for more details.
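The export half of that PyTorch flow is the standard `torch.onnx` API; the FlexFlow loading half goes through the `flexflow.onnx.model` module named above, whose exact class and method names are assumptions in this sketch (see the linked onnx examples for the real API).

```python
# Sketch: export a PyTorch model to ONNX, then load it in FlexFlow.
import torch
import torchvision

model = torchvision.models.resnet18()
dummy = torch.randn(1, 3, 224, 224)               # example input for tracing
torch.onnx.export(model, dummy, "resnet18.onnx")  # standard torch.onnx API

# Hypothetical FlexFlow side (names are assumptions):
# from flexflow.onnx.model import ONNXModel
# onnx_model = ONNXModel("resnet18.onnx")
# outputs = onnx_model.apply(ffmodel, {"input": input_tensor})
```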
- Parallelization Optimizer
  - Integrated the parallelization optimizer into the FlexFlow runtime. Users can now use the `--search-budget` and `--search-alpha` flags to control the FlexFlow parallelization optimizer when searching for optimized strategies. See this post for the usage of the optimizer.
- Examples
  - More PyTorch, ONNX, and TensorFlow Keras examples have been added to the `/examples/python` folder.
  - Updated the C++ examples to use the new runtime interface.
- Mapper
  - Implemented a new mapper with improved runtime performance.
- Legion
  - Updated the Legion version, with improved runtime performance.
FlexFlow v1.1.1 Release for the SysML19 Artifact Evaluation
This is the v1.1.1 pre-release for SysML19 Artifact Evaluation. Follow the instructions to build FlexFlow and use the script `run_experiments.sh` to run all experiments.
FlexFlow v1.1 Release for the SysML19 Artifact Evaluation
This is the v1.1 pre-release for SysML19 Artifact Evaluation. Follow the instructions to build FlexFlow and use the script `run_experiments.sh` to run all experiments.
SysML19 Artifact Evaluation
This is a pre-release for SysML19 Artifact Evaluation. Follow the instructions to build FlexFlow and use the script `run_experiments.sh` to run all experiments.