dylanllim commented Jun 7, 2024

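`cublasGemmEx` takes its matrix and scalar arguments as untyped `void *` pointers, with separate datatype arguments describing the actual element types. This PR moves the cast from the concrete datatype to `void *` into the kernel code, so callers of the linear kernels pass typed pointers instead of `void *`.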
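To illustrate the pattern, here is a minimal C++ sketch that runs without a GPU. `gemm_ex_cpu` is a hypothetical CPU stand-in that only mimics the `cublasGemmEx` convention of untyped pointers plus type tags (the hypothetical `DataTypeTag` plays the role of `cudaDataType_t`). `linear_forward` and its parameters are likewise illustrative, not FlexFlow's actual kernel signature. Callers hand over typed pointers, and the cast to `void *` happens only at the boundary of the untyped API.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical stand-in for cudaDataType_t (e.g. CUDA_R_32F / CUDA_R_64F).
enum class DataTypeTag { Float32, Float64 };

// Maps a C++ element type to its runtime tag at compile time.
template <typename T> struct data_type_tag;
template <> struct data_type_tag<float> {
  static constexpr DataTypeTag value = DataTypeTag::Float32;
};
template <> struct data_type_tag<double> {
  static constexpr DataTypeTag value = DataTypeTag::Float64;
};

// Typed inner loop, column-major like cuBLAS, no transposes:
// C (m x n) = alpha * A (m x k) * B (k x n) + beta * C.
template <typename T>
void gemm_typed(int m, int n, int k, T alpha, T const *A, T const *B, T beta,
                T *C) {
  for (int j = 0; j < n; ++j) {
    for (int i = 0; i < m; ++i) {
      T acc = T(0);
      for (int p = 0; p < k; ++p) {
        acc += A[i + p * m] * B[p + j * k];
      }
      // As in BLAS, beta == 0 means C is write-only (never read).
      C[i + j * m] = (beta == T(0)) ? alpha * acc
                                    : alpha * acc + beta * C[i + j * m];
    }
  }
}

// Hypothetical CPU stand-in mimicking the cublasGemmEx calling convention:
// untyped pointers plus tags describing what they actually point to.
void gemm_ex_cpu(int m, int n, int k, void const *alpha, void const *A,
                 DataTypeTag a_type, void const *B, DataTypeTag b_type,
                 void const *beta, void *C, DataTypeTag c_type) {
  assert(m > 0 && n > 0 && k > 0);
  if (a_type != b_type || a_type != c_type) {
    throw std::invalid_argument("mixed-type GEMM not supported in this sketch");
  }
  // The untyped API must cast back using the tag; nothing checks it for us.
  switch (c_type) {
  case DataTypeTag::Float32:
    gemm_typed<float>(m, n, k, *static_cast<float const *>(alpha),
                      static_cast<float const *>(A),
                      static_cast<float const *>(B),
                      *static_cast<float const *>(beta),
                      static_cast<float *>(C));
    break;
  case DataTypeTag::Float64:
    gemm_typed<double>(m, n, k, *static_cast<double const *>(alpha),
                       static_cast<double const *>(A),
                       static_cast<double const *>(B),
                       *static_cast<double const *>(beta),
                       static_cast<double *>(C));
    break;
  }
}

// Hypothetical kernel-side wrapper: callers pass typed pointers; the cast to
// void * happens only here, right at the untyped-API boundary.
// output (out_dim x batch) = weight (out_dim x in_dim) * input (in_dim x batch)
template <typename T>
void linear_forward(T const *input, T const *weight, T *output, int out_dim,
                    int in_dim, int batch) {
  T const alpha = T(1);
  T const beta = T(0);
  DataTypeTag const tag = data_type_tag<T>::value;  // derived from T
  gemm_ex_cpu(out_dim, batch, in_dim, static_cast<void const *>(&alpha),
              static_cast<void const *>(weight), tag,
              static_cast<void const *>(input), tag,
              static_cast<void const *>(&beta), static_cast<void *>(output),
              tag);
}
```

Keeping `T const *` in the kernel-side signature lets the compiler reject mismatched buffers, and because the tag passed alongside each `void *` is derived from `T`, the pointer type and the tag cannot drift apart.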


dylanllim closed this on Jun 7, 2024
dylanllim deleted the linear_kernel_update branch on Jun 7, 2024 19:18
dylanllim added a commit to dylanllim/FlexFlow that referenced this pull request Oct 16, 2024
dylanllim added a commit to dylanllim/FlexFlow that referenced this pull request Jan 28, 2025
lockshaw added a commit that referenced this pull request May 2, 2025
* test_utils refactor, local_cpu_allocator

* test utils modification, cast, reverse, and replicate cpu kernels

* combine kernel

* combine kernels .h file

* Implementations for methods for machine_views and associated modules  (#1429)

* initial commit for machine view adjacent modules

* Formatting

* Tests for new machine_view.cc functions

* formatting

* Minor Test correction

* formatting

* PR fixes

* PR Fixes

---------

Co-authored-by: Pietro Max Marsella <[email protected]>

* test utils logic cleanup, reverse cpu_kernel pedagogical implementation, other minor fixes

* cpu_kernel's refactor, generic tensor accessor indexing

* accessor.h formatting

* mk_runtime_error formatting

* reverse_kernels include

* test_utils refactor and clarity

* formatting

* comment removal reverse_kernels

* Issue #1435, tests for managed stream and handle

* #1435 formatting

* #1409 issue, change datatype for linear kernels away from void *

* R & W accessor changes, minimize code bloat

* code formatting and refactor

* issue #1502 & issue #1540

* format check

* branch merge and test fixes

* build issues

* Add AWS linux AMI to runs-on for testing (#1589)

* Pin runs-on images (#1590)

* GPU CI Fix (Pin runs-on GPU image) (#1588)

* Debug

* Change to base DL AMI

* Print disk usage

* Run nvidia-smi

* Remove excess cuda installs in base ami

* Re-enable freeing space in GPU CI

* Try updating nix-develop version

* Check what happens if you just enter the non-nixGL environment

* Try switching AMIs

* Try to remove the module stuff

* Move to lockshaw/develop-action

* Try pointing at a fixed commit

* Update nix-develop action

* Update nix-develop action to use BASH_FUNC filtering

* Remove all the /usr/local/cuda entries

* Switch back to gpu-ci env

* Update the cuda arch

* Try out the new runs-on gpu image

* Move over to pinned runs-on image

* Remove a bunch more unnecessary stuff in image to get back disk space

* Try using an ephemeral store

* Try mounting

* Fix bug

* Try sudo

* Move nix into _work

* Rollback all unnecessary changes

* Re-enable waiting on cpu-ci

* Merge substitution-builder (#1575)

* Start on pcg builder

* Add tests and some implementation for pcg builder

* Add pcg tests, make dtgen constructors explicit to fix bug

* Add remainder of PCG tests

* Fix build issues in local-execution

* Format

* Address Reyna comments, add topological_order function for PCG

* Pre multidigraph refactor

* Removing visitable from sp code

* Add open dataflow graph, start to replace pcg dataflow graph

* Start refactoring substitutions

* Add utility functions to support pattern matching

* Pre-refactor inputs

* Fix proj url

* Get back to substitutions, now with unordered graph inputs

* Get substitutions building

* substitutions-tests now builds

* Fix bug in filter, pass some initial substitution tests

* Add tests for fmt::to_string, fix some substitutions bugs

* Pass initial unit tests for find_pattern_matches

* Start on unit tests for pcg pattern

* Pass initial test for find_pattern_matches

* Fix small build issue in tests

* Format

* Sync tests in CI with tests in proj

* Fix minor build errors in kernels and local-execution

* Format

* Remove outdated code

* More outdated code removal

* More cleanup, add test for sp decomposition

* Pull apart containers.h

* More sp testing and fixes

* Break up graph algorithms.h

* Pre- full SP algo commit

* Add initial implementation and tests for cbc decomposition and inverse line graph

* Pass test for get_inverse_line_graph

* Add new multidigraph

* Fix get_inverse_line_graph to return a MultiDiGraph instead of a DiGraph

* Add tests for parallel and series reduction finding

* Add really rough implementation of Valdes sp decomposition

* Fix local-execution build

* Add implementations and tests for applying series/parallel reductions

* Format

* Clean up sp decomposition interface and tests

* Format

* Add comments for top-level substitutions functions, add proj doxygen support

* Start sketching out substitutions code

* Fix build errors

* Add ability to permute node ids

* Cleanup and start to test new substitutions code

* Add test case for evaluate_substitution_output

* Add naive isomorphism detection code

* Add graph inputs to open dataflow graph isomorphism

* Add input permutation to evaluate_substitution_output

* Fix permute_node_ids

* Add test for permute_input_ids

* Migrate over to mutable implementation of apply_substitution

* Add fast isomorphism checking and an initial implementation of full substitution logic

* Pass initial full substitutions test

* Cleanup old isomorphism checking code

* Fix post-merge bugs

* Fix broken pcg builder test

* Format

* Reorganize code and remove some outdated code pre-code-review

* Format

* Restarting work on this after working on export-model-arch

* Adding in a simple function to get the currently available substitutions

* nonnegative_int additions, code cleanup, etc.

* A bunch more moving over to nonnegative_int

* Even more nonnegative_int updating

* Fix build

* Fix failing tests

* Format

* Format

---------

Co-authored-by: Colin Unger <[email protected]>
Co-authored-by: Victor Li <[email protected]>

* test_utils refactor, local_cpu_allocator

* test utils modification, cast, reverse, and replicate cpu kernels

* combine kernel

* test utils logic cleanup, reverse cpu_kernel pedagogical implementation, other minor fixes

* cpu_kernel's refactor, generic tensor accessor indexing

* test_utils refactor and clarity

* R & W accessor changes, minimize code bloat

* issue #1502 & issue #1540

* branch merge and test fixes

* merge

* build after merge

* kernel issues

* managed stream / handle test case fix

* test_utils update, kernel/ops refactor

* Review fixes

* Update doctest includes in kernels

* More PR review

* Try using rhel package-based nixgl

* Format

* Update proj with test command fixes

* Attempt to fix gpu CI

* Use custom AMI in GPU CI

* Fix proj bug in cpu-ci

* Try including run id

* Temporarily allow gpu ci to run regardless for testing purposes

* Try using official ubuntu ami in gpu ci

* Try out new ami

* Change to use new flexflow-gpu-ci AMI

* Fix bugs in GPU tests and restore GPU CI gating

* Format

* Fix bug in accessor formatting test cases

* Bugfixes and updated proj

* Fix all cpu tests

* Format

* Add improved test failure output for replicate cpu vs gpu tests

* Continue debugging replicate cuda testcases

* Format

* Fix incorrect tensor size in replicate kernel tests

* Transpose replicate backward cpu kernel

* Try flipping output dimensions in replica cuda kernel test

* Update proj

---------

Co-authored-by: Marsella8 <[email protected]>
Co-authored-by: Pietro Max Marsella <[email protected]>
Co-authored-by: Colin Unger <[email protected]>
Co-authored-by: Victor Li <[email protected]>
Co-authored-by: Victor Li <[email protected]>