Merged

52 commits
1672069
[OPTIMIZER] simplified pipeline pass (#1582)
ptillet Apr 26, 2023
8f47bdc
[OPTIMIZER] Added kWidth attribute to DotOperandEncoding (#1584)
ptillet Apr 27, 2023
e326ff7
[TEST] Fix test cache (#1588)
Jokeren Apr 28, 2023
65fb36e
[BACKEND] Updated slice layout semantics, updated vectorization logic…
zahimoud Apr 28, 2023
ee86404
[FRONTEND][BACKEND] Add the `noinline` annotation for `triton.jit` (#…
Jokeren Apr 28, 2023
4b07251
[FRONTEND] add architecture to hash to avoid invalid image on cubin l…
david-macleod Apr 29, 2023
3aff010
[FRONTEND] Fix calling local variables’ attribute functions in the if…
Jokeren Apr 30, 2023
39e751b
[OPTIMIZER][BACKEND] Enabled elementwise ops (including casts) betwee…
ptillet May 1, 2023
9d5354d
[RUNTIME] Ensure we hold the GIL before calling into CPython API in c…
albanD May 1, 2023
26d80f0
Merge branch `llvm-head` (#1600)
chsigg May 1, 2023
3449a9d
Zahi/slice reduce rebased (#1594)
zahimoud May 2, 2023
33174cc
[OPTIMIZER] Fix crash in loop pipelining. (#1602)
bchetioui May 2, 2023
d196302
[FRONTEND] make torch optional (#1604)
pommedeterresautee May 3, 2023
19e7238
[OPTIMIZER] Clean-up Utility.cpp and fixed bug in RematerializeForwar…
ptillet May 3, 2023
f387a6c
[BACKEND] Fixed up ConvertLayout for slices (#1616)
ptillet May 4, 2023
deb2c71
[FRONTEND] Add `tl.expand_dims` (#1614)
peterbell10 May 4, 2023
e2ae2c6
[BACKEND] Modified store op thread masking (#1605)
zahimoud May 5, 2023
aaeba98
[CI] no longer runs CI job on macos-10.15 (#1624)
ptillet May 5, 2023
fd381e2
[BACKEND] Allow noinline functions to return multiple values of primi…
Jokeren May 5, 2023
9d7980f
[BACKEND] Updated predicate for atomic ops (#1619)
zahimoud May 5, 2023
125d9d1
[TEST] Added convert layout test from/to sliced blocked/mma (#1620)
zahimoud May 6, 2023
7d20a86
[BACKEND] fix typo in Membar class about WAR description and refine s…
lipracer May 6, 2023
d338521
[SETUP] Removing `torch` as a test dependency (#1632)
ptillet May 7, 2023
132fe1b
[DOCS] Fix docstrings for sphinx docs (#1635)
grimoire May 8, 2023
858a2f0
[FRONTEND] Added interpreter mode (#1573)
pommedeterresautee May 8, 2023
319af1f
[CI] Build wheels for musllinux (#1638)
pganssle May 9, 2023
b19b274
[FRONTEND] Fix return op related control flow issues (#1637)
Jokeren May 9, 2023
0cd8f05
[CI] Upload CUDA test artifacts (#1645)
zahimoud May 10, 2023
6b1af5f
[FRONTEND] Add support for scalar conditions in `device_assert` (#1641)
lezcano May 10, 2023
147ec43
[FRONTEND] Hotfix for `contains_return_op` (#1651)
Jokeren May 10, 2023
fb40bf1
[TEST] Fixed and re-enabled reduce test (#1644)
zahimoud May 10, 2023
0daee68
[FRONTEND] Don't call set_device in tl.dot (#1646)
May 11, 2023
115964b
[TESTS] Add regression test for issue #1601. (#1611)
bchetioui May 11, 2023
35b27e1
[BUILD] Move canonicalization patterns of Load/Store to Ops.cpp. (NFC…
ingomueller-net May 11, 2023
674f9bf
[FRONTEND] Better error messages for noinline functions (#1657)
Jokeren May 11, 2023
b2a757d
[BUILD] Add missing CMake link-time dependencies. (#1654)
ingomueller-net May 12, 2023
47af6ba
[BACKEND] Move isSharedEncoding to TritonGPUIR. (#1655)
ingomueller-net May 13, 2023
3249d7a
[FRONTEND] Do not use exceptions to guide control flow in compilation…
cheshire May 13, 2023
9820899
[FRONTEND] Assert that for loop bounds must be ints (#1664)
sophiawisdom May 13, 2023
e5e961f
[OPTIMIZER] Fix-up reduction cloning
ptillet May 16, 2023
0c4de8a
[DEPENDENCIES] Update LLVM to 17.0.0 (c5dede880d17) and port changes.…
ingomueller-net May 16, 2023
95a932e
[OPTIMIZER] adjusted selection heuristics for when `mmaLayout.warpsPe…
ptillet May 16, 2023
323843c
[BUILD] stop depending on dlfcn-win32 by implementing `dladdr` native…
cloudhan May 16, 2023
177b46b
[BUILD] minor fixes (#1676)
chsigg May 16, 2023
3baab48
[FRONTEND] Differentiate between bool and int in the frontend (#1678)
Jokeren May 16, 2023
e5ae37f
[BUILD] Add deduction guide for `Interval` (#1680)
chsigg May 16, 2023
17eb982
[OPS] Remove duplicated function already defined in `triton` module. …
dfukalov May 17, 2023
4c4e42e
Merge remote-tracking branch 'openai/main' into IFU-230517
jayfurmanek May 17, 2023
78c6074
IFU 230517 Resolve merge conflicts
jayfurmanek May 17, 2023
cbb15e1
[ROCM] Fix hardcoded warpsize in getMask
jayfurmanek May 18, 2023
0c95e2c
[ROCM] Fix is_hip() checks
jayfurmanek May 22, 2023
db8e5fc
[ROCM] Fix noinline LIT test
jayfurmanek May 22, 2023
16 changes: 14 additions & 2 deletions .github/workflows/integration-tests.yml
@@ -25,9 +25,9 @@ jobs:
id: set-matrix
run: |
if [ x"${{ github.repository }}" == x"openai/triton" ]; then
echo '::set-output name=matrix::[["self-hosted", "A100"], ["self-hosted", "V100"], ["self-hosted", "gfx908"], "macos-10.15"]'
echo '::set-output name=matrix::[["self-hosted", "A100"], ["self-hosted", "V100"], ["self-hosted", "gfx908"]]'
else
echo '::set-output name=matrix::["ubuntu-latest", "macos-10.15"]'
echo '::set-output name=matrix::["ubuntu-latest"]'
fi

Integration-Tests:
@@ -101,6 +101,18 @@ jobs:
cd python/test/unit
python3 -m pytest

- name: Create artifacts archive
if: ${{(matrix.runner[0] == 'self-hosted') && (matrix.runner[1] == 'V100' || matrix.runner[1] == 'A100')}}
run: |
tar -czvf artifacts.tar.gz ~/.triton/cache

- name: Upload artifacts archive
if: ${{(matrix.runner[0] == 'self-hosted') && (matrix.runner[1] == 'V100' || matrix.runner[1] == 'A100')}}
uses: actions/upload-artifact@v2
with:
name: artifacts
path: artifacts.tar.gz

- name: Run CXX unittests
if: ${{ env.BACKEND != 'ROCM'}}
run: |
2 changes: 1 addition & 1 deletion .github/workflows/wheels.yml
@@ -30,7 +30,7 @@ jobs:
#export CIBW_MANYLINUX_PYPY_X86_64_IMAGE="quay.io/pypa/manylinux2014_x86_64:latest"
export CIBW_BEFORE_BUILD="pip install cmake;"
export CIBW_SKIP="{cp,pp}35-*"
export CIBW_BUILD="{cp,pp}3*-manylinux_x86_64"
export CIBW_BUILD="{cp,pp}3*-manylinux_x86_64 cp3*-musllinux_x86_64"
python3 -m cibuildwheel python --output-dir wheelhouse


3 changes: 0 additions & 3 deletions .gitmodules

This file was deleted.

6 changes: 0 additions & 6 deletions CMakeLists.txt
@@ -49,12 +49,6 @@ include_directories(${CMAKE_CURRENT_SOURCE_DIR}/include)
# Third-party
include_directories(${PYBIND11_INCLUDE_DIR})

if(WIN32)
SET(BUILD_SHARED_LIBS OFF)
find_package(dlfcn-win32 REQUIRED)
set(CMAKE_DL_LIBS dlfcn-win32::dl)
endif()

set(CMAKE_CXX_FLAGS "${CMAKE_C_FLAGS} -D__STDC_FORMAT_MACROS -fPIC -std=gnu++17 -fvisibility=hidden -fvisibility-inlines-hidden")

if (TRITON_USE_ROCM)
1 change: 1 addition & 0 deletions docs/python-api/triton.language.rst
@@ -34,6 +34,7 @@ Shape Manipulation Ops
:nosignatures:

broadcast_to
expand_dims
reshape
ravel

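The docs entry above registers the newly added `tl.expand_dims` (#1614) among the shape-manipulation ops. As a rough illustration of the shape semantics it exposes — inserting a size-1 axis at a given position — the NumPy equivalent behaves as follows; NumPy is only a host-side stand-in here, not the device op itself:

```python
import numpy as np

# expand_dims inserts a new size-1 axis at the given position.
x = np.arange(8)            # shape (8,)
row = np.expand_dims(x, 0)  # shape (1, 8)
col = np.expand_dims(x, 1)  # shape (8, 1)

# Combined with broadcasting, this is the usual way to build
# 2-D index grids from 1-D ranges: grid[i, j] == j + i * 8.
grid = row + col * 8        # shape (8, 8)
```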
97 changes: 77 additions & 20 deletions include/triton/Analysis/Allocation.h
@@ -1,6 +1,7 @@
#ifndef TRITON_ANALYSIS_ALLOCATION_H
#define TRITON_ANALYSIS_ALLOCATION_H

#include "triton/Analysis/Utility.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/MapVector.h"
#include "llvm/ADT/SetVector.h"
@@ -49,18 +50,25 @@ template <typename T> class Interval {
T End = std::numeric_limits<T>::max();
};

template <class T> Interval(T, T) -> Interval<T>;

class Allocation {
public:
/// A unique identifier for shared memory buffers
using BufferId = size_t;
using BufferIdSetT = DenseSet<BufferId>;
using FuncAllocMapT = CallGraph<Allocation>::FuncDataMapT;

static constexpr BufferId InvalidBufferId =
std::numeric_limits<BufferId>::max();

Allocation() = default;
/// Creates a new Allocation analysis that computes the shared memory
/// information for all associated shared memory values.
Allocation(Operation *operation) : operation(operation) { run(); }
explicit Allocation(Operation *operation) : operation(operation) {}

/// Runs allocation analysis on the given top-level operation.
void run(FuncAllocMapT &funcAllocMap);

/// Returns the operation this analysis was constructed from.
Operation *getOperation() const { return operation; }
@@ -75,6 +83,12 @@ class Allocation {
return bufferSet.at(bufferId).size;
}

/// Returns the allocated interval of the given buffer.
Interval<size_t> getAllocatedInterval(BufferId bufferId) const {
auto &buffer = bufferSet.at(bufferId);
return Interval<size_t>(buffer.offset, buffer.offset + buffer.size);
}

/// Returns the buffer id of the given value.
/// This interface only returns the allocated buffer id.
/// If you want to get all the buffer ids that are associated with the given
@@ -104,26 +118,28 @@
BufferId getBufferId(Operation *operation) const {
if (opScratch.count(operation)) {
return opScratch.lookup(operation)->id;
} else if (opVirtual.count(operation)) {
return opVirtual.lookup(operation)->id;
} else {
return InvalidBufferId;
}
}

/// Returns whether the given buffer is a virtual buffer.
bool isVirtualBuffer(BufferId bufferId) const {
return bufferSet.at(bufferId).kind == BufferT::BufferKind::Virtual;
}

/// Returns the size of total shared memory allocated
size_t getSharedMemorySize() const { return sharedMemorySize; }

bool isIntersected(BufferId lhsId, BufferId rhsId) const {
if (lhsId == InvalidBufferId || rhsId == InvalidBufferId)
return false;
auto lhsBuffer = bufferSet.at(lhsId);
auto rhsBuffer = bufferSet.at(rhsId);
return lhsBuffer.intersects(rhsBuffer);
}

private:
/// A class that represents a shared memory buffer
struct BufferT {
enum class BufferKind { Explicit, Scratch };
/// Explicit: triton_gpu.alloc_tensor
/// Scratch: triton_gpu.convert_layout
/// Virtual: triton.call
enum class BufferKind { Explicit, Scratch, Virtual };

/// MT: thread-safe
inline static std::atomic<BufferId> nextId = 0;
@@ -142,12 +158,6 @@ class Allocation {
BufferT(BufferKind kind, size_t size) : BufferT(kind, size, 0) {}
BufferT(BufferKind kind, size_t size, size_t offset)
: kind(kind), id(nextId++), size(size), offset(offset) {}

bool intersects(const BufferT &other) const {
return Interval<size_t>(offset, offset + size)
.intersects(
Interval<size_t>(other.offset, other.offset + other.size));
}
};

/// Op -> Scratch Buffer
@@ -158,8 +168,6 @@ class Allocation {
using AliasBufferMapT = llvm::MapVector<Value, llvm::SetVector<BufferT *>>;
/// BufferId -> Buffer
using BufferSetT = std::map<BufferId, BufferT>;
/// Runs allocation analysis on the given top-level operation.
void run();

private:
template <BufferT::BufferKind Kind, typename KeyType, typename... Args>
@@ -168,6 +176,8 @@
bufferSet[buffer.id] = std::move(buffer);
if constexpr (Kind == BufferT::BufferKind::Explicit) {
valueBuffer[key] = &bufferSet[buffer.id];
} else if constexpr (Kind == BufferT::BufferKind::Virtual) {
opVirtual[key] = &bufferSet[buffer.id];
} else {
opScratch[key] = &bufferSet[buffer.id];
}
@@ -178,8 +188,9 @@
}

private:
Operation *operation;
Operation *operation = nullptr;
OpScratchMapT opScratch;
OpScratchMapT opVirtual;
ValueBufferMapT valueBuffer;
AliasBufferMapT aliasBuffer;
BufferSetT bufferSet;
@@ -188,7 +199,53 @@
friend class triton::AllocationAnalysis;
};

template <typename T> Interval(T, T) -> Interval<T>;
/// Static analysis that computes the allocation of shared memory buffers
/// of the entire call graph.
/// The allocation is performed in a post-order walk of the call graph.
/// Each call op is treated like convert_layout that allocates a scratch buffer.
/// At each call, we compute the start offset of the scratch buffer and pass it
/// as an argument to the callee.
class ModuleAllocation : public CallGraph<Allocation> {
public:
using FuncOffsetMapT = DenseMap<FunctionOpInterface, Value>;

explicit ModuleAllocation(ModuleOp moduleOp)
: CallGraph<Allocation>(moduleOp) {
walk<WalkOrder::PreOrder, WalkOrder::PostOrder>(
// Pre-order edge walk callback
[](CallOpInterface callOp, FunctionOpInterface funcOp) {},
// Post-order node walk callback
[&](FunctionOpInterface funcOp) {
auto [iter, inserted] = funcMap.try_emplace(funcOp, funcOp);
if (inserted)
iter->second.run(funcMap);
});
}

size_t getSharedMemorySize() {
size_t size = 0;
for (auto funcOp : getRoots()) {
auto *alloc = getFuncData(funcOp);
size = std::max(size, alloc->getSharedMemorySize());
}
return size;
}

size_t getSharedMemorySize(FunctionOpInterface funcOp) {
return getFuncData(funcOp)->getSharedMemorySize();
}

void setFunctionSharedMemoryValue(FunctionOpInterface funcOp, Value value) {
sharedMemoryValue[funcOp] = value;
}

Value getFunctionSharedMemoryBase(FunctionOpInterface funcOp) {
return sharedMemoryValue[funcOp];
}

private:
FuncOffsetMapT sharedMemoryValue;
};

} // namespace mlir

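The `ModuleAllocation` doc comment above describes the scheme: the call graph is walked in post-order, and each call op is treated like a scratch-buffer allocation sized to hold the callee's footprint at the caller's current offset. A minimal sketch of that bottom-up sizing, using an invented toy graph and a flat "own size + largest callee" model rather than Triton's actual packing algorithm:

```python
from functools import lru_cache

# Hypothetical call graph: per-function shared-memory footprint for
# its own buffers, plus which functions it calls.
own_size = {"root": 128, "helper": 64, "leaf": 32}
callees = {"root": ["helper"], "helper": ["leaf"], "leaf": []}

@lru_cache(maxsize=None)
def footprint(func):
    # Post-order: size callees first, then reserve a "virtual" region
    # after this function's own buffers large enough for the biggest
    # callee (concurrent calls reuse the same region).
    callee_max = max((footprint(c) for c in callees[func]), default=0)
    return own_size[func] + callee_max

# Module-level shared memory is the max over root functions;
# here there is a single root: 128 + 64 + 32 = 224.
total = footprint("root")
```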
55 changes: 55 additions & 0 deletions include/triton/Analysis/AxisInfo.h
@@ -286,16 +286,71 @@ class AxisInfoAnalysis
AxisInfoAnalysis(DataFlowSolver &solver);
using dataflow::SparseDataFlowAnalysis<
dataflow::Lattice<AxisInfo>>::getLatticeElement;
using FuncAxisInfoMapT = DenseMap<FunctionOpInterface, AxisInfo>;

void visitOperation(Operation *op,
ArrayRef<const dataflow::Lattice<AxisInfo> *> operands,
ArrayRef<dataflow::Lattice<AxisInfo> *> results) override;
};

/// Module level axis info analysis based on the call graph, assuming that we
/// do not have recursive functions.
/// Since each function will be called multiple times, we need to
/// calculate the axis info based on the axis info of all the callers.
/// In the future, we can perform optimization using function cloning so that
/// each call site will have unique axis info.
using AxisInfoMapT = DenseMap<Value, AxisInfo>;
class ModuleAxisInfoAnalysis : public CallGraph<AxisInfoMapT> {
public:
explicit ModuleAxisInfoAnalysis(ModuleOp moduleOp)
: CallGraph<AxisInfoMapT>(moduleOp) {
SmallVector<FunctionOpInterface> funcs;
for (auto root : getRoots()) {
walk<WalkOrder::PreOrder, WalkOrder::PostOrder>(
// Pre-order edge walk callback
[](CallOpInterface callOp, FunctionOpInterface funcOp) {},
// Post-order node walk callback
[&](FunctionOpInterface funcOp) {
funcs.push_back(funcOp);
funcMap.try_emplace(funcOp, AxisInfoMapT{});
});
}
SetVector<FunctionOpInterface> sortedFuncs(funcs.begin(), funcs.end());
SymbolTableCollection symbolTable;
for (auto funcOp : llvm::reverse(sortedFuncs)) {
initialize(funcOp);
funcOp.walk([&](CallOpInterface callOp) {
auto callee =
dyn_cast<FunctionOpInterface>(callOp.resolveCallable(&symbolTable));
update(callOp, callee);
});
}
}

AxisInfo *getAxisInfo(Value value) {
auto funcOp =
value.getParentRegion()->getParentOfType<FunctionOpInterface>();
auto *axisInfoMap = getFuncData(funcOp);
if (!axisInfoMap) {
return nullptr;
}
auto it = axisInfoMap->find(value);
if (it == axisInfoMap->end()) {
return nullptr;
}
return &(it->second);
}

unsigned getPtrContiguity(Value ptr);

unsigned getPtrAlignment(Value ptr);

unsigned getMaskAlignment(Value mask);

private:
void initialize(FunctionOpInterface funcOp);

void update(CallOpInterface callOp, FunctionOpInterface funcOp);
};

} // namespace mlir
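`ModuleAxisInfoAnalysis` above walks functions in reverse topological order — callers before callees — so a callee's argument info is the meet over every call site that reaches it. A minimal sketch of that meet, using gcd over divisibility as an illustrative stand-in for Triton's richer `AxisInfo` lattice (contiguity, divisibility, constancy); the call-site names and values are invented:

```python
from math import gcd

# Divisibility of the pointer each (hypothetical) caller passes in.
call_site_divisibility = {"kernel_a": 16, "kernel_b": 8}

def callee_divisibility(sites):
    # The callee must assume the meet over all callers; for plain
    # divisibility the meet is the gcd. 0 is the gcd identity.
    result = 0
    for div in sites.values():
        result = gcd(result, div)
    return result

meet = callee_divisibility(call_site_divisibility)  # gcd(16, 8) == 8
```

With function cloning (mentioned as future work in the comment), each call site would keep its own info instead of collapsing to this meet.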