[PROTON][Experimental] Initialize instruction sampling support for NVIDIA GPUs #4674
Merged
47 commits
bd4ce81 Update (Jokeren)
854d9ed Update (Jokeren)
7a0903b Update (Jokeren)
cd1f0cf Update (Jokeren)
6409fed Update (Jokeren)
25fbd18 Update (Jokeren)
b9b5543 Update (Jokeren)
3c4dd19 Update (Jokeren)
e5a4fd7 Update (Jokeren)
6c2be4f Update (Jokeren)
2fef490 Update (Jokeren)
3adb522 Update (Jokeren)
cdcd24e Update (Jokeren)
9a3161b Update (Jokeren)
3f87312 Update (Jokeren)
7e9e0ce Update (Jokeren)
33815a3 Update (Jokeren)
d283ce0 Update (Jokeren)
860ce1a Update (Jokeren)
1075d1b Update (Jokeren)
2a746a0 Update (Jokeren)
20cadc7 Update (Jokeren)
da2a9e7 Update (Jokeren)
20a3e10 Update (Jokeren)
0c96768 Update (Jokeren)
52840f9 Update (Jokeren)
fd5406a Update (Jokeren)
ec7c96b Add backend (Jokeren)
13c5e99 Update (Jokeren)
2c91e94 Merge branch 'main' into keren/cupti-pc-samples (Jokeren)
e2d88a8 Move files (Jokeren)
8872da0 Enable pc sampling with timing (Jokeren)
a5b6e2a Merge branch 'main' into keren/cupti-pc-samples (Jokeren)
9aea49e Update (Jokeren)
1f3adc0 Update (Jokeren)
c2d2fcf Update (Jokeren)
e9e5cd5 Update (Jokeren)
8494e4b Fix samples (Jokeren)
8fbb751 Update (Jokeren)
557d423 Update readme (Jokeren)
f8c043b Update readme (Jokeren)
b291993 Merge branch 'main' into keren/cupti-pc-samples (Jokeren)
26aab6d Update (Jokeren)
995647e Update (Jokeren)
e761c7a Update (Jokeren)
b9487c8 Merge branch 'keren/cupti-pc-samples' of github.com:openai/triton int… (Jokeren)
aee678a Update (Jokeren)
141 changes: 141 additions & 0 deletions
third_party/proton/csrc/include/Profiler/Cupti/CuptiPCSampling.h
@@ -0,0 +1,141 @@

```cpp
#ifndef PROTON_PROFILER_CUPTI_PC_SAMPLING_H_
#define PROTON_PROFILER_CUPTI_PC_SAMPLING_H_

#include "CuptiProfiler.h"
#include "Driver/GPU/CudaApi.h"
#include "Driver/GPU/CuptiApi.h"
#include "Utility/Map.h"
#include "Utility/Singleton.h"
#include <atomic>
#include <mutex>

namespace proton {

struct CubinData {
  size_t cubinCrc;
  const char *cubin;
  size_t cubinSize;

  struct LineInfoKey {
    uint32_t functionIndex;
    uint64_t pcOffset;

    bool operator<(const LineInfoKey &other) const {
      return functionIndex < other.functionIndex ||
             (functionIndex == other.functionIndex &&
              pcOffset < other.pcOffset);
    }
  };

  struct LineInfoValue {
    uint32_t lineNumber{};
    const std::string functionName{};
    const std::string dirName{};
    const std::string fileName{};

    LineInfoValue() = default;

    LineInfoValue(uint32_t lineNumber, const std::string &functionName,
                  const std::string &dirName, const std::string &fileName)
        : lineNumber(lineNumber), functionName(functionName), dirName(dirName),
          fileName(fileName) {}
  };

  std::map<LineInfoKey, LineInfoValue> lineInfo;
};
```
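The hand-written `operator<` on `LineInfoKey` is a lexicographic comparison on `(functionIndex, pcOffset)`, which gives `std::map` the strict weak ordering it needs. A minimal standalone sketch of the same ordering, written with `std::tie`, might look like the following. `Key` and `lookupLine` are hypothetical names used only for illustration and are not part of this PR.

```cpp
#include <cstdint>
#include <map>
#include <tuple>

// Hypothetical stand-in for CubinData::LineInfoKey.
struct Key {
  uint32_t functionIndex;
  uint64_t pcOffset;

  // Same ordering as the PR's operator<: compare functionIndex first,
  // then fall back to pcOffset when the function indices are equal.
  bool operator<(const Key &other) const {
    return std::tie(functionIndex, pcOffset) <
           std::tie(other.functionIndex, other.pcOffset);
  }
};

// Illustrative helper: return the line number stored for a key,
// or 0 when the key is not present.
inline uint32_t lookupLine(const std::map<Key, uint32_t> &table, Key key) {
  auto it = table.find(key);
  return it == table.end() ? 0 : it->second;
}
```

With this ordering, all entries of one function sit next to each other in the map, sorted by PC offset, so a per-function range scan is possible without a second index.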
```cpp
struct ConfigureData {
  ConfigureData() = default;

  ~ConfigureData() {
    if (stallReasonNames) {
      for (size_t i = 0; i < numStallReasons; i++) {
        if (stallReasonNames[i])
          std::free(stallReasonNames[i]);
      }
      std::free(stallReasonNames);
    }
    if (stallReasonIndices)
      std::free(stallReasonIndices);
    if (pcSamplingData.pPcData) {
      for (size_t i = 0; i < numValidStallReasons; ++i) {
        std::free(pcSamplingData.pPcData[i].stallReason);
      }
      std::free(pcSamplingData.pPcData);
    }
  }

  void initialize(CUcontext context);

  CUpti_PCSamplingConfigurationInfo configureStallReasons();
  CUpti_PCSamplingConfigurationInfo configureSamplingPeriod();
  CUpti_PCSamplingConfigurationInfo configureSamplingBuffer();
  CUpti_PCSamplingConfigurationInfo configureScratchBuffer();
  CUpti_PCSamplingConfigurationInfo configureHardwareBufferSize();
  CUpti_PCSamplingConfigurationInfo configureStartStopControl();
  CUpti_PCSamplingConfigurationInfo configureCollectionMode();

  // The amount of data reserved on the GPU
  static constexpr size_t HardwareBufferSize = 128 * 1024 * 1024;
  // The amount of data copied from the hardware buffer each time
  static constexpr size_t ScratchBufferSize = 16 * 1024 * 1024;
  // The number of PCs copied from the scratch buffer each time
  static constexpr size_t DataBufferPCCount = 1024;
  // The sampling period in cycles = 2^frequency
  static constexpr uint32_t DefaultFrequency = 10;

  CUcontext context{};
  uint32_t contextId;
  uint32_t numStallReasons{};
  uint32_t numValidStallReasons{};
  char **stallReasonNames{};
  uint32_t *stallReasonIndices{};
  std::map<size_t, size_t> stallReasonIndexToMetricIndex{};
  std::set<size_t> notIssuedStallReasonIndices{};
  CUpti_PCSamplingData pcSamplingData{};
  // The memory storing configuration information has to be kept alive during
  // the profiling session
  std::vector<CUpti_PCSamplingConfigurationInfo> configurationInfos;
};
```
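The comments on the constants in `ConfigureData` imply some simple arithmetic: a sampling period of 2^frequency cycles means the default of 10 samples once every 1024 cycles, and the hardware buffer is 8 times the size of the scratch buffer. The sketch below copies the three constants from the header and checks both facts at compile time. `samplingPeriodCycles` is a hypothetical helper introduced for illustration, not a function from this PR.

```cpp
#include <cstddef>
#include <cstdint>

// Constants copied from ConfigureData in this PR.
constexpr std::size_t HardwareBufferSize = 128 * 1024 * 1024;
constexpr std::size_t ScratchBufferSize = 16 * 1024 * 1024;
constexpr std::uint32_t DefaultFrequency = 10;

// Hypothetical helper: the sampling period in cycles for a given exponent,
// following the header comment "sampling period in cycles = 2^frequency".
constexpr std::uint64_t samplingPeriodCycles(std::uint32_t frequency) {
  return std::uint64_t{1} << frequency;
}

// The default exponent of 10 corresponds to one sample every 1024 cycles.
static_assert(samplingPeriodCycles(DefaultFrequency) == 1024,
              "2^10 cycles per sample");
// Going by the header comments alone, a full hardware buffer would take
// eight scratch-buffer-sized copies to drain.
static_assert(HardwareBufferSize / ScratchBufferSize == 8,
              "128 MiB / 16 MiB");
```

Lowering the exponent by one halves the period and so roughly doubles the sample rate, which is the trade-off to keep in mind if `DefaultFrequency` is ever made configurable.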
```cpp
class CuptiPCSampling : public Singleton<CuptiPCSampling> {

public:
  CuptiPCSampling() = default;
  virtual ~CuptiPCSampling() = default;

  void initialize(CUcontext context);

  void start(CUcontext context);

  void stop(CUcontext context, uint64_t externId, bool isAPI);

  void finalize(CUcontext context);

  void loadModule(const char *cubin, size_t cubinSize);

  void unloadModule(const char *cubin, size_t cubinSize);

private:
  ConfigureData *getConfigureData(uint32_t contextId);

  CubinData *getCubinData(uint64_t cubinCrc);

  void processPCSamplingData(ConfigureData *configureData, uint64_t externId,
                             bool isAPI);

  ThreadSafeMap<uint32_t, ConfigureData> contextIdToConfigureData;
  // In case the same cubin is loaded multiple times, we need to keep track of
  // all of them
  ThreadSafeMap<size_t, std::pair<CubinData, /*count=*/size_t>>
      cubinCrcToCubinData;
  ThreadSafeSet<uint32_t> contextInitialized;

  std::atomic<bool> pcSamplingStarted{false};
  std::mutex pcSamplingMutex{};
  std::mutex contextMutex{};
};

} // namespace proton

#endif // PROTON_PROFILER_CUPTI_PC_SAMPLING_H_
```
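The `cubinCrcToCubinData` member pairs each `CubinData` with a count because the same cubin can be loaded more than once. The header does not show how `loadModule` and `unloadModule` use that count, so the following is only a sketch of one plausible reference-counting scheme. `CubinRegistry` and its methods are hypothetical, the payload is reduced to a size, and a plain `std::map` guarded by a `std::mutex` stands in for Proton's `ThreadSafeMap`.

```cpp
#include <cstddef>
#include <map>
#include <mutex>
#include <utility>

// Hypothetical, simplified stand-in for the cubinCrcToCubinData member.
class CubinRegistry {
public:
  // Record one more load of the cubin identified by crc.
  void load(std::size_t crc, std::size_t cubinSize) {
    std::lock_guard<std::mutex> lock(mu);
    auto it = entries.find(crc);
    if (it == entries.end())
      entries.emplace(crc, std::make_pair(cubinSize, std::size_t{1}));
    else
      ++it->second.second;
  }

  // Drop one load; erase the entry once no load references it any more.
  // Unloading an unknown crc is a no-op.
  void unload(std::size_t crc) {
    std::lock_guard<std::mutex> lock(mu);
    auto it = entries.find(crc);
    if (it == entries.end())
      return;
    if (--it->second.second == 0)
      entries.erase(it);
  }

  // Number of outstanding loads for crc (0 if the crc is unknown).
  std::size_t count(std::size_t crc) const {
    std::lock_guard<std::mutex> lock(mu);
    auto it = entries.find(crc);
    return it == entries.end() ? 0 : it->second.second;
  }

private:
  mutable std::mutex mu;
  // crc -> (payload, load count), mirroring std::pair<CubinData, size_t>.
  std::map<std::size_t, std::pair<std::size_t, std::size_t>> entries;
};
```

Keying by CRC means two loads of byte-identical cubins share one entry, and the entry only disappears after the matching number of unloads.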
2 changes: 1 addition & 1 deletion
...ton/csrc/include/Profiler/CuptiProfiler.h → ...rc/include/Profiler/Cupti/CuptiProfiler.h
I assume this needs to be a raw pointer of chars, as opposed to just a vector of string, because of the CUPTI interfaces?
Yes, it's specified in `CUpti_PCSamplingGetStallReasonsParams`.
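For readers wondering how a `char **` dictated by a C interface can still end up in owning C++ containers, here is a minimal sketch of the usual pattern: allocate the raw layout, let the C-style call fill it, then copy into `std::vector<std::string>` and free the raw buffers. `fakeGetStallReasons` is a hypothetical stand-in and not a CUPTI function, and `kNameSize` is an assumed buffer size chosen for illustration.

```cpp
#include <cstddef>
#include <cstdio>
#include <cstdlib>
#include <new>
#include <string>
#include <vector>

// Assumed fixed size of each name buffer; purely illustrative.
constexpr std::size_t kNameSize = 128;

// Hypothetical stand-in for a C API that fills caller-allocated buffers.
// It is NOT a CUPTI function.
inline void fakeGetStallReasons(std::size_t count, char **names) {
  for (std::size_t i = 0; i < count; ++i)
    std::snprintf(names[i], kNameSize, "stall_reason_%zu", i);
}

// Allocate the raw char** layout, let the C-style call fill it, then copy
// the names into owning std::string objects and release the raw buffers.
inline std::vector<std::string> collectStallReasonNames(std::size_t count) {
  if (count == 0)
    return {};

  char **names = static_cast<char **>(std::calloc(count, sizeof(char *)));
  if (!names)
    throw std::bad_alloc();
  for (std::size_t i = 0; i < count; ++i) {
    names[i] = static_cast<char *>(std::calloc(kNameSize, sizeof(char)));
    if (!names[i]) {
      // Release what was allocated so far before reporting the failure.
      for (std::size_t j = 0; j < i; ++j)
        std::free(names[j]);
      std::free(names);
      throw std::bad_alloc();
    }
  }

  fakeGetStallReasons(count, names);

  std::vector<std::string> result;
  result.reserve(count);
  for (std::size_t i = 0; i < count; ++i)
    result.emplace_back(names[i]);

  for (std::size_t i = 0; i < count; ++i)
    std::free(names[i]);
  std::free(names);
  return result;
}
```

The PR instead keeps the raw arrays alive in `ConfigureData` and frees them in its destructor, which is a reasonable choice when the buffers have to outlive the call that fills them.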