Add retirement latencies to perf events and other perf event and metric improvements #298
base: main
Conversation
Try to have fewer events classified under the "other" topic, preferring the "cache" and "memory" topics instead. Remove some duplicated matches.
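The matching rules themselves aren't visible in this snippet; as a rough sketch of the idea (the prefix table and classify_topic helper below are hypothetical, not the code in create_perf_json.py), topic assignment amounts to trying cache/memory matches before falling back to "other":

```python
import re

# Hypothetical prefix rules; the real matching in create_perf_json.py is
# more involved than a couple of name prefixes.
_TOPIC_PATTERNS = [
    (re.compile(r'^(MEM_INST_RETIRED|MEM_LOAD_RETIRED|L1D_|L2_|LONGEST_LAT_CACHE)'), 'Cache'),
    (re.compile(r'^(MEM_TRANS_RETIRED|OFFCORE_REQUESTS)'), 'Memory'),
]

def classify_topic(event_name: str) -> str:
    """Prefer "Cache" and "Memory" topics, falling back to "Other"."""
    for pattern, topic in _TOPIC_PATTERNS:
        if pattern.match(event_name):
            return topic
    return 'Other'

if __name__ == '__main__':
    for name in ('MEM_INST_RETIRED.LOCK_LOADS',
                 'MEM_TRANS_RETIRED.LOAD_LATENCY_GT_4',
                 'UOPS_RETIRED.ALL'):
        print(f'{name}: {classify_topic(name)}')
```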
Use mapfile.csv's event type of 'retirement latency' to locate a retirement latency file. When creating the perf event's dictionary, add the retirement latency mean, min and max. For example, on GNR in cache.json:
```
...
    {
        "BriefDescription": "Retired load instructions with locked access.",
        "Counter": "0,1,2,3",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_INST_RETIRED.LOCK_LOADS",
        "PublicDescription": "Counts retired load instructions with locked access.",
        "RetirementLatencyMax": 5156,
        "RetirementLatencyMean": 63.76,
        "RetirementLatencyMin": 15,
        "SampleAfterValue": "100007",
        "UMask": "0x21"
    },
...
```
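Roughly, the merge step can be pictured like the sketch below: load the retirement latency file that mapfile.csv points at, then fold mean/min/max into each event dictionary. The file format and helper names here are assumptions for illustration rather than the code in this PR:

```python
import csv
from typing import Dict, List

def load_retirement_latencies(csv_path: str) -> Dict[str, dict]:
    """Read per-event retirement latency stats.

    The column names (EventName, Mean, Min, Max) are assumptions about the
    retirement latency file referenced from mapfile.csv.
    """
    latencies: Dict[str, dict] = {}
    with open(csv_path, newline='') as f:
        for row in csv.DictReader(f):
            latencies[row['EventName']] = {
                'RetirementLatencyMean': float(row['Mean']),
                'RetirementLatencyMin': float(row['Min']),
                'RetirementLatencyMax': float(row['Max']),
            }
    return latencies

def add_retirement_latencies(events: List[dict], latencies: Dict[str, dict]) -> None:
    """Fold latency stats into each perf event dictionary when present."""
    for event in events:
        stats = latencies.get(event.get('EventName', ''))
        if stats:
            event.update(stats)
```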
Simplify metric thresholds to clean up unnecessary brackets in metrics parsed from TMA.
Without this, Dependent_Loads_Weight isn't properly resolved.
I generated the output from main and from Ian's branch rebased onto the latest main.
Simplified metric threshold expressions.
--- main/alderlake/adl-metrics.json 2025-03-24 09:07:43.823351404 -0700
+++ retirement-latencies/alderlake/adl-metrics.json 2025-03-24 09:08:17.203371459 -0700
@@ -103,7 +103,7 @@
"MetricExpr": "tma_core_bound",
"MetricGroup": "TopdownL3;tma_L3_group;tma_core_bound_group",
"MetricName": "tma_allocation_restriction",
- "MetricThreshold": "(tma_allocation_restriction >0.10) & ((tma_core_bound >0.10) & ((tma_backend_bound >0.10)))",
+ "MetricThreshold": "tma_allocation_restriction > 0.1 & (tma_core_bound > 0.1 & tma_backend_bound > 0.1)",
"ScaleUnit": "100%",
"Unit": "cpu_atom"
},
@@ -113,7 +113,7 @@
"MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.ALL@ / (5 * cpu_atom@CPU_CLK_UNHALTED.CORE@)",
"MetricGroup": "Default;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
- "MetricThreshold": "(tma_backend_bound >0.10)",
+ "MetricThreshold": "tma_backend_bound > 0.1",
"MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that uops must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count",
"ScaleUnit": "100%",
Retirement latency stats are added to events.
--- main/graniterapids/cache.json 2025-03-24 09:07:46.873356313 -0700
+++ retirement-latencies/graniterapids/cache.json 2025-03-24 09:08:20.623372472 -0700
@@ -344,60 +344,75 @@
"SampleAfterValue": "1000003",
"UMask": "0x83"
},
{
"BriefDescription": "Retired load instructions with locked access.",
"Counter": "0,1,2,3",
"Data_LA": "1",
"EventCode": "0xd0",
"EventName": "MEM_INST_RETIRED.LOCK_LOADS",
"PublicDescription": "Counts retired load instructions with locked access.",
+ "RetirementLatencyMax": 5156,
+ "RetirementLatencyMean": 63.76,
+ "RetirementLatencyMin": 15,
"SampleAfterValue": "100007",
"UMask": "0x21"
},
{
"BriefDescription": "Retired load instructions that split across a cacheline boundary.",
"Counter": "0,1,2,3",
"Data_LA": "1",
"EventCode": "0xd0",
"EventName": "MEM_INST_RETIRED.SPLIT_LOADS",
"PublicDescription": "Counts retired load instructions that split across a cacheline boundary.",
+ "RetirementLatencyMax": 4704,
+ "RetirementLatencyMean": 3.97,
+ "RetirementLatencyMin": 0,
"SampleAfterValue": "100003",
"UMask": "0x41"
},
@1perrytaylor Do you know why the last three […]
Discussed with Perry. This is expected.
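As a quick spot-check of the generated files, something like the following lists which events picked up the new fields; the path follows the diff header above and is an assumption about the local output layout:

```python
import json

# Spot-check the generated GNR cache.json for the new latency fields.
with open('retirement-latencies/graniterapids/cache.json') as f:
    events = json.load(f)

for event in events:
    if 'RetirementLatencyMean' in event:
        print(f"{event['EventName']}: "
              f"mean={event['RetirementLatencyMean']} "
              f"min={event['RetirementLatencyMin']} "
              f"max={event['RetirementLatencyMax']}")
```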
Use cpu_core rather than cpu PMU.
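The change itself isn't shown in the snippet here; conceptually it amounts to naming the cpu_core PMU explicitly in P-core event references, roughly like this sketch (the use_cpu_core_pmu helper is illustrative, not the actual code):

```python
import re

def use_cpu_core_pmu(metric_expr: str) -> str:
    """Rewrite bare 'cpu@EVENT@' references to the cpu_core PMU.

    Illustrative only: on hybrid parts the P-core events should name the
    cpu_core PMU explicitly rather than the generic 'cpu' PMU.
    """
    return re.sub(r'\bcpu@', 'cpu_core@', metric_expr)

if __name__ == '__main__':
    print(use_cpu_core_pmu('cpu@INST_RETIRED.ANY@ / cpu@CPU_CLK_UNHALTED.THREAD@'))
```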
Ensure the correct spreadsheet and column name are used. This data isn't in mapfile.csv or the spreadsheet.
Thanks Ian. Added a few Grand Ridge notes.
scripts/create_perf_json.py (Outdated)
@@ -1934,7 +1936,7 @@ def to_perf_json(self, outdir: Path):
            perf_json.write('\n')

        # Skip hybrid because event grouping does not support it well yet
-        if self.shortname not in ['ADL', 'ADLN', 'ARL', 'LNL', 'MTL']:
+        if self.shortname not in ['ADL', 'ADLN', 'ARL', 'LNL', 'MTL', 'SRF']:
Please also add GRR if we need to skip grouping for E-core only products (these aren't hybrids).
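A sketch of what the suggested condition could look like with GRR included; the 'GRR' shortname and the grouping_enabled wrapper are assumptions for illustration, not the actual change:

```python
# Products for which event grouping is skipped; GRR added per the review
# since it is E-core only like SRF (the 'GRR' shortname is assumed here).
SKIP_GROUPING_SHORTNAMES = ['ADL', 'ADLN', 'ARL', 'LNL', 'MTL', 'SRF', 'GRR']

def grouping_enabled(shortname: str) -> bool:
    """Mirror of the condition in to_perf_json: group events only for
    products that are not in the skip list."""
    return shortname not in SKIP_GROUPING_SHORTNAMES

if __name__ == '__main__':
    for name in ('GNR', 'SRF', 'GRR'):
        print(name, grouping_enabled(name))
```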
Ensure the correct spreadsheet and column name are used. This data isn't in mapfile.csv or the spreadsheet.
Add retirement latencies to perf events on GNR.
Classify more events into cache and memory topics.
Simplify metric thresholds.