Add retirement latencies to perf events and other perf event and metric improvements #298

Open: wants to merge 7 commits into base `main`.

captain5050 (Contributor)

Add retirement latencies to perf events on GNR.
Classify more cache and memory events topics.
Simplify metric thresholds.

Try to have fewer events classified as topic "other", preferring the
"cache" and "memory" topics. Remove some duplicated matches.
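
The "prefer a specific topic, fall back to other" idea can be pictured as an ordered, first-match-wins rule table. This is a minimal sketch, not the script's actual classifier. The `TOPIC_RULES` prefixes and the `classify_topic` helper are hypothetical. The only grounded detail is that `MEM_INST_RETIRED.*` events end up in cache.json, as the GNR example in this description shows.

```python
from typing import Sequence, Tuple

# Hypothetical rules, checked in order. The first matching prefix wins, so each
# prefix needs to appear only once (no duplicated matches across topics).
TOPIC_RULES: Sequence[Tuple[str, Tuple[str, ...]]] = (
    ("cache", ("L1D", "L2_", "LONGEST_LAT_CACHE", "MEM_INST_RETIRED", "MEM_LOAD_RETIRED")),
    ("memory", ("MEM_TRANS_RETIRED", "OFFCORE_REQUESTS")),
)


def classify_topic(event_name: str) -> str:
    """Return the first topic whose prefix matches, else fall back to "other"."""
    for topic, prefixes in TOPIC_RULES:
        if event_name.startswith(prefixes):  # str.startswith accepts a tuple
            return topic
    return "other"
```

Putting "cache" before "memory" encodes the preference order. Adding a prefix to a rule moves an event out of "other" without touching any other topic.
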
Use the mapfile.csv event type of 'retirement latency' to locate a
retirement latency file. When creating the perf event's dictionary, add the
retirement latency mean, min and max. For example, on GNR in
cache.json:

```
...
    {
        "BriefDescription": "Retired load instructions with locked access.",
        "Counter": "0,1,2,3",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_INST_RETIRED.LOCK_LOADS",
        "PublicDescription": "Counts retired load instructions with locked access.",
        "RetirementLatencyMax": 5156,
        "RetirementLatencyMean": 63.76,
        "RetirementLatencyMin": 15,
        "SampleAfterValue": "100007",
        "UMask": "0x21"
    },
...
```
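
The two steps described above (find the retirement-latency file through mapfile.csv, then copy the mean, min and max into each event dictionary) can be sketched as follows. This assumes mapfile.csv has `Family-model`, `Filename` and `EventType` columns, and that the latency file has already been loaded into a mapping of event name to `{"MEAN", "MIN", "MAX"}`. That mapping layout and both helper names are hypothetical, not the script's real data format or API.

```python
import csv
import io
from typing import Dict, Optional


def find_retirement_latency_file(mapfile_csv: str, family_model: str) -> Optional[str]:
    """Return the Filename of the 'retirement latency' row for a model, if any.

    Assumes mapfile.csv columns named "Family-model", "Filename" and "EventType".
    """
    for row in csv.DictReader(io.StringIO(mapfile_csv)):
        if row["Family-model"] == family_model and row["EventType"] == "retirement latency":
            return row["Filename"]
    return None


def add_retirement_latency(event: Dict[str, object],
                           latencies: Dict[str, Dict[str, float]]) -> Dict[str, object]:
    """Copy mean/min/max stats into a perf event dictionary, if present.

    `latencies` is a hypothetical mapping: event name -> {"MEAN", "MIN", "MAX"}.
    Events without stats are returned unchanged.
    """
    stats = latencies.get(str(event["EventName"]))
    if stats is not None:
        event["RetirementLatencyMean"] = stats["MEAN"]
        event["RetirementLatencyMin"] = stats["MIN"]
        event["RetirementLatencyMax"] = stats["MAX"]
    return event
```

For example, `add_retirement_latency({"EventName": "MEM_INST_RETIRED.LOCK_LOADS"}, {"MEM_INST_RETIRED.LOCK_LOADS": {"MEAN": 63.76, "MIN": 15, "MAX": 5156}})` adds the three fields shown in the cache.json excerpt above. Because events without stats pass through untouched, models without a retirement-latency file produce the same JSON as before.
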
Simplify metric thresholds to clean up unnecessary brackets in metrics
parsed from TMA.
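
One way to drop redundant brackets is to parse the threshold into a tree and print it back with brackets only where precedence requires them. The sketch below is illustrative only and is not the script's own expression code. It handles just `&`, `|`, `<`, `>`, identifiers and numbers. It assumes comparisons bind tighter than `&` and `|`, as in perf's metric expression grammar. The parser, printer and `simplify_threshold` are hypothetical helpers, and the code needs Python 3.8+ for `:=`.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Union

# Assumed precedences (higher binds tighter); all operators are left-associative.
PREC = {"|": 1, "&": 2, "<": 3, ">": 3}


@dataclass
class BinOp:
    op: str
    lhs: "Node"
    rhs: "Node"


Node = Union[BinOp, str, float]


def tokenize(text: str) -> List[str]:
    # Numbers, identifiers, operators and brackets; whitespace is skipped.
    return re.findall(r"\d+(?:\.\d+)?|[A-Za-z_]\w*|[&|<>()]", text)


class Parser:
    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> Optional[str]:
        tok = self.peek()
        self.pos += 1
        return tok

    def parse(self, min_prec: int = 1) -> Node:
        # Precedence climbing: build left-associative BinOp nodes.
        lhs = self.atom()
        while (op := self.peek()) in PREC and PREC[op] >= min_prec:
            self.advance()
            lhs = BinOp(op, lhs, self.parse(PREC[op] + 1))
        return lhs

    def atom(self) -> Node:
        tok = self.advance()
        if tok == "(":
            node = self.parse()
            if self.advance() != ")":
                raise ValueError("expected ')'")
            return node
        if tok is None or tok in PREC or tok == ")":
            raise ValueError(f"unexpected token {tok!r}")
        return float(tok) if tok[0].isdigit() else tok


def to_str(node: Node, parent_prec: int = 0, is_rhs: bool = False) -> str:
    if isinstance(node, float):
        return f"{node:g}"  # 0.10 -> "0.1" (at most 6 significant digits)
    if isinstance(node, str):
        return node
    prec = PREC[node.op]
    text = f"{to_str(node.lhs, prec)} {node.op} {to_str(node.rhs, prec, True)}"
    # Keep brackets only where dropping them would change the parse tree:
    # a looser-binding child, or an equal-precedence right operand.
    if prec < parent_prec or (is_rhs and prec == parent_prec):
        return f"({text})"
    return text


def simplify_threshold(expr: str) -> str:
    parser = Parser(tokenize(expr))
    tree = parser.parse()
    if parser.peek() is not None:
        raise ValueError("trailing tokens")
    return to_str(tree)
```

Fed the old Alder Lake threshold `(tma_allocation_restriction >0.10) & ((tma_core_bound >0.10) & ((tma_backend_bound >0.10)))`, this prints `tma_allocation_restriction > 0.1 & (tma_core_bound > 0.1 & tma_backend_bound > 0.1)`. That is the same shape as the regenerated output in the review below. The brackets around the right-hand `&` survive because removing them would re-associate the tree, even though `&` gives the same result either way.
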
captain5050 requested a review from kliang2 as a code owner on March 22, 2025 04:12
Without this, Dependent_Loads_Weight isn't properly resolved.
edwarddavidbaker (Contributor) left a comment:

I generated the output from main and from Ian's branch rebased onto the latest main.

Simplified metric threshold expressions.

```diff
--- main/alderlake/adl-metrics.json     2025-03-24 09:07:43.823351404 -0700
+++ retirement-latencies/alderlake/adl-metrics.json     2025-03-24 09:08:17.203371459 -0700
@@ -103,7 +103,7 @@
         "MetricExpr": "tma_core_bound",
         "MetricGroup": "TopdownL3;tma_L3_group;tma_core_bound_group",
         "MetricName": "tma_allocation_restriction",
-        "MetricThreshold": "(tma_allocation_restriction >0.10) & ((tma_core_bound >0.10) & ((tma_backend_bound >0.10)))",
+        "MetricThreshold": "tma_allocation_restriction > 0.1 & (tma_core_bound > 0.1 & tma_backend_bound > 0.1)",
         "ScaleUnit": "100%",
         "Unit": "cpu_atom"
     },
@@ -113,7 +113,7 @@
         "MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.ALL@ / (5 * cpu_atom@CPU_CLK_UNHALTED.CORE@)",
         "MetricGroup": "Default;TopdownL1;tma_L1_group",
         "MetricName": "tma_backend_bound",
-        "MetricThreshold": "(tma_backend_bound >0.10)",
+        "MetricThreshold": "tma_backend_bound > 0.1",
         "MetricgroupNoGroup": "TopdownL1;Default",
         "PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that uops must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count",
         "ScaleUnit": "100%",
```

Retirement latency stats are added to events.

```diff
--- main/graniterapids/cache.json       2025-03-24 09:07:46.873356313 -0700
+++ retirement-latencies/graniterapids/cache.json       2025-03-24 09:08:20.623372472 -0700
@@ -344,60 +344,75 @@
         "SampleAfterValue": "1000003",
         "UMask": "0x83"
     },
     {
         "BriefDescription": "Retired load instructions with locked access.",
         "Counter": "0,1,2,3",
         "Data_LA": "1",
         "EventCode": "0xd0",
         "EventName": "MEM_INST_RETIRED.LOCK_LOADS",
         "PublicDescription": "Counts retired load instructions with locked access.",
+        "RetirementLatencyMax": 5156,
+        "RetirementLatencyMean": 63.76,
+        "RetirementLatencyMin": 15,
         "SampleAfterValue": "100007",
         "UMask": "0x21"
     },
     {
         "BriefDescription": "Retired load instructions that split across a cacheline boundary.",
         "Counter": "0,1,2,3",
         "Data_LA": "1",
         "EventCode": "0xd0",
         "EventName": "MEM_INST_RETIRED.SPLIT_LOADS",
         "PublicDescription": "Counts retired load instructions that split across a cacheline boundary.",
+        "RetirementLatencyMax": 4704,
+        "RetirementLatencyMean": 3.97,
+        "RetirementLatencyMin": 0,
         "SampleAfterValue": "100003",
         "UMask": "0x41"
     },
```

edwarddavidbaker (Contributor) commented Mar 24, 2025

@1perrytaylor, do you know why the last three Aux entries in the TMA metrics sheets do not start with `#`?



Discussed with Perry. This is expected.

Use cpu_core rather than cpu PMU.
Ensure the correct spreadsheet and column name are used. This data
isn't in mapfile.csv or the spreadsheet.
edwarddavidbaker (Contributor) left a comment:

Thanks Ian. Added a few Grand Ridge notes.

```diff
@@ -1934,7 +1936,7 @@ def to_perf_json(self, outdir: Path):
 perf_json.write('\n')
 
 # Skip hybrid because event grouping does not support it well yet
-if self.shortname not in ['ADL', 'ADLN', 'ARL', 'LNL', 'MTL']:
+if self.shortname not in ['ADL', 'ADLN', 'ARL', 'LNL', 'MTL', 'SRF']:
```

Please also add GRR if we need to skip grouping for E-core-only products (these aren't hybrids).
