Add retirement latencies to perf events and other perf event and metric improvements #298
base: main
Conversation
Try to have fewer events classified under the "other" topic, preferring the "cache" and "memory" topics instead. Remove some duplicated matches.
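The matching rules themselves aren't visible in this snippet; as a rough sketch of the idea (the prefix table and classify_topic helper below are hypothetical, not the code in create_perf_json.py), topic assignment amounts to trying cache/memory matches before falling back to "other":

```python
import re

# Hypothetical prefix rules; the real matching in create_perf_json.py is
# more involved than a couple of name prefixes.
_TOPIC_PATTERNS = [
    (re.compile(r'^(MEM_INST_RETIRED|MEM_LOAD_RETIRED|L1D_|L2_|LONGEST_LAT_CACHE)'), 'Cache'),
    (re.compile(r'^(MEM_TRANS_RETIRED|OFFCORE_REQUESTS)'), 'Memory'),
]

def classify_topic(event_name: str) -> str:
    """Prefer "Cache" and "Memory" topics, falling back to "Other"."""
    for pattern, topic in _TOPIC_PATTERNS:
        if pattern.match(event_name):
            return topic
    return 'Other'

if __name__ == '__main__':
    for name in ('MEM_INST_RETIRED.LOCK_LOADS',
                 'MEM_TRANS_RETIRED.LOAD_LATENCY_GT_4',
                 'UOPS_RETIRED.ALL'):
        print(f'{name}: {classify_topic(name)}')
```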
Use mapfile.csv's event type of 'retirement latency' to locate a retirement latency file. When creating the perf event's dictionary, add the retirement latency mean, min and max. For example, on GNR in cache.json:
```
...
    {
        "BriefDescription": "Retired load instructions with locked access.",
        "Counter": "0,1,2,3",
        "Data_LA": "1",
        "EventCode": "0xd0",
        "EventName": "MEM_INST_RETIRED.LOCK_LOADS",
        "PublicDescription": "Counts retired load instructions with locked access.",
        "RetirementLatencyMax": 5156,
        "RetirementLatencyMean": 63.76,
        "RetirementLatencyMin": 15,
        "SampleAfterValue": "100007",
        "UMask": "0x21"
    },
...
```
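Roughly, the merge step can be pictured like the sketch below: load the retirement latency file that mapfile.csv points at, then fold mean/min/max into each event dictionary. The file format and helper names here are assumptions for illustration rather than the code in this PR:

```python
import csv
from typing import Dict, List

def load_retirement_latencies(csv_path: str) -> Dict[str, dict]:
    """Read per-event retirement latency stats.

    The column names (EventName, Mean, Min, Max) are assumptions about the
    retirement latency file referenced from mapfile.csv.
    """
    latencies: Dict[str, dict] = {}
    with open(csv_path, newline='') as f:
        for row in csv.DictReader(f):
            latencies[row['EventName']] = {
                'RetirementLatencyMean': float(row['Mean']),
                'RetirementLatencyMin': float(row['Min']),
                'RetirementLatencyMax': float(row['Max']),
            }
    return latencies

def add_retirement_latencies(events: List[dict], latencies: Dict[str, dict]) -> None:
    """Fold latency stats into each perf event dictionary when present."""
    for event in events:
        stats = latencies.get(event.get('EventName', ''))
        if stats:
            event.update(stats)
```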
Simplify metric thresholds to clean up unnecessary brackets in metrics parsed from TMA.
Without this, Dependent_Loads_Weight isn't properly resolved.
I generated the output from main and from Ian's branch rebased onto the latest main.
Simplified metric threshold expressions.
--- main/alderlake/adl-metrics.json 2025-03-24 09:07:43.823351404 -0700
+++ retirement-latencies/alderlake/adl-metrics.json 2025-03-24 09:08:17.203371459 -0700
@@ -103,7 +103,7 @@
"MetricExpr": "tma_core_bound",
"MetricGroup": "TopdownL3;tma_L3_group;tma_core_bound_group",
"MetricName": "tma_allocation_restriction",
- "MetricThreshold": "(tma_allocation_restriction >0.10) & ((tma_core_bound >0.10) & ((tma_backend_bound >0.10)))",
+ "MetricThreshold": "tma_allocation_restriction > 0.1 & (tma_core_bound > 0.1 & tma_backend_bound > 0.1)",
"ScaleUnit": "100%",
"Unit": "cpu_atom"
},
@@ -113,7 +113,7 @@
"MetricExpr": "cpu_atom@TOPDOWN_BE_BOUND.ALL@ / (5 * cpu_atom@CPU_CLK_UNHALTED.CORE@)",
"MetricGroup": "Default;TopdownL1;tma_L1_group",
"MetricName": "tma_backend_bound",
- "MetricThreshold": "(tma_backend_bound >0.10)",
+ "MetricThreshold": "tma_backend_bound > 0.1",
"MetricgroupNoGroup": "TopdownL1;Default",
"PublicDescription": "Counts the total number of issue slots that were not consumed by the backend due to backend stalls. Note that uops must be available for consumption in order for this event to count. If a uop is not available (IQ is empty), this event will not count",
"ScaleUnit": "100%",
Retirement latency stats are added to events.
--- main/graniterapids/cache.json 2025-03-24 09:07:46.873356313 -0700
+++ retirement-latencies/graniterapids/cache.json 2025-03-24 09:08:20.623372472 -0700
@@ -344,60 +344,75 @@
"SampleAfterValue": "1000003",
"UMask": "0x83"
},
{
"BriefDescription": "Retired load instructions with locked access.",
"Counter": "0,1,2,3",
"Data_LA": "1",
"EventCode": "0xd0",
"EventName": "MEM_INST_RETIRED.LOCK_LOADS",
"PublicDescription": "Counts retired load instructions with locked access.",
+ "RetirementLatencyMax": 5156,
+ "RetirementLatencyMean": 63.76,
+ "RetirementLatencyMin": 15,
"SampleAfterValue": "100007",
"UMask": "0x21"
},
{
"BriefDescription": "Retired load instructions that split across a cacheline boundary.",
"Counter": "0,1,2,3",
"Data_LA": "1",
"EventCode": "0xd0",
"EventName": "MEM_INST_RETIRED.SPLIT_LOADS",
"PublicDescription": "Counts retired load instructions that split across a cacheline boundary.",
+ "RetirementLatencyMax": 4704,
+ "RetirementLatencyMean": 3.97,
+ "RetirementLatencyMin": 0,
"SampleAfterValue": "100003",
"UMask": "0x41"
},
@1perrytaylor Do you know why the last three […]
Discussed with Perry. This is expected.
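As a quick spot-check of the generated files, something like the following lists which events picked up the new fields; the path follows the diff header above and is an assumption about the local output layout:

```python
import json

# Spot-check the generated GNR cache.json for the new latency fields.
with open('retirement-latencies/graniterapids/cache.json') as f:
    events = json.load(f)

for event in events:
    if 'RetirementLatencyMean' in event:
        print(f"{event['EventName']}: "
              f"mean={event['RetirementLatencyMean']} "
              f"min={event['RetirementLatencyMin']} "
              f"max={event['RetirementLatencyMax']}")
```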
Use cpu_core rather than cpu PMU.
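The change itself isn't shown in the snippet here; conceptually it amounts to naming the cpu_core PMU explicitly in P-core event references, roughly like this sketch (the use_cpu_core_pmu helper is illustrative, not the actual code):

```python
import re

def use_cpu_core_pmu(metric_expr: str) -> str:
    """Rewrite bare 'cpu@EVENT@' references to the cpu_core PMU.

    Illustrative only: on hybrid parts the P-core events should name the
    cpu_core PMU explicitly rather than the generic 'cpu' PMU.
    """
    return re.sub(r'\bcpu@', 'cpu_core@', metric_expr)

if __name__ == '__main__':
    print(use_cpu_core_pmu('cpu@INST_RETIRED.ANY@ / cpu@CPU_CLK_UNHALTED.THREAD@'))
```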
Ensure the correct spreadsheet and column name are used. This data isn't in mapfile.csv or the spreadsheet.
Thanks Ian. Added a few Grand Ridge notes.
scripts/create_perf_json.py (Outdated)
@@ -1934,7 +1936,7 @@ def to_perf_json(self, outdir: Path):
            perf_json.write('\n')

        # Skip hybrid because event grouping does not support it well yet
-        if self.shortname not in ['ADL', 'ADLN', 'ARL', 'LNL', 'MTL']:
+        if self.shortname not in ['ADL', 'ADLN', 'ARL', 'LNL', 'MTL', 'SRF']:
Please also add GRR if we need to skip grouping for E-core only products (these aren't hybrids).
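A sketch of what the suggested condition could look like with GRR included; the 'GRR' shortname and the grouping_enabled wrapper are assumptions for illustration, not the actual change:

```python
# Products for which event grouping is skipped; GRR added per the review
# since it is E-core only like SRF (the 'GRR' shortname is assumed here).
SKIP_GROUPING_SHORTNAMES = ['ADL', 'ADLN', 'ARL', 'LNL', 'MTL', 'SRF', 'GRR']

def grouping_enabled(shortname: str) -> bool:
    """Mirror of the condition in to_perf_json: group events only for
    products that are not in the skip list."""
    return shortname not in SKIP_GROUPING_SHORTNAMES

if __name__ == '__main__':
    for name in ('GNR', 'SRF', 'GRR'):
        print(name, grouping_enabled(name))
```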
Ensure the correct spreadsheet and column name are used. This data isn't in mapfile.csv or the spreadsheet.
Add retirement latencies to perf events on GNR.
Classify more events into cache and memory topics.
Simplify metric thresholds.