feature request - Energy profiling #273

Open
TomMelt opened this issue Apr 3, 2023 · 14 comments
Labels
enhancement (New feature or request), PAPI (Hardware counters via PAPI), process sampling (Background system-level sampling performed in background thread)

Comments

@TomMelt

TomMelt commented Apr 3, 2023

Hi,

ArmForge has a feature (perf-report) that can estimate power usage of a binary.

image

Is it possible to do something like this in omnitrace?

I tried using AMDuProf, but it is not supported on Linux (see section 10.3 Limitations, p. 179). I raised an issue on the Community discussion forum.

I think it has something to do with the RAPL drivers. I can see some references to them in the source, but I don't know how to use them.

source/docs/runtime.md:124:`amd64_rapl::RAPL_ENERGY_PKG`, `perf::PERF_COUNT_HW_CPU_CYCLES`, etc.
source/docs/runtime.md:698:| amd64_rapl::RAPL_ENERGY_PKG           | Number of Joules consumed by all c... |
source/docs/runtime.md:699:| amd64_rapl::RAPL_ENERGY_PKG:u=0       | amd64_rapl::RAPL_ENERGY_PKG + moni... |
source/docs/runtime.md:700:| amd64_rapl::RAPL_ENERGY_PKG:k=0       | amd64_rapl::RAPL_ENERGY_PKG + moni... |
source/docs/runtime.md:701:| amd64_rapl::RAPL_ENERGY_PKG:period=0  | amd64_rapl::RAPL_ENERGY_PKG + samp... |
source/docs/runtime.md:702:| amd64_rapl::RAPL_ENERGY_PKG:freq=0    | amd64_rapl::RAPL_ENERGY_PKG + samp... |
source/docs/runtime.md:703:| amd64_rapl::RAPL_ENERGY_PKG:excl=0    | amd64_rapl::RAPL_ENERGY_PKG + excl... |
source/docs/runtime.md:704:| amd64_rapl::RAPL_ENERGY_PKG:mg=0      | amd64_rapl::RAPL_ENERGY_PKG + moni... |
source/docs/runtime.md:705:| amd64_rapl::RAPL_ENERGY_PKG:mh=0      | amd64_rapl::RAPL_ENERGY_PKG + moni... |
source/docs/runtime.md:706:| amd64_rapl::RAPL_ENERGY_PKG:cpu=0     | amd64_rapl::RAPL_ENERGY_PKG + CPU ... |
source/docs/runtime.md:707:| amd64_rapl::RAPL_ENERGY_PKG:pinned=0  | amd64_rapl::RAPL_ENERGY_PKG + pin ... |
@jrmadsen
Collaborator

jrmadsen commented Apr 4, 2023

Add OMNITRACE_PAPI_EVENTS = amd64_rapl::RAPL_ENERGY_PKG to a config file and you should see the counter values in the trace timeline, assuming your machine has the privileges to read them (which it sounds like it does). You should be able to verify that with: omnitrace-avail -H -r RAPL

@TomMelt
Author

TomMelt commented Apr 8, 2023

Hi @jrmadsen, thanks a lot for the quick reply. I'll give this a go when I get back to the office and let you know. For now I will mark this issue as closed.

@TomMelt closed this as completed Apr 8, 2023
@TomMelt
Author

TomMelt commented Apr 28, 2023

Hi @jrmadsen, I managed to get omnitrace installed on an HPC system with the correct permissions. I followed your instructions, but I don't see anything in the trace when I open it in the Perfetto UI.

The commands I run and my config are included below.

omnitrace-instrument -o solver.inst -- ./bin/solver
omnitrace-run -- ./solver.inst 10 10000

The app is a simple OpenMP threaded application. Ideally I want to estimate its energy usage using the RAPL hardware counter.

Am I doing something wrong?

Below is my config:

# auto-generated by omnitrace-avail (version 1.10.0) on 2023-04-28 @ 12:15

OMNITRACE_CONFIG_FILE                              =
OMNITRACE_USE_PERFETTO                             = true
OMNITRACE_USE_TIMEMORY                             = true
OMNITRACE_USE_SAMPLING                             = false
OMNITRACE_USE_PROCESS_SAMPLING                     = true
OMNITRACE_USE_KOKKOSP                              = false
OMNITRACE_USE_CAUSAL                               = false
OMNITRACE_USE_MPIP                                 = true
OMNITRACE_USE_PID                                  = true
OMNITRACE_USE_RCCLP                                = false
OMNITRACE_OUTPUT_PATH                              = omnitrace-%tag%-output
OMNITRACE_OUTPUT_PREFIX                            =
OMNITRACE_CAUSAL_BACKEND                           = auto
OMNITRACE_CAUSAL_BINARY_EXCLUDE                    =
OMNITRACE_CAUSAL_BINARY_SCOPE                      = %MAIN%
OMNITRACE_CAUSAL_DELAY                             = 0
OMNITRACE_CAUSAL_DURATION                          = 0
OMNITRACE_CAUSAL_FUNCTION_EXCLUDE                  =
OMNITRACE_CAUSAL_FUNCTION_SCOPE                    =
OMNITRACE_CAUSAL_MODE                              = function
OMNITRACE_CAUSAL_RANDOM_SEED                       = 0
OMNITRACE_CAUSAL_SOURCE_EXCLUDE                    =
OMNITRACE_CAUSAL_SOURCE_SCOPE                      =
OMNITRACE_CRITICAL_TRACE                           = false
OMNITRACE_PAPI_EVENTS                              = amd64_rapl::RAPL_ENERGY_PKG
OMNITRACE_PERFETTO_BACKEND                         = inprocess
OMNITRACE_PERFETTO_BUFFER_SIZE_KB                  = 1024000
OMNITRACE_PERFETTO_FILL_POLICY                     = discard
OMNITRACE_PROCESS_SAMPLING_DURATION                = -1
OMNITRACE_PROCESS_SAMPLING_FREQ                    = 0
OMNITRACE_SAMPLING_CPUS                            = 1
OMNITRACE_SAMPLING_DELAY                           = 0.5
OMNITRACE_SAMPLING_DURATION                        = 0
OMNITRACE_SAMPLING_FREQ                            = 300
OMNITRACE_SAMPLING_OVERFLOW_EVENT                  = perf::PERF_COUNT_HW_CACHE_REFERENCES
OMNITRACE_TIME_OUTPUT                              = true
OMNITRACE_TIMEMORY_COMPONENTS                      = wall_clock
OMNITRACE_TRACE_DELAY                              = 0
OMNITRACE_TRACE_DURATION                           = 0
OMNITRACE_TRACE_PERIOD_CLOCK_ID                    = CLOCK_REALTIME
OMNITRACE_TRACE_PERIODS                            =
OMNITRACE_VERBOSE                                  = 0
OMNITRACE_ENABLED                                  = true
OMNITRACE_SUPPRESS_CONFIG                          = false
OMNITRACE_SUPPRESS_PARSING                         = false

@TomMelt reopened this Apr 28, 2023
@jrmadsen
Collaborator

Set OMNITRACE_USE_SAMPLING = true and optionally increase or decrease OMNITRACE_SAMPLING_FREQ.

@TomMelt
Author

TomMelt commented Apr 28, 2023

Thanks. Unfortunately, I now get:

[omnitrace][305059] [timemory][papi] Warning!! Failure to add named event amd64_rapl::RAPL_ENERGY_PKG to event set 0 :: PAPI_error -1 : Invalid argument

@jrmadsen
Collaborator

Is it showing up in omnitrace-avail -H?

@TomMelt
Author

TomMelt commented Apr 28, 2023

Yes (see the output of omnitrace-avail -H -r RAPL below).

FYI, I found this link, which suggests I need to specify the CPU number, e.g. amd64_rapl::RAPL_ENERGY_PKG:cpu=0.

I tried that and it runs without error, but I don't have time this evening to check whether the result is correct. I will take a look tomorrow. Ideally, though, I want the energy for the whole processor, not just one core.

$ omnitrace-avail -H -r RAPL
|-----------------------------------------|---------|-----------|----------------------------------------------------------------------|
|            HARDWARE COUNTER             | DEVICE  | AVAILABLE |                               SUMMARY                                |
|-----------------------------------------|---------|-----------|----------------------------------------------------------------------|
| amd64_rapl::RAPL_ENERGY_PKG             |   CPU   |   true    | Number of Joules consumed by all cores and Last level cache on the   |
|                                         |         |           |   package                                                            |
| amd64_rapl::RAPL_ENERGY_PKG:u=0         |   CPU   |   true    | amd64_rapl::RAPL_ENERGY_PKG + monitor at user level                  |
| amd64_rapl::RAPL_ENERGY_PKG:k=0         |   CPU   |   true    | amd64_rapl::RAPL_ENERGY_PKG + monitor at kernel level                |
| amd64_rapl::RAPL_ENERGY_PKG:period=0    |   CPU   |   true    | amd64_rapl::RAPL_ENERGY_PKG + sampling period                        |
| amd64_rapl::RAPL_ENERGY_PKG:freq=0      |   CPU   |   true    | amd64_rapl::RAPL_ENERGY_PKG + sampling frequency (Hz)                |
| amd64_rapl::RAPL_ENERGY_PKG:excl=0      |   CPU   |   true    | amd64_rapl::RAPL_ENERGY_PKG + exclusive access                       |
| amd64_rapl::RAPL_ENERGY_PKG:mg=0        |   CPU   |   true    | amd64_rapl::RAPL_ENERGY_PKG + monitor guest execution                |
| amd64_rapl::RAPL_ENERGY_PKG:mh=0        |   CPU   |   true    | amd64_rapl::RAPL_ENERGY_PKG + monitor host execution                 |
| amd64_rapl::RAPL_ENERGY_PKG:cpu=0       |   CPU   |   true    | amd64_rapl::RAPL_ENERGY_PKG + CPU to program                         |
| amd64_rapl::RAPL_ENERGY_PKG:pinned=0    |   CPU   |   true    | amd64_rapl::RAPL_ENERGY_PKG + pin event to counters                  |
|-----------------------------------------|---------|-----------|----------------------------------------------------------------------|

@jrmadsen
Collaborator

Ah, yeah, you may just have to specify all the CPUs if you have multiple CPUs, e.g. OMNITRACE_PAPI_EVENTS = amd64_rapl::RAPL_ENERGY_PKG:cpu=0 amd64_rapl::RAPL_ENERGY_PKG:cpu=1 (etc.). That said, I highly doubt the qualifier would be labeled "cpu" if it were actually per-core.
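
For a machine with many CPUs, the event list could be generated rather than typed by hand. The following is only a minimal sketch: the helper name rapl_events is hypothetical and not part of omnitrace, and it assumes, as in the example above, that events are separated by spaces and that the :cpu=N qualifier takes a CPU index.

```python
def rapl_events(cpus, event="amd64_rapl::RAPL_ENERGY_PKG"):
    """Build a value for OMNITRACE_PAPI_EVENTS with one ':cpu=N' entry per CPU.

    Hypothetical helper (not part of omnitrace). Assumes that events are
    separated by spaces, as in the example in this thread.
    """
    return " ".join(f"{event}:cpu={cpu}" for cpu in cpus)


if __name__ == "__main__":
    # Print a config line covering CPUs 0-3; adjust the range to your machine.
    print(f"OMNITRACE_PAPI_EVENTS = {rapl_events(range(4))}")
```

The printed line can then be pasted into the config file. Whether one entry per core or one per socket is needed is exactly the open question in this thread, so treat the range as something to experiment with.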

@jrmadsen
Collaborator

jrmadsen commented May 2, 2023

@TomMelt have you gotten a chance to verify that adding the :cpu=X qualifier provided the information you were seeking?

@TomMelt
Author

TomMelt commented May 3, 2023

Hi @jrmadsen. It looks similar to how omnitrace handles other CPU variables: for example, OMNITRACE_SAMPLING_CPUS is, if I understand correctly, at the core level rather than the socket level.

So I would need to use :cpu=0 ... :cpu=n if I have multiple threads.

However, the result I get in omnitrace is either wrong or doing something odd. Could we arrange a Teams/Zoom call at some point? It might be easier to troubleshoot and discuss.

image

@TomMelt
Author

TomMelt commented May 3, 2023

Ideally, I don't need the trace of energy usage over time, just the final value, similar to the ArmForge perf-report.

Are you able to get energy usage from a simple program?
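
To illustrate what "just the final value" would mean, here is a minimal sketch that reduces a series of energy readings to a single total and an average power. It assumes the readings are cumulative counter values in joules, in time order, which has not been confirmed for what omnitrace records; the function names and the wrap_at parameter are hypothetical.

```python
def total_energy_joules(readings, wrap_at=None):
    """Sum the increase of a cumulative energy counter across readings.

    readings: cumulative counter values in joules, in time order (assumed).
    wrap_at:  value at which the counter wraps back to zero, if known.
    """
    total = 0.0
    for prev, cur in zip(readings, readings[1:]):
        delta = cur - prev
        if delta < 0:
            # A decrease means the counter wrapped between the two readings.
            if wrap_at is None:
                raise ValueError("counter decreased; pass wrap_at to handle wraparound")
            delta += wrap_at
        total += delta
    return total


def average_power_watts(readings, duration_s, wrap_at=None):
    """Average power over the run: total energy divided by elapsed seconds."""
    return total_energy_joules(readings, wrap_at) / duration_s
```

With only the first and last reading of a run, this reduces to a simple difference, which is the single number a perf-report style summary would show.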

@jrmadsen
Collaborator

jrmadsen commented May 4, 2023

Hmmm... It's hard to tell if it is per core or not. Three of those bars look similar in magnitude when their samples are taken at overlapping timestamps -- those per-thread samples are taken with respect to the CPU-clock of the thread so it makes sense why they don't line up exactly.

I think that, for this particular use case, PAPI should ideally not initialize per-thread support, and the counters should be read in the background "process sampling" thread instead of the per-thread interrupt sampler.
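
As a rough illustration of the idea only (this is not omnitrace's implementation), a single background thread can poll a process-wide counter at a fixed interval instead of each application thread sampling it on interrupt. The class name BackgroundSampler and the read_counter callable are hypothetical; in practice read_counter would wrap whatever reads the RAPL counter.

```python
import threading
import time


class BackgroundSampler:
    """Poll a counter from one background thread at a fixed interval."""

    def __init__(self, read_counter, interval_s=0.1):
        self._read = read_counter          # callable returning the counter value
        self._interval = interval_s
        self._stop = threading.Event()
        self.samples = []                  # list of (timestamp, value) pairs
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while True:
            self.samples.append((time.monotonic(), self._read()))
            # wait() returns True as soon as stop() has been called.
            if self._stop.wait(self._interval):
                break
        # Take one final reading so the end of the run is always captured.
        self.samples.append((time.monotonic(), self._read()))

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()
```

Because every reading comes from the same thread and clock, the samples form one timeline for the whole process rather than one per application thread.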

Before we hop on a call, let me experiment a bit with doing the above.

@TomMelt
Author

TomMelt commented May 23, 2023

Hi @jrmadsen, did you have any luck?

@jrmadsen
Collaborator

Sorry for the delay, I started a long vacation right around when you posted the last comment.

I haven’t gotten a chance yet but I’ll look into it shortly.

@jrmadsen added the enhancement, process sampling, and PAPI labels Jun 22, 2023