Variorium Connector: kernel granularity and JSON #265

Open
wants to merge 17 commits into base: develop

twilk10 commented Jul 19, 2024

Updates the current connector with JSON handling changes, energy estimation, and device information. The connector has also been altered to get power information at the kernel level.

@masterleinad
Contributor

Please explain the behavior of the connector tool including options. What is the minimum variorum version required? We should check for that when trying to find Variorum via CMake.

Contributor

@masterleinad masterleinad left a comment

Please provide sample output.

Comment on lines 22 to 35
#include <inttypes.h>
#include <chrono>
#include <cstdlib>
#include <cstring>
#include <vector>
#include <unordered_set>
#include <string>
#include <regex>
#include <ctime>
#include <cxxabi.h>
#include <dlfcn.h>
#include <ctime>
#include <chrono>
#include <iostream>
#include <fstream>

#include <inttypes.h>
#include <iostream>
#include <regex>
#include <stdio.h>
#include <string>
#include <unordered_set>
#include <vector>
Contributor

Which of these header files are actually required? Please make sure to only use what you need.

Comment on lines 36 to 45
int power = -1;
int power1 = 0;
int power2 = 0;
int filemake = -1;
long long time1 = 0;
long long time2 = 0;
uint32_t gdevID = -1;
std::vector<float> gpu_powers;
std::vector<float> gpu_powers2;
std::string output = "";
Contributor

What are all these global variables used for? Please add comments in the code explaining their usage.

Comment on lines 86 to 90
int type_of_profiling =
0; // 0 is for both print power & json, 1 is for print power, 2 is for json
bool usingMPI = false;
bool verbosePrint = false;
bool mpiOutPut = false;
0; // 0 is for both print power & json, 1 is for print power, 2 is for json
bool usingMPI = false;
bool verbosePrint = true;
bool mpiOutPut = false;
Contributor

Are all of these still used/necessary?

Comment on lines 92 to 99
/*void printFile() {

std::ofstream file("variorumoutput.txt", std::ios::app); // Open in append
mode if (!file) { std::cerr << "Error creating the file!" << std::endl; } else {
std::cout << "File created or opened successfully." << std::endl;
}
file.close(); // Close the file
i}*/
Contributor

Please remove unused code.

file.close(); // Close the file
i}*/
void printFile() {
const std::string filename = "variorumoutput.txt";
Contributor

Maybe make this configurable via an environment variable.

Comment on lines 314 to 315
// std::cout << "Number of Sockets: " << num_sockets <<
// std::endl;
Contributor

Suggested change
// std::cout << "Number of Sockets: " << num_sockets <<
// std::endl;

Comment on lines 300 to 313
char *num_sockets_pos = strstr(s, "\"num_gpus_per_socket\":");
if (num_sockets_pos != nullptr) {
num_sockets_pos += strlen("\"num_gpus_per_socket\":");

char *num_sockets_end_pos = strchr(num_sockets_pos, ',');
if (num_sockets_end_pos == nullptr) {
num_sockets_end_pos = strchr(num_sockets_pos, '}');
}
if (num_sockets_end_pos != nullptr) {

std::string num_sockets_str(num_sockets_pos,
num_sockets_end_pos - num_sockets_pos);

num_sockets = std::stoll(num_sockets_str);
Contributor

Please try using jansson here as well (and below).

Comment on lines 370 to 375
std::cout << " Gpu start power: " << gpu_powers2[gdevID]
<< " Gpu end power: " << gpu_powers[gdevID]
<< " Energy Estimation: "
<< ((gpu_powers[gdevID] + gpu_powers2[0]) / 2) *
((temp - time1) * .001)
<< std::endl;
Contributor

Let's just not print anything to stdout but only to the file.

Contributor

Sounds fine to me.

Comment on lines 512 to 527
/* std::cout << "kokkos library call\n" << std::endl;
if (usingMPI) {
variorum_call_mpi();
} else {
variorum_call();
}
time_t total_time;
time_t end_time;
time(&end_time);
std::cout << "End Time: " << end_time << "\nStart Time: " << start_time
<< "\n";
total_time = end_time - start_time;

std::cout << "The kokkos library was alive for " << total_time << "
seconds."
<< std::endl;*/
Contributor

Suggested change
/* std::cout << "kokkos library call\n" << std::endl;
if (usingMPI) {
variorum_call_mpi();
} else {
variorum_call();
}
time_t total_time;
time_t end_time;
time(&end_time);
std::cout << "End Time: " << end_time << "\nStart Time: " << start_time
<< "\n";
total_time = end_time - start_time;
std::cout << "The kokkos library was alive for " << total_time << "
seconds."
<< std::endl;*/

Contributor

@twilk10 Is this suggestion OK with you? If so, can you commit it?

It is good with me.

Comment on lines 580 to 593
std::ostream &operator<<(
std::ostream &os,
const Kokkos::Tools::Experimental::ExecutionSpaceIdentifier &identifier) {

os << " Type: " << identifier.type << "";
os << " Device ID: " << identifier.device_id << "";
gdevID = identifier.device_id;
os << " Instance ID: " << identifier.instance_id;
// output += " Device ID: " + std::to_string(identifier.device_id) +
// " Instance ID: " +
// std::to_string(identifier.instance_id);
writeToFile("variorumoutput.txt", output);
return os;
}
Contributor

Don't overload operator<< but just write to the output file directly.

@masterleinad
Contributor

Please use clang-format 8 for the indentation.

@vlkale
Contributor

vlkale commented Jul 19, 2024

Please explain the behavior of the connector tool including options. What is the minimum variorum version required? We should check for that when trying to find Variorum via CMake.

+1

@twilk10 Thanks for submitting this PR!

To provide an explanation of this connector tool’s options and a high-level behavior from the Kokkos user perspective: maybe you can add kokkosp_print_help() (like what is in space-time-stack).
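
A help hook along these lines could look roughly like the sketch below. It is only an illustration under assumptions: the helper `variorum_help_text` is hypothetical, the `char*` parameter of `kokkosp_print_help` is assumed to match what other Kokkos Tools connectors use, and the only option listed is the output-file environment variable that appears later in this review.

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: builds the help text so it can be reused or tested.
std::string variorum_help_text(const char* exe) {
  const std::string app = (exe != nullptr) ? exe : "<application>";
  std::string text;
  text += "Variorum connector loaded by " + app + "\n";
  text += "Reports per-kernel power readings and an energy estimate.\n";
  text += "Environment variables:\n";
  // Default file name taken from the code under review.
  text += "  KOKKOS_TOOLS_VARIORUM_OUTPUT_FILE  output file";
  text += " (fallback: variorumoutput.txt)\n";
  return text;
}

// Hook name taken from the review comment; the char* parameter is an
// assumption about the Kokkos Tools interface.
extern "C" void kokkosp_print_help(char* exe) {
  std::fputs(variorum_help_text(exe).c_str(), stdout);
}
```

Keeping the text in a separate function makes it easy to extend once the connector's real options are settled.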

@twilk10
Author

twilk10 commented Jul 19, 2024

Thank you, I will work on getting that information up as soon as possible.

@vlkale
Contributor

vlkale commented Jul 20, 2024

Thanks! Be sure to run clang-format-8 on each code file that you changed in this PR, as @masterleinad mentions in one of his comments. This is a prereq for your PR to be merged. Note that this is the CI check that doesn't pass.

@twilk10
Author

twilk10 commented Jul 20, 2024

I ran clang-format 8 on it; I will have to evaluate what went wrong and try it again.

@vlkale
Contributor

vlkale commented Jul 23, 2024

@twilk10 OK. You want to do something like:

clang-format-8 -style=file --assume-filename=../../.clang-format ../../profiling/variorium-connector/kp_variorium_connector.cpp > ../../profiling/variorium-connector/kp_variorium_connector-temp.cpp

You can diff in between to see whether and/or how the file has indeed changed and then mv the temporary file to the original file.

If that doesn't work, can you send the error output with the flag -Werror ?

@vlkale
Contributor

vlkale commented Jul 23, 2024

@twilk10 Can you look at @masterleinad 's comments particularly on printing output and parsing json with jansson? I think they are good suggestions and maybe you can commit those before committing other changes with clang-format.

Also, make sure not to include any unneeded headers.

@twilk10
Author

twilk10 commented Jul 23, 2024

I will take a look and try updating with the help of these suggestions.

@twilk10
Author

twilk10 commented Jul 23, 2024

I am having trouble getting access to clang-format-8 but I can get other versions.

@twilk10
Author

twilk10 commented Jul 23, 2024

I have solved this issue.

@vlkale vlkale changed the title from "Varconnect" to "Variorium Connector: kernel granularity and JSON" on Jul 23, 2024
@vlkale
Contributor

vlkale commented Jul 23, 2024

Great! All checks have passed for clang-format-8. Can you address the comments by @masterleinad ? (You can commit suggestions if you agree).

I also updated the title to capture the changes you made in your code better.

@masterleinad Are there any other things to resolve here?

@twilk10
Author

twilk10 commented Jul 23, 2024

I am currently working on updating the connector to address the comments made by @masterleinad

@@ -1,4 +1,5 @@
//@HEADER
//@HEADER140
Contributor

Suggested change
//@HEADER140
//@HEADER

Contributor

ping

Contributor

You still have HEADER140 here.

Comment on lines 529 to 575
std::ostream &operator<<(
std::ostream &os,
const Kokkos::Tools::Experimental::DeviceType &deviceType) {
switch (deviceType) {
case Kokkos::Tools::Experimental::DeviceType::Serial:
os << "CPU";
output += "Type: CPU";
break;
case Kokkos::Tools::Experimental::DeviceType::OpenMP:
os << "OpenMP";
output += "Type: OpenMP";
break;
case Kokkos::Tools::Experimental::DeviceType::Cuda:
os << "cuda";
output += "Type: CUDA";
break;
case Kokkos::Tools::Experimental::DeviceType::HIP:
os << "hip";
output += "Type: HIP";
break;
case Kokkos::Tools::Experimental::DeviceType::OpenMPTarget:
os << "openmptarget";
output += "Type: openmptarget";
break;
case Kokkos::Tools::Experimental::DeviceType::HPX:
os << "hpx";
output += "Type: hpx";
break;
case Kokkos::Tools::Experimental::DeviceType::Threads:
os << "threads";
output += "Type: threads";
break;
case Kokkos::Tools::Experimental::DeviceType::SYCL:
os << "sycl";
output += "Type: SYCL";
break;
case Kokkos::Tools::Experimental::DeviceType::OpenACC:
os << "openacc";
output += "Type: OPENACC";
break;

default:
os << "Unknown Device Type";
output += "Type: Uknown Device Type";
break;
}
time_t total_time;
time_t end_time;
time(&end_time);
std::cout << "End Time: " << end_time << "\nStart Time: " << start_time
<< "\n";
total_time = end_time - start_time;

std::cout << "The kokkos library was alive for " << total_time << " seconds."
<< std::endl;
return os;
Contributor

Define a function such as

Space get_space(SpaceHandle const& handle) {
// check that name starts with "Cuda"
if (strncmp(handle.name, "Cuda", 4) == 0) return SPACE_CUDA;
// check that name starts with "SYCL"
if (strncmp(handle.name, "SYCL", 4) == 0) return SPACE_SYCL;
// check that name starts with "OpenMPTarget"
if (strncmp(handle.name, "OpenMPTarget", 12) == 0) return SPACE_OMPT;
// check that name starts with "HIP"
if (strncmp(handle.name, "HIP", 3) == 0) return SPACE_HIP;
if (strcmp(handle.name, "Host") == 0) return SPACE_HOST;
abort();
return SPACE_HOST;
}
instead. We don't want to make Kokkos::Tools::Experimental::DeviceType implicitly printable.

@twilk10
Author

twilk10 commented Aug 6, 2024

This is an update to the Kokkos-tool variorum connector and can be used to get the power data an
Tested with the incremental HIP test on an exascale machine with an AMD-style architecture.
The goal of this change is to make the tool usable and to retrieve power information at the kernel level, which is a more modern use for the tool.
This is a step toward allowing users to measure their energy consumption while using Kokkos.

Features to be updated:
- updated JSON parsing
- updated what is printed out

Features that were updated in Variorum by others:
- API to get power

Features implemented:
- Equation to estimate energy usage until the energy API is updated
- Moving the capture of data to the hooks so that the power data being retrieved is from inside the kernel

Outputs include device ID, device type, device kernel, and power alongside power estimation.
The tool also writes to a file that is created on the system. The user can provide an environment variable by uncommenting code; if not, the connector creates a file to hold the information.

Compatible with variorum version 6.0
Sample output:
name: Z4mainEUllRlE_ Device ID: 0 Instance ID: 1 DeviceType: SERIAL Energy Estimation 2679.104000

@Rombur
Member

Rombur commented Aug 6, 2024

You pushed your build directory. You need to remove it.

Contributor

Remove this file.

Contributor

Remove this file.

Contributor

Remove this file.

@@ -1,4 +1,5 @@
//@HEADER
//@HEADER140
Contributor

ping

Comment on lines 30 to 31
//#include <cxxabi.h>
//#include <dlfcn.h>
Contributor

Suggested change
//#include <cxxabi.h>
//#include <dlfcn.h>

If you don't need it, remove it.

if (gdevID == 0) {
switch (gdevID) {
case 1: {
json_t* gpu_value = json_object_get(power_gpu_watts, "GPU_1");
Contributor

Suggested change
json_t* gpu_value = json_object_get(power_gpu_watts, "GPU_1");
json_t* gpu_value = json_object_get(power_gpu_watts, ("GPU_" + std::to_string(gdevID)).c_str());

and remove all the switch cases.
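
As a minimal sketch of that idea, the key can be built once from the device ID and then handed to jansson. The helper name `gpu_key` is hypothetical, and the `GPU_<n>` key format is taken from the code under review. Because `json_object_get` expects a `const char*` key, the string needs `.c_str()` at the call site.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: maps a device ID to the JSON key used for that GPU,
// e.g. 1 -> "GPU_1". This replaces one switch case per device.
std::string gpu_key(std::uint32_t device_id) {
  return "GPU_" + std::to_string(device_id);
}

// Intended use with jansson (not compiled in this sketch):
//   json_t* gpu_value =
//       json_object_get(power_gpu_watts, gpu_key(device_id).c_str());
```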

Comment on lines 382 to 383
output = " Energy Estimation " + std::to_string(((power1 + power2) / 2) *
((temp - time1) * .001));
Contributor

What's the unit?
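
For reference, the estimate in the snippet is the mean of the start and end power multiplied by the elapsed time. If the power readings are in watts and the timestamps are in milliseconds (an assumption suggested by the `.001` factor), the result is in joules. The function name below is hypothetical. The sketch uses floating-point arithmetic throughout, since `(power1 + power2) / 2` would truncate if both values were integers.

```cpp
#include <cassert>

// Hypothetical helper: trapezoidal energy estimate for one kernel.
// Assumes power in watts and timestamps in milliseconds; returns joules.
double estimate_energy_joules(double power_start_w, double power_end_w,
                              long long time_start_ms, long long time_end_ms) {
  assert(time_end_ms >= time_start_ms);
  const double mean_power_w = (power_start_w + power_end_w) / 2.0;
  const double elapsed_s =
      static_cast<double>(time_end_ms - time_start_ms) / 1000.0;
  return mean_power_w * elapsed_s;  // W * s = J
}
```

For example, 100 W at kernel start and 200 W at kernel end over 2000 ms gives 150 W * 2 s = 300 J.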

}
} catch (int e) {
std::cout << "No MPI Option provided, not using per rank output"
<< std::endl;
usingMPI = false;
// usingMPI = false;
Contributor

Suggested change
// usingMPI = false;

Comment on lines 508 to 522
std::ostream& operator<<(
std::ostream& os,
const Kokkos::Tools::Experimental::ExecutionSpaceIdentifier& identifier) {
gdevID = identifier.device_id;

output +=
" Device ID: " + std::to_string(identifier.device_id) +
" Instance ID: " + std::to_string(identifier.instance_id) +
" DeviceType: " +
deviceTypeToString(static_cast<Kokkos::Tools::Experimental::DeviceType>(
identifier.device_id));

writeToFile(filename, output);
return os;
}
Contributor

Please don't introduce an overload for operator<< but use a stringstream or write to the file immediately instead.
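
One way to follow this suggestion is a plain function that formats the identifier into a string with `std::ostringstream`, which the caller then writes to the output file. The sketch below is self-contained, so it uses a hypothetical stand-in struct `SpaceId` instead of `Kokkos::Tools::Experimental::ExecutionSpaceIdentifier`. The device type name is passed as its own field rather than derived from the device ID.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical stand-in for the Kokkos execution space identifier.
struct SpaceId {
  std::uint32_t device_id;
  std::uint32_t instance_id;
  const char* type_name;  // e.g. "SERIAL", "HIP"
};

// Formats one identifier as a line fragment; no operator<< overload needed.
std::string describe(const SpaceId& id) {
  std::ostringstream line;
  line << " Device ID: " << id.device_id
       << " Instance ID: " << id.instance_id
       << " DeviceType: " << id.type_name;
  return line.str();
}
```

The returned string can then be passed to something like the `writeToFile(filename, ...)` helper already used in the connector.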

writeToFile(filename, "name: " + std::string(name));
gdevID = devID;
auto result = Kokkos::Tools::Experimental::identifier_from_devid(devID);
std::cout << result;
Contributor

Please don't print to cout anywhere but only to the output file.

@masterleinad
Contributor

Sample output; name: Z4mainEUllRlE_ Device ID: 0 Instance ID: 1 DeviceType: SERIAL Energy Estimation 2679.104000

Please provide output from the incremental tests or so.

Comment on lines 34 to 41
// The variable below assist with logic and serve as a way to save measurements
// for calaculation
int power = -1;
int power1 = 0;
int power2 = 0;
int filemake = -1;
long long time1 = 0;
long long time2 = 0;
Contributor

Can you be more precise?

Comment on lines 42 to 43
// This is a global variable instituted to track what the device ID is
uint32_t gdevID = -1;
Contributor

You shouldn't need a global variable for that.

Author

Thank you for catching that.

bool verbosePrint = false;
bool mpiOutPut = false;

bool verbosePrint = true;
Contributor

Do you still need this? What does it do?

}

void printFile() {
// std::string filename = getFile("KOKKOS_TOOLS_VARIORUM_OUTPUT_FILE");
Contributor

Why is this commented?

Author

To give users the option to use an environment variable if they choose to, or to have the file variorumoutput.txt put directly into their directory.

Contributor

Right but why is it not active? If you want to fall back to "variorumoutput.txt", you should not error out in getFile but rather use that if the environment variable couldn't be parsed correctly.

Author

That is a good idea
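
A minimal sketch of the fallback discussed in this thread: read the environment variable and return the default file name when it is unset or empty, instead of aborting. The function name `output_file_from_env` is hypothetical. The variable name it would be called with and the default file name are taken from the code under review.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical helper: returns the value of the given environment variable,
// or the fallback file name when the variable is unset or empty.
std::string output_file_from_env(
    const char* env_var_name,
    const std::string& fallback = "variorumoutput.txt") {
  const char* value = std::getenv(env_var_name);
  if (value == nullptr || *value == '\0') {
    return fallback;
  }
  return std::string(value);
}
```

Calling `output_file_from_env("KOKKOS_TOOLS_VARIORUM_OUTPUT_FILE")` once during initialization and storing the result would avoid repeating the lookup for every kernel.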

@@ -89,19 +130,19 @@ bool mpiOutPut = false;
// value.
std::string variorum_print_power_call() {
std::string outputString;
json_t* power_obj = json_object();
char* s = NULL;
Contributor

Suggested change
char* s = NULL;
char* s = nullptr;

Comment on lines 211 to 215
json_t* root = NULL;
json_t* socket_0 = NULL;
json_t* timestamp_value = NULL;
json_t* power_gpu_watts = NULL;
json_t* gpu_0_value = NULL;
Contributor

Suggested change
json_t* root = NULL;
json_t* socket_0 = NULL;
json_t* timestamp_value = NULL;
json_t* power_gpu_watts = NULL;
json_t* gpu_0_value = NULL;
json_t* root = nullptr;
json_t* socket_0 = nullptr;
json_t* timestamp_value = nullptr;
json_t* power_gpu_watts = nullptr;
json_t* gpu_0_value = nullptr;

fprintf(stderr, "Expected 'power_gpu_watts' to be an object.\n");
}

std::cout << "Device ID: " << gdevID << std::endl;
Contributor

Suggested change
std::cout << "Device ID: " << gdevID << std::endl;

profiling/variorum-connector/variorum-connector.cpp
}
}

Contributor

Please revert deleting this line.

Author

Which line are you referring to?

Contributor

Comment on lines 34 to 36
int power = -1; // This variable is used to keep track of if variorum call has
// been called twice to check if the variables need to be reset
// for the next kernel
Contributor

Wouldn't a bool be better?

Comment on lines 41 to 42
int filemake = -1; // This variable is to check if the file has already been
// made earlier in the program
Contributor

Wouldn't a bool make more sense?

Comment on lines 37 to 40
double power1 =
0; // This variable is used to obtain the initial power for calculation
double power2 =
0; // This variable is used to obtain the final power for calculation
Contributor

Could be a double array.

Comment on lines 43 to 46
long long time1 =
0; // This variable is used to obtain the initial time for calculation
long long time2 =
0; // This variable is used to obtain the final time for calculation
Contributor

Could be a double array.
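
Taken together, the suggestions on these state variables (a `bool` flag, arrays for the start and end values) could be sketched as one small struct. The name `KernelSample` and its members are hypothetical. The timestamps are kept as `long long` here rather than `double`, which is a choice of this sketch and not something the review settles.

```cpp
#include <array>

// Hypothetical container for the measurements of one kernel.
struct KernelSample {
  bool started = false;                // true between kernel begin and end
  std::array<double, 2> power_w{};     // [0] at begin, [1] at end
  std::array<long long, 2> time_ms{};  // [0] at begin, [1] at end

  // Stores one reading. Returns true once the begin/end pair is complete,
  // at which point the sample is ready to be used and then reused.
  bool record(double watts, long long now_ms) {
    const int slot = started ? 1 : 0;
    power_w[slot] = watts;
    time_ms[slot] = now_ms;
    started = !started;
    return !started;
  }
};
```

The begin hook would call `record` once and the end hook a second time; the second call returns `true`, signalling that an energy estimate can be computed from the stored pair.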

Comment on lines 99 to 109
inline std::string getFile(const char* env_var_name) {
char* parsed_output_file = getenv(env_var_name);
if (!parsed_output_file) {
std::cerr << "Couldn't parse KOKKOS_TOOLS_VARIORUM_OUTPUT_FILE environment "
"variable!\n";
std::abort();
"variable! Printed to variorumoutput.txt\n";
// parsed_output_file = "variorumoutput.txt";
char vararr[19] = "variorumoutput.txt";
parsed_output_file = vararr;
}
return std::string(parsed_output_file);
}
Contributor

Wouldn't it be sufficient to call this when Kokkos is initialized?

Comment on lines 104 to 106
// parsed_output_file = "variorumoutput.txt";
char vararr[19] = "variorumoutput.txt";
parsed_output_file = vararr;
Contributor

Suggested change
// parsed_output_file = "variorumoutput.txt";
char vararr[19] = "variorumoutput.txt";
parsed_output_file = vararr;
return "variorumoutput.txt";
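
For context on why the suggested change matters: in the quoted code, `vararr` is declared inside the `if` block, so `parsed_output_file` points at an array that no longer exists by the time `return std::string(parsed_output_file)` runs. Returning the default directly avoids that. The sketch below applies the suggestion and also resolves the file name once at initialization, as asked in the earlier comment. The variable `cached_output_file` is a hypothetical name; `kokkosp_init_library` is the Kokkos Tools initialization callback.

```cpp
#include <cstdint>
#include <cstdlib>
#include <iostream>
#include <string>

// Returns the value of the environment variable, or a default file name
// when the variable is not set.
inline std::string getFile(const char* env_var_name) {
  const char* parsed_output_file = std::getenv(env_var_name);
  if (!parsed_output_file) {
    std::cerr << "Couldn't parse " << env_var_name
              << " environment variable! Printing to variorumoutput.txt\n";
    return "variorumoutput.txt";  // no pointer to a block-scoped array
  }
  return std::string(parsed_output_file);
}

// Hypothetical cache so the environment is only read once.
std::string cached_output_file;

extern "C" void kokkosp_init_library(const int /*loadSeq*/,
                                     const uint64_t /*interfaceVer*/,
                                     const uint32_t /*devInfoCount*/,
                                     void* /*deviceInfo*/) {
  cached_output_file = getFile("KOKKOS_TOOLS_VARIORUM_OUTPUT_FILE");
}
```

Kernel callbacks can then read `cached_output_file` instead of calling `getFile` on every kernel launch.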

@@ -140,7 +149,7 @@ std::string variorum_print_power_call() {
power_node = json_real_value(json_object_get(power_obj, "power_node"));
const char* hostnameChar =
json_string_value(json_object_get(power_obj, "hostname"));

}
Contributor

This looks suspicious.

std::cout << "No MPI Option provided, not using per rank output"
<< std::endl;

}
// Simple timer code to keep track of the general amount of time the
// application ran for.
Contributor

Is any of this still used?

@twilk10
Author

twilk10 commented Aug 9, 2024

Current output with the Incremental test:

name: Reduction Device ID: 0 Instance ID: 1 DeviceType: SERIAL Energy Estimation in Joules 270.228000
name: Reduction Device ID: 0 Instance ID: 1 DeviceType: SERIAL Energy Estimation in Joules 232.134000
name: Reduction Device ID: 0 Instance ID: 1 DeviceType: SERIAL Energy Estimation in Joules 233.646000
name: Reduction Device ID: 0 Instance ID: 1 DeviceType: SERIAL Energy Estimation in Joules 235.074000
name: Reduction Device ID: 0 Instance ID: 1 DeviceType: SERIAL Energy Estimation in Joules 233.436000
name: Reduction Device ID: 0 Instance ID: 1 DeviceType: SERIAL Energy Estimation in Joules 232.008000
name: Reduction Device ID: 0 Instance ID: 1 DeviceType: SERIAL Energy Estimation in Joules 234.276000
name: Reduction Device ID: 0 Instance ID: 1 DeviceType: SERIAL Energy Estimation in Joules 231.882000
name: Reduction Device ID: 0 Instance ID: 1 DeviceType: SERIAL Energy Estimation in Joules 232.050000
name: Reduction Device ID: 0 Instance ID: 1 DeviceType: SERIAL Energy Estimation in Joules 232.176000
name: Reduction Device ID: 0 Instance ID: 1 DeviceType: SERIAL Energy Estimation in Joules 231.462000
name: Reduction Device ID: 0 Instance ID: 1 DeviceType: SERIAL Energy Estimation in Joules 233.394000
name: Reduction Device ID: 0 Instance ID: 1 DeviceType: SERIAL Energy Estimation in Joules 234.108000
name: Reduction Device ID: 0 Instance ID: 1 DeviceType: SERIAL Energy Estimation in Joules 234.528000
name: Reduction Device ID: 0 Instance ID: 1 DeviceType: SERIAL Energy Estimation in Joules 231.840000
name: Reduction Device ID: 0 Instance ID: 1 DeviceType: SERIAL Energy Estimation in Joules 231.462000
name: Reduction Device ID: 0 Instance ID: 1 DeviceType: SERIAL Energy Estimation in Joules 231.378000
name: Reduction Device ID: 0 Instance ID: 1 DeviceType: SERIAL Energy Estimation in Joules 233.520000
name: Reduction Device ID: 0 Instance ID: 1 DeviceType: SERIAL Energy Estimation in Joules 233.646000
name: Reduction Device ID: 0 Instance ID: 1 DeviceType: SERIAL Energy Estimation in Joules 231.042000
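
The thread does not show how the "Energy Estimation in Joules" value is derived. One plausible approach, given that the connector records an initial and a final power reading per kernel, is a trapezoidal estimate: average the two readings and multiply by the elapsed time. The sketch below is an assumption, not the PR's actual formula, and the function name `estimate_energy_joules` is hypothetical.

```cpp
#include <chrono>

// Trapezoidal energy estimate from two power samples (in watts) taken at
// the beginning and end of a kernel, and the time elapsed between them.
inline double estimate_energy_joules(double power_begin_watts,
                                     double power_end_watts,
                                     std::chrono::duration<double> elapsed) {
  return 0.5 * (power_begin_watts + power_end_watts) * elapsed.count();
}
```

Taking a `std::chrono::duration<double>` makes the time unit explicit, so a caller can pass `std::chrono::milliseconds` or `std::chrono::seconds` without manual conversion. With only two samples, the estimate says nothing about power fluctuations inside the kernel.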

Comment on lines +30 to +35
// Initial and final power values for a kernel
double global_power[2] = {0, 0};
// Initial and final time value for a kernel
long long global_time[2] = {0, 0};
uint32_t global_device_id = -1;
uint32_t global_instance_id = -1;
Contributor

Obviously, this tool is not thread-safe.

@masterleinad masterleinad dismissed their stale review October 14, 2024 18:58

Fixed issues myself.

uint32_t global_instance_id = -1;
std::string global_filename;
std::string global_kernel_name;
std::string global_device_type;

Why do we need so many global variables?

Contributor

We need to record a bunch of information in the "begin kernel" calls that we are using in the "end kernel" calls. We can't pass that through the API interfaces directly.
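
One way to carry that information from the "begin kernel" callback to the "end kernel" callback without a separate global per field is to key a record on the kernel ID that the begin callback hands back to Kokkos. The sketch below is not the PR's implementation: `KernelRecord`, `g_active`, `g_mutex`, `g_next_id`, and the stub `read_power_watts` are hypothetical, and a real connector would query Variorum where the stub returns a constant. The callback signatures are those of the Kokkos Tools `parallel_for` hooks.

```cpp
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>

namespace {
// Everything the end callback needs, captured at kernel begin.
struct KernelRecord {
  std::string name;
  uint32_t device_id;
  double power_begin_watts;
};

std::mutex g_mutex;                                   // guards the state below
std::unordered_map<uint64_t, KernelRecord> g_active;  // kernels in flight
uint64_t g_next_id = 0;

// Placeholder for a real power query; always returns 0 in this sketch.
double read_power_watts() { return 0.0; }
}  // namespace

extern "C" void kokkosp_begin_parallel_for(const char* name,
                                           const uint32_t devID,
                                           uint64_t* kID) {
  std::lock_guard<std::mutex> lock(g_mutex);
  *kID = g_next_id++;
  g_active[*kID] = KernelRecord{name, devID, read_power_watts()};
}

extern "C" void kokkosp_end_parallel_for(const uint64_t kID) {
  std::lock_guard<std::mutex> lock(g_mutex);
  auto it = g_active.find(kID);
  if (it == g_active.end()) return;  // unknown ID: nothing recorded
  // A real tool would take the final reading and write its output here.
  g_active.erase(it);
}
```

State still lives at file scope, as the reply above says it must, but it is grouped per kernel and guarded by a mutex, which also speaks to the thread-safety remark earlier in the thread.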

5 participants