[REVIEW] Add debug/info/trace logging and fix multithreaded replay bench #529
Please update the changelog in order to start CI tests. View the gpuCI docs here.
benchmarks/replay/replay.cpp
Outdated
 *
 * Does not copy the mutex or the map
 */
replay_benchmark(replay_benchmark const& other)
Does this need a copy ctor?
Yes because of the mutex. Won't compile without it.
I don't understand. Where is replay_benchmark being copied that it needs a copy ctor? std::mutex is neither copyable nor movable, so I understand why the default copy ctor would fail to compile if used, but I wouldn't think replay_benchmark should ever need to be copied. Instead I'd explicitly delete replay_benchmark's copy and move ctors.
template <class Lam> benchmark::RegisterBenchmark(const char*, Lam&&) creates an instance of this class, which invokes the copy ctor of the Lam that is passed to it.
#ifdef BENCHMARK_HAS_CXX11
template <class Lambda>
class LambdaBenchmark : public Benchmark {
public:
virtual void Run(State& st) { lambda_(st); }
private:
template <class OLambda>
LambdaBenchmark(const char* name, OLambda&& lam)
: Benchmark(name), lambda_(std::forward<OLambda>(lam)) {}
LambdaBenchmark(LambdaBenchmark const&) = delete;
private:
template <class Lam>
friend Benchmark* ::benchmark::RegisterBenchmark(const char*, Lam&&);
Lambda lambda_;
};
#endif
My bad, it invokes the move ctor. I have now replaced this copy ctor with a move ctor and explicitly deleted the copy ctor.
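The resolution described above (a user-defined move ctor plus an explicitly deleted copy ctor for a class that owns a std::mutex) can be sketched roughly as follows. This is only an illustration: the class name functor_with_mutex and its members are hypothetical and are not taken from replay.cpp. It assumes, as the original doc comment suggests, that the mutex and the map are not carried over and are simply default-constructed in the new object.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <mutex>
#include <type_traits>
#include <utility>

// Hypothetical stand-in for a benchmark functor that owns a mutex.
class functor_with_mutex {
 public:
  explicit functor_with_mutex(int id) : id_{id} {}

  // std::mutex is neither copyable nor movable, so copying is disabled outright.
  functor_with_mutex(functor_with_mutex const&) = delete;
  functor_with_mutex& operator=(functor_with_mutex const&) = delete;

  // Move ctor: transfers the plain data only. The mutex and the map are
  // default-constructed in the new object rather than moved (an assumption
  // of this sketch).
  functor_with_mutex(functor_with_mutex&& other) noexcept
    : id_{other.id_}, mutex_{}, allocations_{}
  {
  }

  int id() const { return id_; }
  std::size_t tracked() const { return allocations_.size(); }

  void track(int key, int value)
  {
    std::lock_guard<std::mutex> lock{mutex_};
    allocations_[key] = value;
  }

 private:
  int id_;
  std::mutex mutex_;
  std::map<int, int> allocations_;
};
```

With this shape, passing a temporary to a function that stores its argument by forwarding (as LambdaBenchmark does with lambda_) selects the move ctor, and any accidental copy becomes a compile error.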
benchmarks/replay/replay.cpp
Outdated
void SetUp(const ::benchmark::State& state) override
{
  if (state.thread_index == 0) {
    rmm::logger().log(spdlog::level::info, "------ Start of Benchmark -----");
    mr_ = factory_();
  }
Why not just put this logic in the replay_benchmark ctor? That way you don't have to do the check against the thread_index. Only one instance of the fixture object is constructed and shared by all threads.
The problem is that replay_benchmark's ctor is called when you register the benchmark, and the dtor is not called until after ALL the benchmarks have run. So you end up constructing all the MRs you are benchmarking first, then running the benchmarks, then destroying them. If you have a pool_mr and a binning_mr, then you effectively use up all the GPU memory before any benchmark runs, and the result is OOM during the first benchmark of one of those MRs. This is the whole reason for all the shenanigans with the fixture.
Google benchmark is not well designed for this case. It took me the better part of a day of reading their undocumented code to figure out a workaround.
Also each benchmark may be run multiple times, and you don't want to reuse the pool (or time creation of the pool) across runs.
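The lifetime argument above can be illustrated with a small library-free sketch. It does not use Google Benchmark; lazy_fixture, fake_resource, set_up and tear_down are hypothetical names that only mimic the SetUp/TearDown pattern under discussion. The fixture stores a factory at registration time and only builds the expensive resource when a run starts, so at most one resource is alive at any moment.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical stand-in for an expensive memory resource (e.g. a pool).
// It counts live instances so the lifetime behavior can be observed.
struct fake_resource {
  static int& live_count()
  {
    static int n = 0;
    return n;
  }
  fake_resource() { ++live_count(); }
  ~fake_resource() { --live_count(); }
};

// Hypothetical fixture: constructed at registration time, but the resource
// itself is only created in set_up() and destroyed in tear_down().
class lazy_fixture {
 public:
  using factory_t = std::function<std::unique_ptr<fake_resource>()>;

  explicit lazy_fixture(factory_t factory) : factory_{std::move(factory)} {}

  // Only the thread with index 0 creates the shared resource.
  void set_up(int thread_index)
  {
    if (thread_index == 0) { mr_ = factory_(); }
  }

  // Only the thread with index 0 destroys it again.
  void tear_down(int thread_index)
  {
    if (thread_index == 0) { mr_.reset(); }
  }

  bool has_resource() const { return mr_ != nullptr; }

 private:
  factory_t factory_;
  std::unique_ptr<fake_resource> mr_;
};
```

Because every run goes through set_up and tear_down, repeated runs of the same benchmark also get a fresh resource each time. A real multithreaded benchmark would additionally need the other threads to wait until thread 0 has finished setup; this sketch leaves that out.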
Thanks @jrhemstad I addressed all of your feedback (except the benchmark construction timing).
Please do not merge. There appears to be a problem with the |
This PR introduces a libcudf build error that I have root-caused to So I think I will need to patch the file (2-character fix) locally as part of the present PR rather than wait... Update: patch added. |
Turns out there's no bug in |
include/rmm/detail/logger.hpp
Outdated
default_log_filename(), true  // truncate file
)}
{
logger_.set_pattern("[%l][%t][%H:%M:%S:%f] %v");
Just a thought: using a different format, we could sort lines in the log file to effectively group by thread id, which might be useful? This has the additional advantage of visually aligning the timestamp and thread id (the log level string is variable width).
[23:59:58:056789][1232][debug]
Suggested change:
- logger_.set_pattern("[%l][%t][%H:%M:%S:%f] %v");
+ logger_.set_pattern("[%t][%H:%M:%S:%f][%l] %v");
edit: updated log pattern
I don't understand why you can't sort by thread ID with the existing format?
First, my suggestion is formatted incorrectly. The format I had in mind is:
[%t][%H:%M:%S:%f][%l] %v
It would be nice to open a log file in a text editor and "sort by lines ascending" to group the logs by thread. For that to work, the thread id would need to be printed first. The second part would need to be the timestamp, otherwise the logs would get out of order, even within a specific thread id. Therefore, the log level would need to be last. Since everything prior to the log level is fixed width, you also end up with a nice visual alignment.
[1232][23:59:58:056789][debug]
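The "sort by lines ascending" idea can be checked with a short sketch. The sample lines and function names below are made up for illustration. The grouping only works as described if the thread id and timestamp fields have the same width on every line, which is an assumption here.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Sorts log lines lexicographically, roughly what a text editor's
// "sort lines ascending" command would do.
inline std::vector<std::string> sort_log_lines(std::vector<std::string> lines)
{
  std::sort(lines.begin(), lines.end());
  return lines;
}

// Hypothetical sample lines in the [thread][time][level] layout,
// deliberately given out of order.
inline std::vector<std::string> sample_log_lines()
{
  return {
    "[1233][23:59:58:000100][info] allocate B",
    "[1232][23:59:58:056789][debug] allocate A",
    "[1232][23:59:57:000001][info] start",
    "[1233][23:59:59:000000][debug] free B",
  };
}
```

After sorting, all lines of thread 1232 come first in time order, followed by all lines of thread 1233 in time order. With the level printed first, the same sort would group by level string instead.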
In my experience grouping by thread has not been useful. You need to see what is happening in time order due to multiple threads using a resource. That said, I guess this change won't hurt anything. I'll fix the visual alignment with an explicit width for the level.
Done.
include/rmm/mr/device/detail/stream_ordered_memory_resource.hpp
Outdated
lgtm.
fyi, the log format suggestion was just an idea. I don't know how useful it would be in practice. 👍
Thanks for the review!
This PR adds debug logging using spdlog so that it is accessible from anywhere in RMM, and provides convenience macros for various levels of logging.
Adds CMake configuration to set the default logging level OFF, and allows specifying spdlog levels using -DLOGGING_LEVEL=<level> on the cmake command line, where level is one of [TRACE, INFO, DEBUG, WARNING, ERROR, CRITICAL].
Changing the level changes which level of logging messages are compiled in, but since spdlog's default logging level is INFO, if you need lower-level logging, then the application must also set spdlog's runtime logging level using rmm::logger().set_level() (e.g. to spdlog::level::trace).
By default the log goes to a file rmm_log.txt unless a filename is specified in the environment variable RMM_DEBUG_LOG_FILENAME.
Also replaces calls to assert with a new macro RMM_LOGGING_ASSERT which logs the reason for the assertion (only active in debug build, just like assert).
This PR also fixes replay of multithreaded logs to ensure the logs are replayed in the actual order they occurred, to prevent deallocations preceding their corresponding allocations.
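One way to picture the replay ordering problem is the sketch below. It is not the implementation in this PR; the event struct, its fields, and both functions are hypothetical. It assumes each logged event carries a timestamp that is comparable across threads, merges the per-thread event lists, and sorts them by that timestamp so that a free logged on one thread cannot be replayed before the matching allocation logged on another.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

enum class action { allocate, deallocate };

// Hypothetical log record; the field names are assumptions of this sketch.
struct event {
  int thread_id;
  std::uint64_t time;  // assumed comparable across threads
  action act;
  std::uintptr_t pointer;
};

// Merges per-thread event lists into one list ordered by timestamp.
// stable_sort keeps the per-thread order for events with equal timestamps.
inline std::vector<event> merge_in_time_order(
  std::vector<std::vector<event>> const& per_thread)
{
  std::vector<event> all;
  for (auto const& events : per_thread) {
    all.insert(all.end(), events.begin(), events.end());
  }
  std::stable_sort(all.begin(), all.end(),
                   [](event const& a, event const& b) { return a.time < b.time; });
  return all;
}

// Returns true if every deallocation is preceded by its allocation.
inline bool frees_follow_allocations(std::vector<event> const& events)
{
  std::vector<std::uintptr_t> live;
  for (auto const& e : events) {
    auto it = std::find(live.begin(), live.end(), e.pointer);
    if (e.act == action::allocate) {
      live.push_back(e.pointer);
    } else {
      if (it == live.end()) { return false; }
      live.erase(it);
    }
  }
  return true;
}
```

Replaying events thread by thread, without the merge, can hit exactly the failure described: a thread that frees memory allocated elsewhere would try to free it before it exists.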
Fixes #431.
TODO: Add logging in more memory resources. We can add further logging in follow up PRs to keep this one manageable.
CC @jlowe @rongou @abellina
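The interaction between the compiled-in level and the runtime level described in this PR summary can be sketched without spdlog. Everything below (the SKETCH_ macros, sketch_logger, the level enum) is hypothetical and only mirrors the general idea: a macro decides at compile time whether a call exists at all, and the logger then filters at runtime, with INFO as its default threshold.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical numeric level used by the preprocessor (lower is more verbose).
#define SKETCH_LEVEL_TRACE 0

// Compile-time threshold; a build system would normally set this,
// e.g. from a CMake option. Defaults to TRACE in this sketch.
#ifndef SKETCH_ACTIVE_LEVEL
#define SKETCH_ACTIVE_LEVEL SKETCH_LEVEL_TRACE
#endif

enum class level { trace = 0, debug = 1, info = 2 };

// Minimal logger with a runtime threshold that defaults to info.
class sketch_logger {
 public:
  void set_level(level l) { runtime_level_ = l; }

  void log(level l, std::string const& msg)
  {
    if (l >= runtime_level_) { messages_.push_back(msg); }
  }

  std::vector<std::string> const& messages() const { return messages_; }

 private:
  level runtime_level_{level::info};
  std::vector<std::string> messages_;
};

// The trace macro only expands to a real call if trace is compiled in.
#if SKETCH_ACTIVE_LEVEL <= SKETCH_LEVEL_TRACE
#define SKETCH_LOG_TRACE(logger, msg) (logger).log(level::trace, (msg))
#else
#define SKETCH_LOG_TRACE(logger, msg) (void)0
#endif
```

Even with trace compiled in, a trace message is dropped until the application lowers the runtime threshold with set_level, which matches the note above about also calling rmm::logger().set_level().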