
[REVIEW] Add debug/info/trace logging and fix multithreaded replay bench #529

Merged · 36 commits · Sep 4, 2020

Conversation

@harrism (Member) commented Aug 31, 2020

This PR adds debug logging using spdlog so that it is accessible from anywhere in RMM, and provides convenience macros for various levels of logging.

Adds CMake configuration to set the default logging level to OFF, and allows specifying an spdlog level using -DLOGGING_LEVEL=<level> on the cmake command line, where level is one of [TRACE, DEBUG, INFO, WARNING, ERROR, CRITICAL].

Changing the level changes which logging messages are compiled in. However, since spdlog's default runtime logging level is INFO, if you need lower-level logging the application must also set spdlog's runtime logging level using rmm::logger().set_level() (e.g. to spdlog::level::trace).
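For example, a minimal sketch of how an application might enable trace output at runtime (the include path is an assumption; rmm::logger() and the spdlog level API are as described above):

```cpp
#include <rmm/logger.hpp>  // assumed header location for rmm::logger()

int main()
{
  // With -DLOGGING_LEVEL=TRACE compiled in, trace messages are still
  // filtered out until the runtime level is lowered as well:
  rmm::logger().set_level(spdlog::level::trace);
  // ... RMM calls made after this point emit trace-level log records ...
}
```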

By default the log goes to the file rmm_log.txt, unless a different filename is specified in the environment variable RMM_DEBUG_LOG_FILENAME.

Also replaces calls to assert with a new macro, RMM_LOGGING_ASSERT, which logs the reason for the assertion failure (only active in debug builds, just like assert).
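A hypothetical sketch of what such a macro could look like (the PR's actual implementation may differ; RMM_LOG_CRITICAL stands in for whichever logging macro the PR provides):

```cpp
#include <cassert>

#ifndef NDEBUG
// Debug build: log the failed expression, then abort as assert would.
#define RMM_LOGGING_ASSERT(_expr)                     \
  do {                                                \
    if (!(_expr)) {                                   \
      RMM_LOG_CRITICAL("Assertion failed: " #_expr);  \
      assert(false);                                  \
    }                                                 \
  } while (0)
#else
// Release build: compiles away entirely, just like assert.
#define RMM_LOGGING_ASSERT(_expr) (void)0
#endif
```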

This PR also fixes replay of multithreaded logs, ensuring events are replayed in the order they actually occurred so that deallocations cannot precede their corresponding allocations (a simplified sketch of the ordering approach follows below).
Fixes #431.
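A heavily simplified sketch of one way to enforce global replay order across threads (not the PR's exact code; the event fields and gating scheme here are illustrative):

```cpp
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

struct event {
  std::size_t global_index;  // position of this event in the original log
  // ... allocation / deallocation details ...
};

std::atomic<std::size_t> next_index{0};

// Each replay thread waits until the global counter reaches its next
// event, so a free can never run before its matching allocation.
void replay_thread(std::vector<event> const& my_events)
{
  for (auto const& e : my_events) {
    while (next_index.load(std::memory_order_acquire) != e.global_index) {
      std::this_thread::yield();
    }
    // ... perform the allocation or deallocation here ...
    next_index.fetch_add(1, std::memory_order_release);
  }
}
```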

TODO: Add logging in more memory resources. We can add further logging in follow-up PRs to keep this one manageable.

CC @jlowe @rongou @abellina

@harrism harrism added the 2 - In Progress Currently a work in progress label Aug 31, 2020
@harrism harrism requested a review from a team as a code owner August 31, 2020 04:45
@harrism harrism self-assigned this Aug 31, 2020
@GPUtester (Contributor) commented:

Please update the changelog in order to start CI tests.

View the gpuCI docs here.

```cpp
 *
 * Does not copy the mutex or the map
 */
replay_benchmark(replay_benchmark const& other)
```
Contributor:

Does this need a copy ctor?

@harrism (Member Author):

Yes because of the mutex. Won't compile without it.

@jrhemstad (Contributor) commented Sep 1, 2020:

I don't understand. Where is replay_benchmark being copied such that it needs a copy ctor? std::mutex is neither copyable nor movable, so I understand why the default copy ctor would fail to compile if used, but I wouldn't think replay_benchmark should ever need to be copied. Instead I'd explicitly delete replay_benchmark's copy and move ctors.

@harrism (Member Author):

template <class Lam> benchmark::RegisterBenchmark(const char*, Lam&&) creates an instance of this class, which invokes the copy ctor of the Lam that is passed to it.

```cpp
#ifdef BENCHMARK_HAS_CXX11
template <class Lambda>
class LambdaBenchmark : public Benchmark {
 public:
  virtual void Run(State& st) { lambda_(st); }

 private:
  template <class OLambda>
  LambdaBenchmark(const char* name, OLambda&& lam)
      : Benchmark(name), lambda_(std::forward<OLambda>(lam)) {}

  LambdaBenchmark(LambdaBenchmark const&) = delete;

 private:
  template <class Lam>
  friend Benchmark* ::benchmark::RegisterBenchmark(const char*, Lam&&);

  Lambda lambda_;
};
#endif
```

https://github.com/google/benchmark/blob/4475ff6b8a7a4077d7492b76ef5278a3dc53a2e4/include/benchmark/benchmark.h#L1017-L1036

@harrism (Member Author):

My bad, it invokes the move ctor. I have now replaced this copy ctor with a move ctor and explicitly deleted the copy ctor.
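A sketch of what that resolution could look like (member names here are illustrative, not the PR's exact code):

```cpp
// Movable but not copyable: the std::mutex member is re-created rather
// than moved (mutexes are neither copyable nor movable).
replay_benchmark(replay_benchmark&& other) noexcept
  : factory_{std::move(other.factory_)},
    events_{std::move(other.events_)},
    allocation_map_{std::move(other.allocation_map_)}
{
  // map_mutex_ is default-constructed; there is nothing to transfer.
}

replay_benchmark(replay_benchmark const&) = delete;
replay_benchmark& operator=(replay_benchmark const&) = delete;
```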

Comment on lines 146 to 151

```cpp
void SetUp(const ::benchmark::State& state) override
{
  if (state.thread_index == 0) {
    rmm::logger().log(spdlog::level::info, "------ Start of Benchmark -----");
    mr_ = factory_();
  }
```
Contributor:
Why not just put this logic in the replay_benchmark ctor? That way you don't have to do the check against the thread_index. Only one instance of the fixture object is constructed and shared by all threads.

@harrism (Member Author) commented Sep 1, 2020:

The problem is that the replay_benchmark ctor is called when you register the benchmark, and the dtor is not called until after ALL the benchmarks have run. So you end up constructing all the MRs you are benchmarking first, then running the benchmarks, then destroying them. If you have a pool_mr and a binning_mr, then you effectively use up all the GPU memory before any benchmark runs, and the result is OOM during the first benchmark of one of those MRs. This is the whole reason for all the shenanigans with the fixture.

Google benchmark is not well designed for this case. It took me the better part of a day of reading their undocumented code to figure out a workaround.

@harrism (Member Author):

Also each benchmark may be run multiple times, and you don't want to reuse the pool (or time creation of the pool) across runs.
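For reference, a sketch of the counterpart TearDown under the same scheme (assuming mr_ is an owning smart pointer; not necessarily the PR's exact code):

```cpp
void TearDown(const ::benchmark::State& state) override
{
  if (state.thread_index == 0) {
    // Destroy the memory resource after each benchmark run so pools are
    // not reused (or their construction timed) across runs.
    mr_.reset();
  }
}
```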

@harrism (Member Author) commented Sep 1, 2020

Thanks @jrhemstad I addressed all of your feedback (except the benchmark construction timing).

@harrism harrism changed the title [WIP] Add debug/info/trace logging and fix multithreaded replay bench [REVIEW] Add debug/info/trace logging and fix multithreaded replay bench Sep 1, 2020
@harrism harrism added 3 - Ready for review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Sep 1, 2020
@harrism harrism added the 5 - Merge After Dependencies Depends on another PR: do not merge out of order label Sep 1, 2020
@harrism (Member Author) commented Sep 1, 2020

Please do not merge. There appears to be a problem with the fmt lib included in spdlog and libcudf compilation.

@harrism (Member Author) commented Sep 2, 2020

This PR introduces a libcudf build error that I have root-caused to the fmt lib: fmtlib/fmt#1850. I have a fix (https://github.com/fmtlib/fmt/pull/1852), but that needs to be incorporated into spdlog: gabime/spdlog#1661

So I think I will need to patch the file (a 2-character fix) locally as part of the present PR rather than wait...

Update: patch added.

@harrism (Member Author) commented Sep 4, 2020

Turns out there's no bug in fmt; the bug is in our copy of libcudacxx (already fixed in the mainline libcudacxx included in the CUDA toolkit). I put a better (simpler) workaround into this PR, which can be removed once libcudacxx is available on GitHub so we can depend on it.

```cpp
    default_log_filename(), true  // truncate file
  )}
{
  logger_.set_pattern("[%l][%t][%H:%M:%S:%f] %v");
```
@cwharris (Contributor) commented Sep 4, 2020:

Just a thought: using a different format, we could sort lines in the log file to effectively group by thread id, which might be useful? This has the additional advantage of visually aligning the timestamp and thread id (the log level string is variable width).

[23:59:58:056789][1232][debug]

Suggested change:

```diff
- logger_.set_pattern("[%l][%t][%H:%M:%S:%f] %v");
+ logger_.set_pattern("[%t][%H:%M:%S:%f][%l] %v");
```

edit: updated log pattern

@harrism (Member Author):

I don't understand why you can't sort by thread ID with the existing format?

@cwharris (Contributor):

First, my suggestion is formatted incorrectly. The format I had in mind is:

[%t][%H:%M:%S:%f][%l] %v

It would be nice to open a log file in a text editor and "sort by lines ascending" to group the logs by thread. For that to work, the thread id would need to be printed first. The second part would need to be the timestamp, otherwise the logs would get out of order, even within a specific thread id. Therefore, the log level would need to be last. Since everything prior to the log level is fixed width, you also end up with a nice visual alignment.

[1232][23:59:58:056789][debug]

@harrism (Member Author):

In my experience grouping by thread has not been useful. You need to see what is happening in time order due to multiple threads using a resource. That said, I guess this change won't hurt anything. I'll fix the visual alignment with an explicit width for the level.
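For example (the exact width flag is illustrative; spdlog's pattern flags support width/alignment padding such as %-8l, and "critical" is the longest level name at 8 characters):

```cpp
// Pads the level name to a fixed width so the columns line up visually:
logger_.set_pattern("[%t][%H:%M:%S:%f][%-8l] %v");
```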

@harrism (Member Author):

Done.

@cwharris (Contributor) left a comment:

lgtm.

fyi, the log format suggestion was just an idea. I don't know how useful it would be in practice. 👍

@harrism (Member Author) commented Sep 4, 2020

> lgtm.
>
> fyi, the log format suggestion was just an idea. I don't know how useful it would be in practice. +1

Thanks for the review!

Closes #431: [BUG] Multi-threaded replay benchmark broken when an allocation is freed on a different thread