Logging framework #134
I think starting with the built-in logging makes sense. It works pretty well, though there are some rough edges when dealing with multiple processes.
Thanks, will look into customizing it!
Let's keep this issue open until we've figured it out.
Sounds good. I just closed it as you answered my question. |
Right now our hack in the dycore is to redirect stdout. Would be better to have something nicer. Could be useful to also have the info statement "registered" in some way so that we know what is dumping the message (atmosdycore, oceandycore, landmodel, etc.).
I remember thinking a while ago that this should all be possible with a fairly thin layer on top of
Hey guys, I just saw this issue and, being the author of the standard logging frontend, I think it's flexible enough to do what you want (yes, I am biased ;-) ). If it's not, I think we should tweak it to make it so! To have truly composable logging, I think it's critical that all packages agree on a common front end for emitting logs, with pluggable backends depending on the use case. The lack of functionality in the stdlib
Fairly rich metadata is already present with each log record, including the originating module; you just need to customize the printing. It's all about composability.
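To make that concrete, here is a minimal sketch of customized printing using only the stdlib `Logging` and `Dates` (the `timestamped_metafmt` name is just an illustration, not an existing API): it reuses `Logging.default_metafmt` and prepends a timestamp, and the originating module arrives for free as the `_module` argument.

```julia
using Logging, Dates

# Illustrative formatter (hypothetical name): reuse the default prefix/suffix
# and prepend a wall-clock timestamp; `_module` carries the originating module.
function timestamped_metafmt(level, _module, group, id, file, line)
    color, prefix, suffix = Logging.default_metafmt(level, _module, group, id, file, line)
    stamp = Dates.format(Dates.now(), "yyyy-mm-dd HH:MM:SS")
    return color, "[$stamp] [$_module] $prefix", suffix
end

logger = ConsoleLogger(stderr, Logging.Info; meta_formatter=timestamped_metafmt)
with_logger(logger) do
    @info "dycore step complete"
end
```

Since the formatter is just a function, each component (atmos dycore, ocean dycore, land model) shows up automatically via its module name without any per-package registration.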
It seems that this open issue handles what was mentioned at our meeting yesterday.
Can Memento be set up to extract these details automatically?
My $0.02: the standard logging front-end is the right way forward. An info dump of the details of the run, as suggested by Simone, should certainly be standardized, but it is orthogonal to the logging mechanism. For the back-end, though, please do note that as soon as you get up to 64 ranks or more, you essentially have to turn logging off, or redirect to file and then grep through a bunch of log files. As you get to higher rank counts, you will overwhelm any parallel file system with the sheer number of log files. And remember that debugging at high levels of parallelism pretty much requires logs. We have dealt with these issues in the past using Dlog. It does require some helper tools, but nothing else we tried worked at scale. I propose to implement this as a Logging back-end which is selected above some threshold of ranks. Below that threshold, a simple MPI-awareness tweak to the standard back-end will suffice.
A custom backend sounds ideal for big distributed jobs, and Dlog seems nice and simple (I suppose with sparse files a large ...). It should be easy and scalable to install runtime filtering rules on the task that produces the logs, but I haven't thought about the scalability of sinking the resulting records. It might also be handy to sink detailed logs using Dlog but additionally send a low-volume selection of high-priority warn/error/progress messages across the network for real-time monitoring.
Notes from discussion with @vchuravy, @leios, @charleskawczynski and @skandalaCLIMA: we can build an "MPI Logger" on top of MPI I/O: specifically, open a shared file pointer and write messages to the file through it. We would need to expose the relevant MPI I/O routines. Some commercial/3rd-party tools we might want to explore:
See also the related discussion about distributed logging at JuliaLogging/TerminalLoggers.jl#22. A question for you: will you
We could. MPI I/O is just writing bytes (or any other MPI datatype), but we would need to define some sort of header for each message.
Doesn't the Serialization stdlib do most of that work for you?

```julia
julia> using Serialization
       io = IOBuffer()
       serialize(io, Any[("message", (x=1.0, i=10, s="blah"))])
       data = take!(io)
       # put data in a file or whatever

julia> deserialize(IOBuffer(data))
1-element Array{Any,1}:
 ("message", (x = 1.0, i = 10, s = "blah"))
```
I meant that we would need a header to denote what MPI rank the message originated from. |
I think
Right 👍 If it's on a per-log-record basis you could alternatively just tag each message coming from the rank with an extra key.
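To make the per-message header idea concrete, here is a sketch (the `write_record`/`read_record` names are hypothetical, and a plain `IOBuffer` stands in for an MPI shared file pointer): each record is framed as source rank, payload length, then the `Serialization` payload, so a reader can demultiplex one shared log file.

```julia
using Serialization

# Hypothetical framing: [rank::Int32][payload length::Int32][serialized record].
# A plain IO stands in here for an MPI shared file pointer.
function write_record(io::IO, rank::Integer, record)
    buf = IOBuffer()
    serialize(buf, record)
    payload = take!(buf)
    write(io, Int32(rank), Int32(length(payload)))
    write(io, payload)
end

function read_record(io::IO)
    rank = read(io, Int32)
    len = read(io, Int32)
    payload = read(io, Int(len))
    return Int(rank), deserialize(IOBuffer(payload))
end

# Round-trip two records from different "ranks" through one stream:
log = IOBuffer()
write_record(log, 0, ("message", (x = 1.0,)))
write_record(log, 3, ("warning", (i = 10,)))
seekstart(log)
```

With a real MPI shared file pointer, each rank's writes interleave at record granularity, and the rank field in the header tells the post-processing tool who wrote what.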
Ah, cool. Is there a list of packages which implement different logging backends somewhere? I'd be keen to see what others do.
We've got some brand new tooling for REPL-based use (see TerminalLoggers.jl) which I'm excited about, but I'd say we're still figuring out what's possible and desirable for "serious projects". I've got a rough list of tooling at https://github.com/c42f/MicroLogging.jl/#julia-logging-ecosystem but this may not be very complete. The CLIMA case may be the first large-scale HPC application using this stuff. Just in the last week I've been chatting to @tanmaykm about backends for some Julia Computing internal stuff (JuliaTeam/JuliaRun), which I think is another interesting case study and shares some of the same "serious application" needs, though it runs on completely different infrastructure from what you'll have.

Here's a snippet which would allow you to tag all log messages from a given block of code:

```julia
using LoggingExtras

# Something like this tool should go in a package somewhere...
function tag_logs(f; kws...)
    function add_keys(log)
        merge(log, (kwargs = merge(values(log.kwargs), values(kws)),))
    end
    with_logger(f, TransformerLogger(add_keys, current_logger()))
end
```

Then for example (with the standard REPL logger backend):

```julia
julia> function run_simulation()
           @info "Starting"
           @warn "A thing"
       end
run_simulation (generic function with 1 method)

julia> run_simulation()
[ Info: Starting
┌ Warning: A thing
└ @ Main REPL[8]:3

julia> mpi_rank = 10; # MPI.Comm_rank(...) I guess

julia> tag_logs(mpi_rank=mpi_rank) do
           run_simulation()
       end
┌ Info: Starting
└   mpi_rank = 10
┌ Warning: A thing
│   mpi_rank = 10
└ @ Main REPL[8]:3
```

Note that
The silent mode for running the model would be useful for me when generating the tutorial markdown pages. For example, here I just want to showcase the plots.
Posting some of my thoughts from Slack (really a bump of my previous comment: #134 (comment)). It would certainly be nice to have more fine-grained control than
Also making it MPI-friendly with something like @inforoot vs @info, where @inforoot only dumps data on the root rank (what we currently do) as opposed to all ranks (not possible right now). Awesome bonus would be module-specific debug, e.g., dump debug info from the driver but not the mesh. To make all this work we need to be in the habit of curating output statements added to PRs.
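Module-specific filtering like "driver but not mesh" is already expressible on top of the stdlib front end. Here is a rough sketch (the `ModuleFilterLogger` name is made up, and LoggingExtras.jl's `EarlyFilteredLogger` is a more polished version of the same idea): wrap any logger and drop records whose originating module is not in an allow-list.

```julia
using Logging

# Hypothetical wrapper: only pass records whose originating module is allowed.
# LoggingExtras.jl's EarlyFilteredLogger is a production-grade version of this.
struct ModuleFilterLogger{L<:AbstractLogger} <: AbstractLogger
    inner::L
    allowed::Set{Module}
end

Logging.min_enabled_level(l::ModuleFilterLogger) = Logging.min_enabled_level(l.inner)
Logging.catch_exceptions(l::ModuleFilterLogger) = Logging.catch_exceptions(l.inner)
Logging.shouldlog(l::ModuleFilterLogger, level, _module, group, id) =
    _module in l.allowed && Logging.shouldlog(l.inner, level, _module, group, id)
Logging.handle_message(l::ModuleFilterLogger, args...; kwargs...) =
    Logging.handle_message(l.inner, args...; kwargs...)

# e.g. only let records from Main through:
io = IOBuffer()
with_logger(ModuleFilterLogger(SimpleLogger(io, Logging.Debug), Set([Main]))) do
    @info "driver message"
end
```

Because the filter runs in `shouldlog`, records from disallowed modules are dropped before any formatting work happens. An @inforoot-style root-only variant could use the same pattern, gating on the MPI rank instead of the module.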
There are a couple of different ways we might want to print messages:
There are also fancy things we can do where some output goes to one sink and some to another. One question is how this should be exposed at the driver level.
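One way to sketch the multi-sink idea with only the stdlib (the `Tee` type here is illustrative; LoggingExtras.jl's `TeeLogger` combined with `MinLevelLogger` is the packaged equivalent): fan each record out to several loggers, each with its own minimum level, so terse output reaches the terminal while verbose output lands in a file.

```julia
using Logging

# Illustrative fan-out logger; LoggingExtras.TeeLogger is the packaged version.
struct Tee <: AbstractLogger
    loggers::Vector{AbstractLogger}
end

Logging.min_enabled_level(t::Tee) =
    minimum(Logging.min_enabled_level(l) for l in t.loggers)
Logging.shouldlog(t::Tee, level, _module, group, id) =
    any(Logging.shouldlog(l, level, _module, group, id) for l in t.loggers)
function Logging.handle_message(t::Tee, level, message, _module, group, id,
                                file, line; kwargs...)
    # Forward only to sinks whose own level and filter accept the record.
    for l in t.loggers
        if Logging.min_enabled_level(l) <= level &&
           Logging.shouldlog(l, level, _module, group, id)
            Logging.handle_message(l, level, message, _module, group, id,
                                   file, line; kwargs...)
        end
    end
end

# "Terminal" gets Info and above; "file" gets everything (IOBuffers stand in):
console_io, file_io = IOBuffer(), IOBuffer()
with_logger(Tee([SimpleLogger(console_io, Logging.Info),
                 SimpleLogger(file_io, Logging.Debug)])) do
    @debug "solver residual detail"
    @info "timestep finished"
end
```

At the driver level this composes naturally: the driver builds one `Tee` from whatever sinks the run configuration asks for and installs it once with `with_logger`, and no component code needs to change.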
Maybe more of a question for @charleskawczynski @simonbyrne, but any suggestions for a logging library? It's probably good software practice to move away from `print`/`@printf` statements to a logging framework that provides detailed logs for debugging (and for users, with a lower severity). I quite like Memento.jl, but maybe it's better to rely on Julia's built-in logging? The default format isn't very good for detailed logs, but I'm sure we can set a custom format with timestamps, line numbers, etc.