Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Stackdriver logging #68 #81

Merged
merged 112 commits into from
May 11, 2017
Merged

Stackdriver logging #68 #81

merged 112 commits into from
May 11, 2017

Conversation

gregw
Copy link
Contributor

@gregw gregw commented Nov 30, 2016

The implementation of #68 is now working with the latest google-cloud-logging API supporting the changes we requested.

Currently we still have threadlocal loop protection, which I will now test if it is still needed.

@gregw
Copy link
Contributor Author

gregw commented Dec 1, 2016

Note that even with the threadlocal protection, turning logging to FINE globally is preventing deployment, but without an indication why. It may just be load and the instance takes too long to respond.

Note that netty appears to have it's own internal SLF4J logger impl that is correctly getting logged, but is taking its FINE setting from the global setting rather than an io.netty.level=INFO setting?

Selecting turning on FINE logging for some components does work.

@gregw
Copy link
Contributor Author

gregw commented Dec 1, 2016

I have worked out why deployment fails when .level=FINE is set, it is because of an OOM exception. This is probably caused by the vast number of logs from netty, which JUL logging does not appear to be able to suppress. There are many LogRecords with:

loggerName=io.grpc.netty.NettyClientHandler 
SourceClassName=io.netty.util.internal.logging.Slf4JLogger 
SourceClassMethod=debug

Even thought the logging configuration is:

.level = FINE
io.netty.level = INFO
io.gcr.level = INFO

So I may put in an explicit filter to remove those netty debug records?

Also we can consider if we want to monitor low memory and stop generating verbose logs?

thoughts?

@gregw
Copy link
Contributor Author

gregw commented Dec 1, 2016

@meltsufin updated to use the same trace id label as nginx log

@meltsufin
Copy link
Member

@gregw Even at the default logging level, it looks like we get two log messages from grpc for each log message that we send. I think during the actual remote logging, we should be suppressing all logs from google-cloud-logging. Is this feasible?

Dec 05, 2016 5:22:39 PM org.eclipse.jetty.server.Server doStart INFO: Started @4262ms
Dec 05, 2016 5:27:14 PM com.google.meltsufin.test.MainServletFlex doGet SEVERE: severe level log using JUL
Dec 05, 2016 5:27:14 PM com.google.meltsufin.test.MainServletFlex doGet INFO: info level log using JUL
Dec 05, 2016 5:27:15 PM org.eclipse.jetty.server.handler.ContextHandler$Context log INFO: test_flex: Testing Servlet.log method in Flex.
Dec 05, 2016 5:27:16 PM io.grpc.internal.ManagedChannelImpl <init> INFO: [ManagedChannelImpl@725cba47] Created with target logging.googleapis.com:443
Dec 05, 2016 5:27:16 PM io.grpc.internal.ManagedChannelImpl <init> INFO: [ManagedChannelImpl@346c067c] Created with target logging.googleapis.com:443
Dec 05, 2016 5:27:16 PM io.grpc.internal.ManagedChannelImpl <init> INFO: [ManagedChannelImpl@9329855] Created with target logging.googleapis.com:443
Dec 05, 2016 5:27:18 PM io.grpc.internal.ManagedChannelImpl maybeTerminateChannel INFO: [ManagedChannelImpl@725cba47] Terminated
Dec 05, 2016 5:27:18 PM io.grpc.internal.ManagedChannelImpl maybeTerminateChannel INFO: [ManagedChannelImpl@346c067c] Terminated
Dec 05, 2016 5:27:18 PM io.grpc.internal.ManagedChannelImpl maybeTerminateChannel INFO: [ManagedChannelImpl@9329855] Terminated

static final Logger logger = Logger.getLogger(RequestContextScope.class.getName());

private static final String X_CLOUD_TRACE = "x-cloud-trace-context";
private static final ThreadLocal<String> traceid = new ThreadLocal<>();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use camel case traceId.

if (id != null) {
int slash = id.indexOf('/');
if (slash >= 0) {
id = id.substring(0, slash);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we just dropping the span id? I'm not sure this is what we're supposed to be doing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm also not sure. The compat version we inherited did a scan for the first non hex character and split there, so I just simplified and inherited that behaviour. Can you investigate if we can leave it in.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will find out.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got a response from the logging team indicating that we should only log the trace id without the span id. So, this is good.

import java.util.logging.Level;
import java.util.logging.Logger;

public class RequestContextScope implements ContextHandler.ContextScopeListener {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some javadocs to explain the purpose of the class?

* @param traceid The trace ID
* @param runnable the runnable
*/
public static void runWith(String traceid, Runnable runnable) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this method used somewhere, or just provided as a utility?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a utility that I thought we might need. I will drop for now as it will be trivial to readd if there ever is a need.

@Override
public synchronized void publish(LogRecord record) {
// check we are not already flushing logs
if (Boolean.TRUE.equals(flushing.get())) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a matter of style, but I would save a line and write it as:

   // check we are not already flushing logs
   if (!Boolean.TRUE.equals(flushing.get())) {
     super.publish(record);
    }

*/
public TracingLogHandler(String logName, LoggingOptions options, MonitoredResource resource) {
super(logName, options, resource);
monitored = MonitoredResource.newBuilder("gae_app")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're sort of overwriting the resource that is passed in here. Maybe create the resource only in the no-arg constructor instead?

@@ -0,0 +1,9 @@
.level = INFO
io.netty.level = INFO
io.gcr.level = INFO
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to set log levels for netty and gcr here if we have the flushing guard in TracingLogHandler.publish()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added it because netty logs from background threads. However there is still the issue that not all of netty's log mechanism appears not to see these settings anyway, so perhaps we just need an explicit filter in the publish method?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would an explicit filter work so that it deals with the Netty background threads?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Currently the approach taken by the gcloud-java client library is to have fixed list of known log names that it explicitly scans for and removes the stackdriver handler from.

This is the approach that should also stop looping, but it not.

The same list could simply be applied within the log handler and any logs from those handlers would be discarded. This list could either be a complete list, in which case we would not need either the threadlocal approach nor the handler deregistration.

Alternately the list can just be limited to the loggers that are not reacting to the logging properties configuration.

Currently I don't like any solution. My preference would be to have no filtering and just loop protection. Overly verbose logging should be handled gracefully (with discarded logs and a message to that effect) and then the user can just configure logging to only log what they are interested in.

So I need to investigate why those netty classes are ignoring the properties file.

@gregw
Copy link
Contributor Author

gregw commented Dec 5, 2016

@meltsufin this is still a work in progress as I would like to:

  • Find a better filter mechanism for netty background logging.
  • Profile memory usage
  • Log to StdErr if we can detect problem with logging eg OOM or low memory?
  • find out if traceId should be trimmed or not
  • Merge in a fix for Use jetty 9.4 property configuration mechanism #83

@gregw
Copy link
Contributor Author

gregw commented Dec 6, 2016

@meltsufin I measured memory usage more accurately to be around 4KB per log. I have raised googleapis/google-cloud-java#1449 to see if that can be reduced and/or some limits put on the size of the queued logs.

@meltsufin
Copy link
Member

We're really blocking on 2 things:

@meltsufin
Copy link
Member

Noting that a related issue was filed against GRPC: grpc/grpc-java#2515

Removed stackdriver logging, so this just configures JUL to capture the traceid and log to stderr
@gregw
Copy link
Contributor Author

gregw commented Apr 20, 2017

@meltsufin The https://github.com/GoogleCloudPlatform/jetty-runtime/tree/master-68-docker-args branch is based on this branch and now includes:

  • changes to work with latest openjdk-runtime (not released) entry point
  • tests to check that logging classes are hidden from the webapp
  • README updated to suggest that sun.net and io.grpc.netty. logging be kept at INFO when using stackdriver

The #41 branch for sessions is based of this new branch.

@meltsufin
Copy link
Member

@gregw you can merge the master-68-docker-args branch into here.
@aslo can you do a release of openjdk-runtime?

@aslo
Copy link
Contributor

aslo commented Apr 21, 2017

New image has been released and is available at gcr.io/google-appengine/openjdk:8-2017-04-21_15_02.

I just opened #157 to bump the base image.

@gregw
Copy link
Contributor Author

gregw commented Apr 23, 2017

@meltsufin I'm currently seeing problems with the classloading tests when I run them. These tests were passing, but are now failing, so I need to investigate a bit more. Probably wont get a chance until Tuesday.

Copy link
Member

@meltsufin meltsufin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're ready to update the GaeFlexLoggingEnhancer references.

Provides SLF4J API. Requires a slf4j implementation (eg slf4j-simple-impl)
otherwise a noop implementation is used.

TODO Remove once jetty-9.5.4 released
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean jetty-9.4.5?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oops yes.

pom.xml Outdated
@@ -29,13 +29,14 @@
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
<maven.build.timestamp.format>yyyy-MM-dd-HH-mm</maven.build.timestamp.format>
<gcloud.api.version>0.13.0-beta</gcloud.api.version>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0.17.1 is out now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, on it now. Hopefully ready tomorrow (as will a 9.4.5 release with simpler class hiding)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@meltsufin I can't see a 0.17.1 in maven central. But I can see a 1.0.0! http://repo2.maven.org/maven2/com/google/cloud/google-cloud-logging/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah ok, even better! :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@meltsufin I just tried 1.0.0, but it failed with this issue: googleapis/google-cloud-java#1994

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix has been merged, please retry when you get a chance.

README.md Outdated
com.google.cloud.logging.LoggingHandler.level=FINE
com.google.cloud.logging.LoggingHandler.log=gae_app.log
com.google.cloud.logging.LoggingHandler.resourceType=gae_app
com.google.cloud.logging.LoggingHandler.enhancers=com.google.cloud.logging.GaeFlexLoggingEnhancer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be TraceLoggingEnhancer now.

Copy link

@jabubake jabubake May 9, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is running on GAE Flex, you no longer need to specify the resourceType and enhancers. it gets auto-detected based on the GAE_INSTANCE env var. TraceLoggingEnhancer is automatically added.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Thanks Jisha!

sun.net.level=INFO

handlers=com.google.cloud.logging.LoggingHandler
com.google.cloud.logging.LoggingHandler.level=FINE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the TraceLoggingEnhancer be here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. It is now automatic and I have updated the README.

@gregw
Copy link
Contributor Author

gregw commented May 10, 2017

@meltsufin while this PR is looking close to ready, @janbartel is having problems merging with her session work because of dependency strangeness in google-cloud-java. See googleapis/google-cloud-java#2050
We are trying to manually correct the dependencies.

@gregw gregw merged commit 30127c8 into master May 11, 2017
@meltsufin meltsufin deleted the master-68 branch May 11, 2017 17:17
@bmenasha
Copy link

Thank you for your hard work on this Greg and Mike! This makes the Flexible experience much better.

@meltsufin
Copy link
Member

@gregw Something in this PR broke our integration with the cloud debugger. I suspect it's the class hiding. Investigating...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants