Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Agents stops sending logs after restarting the Azure functions #3173

Closed
ibrahim-abdellaoui opened this issue Jul 1, 2023 · 17 comments · Fixed by #3244
Closed

Agents stops sending logs after restarting the Azure functions #3173

ibrahim-abdellaoui opened this issue Jul 1, 2023 · 17 comments · Fixed by #3244

Comments

@ibrahim-abdellaoui
Copy link

ibrahim-abdellaoui commented Jul 1, 2023

Expected behavior

Describe what you expected to happen.
The agent keeps sending logs to AI after the function is restarted (manually or by the functions runtime)

Actual behavior

Describe what actually happened that was unexpected.
When the function is first deployed the agent seems to work properly also the VM options are set as

JVM arguments:-XX:+TieredCompilation
-XX:TieredStopAtLevel=1
-Xverify:none
-Djava.net.preferIPv4Stack=true
***** -javaagent:C:\Program Files (x86)\SiteExtensions\Functions\4.22.0\workers\java/agent/applicationinsights-agent.jar *****

but after restarting the Azure function, or waiting (around 20min idl) for the runtime to stop the function and then invoking it the logs are missing. also the VM options have changed to:

JVM arguments:-XX:+TieredCompilation
-XX:TieredStopAtLevel=1
-Xverify:none
-Djava.net.preferIPv4Stack=true
***** -javaagent:C:\Program Files (x86)\SiteExtensions\Functions\4.22.0\workers\java/agent/applicationinsights-agent.jar  -DLazySetOptIn=false *****

there is a new -DLazySetOptIn that is set now, reading the azure java worker config it seems this comes from the env variable INITIALIZED_FROM_PLACEHOLDER: True.
This also impacts using the old AI classic SDK (2.x), for the first run everything works fin and it is logged by the classic sdk, but after the restart, the logs are missing even tho the agent is not explicitly enabled.
I noticed also that after the restart there are no new entries in the applicationinsights.log file (I have enabled APPLICATIONINSIGHTS_SELF_DIAGNOSTICS_LEVEL: debug

I don't know for sure if this is related to the Agent or the java worker, please let me know if I should create the issue in the Java worker repo instead.

To Reproduce

restarting the Azure function manually triggers this bug. and having APPLICATIONINSIGHTS_ENABLE_AGENT: True

System information

Please provide the following information:

  • SDK Version: Azure functions 4.x, Agent 3.4.11
  • OS type and version: Windows
  • Application Server type and version (if applicable):
  • Using spring-boot? No
  • Additional relevant libraries (with version, if applicable):
@ibrahim-abdellaoui
Copy link
Author

the workaround now is to disable the agent explicitly using the config languageWorkers:java:arguments: -DaiAgentOptOut and using the classic AI SDK v2.x

@ibrahim-abdellaoui
Copy link
Author

It seems the bug is coming from the warmup of the worker, the worker is using the FunctionEnvironmentReloadRequestHandler at warmup wish is Instrumented by the agent to configure the client once, but at this point I think the customer env variables are not set still so the agent is not configured properly at this point. but after the agent receives the correct ReloadRequest with the customer's configuration, it would not reconfigure because it was already configured at warmup.

@shreyas-gopalakrishna
Copy link
Member

shreyas-gopalakrishna commented Jul 3, 2023

@ibrahim-abdellaoui what app service plan you notice this bug in?
Could you also share the app name and timestamp you performed this test

@ibrahim-abdellaoui
Copy link
Author

@shreyas-gopalakrishna Azure functions Consumption, OS windows. It will be better if I create a new function for tests that I can provide you

@kaibocai
Copy link
Member

kaibocai commented Jul 5, 2023

Hi @ibrahim-abdellaoui , I have a few questions regards to this issue.
When you first deployed the JVM arguments you saw were

JVM arguments:-XX:+TieredCompilation
-XX:TieredStopAtLevel=1
-Xverify:none
-Djava.net.preferIPv4Stack=true
***** -javaagent:C:\Program Files (x86)\SiteExtensions\Functions\4.22.0\workers\java/agent/applicationinsights-agent.jar *****

This seems incorrect to me as for the Consumption plan it should have -DLazySetOptIn=false at the end, which means the agent is not enabled, so there shouldn't be any logs from distributed tracing. Since you don't have -DLazySetOptIn=false, distributed tracing/agent is enabled in this case.

Once you stop and start the function the JVM arguments you saw were

JVM arguments:-XX:+TieredCompilation
-XX:TieredStopAtLevel=1
-Xverify:none
-Djava.net.preferIPv4Stack=true
***** -javaagent:C:\Program Files (x86)\SiteExtensions\Functions\4.22.0\workers\java/agent/applicationinsights-agent.jar  -DLazySetOptIn=false *****

this is actually correct and expected. With -DLazySetOptIn=false you have to set the app setting APPLICATIONINSIGHTS_ENABLE_AGENT: True to enable the distributed tracing/agent.

So I think the issue is not why the logs are missing after stopping/starting the function app, instead, it's why the agent is enabled at the first deployment.

Also, I am curious how were you able to check the JVM arguments. I don't think we expose this info anywhere to customers. Thanks.

@kaibocai
Copy link
Member

kaibocai commented Jul 5, 2023

I tried deploy a consumption app, I cannot reproduce the issue. From our internal logs I see the correct profile AppInsightsPlaceholder is chosen, which maps to the correct JVM arguments with -DLazySetOptIn=false
image

@ibrahim-abdellaoui
Copy link
Author

Hello @kaibocai, I had some time to test again and here are my findings:
created a new function app named ai-agent-tests and deployed a new function (using zip deploy az functionapp deployment source config-zip ) the only thing I changed was adding the classic sdk

<dependency>
            <groupId>com.microsoft.azure</groupId>
            <artifactId>applicationinsights-core</artifactId>
            <version>2.6.4</version>
  </dependency>
   private String getJVMArgs() {
        RuntimeMXBean runtimeMxBean = ManagementFactory.getRuntimeMXBean();
        List<String> arguments = runtimeMxBean.getInputArguments();

        return String.join(",", arguments);
    }
    @FunctionName("HttpExample")
    public HttpResponseMessage run(
            @HttpTrigger(
                name = "req",
                methods = {HttpMethod.GET, HttpMethod.POST},
                authLevel = AuthorizationLevel.ANONYMOUS)
                HttpRequestMessage<Optional<String>> request,
            final ExecutionContext context) {

        var telemetryClient = new TelemetryClient();
        telemetryClient.trackTrace("log from ai sdk");
        context.getLogger().info("log from default logger");

        return request.createResponseBuilder(HttpStatus.OK)
                .body(getJVMArgs())
                .build();
    }

Also, I didn't specify APPLICATIONINSIGHTS_ENABLE_AGENT: True
The time is in UTC+2

  1. 12:07 New deployment from zip
  2. 12:08 trigger function, I get the following result
-XX:+TieredCompilation,-XX:TieredStopAtLevel=1,-Xverify:none,-Djava.net.preferIPv4Stack=true

both logs appear in AI, the one logged via classic sdk has the sdkVersion java:2.6.4 and the one logged with the default context logger has sdkVersion azurefunctions: 4.22.0.20804

  1. 12:09 restart the function in the portal
  2. 12:10 trigger the function I get the following results:
-XX:+TieredCompilation,-XX:TieredStopAtLevel=1,-Xverify:none,-Djava.net.preferIPv4Stack=true,-javaagent:C:\Program Files (x86)\SiteExtensions\Functions\4.22.0\workers\java/agent/applicationinsights-agent.jar,-DLazySetOptIn=false

It seems now the worker starting using the warmup
The log from AI sdk is not showing in AI, I only see the log that was logged using the context logger.
It seems that the agent is instrumenting the classic sdk even tho the agent is disabled
my hypothesis for this bug is that it is caused by the warmup as I described in 2nd comment

@ibrahim-abdellaoui
Copy link
Author

same test but now with APPLICATIONINSIGHTS_ENABLE_AGENT: True

12:30 Added APPLICATIONINSIGHTS_ENABLE_AGENT: True to the function configuration
12:30 triggered the function and got:

-XX:+TieredCompilation,-XX:TieredStopAtLevel=1,-Xverify:none,-Djava.net.preferIPv4Stack=true,-javaagent:C:\Program Files (x86)\SiteExtensions\Functions\4.22.0\workers\java/agent/applicationinsights-agent.jar

the log from the classic sdk shows up in AI, and has the expected sdkVersion fwr_java:3.4.11

12:31 restart the function
12:32 trigger the function and got

-XX:+TieredCompilation,-XX:TieredStopAtLevel=1,-Xverify:none,-Djava.net.preferIPv4Stack=true,-javaagent:C:\Program Files (x86)\SiteExtensions\Functions\4.22.0\workers\java/agent/applicationinsights-agent.jar,-DLazySetOptIn=false

It seems after a config change or a deployment the function is not warmed up, the warmup only happens on restart or when the function goes in idle
the log from classic sdk doesn't show in AI even tho the agent is enabled
in this case, the agent is attached and enabled but it is not configured appropriately (connection string) so it's not able to send the logs to AI

@kaibocai
Copy link
Member

kaibocai commented Jul 6, 2023

There are two agents I think we need to be clear about. azurefunctions: 4.22.0.20804 this one is azure function runtime dotnet agent, it's turned on by default. The context.getLogger().info will be monitored by this agent, but telemetryClient.trackTrace will not, since it's in a downstream process (java process that running your code. The java ai agent can be enabled in this process), other than the process where the dotnet agent is running in.
When you have APPLICATIONINSIGHTS_ENABLE_AGENT: True set, you turned on the java ai agent/distributed tracing (fwr_java:3.4.11). https://learn.microsoft.com/en-us/azure/azure-monitor/app/monitor-functions. The agent will take over monitoring scope for the java process that running your java code, so any loggers (logback, log4j2, etc) you are using in your java function app, also context.getLogger().info and telemetryClient.trackTrace will start being monitored by the java ai agent(fwr_java:3.4.11).

@ibrahim-abdellaoui , I tried to reproduce the issue, with your code, I cannot reproduce the issue, everytime I deploy I am seeing
image
Which is expected.

The first deployment I only see
image
As you can see it's being monitored by the dotnet agent, not the java ai agent.

Restart the app and still get the same JVM arguments
image
and the ai logs look the same as above.


Once I enable the java agent using APPLICATIONINSIGHTS_ENABLE_AGENT: True
I am seeing
image
and the log, as you can see the monitor agent changed to java ai agent and the telemetryClient.trackTrace are there.
image

This works as expected. Also I am using Maven plugin to deploy the app. I doubt there is a difference between using Maven and azure cli, since the backend API is the same, but if you get a chance can you try out using Maven plugin to deploy an Windows consumption app and test all this out? The command is really simple.
mvn azure-functions:deploy

Thanks.

@ibrahim-abdellaoui
Copy link
Author

@kaibocai you are able to produce the bug in your tests
In the fist image you posted you should see two logs one from the dotnet agent, and one from the agent that is attached to the JVM the log from ai sdk. but in the image, there is only one, and per the results you got this run was using a warmup instance (or host, don't the exact terminology here). this is the bug I'm talking about
image

and on your second run we see that the instance was run without warmup the missing -DLazySetOptIn=false in the VM args we see the expected behavior. the normal behavior that should happen if the instance was warmed up or not
image

To summarize all this, the bug is that when the instance used was warmed up (we know that because the env variable INITIALIZED_FROM_PLACEHOLDER is set to true or the JVM is started with the arg -DLazySetOptIn=false) the java agent attached to the JVM is not sending logs to AI.

I also think deploying via maven will be the same but I will test it tomorrow for double checking.

@kaibocai
Copy link
Member

kaibocai commented Jul 6, 2023

This is not a bug, it's a feature requires you to enable through appsetting APPLICATIONINSIGHTS_ENABLE_AGENT
Can you read through this please https://learn.microsoft.com/en-us/azure/azure-monitor/app/monitor-functions

Distributed Tracing GA announcement

@ibrahim-abdellaoui
Copy link
Author

but the classic sdk v2.x should work even without the agent attached, and that is the workaround that we are using now, forcing the agent not to be attached via the setting languageWorkers:java:arguments: -DaiAgentOptOut.
ignoring all this, it seems in your test you have enabled the agent, the JVM args show that the agent is attached but in the first picture we see the log is missing.
in my second test, I enabled the agent and the logs are still missing if the instance is warmed up.

same test but now with APPLICATIONINSIGHTS_ENABLE_AGENT: True

12:30 Added APPLICATIONINSIGHTS_ENABLE_AGENT: True to the function configuration 12:30 triggered the function and got:

-XX:+TieredCompilation,-XX:TieredStopAtLevel=1,-Xverify:none,-Djava.net.preferIPv4Stack=true,-javaagent:C:\Program Files (x86)\SiteExtensions\Functions\4.22.0\workers\java/agent/applicationinsights-agent.jar

the log from the classic sdk shows up in AI, and has the expected sdkVersion fwr_java:3.4.11

12:31 restart the function 12:32 trigger the function and got

-XX:+TieredCompilation,-XX:TieredStopAtLevel=1,-Xverify:none,-Djava.net.preferIPv4Stack=true,-javaagent:C:\Program Files (x86)\SiteExtensions\Functions\4.22.0\workers\java/agent/applicationinsights-agent.jar,-DLazySetOptIn=false

It seems after a config change or a deployment the function is not warmed up, the warmup only happens on restart or when the function goes in idle the log from classic sdk doesn't show in AI even tho the agent is enabled in this case, the agent is attached and enabled but it is not configured appropriately (connection string) so it's not able to send the logs to AI

@ibrahim-abdellaoui
Copy link
Author

to reproduce it set the APPLICATIONINSIGHTS_ENABLE_AGENT: True and restart the function, if it is started using a warmed instance you won't see the logs in AI

@kaibocai
Copy link
Member

kaibocai commented Jul 6, 2023

but the classic sdk v2.x should work even without the agent attached, and that is the workaround that we are using now, forcing the agent not to be attached via the setting languageWorkers:java:arguments: -DaiAgentOptOut.
ignoring all this, it seems in your test you have enabled the agent, the JVM args show that the agent is attached but in the first picture we see the log is missing.

Checkout the code here

The agent is not initialized if -DLazySetOptIn=false is there. So in my first deployment agent is started but not initialized which means it's not sending data to AI collector.

@trask should be able to confirm and help with

the classic sdk v2.x should work even without the agent attached

@ibrahim-abdellaoui
Copy link
Author

same test but now with APPLICATIONINSIGHTS_ENABLE_AGENT: True

12:30 Added APPLICATIONINSIGHTS_ENABLE_AGENT: True to the function configuration 12:30 triggered the function and got:

-XX:+TieredCompilation,-XX:TieredStopAtLevel=1,-Xverify:none,-Djava.net.preferIPv4Stack=true,-javaagent:C:\Program Files (x86)\SiteExtensions\Functions\4.22.0\workers\java/agent/applicationinsights-agent.jar

the log from the classic sdk shows up in AI, and has the expected sdkVersion fwr_java:3.4.11

12:31 restart the function 12:32 trigger the function and got

-XX:+TieredCompilation,-XX:TieredStopAtLevel=1,-Xverify:none,-Djava.net.preferIPv4Stack=true,-javaagent:C:\Program Files (x86)\SiteExtensions\Functions\4.22.0\workers\java/agent/applicationinsights-agent.jar,-DLazySetOptIn=false

It seems after a config change or a deployment the function is not warmed up, the warmup only happens on restart or when the function goes in idle the log from classic sdk doesn't show in AI even tho the agent is enabled in this case, the agent is attached and enabled but it is not configured appropriately (connection string) so it's not able to send the logs to AI

still this doesn't explain why in this test the logs are missing, I have the APPLICATIONINSIGHTS_ENABLE_AGENT: True

@kaibocai
Copy link
Member

kaibocai commented Jul 6, 2023

12:31 restart the function
12:32 trigger the function and got
-XX:+TieredCompilation,-XX:TieredStopAtLevel=1,-Xverify:none,-Djava.net.preferIPv4Stack=true,-javaagent:C:\Program Files (x86)\SiteExtensions\Functions\4.22.0\workers\java/agent/applicationinsights-agent.jar,-DLazySetOptIn=false
It seems after a config change or a deployment the function is not warmed up, the warmup only happens on restart or when the function goes in idle
the log from classic sdk doesn't show in AI even tho the agent is enabled
in this case, the agent is attached and enabled but it is not configured appropriately (connection string) so it's not able to send the logs to AI

Confirmed restart the app with the appsetting added will cause the log from classic SDK missing. I will work with the team on investigating this issue. It seems the agent is disabled as the SDKversion changed to dotnet agent.
image

Will keep you updated. Thanks for all the info and tests.

@ibrahim-abdellaoui
Copy link
Author

It seems the bug is coming from the warmup of the worker, the worker is using the FunctionEnvironmentReloadRequestHandler at warmup wish is Instrumented by the agent to configure the client once, but at this point I think the customer env variables are not set still so the agent is not configured properly at this point. but after the agent receives the correct ReloadRequest with the customer's configuration, it would not reconfigure because it was already configured at warmup.

I think this is the source of the bug, hope this helps

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants