Flushing breaks when log subscription is not available. #386

Open

JonasKunz opened this issue Apr 20, 2023 · 1 comment

Labels: aws-λ-extension (AWS Lambda Extension), bug (Something isn't working)

Comments

JonasKunz (Contributor) commented Apr 20, 2023

APM AWS Lambda extension version: latest (v1.3.1)

Describe the bug

I’m currently playing around with the lambda extension + java agent + aws lambda runtime interface emulator (RIE) in an attempt to create a lambda integration test for verifying the agent <-> lambda extension interaction.
This is working nicely so far (lambda invocations from my locally running container appear as lambda transactions in my elastic cloud APM), with one exception:

Even though I’ve set ELASTIC_APM_SEND_STRATEGY to syncflush, the transactions for invocations are always buffered until another lambda invocation is made. On shutdown, the transaction for the last invocation is not sent and is therefore lost.

I suspect this behaviour is related to the edge case that the RIE does not provide a proper Logs API for extensions: the extension complains about a logs "subscription failure". Nonetheless, I think the lambda extension should work properly even if the logging subscription is not working correctly for some reason.

Where did I get that suspicion from?

Excerpt from Logs
{"log.level":"debug","@timestamp":"2023-04-20T11:43:15.069Z","log.origin":{"file.name":"app/run.go","file.line":144},"message":"Received event.","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:15.069Z","log.origin":{"file.name":"app/run.go","file.line":145},"message":"{\n\t\"timestamp\": \"2023-04-20T11:43:15.069206006Z\",\n\t\"eventType\": \"INVOKE\",\n\t\"deadlineMs\": 1681991055068,\n\t\"requestId\": \"186ea9c1-da69-4e16-bc89-1d6f6a6b110a\",\n\t\"invokedFunctionArn\": \"arn:aws:lambda:us-east-1:012345678912:function:local-test-lambda\",\n\t\"tracing\": {\n\t\t\"type\": \"\",\n\t\t\"value\": \"\"\n\t}\n}","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2023-04-20T11:43:15.069Z","log.origin":{"file.name":"app/run.go","file.line":206},"message":"Logs collection not started due to earlier subscription failure","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:15.069Z","log.origin":{"file.name":"app/run.go","file.line":230},"message":"Received runtimeDone signal","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:15.069Z","log.origin":{"file.name":"app/run.go","file.line":95},"message":"Waiting for background data send to end","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:15.069Z","log.origin":{"file.name":"apmproxy/apmserver.go","file.line":58},"message":"Invocation context cancelled, not processing any more agent data","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:15.069Z","log.origin":{"file.name":"apmproxy/apmserver.go","file.line":81},"message":"Flush started - Checking for agent data","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:15.069Z","log.origin":{"file.name":"apmproxy/apmserver.go","file.line":113},"message":"Flush ended for lambda data - no data in buffer","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-04-20T11:43:15.069Z","log.origin":{"file.name":"app/run.go","file.line":127},"message":"Waiting for next event...","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:15.306Z","log.origin":{"file.name":"apmproxy/receiver.go","file.line":119},"message":"Handling APM Data Intake","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:15.313Z","log.origin":{"file.name":"apmproxy/receiver.go","file.line":119},"message":"Handling APM Data Intake","ecs.version":"1.6.0"}
...
{"log.level":"debug","@timestamp":"2023-04-20T11:43:19.363Z","log.origin":{"file.name":"app/run.go","file.line":144},"message":"Received event.","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:19.364Z","log.origin":{"file.name":"app/run.go","file.line":145},"message":"{\n\t\"timestamp\": \"2023-04-20T11:43:19.363927758Z\",\n\t\"eventType\": \"INVOKE\",\n\t\"deadlineMs\": 1681991059357,\n\t\"requestId\": \"af63ba87-73c1-4b53-b769-1cddf745e7ec\",\n\t\"invokedFunctionArn\": \"arn:aws:lambda:us-east-1:012345678912:function:local-test-lambda\",\n\t\"tracing\": {\n\t\t\"type\": \"\",\n\t\t\"value\": \"\"\n\t}\n}","ecs.version":"1.6.0"}
{"log.level":"warn","@timestamp":"2023-04-20T11:43:19.364Z","log.origin":{"file.name":"app/run.go","file.line":206},"message":"Logs collection not started due to earlier subscription failure","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:19.364Z","log.origin":{"file.name":"app/run.go","file.line":228},"message":"APM client has sent flush signal","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:19.364Z","log.origin":{"file.name":"app/run.go","file.line":95},"message":"Waiting for background data send to end","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:19.364Z","log.origin":{"file.name":"apmproxy/apmserver.go","file.line":58},"message":"Invocation context cancelled, not processing any more agent data","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:19.364Z","log.origin":{"file.name":"apmproxy/apmserver.go","file.line":81},"message":"Flush started - Checking for agent data","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:19.365Z","log.origin":{"file.name":"apmproxy/apmserver.go","file.line":169},"message":"Sending data chunk to APM server","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:19.374Z","log.origin":{"file.name":"apmproxy/receiver.go","file.line":119},"message":"Handling APM Data Intake","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:19.379Z","log.origin":{"file.name":"apmproxy/receiver.go","file.line":119},"message":"Handling APM Data Intake","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:19.553Z","log.origin":{"file.name":"apmproxy/apmserver.go","file.line":270},"message":"APM server Transport status set to Healthy","ecs.version":"1.6.0"}
{"log.level":"debug","@timestamp":"2023-04-20T11:43:19.553Z","log.origin":{"file.name":"apmproxy/apmserver.go","file.line":113},"message":"Flush ended for lambda data - no data in buffer","ecs.version":"1.6.0"}
{"log.level":"info","@timestamp":"2023-04-20T11:43:19.553Z","log.origin":{"file.name":"app/run.go","file.line":127},"message":"Waiting for next event...","ecs.version":"1.6.0"}

The logs show that the extension performs a flush before the data from the agent is actually received.
The same behaviour happens on the next lambda invocation, except that now at least the data from the previous invocation is flushed.

With my non-existent Go skills I would deduce that this line might be the problem: in case the logging subscription failed, processEvents() finishes here immediately (see the sketch below).
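
To illustrate the suspected race, here is a minimal, self-contained Go sketch. The channel names are made up and this is not the actual extension code, just my reading of it: if the signal normally derived from the platform.runtimeDone log event fires immediately because the subscription failed, the flush path wins the select before the agent's flush signal ever arrives.

// Hypothetical sketch of the suspected race; not the actual extension code.
package main

import (
	"fmt"
	"time"
)

func main() {
	agentFlush := make(chan struct{})
	runtimeDone := make(chan struct{})

	// With a failed logs subscription there is no platform.runtimeDone
	// event to wait for, so (hypothetically) the done signal fires
	// right away.
	close(runtimeDone)

	// The agent's data for the invocation only arrives a moment later.
	go func() {
		time.Sleep(200 * time.Millisecond)
		close(agentFlush)
	}()

	select {
	case <-agentFlush:
		fmt.Println("flush triggered by agent flush signal (expected with syncflush)")
	case <-runtimeDone:
		fmt.Println("flush triggered by runtimeDone - agent data not received yet")
	}
}

Running this always prints the runtimeDone branch, which matches the "Flush ended for lambda data - no data in buffer" lines in the log excerpt above.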

To Reproduce

  1. Create a docker image with a minimal lambda and the extension + any APM agent:
FROM docker.elastic.co/observability/apm-lambda-extension-arm64:latest AS lambda-extension
FROM docker.elastic.co/observability/apm-agent-java:latest AS java-agent

FROM public.ecr.aws/lambda/java:11

ENV AWS_REGION=eu-central-1
ENV AWS_LAMBDA_FUNCTION_NAME=local-test-lambda
ENV AWS_LAMBDA_FUNCTION_MEMORY_SIZE=1024
#timeout in seconds
ENV AWS_LAMBDA_FUNCTION_TIMEOUT=60

ENV AWS_LAMBDA_EXEC_WRAPPER=/opt/elastic-apm-handler
ENV ELASTIC_APM_LAMBDA_APM_SERVER=https://XXXXXXXXa.apm.europe-west3.gcp.cloud.es.io:443
ENV ELASTIC_APM_SECRET_TOKEN=XXXXXXXXXX
ENV ELASTIC_APM_SEND_STRATEGY=syncflush
ENV ELASTIC_APM_LOG_LEVEL=DEBUG

COPY --from=lambda-extension /opt/elastic-apm-extension /opt/extensions/elastic-apm-extension
COPY --from=java-agent /usr/agent/ /opt/

# usual lambda entrypoint setup:
COPY target/LambdaPlayground.jar ${LAMBDA_TASK_ROOT}/lib/LambdaPlayground.jar

CMD [ "co.elastic.playground.TestLambda" ]
  2. Run the container and expose the RIE port: docker run -p 9000:8080 <image-name>

  3. Invoke the lambda function locally: curl -XPOST "http://localhost:9000/2015-03-31/functions/function/invocations" -d '{"foo" : "bar"}'

Expected behavior

Syncflush should work as expected, even when the log subscription is not available (a transaction should be flushed immediately after it finishes).

JonasKunz added the bug (Something isn't working) label on Apr 20, 2023
github-actions bot added the aws-λ-extension (AWS Lambda Extension) label on Apr 20, 2023
ArtyomGabeev commented Jun 5, 2023

Hi, I’m facing a similar issue where the syncflush only happens on the next invocation, when the Lambda SnapStart feature is enabled.

Not sure if it helps, but in a SnapStart environment the log stream environment variable AWS_LAMBDA_LOG_STREAM_NAME is not set.
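
A purely hypothetical mitigation sketch (again, not the extension's actual code): detect up front that the logs subscription cannot work, for example when AWS_LAMBDA_LOG_STREAM_NAME is missing, and fall back to waiting only for the agent's flush signal instead of the runtimeDone log event.

// Hypothetical guard; not the actual extension code.
package main

import (
	"fmt"
	"os"
)

func main() {
	// If there is no log stream (e.g. under SnapStart or the RIE), skip
	// the logs subscription and rely solely on the agent flush signal.
	if _, ok := os.LookupEnv("AWS_LAMBDA_LOG_STREAM_NAME"); !ok {
		fmt.Println("no log stream available - skipping logs subscription")
	}
}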
