Collaborator

@ahmarsuhail ahmarsuhail commented Jun 2, 2025

Description of change

S3A PR: apache/hadoop#7723

Use ExecutionAttributes to attach the correct span to each GET request made for a stream.

S3A generates a span_id for each operation. For example, when a stream to an object is opened in S3A's executeOpen(), a span_id is generated for this operation. The executeOpen() operation is also given the name op_open.
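Judging by the span IDs in this PR's log output (a UUID that matches the fs= parameter, followed by an eight-digit counter), a per-operation ID source could look roughly like the sketch below. This is an illustration inferred from the log format, not S3A's actual code; the class name SpanIdSource and its methods are hypothetical.

```java
import java.util.Locale;
import java.util.UUID;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical span-ID source; an assumption based on the logged ID format. */
final class SpanIdSource {
    private final String filesystemId;                     // shared prefix for one filesystem instance
    private final AtomicLong counter = new AtomicLong();   // incremented once per operation

    SpanIdSource(String filesystemId) {
        this.filesystemId = filesystemId;
    }

    /** Returns "<filesystemId>-<8-digit counter>", a new value on every call. */
    String next() {
        return String.format(Locale.ROOT, "%s-%08d", filesystemId, counter.incrementAndGet());
    }

    public static void main(String[] args) {
        SpanIdSource ids = new SpanIdSource(UUID.randomUUID().toString());
        // One ID per operation, e.g. one for each stream that is opened.
        System.out.println("op_open span: " + ids.next());
    }
}
```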

S3A's LoggingAuditor then uses this span_id to log every S3 request made for the operation. For example, an executeOpen() can issue a HEAD request and a GET request, which show up in the logs as:

84ca7693-a471-4564-a6b9-70b6424548c4-00000009 Executing op_open with {action_http_head_request 'raw/2023/017/ohfh/OHFH017d.23_.gz' size=0, mutating=false}; https://audit.example.org/hadoop/1/op_open/84ca7693-a471-4564-a6b9-70b6424548c4-00000009/?op=op_open&p1=raw/2023/017/ohfh/OHFH017d.23_.gz&pr=ahmarsu&ps=77e8a486-2538-4e00-8f75-dfc1b6ded83d&id=84ca7693-a471-4564-a6b9-70b6424548c4-00000009&t0=34&fs=84ca7693-a471-4564-a6b9-70b6424548c4&t1=34&ts=1748877163303


84ca7693-a471-4564-a6b9-70b6424548c4-00000009 Executing op_open with {action_http_get_request 'raw/2023/017/ohfh/OHFH017d.23_.gz' size=8388607, mutating=false}; https://audit.example.org/hadoop/1/initialize/84ca7693-a471-4564-a6b9-70b6424548c4-00000008/?op=initialize&p1=noaa-cors-pds&pr=ahmarsu&ps=77e8a486-2538-4e00-8f75-dfc1b6ded83d&rg=5-8388612&id=84ca7693-a471-4564-a6b9-70b6424548c4-00000008&t0=15&fs=84ca7693-a471-4564-a6b9-70b6424548c4&t1=15&ts=1748877163292

The leading 84ca7693-a471-4564-a6b9-70b6424548c4-00000009 is the span_id. These logs show exactly which S3 requests were made for each operation, which helps with debugging.
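When debugging, it can help to pull the span_id and the referrer parameters out of a log line programmatically. The following is a minimal sketch that assumes the line layout shown above (span_id first, referrer URL after the final "; "). AuditLogLine is a hypothetical helper, not part of S3A or AAL.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical parser for the audit log lines quoted in this description. */
final class AuditLogLine {
    final String spanId;                 // leading token of the log line
    final Map<String, String> referrer;  // query parameters of the trailing referrer URL

    private AuditLogLine(String spanId, Map<String, String> referrer) {
        this.spanId = spanId;
        this.referrer = referrer;
    }

    static AuditLogLine parse(String line) {
        int firstSpace = line.indexOf(' ');
        int urlStart = line.lastIndexOf("; ");
        if (firstSpace < 0 || urlStart < 0) {
            throw new IllegalArgumentException("not an audit log line: " + line);
        }
        URI uri = URI.create(line.substring(urlStart + 2).trim());
        Map<String, String> params = new LinkedHashMap<>();
        String query = uri.getRawQuery();
        if (query != null) {
            for (String pair : query.split("&")) {
                int eq = pair.indexOf('=');
                if (eq <= 0) {
                    continue;  // skip malformed pairs
                }
                params.put(pair.substring(0, eq),
                        URLDecoder.decode(pair.substring(eq + 1), StandardCharsets.UTF_8));
            }
        }
        return new AuditLogLine(line.substring(0, firstSpace), params);
    }

    public static void main(String[] args) {
        AuditLogLine parsed = parse(
                "84ca7693-a471-4564-a6b9-70b6424548c4-00000009 Executing op_open with "
                + "{action_http_head_request 'raw/2023/017/ohfh/OHFH017d.23_.gz' size=0, mutating=false}; "
                + "https://audit.example.org/hadoop/1/op_open/84ca7693-a471-4564-a6b9-70b6424548c4-00000009/"
                + "?op=op_open&p1=raw/2023/017/ohfh/OHFH017d.23_.gz&id=84ca7693-a471-4564-a6b9-70b6424548c4-00000009");
        System.out.println(parsed.spanId + " -> op=" + parsed.referrer.get("op"));
    }
}
```

Comparing the leading span_id with the id parameter in the referrer is a quick way to check that a request was attributed to the operation that issued it.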

Since the S3 GETs are now made in AAL, we need a way to attach them back to the correct span and operation. This is done as follows:

  • S3A passes in the span_id and operation_name of the stream. This span_id is generated at stream creation time.
  • When AAL makes a GET or a HEAD request, it attaches these values to the request via ExecutionAttributes.

S3A's LoggingAuditor can then access these values in its ExecutionInterceptor and use them for logging and for building the referrer header.
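The flow above can be modeled in plain Java. This sketch deliberately avoids the AWS SDK so that it runs on its own: Attr and Attrs are stand-ins for the SDK's typed execution attribute keys and the per-request attribute bag, and the attribute names, tagRequest, and auditLine are hypothetical. The real keys and interceptor wiring are defined by this PR and the S3A PR, not by this sketch.

```java
import java.util.HashMap;
import java.util.Map;

final class SpanPropagationSketch {

    /** Typed key; a stand-in for an SDK execution attribute key. */
    static final class Attr<T> {
        final String name;
        Attr(String name) { this.name = name; }
    }

    /** Per-request attribute bag; a stand-in for the SDK's ExecutionAttributes. */
    static final class Attrs {
        private final Map<Attr<?>, Object> values = new HashMap<>();

        <T> Attrs put(Attr<T> key, T value) {
            values.put(key, value);
            return this;
        }

        @SuppressWarnings("unchecked")
        <T> T get(Attr<T> key) {
            return (T) values.get(key);
        }
    }

    // Hypothetical attribute names, chosen for illustration only.
    static final Attr<String> SPAN_ID = new Attr<>("span_id");
    static final Attr<String> OPERATION_NAME = new Attr<>("operation_name");

    /** Library side (AAL's role): tag an outgoing request with the stream's span. */
    static Attrs tagRequest(String spanId, String operationName) {
        return new Attrs().put(SPAN_ID, spanId).put(OPERATION_NAME, operationName);
    }

    /** Auditor side (the interceptor's role): read the tags back and build a log line. */
    static String auditLine(Attrs attrs, String action) {
        String spanId = attrs.get(SPAN_ID);
        String operation = attrs.get(OPERATION_NAME);
        if (spanId == null || operation == null) {
            return "unaudited " + action;  // the request carried no span
        }
        return spanId + " Executing " + operation + " with {" + action + "}";
    }

    public static void main(String[] args) {
        Attrs attrs = tagRequest("84ca7693-a471-4564-a6b9-70b6424548c4-00000009", "op_open");
        System.out.println(auditLine(attrs, "action_http_get_request"));
    }
}
```

Typed keys let the caller and the interceptor share values without string lookups or casts at the call site, which is the main reason to prefer per-request attributes over an ad hoc header for this hand-off.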

Relevant issues

Does this contribution introduce any breaking changes to the existing APIs or behaviors?

Does this contribution introduce any new public APIs or behaviors?

How was the contribution tested?

Does this contribution need a changelog entry?

  • I have updated the CHANGELOG or README if appropriate

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and I agree to the terms of the Developer Certificate of Origin (DCO).

public class StreamContext {
  private final String spanId;
  private final String operationName;
}
Contributor

should we be removing the referrer header builder support?

final String referrerHeader;
AwsRequestOverrideConfiguration.Builder requestOverrideConfigurationBuilder =
AwsRequestOverrideConfiguration.builder()
.putHeader(HEADER_REFERER, getRequest.getReferrer().toString())
Contributor

@rajdchak rajdchak Jun 4, 2025

can we have all the request-related constants, like HEADER_REFERER and HEADER_USER_AGENT, also as part of the request attributes class?

public String modifyAndBuildReferrerHeader(GetRequest getRequestContext);
@Builder(access = AccessLevel.PUBLIC)
@Getter
public class StreamContext {
Contributor

Nit: should we rename this class to better reflect its auditing-specific purpose, e.g. StreamAuditingContext?

@ahmarsuhail ahmarsuhail temporarily deployed to integration-tests June 5, 2025 08:44 — with GitHub Actions Inactive
@ahmarsuhail ahmarsuhail merged commit a4ca8c8 into awslabs:main Jun 5, 2025
4 checks passed