[DSM] - Set DSM checkpoints for S3 put / get operations #6859
Conversation
if (key != null && bucket != null && awsOperation != null) {
  if ("GetObject".equalsIgnoreCase(awsOperation.toString())) {
    LinkedHashMap<String, String> sortedTags = new LinkedHashMap<>();
LinkedHashMap can be replaced with a fixed size array. This will be done in a followup PR in Q2.
I think a simple struct-like object would be better than an array.
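As a rough illustration of the struct-like object suggested above, a fixed set of final fields would avoid the per-entry allocation of a LinkedHashMap. The class and field names below are invented for illustration and are not part of dd-trace-java.

```java
// Hypothetical struct-like replacement for the LinkedHashMap of sorted tags.
// Field names (direction, bucket, type) are assumptions based on the tags
// this PR sets for S3 checkpoints, not the actual DSM API.
final class S3PathwayTags {
  final String direction; // e.g. "in" for GetObject, "out" for PutObject
  final String bucket;
  final String type;      // data source type, e.g. "s3"

  S3PathwayTags(String direction, String bucket, String type) {
    this.direction = direction;
    this.bucket = bucket;
    this.type = type;
  }
}
```

Because the fields are fixed, tag ordering is implicit in the class layout rather than enforced by insertion order.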
@@ -88,6 +92,9 @@ public AgentSpan onRequest(final AgentSpan span, final Request request) {
CharSequence awsRequestName = AwsNameCache.getQualifiedName(request);

span.setResourceName(awsRequestName, RPC_COMMAND_NAME);
if ("Amazon S3".equalsIgnoreCase(awsServiceName) && span.traceConfig().isDataStreamsEnabled()) {
Basing this on the service name seems brittle to me; it would make more sense to base this check on the raw information, e.g. the type of Request.
I've switched to "s3" service name for both versions of AWS SDK. I didn't use the request type to avoid adding new dependencies.
Object key = span.getTag(InstrumentationTags.AWS_OBJECT_KEY);
Object bucket = span.getTag(InstrumentationTags.AWS_BUCKET_NAME);
Object awsOperation = span.getTag(InstrumentationTags.AWS_OPERATION);
There are a lot of repetitive toString calls here.
If these are Strings or CharSequences, I would type check and cast them sooner.
Or just create variables to hold the String values immediately after calling getTag.
Something like the line below would be fine:
String keyStr = (key == null) ? null : key.toString();
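Applied to all three tags, that suggestion could be factored into a small helper; the helper name below is hypothetical and only illustrates the pattern of converting each tag to a String once, right after getTag.

```java
final class TagStrings {
  // Convert a span tag to a String exactly once, instead of calling
  // toString repeatedly at every comparison site.
  static String asString(Object tag) {
    return (tag == null) ? null : tag.toString();
  }
}
```

The checks then read as, e.g., `"GetObject".equalsIgnoreCase(operationStr)` with no further toString calls.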
payloadSize = (long) requestSize;
}

LinkedHashMap<String, String> sortedTags = new LinkedHashMap<>();
I would like to see DSM replace the use of LinkedHashMap with a simple struct-like object or builder.
LinkedHashMap is creating a lot of unnecessary allocation.
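A builder variant of that suggestion might look like the sketch below; the class and method names are invented for illustration and do not exist in dd-trace-java. It keeps the fluent call style of map puts while allocating only one object.

```java
// Hypothetical builder for DSM checkpoint tags: one flat object instead of
// a LinkedHashMap with a node allocation per entry.
final class DsmTagsBuilder {
  private String direction;
  private String bucket;
  private String type;

  DsmTagsBuilder direction(String d) { this.direction = d; return this; }
  DsmTagsBuilder bucket(String b)    { this.bucket = b;    return this; }
  DsmTagsBuilder type(String t)      { this.type = t;      return this; }

  // Render in a fixed order, mirroring the sorted-tag ordering the
  // LinkedHashMap was providing.
  String render() {
    return "direction:" + direction + ",bucket:" + bucket + ",type:" + type;
  }
}
```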
final String awsServiceName = attributes.getAttribute(SdkExecutionAttribute.SERVICE_NAME);
final String awsOperationName = attributes.getAttribute(SdkExecutionAttribute.OPERATION_NAME);
onOperation(span, awsServiceName, awsOperationName);

// S3
request.getValueForField("Bucket", String.class).ifPresent(name -> setBucketName(span, name));
if ("s3".equalsIgnoreCase(awsServiceName) && span.traceConfig().isDataStreamsEnabled()) {
Why is the service name different here?
SDK v1 and v2 have different naming for services. I'll check if this can be unified.
Both implementations now use "s3".
private final StringBuilder builder;

public DataSetHashBuilder() {
  builder = new StringBuilder();
The builder should be given an initial size if possible.
public long generateDataSourceHash(long parentHash) {
  builder.append(parentHash);
  return FNV64Hash.generateHash(builder.toString(), FNV64Hash.Version.v1);
It seems like the builder & string aren't actually needed for the hash calculation.
I presume the FNV64Hash computation could be done in a streaming fashion as the tags are added.
Makes sense, I'll update the code.
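The streaming computation suggested above could be sketched as follows. The constants are the standard FNV-1 64-bit offset basis and prime; whether dd-trace-java's `FNV64Hash.Version.v1` matches this exact variant (FNV-1, multiply-then-XOR) is an assumption, so treat this as an illustration, not the project's implementation.

```java
import java.nio.charset.StandardCharsets;

// Sketch of an incremental FNV-1 64-bit hash: tags are folded into the hash
// as they are added, with no StringBuilder or intermediate String.
final class StreamingFnv64 {
  private static final long OFFSET_BASIS = 0xcbf29ce484222325L; // standard FNV-1 64-bit basis
  private static final long PRIME = 0x100000001b3L;             // standard FNV 64-bit prime

  private long hash = OFFSET_BASIS;

  // FNV-1 processes one byte at a time: multiply by the prime, then XOR.
  StreamingFnv64 add(String value) {
    for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
      hash *= PRIME;
      hash ^= (b & 0xffL);
    }
    return this;
  }

  long value() {
    return hash;
  }
}
```

Because the hash advances byte by byte, feeding tags in separately yields the same result as hashing their concatenation, which is what makes the StringBuilder unnecessary.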
I think there's some opportunity for clean-up and optimization.
What Does This Do
Sets DSM checkpoints for S3 put/get operations.
Motivation
Setting "data source" checkpoints unlocks end-to-end pipeline lineage. For instance, this particular change allows visualizing data flow between streaming and batch processing. The same mechanism can be used to support more types of data sources.
Additional Notes
Jira ticket