Report snowplow-specific metrics #178

Open
colmsnowplow opened this issue Jul 26, 2022 · 2 comments
@colmsnowplow (Collaborator) commented Jul 26, 2022

Even though the app is data-agnostic, there's a good argument that we should still report Snowplow-specific metrics. Our usage of the app is Snowplow-specific, and reporting latency from collector to target is valuable.

I think we should consider how to fit this into the design and see if we can accommodate it. Perhaps a setting that specifies that it's Snowplow data and grabs the collector tstamp for metrics reporting purposes.
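One hypothetical shape for such a setting, sketched in Go. The option name `snowplow_enriched_input` and the `MetricsConfig` struct are invented for illustration; nothing like this exists in the app yet:

```go
// Hypothetical config sketch - the option name and struct shape are invented,
// not an existing stream replicator setting.
package metrics

// MetricsConfig sketches a setting that tells the app the stream carries
// Snowplow enriched data, so it can pull collector_tstamp for latency metrics.
type MetricsConfig struct {
	// When true, treat inbound records as Snowplow enriched events and
	// report collector-to-target latency based on collector_tstamp.
	SnowplowEnrichedInput bool `hcl:"snowplow_enriched_input,optional"`
}
```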

@jbeemster (Member) commented

You could possibly try to infer the data-input type - perhaps with some pattern matching (a rough sketch follows the list). I think you could generally have three different Snowplow inputs, plus a fallback:

  • raw: thrift decoder required
  • enriched: you already can parse this with analytics SDK
  • bad: JSON -> would need a decoder that would let you pull the correct value
  • other: default to timestamp of the record on the stream (what we do currently as far as I can remember)
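As for what that pattern matching might look like, here is a rough Go sketch; the detection heuristics (JSON validity for bad rows, binary content for raw Thrift, a high tab count for enriched TSV) are illustrative guesses rather than a tested classifier:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"unicode/utf8"
)

type inputType int

const (
	inputOther    inputType = iota // fall back to the stream record's timestamp
	inputRaw                       // Thrift-serialised collector payload (binary)
	inputEnriched                  // tab-separated enriched event
	inputBad                       // self-describing JSON bad row
)

// inferInputType guesses which kind of Snowplow data a record holds.
// The thresholds are rough heuristics, not a validated classifier.
func inferInputType(record []byte) inputType {
	switch {
	case json.Valid(record):
		return inputBad // bad rows are JSON documents
	case !utf8.Valid(record):
		return inputRaw // raw collector payloads are binary Thrift
	case bytes.Count(record, []byte("\t")) > 100:
		return inputEnriched // enriched events are wide TSV (~131 fields)
	default:
		return inputOther
	}
}

func main() {
	fmt.Println(inferInputType([]byte(`{"any":"json"}`)) == inputBad) // true
}
```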

One question from me though: what would the cost implications be of parsing every inbound event to extract the timestamp?

@colmsnowplow (Collaborator, Author) commented

> One question from me though: what would the cost implications be of parsing every inbound event to extract the timestamp?

I have the same concern - hopefully the cost would be minimal, since we constructed the analytics SDK in such a way that we can retrieve individual fields without processing the entire event. (Filters operate this way and are relatively efficient.) But yes, I'd want to keep an eye on it.
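For illustration, single-field access along these lines, assuming the Go analytics SDK's `ParseEvent`/`GetValue` API; that timestamp fields come back as `time.Time` is an assumption here:

```go
package metrics

import (
	"fmt"
	"time"

	"github.com/snowplow/snowplow-golang-analytics-sdk/analytics"
)

// collectorLatency pulls only collector_tstamp and measures latency to "now".
func collectorLatency(enrichedTSV string) (time.Duration, error) {
	parsed, err := analytics.ParseEvent(enrichedTSV)
	if err != nil {
		return 0, err
	}
	// GetValue retrieves a single field rather than transforming the whole event.
	v, err := parsed.GetValue("collector_tstamp")
	if err != nil {
		return 0, err
	}
	ts, ok := v.(time.Time) // assumption: timestamp fields come back as time.Time
	if !ok {
		return 0, fmt.Errorf("unexpected type for collector_tstamp: %T", v)
	}
	return time.Since(ts), nil
}
```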

> You could possibly try to infer the data-input type - perhaps with some pattern matching. I think you could generally have three different Snowplow inputs, plus a fallback:
>
> • raw: thrift decoder required
> • enriched: you already can parse this with analytics SDK
> • bad: JSON -> would need a decoder that would let you pull the correct value
> • other: default to timestamp of the record on the stream (what we do currently as far as I can remember)

Decoding thrift just to grab the collector tstamp seems like overkill, and we don't have a use case for running bad data through stream replicator at the moment. So my suggestion here would be to worry about enriched data only, and wait for requirements for other formats to surface themselves, if they exist.
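A sketch of that enriched-only approach with the existing fallback, again assuming the Go analytics SDK's `ParseEvent`/`GetValue` API; the function name `sourceTimestamp` is invented:

```go
package metrics

import (
	"time"

	"github.com/snowplow/snowplow-golang-analytics-sdk/analytics"
)

// sourceTimestamp returns collector_tstamp when the record parses as an
// enriched event, and otherwise falls back to the timestamp the record
// carried on the stream (the current behaviour).
func sourceTimestamp(record []byte, streamTstamp time.Time) time.Time {
	parsed, err := analytics.ParseEvent(string(record))
	if err != nil {
		return streamTstamp // not enriched data - keep current behaviour
	}
	v, err := parsed.GetValue("collector_tstamp")
	if err != nil {
		return streamTstamp
	}
	if ts, ok := v.(time.Time); ok {
		return ts
	}
	return streamTstamp
}
```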
