Skip to content

chore: Add component specification#8858

Merged
bruceg merged 19 commits intomasterfrom
component-specification
Aug 30, 2021
Merged

chore: Add component specification#8858
bruceg merged 19 commits intomasterfrom
component-specification

Conversation

@binarylogic
Copy link
Contributor

@binarylogic binarylogic commented Aug 24, 2021

Closes #8728
Closes #8709

This introduces the beginnings of a Vector component specification. The immediate need for this is to drive improvements around Vector's internal instrumentation. Issues like #8839, #8710, #8709, and #8693 all expose inconsistencies in Vector's internal instrumentation that presents problems for future work based on this data.

Signed-off-by: Ben Johnson <bjohnson@binarylogic.com>
@netlify
Copy link

netlify bot commented Aug 24, 2021

✔️ Deploy Preview for vector-project canceled.

🔨 Explore the source changes: 8318f12

🔍 Inspect the deploy log: https://app.netlify.com/sites/vector-project/deploys/612967146c5b4b0008582c71

@bruceg bruceg self-requested a review August 24, 2021 16:07
Copy link
Member

@bruceg bruceg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems pretty minimal. I think it would be worth noting the standard tags logs and metrics are annotated with through the calling context. Should we also mandate counters on events/bytes out for transforms? There is also precedent for discarded events in some components. Some discussion of standardized attributes for logs would be useful as well (ex all error logs must have an error field, etc).

@binarylogic
Copy link
Contributor Author

@bruceg this is the very beginnings of the spec (hence the draft status). I wanted to get feedback on the structure before fleshing it out. I just wanted to make sure you were aware of that. I plan to add many more events and metrics to this spec.

Copy link
Collaborator

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good start to me. I like the direction it is heading.

Signed-off-by: Ben Johnson <bjohnson@binarylogic.com>
Signed-off-by: Ben Johnson <bjohnson@binarylogic.com>
Signed-off-by: Ben Johnson <bjohnson@binarylogic.com>
Signed-off-by: Ben Johnson <bjohnson@binarylogic.com>
@binarylogic binarylogic marked this pull request as ready for review August 25, 2021 17:45
@binarylogic
Copy link
Contributor Author

@jszwedko @bruceg this is ready for review. Once we generally align, I'd like to include the rest of the team. My goal was to set the foundation so that @bruceg could take over with his upcoming observability epic. Once this is agreed upon we will need to prioritize which components we want audit and fix.

Signed-off-by: Ben Johnson <bjohnson@binarylogic.com>
Signed-off-by: Ben Johnson <bjohnson@binarylogic.com>
Signed-off-by: Ben Johnson <bjohnson@binarylogic.com>
Signed-off-by: Ben Johnson <bjohnson@binarylogic.com>

## Configuration

### Options
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion is still as stated on #8877 (comment):

  • endpoint(s) for when a URL is used to configure a remote destination that vector is communicating with
  • address for when a simple ip:port combo is used to configure a remote destination
  • bind for any time we are binding to an address

I think bind (or listen) makes it clearer that Vector is binding to the port. address seems, to me, more suitable for outgoing connections.

I think having endpoint separate and only representing URLs is also clearer that you need a full URL and not just a host:port.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What endpoint do we have that is totally separate from address?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What endpoint do we have that is totally separate from address?

Ah, sorry, I just mean the name should be distinct. That is we should call URLs endpoint in the config (like for the http sink). For components that send raw data to an ip:port I think address is a more suitable name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤷 I'd favor endpoints for both, personally.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 I feel more strongly about using bind (or listen) for options where Vector binds the given value than endpoint vs. address. I do think the distinction is useful though, so users immediately know what is expected and are less likely to, for example, think that they should provide a URL scheme to an address value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I agree on the endpoint vs address distinction, I am still not a fan of either bind or listen. No existing component uses bind as the configured local address, which means all users configuring one of those components must change their config. Granted, we will be breaking metric names, but that doesn't make a component fail to work after upgrading.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there will be a fight to the death over this naming

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've yanked the option under dispute so we can move forward on the rest.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which option was yanked? I didn't see any new commits and it seems the same to me since last time I looked.

Granted, we will be breaking metric names, but that doesn't make a component fail to work after upgrading.

In both the rename to bind case and the metric name case, I was thinking we'd maintaining aliases for backwards compatibility.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I pushed but missed the error message saying my HEAD was not up to date. Fixed.

@binarylogic
Copy link
Contributor Author

@bruceg officially transitioning off of this. Can you handle the feedback and final changes?

@bruceg
Copy link
Member

bruceg commented Aug 25, 2021

Will do. 👍🏻

@bruceg
Copy link
Member

bruceg commented Aug 25, 2021

It looks like we need to come to an agreement on the naming of the counters. The internal pattern had been THING_in/out_total, which leads to names like bytes_in_total, while other systems (notably Prometheus, as indicated above) use THING_UNITS_total, which would lead to names like received_bytes_total. I would tend to prefer to maintain our current pattern, but the argument from an external precedent is strong, and has other benefits for predictability.

@binarylogic
Copy link
Contributor Author

@bruceg, I'll defer to you and @jszwedko to settle on naming. I'm indifferent. I just wanted to give context on why I chose the names I did.

binarylogic and others added 7 commits August 26, 2021 14:56
Co-authored-by: Jesse Szwedko <jesse@szwedko.me>
Signed-off-by: Bruce Guenter <bruce@untroubled.org>
Signed-off-by: Bruce Guenter <bruce.guenter@datadoghq.com>
Signed-off-by: Bruce Guenter <bruce.guenter@datadoghq.com>
Signed-off-by: Bruce Guenter <bruce.guenter@datadoghq.com>
Signed-off-by: Bruce Guenter <bruce.guenter@datadoghq.com>
Signed-off-by: Bruce Guenter <bruce.guenter@datadoghq.com>
Signed-off-by: Bruce Guenter <bruce.guenter@datadoghq.com>
@bruceg bruceg force-pushed the component-specification branch from 2cc7123 to 7111bec Compare August 26, 2021 20:56
@bruceg
Copy link
Member

bruceg commented Aug 26, 2021

Hmm, @binarylogic the commit you did through GitHub is blocking the DCO check. Do you want to fix that (I tried to sign it myself, but it's not satisfied), or should I just squash it down?

@binarylogic
Copy link
Contributor Author

No need to enforce the DCO for employed Vector team members.

Copy link
Collaborator

@jszwedko jszwedko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, work. This is coming together and feeling more consistent. I still have that one outstanding question about the naming of the address/endpoint/bind fields 😅 , but otherwise I think this looks good.


## Configuration

### Options
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm still a fan of what I have above:

  • endpoint(s) for when a URL is used to configure a remote destination that vector is communicating with
  • address for when a simple ip:port combo is used to configure a remote destination
  • bind for any time we are binding to an address

I think this provides the most clarity in option names that will make it more self-evident what the options are for and what the syntax of the value should be without having to refer to docs.

@bruceg bruceg requested a review from jszwedko August 27, 2021 21:08
Copy link
Contributor

@spencergilbert spencergilbert left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with everything, and also agree with some consistent naming (address v endpoint) - with less personal care about what we land on there.

The discussion over this option was blocking moving forward on the rest
of this specification, which is more important than nailing down every
configuration option.

Signed-off-by: Bruce Guenter <bruce.guenter@datadoghq.com>
@bruceg bruceg merged commit c6c0ea7 into master Aug 30, 2021
@bruceg bruceg deleted the component-specification branch August 30, 2021 20:13
jdrouet pushed a commit that referenced this pull request Sep 6, 2021
* Kick off the specification

Signed-off-by: Ben Johnson <bjohnson@binarylogic.com>

* Add a new BytesReceived event

Signed-off-by: Ben Johnson <bjohnson@binarylogic.com>

* Better language for event implementation leeway

Signed-off-by: Ben Johnson <bjohnson@binarylogic.com>

* Final touches for specification foundation

Signed-off-by: Ben Johnson <bjohnson@binarylogic.com>

* Error logs

Signed-off-by: Ben Johnson <bjohnson@binarylogic.com>

* Comma separated

Signed-off-by: Ben Johnson <bjohnson@binarylogic.com>

* Finish scope section

Signed-off-by: Ben Johnson <bjohnson@binarylogic.com>

* Rename metric

Signed-off-by: Ben Johnson <bjohnson@binarylogic.com>

* Update endpoint option wording

Signed-off-by: Bruce Guenter <bruce.guenter@datadoghq.com>

* Update docs/specs/component.md

Co-authored-by: Jesse Szwedko <jesse@szwedko.me>
Signed-off-by: Bruce Guenter <bruce@untroubled.org>

* Wording tweaks

Signed-off-by: Bruce Guenter <bruce.guenter@datadoghq.com>

* Address feedback

Signed-off-by: Bruce Guenter <bruce.guenter@datadoghq.com>

* Address feedback

Signed-off-by: Bruce Guenter <bruce.guenter@datadoghq.com>

* Fix the other byte_size

Signed-off-by: Bruce Guenter <bruce.guenter@datadoghq.com>

* Fix counter naming pattern to `DESCRIPTOR_UNITS_total`

Signed-off-by: Bruce Guenter <bruce.guenter@datadoghq.com>

* Change endpoint wording and http_path

Signed-off-by: Bruce Guenter <bruce.guenter@datadoghq.com>

* Update docs/specs/component.md

* Remove the `address` configuration specification.

The discussion over this option was blocking moving forward on the rest
of this specification, which is more important than nailing down every
configuration option.

Signed-off-by: Bruce Guenter <bruce.guenter@datadoghq.com>

Co-authored-by: Bruce Guenter <bruce.guenter@datadoghq.com>
Co-authored-by: Jesse Szwedko <jesse@szwedko.me>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Create a Vector component specification Define what processed_bytes_total means for sources and sinks

4 participants