Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding Metrics / Logging #28

Open
baldawar opened this issue Aug 22, 2022 · 2 comments
Open

Adding Metrics / Logging #28

baldawar opened this issue Aug 22, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@baldawar
Copy link
Collaborator

What is your idea?

Follow up from #18. Added visibility around ruler internals would be helpful when debugging issues or building / updating the code. We want to do this without taking any unnecessary dependency (esp for logging / metrics, where clients can be very opinionated on what they prefer).

Would you be willing to make the change?

Additional context

I'll paste relevant github PR discussions as snippets as comments.

@baldawar baldawar added the enhancement New feature or request label Aug 22, 2022
@baldawar
Copy link
Collaborator Author

relevant from https://github.com/aws/event-ruler/pull/18/files#r933912038

longzhang-lz 22 days ago

suggest adding .size() method for ByteTransition, we shall monitor the size here to defense large size of transitions ...

Author\
 jonessha 20 days ago

How would we use this .size() method? Especially considering that Ruler does not log or record metrics.

Note that MachineComplexityEvaluator can be used to determine the maximum size possible, given a worst-case input value. Assuming you run your machines through that before accepting/persisting them, you already know and can limit how bad things can get.

 longzhang-lz 19 days ago -\
for example, we can define the max size and throw exception to fail the request loudly,\
that can work as the last resort protection from using up memory or cpu in case some extreme rules enter into system somehow

Author\
 jonessha 18 days ago

If that happens, there is a bug in how we're evaluating complexity and persisting machines upfront. I'd much prefer to make it impossible to persist an overly complex machine rather than allow customers to persist them, then unexpectedly fail for them on the critical data plane path. I have a more detailed plan outside of this CR for how to ensure machines cannot be persisted with high complexity.

 baldawar 13 days ago

It's still a handy way to collect metrics, gain visibility, and act as a backup when control-plane checks fail (say in case of a software bug). I can see it be valuable when someone tries to use Ruler.matchesRule(...) https://github.com/aws/event-ruler/blob/main/src/main/software/amazon/event/ruler/Ruler.java#L37

Though I'm happy to call it a future enhancement (of either exposing ruler size, or starting here's the max ruler size and eval time that I can support). Certainly been days in the past where I'd have appreciated it.

Author\
 jonessha 12 days ago

Fair enough. Although this would have a dependency on an enhancement to support logging or metrics. What do you think of Tim's suggestion to have enhanced APIs that take a PrintStream argument? Would be pretty simple. Given that, would it be sufficiently useful to log the max size encountered during each doTransitionOn? If so, I can create a pair of feature requests.

@baldawar
Copy link
Collaborator Author

relevant from #18 (comment)

### ![@longzhang-lz](https://avatars.githubusercontent.com/u/101074831?s=48&v=4) ** [longzhang-lz](https://github.com/longzhang-lz) ** [22 days ago](https://github.com/aws/event-ruler/pull/18#discussion_r934019279)

suggest to add here to avoid creating duplicated state transition against the same pattern:

    NameState res = findPattern(pattern);
    if(res != null) {
        return res;
    }

Author

### ![@jonessha](https://avatars.githubusercontent.com/u/108424630?s=48&v=4) ** [jonessha](https://github.com/jonessha) ** [19 days ago](https://github.com/aws/event-ruler/pull/18#discussion_r937156093)

This would avoid duplicates for identical patterns, but wouldn't help in the case of almost-identical patterns that only differ trivially at the end of the pattern. Since the addPattern code aims to re-use states whenever possible even for almost-identical patterns, this also avoids duplicates without adding the findPattern check. If we find any cases not covered by my re-use code, I'd like to make sure the underlying state re-use bug gets fixed. Adding a findPattern call, in addition to a small performance hit, may make it harder to notice the state re-use bug.

### ![@longzhang-lz](https://avatars.githubusercontent.com/u/101074831?s=48&v=4) ** [longzhang-lz](https://github.com/longzhang-lz) ** [19 days ago](https://github.com/aws/event-ruler/pull/18#discussion_r937331466)

agree, that is a mitigation approach for your reference if we are not sure we can cover all reusable use cases, we can also add metric and log to ensure adding the duplicated pattern will not write anything.\
at least we need ensure not re-write the pattern firstly, partial re-write is not to enumerate all cases easily, so may take some time to discover and fix.

Author

### ![@jonessha](https://avatars.githubusercontent.com/u/108424630?s=48&v=4) ** [jonessha](https://github.com/jonessha) ** [18 days ago](https://github.com/aws/event-ruler/pull/18#discussion_r937960885)

Good call on adding metrics/logs. Right now, Ruler doesn't have any metrics/logs. On one hand, it would be useful to add some. On the other hand, it would increase the dependency tree.

### ![@timbray](https://avatars.githubusercontent.com/u/17555?s=48&v=4) ** [timbray](https://github.com/timbray) ** [18 days ago](https://github.com/aws/event-ruler/pull/18#discussion_r937965891)

I am very negative about adding any more dependencies - every dependency an open-source package brings decreases adoption. Could we imagine having enhanced API methods with a PrintStream argument - if provided, Ruler could log some basics to that PrintStream?

Author

### ![@jonessha](https://avatars.githubusercontent.com/u/108424630?s=48&v=4) ** [jonessha](https://github.com/jonessha) ** [18 days ago](https://github.com/aws/event-ruler/pull/18#discussion_r937979916)

That feels like the way to go. Would make a nice future improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

1 participant