-
Notifications
You must be signed in to change notification settings - Fork 208
Add apistore for caching the latest events #1429
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The following ISSUES will be created once got merged. If you want me to skip creating the issue, you can use Details1. Use UTC and let control-plane convert it into the control-plane's Local timeThis was created by todo plugin since "TODO:" was found in ab31e9a when #1429 was merged. cc: @nakabonne. |
| // TODO: Use UTC and let control-plane convert it into the control-plane's Local time | ||
| // Unexpected behavior can be happened if Piped uses different timezone from the control-plane. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will look into this concern when working on another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the timezone difference is not a problem, because we are using Unix time (the number of seconds elapsed since January 1, 1970 UTC.).
But the clock drift may cause the problem. But that is rare.
Btw, maybe we can leave the to parameter as zero to use time.Now of the server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the timezone difference is not a problem, because we are using Unix time (the number of seconds elapsed since January 1, 1970 UTC.).
I put off thinking about it but after thinking, it's a basic thing and no problem as you mentiond.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, maybe we can leave the to parameter as zero to use time.Now of the server.
Cool, I'll try it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Btw, maybe we can leave the to parameter as zero to use time.Now of the server.
After thinking, it makes derivating the next's from tougher. As you mentioned, clock drifts is reraly happened. I guess it's enough to be as-is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not tougher, I came up with a better solution just now. I'm going to make From and To optional.
|
Code coverage for golang is
|
pkg/model/event.go
Outdated
| // EventDefinitionID builds a unique identifier based on the given name and labels. | ||
| // It returns the exact same string as long as both are the same. | ||
| func EventDefinitionID(name string, labels map[string]string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been thinking about what we should call something whose uniqueness is consists of its Name and Labels. I just settled on calling it Event Definition.
An Event is like an Instance In Object-oriented world, and an event definition is like a Class. Each Event has its own ID and at the same time, each Event definition has its own ID. The event definition ID must be the exact same string as long as Name and Labels are the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just figured out it would be nice if we build this when saving and keep it as a field of Event.
pkg/model/event.go
Outdated
|
|
||
| // EventDefinitionID builds a unique identifier based on the given name and labels. | ||
| // It returns the exact same string as long as both are the same. | ||
| func EventDefinitionID(name string, labels map[string]string) string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be used only by EventStore right. So I think it should be moved to that package as an internal function.
So because of an internal function, we will not have to care about its name.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually it will be used by #1406. It could be increased the need to build a unique identity made of Name and Labels, that's why I decided to make a helper function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Now we are having our cache/store for the event. So maybe we don't need to merge them for simplicity.
The user can merge by changing their configuration if needed.
We implicitly merge them and combine their changes into a single commit that may cause the behavior they were not expected.
(I am not sure that I understood the merging correctly.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(I am not sure that I understood the merging correctly.)
Your understanding is correct. I attempted to merge it to combine their changes into a single commit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I agree with all of them. So let's this function private and think it if needed in the future!
| return nil, false | ||
| } | ||
|
|
||
| // NOTE: Don't update the cache to prevent it from being overwritten by slightly older ones. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we don't update the in-memory map?
The event has CreatedAt field, so we can use that to check whether the event should be updated or not.
And I think in the microservices list of our users, there are always some applications that have not been actively developed compared to others. Their events will be very sparse. So without saving into the list, piped will always make the call to control-plane to get the same one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sync function periodically updates the cache, and it works in another goroutine. What I'm most worried about is, while processing GetLatest, sync can update the cache. I mean:
// NOTE: Don't update the cache to prevent it from being overwritten by slightly older ones.
eventMap = s.latestEventMap.Load().(map[string]*model.Event)
eventMap[id] = resp.Event // I'm most worried about the "sync()" function updates the cache at this point.
s.latestEventMap.Store(eventMap) // Then the latest cache is overwritten by the old one here.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If Piped was run on the multi-core processor, I think it can happen, but let me know if I'm wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The data race can be resolved in some ways:
- change from using atomic to using a mutex. (fetch the list from control-plane, lock, and update). I think it is fine enough for our case: 1 call/m and EventWatcher.
- use 2 separate maps, one for LIST API, one for GET API
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, not fancy 😄 The first one looks enough! Only two goroutines can access this and it rarely happens to wait for unlocking. Thank you for telling me!
| NONE = 0; | ||
| FROM_OLDEST = 1; | ||
| FROM_NEWEST = 2; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about ListOrder enum that contains NONE, DESC, ASC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds good, that's more versatile. I initially thought about making it but I wasn't sure which should be general, CreatedAt or UpdatedAt, that's why I made a special enum for now. but what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is fine to make ListOrder. It can be used by other RPCs too.
Let's do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fine, I'll do so!
| func (s *store) GetLatest(ctx context.Context, name string, labels map[string]string) (*model.Event, bool) { | ||
| eventMap := s.latestEventMap.Load().(map[string]*model.Event) | ||
| id := model.EventDefinitionID(name, labels) | ||
| event, ok := eventMap[id] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing I want to ask about our event matching, in case we have an event
name = foo
labels: { label-1: value-1, label-2: value-2}
That event will match the following case or not?
piVersion: pipecd.dev/v1beta1
kind: EventWatcher
spec:
events:
- name: foo
labels:
label-1: value-1
replacements:
- file: dev/app1/deployment.yaml
yamlField: $.spec.template.spec.containers[0].image
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or all labels must be specified to be matched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally assume exact matches are needed, but on second thought, seems like that case should be treated as a matching case. Let me think, just a moment please...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@nghialv I'm still debating but which one do you prefer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible, the event should be matched when the name is equal AND the event's labels contain the specified ones.
(Because they are labels (not ID) and users will have more options.)
So I think we can change the map to a new data structure like map[event-name]ListOfEventsHaveSameName.
But one thing we have to consider is what happens when multiple events match the specified label list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But one thing we have to consider is what happens when multiple events match the specified label list.
So in order to avoid this complexity and avoid adding the dynamic label by users (e.g. version number),
I think we should not allow subset matching.
Name and all labels must be matched.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My feeling is exactly the same as yours. Some people can be helped by subset matching, but too complicated to handle for us. Thank you for a great opportunity to re-think it and letting me know your cool thought!
**What this PR does / why we need it**: Just removed a TODO comment. We just settled that we don't have to merge events when committing, on this discussion: #1429 (comment) **Which issue(s) this PR fixes**: Fixes #1406 **Does this PR introduce a user-facing change?**: <!-- If no, just write "NONE" in the release-note block below. --> ```release-note NONE ``` This PR was merged by Kapetanios.
|
/hold |
|
Code coverage for golang is
|
|
/hold cancel |
|
Code coverage for golang is
|
| // Make the cache up-to-date by traversing events sorted by oldest first. | ||
| var latestTime int64 | ||
| s.mu.Lock() | ||
| for _, event := range resp.Events { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resp.Events may contain multiple events for the same key/definition-id, so I think we should build their key/definition-id and filter duplicates before locking the mutex to update the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I initially attempted to do so, but I didn't. That's necessary to keep the time to lock mutex as short as possible. But don't we really have to care about that performance? I guess we should prioritize to keep simplicity than performance at this time, that's why I settled on just sorting by oldest first and traversing all events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But don't we really have to care about that performance?
Actually, I didn't want to say that we don't really care about the performance.
What I wanted to say before is that the complexity of the concurrency model is not worth the performance it brings.
But calculating the ID and filtering the duplicates before locking can be done easily. So it's worth it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're quite right. Adding filtering is not complicated enough to worry about complexity.
| s.mu.Lock() | ||
| for _, event := range resp.Events { | ||
| id := eventDefinitionID(event.Name, event.Labels) | ||
| s.latestEventMap[id] = event |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are doing asynchronous operations so this event can be older than the one stored in the map.
Let's check the CreatedAt before overriding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
At the beginning of the loop, it is more likely to happen as you mentioned. But eventually, all Events on the cache will be up-to-date because the list of events is sorted by oldest first. I'd say it's enough but what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, now I could get what you're most worried about. Okay, let me fix them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. The list is updated by GetLatest too.
| } | ||
|
|
||
| s.mu.Lock() | ||
| s.latestEventMap[id] = resp.Event |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, check CreatedAt too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this case, I totally agree with you 👍
|
Thank you for your polite review. I've fixed all of them. |
|
Code coverage for golang is
|
| if err != nil { | ||
| return fmt.Errorf("failed to list events: %w", err) | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: Return fast if no events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, the previous one could override the milestone to zero. I appreciate you.
| } | ||
|
|
||
| // Eliminate events that have duplicated key. | ||
| var latestTime int64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: latestTime := resp.Events[len(resp.Events)-1].CreatedAt
| s.mu.Unlock() | ||
|
|
||
| // Set the latest one within the result as the next time's "from". | ||
| s.milestone = latestTime + 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: s.milestone := resp.Events[len(resp.Events)-1].CreatedAt + 1
|
Nice. Thank you. |
|
Fixed done. |
|
Code coverage for golang is
|
|
Nice. |
|
🚀 |
What this PR does / why we need it:
This PR adds a new
apistorefor Events according to @nghialv 's suggestion. It allows us to reduce communication number by caching the latest events.This will be used by #1411 at first.
Which issue(s) this PR fixes:
Ref #1381
Does this PR introduce a user-facing change?: