Audit the Kubernetes pod event type and fix the DELETE event processing logic #7026
Codecov Report

| | master | #7026 | +/- |
|---|---|---|---|
| Coverage | 0.00% | 0.00% | |
| Files | 694 | 695 | +1 |
| Lines | 42768 | 42786 | +18 |
| Branches | 5821 | 5825 | +4 |
| Misses | 42768 | 42786 | +18 |
pan3793
left a comment
Makes sense. But this would not be covered by GA; a manual test is required.
…nt process logical

### Why are the changes needed?

1. Audit the kubernetes resource event type.
2. Fix the process logical for DELETE event.

Before this pr:

I tried to delete the POD manually, then I saw that, kyuubi thought the `appState=PENDING`.

```
:2025-04-15 13:58:20.320 INFO [-1077768163-pool-36-thread-7] org.apache.kyuubi.engine.KubernetesApplicationAuditLogger: eventType=DELETE label=3c58e9fd-cf8c-4cc3-a9aa-82ae40e200d8 context=97 namespace=dls-prod pod=kyuubi-spark-3c58e9fd-cf8c-4cc3-a9aa-82ae40e200d8-driver podState=Pending containers=[] appId=spark-cd125bbd9fc84ffcae6d6b5d41d4d8ad appState=PENDING appError=''
```

It seems that, the pod status in the event is the snapshot before pod deleted.

Then we would not receive any event for this POD, and finally the batch FINISHED with application `NOT_FOUND`.

<img width="1389" alt="image" src="https://github.com/user-attachments/assets/5df03db6-0924-4a58-9538-b196fbf87f32" />

Seems we need to process the DELETE event specially.

1. get the app state from the pod/container states
2. if the applicationState got is terminated, return the applicationState directly
3. otherwise, the applicationState should be FAILED, as the pod has been deleted.

### How was this patch tested?

<img width="1614" alt="image" src="https://github.com/user-attachments/assets/11e64c6f-ad53-4485-b8d2-a351bb23e8ca" />

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7026 from turboFei/k8s_audit.

Closes #7026

4e5695d [Wang, Fei] for delete
c167572 [Wang, Fei] audit the pod event type

Authored-by: Wang, Fei <[email protected]>
Signed-off-by: Wang, Fei <[email protected]>
(cherry picked from commit 82e1673)
Signed-off-by: Wang, Fei <[email protected]>
thanks, merged to 1.11.0 and 1.10.2

Why are the changes needed?
Before this pr:
I deleted the pod manually, and Kyuubi still reported `appState=PENDING`.
It seems that the pod status carried in the event is the snapshot taken before the pod was deleted.
After that we would not receive any further events for this pod, and the batch finally FINISHED with application `NOT_FOUND`.
It seems we need to handle the DELETE event specially.
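
The special handling can be sketched roughly as follows: derive the application state from the pod snapshot as usual; for a DELETE event, keep that state if it is already terminal and otherwise report FAILED, because the pod is gone. This is a minimal, language-neutral illustration in Python, not the Scala code from this patch. The `AppState` enum, the phase-to-state mapping, and the function names are all hypothetical; only the Kubernetes pod phase strings (`Pending`, `Running`, `Succeeded`, `Failed`) are standard.

```python
from enum import Enum


class AppState(Enum):
    """Illustrative application states (hypothetical, not Kyuubi's actual enum)."""
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    FINISHED = "FINISHED"
    FAILED = "FAILED"
    UNKNOWN = "UNKNOWN"


# States from which the application can no longer progress.
TERMINAL_STATES = {AppState.FINISHED, AppState.FAILED}

# Assumed mapping from Kubernetes pod phase to application state.
POD_PHASE_TO_APP_STATE = {
    "Pending": AppState.PENDING,
    "Running": AppState.RUNNING,
    "Succeeded": AppState.FINISHED,
    "Failed": AppState.FAILED,
}


def app_state_from_pod(pod_phase: str) -> AppState:
    """Step 1: derive the application state from the pod snapshot."""
    return POD_PHASE_TO_APP_STATE.get(pod_phase, AppState.UNKNOWN)


def app_state_for_event(event_type: str, pod_phase: str) -> AppState:
    """Resolve the application state for a pod event, treating DELETE specially."""
    state = app_state_from_pod(pod_phase)
    if event_type != "DELETE":
        return state
    # Step 2: the pod had already terminated, so its last state is trustworthy.
    if state in TERMINAL_STATES:
        return state
    # Step 3: the snapshot predates the deletion; a deleted pod that was
    # not yet terminal is reported as FAILED.
    return AppState.FAILED
```

With this sketch, the case from the audit log above (`eventType=DELETE`, `podState=Pending`) resolves to `FAILED` instead of staying `PENDING`, while a DELETE event for a pod that had already succeeded keeps its `FINISHED` state.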
How was this patch tested?
Was this patch authored or co-authored using generative AI tooling?
No.