@turboFei (Member) commented Apr 15, 2025

Why are the changes needed?

  1. Audit the Kubernetes resource event type.
  2. Fix the processing logic for DELETE events.

Before this PR:

When I deleted the pod manually, Kyuubi still reported appState=PENDING:

```
:2025-04-15 13:58:20.320 INFO [-1077768163-pool-36-thread-7] org.apache.kyuubi.engine.KubernetesApplicationAuditLogger: eventType=DELETE	label=3c58e9fd-cf8c-4cc3-a9aa-82ae40e200d8	context=97	namespace=dls-prod	pod=kyuubi-spark-3c58e9fd-cf8c-4cc3-a9aa-82ae40e200d8-driver	podState=Pending	containers=[]	appId=spark-cd125bbd9fc84ffcae6d6b5d41d4d8ad	appState=PENDING	appError=''
```
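The audit message above is a tab-separated list of `key=value` fields that starts with the `eventType` field this PR adds. A minimal Scala sketch of producing and re-parsing such a message, assuming a hypothetical `PodAuditRecord` that carries only a subset of the fields (context, containers, appId and appError are omitted). It is not the actual `KubernetesApplicationAuditLogger` implementation, and `parse` expects only the message part after the logger prefix.

```scala
// Hypothetical record with a subset of the fields in the audit line;
// not Kyuubi's actual KubernetesApplicationAuditLogger code.
final case class PodAuditRecord(
    eventType: String,
    label: String,
    namespace: String,
    pod: String,
    podState: String,
    appState: String)

object AuditLineSketch {
  // Render the fields as tab-separated key=value pairs, eventType first.
  def format(r: PodAuditRecord): String =
    Seq(
      "eventType" -> r.eventType,
      "label" -> r.label,
      "namespace" -> r.namespace,
      "pod" -> r.pod,
      "podState" -> r.podState,
      "appState" -> r.appState
    ).map { case (k, v) => s"$k=$v" }.mkString("\t")

  // Parse the message part back into a key -> value map, e.g. to filter DELETE events.
  // Only the first '=' splits key from value, so values may contain '='.
  def parse(line: String): Map[String, String] =
    line.split("\t").iterator.flatMap { field =>
      field.split("=", 2) match {
        case Array(k, v) => Some(k -> v)
        case _           => None
      }
    }.toMap
}
```

Putting `eventType` first makes it easy to grep the audit log for DELETE events and compare their (possibly stale) `podState` with the reported `appState`.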

It seems that the pod status carried by the DELETE event is a snapshot taken before the pod was deleted.

After that, no further events were received for this pod, and the batch eventually FINISHED with application state NOT_FOUND.

[screenshot]

It seems the DELETE event needs special handling:

  1. Derive the application state from the pod/container states.
  2. If the derived applicationState is terminated, return it directly.
  3. Otherwise, return FAILED, because the pod has been deleted.
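The three steps above can be sketched in Scala as follows. This is a simplified illustration, not the code merged in this PR: `AppState`, `PodEventType`, `stateFromPodPhase` and `resolveState` are hypothetical stand-ins, the phase-to-state mapping is an assumption, and container states are ignored.

```scala
// Hypothetical, simplified stand-ins; not Kyuubi's actual ApplicationState
// or KubernetesResourceEventTypes definitions.
object AppState extends Enumeration {
  type AppState = Value
  val PENDING, RUNNING, FINISHED, KILLED, FAILED, UNKNOWN = Value

  // States treated as terminal in this sketch (an assumption).
  private val terminalStates: Set[AppState] = Set(FINISHED, KILLED, FAILED)

  def isTerminated(state: AppState): Boolean = terminalStates.contains(state)
}

object PodEventType extends Enumeration {
  val ADD, UPDATE, DELETE = Value
}

object DeleteEventSketch {
  import AppState._

  // Step 1 (simplified): derive an application state from the pod phase only.
  def stateFromPodPhase(phase: String): AppState = phase match {
    case "Pending"   => PENDING
    case "Running"   => RUNNING
    case "Succeeded" => FINISHED
    case "Failed"    => FAILED
    case _           => UNKNOWN
  }

  def resolveState(eventType: PodEventType.Value, podPhase: String): AppState = {
    // A DELETE event may carry a stale, pre-deletion snapshot of the pod.
    val derived = stateFromPodPhase(podPhase)
    eventType match {
      // Step 3: a pod deleted before reaching a terminal state is reported as FAILED.
      case PodEventType.DELETE if !isTerminated(derived) => FAILED
      // Step 2 (and all non-DELETE events): keep the derived state as-is.
      case _ => derived
    }
  }
}
```

Under this rule, the DELETE event from the log above (podState=Pending) would resolve to FAILED instead of PENDING, while a pod that had already succeeded before deletion keeps FINISHED.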

How was this patch tested?

[screenshot]

Was this patch authored or co-authored using generative AI tooling?

No.

@turboFei changed the title from "audit the pod event type" to "Audit the kubernetes resource event type" on Apr 15, 2025
@turboFei self-assigned this on Apr 15, 2025
@turboFei added this to the v1.10.2 milestone on Apr 15, 2025
@turboFei changed the title from "Audit the kubernetes resource event type" to "Audit the kubernetes pod event type" on Apr 15, 2025
@codecov-commenter commented Apr 15, 2025

Codecov Report

Attention: Patch coverage is 0% with 28 lines in your changes missing coverage. Please review.

Project coverage is 0.00%. Comparing base (fa99183) to head (4e5695d).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...kyuubi/engine/KubernetesApplicationOperation.scala | 0.00% | 24 Missing ⚠️ |
| ...uubi/engine/KubernetesApplicationAuditLogger.scala | 0.00% | 2 Missing ⚠️ |
| ...e/kyuubi/engine/KubernetesResourceEventTypes.scala | 0.00% | 2 Missing ⚠️ |
Additional details and impacted files
```diff
@@          Coverage Diff           @@
##           master   #7026   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files         694     695    +1     
  Lines       42768   42786   +18     
  Branches     5821    5825    +4     
======================================
- Misses      42768   42786   +18     
```


@turboFei changed the title from "Audit the kubernetes pod event type" to "Audit the kubernetes pod event type and fix DELETE event process logical" on Apr 15, 2025
@pan3793 (Member) left a comment

Makes sense. But this cannot be covered by GA (GitHub Actions); a manual test is required.

@turboFei (Member, Author) commented:

Updated the integration test results:
[screenshot]

@pan3793

@turboFei closed this in 82e1673 on Apr 16, 2025
turboFei added a commit that referenced this pull request Apr 16, 2025
…nt process logical

### Why are the changes needed?

1. Audit the Kubernetes resource event type.
2. Fix the processing logic for DELETE events.

Before this PR:

When I deleted the pod manually, Kyuubi still reported `appState=PENDING`:
```
:2025-04-15 13:58:20.320 INFO [-1077768163-pool-36-thread-7] org.apache.kyuubi.engine.KubernetesApplicationAuditLogger: eventType=DELETE	label=3c58e9fd-cf8c-4cc3-a9aa-82ae40e200d8	context=97	namespace=dls-prod	pod=kyuubi-spark-3c58e9fd-cf8c-4cc3-a9aa-82ae40e200d8-driver	podState=Pending	containers=[]	appId=spark-cd125bbd9fc84ffcae6d6b5d41d4d8ad	appState=PENDING	appError=''
```

It seems that the pod status carried by the DELETE event is a snapshot taken before the pod was deleted.

After that, no further events were received for this pod, and the batch eventually FINISHED with application state `NOT_FOUND`.

<img width="1389" alt="image" src="https://github.com/user-attachments/assets/5df03db6-0924-4a58-9538-b196fbf87f32" />

It seems the DELETE event needs special handling:

1. Derive the application state from the pod/container states.
2. If the derived applicationState is terminated, return it directly.
3. Otherwise, return FAILED, because the pod has been deleted.

### How was this patch tested?

<img width="1614" alt="image" src="https://github.com/user-attachments/assets/11e64c6f-ad53-4485-b8d2-a351bb23e8ca" />

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #7026 from turboFei/k8s_audit.

Closes #7026

4e5695d [Wang, Fei] for delete
c167572 [Wang, Fei] audit the pod event type

Authored-by: Wang, Fei <[email protected]>
Signed-off-by: Wang, Fei <[email protected]>
(cherry picked from commit 82e1673)
Signed-off-by: Wang, Fei <[email protected]>
@turboFei (Member, Author) commented:

Thanks, merged to 1.11.0 and 1.10.2.

@turboFei deleted the k8s_audit branch on April 16, 2025 05:37
turboFei added a commit to turboFei/kyuubi that referenced this pull request Aug 27, 2025

3 participants