
Add ListPager.EachListItem util #75849

Merged: 1 commit merged into kubernetes:master on Apr 11, 2019


jpbetz
Contributor

@jpbetz jpbetz commented Mar 28, 2019

Introduce a ListPager.EachListItem convenience utility for incrementally processing chunked List results. This makes it easy for a client to request list chunks from the apiserver only as fast as it can process them. EachListItem buffers up to PageBufferSize (default: 10) pages in the background to minimize foreground wait time.

Example usage:

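podPager := pager.New(pager.SimplePageFunc(func(opts metav1.ListOptions) (runtime.Object, error) {
  // pre-1.18 client-go signature; newer releases use List(ctx, opts)
  return client.CoreV1().Pods(metav1.NamespaceAll).List(opts)
}))
err := podPager.EachListItem(ctx, metav1.ListOptions{}, func(obj runtime.Object) error {
  pod, ok := obj.(*v1.Pod)
  if !ok {
    return fmt.Errorf("unexpected object type %T", obj)
  }
  doSomethingWithThePod(pod)
  return nil // returning a non-nil error stops the iteration
})
if err != nil {
  // err is either the error returned by fn or an error from listing
}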

There are some warts with this approach that I'd like to find a way to smooth over, namely: (1) having to create the pager, and (2) having to cast each item to the correct type. But it should be possible to address these via code generators in the future.
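The second wart, casting each item, could alternatively be hidden behind a generic adapter. This is a hypothetical sketch, not something in client-go or this PR: it uses Go 1.18+ generics (which postdate this PR) instead of the code generators mentioned above, with `Pod` and `any` standing in for `*v1.Pod` and `runtime.Object`.

```go
package main

import "fmt"

// Pod is a hypothetical stand-in for a concrete API type such as *v1.Pod.
type Pod struct{ Name string }

// typed wraps a callback over a concrete type T as the untyped callback an
// EachListItem-style iterator expects, turning a failed type assertion into an
// error instead of leaving the cast to every caller. Hypothetical helper.
func typed[T any](fn func(T) error) func(any) error {
	return func(obj any) error {
		t, ok := obj.(T)
		if !ok {
			return fmt.Errorf("unexpected object type %T", obj)
		}
		return fn(t)
	}
}
```

With such an adapter, a caller would write `typed(func(p *Pod) error { ... })` and never see the untyped object.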

The name and function signature were picked to be consistent with the existing meta.EachListItem function.

Add ListPager.EachListItem utility function to client-go to enable incremental processing of chunked list responses

/kind feature
/priority important-longterm
/sig api-machinery

@k8s-ci-robot k8s-ci-robot added the release-note, kind/feature, priority/important-longterm, size/L, sig/api-machinery, cncf-cla: yes, and sig/apps labels Mar 28, 2019
@k8s-ci-robot k8s-ci-robot requested review from gmarek and liggitt March 28, 2019 23:28
@jpbetz
Contributor Author

jpbetz commented Mar 28, 2019

@smarterclayton if you get a minute would you glance at the ListPager.EachListItem function I'm proposing here?

@jpbetz jpbetz changed the title Paginate node List in PodGC controller, add pager.EachListItem util Paginate node List in PodGC controller, add ListPager.EachListItem util Mar 28, 2019
@jpbetz jpbetz changed the title Paginate node List in PodGC controller, add ListPager.EachListItem util [WIP] Paginate node List in PodGC controller, add ListPager.EachListItem util Mar 29, 2019
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress label Mar 29, 2019
@jpbetz jpbetz changed the title [WIP] Paginate node List in PodGC controller, add ListPager.EachListItem util Add ListPager.EachListItem util Apr 1, 2019
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress label Apr 1, 2019
@jpbetz jpbetz force-pushed the pagination-podgc branch from 3cdd565 to 5de2b33 April 1, 2019 22:58
@jpbetz
Contributor Author

jpbetz commented Apr 1, 2019

/cc @jingyih

@k8s-ci-robot
Contributor

@jpbetz: GitHub didn't allow me to request PR reviews from the following users: jingyih.

Note that only kubernetes members and repo collaborators can review this PR, and authors cannot review their own PRs.

In response to this:

/cc @jingyih

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@jpbetz jpbetz force-pushed the pagination-podgc branch 2 times, most recently from 295fa6a to 6708a75 April 2, 2019 17:07
@jpbetz
Contributor Author

jpbetz commented Apr 2, 2019

/retest

1 similar comment
@jpbetz
Contributor Author

jpbetz commented Apr 2, 2019

/retest

@jpbetz jpbetz force-pushed the pagination-podgc branch from 6708a75 to bb6a508 April 2, 2019 19:31
@jpbetz
Contributor Author

jpbetz commented Apr 2, 2019

/retest

@jpbetz
Contributor Author

jpbetz commented Apr 2, 2019

/cc @smarterclayton @jingyih @wojtek-t Looking for reviewers for this, would any of you have time to give it a look?

@jpbetz
Contributor Author

jpbetz commented Apr 4, 2019

Friendly nudge for review

Contributor

@jingyih jingyih left a comment


I think I generally understand the approach. Added a few comments. Thanks!

@@ -115,3 +127,86 @@ func (p *ListPager) List(ctx context.Context, options metav1.ListOptions) (runti
options.Continue = m.GetContinue()
}
}

// EachListItem invokes fn on each runtime.Object in the list. Any error immediately terminates the
Contributor


It might be helpful to add more details here.

The function description says it invokes fn on each object in the list, but the function signature has no list parameter, so the list has to be generated (retrieved) internally.

The listing inside this function is similar, but not identical, to (*ListPager) List(): it does not fall back to a full list on a resource-expired error. I feel users may assume that this function lists using the same mechanism. You already mention that an "Expired" error may be returned; I just feel a little more clarification on the internal listing mechanism would be helpful.

Contributor Author


Thanks, I've rewritten this with more detail.

return nil
})
if err == stoppedErr {
err = nil // stoppedErr is an internal signal
Contributor


I understand what this line does: returning because we were signaled to stop is not actually an error.

Do we need to define an internal error? Could we use a context cancel function to signal eachListChunk() to stop? Upon closing stopC, fn can return nil (line 202), and the loop inside eachListChunk() will be stopped by ctx.Done().

Contributor Author


You're right, a nested context works here. Since the context error should be returned iff the caller cancels the context, reasoning through when and how the cancelation error gets handled is a bit subtle, so I've added a comment explaining it. I've also added test coverage for cancelation, both for when an fn error results in an early exit and for when the caller calls cancel.

bgResultC <- err
}()

for o := range chunkC {
Contributor


If listing a chunk in the background fails (for example, because the resource has expired), will we still finish processing the existing chunks in the buffer before returning?

Contributor Author


Yes. From an API contract perspective, when an error occurs while listing chunks, I think it's valid either to (1) stop calling fn as soon as possible, or (2) call fn on as many items as have been successfully retrieved. I went with #1 since it was trivial to implement. I imagine some clients would benefit more from one of these behaviors and others from the other. Thoughts?
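The two contracts described above can be sketched as one consumer loop with a policy switch. This is illustrative only: `consume`, `simulateFailedFetch`, and `errExpired` are hypothetical, and the producer is assumed to close `pages` when it stops and to send exactly one result on `bgErr`.

```go
package main

import "errors"

// errExpired is a hypothetical stand-in for an "Expired" error from the apiserver.
var errExpired = errors.New("resource expired")

// consume calls fn on items from pages. With drain == true it follows option (2):
// process everything already buffered, then return the fetch error. With
// drain == false it follows option (1): return a fetch error as soon as it is seen.
func consume(pages <-chan []string, bgErr <-chan error, drain bool, fn func(string) error) error {
	var fetchErr error
	fetchDone := false
	for items := range pages {
		if !drain && !fetchDone {
			select {
			case fetchErr = <-bgErr: // non-blocking peek at the producer's result
				fetchDone = true
				if fetchErr != nil {
					return fetchErr
				}
			default: // producer has not finished yet
			}
		}
		for _, it := range items {
			if err := fn(it); err != nil {
				return err
			}
		}
	}
	if !fetchDone {
		fetchErr = <-bgErr
	}
	return fetchErr
}

// simulateFailedFetch mimics a producer that buffered the given pages and then
// failed with errExpired. It exists only to exercise consume.
func simulateFailedFetch(buffered [][]string) (<-chan []string, <-chan error) {
	pages := make(chan []string, len(buffered))
	bgErr := make(chan error, 1)
	for _, p := range buffered {
		pages <- p
	}
	bgErr <- errExpired
	close(pages)
	return pages, bgErr
}
```

Option (1) is inherently best-effort here: a fetch error that arrives while the consumer is inside `fn` is only noticed at the next page.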


stopC := make(chan struct{})
bgResultC := make(chan error, 1)
go func() {
Member


defer runtime.HandleCrash() at the top of the goroutine?

Contributor Author


Good catch, I should do something with panics. I've gone with runtime.RecoverFromPanic here. Hopefully it's appropriate.

Contributor Author


runtime.RecoverFromPanic didn't work as I had expected (and from a quick grep, doesn't look like we use it in the k8s codebase), so I did a more direct recover.

Member


Why can't you use runtime.HandleCrash()? It's widely used in the codebase and also lets you pass additional custom handlers.

Contributor Author


I had initially misunderstood it as catching and swallowing the panic, but it does still crash. Updating the PR to use it now.
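As a rough illustration of the behavior described in this thread, the following is a simplified, standalone imitation of the `HandleCrash` pattern (recover, log, run extra handlers, re-panic). It is not the apimachinery source; `reallyCrash`, `handleCrash`, and `startBackground` are hypothetical names.

```go
package main

import "log"

// reallyCrash mimics a package-level switch: true means re-panic after the handlers run.
var reallyCrash = true

// handleCrash must be called via defer. It recovers the panic, logs it, runs any
// extra handlers, then panics again so the process still crashes instead of
// silently swallowing the failure.
func handleCrash(additionalHandlers ...func(any)) {
	if r := recover(); r != nil {
		log.Printf("observed a panic: %v", r)
		for _, fn := range additionalHandlers {
			fn(r)
		}
		if reallyCrash {
			panic(r)
		}
	}
}

// startBackground shows the placement the reviewer suggested: deferred at the
// top of the goroutine, so a panic in work is logged before the crash.
func startBackground(work func()) {
	go func() {
		defer handleCrash()
		work()
	}()
}
```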

// PageBufferSize chunks of list items concurrently in the background. If the chunk attempt fails, an "Expired"
// error may be returned.
func (p *ListPager) eachListChunkBuffered(ctx context.Context, options metav1.ListOptions, fn func(obj runtime.Object) error) error {
chunkC := make(chan runtime.Object, p.PageBufferSize)
Member


error if p.PageBufferSize <= 0?

Contributor Author


Yes. I've made it an error only for < 0, since a zero-size buffer is supported (the tests demonstrate this).


// eachListChunkBuffered invokes fn on each runtime.Object list chunk. It buffers up to
// PageBufferSize chunks of list items concurrently in the background. If the chunk attempt fails, an "Expired"
// error may be returned.
Member


Need to document the return value. I think if fn returns an error, processing stops and that error is returned; if fn does not return an error, any error encountered while fetching the list (including timeout or cancelled errors from the context) is returned.

Contributor Author

@jpbetz jpbetz Apr 5, 2019


Thanks, yes, I've rewritten this with full details on errors.

@jpbetz jpbetz force-pushed the pagination-podgc branch from bb6a508 to b351760 April 5, 2019 21:53
@jpbetz jpbetz force-pushed the pagination-podgc branch 2 times, most recently from 5079ed7 to 0c80995 April 5, 2019 23:37
@jpbetz
Contributor Author

jpbetz commented Apr 9, 2019

@jingyih @liggitt Feedback applied. PTAL!

@jpbetz jpbetz force-pushed the pagination-podgc branch from 0c80995 to 6a64ee6 April 10, 2019 22:04
@liggitt
Member

liggitt commented Apr 11, 2019

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm label Apr 11, 2019
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: jpbetz, liggitt

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved label Apr 11, 2019
@k8s-ci-robot k8s-ci-robot merged commit d5dbc00 into kubernetes:master Apr 11, 2019
@@ -48,6 +50,9 @@ type ListPager struct {
PageFn ListPageFunc

FullListIfExpired bool

// Number of pages to buffer
PageBufferSize int32
Contributor

@tedyu tedyu Apr 11, 2019


I wonder if the buffer should be defined by the memory consumption instead of number of pages.

We can evaluate the size of a few chunks to dynamically determine the number of chunks to buffer, based on desired buffer size (in terms of memory).
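A minimal sketch of the memory-based sizing suggested here, assuming the sizes in bytes of a few already-fetched pages are known; `pagesForBudget` and its parameters are hypothetical and not part of `ListPager`.

```go
package main

// pagesForBudget estimates how many pages fit in budgetBytes from the sizes (in
// bytes) of a few sampled pages. It falls back to defaultPages when there is no
// usable sample and always allows at least one page.
func pagesForBudget(sampleSizes []int, budgetBytes, defaultPages int) int {
	if len(sampleSizes) == 0 {
		return defaultPages
	}
	total := 0
	for _, s := range sampleSizes {
		total += s
	}
	avg := total / len(sampleSizes)
	if avg <= 0 {
		return defaultPages
	}
	n := budgetBytes / avg
	if n < 1 {
		n = 1 // a single page larger than the budget still has to be buffered
	}
	return n
}
```

Because a Go channel's capacity is fixed when it is created, a design like this would need to sample pages before sizing the buffer, or bound memory with a byte-counting semaphore instead of channel capacity.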

Contributor Author


Interesting. Maybe we should wait and see how this approach works out, then optimize as needed from there? We previously had quite a bit of code doing full lists, so as we transition to paginated lists and this sort of incremental processing, my expectation is that we'll reduce memory usage, particularly for object kinds with large counts. If we still hit scalability/performance limits once this is in use, that would seem like a good time to look into optimizing this further.

Contributor


Sounds good.
