Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sample-apiserver returns 200 with error marshalling protobuf response #50342

Closed
cheftako opened this issue Aug 8, 2017 · 13 comments · Fixed by #67041
Closed

sample-apiserver returns 200 with error marshalling protobuf response #50342

cheftako opened this issue Aug 8, 2017 · 13 comments · Fixed by #67041
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.

Comments

@cheftako
Copy link
Member

cheftako commented Aug 8, 2017

/kind bug

What happened:
Sent a request for an aggregated resource (sample-api wardle/flunders). Response was a 200 status code and an error message saying the response could not be given.

What you expected to happen:
We should have an error status code.

How to reproduce it (as minimally and precisely as possible):

  1. Set up sample-apiserver.
  2. curl -k -v -XGET -H "User-Agent: kubectl/v1.8.0 (linux/amd64) kubernetes/49ae996" -H "Accept: application/vnd.kubernetes.protobuf" -H "Authorization: Bearer " https:///apis/wardle.k8s.io/v1alpha1/namespaces/default/flunders
  3. Response
    < HTTP/1.1 200 OK
    < Content-Length: 173
    < Content-Type: application/vnd.kubernetes.protobuf
    < Date: Tue, 08 Aug 2017 22:57:37 GMT
    <
    k8s
    �v1��Status���
  • Connection #0 to host 35.193.112.40 left intact
    ��Failure�~object *v1alpha1.FlunderList does not implement the protobuf marshalling interface and cannot be encoded to a protobuf message"0���"

Anything else we need to know?:

Environment:

  • Kubernetes version (use kubectl version): 1.8
  • Cloud provider or hardware configuration**: GCE
  • OS (e.g. from /etc/os-release): Debian GNU/Linux 7 (wheezy)
  • Kernel (e.g. uname -a): 3.16.0-4-amd64
  • Install tools:
  • Others:
@k8s-ci-robot k8s-ci-robot added the kind/bug Categorizes issue or PR as related to a bug. label Aug 8, 2017
@k8s-github-robot k8s-github-robot added the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Aug 8, 2017
@cheftako
Copy link
Member Author

cheftako commented Aug 8, 2017

/sig api-machinery

@k8s-ci-robot k8s-ci-robot added the sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. label Aug 8, 2017
@k8s-github-robot k8s-github-robot removed the needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Aug 8, 2017
@liggitt
Copy link
Member

liggitt commented Aug 9, 2017

I expect that's a shortcoming of the example server, not the aggregator itself

@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with an /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 11, 2018
@nikhita
Copy link
Member

nikhita commented Jan 11, 2018

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 11, 2018
@sttts sttts added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Mar 1, 2018
@fejta-bot
Copy link

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 30, 2018
@nikhita
Copy link
Member

nikhita commented Jun 1, 2018

/remove-lifecycle stale
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 1, 2018
@tristanburgess
Copy link
Contributor

Hello, I'd like to take a look into this as my first contribution.

@cheftako
Copy link
Member Author

cheftako commented Aug 1, 2018

/assign @tristanburgess

@k8s-ci-robot
Copy link
Contributor

@cheftako: GitHub didn't allow me to assign the following users: tristanburgess.

Note that only kubernetes members and repo collaborators can be assigned.
For more information please see the contributor guide

In response to this:

/assign @tristanburgess

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tristanburgess
Copy link
Contributor

tristanburgess commented Aug 3, 2018

Looking into this so far, I'm able to reproduce the issue, and also that using -H "Accept: application/json" does serialize correctly:

~/go/src/k8s.io/sample-apiserver$ curl -k -v  -H "Accept: application/json" -H "Authorization: Bearer " http://localhost:8080/apis/wardle.k8s.io/v1alpha1/namespaces/default/flunders --output -
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /apis/wardle.k8s.io/v1alpha1/namespaces/default/flunders HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.58.0
> Accept: application/json
> Authorization: Bearer 
> 
< HTTP/1.1 200 OK
< Content-Length: 673
< Content-Type: application/json
< Date: Fri, 03 Aug 2018 01:51:15 GMT
< 
{
  "kind": "FlunderList",
  "apiVersion": "wardle.k8s.io/v1alpha1",
  "metadata": {
    "selfLink": "/apis/wardle.k8s.io/v1alpha1/namespaces/default/flunders",
    "resourceVersion": "5"
  },
  "items": [
    {
      "metadata": {
        "name": "my-first-flunder",
        "namespace": "default",
        "selfLink": "/apis/wardle.k8s.io/v1alpha1/namespaces/default/flunders/my-first-flunder",
        "uid": "3286f86a-96bd-11e8-bad0-0242ac110003",
        "resourceVersion": "2",
        "creationTimestamp": "2018-08-03T01:33:17Z",
        "labels": {
          "sample-label": "true"
        }
      },
      "spec": {
        
      },
      "status": {}
    }
  ]
}

There are potentially two separate considerations out of this as far as I can tell (in addition to the original issue):

  1. sample-apiserver's example resource types do serialize JSON correctly, but does not appear implement protobuf serialization. I still need to confirm that it definitely isn't implemented (the error conditions at play may be too general to pick up a more specific error in play). There's a Fischer example resource type in addition to Flunder. Would be worth confirming the behavior of that type as well. If desired, I can also file a feature request to implement protobuf serialization for the example resource types.

  2. I will investigate separately if disabling the JSON serial/deserial impl has a similar effect as this bug, or a more desirable effect; either way, would be good to unify the two behaviors. Lumped in with this part of the investigation will be to figure out what's handling the protobuf unmarshalable error from apimachinery, and ultimately giving the 200.

In terms of the actual status code we want to return, would 406 Not Acceptable make sense for this?
https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/406
Looking at the doc above it seems to fit the situation that the requestor requests a response format that isn't servable. It looks like completely invalid Accept: formats to the API do serve a 406. Any thoughts?

curl -k -v  -H "Accept: application/invalid" -H "Authorization: Bearer " http://localhost:8080/apis/wardle.k8s.io/v1alpha1/namespaces/default/flunders --output -
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /apis/wardle.k8s.io/v1alpha1/namespaces/default/flunders HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.58.0
> Accept: application/invalid
> Authorization: Bearer 
> 
< HTTP/1.1 406 Not Acceptable
< Content-Length: 182
< Content-Type: application/json
< Date: Fri, 03 Aug 2018 02:12:26 GMT
< 
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {},
  "status": "Failure",
  "message": "406: Not Acceptable",
  "reason": "NotAcceptable",
  "details": {},
  "code": 406
}

@tristanburgess
Copy link
Contributor

tristanburgess commented Aug 3, 2018

Pertaining to 2), JSON serialization appears to be baked in to the API as the default, so there really isn't a similar unmarshalable error for it. A JSON marshalization fault at the apimachinery level currently would prevent the apiserver from starting up successfully. Therefore, only have to worry about protobuf for this.

@tristanburgess
Copy link
Contributor

tristanburgess commented Aug 4, 2018

For 1), .proto definitions definitely weren't created for the extension API resource types. I generated them and confirmed that API response was successful after that. I'll file a separate ticket/PR for the addition to the sample APIs after this issue is fixed.

~/go/src/k8s.io/sample-apiserver$ curl -k -v  -H "Accept: application/vnd.kubernetes.protobuf" -H "Authorization: Bearer " http://localhost:8080/apis/wardle.k8s.io/v1alpha1/namespaces/default/flunders/my-first-flunder --output -
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to localhost (127.0.0.1) port 8080 (#0)
> GET /apis/wardle.k8s.io/v1alpha1/namespaces/default/flunders/my-first-flunder HTTP/1.1
> Host: localhost:8080
> User-Agent: curl/7.58.0
> Accept: application/vnd.kubernetes.protobuf
> Authorization: Bearer 
> 
< HTTP/1.1 200 OK
< Content-Length: 236
< Content-Type: application/vnd.kubernetes.protobuf
< Date: Sat, 04 Aug 2018 00:13:30 GMT
< 
k8s
!
�wardle.k8s.io/v1alpha1�Flunder���
��
�my-first-flunder��default"I/apis/wardle.k8s.io/v1alpha1/namespaces/default/flunders/my-first-flunder*$314ac05b-977b-11e8-a525-0242ac1100032�28�ߓ��Z�

sample-label��truez�

@tristanburgess
Copy link
Contributor

tristanburgess commented Aug 5, 2018

I had a starting implementation that successfully returned a 406 on protobuf marshalling definition not implemented (errNotMarshalable), but on running unit tests I stumbled upon a larger problem.

There's an order of operations issue which is preventing http status code from being handled properly with both serialization encode success and serialization encode failure between apiserver and apimachinery.

http.ResponseWriter::WriteHeader(statusCode) can only be called once.
http.ResponseWriter::Write([]byte) will call WriteHeader(200) if WriteHeader(statusCode) is not called explicitly beforehand.

In k8s.io/apiserver/pkg/endpoints/responsewriters/writers::SerializeObject(), WriteHeader() is called with the statusCode passed in before calling Encode(),which generates a serialized response body with machinery in k8s.io/apimachinery/pkg/runtime/serializer.

The problem in SerializeObject() and child methods is if encoding fails, regardless of the failure reason. We have already called WriteHeader() before Encode(), so the Encode() error handler in SerializeObject(), errorJSONFatal(), has no net effect on the status code in the response, even though it generates an appropriate error and tries to call WriteHeader() with the new status code.

The specific case of encode failing can be resolved by calling WriteHeader() only after Encode(). If there's an error during encoding, no full Write() will occur such that the default 200 status code will be set in place. Thus, errorJSONFatal() would successfully write the new error code resultant from encode failure, which includes handling of the errNotMarshalable protobuf error enhancements I have added.

The catch is, this change would now cause successful encode to return the wrong status code, because our Encode(), and even Golangs built in encoding/json package (which is used for our JSON encoding), will both translate and Write() the data, which in turn will call WriteHeader(200), because we haven't called WriteHeader() ourselves yet.

The solution here is likely to move header modifications closer to Write() time, to allow for this differentiation between original statusCode and a potential statusCode resultant from an encode failure.

Two questions I'm currently weighing are:

  1. If the original status code is failure (4xx, 5xx), and encoding also fails, what's the right status code and response?

  2. If we move WriteHeader() closer to Write(), such that pre-Write() errors can be handled with the correct status, but Write() itself fails, we still have a problem where we've already called WriteHeader(), so we can't further change the status code. What's the correct behavior here? Possibly to return the original status code but generate a raw text response body instead.

cdkbot-zz pushed a commit to juju-solutions/kubernetes that referenced this issue Aug 18, 2018
…andling_refinement_for_serialization_encode

Automatic merge from submit-queue (batch tested with PRs 67041, 66948). If you want to cherry-pick this change to another branch, please follow the instructions <a href="https://github.com/kubernetes/community/blob/master/contributors/devel/cherry-picks.md">here</a>.

50342: Establish '406 Not Acceptable' response for protobuf serializa…

…tion 'errNotMarshalable'

     - Added metav1.Status() that enforces '406 Not Acceptable' response if
    protobuf serialization is not fully supported for the API resource type.
     - JSON and YAML serialization are supposed to be more completely baked
    in, so serialization involving those, and general errors with seralizing
    protobuf, will return '500 Internal Server Error'.
	- If serialization failure occurs and original HTTP status code is
    error, use the original status code, else use the serialization failure
    status code.
     - Write encoded API responses to intermediate buffer
     - Use apimachinery/runtime::Encode() instead of
    apimachinery/runtime/protocol::Encode() in
    apiserver/endpoints/handlers/responsewriters/writers::SerializeObject()
     - This allows for intended encoder error handling to fully work, facilitated by
    apiserver/endpoints/handlers/responsewriters/status::ErrorToAPIResponse() before officially
    writing to the http.ResponseWriter
     - The specific part that wasn't working by ErrorToAPIResponse() was the
    HTTP status code set. A direct call to
    http.ResponseWriter::WriteHeader(statusCode) was made in
    SerializeObject() with the original response status code, before
    performing the encode. Once this
    method is called, it can not again update the status code at a later
    time, with say, an erro status code due to encode failure.
     - Updated relevant apiserver unit test to reflect the new behavior
    (TestWriteJSONDecodeError())
     - Add build deps from make update for protobuf serializer



**What this PR does / why we need it**:
This PR fixes a bug that was blocking extensible error handling in the case that serializing response data fails, and implements a '406 Not Acceptable' status code response if protobuf marshal definitions are not implemented for an API resource type. See commit message for further details.

**Which issue(s) this PR fixes** *(optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged)*:
Fixes kubernetes#50342 

**Special notes for your reviewer**:

**Release note**:

```release-note
Fixed a bug that was blocking extensible error handling when serializing API responses error out. Previously, serialization failures always resulted in the status code of the original response being returned. Now, the following behavior occurs:
   - If the serialization type is application/vnd.kubernetes.protobuf, and protobuf marshaling is not implemented for the requested API resource type, a '406 Not Acceptable is returned'.
   - If the serialization type is 'application/json':
        - If serialization fails, and the original status code was an failure (e.g. 4xx or 5xx), the original status code will be returned.
        - If serialization fails, and the original status code was not a failure (e.g. 2xx), the status code of the serialization failure will be returned. By default, this is '500 Internal Server Error', because JSON serialization is our default, and not supposed to be implemented on a type-by-type basis.

```
@liggitt liggitt changed the title Aggregator returns 200 with error marshalling protobuf response sample-apiserver returns 200 with error marshalling protobuf response Dec 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants