Skip to content

A repro for issue #796#797

Closed
basvandijk wants to merge 13 commits intobrendanhay:mainfrom
basvandijk:put-to-minio-malformed-xml
Closed

A repro for issue #796#797
basvandijk wants to merge 13 commits intobrendanhay:mainfrom
basvandijk:put-to-minio-malformed-xml

Conversation

@basvandijk
Copy link
Contributor

@basvandijk basvandijk commented Jul 9, 2022

This adds a reproducible test case for issue #796 "PutObject request failing on MinIO because of malformed XML" by adding an integration test checking amazonka-s3 against MinIO. To execute the test just run the following:

$ nix-build tests/s3.nix
...
s3: must succeed:
    echo 'Hello World!' > ./some-file
    export AWS_ACCESS_KEY_ID="BKIKJAA5BMMU2RHO6IBB"
    export AWS_SECRET_ACCESS_KEY="V7f1CwQqAcwo80UEIJEjc5gVQUSSx5ohQ9GSrr12"
    amazonka-s3-test-app ./some-file http://s3:9000 my-bucket some-file

s3 # > > > > > [   11.524753] minio[818]: {
  "deploymentid": "0de4a2b3-eaf1-46a0-8ec7-2e706e68d3b4",
  "level": "ERROR",
  "errKind": "MINIO",
  "time": "2022-07-09T18:07:28.335404358Z",
  "api": {
    "name": "SYSTEM",
    "args": {}
  },
  "error": {
    "message": "EOF (*errors.errorString)",
    "source": [
      "cmd/handler-utils.go:55:cmd.parseLocationConstraint()",
      "cmd/auth-handler.go:344:cmd.checkRequestAuthTypeCredential()",
      "cmd/auth-handler.go:296:cmd.checkRequestAuthType()",
      "cmd/bucket-handlers.go:719:cmd.objectAPIHandlers.PutBucketHandler()",
      "net/http/server.go:2047:http.HandlerFunc.ServeHTTP()"
    ]
  }
}
s3 # amazonka-s3-test-app:   ServiceError
    ( ServiceError'
        { _serviceErrorAbbrev = Abbrev {fromAbbrev = "S3"},
          _serviceErrorStatus = Status {statusCode = 400, statusMessage = "Bad Request"},
          _serviceErrorHeaders =
            [ ("Accept-Ranges", "bytes"),
              ("Content-Length", "371"),
              ("Content-Security-Policy", "block-all-mixed-content"),
              ("Content-Type", "application/xml"),
              ("Server", "MinIO"),
              ("Strict-Transport-Security", "max-age=31536000; includeSubDomains"),
              ("Vary", "Origin"),
              ("Vary", "Accept-Encoding"),
              ("X-Amz-Bucket-Region", "us-east-1"),
              ("X-Amz-Request-Id", "17003B77BD220CF2"),
              ("X-Content-Type-Options", "nosniff"),
              ("X-Xss-Protection", "1; mode=block"),
              ("Date", "Sat, 09 Jul 2022 18:07:28 GMT")
            ],
          _serviceErrorCode = ErrorCode "MalformedXML",
          _serviceErrorMessage =
            Just
              ( ErrorMessage
                  { fromErrorMessage = "The XML you provided was not well-formed or did not validate against our published schema."
                  }
              ),
          _serviceErrorRequestId =
            Just
              ( RequestId
                  { fromRequestId = "17003B77BD220CF2"
                  }
              )
        }
    )

This adds an integration test checking `amazonka-s3` against MinIO. To
execute the test just run the following:

```
$ nix-build tests/s3.nix
...
s3: must succeed:
    echo 'Hello World!' > ./some-file
    export AWS_ACCESS_KEY_ID="BKIKJAA5BMMU2RHO6IBB"
    export AWS_SECRET_ACCESS_KEY="V7f1CwQqAcwo80UEIJEjc5gVQUSSx5ohQ9GSrr12"
    amazonka-s3-test-app ./some-file http://s3:9000 my-bucket some-file

s3 # > > > > > [   11.524753] minio[818]: {"deploymentid":"0de4a2b3-eaf1-46a0-8ec7-2e706e68d3b4","level":"ERROR","errKind":"MINIO","time":"2022-07-09T18:07:28.335404358Z","api":{"name":"SYSTEM","args":{}},"error":{"message":"EOF (*errors.errorString)","source":["cmd/handler-utils.go:55:cmd.parseLocationConstraint()","cmd/auth-handler.go:344:cmd.checkRequestAuthTypeCredential()","cmd/auth-handler.go:296:cmd.checkRequestAuthType()","cmd/bucket-handlers.go:719:cmd.objectAPIHandlers.PutBucketHandler()","net/http/server.go:2047:http.HandlerFunc.ServeHTTP()"]}}
s3 # amazonka-s3-test-app: ServiceError (ServiceError' {_serviceErrorAbbrev = Abbrev {fromAbbrev = "S3"}, _serviceErrorStatus = Status {statusCode = 400, statusMessage = "Bad Request"}, _serviceErrorHeaders = [("Accept-Ranges","bytes"),("Content-Length","371"),("Content-Security-Policy","block-all-mixed-content"),("Content-Type","application/xml"),("Server","MinIO"),("Strict-Transport-Security","max-age=31536000; includeSubDomains"),("Vary","Origin"),("Vary","Accept-Encoding"),("X-Amz-Bucket-Region","us-east-1"),("X-Amz-Request-Id","17003B77BD220CF2"),("X-Content-Type-Options","nosniff"),("X-Xss-Protection","1; mode=block"),("Date","Sat, 09 Jul 2022 18:07:28 GMT")], _serviceErrorCode = ErrorCode "MalformedXML", _serviceErrorMessage = Just (ErrorMessage {fromErrorMessage = "The XML you provided was not well-formed or did not validate against our published schema."}), _serviceErrorRequestId = Just (RequestId {fromRequestId = "17003B77BD220CF2"})})
```
@basvandijk basvandijk marked this pull request as ready for review July 9, 2022 20:30
@basvandijk
Copy link
Contributor Author

That ^ commit extends the test with capturing the network packets using tcpdump to inspect the request amazonka generates. This is the dump I get:

10:20:48.571111 IP6 ::1.59144 > ::1.9000: Flags [S], seq 4201602683, win 65476, options [mss 65476,sackOK,TS val 2587980450 ecr 0,nop,wscale 7], length 0
`.b?.(.@..................................#(.o^{.........0.........
.Ar.........
10:20:48.571164 IP6 ::1.9000 > ::1.59144: Flags [S.], seq 1752750534, ack 4201602684, win 65464, options [mss 65476,sackOK,TS val 2587980450 ecr 2587980450,nop,wscale 7], length 0
`....(.@................................#(..hx...o^|.....0.........
.Ar..Ar.....
10:20:48.571226 IP6 ::1.59144 > ::1.9000: Flags [.], ack 1, win 512, options [nop,nop,TS val 2587980450 ecr 2587980450], length 0
`.b?. .@..................................#(.o^|hx.......(.....
.Ar..Ar.
10:20:48.572635 IP6 ::1.59144 > ::1.9000: Flags [P.], seq 1:461, ack 1, win 512, options [nop,nop,TS val 2587980451 ecr 2587980450], length 460
`.b?...@..................................#(.o^|hx.............
.Ar..Ar.PUT /some-file HTTP/1.1
Accept-Encoding: gzip
Content-Length: 13
Host: s3:9000
X-Amz-Date: 20220710T102048Z
X-Amz-Content-SHA256: 03ba204e50d126e4674c005e04d82e84c21366780af1f43bd54a37816b6ab340
Expect: 100-continue
Authorization: AWS4-HMAC-SHA256 Credential=BKIKJAA5BMMU2RHO6IBB/20220710/us-east-1/s3/aws4_request, SignedHeaders=expect;host;x-amz-content-sha256;x-amz-date, Signature=a8a6fa5ae7a99054f3b00e694b4413c9bee04bbefdd574af84a8987fb7a4807e


10:20:48.572657 IP6 ::1.9000 > ::1.59144: Flags [.], ack 461, win 508, options [nop,nop,TS val 2587980451 ecr 2587980451], length 0
`.... .@................................#(..hx...o`H.....(.....
.Ar..Ar.
10:20:48.573627 IP6 ::1.9000 > ::1.59144: Flags [P.], seq 1:26, ack 461, win 512, options [nop,nop,TS val 2587980452 ecr 2587980451], length 25
`....9.@................................#(..hx...o`H.....A.....
.Ar..Ar.HTTP/1.1 100 Continue


10:20:48.573656 IP6 ::1.59144 > ::1.9000: Flags [.], ack 26, win 512, options [nop,nop,TS val 2587980452 ecr 2587980452], length 0
`.b?. .@..................................#(.o`Hhx.......(.....
.Ar..Ar.
10:20:48.573789 IP6 ::1.59144 > ::1.9000: Flags [P.], seq 461:474, ack 26, win 512, options [nop,nop,TS val 2587980453 ecr 2587980452], length 13
`.b?.-.@..................................#(.o`Hhx.......5.....
.Ar..Ar.Hello World!

10:20:48.573807 IP6 ::1.9000 > ::1.59144: Flags [.], ack 474, win 512, options [nop,nop,TS val 2587980453 ecr 2587980453], length 0
`.... .@................................#(..hx...o`U.....(.....
.Ar..Ar.
10:20:48.581357 IP6 ::1.9000 > ::1.59144: Flags [P.], seq 26:836, ack 474, win 512, options [nop,nop,TS val 2587980460 ecr 2587980453], length 810
`....J.@................................#(..hx...o`U.....R.....
.Ar..Ar.HTTP/1.1 400 Bad Request
Accept-Ranges: bytes
Content-Length: 371
Content-Security-Policy: block-all-mixed-content
Content-Type: application/xml
Server: MinIO
Strict-Transport-Security: max-age=31536000; includeSubDomains
Vary: Origin
Vary: Accept-Encoding
X-Amz-Bucket-Region: us-east-1
X-Amz-Request-Id: 170070951ABB6B77
X-Content-Type-Options: nosniff
X-Xss-Protection: 1; mode=block
Date: Sun, 10 Jul 2022 10:20:48 GMT

<?xml version="1.0" encoding="UTF-8"?>
<Error><Code>MalformedXML</Code><Message>The XML you provided was not well-formed or did not validate against our published schema.</Message><BucketName>some-file</BucketName><Resource>/some-file</Resource><Region>us-east-1</Region><RequestId>170070951ABB6B77</RequestId><HostId>b31e5451-e212-4657-9db6-a3d4f5f47922</HostId></Error>
10:20:48.581390 IP6 ::1.59144 > ::1.9000: Flags [.], ack 836, win 506, options [nop,nop,TS val 2587980460 ecr 2587980460], length 0
`.b?. .@..................................#(.o`Uhx.
.....(.....
.Ar..Ar.
10:20:48.605507 IP6 ::1.59144 > ::1.9000: Flags [F.], seq 474, ack 836, win 512, options [nop,nop,TS val 2587980484 ecr 2587980460], length 0
`.b?. .@..................................#(.o`Uhx.
.....(.....
.Ar..Ar.
10:20:48.605588 IP6 ::1.9000 > ::1.59144: Flags [F.], seq 836, ack 475, win 512, options [nop,nop,TS val 2587980484 ecr 2587980484], length 0
`.... .@................................#(..hx.
.o`V.....(.....
.Ar..Ar.
10:20:48.605613 IP6 ::1.59144 > ::1.9000: Flags [.], ack 837, win 512, options [nop,nop,TS val 2587980484 ecr 2587980484], length 0
`.b?. .@..................................#(.o`Vhx.......(.....
.Ar..Ar.

So you can see a packet containing the HTTP request PUT /some-file and later a packet containing the contetns Hello World!.

You can also see the MinIO response The XML you provided was not well-formed . Indeed we didn't provide any XML. We just PUT the contents Hello World! without wrapping it in any XML.

What's at fault here? Amazonka? MinIO? Or am I missing something?

@basvandijk
Copy link
Contributor Author

I'm suspicious of the fact that, even though the name of the bucket is specified in the PutObject request:

PutObject'
    { contentLength = Nothing
    , objectLockMode = Nothing
    , expires = Nothing
    , grantReadACP = Nothing
    , sSECustomerAlgorithm = Nothing
    , sSECustomerKey = Nothing
    , requestPayer = Nothing
    , grantWriteACP = Nothing
    , bucketKeyEnabled = Nothing
    , websiteRedirectLocation = Nothing
    , grantRead = Nothing
    , storageClass = Nothing
    , sSECustomerKeyMD5 = Nothing
    , sSEKMSKeyId = Nothing
    , grantFullControl = Nothing
    , contentEncoding = Nothing
    , tagging = Nothing
    , contentMD5 = Nothing
    , objectLockRetainUntilDate = Nothing
    , metadata = fromList []
    , sSEKMSEncryptionContext = Nothing
    , cacheControl = Nothing
    , contentLanguage = Nothing
    , objectLockLegalHoldStatus = Nothing
    , acl = Nothing
    , contentDisposition = Nothing
    , expectedBucketOwner = Nothing
    , serverSideEncryption = Nothing
    , contentType = Nothing
    , bucket = BucketName "my-bucket"
    , key = ObjectKey "some-file"
    , body = Hashed HashedStream
        { sha256 = 03 ba204e50d126e4674c005e04d82e84c21366780af1f43bd54a37816b6ab340
        , length = 13
        }
    }

that bucket name is not included in the HTTP request to MinIO:

PUT /some-file HTTP/1.1
Accept-Encoding: gzip
Content-Length: 13
Host: s3:9000
X-Amz-Date: 20220710T165128Z
X-Amz-Content-SHA256: 03ba204e50d126e4674c005e04d82e84c21366780af1f43bd54a37816b6ab340
Expect: 100-continue
Authorization: AWS4-HMAC-SHA256 Credential=BKIKJAA5BMMU2RHO6IBB/20220710/us-east-1/s3/aws4_request, SignedHeaders=expect;host;x-amz-content-sha256;x-amz-date, Signature=67a8e6a8f495e6bcad4fc8ea7c65180e113f8a0a4c5e44f2565ff365806c1626


Maybe that explains why MinIO is not expecting to receive an object at URL /some-file but is expecting XML instead because that URL is not for uploading objects.

@basvandijk
Copy link
Contributor Author

If I look at examples of how to put an object to MinIO I see the request path needs to begin with the bucket name. So in our example we need the request to be:

PUT /my-bucket/some-file 

I'll dive into the amazonka source code to see why the bucket name is omitted.

@basvandijk
Copy link
Contributor Author

@basvandijk
Copy link
Contributor Author

I found the root cause: the default request of a PutObject contains the right path with bucketName/objectKey. However it's passed through the s3vhost function which strips of the bucket name from the path and moves it to the host.

The right solution is probably to only apply the s3vhost transformation if we're dealing with an AWS endpoint.

basvandijk added a commit to basvandijk/amazonka that referenced this pull request Jul 12, 2022
@basvandijk
Copy link
Contributor Author

basvandijk commented Jul 12, 2022

@endgame, @akshaymankar basvandijk/amazonka@put-to-minio-malformed-xml...put-to-minio-malformed-xml-fix-attempt-1 tries to fix #760 by adding the field _serviceRewriteS3VHost :: Bool to the Service record which is False for all services except for S3.

A user can then override this field in the environment by setting it to False:

  defEnv <- Amazonka.newEnv Amazonka.Auth.fromKeysEnv
  let env =
        flip Amazonka.override defEnv $
          (Amazonka.serviceEndpoint .~ endpoint)
            . (Amazonka.serviceRewriteS3VHost .~ False)

Then s3vhost will not rewrite the passed request rq if not $ rq ^. requestService . serviceRewriteS3VHost.

This doesn't work unfortunately since the AWSRequest instance for PutObject overrides the Service record in the passed Request with defaultService, thereby setting the _serviceRewriteS3VHost back to True.

Possible solution

To fix this I think the request method implementations should stop setting a defaultService in the Request like we do here. Instead we need to have a type class:

class AWSService a where
  service :: Proxy a -> Service

and have instances for all request types like for example:

instance AWSService PutObject where
  service _proxy = Amazonka.S3.Types.defaultService

then the request method needs to accept a Service argument:

class AWSRequest a where
  ...
  request :: Service -> a -> Request a

The instance for PutObject for example will then look like:

instance Core.AWSRequest PutObject where
  type AWSResponse PutObject = PutObjectResponse
-  request =
+  request srv =
    Request.expectHeader
      Prelude.. Request.s3vhost
-      Prelude.. Request.putBody defaultService
+      Prelude.. Request.putBody srv

configureRequest will then override the service with the user supplied overrides and apply the request method to it:

- configureRequest :: AWSRequest a => Env' withAuth -> a -> Request a
+ configureRequest :: forall a. (AWSRequest a, AWSService a) => Env' withAuth -> a -> Request a
configureRequest env x =
  let overrides = envOverride env
+     srv = appEndo (getDual overrides) $ service (Proxy :: Proxy a)
-    in request x & requestService %~ appEndo (getDual overrides)
+    in request srv x

We could then also consider dropping the _requestService field from Request but I'm not sure if that's used for other things.

@endgame
Copy link
Collaborator

endgame commented Aug 27, 2022

This seems overwrought; can you not just set the rewriteS3VHost in an override with the rest of the amazonka Env? The requestService appears to be modified after the default request is generated by the typeclass instance:

configureRequest :: AWSRequest a => Env' withAuth -> a -> Request a
configureRequest env x =
let overrides = envOverride env
in request x & requestService %~ appEndo (getDual overrides)

@basvandijk
Copy link
Contributor Author

... can you not just set the rewriteS3VHost in an override with the rest of the amazonka Env? ...

I tried that:

basvandijk/amazonka@put-to-minio-malformed-xml...put-to-minio-malformed-xml-fix-attempt-1 tries to fix #760 by adding the field _serviceRewriteS3VHost :: Bool to the Service record which is False for all services except for S3.

But:

This doesn't work unfortunately since the AWSRequest instance for PutObject overrides the Service record in the passed Request with defaultService, thereby setting the _serviceRewriteS3VHost back to True.

Hence my suggestion for that "possible solution".

ivb-supercede pushed a commit to SupercedeTech/amazonka that referenced this pull request Sep 13, 2022
ivb-supercede pushed a commit to SupercedeTech/amazonka that referenced this pull request Sep 13, 2022
ivb-supercede added a commit to SupercedeTech/amazonka that referenced this pull request Sep 14, 2022
This implements a change suggested by this comment:
brendanhay#797 (comment)
as part of a fix to the following issue:
brendanhay#760

This modifies the code generation for AWS endpoints so that requests
which make other requests all share the same (passed-in) `Service`. This
is done by making `request` take an extra `Service` argument. This also
adds a new parent class of `AWSRequest`, called `AWSService`, which
defines the default service used for each request (this being
`defaultService`.)

These are some breaking changes, but they are required to allow modified
`Service`s to be used in complex requests. This is necessary to allow
VHost rewriting to be turned off for local mocking and other use cases
at the `Service` level.

Co-authored-by: Bas van Dijk <v.dijk.bas@gmail.com>
ivb-supercede added a commit to SupercedeTech/amazonka that referenced this pull request Sep 14, 2022
Originally "Non-working attempt to fix the test failure in PR brendanhay#797"

This adds a new field to `Service` which controls if requests to the
service should have their URLs rewritten using `s3vhost`. This is
particularly relevant for mocks of S3 which don't support having the
bucket name in the hostname (e.g. because they are bound to `localhost`,
which isn't a domain and can't have a subdomain.)

This field is defaulted to `True` to preserve the current behaviour for
all services.

Co-authored-by: Isaac van Bakel <isaac@supercede.com>
@ivb-supercede
Copy link
Contributor

FYI @basvandijk your fix is now implemented over on our branch here: https://github.com/SupercedeTech/amazonka/commits/temp-vhost-fix

I've tested it for our use case (using Moto as an S3 mock) and it works as expected.

One particular issue of note was this line, which I believe is the cause for #760, in part. Because the service overrides are originally applied after the request is built, the call to setEndpoint resets the service host after the bucket name is prepended in s3vhost - so the URL which is presigned ends up being incorrect, missing the bucket in both the path and hostname.

The fix involving AWSService (or, at minimum, passing the Service in to request) seems necessary to resolve this, since Service overrides alone don't seem to accurately manage it.

kfigiela added a commit to kfigiela/amazonka that referenced this pull request Oct 18, 2022
This implements a change suggested by this comment:
brendanhay#797 (comment)
as part of a fix to the following issue:
brendanhay#760

This modifies the code generation for AWS endpoints so that requests
which make other requests all share the same (passed-in) `Service`. This
is done by making `request` take an extra `Service` argument. This also
adds a new parent class of `AWSRequest`, called `AWSService`, which
defines the default service used for each request (this being
`defaultService`.)

These are some breaking changes, but they are required to allow modified
`Service`s to be used in complex requests. This is necessary to allow
VHost rewriting to be turned off for local mocking and other use cases
at the `Service` level.

Co-authored-by: Bas van Dijk <v.dijk.bas@gmail.com>

# Conflicts:
#	gen/src/Gen/AST/Data/Syntax.hs
kfigiela pushed a commit to kfigiela/amazonka that referenced this pull request Oct 18, 2022
Originally "Non-working attempt to fix the test failure in PR brendanhay#797"

This adds a new field to `Service` which controls if requests to the
service should have their URLs rewritten using `s3vhost`. This is
particularly relevant for mocks of S3 which don't support having the
bucket name in the hostname (e.g. because they are bound to `localhost`,
which isn't a domain and can't have a subdomain.)

This field is defaulted to `True` to preserve the current behaviour for
all services.

Co-authored-by: Isaac van Bakel <isaac@supercede.com>
@kfigiela
Copy link
Contributor

We're also experiencing the issue with s3vhost rewriting, as we use localstack instead of real AWS to run our test suites.

What are the next steps regarding wrt. the patch proposed by @ivb-supercede?

endgame pushed a commit to endgame/amazonka that referenced this pull request Oct 27, 2022
This implements a change suggested by this comment:
brendanhay#797 (comment)
as part of a fix to the following issue:
brendanhay#760

This modifies the code generation for AWS endpoints so that requests
which make other requests all share the same (passed-in) `Service`. This
is done by making `request` take an extra `Service` argument. This also
adds a new parent class of `AWSRequest`, called `AWSService`, which
defines the default service used for each request (this being
`defaultService`.)

These are some breaking changes, but they are required to allow modified
`Service`s to be used in complex requests. This is necessary to allow
VHost rewriting to be turned off for local mocking and other use cases
at the `Service` level.

Co-authored-by: Bas van Dijk <v.dijk.bas@gmail.com>
endgame pushed a commit to endgame/amazonka that referenced this pull request Oct 27, 2022
Originally "Non-working attempt to fix the test failure in PR brendanhay#797"

This adds a new field to `Service` which controls if requests to the
service should have their URLs rewritten using `s3vhost`. This is
particularly relevant for mocks of S3 which don't support having the
bucket name in the hostname (e.g. because they are bound to `localhost`,
which isn't a domain and can't have a subdomain.)

This field is defaulted to `True` to preserve the current behaviour for
all services.

Co-authored-by: Isaac van Bakel <isaac@supercede.com>
@endgame endgame mentioned this pull request Oct 27, 2022
endgame pushed a commit to endgame/amazonka that referenced this pull request Oct 27, 2022
Originally "Non-working attempt to fix the test failure in PR brendanhay#797"

This adds a new field to `Service` which controls if requests to the
service should have their URLs rewritten using `s3vhost`. This is
particularly relevant for mocks of S3 which don't support having the
bucket name in the hostname (e.g. because they are bound to `localhost`,
which isn't a domain and can't have a subdomain.)

This field is defaulted to `True` to preserve the current behaviour for
all services.

Co-authored-by: Isaac van Bakel <isaac@supercede.com>
@endgame endgame closed this in #832 Nov 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants