Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for multipart/form-data in the request body #2321

Merged

Conversation

ilanashapiro
Copy link
Contributor

@ilanashapiro ilanashapiro commented Jul 3, 2023

Addresses #2008

Summary

Currently, validation always fails whenever a multipart/form-data content type is used in the request body. This prevents the normal usage of prism without any other proxy server (like nginx) on which such requests can be filtered out. Now, users can successfully use multipart/form-data content type in the request body.

Checklist

  • The basics
    • I tested these changes manually in my local or dev environment
  • Tests
    • Added or updated
    • N/A
  • Event Tracking
    • I added event tracking and followed the event tracking guidelines
    • N/A
  • Error Reporting
    • I reported errors and followed the error reporting guidelines
    • N/A

Screenshots

Given the following file called examples/testapi.yaml:

openapi: '3.1.0'
paths:
  /path:
    post:
      responses:
        200:
          content:
            text/plain:
              example: ok
      requestBody:
        content:
          multipart/form-data:
            schema:
              type: object
              properties:
                testData:
                  type: string
              required:
                - "testData"

Writing a non-empty post request to http://127.0.0.1/path with included form-data in the request body now produces the following, successfully returning the response defined in the example:
image
image

In contrast, running this same command on the old version of Prism gives a 422 error:
image
image

@ilanashapiro ilanashapiro changed the title Support for multipart/form-data in the requestBody. Support for multipart/form-data in the request body Jul 3, 2023
@ilanashapiro ilanashapiro marked this pull request as ready for review July 3, 2023 20:51
@ilanashapiro ilanashapiro requested a review from a team as a code owner July 3, 2023 20:51
@ilanashapiro ilanashapiro requested review from EdVinyard and removed request for a team July 3, 2023 20:51
@ilanashapiro
Copy link
Contributor Author

@rattrayalex @RobertCraigie

Copy link
Contributor

@chohmann chohmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for submitting this PR!

The main issue we see with this PR is the test harness tests don't actually send multipart form data as expected.

Each of those tests needs to use the -F flag instead of --data in the curl command.

@@ -0,0 +1,28 @@
====test====
If I have an operation with a parameter with the allowEmptyValue set to false to false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If I have an operation with a parameter with the allowEmptyValue set to false to false
If I have an operation with a parameter with the allowEmptyValue set to false

@@ -0,0 +1,28 @@
====test====
If I have an operation with a parameter with the allowEmptyValue set to false to false
and I send a request with a correcly encoded parameter but with an empty key
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
and I send a request with a correcly encoded parameter but with an empty key
and I send a request with a correctly encoded parameter but with an empty key

@@ -0,0 +1,30 @@
====test====
If I have an operation with a parameter with the allowEmptyValue set to false to false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
If I have an operation with a parameter with the allowEmptyValue set to false to false
If I have an operation with a parameter with the allowEmptyValue set to true

@@ -0,0 +1,30 @@
====test====
If I have an operation with a parameter with the allowEmptyValue set to false to false
and I send a request with a correcly encoded parameter and with the parameter empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
and I send a request with a correcly encoded parameter and with the parameter empty
and I send a request with a correctly encoded parameter and with the parameter empty

====server====
mock -p 4010 ${document}
====command====
curl -i -X POST http://localhost:4010/path -H "Content-Type: multipart/form-data" --data "reserved=:/?#[]@!$&'()*+,;"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For all of these curl requests you need to use -F instead of --data as that will actually pass form data as multipart.

@ilanashapiro ilanashapiro marked this pull request as draft July 11, 2023 14:51
@ilanashapiro ilanashapiro marked this pull request as ready for review July 12, 2023 12:05
@ilanashapiro
Copy link
Contributor Author

ilanashapiro commented Jul 12, 2023

@chohmann All requested changes are now implemented. Thank you for pointing out that I needed to pass data with the -F flag rather than the --data flag! I initially misunderstood that setting "Content-type: multipart/form-data" was not sufficient. And apologies about the typos in tests I added to the test harness -- I copied these tests from existing files in the application/x-www-form-urlencoded test suite and didn't proofread the original files. I have now fixed all the typos in these as well as in the original files. I also added some regex parsing logic for multipart form data to handle additional cases. Please let me know if you'd like any further changes! Thanks!

Copy link
Contributor

@EdVinyard EdVinyard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really appreciate your update, but I think there are some cases the parseMultipartFormDataParams() function doesn't yet handle regarding the boundary between multipart entities.

@@ -54,6 +54,15 @@ export function splitUriParams(target: string) {
}, {});
}

export function parseMultipartFormDataParams(target: string) {
const pairs = target.split(/--.*\s*.*="/).filter(element => element);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code assumes that the boundary between multipart entities is -- followed by any string. However, the boundary between multipart entities is defined, in each individual request, by the request's Content-Type header's Boundary parameter. This is because the client must choose a value that doesn't occur within any multipart entity that it will transmit. For example, consider the effect of the following example on your implementation.

curl -v -F a=b -F c='--="' <hostname>

Also consider any form value that includes \r\n\r\n.

RFC 2046 § 5.1.1 describes the boundary as...

a line consisting entirely of two hyphen characters ("-", decimal value 45) followed by the boundary parameter value from the Content-Type header field, optional linear whitespace, and a terminating CRLF.

I think you'll need to either:

  1. implement this as the standard describes, with unit tests (not test-harness integration tests) to validate that it works with a variety of input, OR
  2. rely on a library which parses multipart data according to the standard.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Ed, thank you so much for your feedback. I discovered a library called parse-multiform-data which I'm installing as an npm package that seemed to be the most reliable method. I made changes according to this, such as parsing the boundary parameter from the Content-Type header field in order to pass it to the library for parsing the body. I left some comments on the recent changes that hopefully help clarify some of the decisions I made when implementing this. Thanks again and please let me know if you'd like any other changes!

@ilanashapiro ilanashapiro marked this pull request as draft July 16, 2023 15:50
// parse boundary string from content-type in case media type is multipart/form-data
const multipart = require('parse-multipart-data');
const contentTypeHeader = headers.get('content-type');
const multipartBoundary = contentTypeHeader ? multipart.getBoundary(contentTypeHeader) : "";
Copy link
Contributor Author

@ilanashapiro ilanashapiro Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, we parse the boundary parameter value from the Content-Type header field as specified by https://datatracker.ietf.org/doc/html/rfc2046#section-5.1.1. We do so using the parse-multipart-data library to ensure it's parsed correctly according to the standard. The boundary string is later passed into the parse-multipart-data library to correctly parse the body in the function parseMultipartFormDataParams.


// parse boundary string from content-type in case media type is multipart/form-data
const multipart = require('parse-multipart-data');
const multipartBoundary = multipart.getBoundary(contentTypeHeader);
Copy link
Contributor Author

@ilanashapiro ilanashapiro Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment in packages/http/src/validator/index.ts

const multipart = require('parse-multipart-data');
const contentTypeHeader = headers.get('content-type');
const multipartBoundary = contentTypeHeader ? multipart.getBoundary(contentTypeHeader) : "";
const mediaType = contentTypeHeader ? contentTypeHeader.replace(new RegExp(";\\s*boundary=" + multipartBoundary), "") : contentTypeHeader;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After end-end and unit testing, the content-type header appeared in both formats: "multipart/form-data;boundary=--...." and "multipart/form-data; boundary=--...." (with the space after the semicolon). Thus, I'm handling possible whitespace between the media type and the boundary string.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I'm concerned about your use of a regular expression here. Notice that the boundary may contain any 7-bit ASCII character, which I think includes many characters with special meanings inside a regular expression (e.g., ., *, (). Consider using a library to parse the Content-Type header in its entirety. Here's a very simple example using whatwg-mimetype:

    import MIMEType from 'whatwg-mimetype';
    
    const t = new MIMEType('multipart/form-data; boundary=.*.?.+zzz');
    console.log(t?.essence);
    console.log(t.parameters.get('boundary'));
  2. It looks like this block of code is a duplicate of what you added in mocker/index.ts. Could it be factored out?

// parse boundary string from content-type in case media type is multipart/form-data
const multipart = require('parse-multipart-data');
const multipartBoundary = multipart.getBoundary(contentTypeHeader);
const mediaType = contentTypeHeader.replace(new RegExp(";\\s*boundary=" + multipartBoundary), "");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment in packages/http/src/validator/index.ts

const multipart = require('parse-multipart-data');
const bufferBody = Buffer.from(target, "utf-8");

// the parse-multipart-data package requires that the body is passed in as a buffer, not a string
Copy link
Contributor Author

@ilanashapiro ilanashapiro Jul 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

const parts = multipart.parse(bufferBody, multipartBoundary);

return parts.reduce((result: Dictionary<string>, pair: string) => {
result[pair['name']] = pair['data'].toString();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then convert back from Buffer to String for further processing by Prism

@@ -54,6 +55,30 @@ export function splitUriParams(target: string) {
}, {});
}

export function parseMultipartFormDataParams(target: string, multipartBoundary?: string) {
if(!multipartBoundary) {
const error = "Boundary parameter for multipart/form-data is not defined or generated in the request header. Try removing manually defined content-type from your request header if it exists.";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The boundary string MUST be non-empty if this is multipart/form-data (which it is if we're in this function). Thus, we return with error. According to https://bobbyhadz.com/blog/error-multipart-boundary-not-found-in-express-js, postmanlabs/postman-app-support#191, https://stackoverflow.com/questions/40351429/formdata-how-to-get-or-set-boundary-in-multipart-form-data-angular, this can occur when the user is manually setting the content-type, which means the server is not generating it (and the boundary string). Thus, the possible fix is suggested in the error message.

Copy link
Contributor Author

@ilanashapiro ilanashapiro Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Returning an error seemed to me a better solution than just returning an empty dictionary as it keeps the user more informed. Thus, the return type is Either).

@ilanashapiro ilanashapiro marked this pull request as ready for review July 18, 2023 10:57
@@ -75,17 +76,22 @@ const validateInputBody = (
checkRequiredBodyIsProvided(requestBody, body),
E.map(b => [...b, caseless(headers || {})] as const),
E.chain(([requestBody, body, headers]) => {
// parse boundary string from content-type in case media type is multipart/form-data
const multipart = require('parse-multipart-data');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We'd prefer to use import statements rather than dynamic require whenever possible.

const multipart = require('parse-multipart-data');
const contentTypeHeader = headers.get('content-type');
const multipartBoundary = contentTypeHeader ? multipart.getBoundary(contentTypeHeader) : "";
const mediaType = contentTypeHeader ? contentTypeHeader.replace(new RegExp(";\\s*boundary=" + multipartBoundary), "") : contentTypeHeader;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I'm concerned about your use of a regular expression here. Notice that the boundary may contain any 7-bit ASCII character, which I think includes many characters with special meanings inside a regular expression (e.g., ., *, (). Consider using a library to parse the Content-Type header in its entirety. Here's a very simple example using whatwg-mimetype:

    import MIMEType from 'whatwg-mimetype';
    
    const t = new MIMEType('multipart/form-data; boundary=.*.?.+zzz');
    console.log(t?.essence);
    console.log(t.parameters.get('boundary'));
  2. It looks like this block of code is a duplicate of what you added in mocker/index.ts. Could it be factored out?

@ilanashapiro ilanashapiro marked this pull request as draft July 24, 2023 12:17
@@ -47,11 +49,35 @@ export function deserializeFormBody(
}

export function splitUriParams(target: string) {
return target.split('&').reduce((result: Dictionary<string>, pair: string) => {
return E.right(target.split('&').reduce((result: Dictionary<string>, pair: string) => {
Copy link
Contributor Author

@ilanashapiro ilanashapiro Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wrapping in Either monad in order to match the return type of parseMultipartFormDataParams, which is of type Either as it may return an error.


return pipe(
validateAgainstReservedCharacters(encodedUriParams, encodings, prefix),
content.mediaType === "multipart/form-data" ? parseMultipartFormDataParams(target, multipartBoundary) : splitUriParams(target),
Copy link
Contributor Author

@ilanashapiro ilanashapiro Jul 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since parseMultipartFormDataParams returns an error wrapped in E.left if the boundary string is empty, the return type is Either, and I wrapped the result of splitUriParams in E.Right to match the return type (Either) of parseMultipartFormDataParams. Thus, we move this statement into the expression pipeline.


const requestBody = request.body as string;
const encodedUriParams = pipe(
mediaType === "multipart/form-data" ? parseMultipartFormDataParams(requestBody, multipartBoundary) : splitUriParams(requestBody),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parseMultipartFormDataParams returns an error wrapped in E.Left if the boundary sting is empty. Here I convert that to a default value (i.e. empty dictionary) if that occurs. I'm not sure if I'm supposed to be doing anything else here -- this same behavior (unwrapping the monad using defaults) is implemented for findContentByMediaTypeOrFirst just below on line 169, so I attempted to emulate that.

@ilanashapiro
Copy link
Contributor Author

Hi Ed, thanks for your feedback, I am now doing all Content-Type header parsing using the whatwg-mimetype library you recommended, and the dynamic require statements were changed to imports. As for the duplicate code block, I moved it to a small utility function called parseMIMEHeader I created in validators/headers.ts. Finally, I also attempted to improve the error reporting for empty multipart boundary strings by wrapping parseMultipartFormDataParams in the Either monad and returning a descriptive error (which meant I also needed to wrap splitUriParams in the Either monad to maintain equivalent types).

@ilanashapiro ilanashapiro marked this pull request as ready for review July 25, 2023 15:48
Copy link
Contributor

@chohmann chohmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your contribution and for working through our reviews with us!

@chohmann chohmann merged commit 0f46225 into stoplightio:master Jul 28, 2023
6 checks passed
@chohmann
Copy link
Contributor

This is included in version 5.2.0 of Prism.

@rattrayalex
Copy link

Hooray, thanks Ilana, Ed, and Chelsea!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants