Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse entire HTTP chunk size #396 #528

Merged
merged 3 commits into from
Mar 31, 2024

Conversation

SkyTrix
Copy link
Contributor

@SkyTrix SkyTrix commented Mar 28, 2024

Continue parsing HTTP chunk size up until Int.MaxValue so the actual size of the chunk can be logged if it exceeds the configured max chunk size.

Continue parsing HTTP chunk size up until Int.MaxValue so the actual
size of the chunk can be logged if it exceeds the configured max chunk
size.
case '\r' if cursor > offset && byteChar(input, cursor + 1) == '\n' =>
parseChunkBody(size.toInt, "", cursor + 2)
case '\n' if cursor > offset => parseChunkBody(size.toInt, "", cursor + 1)
case c if CharacterClasses.WSP(c) => parseSize(cursor + 1, size) // illegal according to the spec but can happen, see issue #1812
case c => failEntityStream(s"Illegal character '${escape(c)}' in chunk start")
}
} else failEntityStream(s"HTTP chunk size exceeds the configured limit of ${settings.maxChunkSize} bytes")
} else failEntityStream(s"HTTP chunk size exceeds ${Int.MaxValue} bytes")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you include the text Int.MAX_VALUE in this message too? Some users may not recognise the numeric value and what it means - still possibly useful to have the actual number too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. I believe it should either be Int.MaxValue or Integer.MAX_VALUE though. Probably the latter due to Java interoperability?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Integer.MAX_VALUE might be better

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the Scala Compiler will replace it to the acutual value.

case '\r' if cursor > offset && byteChar(input, cursor + 1) == '\n' =>
parseChunkBody(size.toInt, "", cursor + 2)
case '\n' if cursor > offset => parseChunkBody(size.toInt, "", cursor + 1)
case c if CharacterClasses.WSP(c) => parseSize(cursor + 1, size) // illegal according to the spec but can happen, see issue #1812
case c => failEntityStream(s"Illegal character '${escape(c)}' in chunk start")
}
} else failEntityStream(s"HTTP chunk size exceeds the configured limit of ${settings.maxChunkSize} bytes")
} else failEntityStream(s"HTTP chunk size exceeds Integer.MAX_VALUE (${Int.MaxValue}) bytes")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this does not start parsing the actual chunk body. It only reads the size value that's written in the chunk header and bails if that's larger than Int.MaxValue. It will never attempt to parse anything larger than settings.maxChunkSize.

Copy link
Contributor

@mdedetrich mdedetrich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems good to me, lets get someone else to also have a look

Copy link
Contributor

@pjfanning pjfanning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@pjfanning pjfanning added this to the 1.1.0 milestone Mar 28, 2024
case ';' if cursor > offset => parseChunkExtensions(size.toInt, cursor + 1)()
case c if size > settings.maxChunkSize =>
failEntityStream(
s"HTTP chunk of $size bytes exceeds the configured limit of ${settings.maxChunkSize} bytes")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see now, It was try decoding how big the chunk size is, and it will stop parsing how if the size already exceed MaxValue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about HTTP chunk size:$size exceeds the configured limit of ${settings.maxChunkSize} bytes

Copy link
Member

@He-Pin He-Pin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm after check the code in an IDE

Copy link
Member

@Roiocam Roiocam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks

Copy link
Member

@samueleresca samueleresca left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@@ -307,17 +307,20 @@ private[http] trait HttpMessageParser[Output >: MessageOutput <: ParserOutput] {
s"HTTP chunk extension length exceeds configured limit of ${settings.maxChunkExtLength} characters")

@tailrec def parseSize(cursor: Int, size: Long): StateResult =
if (size <= settings.maxChunkSize) {
if (size <= Int.MaxValue) {
Copy link
Member

@samueleresca samueleresca Mar 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My personal preference (it feels more clear) would be to have:

if (size > Int.MaxValue) {
// early exit here with failEntityStream
}

// everything else

But I don't feel is binding.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

guard clauses. 👍🏻

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to keep the structure consistent with the rest of the file. I also think the above suggestion would be incorrect in the sense that everything else would still be executed after calling failEntityStream? An else block would still be required.

Comment on lines +547 to +548
"""400000
|""") should generalMultiParseTo(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""400000
|""") should generalMultiParseTo(
"400000") should generalMultiParseTo(

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that the trailing newline is significant in this and many other tests in the file.

@@ -531,13 +531,24 @@ abstract class RequestParserSpec(mode: String, newLine: String) extends AnyFreeS
closeAfterResponseCompletion shouldEqual Seq(false)
}

"too-large chunk size" in new Test {
"too-large chunk size (> Int.MaxValue)" in new Test {
Seq(
start,
"""1a2b3c4d5e
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"1a2b3c4d5e"

Copy link
Contributor Author

@SkyTrix SkyTrix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could a maintainer please shed some light on what it would still take to get this merged? There are currently 6 approvals but also some comments. I don't want to invalidate approvals by making new changes if that wouldn't be required.

@@ -307,17 +307,20 @@ private[http] trait HttpMessageParser[Output >: MessageOutput <: ParserOutput] {
s"HTTP chunk extension length exceeds configured limit of ${settings.maxChunkExtLength} characters")

@tailrec def parseSize(cursor: Int, size: Long): StateResult =
if (size <= settings.maxChunkSize) {
if (size <= Int.MaxValue) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to keep the structure consistent with the rest of the file. I also think the above suggestion would be incorrect in the sense that everything else would still be executed after calling failEntityStream? An else block would still be required.

Comment on lines +547 to +548
"""400000
|""") should generalMultiParseTo(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found that the trailing newline is significant in this and many other tests in the file.

@pjfanning pjfanning merged commit ab1b252 into apache:main Mar 31, 2024
10 checks passed
@pjfanning
Copy link
Contributor

Merged - thanks.

It can be refactored in new PRs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants