Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parser optimization #332

Merged
merged 8 commits into from
Feb 15, 2022
Merged

Parser optimization #332

merged 8 commits into from
Feb 15, 2022

Conversation

The-EDev
Copy link
Member

@The-EDev The-EDev commented Feb 5, 2022

In a nutshell, This PR:

  • Removes any code that parses HTTP responses.
  • Integrates certain parts of the parser into the rest of the framework (such as the HTTP method list and string access function, and builtin_expect).
  • Integrates existing framework parts into the parser (such as the HTTP method enum).
  • Removes functions in the framework that are already carried out by the parser.
  • Adds 2 HTTP version integers to the request.
  • Fixes a problem where a Crow application would not be able to parse a multipart request made using .Net HttpClient (due to the boundary being wrapped with quotation marks (")).
  • Distinguishes between cleartext and SSL HTTP/2 headers.
  • Fixes a problem introduced in (Fixed data leak when sending static files #292) where the connection is closed on every data transfer when a response is streamed (as opposed to once everything has been sent).
  • Adds indications as to who wrote a NOTE or TODO.

More detailed explanation of the changes will be provided as review comments to make the code changes easy to digest.

Closes #306, #320, #331

Also (likely) fixed problem with streaming without keeping connection
alive
Also fixed problem with multipart hanging on request
Also updated TODO entries to indicate Writer
This was linked to issues Feb 5, 2022
@The-EDev The-EDev requested a review from mrozigor February 5, 2022 15:39
@The-EDev
Copy link
Member Author

The-EDev commented Feb 5, 2022

Compression tests apparently used parts of the parser that the rest of Crow wasn't using (and were as such removed), so the test needs to be adapted to work without the parser (cut out the response line and headers, leaving only the body).

@CrowCpp CrowCpp deleted a comment from crow-clang-format bot Feb 6, 2022
@The-EDev The-EDev marked this pull request as draft February 8, 2022 00:55
Comment on lines 9 to 17
// TODO(EDev): Adding C++20's [[likely]] and [[unlikely]] attributes might be useful
#if defined(__GNUG__) || defined(__clang__)
#define CROW_LIKELY(X) __builtin_expect(!!(X), 1)
#define CROW_UNLIKELY(X) __builtin_expect(!!(X), 0)
#else
#define CROW_LIKELY(X) (X)
#define CROW_UNLIKELY(X) (X)
#endif

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Came from json.h and the parser (is used in both now)

Comment on lines +20 to +23
const char cr = '\r';
const char lf = '\n';
const std::string crlf("\r\n");

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

came from multipart.h, connection.h, and the parser.

Comment on lines +39 to +68

COPY,
LOCK,
MKCOL,
MOVE,
PROPFIND,
PROPPATCH,
SEARCH,
UNLOCK,
BIND,
REBIND,
UNBIND,
ACL,

REPORT,
MKACTIVITY,
CHECKOUT,
MERGE,

MSEARCH,
NOTIFY,
SUBSCRIBE,
UNSUBSCRIBE,

MKCALENDAR,

LINK,
UNLINK,

SOURCE,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Came from the parser

Comment on lines +119 to +173
const char* method_strings[] =
{
"DELETE",
"GET",
"HEAD",
"POST",
"PUT",

"CONNECT",
"OPTIONS",
"TRACE",

"PATCH",
"PURGE",

"COPY",
"LOCK",
"MKCOL",
"MOVE",
"PROPFIND",
"PROPPATCH",
"SEARCH",
"UNLOCK",
"BIND",
"REBIND",
"UNBIND",
"ACL",

"REPORT",
"MKACTIVITY",
"CHECKOUT",
"MERGE",

"M-SEARCH",
"NOTIFY",
"SUBSCRIBE",
"UNSUBSCRIBE",

"MKCALENDAR",

"LINK",
"UNLINK",

"SOURCE"};


inline std::string method_name(HTTPMethod method)
{
if (CROW_LIKELY(method < HTTPMethod::InternalMethodCount))
{
return method_strings[(unsigned char)method];
}
return "invalid";
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sort of inspired by the parser

Comment on lines -263 to -140
if (parser_.check_version(1, 0))
{
// HTTP/1.0
if (req.headers.count("connection"))
{
if (boost::iequals(req.get_header_value("connection"), "Keep-Alive"))
add_keep_alive_ = true;
}
else
close_connection_ = true;
}
else if (parser_.check_version(1, 1))
add_keep_alive_ = req.keep_alive;
close_connection_ = req.close_connection;

if (req.check_version(1, 1)) // HTTP/1.1
{
// HTTP/1.1
if (req.headers.count("connection"))
{
if (req.get_header_value("connection") == "close")
close_connection_ = true;
else if (boost::iequals(req.get_header_value("connection"), "Keep-Alive"))
add_keep_alive_ = true;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already being done in the parser

@@ -20,8 +20,6 @@ namespace crow
return empty;
}

struct DetachHelper;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

never used anywhere

@@ -32,7 +32,7 @@ namespace crow
#ifdef CROW_ENABLE_COMPRESSION
bool compressed = true; ///< If compression is enabled and this is false, the individual response will not be compressed.
#endif
bool is_head_response = false; ///< Whether this is a response to a HEAD request.
bool skip_body = false; ///< Whether this is a response to a HEAD request.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

body is skipped with upgrade requests as well

@@ -1825,7 +1825,7 @@ TEST_CASE("stream_response")

//Total bytes received
unsigned int received = 0;
sendmsg = "GET /test\r\n\r\n";
sendmsg = "GET /test HTTP/1.0\r\n\r\n";
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did this to test the close connection problem

Comment on lines 2123 to 2241
}
response_gzip.push_back(buf_gzip[i]);
}

http_parser parser[2] = {{}, {}};
http_parser_init(&parser[0], HTTP_RESPONSE);
http_parser_init(&parser[1], HTTP_RESPONSE);
parser[0].data = reinterpret_cast<void*>(&response_deflate_no_header);
parser[1].data = reinterpret_cast<void*>(&response_gzip_no_header);
response_deflate = inflate_string(response_deflate.substr(response_deflate.find("\r\n\r\n") + 4));
response_gzip = inflate_string(response_gzip.substr(response_gzip.find("\r\n\r\n") + 4));

http_parser_execute(&parser[0], &settings, buf_deflate, bytes_deflate);
http_parser_execute(&parser[1], &settings, buf_gzip, bytes_gzip);
socket[0].close();
socket[1].close();
CHECK(expected_string == response_deflate);
CHECK(expected_string == response_gzip);
}
// No Header (thus no compression)
{
asio::ip::tcp::socket socket[2] = {asio::ip::tcp::socket(is), asio::ip::tcp::socket(is)};
socket[0].connect(asio::ip::tcp::endpoint(asio::ip::address::from_string(LOCALHOST_ADDRESS), 45451));
socket[1].connect(asio::ip::tcp::endpoint(asio::ip::address::from_string(LOCALHOST_ADDRESS), 45452));

socket[0].close();
socket[1].close();
}
// No compression
{
asio::ip::tcp::socket socket[2] = {asio::ip::tcp::socket(is), asio::ip::tcp::socket(is)};
socket[0].connect(asio::ip::tcp::endpoint(asio::ip::address::from_string(LOCALHOST_ADDRESS), 45451));
socket[1].connect(asio::ip::tcp::endpoint(asio::ip::address::from_string(LOCALHOST_ADDRESS), 45452));
socket[0].send(asio::buffer(test_compress_no_header_msg));
socket[1].send(asio::buffer(test_compress_no_header_msg));

socket[0].send(asio::buffer(test_none_msg));
socket[1].send(asio::buffer(test_none_msg));
socket[0].receive(asio::buffer(buf_deflate, 2048));
socket[1].receive(asio::buffer(buf_gzip, 2048));

size_t bytes_deflate = socket[0].receive(asio::buffer(buf_deflate, 2048));
size_t bytes_gzip = socket[1].receive(asio::buffer(buf_gzip, 2048));
std::string response_deflate(buf_deflate);
std::string response_gzip(buf_gzip);
response_deflate = response_deflate.substr(98);
response_gzip = response_gzip.substr(98);

http_parser parser[2] = {{}, {}};
http_parser_init(&parser[0], HTTP_RESPONSE);
http_parser_init(&parser[1], HTTP_RESPONSE);
parser[0].data = reinterpret_cast<void*>(&response_deflate_none);
parser[1].data = reinterpret_cast<void*>(&response_gzip_none);
socket[0].close();
socket[1].close();
CHECK(expected_string == response_deflate);
CHECK(expected_string == response_gzip);
}
// No compression
{
asio::ip::tcp::socket socket[2] = {asio::ip::tcp::socket(is), asio::ip::tcp::socket(is)};
socket[0].connect(asio::ip::tcp::endpoint(asio::ip::address::from_string(LOCALHOST_ADDRESS), 45451));
socket[1].connect(asio::ip::tcp::endpoint(asio::ip::address::from_string(LOCALHOST_ADDRESS), 45452));

http_parser_execute(&parser[0], &settings, buf_deflate, bytes_deflate);
http_parser_execute(&parser[1], &settings, buf_gzip, bytes_gzip);
socket[0].send(asio::buffer(test_none_msg));
socket[1].send(asio::buffer(test_none_msg));

socket[0].close();
socket[1].close();
}
}
{
CHECK(expected_string == response_deflate);
CHECK(expected_string == response_gzip);
socket[0].receive(asio::buffer(buf_deflate, 2048));
socket[1].receive(asio::buffer(buf_gzip, 2048));

CHECK(expected_string == response_deflate_no_header);
CHECK(expected_string == response_gzip_no_header);
std::string response_deflate(buf_deflate);
std::string response_gzip(buf_gzip);
response_deflate = response_deflate.substr(98);
response_gzip = response_gzip.substr(98);

CHECK(expected_string == response_deflate_none);
CHECK(expected_string == response_gzip_none);
}
socket[0].close();
socket[1].close();
CHECK(expected_string == response_deflate);
CHECK(expected_string == response_gzip);
}
}

app_deflate.stop();
app_gzip.stop();
app_deflate.stop();
app_gzip.stop();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needed to change because it used the parser's response code

@@ -2481,15 +2492,14 @@ TEST_CASE("get_port")

const std::uint16_t port = 12345;

std::thread runTest([&]() {
auto _ = async(launch::async, [&] {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not 100% sure if it worked, but this change was made to fix a problem where the tests would block for some reason.

@The-EDev The-EDev marked this pull request as ready for review February 8, 2022 02:07
Comment on lines -2798 to -2920
return 1;

/* Skip delimeters */
case s_req_schema_slash:
case s_req_schema_slash_slash:
case s_req_server_start:
case s_req_query_string_start:
case s_req_fragment_start:
continue;

case s_req_schema:
uf = UF_SCHEMA;
break;

case s_req_server_with_at:
found_at = 1;
break;

/* fall through */
case s_req_server:
uf = UF_HOST;
break;

case s_req_path:
uf = UF_PATH;
break;

case s_req_query_string:
uf = UF_QUERY;
break;

case s_req_fragment:
uf = UF_FRAGMENT;
break;

default:
assert(!"Unexpected state");
return 1;
}

/* Nothing's changed; soldier on */
if (uf == old_uf) {
u->field_data[uf].len++;
continue;
}

u->field_data[uf].off = (uint16_t)(p - buf);
u->field_data[uf].len = 1;

u->field_set |= (1 << uf);
old_uf = uf;
}

/* host must be present if there is a schema */
/* parsing http:///toto will fail */
if ((u->field_set & (1 << UF_SCHEMA)) &&
(u->field_set & (1 << UF_HOST)) == 0) {
return 1;
}

if (u->field_set & (1 << UF_HOST)) {
if (http_parse_host(buf, u, found_at) != 0) {
return 1;
}
}

/* CONNECT requests can only contain "hostname:port" */
if (is_connect && u->field_set != ((1 << UF_HOST)|(1 << UF_PORT))) {
return 1;
}

if (u->field_set & (1 << UF_PORT)) {
uint16_t off;
uint16_t len;
const char* p;
const char* end;
unsigned long v;

off = u->field_data[UF_PORT].off;
len = u->field_data[UF_PORT].len;
end = buf + off + len;

/* NOTE: The characters are already validated and are in the [0-9] range */
assert((size_t)(off + len) <= buflen && "Port number overflow");
v = 0;
for (p = buf + off; p < end; p++) {
v *= 10;
v += *p - '0';

/* Ports have a max value of 2^16 */
if (v > 0xffff) {
return 1;
}
}

u->port = static_cast<uint16_t>(v);
}

return 0;
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

parsing the URL for requests is done in the framework

Comment on lines -356 to -365
enum http_parser_url_fields
{ UF_SCHEMA = 0
, UF_HOST = 1
, UF_PORT = 2
, UF_PATH = 3
, UF_QUERY = 4
, UF_FRAGMENT = 5
, UF_USERINFO = 6
, UF_MAX = 7
};
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not needed

include/crow/http_connection.h Outdated Show resolved Hide resolved
include/crow/http_connection.h Show resolved Hide resolved
include/crow/http_parser_merged.h Outdated Show resolved Hide resolved
include/crow/multipart.h Outdated Show resolved Hide resolved
include/crow/parser.h Outdated Show resolved Hide resolved
Also moved builtin_expect to utility.h (for use in sanitizer function)
@crow-clang-format
Copy link

--- include/crow/parser.h	(before formatting)
+++ include/crow/parser.h	(after formatting)
@@ -154,13 +154,13 @@
 
             // HTTP1.1 = always send keep_alive, HTTP1.0 = only send if header exists, HTTP?.? = never send
             keep_alive = (http_major == 1 && http_minor == 0) ?
-                                 ((flags & F_CONNECTION_KEEP_ALIVE) ? true : false) :
-                                 ((http_major == 1 && http_minor == 1) ? true : false);
+                           ((flags & F_CONNECTION_KEEP_ALIVE) ? true : false) :
+                           ((http_major == 1 && http_minor == 1) ? true : false);
 
             // HTTP1.1 = only close if close header exists, HTTP1.0 = always close unless keep_alive header exists, HTTP?.?= never close
             close_connection = (http_major == 1 && http_minor == 0) ?
-                                       ((flags & F_CONNECTION_KEEP_ALIVE) ? false : true) :
-                                       ((http_major == 1 && http_minor == 1) ? ((flags & F_CONNECTION_CLOSE) ? true : false) : false);
+                                 ((flags & F_CONNECTION_KEEP_ALIVE) ? false : true) :
+                                 ((http_major == 1 && http_minor == 1) ? ((flags & F_CONNECTION_CLOSE) ? true : false) : false);
         }
 
         /// Take the parsed HTTP request data and convert it to a \ref crow.request
--- include/crow/utility.h	(before formatting)
+++ include/crow/utility.h	(after formatting)
@@ -702,8 +702,8 @@
                     data[i] = replacement;
                 }
                 else if ((c == '/') || (c == '\\'))
-                    {
-                    if (CROW_UNLIKELY( i == 0 )) //Prevent Unix Absolute Paths (Windows Absolute Paths are prevented with `(c == ':')`)
+                {
+                    if (CROW_UNLIKELY(i == 0)) //Prevent Unix Absolute Paths (Windows Absolute Paths are prevented with `(c == ':')`)
                     {
                         data[i] = replacement;
                     }
@@ -711,9 +711,9 @@
                     {
                         checkForSpecialEntries = true;
                     }
-                    }
                 }
-            }
+                }
+        }
 
     } // namespace utility
 } // namespace crow

@The-EDev The-EDev requested a review from mrozigor February 12, 2022 00:37
@The-EDev The-EDev merged commit d43d680 into master Feb 15, 2022
@The-EDev The-EDev deleted the parser_optimization branch February 22, 2022 03:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

SIGPIPE crow::multipart::message hangs on request Parser Optimizations
2 participants