2.1.15 Uncaught HttpBadRequestException: Invalid UTF-8 characters caused by old searchbot requests #4712

Open
FrankWarius opened this issue Jan 8, 2023 · 12 comments

@FrankWarius

I am getting a lot of 500 errors, e.g. from Bing, which still requests old links containing umlauts, such as:
https://wbt.warius.info/tree/Warius/branches/lammh%C3%B6fer
Could you please return a 404 error for these instead?

Uncaught Fisharebest\Webtrees\Http\Exceptions\HttpBadRequestException: Invalid UTF-8 characters in request in D:\web\WT21Git\webtrees\app\Validator.php:67 Stack trace:
#0 [internal function]: Fisharebest\Webtrees\Validator::Fisharebest\Webtrees\{closure}('lammh\xF6fer', 'surname')
#1 D:\web\WT21Git\webtrees\app\Validator.php(71): array_walk_recursive(Array, Object(Closure))
#2 D:\web\WT21Git\webtrees\app\Validator.php(85): Fisharebest\Webtrees\Validator->__construct(Array, Object(Nyholm\Psr7\ServerRequest), 'UTF-8')
#3 D:\web\WT21Git\webtrees\app\Http\Middleware\HandleExceptions.php(155): Fisharebest\Webtrees\Validator::attributes(Object(Nyholm\Psr7\ServerRequest))
#4 D:\web\WT21Git\webtrees\app\Http\Middleware\HandleExceptions.php(99): Fisharebest\Webtrees\Http\Middleware\HandleExceptions->httpExceptionResponse(Object(Nyholm\Psr7\ServerRequest), Object(Fisharebest\Webtrees\Http\Exceptions\HttpBadRequestException))
#5 D:\web\WT21Git\webtrees\vendor\oscarotero\middleland\src\Dispatcher.php(136): Fisharebest\Webtrees\Http\Middleware\HandleExceptions->process(Object(Nyholm\Psr7\ServerRequest), Object(Middleland\Dispatcher))

@fisharebest
Owner

This isn't a problem on the demo server. The URL is valid UTF-8 and is recognised OK.

https://dev.webtrees.net/demo-dev/tree/demo/branches/lammh%C3%B6fer

My guess is that the validation error is occurring on one of the HTTP request headers.

Control panel -> Server information -> PHP Variables.

Are there any "interesting" $_SERVER variables? Perhaps your server is adding geo-lookup headers, and using invalid characters here?

@FrankWarius
Author

I don't think any headers are being added; it's a native IIS 10 installation.

Variable Value
$_COOKIE['__Secure-WT-ID'] 2e24ba5eb497d1bf0ec0132bacf8f5c5
$_SERVER['FCGI_X_PIPE'] \\.\pipe\IISFCGI-1e736672-8688-4dea-8879-a9feb4557a83
$_SERVER['PHPRC'] C:\PHPEnv\PHPini\
$_SERVER['PHP_FCGI_MAX_REQUESTS'] 10000
$_SERVER['ALLUSERSPROFILE'] C:\ProgramData
$_SERVER['APPDATA'] C:\Windows\system32\config\systemprofile\AppData\Roaming
$_SERVER['APP_POOL_CONFIG'] C:\inetpub\temp\apppools\WTProd\WTProd.config
$_SERVER['APP_POOL_ID'] WTProd
$_SERVER['CommonProgramFiles'] C:\Program Files\Common Files
$_SERVER['CommonProgramFiles(x86)'] C:\Program Files (x86)\Common Files
$_SERVER['CommonProgramW6432'] C:\Program Files\Common Files
$_SERVER['COMPUTERNAME'] SRV23-5DP-DE
$_SERVER['ComSpec']
$_SERVER['DriverData']
$_SERVER['LOCALAPPDATA']
$_SERVER['NUMBER_OF_PROCESSORS'] 4
$_SERVER['OS'] Windows_NT
$_SERVER['Path']
$_SERVER['PATHEXT'] .COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH;.MSC
$_SERVER['PROCESSOR_ARCHITECTURE'] AMD64
$_SERVER['PROCESSOR_IDENTIFIER'] Intel64 Family 6 Model 85 Stepping 4, GenuineIntel
$_SERVER['PROCESSOR_LEVEL'] 6
$_SERVER['PROCESSOR_REVISION'] 5504
$_SERVER['ProgramData'] C:\ProgramData
$_SERVER['ProgramFiles'] C:\Program Files
$_SERVER['ProgramFiles(x86)'] C:\Program Files (x86)
$_SERVER['ProgramW6432'] C:\Program Files
$_SERVER['PSModulePath']
$_SERVER['PUBLIC']
$_SERVER['SystemDrive'] C:
$_SERVER['SystemRoot'] C:\Windows
$_SERVER['TEMP'] C:\Windows\TEMP
$_SERVER['TMP'] C:\Windows\TEMP
$_SERVER['USERDOMAIN'] WORKGROUP
$_SERVER['USERNAME'] SRV23-5DP-DE$
$_SERVER['USERPROFILE'] C:\Windows\system32\config\systemprofile
$_SERVER['windir'] C:\Windows
$_SERVER['ORIG_PATH_INFO'] /index.php
$_SERVER['URL'] /index.php
$_SERVER['SERVER_SOFTWARE'] Microsoft-IIS/10.0
$_SERVER['SERVER_PROTOCOL'] HTTP/1.1
$_SERVER['SERVER_PORT_SECURE'] 1
$_SERVER['SERVER_PORT'] 443
$_SERVER['SERVER_NAME'] wbt.warius.info
$_SERVER['SCRIPT_NAME'] /index.php
$_SERVER['SCRIPT_FILENAME'] D:\web\WT21Git\webtrees\index.php
$_SERVER['REQUEST_URI'] /admin/information
$_SERVER['REQUEST_METHOD'] GET
$_SERVER['REMOTE_USER'] no value
$_SERVER['REMOTE_PORT'] 62907
$_SERVER['REMOTE_HOST']
$_SERVER['REMOTE_ADDR']
$_SERVER['QUERY_STRING'] no value
$_SERVER['PATH_TRANSLATED'] D:\web\WT21Git\webtrees\index.php
$_SERVER['LOGON_USER'] no value
$_SERVER['LOCAL_ADDR'] 85.215.178.206
$_SERVER['INSTANCE_META_PATH'] /LM/W3SVC/1
$_SERVER['INSTANCE_NAME'] WTPROD
$_SERVER['INSTANCE_ID'] 1
$_SERVER['HTTPS_SERVER_SUBJECT'] CN=wbt.warius.info
$_SERVER['HTTPS_SERVER_ISSUER'] C=US, O=Let's Encrypt, CN=R3
$_SERVER['HTTPS_SECRETKEYSIZE'] 2048
$_SERVER['HTTPS_KEYSIZE'] 256
$_SERVER['HTTPS'] on
$_SERVER['GATEWAY_INTERFACE'] CGI/1.1
$_SERVER['DOCUMENT_ROOT'] D:\web\WT21Git\webtrees
$_SERVER['CONTENT_TYPE'] no value
$_SERVER['CONTENT_LENGTH'] 0
$_SERVER['CERT_SUBJECT'] no value
$_SERVER['CERT_SERIALNUMBER'] no value
$_SERVER['CERT_ISSUER'] no value
$_SERVER['CERT_FLAGS'] no value
$_SERVER['CERT_COOKIE'] no value
$_SERVER['AUTH_USER'] no value
$_SERVER['AUTH_PASSWORD'] no value
$_SERVER['AUTH_TYPE'] no value
$_SERVER['APPL_PHYSICAL_PATH'] D:\web\WT21Git\webtrees\
$_SERVER['APPL_MD_PATH'] /LM/W3SVC/1/ROOT
$_SERVER['IIS_UrlRewriteModule'] 7,1,1993,2351
$_SERVER['UNENCODED_URL'] /admin/information
$_SERVER['IIS_WasUrlRewritten'] 1
$_SERVER['HTTP_X_ORIGINAL_URL'] /admin/information
$_SERVER['HTTP_SEC_FETCH_USER'] ?1
$_SERVER['HTTP_SEC_FETCH_SITE'] same-origin
$_SERVER['HTTP_SEC_FETCH_MODE'] navigate
$_SERVER['HTTP_SEC_FETCH_DEST'] document
$_SERVER['HTTP_UPGRADE_INSECURE_REQUESTS'] 1
$_SERVER['HTTP_DNT'] 1
$_SERVER['HTTP_USER_AGENT'] Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:108.0) Gecko/20100101 Firefox/108.0
$_SERVER['HTTP_TE'] trailers
$_SERVER['HTTP_REFERER'] https://wbt.warius.info/admin
$_SERVER['HTTP_HOST'] wbt.warius.info
$_SERVER['HTTP_COOKIE'] __Secure-WT-ID=2e24ba5eb497d1bf0ec0132bacf8f5c5
$_SERVER['HTTP_ACCEPT_LANGUAGE'] de,en-US;q=0.7,en;q=0.3
$_SERVER['HTTP_ACCEPT_ENCODING'] gzip, deflate, br
$_SERVER['HTTP_ACCEPT'] text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8
$_SERVER['HTTP_CONTENT_LENGTH'] 0
$_SERVER['HTTP_CONNECTION'] close
$_SERVER['FCGI_ROLE'] RESPONDER
$_SERVER['PHP_SELF'] /index.php
$_SERVER['REQUEST_TIME_FLOAT'] 1673171317.277
$_SERVER['REQUEST_TIME'] 1673171317

@fisharebest
Owner

Perhaps you could add some debug code here:

if (is_string($key) && preg_match('//u', $key) !== 1) {
    throw new HttpBadRequestException('Invalid UTF-8 characters in request');
}
if (is_string($value) && preg_match('//u', $value) !== 1) {
    throw new HttpBadRequestException('Invalid UTF-8 characters in request');
}

Write $key and $value to a log file. (If they contain invalid UTF characters, you probably cannot write them to the database).
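
One way to log such values without the invalid bytes causing further trouble is to hex-encode them first. The following is only a minimal sketch: the helper name `debugDumpBytes` and the log file location are assumptions for illustration and are not part of webtrees.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical debug helper (not part of webtrees): describe a string so that
 * invalid UTF-8 bytes remain visible in a plain-text log.
 */
function debugDumpBytes(string $text): string
{
    // preg_match() with the /u modifier returns false for invalid UTF-8.
    $valid = preg_match('//u', $text) === 1;

    // bin2hex() works on raw bytes, so the result is always plain ASCII.
    return ($valid ? 'valid' : 'INVALID') . ' hex=' . bin2hex($text);
}

// Assumed log location; change it to any writable path on the server.
$logFile = sys_get_temp_dir() . DIRECTORY_SEPARATOR . 'webtrees-utf8-debug.log';

// Example using the value from the stack trace in this issue.
$line = 'surname => ' . debugDumpBytes("lammh\xF6fer") . PHP_EOL;
file_put_contents($logFile, $line, FILE_APPEND);
```

Inside the validator callback, the same helper could be called with `$key` and `$value` just before the exception is thrown.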

@FrankWarius
Author

I added the following at line 67 and stepped through it with Xdebug (on 2.1.15):
$x = preg_match('//u', $value, $match);
throw new HttpBadRequestException('Invalid UTF-8 characters in request (' . $value . ')');
Xdebug shows:
$match: array(0)
$value: "P�ch" 'P\xE4ch'
$x: false

@fisharebest
Owner

If this is CP1252, then \xE4 is ä, so the value is Päch.

Can you add both $value and $key to the debug?
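
The CP1252 reading can be checked with a few lines of PHP. This is a small sketch that assumes the mbstring extension is loaded; the variable names are illustrative only.

```php
<?php

declare(strict_types=1);

// The raw value from the debugger: 'P', the single byte 0xE4, 'c', 'h'.
$raw = "P\xE4ch";

// 0xE4 followed by 'c' is not a valid UTF-8 sequence, so this is false.
$isUtf8 = mb_check_encoding($raw, 'UTF-8');

// Reinterpret the bytes as Windows-1252 (CP1252) and convert them to UTF-8.
$converted = mb_convert_encoding($raw, 'UTF-8', 'Windows-1252');

// In UTF-8, "ä" is the two bytes 0xC3 0xA4, which is the %C3%A4 seen in the URL.
$hex = bin2hex($converted); // "50c3a46368"
```

So a value that arrived as `%C3%A4` in the URL appears to have been decoded and re-encoded as a single-byte code page somewhere before it reached the validator.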

@FrankWarius
Author

$value: "P�ch" 'P\xE4ch'
$key: "surname"
The URL I am requesting now: https://wbt.warius.info/tree/Warius/branches/P%C3%A4ch

@FrankWarius
Author

Request URL: https://wbt.warius.info/tree/Warius/branches/P%C3%A4ch
Request method: GET
Status code: 500
Remote address: 85.215.178.206:443
Referrer policy: strict-origin-when-cross-origin
cache-control: no-store, no-cache, must-revalidate
content-encoding: gzip
content-length: 649
content-type: text/html; charset=UTF-8
date: Sun, 08 Jan 2023 15:51:58 GMT
expires: Thu, 19 Nov 1981 08:52:00 GMT
pragma: no-cache
server: Microsoft-IIS/10.0
vary: Accept-Encoding
x-powered-by: PHP/8.1.14
:authority: wbt.warius.info
:method: GET
:path: /tree/Warius/branches/P%C3%A4ch
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
accept-encoding: gzip, deflate, br
accept-language: de,de-DE;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6
cache-control: no-cache
cookie: __Secure-WT-ID=9f965da74fe2d009df90a681f0abb14e
dnt: 1
pragma: no-cache
sec-ch-ua: "Not?A_Brand";v="8", "Chromium";v="108", "Microsoft Edge";v="108"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "Windows"
sec-fetch-dest: document
sec-fetch-mode: navigate
sec-fetch-site: none
sec-fetch-user: ?1
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/108.0.0.0 Safari/537.36 Edg/108.0.1462.76

@FrankWarius
Author

This is an issue with the IIS URL Rewrite module, which decodes REQUEST_URI when rewriting.

XDebug shows the following server variables:
REQUEST_URI: "/tree/Warius/branches/P�ch" which has the wrong code page
UNENCODED_URL: "/tree/Warius/branches/P%C3%A4ch" which should be used
HTTP_X_ORIGINAL_URL: "/tree/Warius/branches/P%C3%A4ch" which is also correct

Webtrees should use UNENCODED_URL for IIS
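
As a rough illustration of that suggestion, the sketch below prefers UNENCODED_URL when it is present and falls back to REQUEST_URI otherwise. The function name `originalRequestUri` and the fallback order are assumptions; this is not how webtrees currently builds its request.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not webtrees code): choose the request URI from a
 * $_SERVER-style array, preferring the undecoded value that IIS provides.
 *
 * @param array<string,mixed> $server
 */
function originalRequestUri(array $server): string
{
    // UNENCODED_URL is set by IIS; other servers only provide REQUEST_URI.
    foreach (['UNENCODED_URL', 'REQUEST_URI'] as $key) {
        $value = $server[$key] ?? '';

        if (is_string($value) && $value !== '') {
            return $value;
        }
    }

    // Assumed default when neither variable is available.
    return '/';
}

// Values as reported by Xdebug above.
$uri = originalRequestUri([
    'REQUEST_URI'   => "/tree/Warius/branches/P\xE4ch",
    'UNENCODED_URL' => '/tree/Warius/branches/P%C3%A4ch',
]);
// $uri is '/tree/Warius/branches/P%C3%A4ch'
```

In a real application the function would be called with `$_SERVER`.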

Alternatively, I can change the rewrite rule, but I need some information first.
The current rewrite action is:
<action type="Rewrite" url="index.php" appendQueryString="true" />

I could pass the unencoded URL to index.php, but I don't know in what form webtrees needs it. Something like this?
<action type="Rewrite" url="index.php?{UNENCODED_URL}" appendQueryString="false" />

@FrankWarius
Author

Fixed by adding
<set name="REQUEST_URI" value="{UNENCODED_URL}" />
to the serverVariables of the IIS 10 URL Rewrite rule.

Complete rule:
<rule name="Webtrees Rewrite" enabled="true" stopProcessing="true">
    <match url="^" ignoreCase="false" />
    <conditions logicalGrouping="MatchAll" trackAllCaptures="false">
        <add input="{REQUEST_FILENAME}" matchType="IsDirectory" negate="true" />
        <add input="{REQUEST_FILENAME}" matchType="IsFile" negate="true" />
    </conditions>
    <action type="Rewrite" url="index.php" appendQueryString="true" logRewrittenUrl="false" />
    <serverVariables>
        <set name="REQUEST_URI" value="{UNENCODED_URL}" />
    </serverVariables>
</rule>

Should we update the documentation?

@fisharebest
Owner

There are two parts to this issue.

  1. webtrees detects this invalid character and tries to give a 400 Bad Request response.

Currently, we check that the headers contain valid UTF-8.
I think we should be stricter: the headers should be 7-bit ASCII.

  2. The error page generates a similar error, and this gives a 500 response.

This needs to be fixed, so that we can give the correct 400 response and error message.
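
A stricter 7-bit ASCII check could look roughly like the sketch below. The function name `isSevenBitAscii` is hypothetical, and this is not the webtrees implementation.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical check (not the webtrees implementation): true when the string
 * contains only 7-bit ASCII bytes (0x00 to 0x7F).
 */
function isSevenBitAscii(string $text): bool
{
    // Without the /u modifier, the pattern is matched byte by byte.
    return preg_match('/^[\x00-\x7F]*$/D', $text) === 1;
}

// A percent-encoded path is pure ASCII; raw umlaut bytes are not.
$encodedOk = isSevenBitAscii('/tree/Warius/branches/P%C3%A4ch'); // true
$cp1252Ok  = isSevenBitAscii("P\xE4ch");                         // false
$utf8Ok    = isSevenBitAscii("P\xC3\xA4ch");                     // false
```

Note that such a check also rejects valid but unencoded UTF-8, which is the intended effect of the stricter rule: anything outside ASCII would have to arrive percent-encoded.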

@FrankWarius
Author

Two notes:

  1. This is no longer just an issue with old search bot requests. The error (on IIS, with pretty URLs) occurs when querying family branches for names containing umlauts, e.g. https://wbt.warius.info/tree/Warius/branches/P%C3%A4ch?soundex_dm=0&soundex_std=0
  2. On every call of the Validator.php constructor, all DB parameters from config.ini.php are checked again (about 10 iterations before the error occurs). This raises the question of whether repeating the check within a session is necessary.
    More important, though, is whether we want to restrict the DB attributes, especially dbpass, to 7-bit ASCII.
