Description
The clearest example is probably the stub page that gets generated for certain URLs that redirect (https://github.com/mitchcapper/httrack/blob/master/src/htsparse.c#L3610). When that stub is parsed, its Content-Type `<meta>` tag is skipped, so the `<meta>` tag detection never notices the next `<meta>` tag, which is the one that performs the redirect. The URL inside it is still detected, but because the parser hasn't noticed it's inside a `<meta>` tag, it doesn't stop at the closing `"`. It treats the quote as part of the URL and only stops at the closing `>`. As a result, a bogus URL gets archived and the redirect page ends up malformed: it doesn't redirect, and it doesn't properly display the link further down the page for clicking manually.
Everything seems to work if I add a `continue` at https://github.com/mitchcapper/httrack/blob/master/src/htsparse.c#L715, so that once the position pointer has been advanced past the skipped tag, control goes back to the start of the parsing loop.