Unescaping already escaped xml URLs #460
Hi @wblachowski, I think you observed something important in the AET suite: we are escaping URLs. Special XML characters, i.e. `&` and `"`, are escaped to `&amp;` and `&quot;`.
But something is not right here. Isn't it that:
StringEscapeUtils.escapeXml11(StringEscapeUtils.unescapeXml(<something>)) === <something>
??
Motivation:
I believe someone might want to test proper URLs decoding on site with:
https://www.google.pl/search?q=ąę&param=12
https://www.google.pl/search?param=12&q="ąę ćń"
What should be put into the suite XML as the URL? Currently we escape special chars like `&` before parsing the XML content into a TestSuite object, don't we? I believe this is done because we wanted to be kind to end users: we were afraid that they would not be able to correctly escape special entities in an XML attribute.
But what should we do with this URL:
<url href="https://www.google.pl/search?param=12&q=&quot;" />
If something like that is placed inside a suite, which page should AET test? Currently it will search for `&quot;`, but after the proposed change it will search for `"`?
So maybe we could do something different?
Proposition:
Let's agree that the suite XML is an XML file, so the content of this file needs to be properly escaped. This also applies to XML attributes that hold regular expressions.
In my opinion we should NOT have any escaping utils. The user needs to properly escape a URL with XML entities:
https://www.google.pl/search?param=12&amp;q=&quot;
or query string (URI component) encodings:
https://www.google.pl/search?param=12&q=%22
The EscapeUtils class could be removed and we could introduce a UrlValidator class that looks into the XML content, finds the values of href attributes and validates that each URL is properly escaped, returning a clear error message to users who provide invalid URLs.
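A rough sketch of what such a validator might look like, assuming it only needs to reject `href` values that contain a bare `&` or a `<`. The names `UrlValidatorSketch` and `findInvalidHrefs` and the regex-based attribute lookup are illustrative assumptions, not existing AET code. The sketch scans the raw text because a suite with a bare `&` is not well-formed XML and could not be handed to an XML parser first:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical validator: not part of AET, shown only to illustrate the idea.
final class UrlValidatorSketch {

    // Naive lookup of double-quoted href attribute values in the raw suite text.
    // Limitation: a raw '"' inside the value would end the match early.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]*)\"");

    // An ampersand that does not start a predefined entity or a numeric character reference.
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9a-fA-F]+);)");

    // Returns every href value that is not properly escaped for an XML attribute.
    static List<String> findInvalidHrefs(String suiteXml) {
        List<String> invalid = new ArrayList<>();
        Matcher matcher = HREF.matcher(suiteXml);
        while (matcher.find()) {
            String href = matcher.group(1);
            if (BARE_AMP.matcher(href).find() || href.indexOf('<') >= 0) {
                invalid.add(href);
            }
        }
        return invalid;
    }
}
```

The returned list could then be turned into an error message that names each offending URL and suggests replacing `&` with `&amp;`.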
Hello @wiiitek, you've made a few good points and I feel obliged to clear up some misunderstandings.
That's true only for escaped strings. Take for example this string:
Secondly:
You are right that this url will in the end be serialized to
In conclusion:
Thank you, yes I agree:
But it might be confusing what one should do when testing escaped special characters on search pages. Maybe we should update the documentation? ;p
Dealing with
In fact, in your screenshot they are escaped, but it appears that Chrome tries to be user friendly and displays a prettified version in the URL bar. Fortunately, if you copy the address from the URL bar and paste it into a text editor, you get the escaped version. I carried out a small experiment and pasted
Dealing with
The only problem left is
I have to admit this is more complicated than it seemed :p I wonder what your thoughts are.
Thanks @wblachowski for your answer! I see you put effort into looking into this. Please make a decision on this topic after consulting @Skejven and @mchrominski. Personally I like the "valid XML" approach with no escaping/unescaping, but I understand that this might not be acceptable because of end users. My other thought is that the XML we are using is not a friendly format: we don't have an XML schema (I remember there were attempts to prepare one). So any other format would be better ;p Thank you again for your work. I believe you will make a good decision. Cheers!
This whole topic is resolved here: #524
I made a change fixing unescaped URLs in report page. For details see #441.
Description
The current implementation expects unescaped URLs in the suite XML (e.g. `(...)?a=b&c=d`), which are then escaped before serializing the XML to TestSuite. If you put a correctly escaped URL in the XML (e.g. `(...)?a=b&amp;c=d`), the query params are escaped to `(...)?a=b&amp;amp;c=d`, parsed back (while serializing) to `(...)?a=b&amp;c=d`, and this version is then displayed in the report page.
This redundancy in escaping is what causes the issue. If the URLs are already escaped they shouldn't be tampered with. My fix uses the simplest solution, i.e. it tries to unescape before escaping, thus correctly handling both unescaped URLs (unescape has no effect here) and escaped ones.
I also added a unit test for escaped URLs.
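To illustrate the behaviour described above, here is a minimal, self-contained sketch. `escapeXml` and `unescapeXml` are simplified stand-ins for `StringEscapeUtils.escapeXml11` and `StringEscapeUtils.unescapeXml` (they handle only the five predefined XML entities), and `normalize` is a hypothetical name for the unescape-then-escape step, not necessarily the method name used in this PR:

```java
// Simplified stand-ins for the Apache Commons escaping helpers; illustration only.
final class EscapeSketch {

    private static final String[][] ENTITIES = {
        {"&amp;", "&"}, {"&lt;", "<"}, {"&gt;", ">"}, {"&quot;", "\""}, {"&apos;", "'"}
    };

    // Replaces the five XML special characters with their predefined entities.
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    // Single left-to-right pass, so "&amp;quot;" becomes "&quot;" and not "\"".
    static String unescapeXml(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            boolean matched = false;
            if (s.charAt(i) == '&') {
                for (String[] entity : ENTITIES) {
                    if (s.startsWith(entity[0], i)) {
                        out.append(entity[1]);
                        i += entity[0].length();
                        matched = true;
                        break;
                    }
                }
            }
            if (!matched) {
                out.append(s.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    // Hypothetical name for the fix: unescape first, then escape exactly once.
    static String normalize(String url) {
        return escapeXml(unescapeXml(url));
    }
}
```

Under these assumptions, `normalize` maps both `(...)?a=b&c=d` and `(...)?a=b&amp;c=d` to `(...)?a=b&amp;c=d`, whereas plain `escapeXml` turns the already escaped form into `(...)?a=b&amp;amp;c=d`. The trade-off raised in the conversation remains: an unescaped URL that is meant to contain the literal text `&quot;` cannot be told apart from an escaped quote.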
Motivation and Context
Closes #441
Types of changes
Checklist:
I hereby agree to the terms of the AET Contributor License Agreement.