Fix indexer breaking words when they are partly enclosed via inline HTML tags #16165
Please tell me if I have gotten the list of desired tags right.
// being transformed into 'TitleParagraph' with no space.
$input = str_replace('>', '> ', $input);
// Add a space before both the OPEN and CLOSE tags of BLOCK and LINE BREAKING elements,
// e.g. 'all<h1><strong>m<strong/>obile</h1>List' will become 'all mobile list'
I'm not sure that this is the expected behaviour if you have a header element here. Surely a header indicates something semantically different (i.e. not just a space).
It is just an example;
you can suggest a better / different example to use in this comment.
That example doesn't work, as it is invalid HTML.
Do you want me to change the example in the comments, or to mention that it will work even with invalid HTML?
I mean that H1 is always a block element, and even TinyMCE will correct it, so a better example would be:
// e.g. 'all<h1><em>m</em>obile List</h1>'
I have tested this item ✅ successfully on ed675d2.
This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/16165.
// being transformed into 'TitleParagraph' with no space.
$input = str_replace('>', '> ', $input);
// Add a space before both the OPEN and CLOSE tags of BLOCK and LINE BREAKING elements,
// e.g. 'all<h1><em>m</em>obile List</h1>' will become 'all mobile list'
Is it correct that List becomes list (lowercase l) and 2 spaces become 1 space?
Capital / lowercase letters are not changed; I have corrected the example.
As for multiple spaces becoming one, there is no problem. Is there a reason why this should be a problem?
I looked at it again to recall how the expression works, since it was written some time ago.
An extra space is added in front of (BLOCK) tags.
All other spaces are not touched at all.
I have tested this item ✅ successfully on 2254a9b.
This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/16165.
@brianteeman can you please retest?
No need to retest, as the only change was in a comment.
Status "Ready To Commit".
thx
Pull Request for Issue #7927
Summary of Changes
Currently the HTML parser of the com_finder INDEXER adds a space after
the end of every HTML tag, so that words at the edges of BLOCK tags are spaced properly,
but this breaks words that are partly enclosed in inline HTML tags!
e.g. the indexed text currently comes out as:
Title Paragraph ....... which is correct (block tags)
M obile ....... which is broken (inline tags)
This PR instead adds the space before the OPEN and CLOSE tags of BLOCK and LINE BREAKING elements only,
thus avoiding a costly / more costly regular expression to add it at the end!
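As a rough illustration of the change described above, the following PHP sketch contrasts the two strategies. The function names, the `normalize()` helper, and the shortened `BLOCK_TAGS` list are assumptions made for this example; they are not the PR's actual code or its tag list.

```php
<?php
// Hypothetical, shortened list of block / line-breaking tags (assumption for this sketch).
const BLOCK_TAGS = 'br|div|h1|h2|h3|h4|h5|h6|li|ol|p|table|ul';

// Collapse runs of whitespace so the outputs are easy to compare.
function normalize(string $text): string
{
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Old strategy: a space after the end of EVERY tag, then strip the tags.
function spaceAfterEveryTag(string $input): string
{
    return normalize(strip_tags(str_replace('>', '> ', $input)));
}

// New strategy: a space BEFORE the open and close tags of block elements only.
function spaceBeforeBlockTags(string $input): string
{
    $pattern = '/(<\/?)(' . BLOCK_TAGS . ')\b/i';

    return normalize(strip_tags(preg_replace($pattern, ' $1$2', $input)));
}

echo spaceAfterEveryTag('<strong>M</strong>obile'), "\n";          // M obile
echo spaceBeforeBlockTags('<strong>M</strong>obile'), "\n";        // Mobile
echo spaceBeforeBlockTags('<h1>Title</h1><p>Paragraph</p>'), "\n"; // Title Paragraph
```

Inline tags such as `<strong>` and `<em>` are simply not in the list, so the word they partly enclose stays in one piece, while block tags still separate their neighbours.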
Not all regular expressions are "performance evil". They can be very fast, e.g.
-- when they start with fixed text and use OR (alternation) only on fixed texts
-- also, the performance of a regular expression can depend a lot on the input
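To check these two points on real content, a small timing harness along these lines may help. This is a minimal sketch: `timeIt()`, the four-tag list, and the sample inputs are all hypothetical, and it needs PHP 7.3 or later for `hrtime()`. No particular result is implied; the numbers will vary with the input and the machine.

```php
<?php
// Hypothetical helper: total time in milliseconds for $runs calls of $fn.
function timeIt(callable $fn, int $runs = 1000): float
{
    $start = hrtime(true); // nanoseconds
    for ($i = 0; $i < $runs; $i++) {
        $fn();
    }

    return (hrtime(true) - $start) / 1e6;
}

// The pattern starts with the fixed text '<' and only ORs fixed tag names.
$tags    = ['div', 'h1', 'li', 'p']; // shortened, hypothetical list
$pattern = '/(<\/?)(' . implode('|', $tags) . ')\b/i';

// Two made-up inputs: one dense with tags, one that is mostly plain text.
$inputs = [
    'tag-dense'  => str_repeat('<div><p>a</p><li>b</li></div>', 500),
    'plain-text' => str_repeat('lorem ipsum dolor sit amet ', 500) . '<p>end</p>',
];

foreach ($inputs as $label => $html) {
    $ms = timeIt(function () use ($pattern, $html) {
        preg_replace($pattern, ' $1$2', $html);
    });
    printf("%-10s %.2f ms per 1000 runs\n", $label, $ms);
}
```

Swapping in real article bodies for the made-up inputs gives a more meaningful comparison than synthetic strings.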
Testing Instructions
Documentation Changes Required
None
Maybe add some more unit tests, to have some tests for inline tags too?