@ggppdk
Contributor

@ggppdk commented May 21, 2017

Pull Request for Issue #7927

Summary of Changes

Currently, the HTML parser of the com_finder INDEXER adds a space after the end of every HTML tag, so that words at the edges of BLOCK tags are separated properly,
but this breaks words that are partly enclosed in inline HTML tags!
e.g. with input:

<h1>Title</h1><p>Paragraph</p>
<strong>M</strong>obile

the parser output is:

Title Paragraph ....... which is correct (block tags)
M obile ....... which is broken (inline tags)
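A rough Python transliteration of the current behaviour reproduces this. It assumes that the indexer effectively strips tags and collapses whitespace afterwards. `old_index_text` and its tag-stripping regex are illustrative stand-ins, not the com_finder code:

```python
import re


def old_index_text(html: str) -> str:
    """Illustrative stand-in for the pre-PR behaviour (not the com_finder code)."""
    # Equivalent of PHP str_replace('>', '> ', $input): a space after every tag.
    spaced = html.replace(">", "> ")
    # Crude tag stripping, then whitespace collapsing for readability.
    text = re.sub(r"<[^>]*>", "", spaced)
    return " ".join(text.split())


print(old_index_text("<h1>Title</h1><p>Paragraph</p>"))  # Title Paragraph
print(old_index_text("<strong>M</strong>obile"))          # M obile
```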


This PR instead adds the space at the beginning of the (BLOCK) tags,
thus avoiding a more costly regular expression that would add it at their end!
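The idea can be sketched in Python as follows. The tag list is an assumed subset, not necessarily the list used by this PR, and `new_index_text` is a hypothetical helper rather than the PHP implementation. The sketch inserts a space in front of opening and closing block tags only, so inline tags no longer split words:

```python
import re

# Assumed subset of block-level / line-breaking tags; not necessarily the PR's exact list.
BLOCK_TAGS = (
    "address|article|blockquote|br|div|dd|dl|dt|h1|h2|h3|h4|h5|h6|"
    "hr|li|ol|p|pre|section|table|td|th|tr|ul"
)

# Fixed text first ('<' or '</'), then an alternation of fixed tag names.
# The trailing \b keeps e.g. 'p' from matching inside '<param>'.
BLOCK_TAG_RE = re.compile(r"(</?)(" + BLOCK_TAGS + r")\b", re.IGNORECASE)


def new_index_text(html: str) -> str:
    """Hypothetical helper: space before block tags only, then strip tags and collapse whitespace."""
    spaced = BLOCK_TAG_RE.sub(r" \1\2", html)
    text = re.sub(r"<[^>]*>", "", spaced)
    return " ".join(text.split())


print(new_index_text("<h1>Title</h1><p>Paragraph</p>"))  # Title Paragraph
print(new_index_text("<strong>M</strong>obile"))          # Mobile
```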

Not all regular expressions are "performance evil"; they can be very fast, e.g.:
-- when they start with fixed text and use OR (alternation) on fixed texts
-- also, the performance of a regular expression can depend a lot on the input
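A pattern of this shape can be sketched as below. It starts with the literal `<` (optionally `</`) and then alternates over fixed tag names, with `\b` rejecting longer names such as `<param>`. `build_tag_pattern` is a hypothetical helper. How much the literal prefix actually speeds up a search depends on the engine (PCRE in PHP, Python's `re` here):

```python
import re


def build_tag_pattern(tag_names):
    """Compile '(</?)(name1|name2|...)\\b' from a list of fixed tag names (hypothetical helper).

    The pattern begins with the literal '<', so a search can quickly skip to
    candidate positions; after that it only tries a short list of literal
    alternatives, and the trailing \\b rejects longer names such as '<param>'.
    """
    alternation = "|".join(re.escape(name) for name in tag_names)
    return re.compile(r"(</?)(" + alternation + r")\b", re.IGNORECASE)


pattern = build_tag_pattern(["p", "pre", "div", "br"])

for candidate in ["<P class='intro'>", "<pre>", "</div>", "<param name='x'>", "<b>"]:
    match = pattern.search(candidate)
    print(candidate, "->", match.group(0) if match else None)
```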

Testing Instructions

  • Test that words are separated at BLOCK tags and NOT split at INLINE tags
  • Test that the performance of the indexer is not changed
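For the performance check, one rough approach is a micro-benchmark of the two preprocessing steps on a synthetic document. This is a Python sketch, so absolute numbers say little about PHP/PCRE. The sample markup, tag list and helper names are assumptions:

```python
import re
import timeit

# Assumed subset of block-level tags, for illustration only.
BLOCK_TAG_RE = re.compile(
    r"(</?)(address|article|blockquote|br|div|h1|h2|h3|h4|h5|h6|hr|li|ol|p|pre|section|table|td|th|tr|ul)\b",
    re.IGNORECASE,
)

# Synthetic sample (an assumption; real content will behave differently).
SAMPLE = "<h2>Heading</h2><p>Some <strong>bold</strong> and <em>emphasised</em> text.</p>\n" * 2000


def old_step(html: str) -> str:
    # Pre-PR: a space after every '>'.
    return html.replace(">", "> ")


def new_step(html: str) -> str:
    # PR approach: a space before opening/closing block tags only.
    return BLOCK_TAG_RE.sub(r" \1\2", html)


def benchmark(runs: int = 20):
    """Return (old_seconds, new_seconds) for `runs` passes over SAMPLE."""
    old_t = timeit.timeit(lambda: old_step(SAMPLE), number=runs)
    new_t = timeit.timeit(lambda: new_step(SAMPLE), number=runs)
    return old_t, new_t


if __name__ == "__main__":
    old_t, new_t = benchmark()
    print(f"str.replace: {old_t:.4f}s, regex: {new_t:.4f}s")
```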

Documentation Changes Required

None

Maybe add some more unit tests, so that inline tags are covered too?
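Such tests might look roughly like the following. The sketch uses Python's `unittest` rather than Joomla's PHPUnit suite. `index_text` is a hypothetical stand-in for the parser, built on an assumed subset of block tags:

```python
import re
import unittest

# Assumed subset of block-level / line-breaking tags (illustrative only).
BLOCK_TAG_RE = re.compile(
    r"(</?)(blockquote|br|div|h1|h2|h3|h4|h5|h6|hr|li|ol|p|pre|td|th|tr|ul)\b",
    re.IGNORECASE,
)
TAG_RE = re.compile(r"<[^>]*>")


def index_text(html: str) -> str:
    """Hypothetical stand-in for the parser: space before block tags, strip tags, collapse whitespace."""
    spaced = BLOCK_TAG_RE.sub(r" \1\2", html)
    return " ".join(TAG_RE.sub("", spaced).split())


class InlineAndBlockTagTest(unittest.TestCase):
    def test_inline_tags_do_not_split_words(self):
        for html in ("<strong>M</strong>obile", "<em>M</em>obile", "<b>M</b>obile", '<span class="x">M</span>obile'):
            with self.subTest(html=html):
                self.assertEqual(index_text(html), "Mobile")

    def test_block_tags_separate_words(self):
        self.assertEqual(index_text("<h1>Title</h1><p>Paragraph</p>"), "Title Paragraph")
        self.assertEqual(index_text("one<br>two"), "one two")
        self.assertEqual(index_text("<ul><li>a</li><li>b</li></ul>"), "a b")

    def test_tag_names_are_matched_case_insensitively(self):
        self.assertEqual(index_text("one<BR/>two"), "one two")


if __name__ == "__main__":
    # Fixed argv and exit=False: ignore real command-line args and keep the interpreter running.
    unittest.main(argv=["inline-block-tag-tests"], exit=False)
```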

@ggppdk
Contributor Author

ggppdk commented May 21, 2017

Please tell me if I have gotten the list of desired tags correctly.

// being transformed into 'TitleParagraph' with no space.
$input = str_replace('>', '> ', $input);
// Add a space before both the OPEN and CLOSE tags of BLOCK and LINE BREAKING elements,
// e.g. 'all<h1><strong>m<strong/>obile</h1>List' will become 'all mobile list'
Contributor

I'm not sure that this is the expected behaviour if you have a header element here. Surely a header indicates something semantically different (i.e. not just a space).

Contributor Author

It is just an example; you can suggest a better or different example to use in this comment.

Contributor

That example doesn't work, as it is invalid HTML.

Contributor Author

Do you want me to change the example in the comment, or to mention that it works even with invalid HTML?

Contributor

I mean that H1 is always a block element, and even TinyMCE will correct it, so a better example would be:

    // e.g. 'all<h1><em>m</em>obile  List</h1>'

@brianteeman
Contributor

I have tested this item ✅ successfully on ed675d2

Before test Mobile not found
After test Mobile found


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/16165.

// being transformed into 'TitleParagraph' with no space.
$input = str_replace('>', '> ', $input);
// Add a space before both the OPEN and CLOSE tags of BLOCK and LINE BREAKING elements,
// e.g. 'all<h1><em>m</em>obile List</h1>' will become 'all mobile list'
Contributor

Is it correct that List becomes list (lowercase l) and 2 spaces become 1 space?

Contributor Author

@ggppdk May 10, 2018

Capital and lowercase letters are not changed; I have corrected the example.
As for multiple spaces becoming one, there is no problem; is there a reason that this should be a problem?

Contributor Author

@ggppdk May 10, 2018

I looked at the expression again to remember how it was written:
an extra space is added in front of (BLOCK) tags,
and all other spaces are not touched at all.
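This can be checked on the example from the review. The following Python sketch uses a minimal tag list assumed for this example only, and `add_block_spaces` / `strip_tags` are hypothetical helpers. Only one space is inserted in front of each block tag, while the existing double space and the capital L are left alone. Whether later indexer stages normalise case or whitespace is not covered here:

```python
import re

# Assumed, minimal tag list for this example only.
BLOCK_TAG_RE = re.compile(r"(</?)(div|h1|h2|p|br)\b", re.IGNORECASE)


def add_block_spaces(html: str) -> str:
    # One space goes in front of each opening/closing block tag; nothing else changes.
    return BLOCK_TAG_RE.sub(r" \1\2", html)


def strip_tags(html: str) -> str:
    # Crude tag removal for illustration; no whitespace collapsing, no lowercasing.
    return re.sub(r"<[^>]*>", "", html)


example = "all<h1><em>m</em>obile  List</h1>"
spaced = add_block_spaces(example)
print(repr(spaced))              # 'all <h1><em>m</em>obile  List </h1>'
print(repr(strip_tags(spaced)))  # 'all mobile  List '
```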

@ghost added the J3 Issue label Apr 5, 2019
@ghost removed the J3 Issue label Apr 19, 2019
@ghost
ghost commented Jul 19, 2019

I have tested this item ✅ successfully on 2254a9b



@ghost
ghost commented Jul 19, 2019

@brianteeman can you please retest?

@brianteeman
Contributor

No need to retest, as the only change was in a comment.

@ghost
ghost commented Jul 19, 2019

Status "Ready To Commit".

@joomla-cms-bot added the RTC This Pull Request is Ready To Commit label Jul 19, 2019
@HLeithner merged commit 2baa652 into joomla:staging Jul 22, 2019
@HLeithner
Member

thx

@joomla-cms-bot removed the RTC This Pull Request is Ready To Commit label Jul 22, 2019
@HLeithner added this to the Joomla! 3.9.11 milestone Jul 22, 2019