Skip to content

Comments

[4.0] Fixing several issues in com_finder/Smart Search#29428

Merged
wilsonge merged 4 commits intojoomla:4.0-devfrom
Hackwar:j4finderpostgres
Jun 11, 2020
Merged

[4.0] Fixing several issues in com_finder/Smart Search#29428
wilsonge merged 4 commits intojoomla:4.0-devfrom
Hackwar:j4finderpostgres

Conversation

@Hackwar
Copy link
Member

@Hackwar Hackwar commented Jun 4, 2020

Smart Search right now is broken for Postgres. This doesn't fix everything, but I hope with these issues fixed, solving the other issues should become easier.

Clearing where without bounded

In the frontend, we are clearing most of our query at one point in time. However with the parameterized queries, this fails since the bound parameters are still present. So we have to clear that one as well.
Unfortunately now, the search in frontend still doesn't work with Postgres, since it fails on the HAVING clause. If someone can help here, that would be good. This should be solved now.

Indexing doesn't work in Postgres

In the code for the postgres indexing, a bunch of errors have crept in. This fixes those and while doing just that, I noticed something: The code is identical except for just a few lines, so we can thus drop the whole db driver system and simplify the code a lot.

@wilsonge Can we merge this without fixing the frontend search completely? I hope that I can find someone who can fix this when they see the whole code...

fixing several issues, esp Postgres
@richard67
Copy link
Member

richard67 commented Jun 4, 2020

@alikon Could you have a look on the having clause mentioned in the description of this PR?

You can find it here:
https://github.com/joomla/joomla-cms/blob/4.0-dev/components/com_finder/src/Model/SearchModel.php#L175

I tried it already, and to me it seems to be wrong, but I dont know how it was intended to work.

The SUM(t.node_id IN (' . implode(',', $groups[$i]) . ')) definitely means to build the sum of a boolean expression, SUM(t.node_id IN (1,2,3)), with t.node_id IN (1,2,3) being the boolean condition. MySQL seems to automatically cast the boolean condition to integer for summing it up, PostgreSQL correctly doesn't do such a sh..t.

I don't really understand what shall be achieved with this ... maybe you can give me a help?

Even it if shall not be solved with this PR, but maybe we can solve it soon?

@Hackwar
Copy link
Member Author

Hackwar commented Jun 5, 2020

That is not the only place we are adding to the having clause.

@richard67
Copy link
Member

@Hackwar I am busy with real job now, so won't have time before tonight or weekend, but maybe you can test following.
I think it should be 2 having clauses and not the wrong one with summing up a boolean expression (IN clause):

$query->having('t.node_id IN (' . implode(',', $groups[$i]) . ')');
$query->having('SUM(t.node_id) > 0');

Can you test if this works on PostgreSQL and verify if it delivers the same result on MySQL as the existing stuff does?

@alikon
Copy link
Contributor

alikon commented Jun 6, 2020

as this pr fix the index from backend in postgresql + more clean code + etc....
i wish to have this merged asap
so after we can look at:

I don't really understand what shall be achieved with this ... maybe you can give me a help?
Even it if shall not be solved with this PR, but maybe we can solve it soon?

@richard67 i'm in your same shoes here..... better @Hackwar can explain us "what shall be achieved" with that query

to me
https://github.com/joomla/joomla-cms/blob/4.0-dev/components/com_finder/src/Model/SearchModel.php#L172-L176

// Iterate through each taxonomy group.
for ($i = 0, $c = count($groups); $i < $c; $i++)
{
	$query->having('SUM(t.node_id IN (' . implode(',', $groups[$i]) . ')) > 0');
}

smells wrong, but , again i still don't understand the goal of that query ....

@alikon
Copy link
Contributor

alikon commented Jun 6, 2020

I have tested this item ✅ successfully on 6a0cc68

index from backend works now on postgresql


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/29428.

@richard67
Copy link
Member

@Hackwar Yes, I think the last change was it. LGTM.

@Hackwar
Copy link
Member Author

Hackwar commented Jun 6, 2020

The whole having clauses thing is related to making the query performant and making sure that all taxonomies and terms are included in the query result.

We have taxonomies like author, category, tags and we have the search phrase. The search phrase splits up into its terms and for each term we can have several hits in the terms table with stemmed terms, etc.
Consider a search for "Joomla awesome release" in the categories "Documentation" and "News" from the author "Hannes". Those taxonomies are all stored in the same table. The search phrase would be split up into its terms and by stemming the terms, we would end up with matches on "Joomla", "joomlaians", "awesome", "awesomeness", "release", "releaser", "released". We want only search results which contain a match for every term group (the term itself and all stemmed versions of it), which are from the author and which are from at least one of the 2 categories we selected. To achieve that in 3.x, we did a join for each taxonomy group and for every term group. So when I want to DoS a website, I just have to search for a 20-word-sentence and the database is cooking with a query with 20 joins...

In 4.0 I rewrote this to only join once for the taxonomies and once for the terms mapping table. In the where clause we restrict the result set down to whatever matches to any of the matching terms (so it matches not just to "Joomla", "awesome" and "release", but also to only "Joomla", only "awesome" and only "release") and then in the having clause, we require that it has to have at least one match for every term group. For this, we do a SUM() over the checks if the taxonomies or terms are in the results and if that is >0, we are good.

And while I'm writing all this, I found the solution for this and now it all works. Woohoo! Please test both for MySQL and Postgres, also with more complex searches.

@richard67
Copy link
Member

@alikon Could you test it again? Thanks in advance.

@alikon
Copy link
Contributor

alikon commented Jun 7, 2020

while testing i've found some issues with bytea and Gather Search Statistics enabled
sended a fix
Hackwar#46
Hackwar#47

@Hackwar
Copy link
Member Author

Hackwar commented Jun 7, 2020

Merged both. Good catch.

@richard67
Copy link
Member

@alikon Well, even if it was your fix: Could you test again so we have it counted on GitHub as a good test, too? It resets the counter after a new commit.

@richard67
Copy link
Member

I have tested this item ✅ successfully on 24e8633

Tested with MySQLi, MySQL (PDO) and PostgreSQL (PDO).
Search in frontend works, also if search term is not found.
Backend works (index rebuild, gathering statistics).
No SQL errors in database server log file.
No PHP errors or warnings or notices in PHP log file.


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/29428.

@alikon
Copy link
Contributor

alikon commented Jun 7, 2020

I have tested this item ✅ successfully on 24e8633


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/29428.

@alikon
Copy link
Contributor

alikon commented Jun 7, 2020

rtc


This comment was created with the J!Tracker Application at issues.joomla.org/tracker/joomla-cms/29428.

@joomla-cms-bot joomla-cms-bot added the RTC This Pull Request is Ready To Commit label Jun 7, 2020
@wilsonge
Copy link
Contributor

wilsonge commented Jun 7, 2020

@alikon @richard67 what I don't know enough about postgres is the pg_escape_bytea calls in the postgres files that we are removing here. That is going to be escaping quotes in some way. I want to be sure we're not going to be allowing in some sort of weird SQL Injection. I suspect with prepared statements we might now be covered by this. But I'm not really comfortable enough with postgres to be sure

@richard67
Copy link
Member

richard67 commented Jun 7, 2020

@wilsonge I don't know much about this either, so let's hope @alikon can answer that.

@alikon
Copy link
Contributor

alikon commented Jun 11, 2020

for what i can tell you we should be covered by:
https://github.com/joomla/joomla-cms/pull/29428/files#diff-7690a193a5a0fe8abc5651f7bddce847R95
->bind(2, $entry->query, ParameterType::LARGE_OBJECT)

@wilsonge wilsonge merged commit 3990fe2 into joomla:4.0-dev Jun 11, 2020
@wilsonge
Copy link
Contributor

Thanks!

@joomla-cms-bot joomla-cms-bot removed the RTC This Pull Request is Ready To Commit label Jun 11, 2020
@zero-24 zero-24 added this to the Joomla 4.0 milestone Jun 12, 2020
@Hackwar Hackwar deleted the j4finderpostgres branch June 19, 2020 21:27
sakiss pushed a commit to sakiss/joomla-cms that referenced this pull request Oct 16, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants