Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix dedupe finder performance issue on looking up table size #25527

Merged
merged 1 commit into from
Mar 10, 2023

Conversation

eileenmcnaughton
Copy link
Contributor

@eileenmcnaughton eileenmcnaughton commented Feb 8, 2023

Overview

Fix dedupe finder performance issue on looking up table size

Before

Our dedupe queries seem to be particularly slow, rule dependent, and I found that the the issue was slow SELECT COUNT(*) FROM civicrm_address etc queries. We recently fixed some of these in other place (SnapShot, an upgrade script). At the time I was under the impression the technique used here also had performance issues - ie

      SELECT TABLE_ROWS as row_count, TABLE_NAME as table_name FROM information_schema.TABLES WHERE
      AND TABLE_SCHEMA = DATABASE();

However, I tested that query and it is not slow

414 rows in set, 1 warning (0.103 sec)

And is quicker for less tables.

I also traced back the origin of the code comment to that effect & nothing in there seems to back it up

#8636
https://issues.civicrm.org/jira/browse/CRM-19027

After

Use of a faster function, which we can also use from those other places

Technical Details

This is very hard to access under r-run as it deals with the ordering of queries for a perceived performance improvement - however I could step through it with CRM_Contact_Import_Parser_ContactTest.testIgnoreLocationTypeId and per the screenshots you can see that it re-ordered the array to put the civicrm_address query first based on the table size being smaller

Original sort order

See in the variable $tableQueries the query involving the contact table is first

image

It decides the the table lengths

See in the $cachedResults there are more rows in the contact table

image

**And it re-orders based on that **

See in the variable $tableQueries the query involving the (shorter) address table is now first

image

Comments

@civibot
Copy link

civibot bot commented Feb 8, 2023

(Standard links)

@civibot civibot bot added the master label Feb 8, 2023
@eileenmcnaughton
Copy link
Contributor Author

Hmm - well if that is the test that hits it it is not where I expected ....

CRM_Contact_Import_Parser_ContactTest.testIgnoreLocationTypeId

@eileenmcnaughton eileenmcnaughton force-pushed the finder_build branch 2 times, most recently from 4e4f886 to bd83298 Compare February 8, 2023 22:11
@eileenmcnaughton eileenmcnaughton force-pushed the finder_build branch 3 times, most recently from 3c46200 to 35f33db Compare February 9, 2023 06:15
wmfgerrit referenced this pull request in wikimedia/wikimedia-fundraising-crm Feb 16, 2023
Note we have some recent patches that are included. These are mostly ones merged to
civicrm-core since forking for 5.58. One is in 5.58 now so is listed for
completeness & 2 minor ones are not yet merged upstream

|pr|status|notes|
|[25565](https://github.com/civicrm/civicrm-core/pull/25565)|merged to master(5.60)|Supports T327963 - activity types for thank you|
|[25516](https://github.com/civicrm/civicrm-core/pull/25516)|merged to master(5.60)|Helps import - filter imported searches|
|[25527](https://github.com/civicrm/civicrm-core/pull/25527)|not yet merged| but performance fix for dedupe searches (the name & address ones Sandra is doing)
|[25433](https://github.com/civicrm/civicrm-core/pull/25433)|merged to rc (5.59)|fix for search kit display|
|[25482](https://github.com/civicrm/civicrm-core/pull/25482)|Already in 5.58.1|Fix for search kit display|
|[25418](https://github.com/civicrm/civicrm-core/pull/25418)|Merged to rc (5.59)|Permanent imap timout fix|
|[25226](https://github.com/civicrm/civicrm-core/pull/25226)|Merged to rc (5.59)|ReportInstance api - in our install|
|[25568](https://github.com/civicrm/civicrm-core/pull/25568)|not yet merged|Helps with new import code, reduces chance of crash|
|[24515](https://github.com/civicrm/civicrm-core/pull/25415)|Merged to rc (5.59)| Fix to stop CI from breaking, replaces our hack|

List of cherry picked commits from the above

18d82af27b9 (HEAD -> master) Filter expired searches from search kit results
a75b98cb47a Add test to is_current on SavedSearch, fix
e4d69230ad1 Increase timeout on imap
89ab855619f Do not crash the whole SearchKit UI if one entity fails
0142e5dd18c Fix performance on deciding which tables to query
cd710af9733 dev/core#4117 Add is_current to UserJob, Search
2fa17fb88ae Make activity_type_id available to links, test
2d2bd93cda8 (dev/core#4088) ClassScanner - Move unit-test registration
e3e669d4556 Add Report Instance apiv4

Bug: T329681
Change-Id: I839d1bbfabfe3558094f7445a4281f237b609fed
@eileenmcnaughton
Copy link
Contributor Author

we've been running this in production for several weeks now

@colemanw
Copy link
Member

colemanw commented Mar 8, 2023

I'm happy with this between the r-run in production and the test coverage.

@eileenmcnaughton
Copy link
Contributor Author

test this please

@eileenmcnaughton eileenmcnaughton merged commit fd60462 into civicrm:master Mar 10, 2023
@eileenmcnaughton eileenmcnaughton deleted the finder_build branch March 10, 2023 01:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants