Speed boost for civicrm/ajax/checkemail #15824

Merged 1 commit on Feb 14, 2020

mfb
Contributor

@mfb mfb commented Nov 11, 2019

Overview

This PR provides some speed boosts for the civicrm/ajax/checkemail endpoint (used, for example, when adding a cc: email address to an email message).

Before

Whatever text is entered is searched for anywhere in the email or name columns via a LIKE query. This is slow because no index can be used. When adding a cc: email address, it can take a minute for ajax search results to show up on a large Civi database (millions of contacts). In addition, each letter the user types spawns another long-running query (the old ajax requests continue to run in the background on the server).

After

  • Drop support for some apparently unused $_GET parameters (hopefully no one is using this endpoint like some sort of API? :)
  • Build a UNION query on email and sort_name
  • With the automatic wildcard setting enabled, the ajax response should be about the same speed as before. With the automatic wildcard setting disabled, the ajax response should now be nearly instantaneous (i.e. much faster on a large database).
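
As a rough illustration of the approach in this list, here is a minimal Python sketch of how the UNION query and its LIKE pattern could be assembled. It is not the PHP code in this PR: the function names, the `%s` placeholder style, and the default row limit are assumptions made for the example; only the table and column names come from the queries discussed in this thread.

```python
# Hypothetical helpers for illustration only; the real endpoint is PHP in CiviCRM.

# Filters shared by every subquery (taken from the queries in this thread).
CONDITIONS = (
    "ce.on_hold = 0 AND cc.is_deceased = 0 "
    "AND cc.do_not_email = 0 AND cc.is_deleted = 0"
)


def like_pattern(term, include_wildcard):
    """Build the LIKE pattern; a leading % is added only when the setting is on."""
    prefix = "%" if include_wildcard else ""
    return prefix + term + "%"


def build_checkemail_query(term, include_wildcard, row_limit=10):
    """Return (sql, params) for a UNION over sort_name and email."""
    row_limit = int(row_limit)  # never interpolate untrusted text into SQL
    subqueries = []
    for column in ("cc.sort_name", "ce.email"):
        subqueries.append(
            "(SELECT sort_name name, ce.email, cc.id "
            "FROM civicrm_email ce "
            "INNER JOIN civicrm_contact cc ON cc.id = ce.contact_id "
            f"WHERE {CONDITIONS} AND {column} LIKE %s "
            f"LIMIT {row_limit})"
        )
    sql = " UNION ".join(subqueries) + f" LIMIT {row_limit}"
    pattern = like_pattern(term, include_wildcard)
    # One bound parameter per subquery.
    return sql, [pattern, pattern]
```

With the wildcard setting off, `build_checkemail_query("jite", False)` binds the pattern `jite%` in both subqueries, which is the form that lets the database use an index on each column.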

@civibot

civibot bot commented Nov 11, 2019

(Standard links)

@civibot civibot bot added the master label Nov 11, 2019
@eileenmcnaughton
Contributor

@colemanw do you have any thoughts - it looks a bit hairy from a maintenance point of view since we haven't hacked in ft indexes into any other random places & it's not using an api

@mfb
Contributor Author

mfb commented Nov 12, 2019

This is a critical UX improvement for us so I think it's worth maintaining? Our fundraising team won't have to waste extra hours when sending emails :) I'd be happy to do more work on incorporating use of ft indexes into APIs or whatever else is helpful.

@mlutfy
Member

mlutfy commented Nov 15, 2019

Would it be worth adding a comment in the code linking back to this PR? I think the PR description by mfb is very helpful, but just from reading the code, I would be wondering why it's written that way.

@demeritcowboy
Contributor

I can give this a spin.

@eileenmcnaughton
Contributor

Note that the way we fixed the query in quicksearch was to use a UNION rather than an OR - which was not dependent on mysql settings

@mfb
Contributor Author

mfb commented Nov 21, 2019

OK I have another idea for this - instead of using the fulltext index, we could do a UNION query on sort_name and first_name, and only search from the start of the name column, i.e. LIKE '$name%' - this way it can use the indexes on these columns. This would search by both first name and last name, without having to scan the whole column.

@eileenmcnaughton
Contributor

@mfb the quicksearch code respects the setting for adding a wildcard at the beginning - I'm sure you have that setting disabled, as we do, but it would be consistent to respect it.

@mfb
Contributor Author

mfb commented Nov 21, 2019

I tried this out and found that when a user types the first letter say "m", many hundreds of thousands of results are returned by each subquery of the UNION - this data has to be loaded into a temporary table before it's eventually limited to 20 rows. It can take 25 seconds for the query to run, which is not great for live typing.

The simple way to avoid this is to add a LIMIT clause to each subquery, so there is no part of the query returning huge amounts of data. Then the UNION query is lightning fast.
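
To make the per-subquery LIMIT idea concrete, here is a small runnable sketch using Python's built-in sqlite3 module. It is an illustration under assumptions, not the CiviCRM code: the two tables are a cut-down, hypothetical subset of the real schema, the data is generated, and because SQLite does not accept MySQL's parenthesized `(SELECT ... LIMIT n) UNION (...)` form, each limited SELECT is wrapped in a derived table instead.

```python
import sqlite3

# In-memory database with a simplified, hypothetical subset of the schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE civicrm_contact (id INTEGER PRIMARY KEY, sort_name TEXT);
CREATE TABLE civicrm_email (id INTEGER PRIMARY KEY, contact_id INTEGER, email TEXT);
""")

# 1000 contacts whose sort_name and email all start with "m".
conn.executemany(
    "INSERT INTO civicrm_contact (id, sort_name) VALUES (?, ?)",
    [(i, f"Member{i:04d}, M") for i in range(1, 1001)],
)
conn.executemany(
    "INSERT INTO civicrm_email (contact_id, email) VALUES (?, ?)",
    [(i, f"m{i:04d}@example.org") for i in range(1, 1001)],
)

# Each subquery is capped on its own, so no branch of the UNION can
# hand a huge intermediate result to the outer query.
SUBQUERY = """
SELECT * FROM (
  SELECT cc.sort_name AS name, ce.email, cc.id
  FROM civicrm_email ce
  INNER JOIN civicrm_contact cc ON cc.id = ce.contact_id
  WHERE {column} LIKE ?
  LIMIT {limit}
)
"""


def search(term, limit=10):
    """Prefix search on sort_name and email with a LIMIT on every subquery."""
    limit = int(limit)
    sql = (
        SUBQUERY.format(column="cc.sort_name", limit=limit)
        + " UNION "
        + SUBQUERY.format(column="ce.email", limit=limit)
        + f" LIMIT {limit}"
    )
    pattern = term + "%"
    return conn.execute(sql, (pattern, pattern)).fetchall()


rows = search("m")
```

Even though a single letter matches every row in both tables here, each branch contributes at most `limit` rows before the UNION deduplicates them and the outer LIMIT is applied.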

But, do you think we could drop support for the offset $_GET parameter? Or is it used some place?

Because it would be tricky to use LIMIT clauses on the subqueries and make the offset work as expected.

Maybe we can also drop support for the noemail $_GET parameter? I don't see evidence of that string anywhere in Civi.

@eileenmcnaughton
Contributor

So this is only called from one place

  var sourceDataUrl = "{/literal}{crmURL p='civicrm/ajax/checkemail' q='id=1' h=0 }{literal}";

  function emailSelect(el, prepopulate) {
    $(el, $form).data('api-entity', 'contact').css({width: '40em', 'max-width': '90%'}).crmSelect2({
      minimumInputLength: 1,
      multiple: true,
      ajax: {
        url: sourceDataUrl,
        data: function(term) {
          return {
            name: term
          };
        },
        results: function(response) {
          return {
            results: response
          };
        }
      }
    }).select2('data', prepopulate);
  }

And I can't see how noemail or offset could be set from there

@mfb mfb force-pushed the email-search-speed-boost branch 2 times, most recently from 4e7e2ff to b813e2c Compare November 21, 2019 04:53
@mfb
Contributor Author

mfb commented Nov 21, 2019

Ok it's now working nice'n'fast with UNION query. I added support for searching by first_name, but it's not required if we want to limit the search to sort_name and email.

@eileenmcnaughton
Contributor

I don't have a strong feeling on first name - this looks generally good to me & if @jitendrapurohit (or maybe @pfigel given it touches on performance & security) are able to review I'd be happy to merge it

@mfb
Contributor Author

mfb commented Nov 21, 2019

For a future PR - maybe we could allow users to search for contacts by typing "{first_name} {last_name}" in addition to "{last_name}, {first_name}"?

@jitendrapurohit
Contributor

I applied this PR and looked at the queries that are formed when I enter the first 4 letters of my name (jite) in the To field of the Email form.

I executed these queries on a database with >516K rows in the civicrm_contact table and >300K rows in the civicrm_email table.

Before the patch -

SELECT sort_name name, ce.email, cc.id
FROM   civicrm_email ce
  INNER JOIN civicrm_contact cc ON cc.id = ce.contact_id
WHERE  ce.on_hold = 0
  AND cc.is_deceased = 0
  AND cc.do_not_email = 0
  AND ( cc.sort_name LIKE '%jite%' OR ce.email LIKE '%jite%' )
  AND cc.is_deleted = 0
LIMIT 0, 10

7 rows in set (0.98 sec)

After -

(SELECT sort_name name, ce.email, cc.id 
  FROM   civicrm_email ce 
    INNER JOIN civicrm_contact cc ON cc.id = ce.contact_id 
  WHERE  ce.on_hold = 0 
    AND cc.is_deceased = 0 
    AND cc.do_not_email = 0 
    AND cc.sort_name LIKE '%jite%'
    AND cc.is_deleted = 0 LIMIT 10) 
UNION ( 
  SELECT sort_name name, ce.email, cc.id 
  FROM   civicrm_email ce 
    INNER JOIN civicrm_contact cc ON cc.id = ce.contact_id 
  WHERE  ce.on_hold = 0 
    AND cc.is_deceased = 0 
    AND cc.do_not_email = 0 
    AND cc.first_name LIKE '%jite%'
    AND cc.is_deleted = 0 LIMIT 10)
UNION (
  SELECT sort_name name, ce.email, cc.id 
  FROM   civicrm_email ce 
  INNER JOIN civicrm_contact cc ON cc.id = ce.contact_id 
  WHERE  ce.on_hold = 0 
    AND cc.is_deceased = 0 
    AND cc.do_not_email = 0 
    AND ce.email LIKE '%jite%' 
    AND cc.is_deleted = 0 LIMIT 10 ) 
LIMIT 10

7 rows in set (1.90 sec)

which is ~1 sec more than the previous query. If I remove the first_name clause from the above UNION statement, the result returned is -

7 rows in set (1.05 sec)

I tried this 4-5 times and the result was similar in all cases. I'm not sure I was able to observe the improved performance here. When watching the emails load in the UI, I think it was slightly faster before the patch. @mfb do you think I've missed anything here? Or please let me know how you see different behavior on your side.

@mfb
Contributor Author

mfb commented Nov 29, 2019

@jitendrapurohit you need to turn off the setting for adding a wildcard at the beginning to see the huge speed boost.

(I would say that setting should be the default, but that's for another PR.)

@jitendrapurohit
Contributor

jitendrapurohit commented Dec 2, 2019

Test 2 with Automatic Wildcard setting disabled.

Query formed -

(SELECT sort_name name, ce.email, cc.id
  FROM   civicrm_email ce
    INNER JOIN civicrm_contact cc ON cc.id = ce.contact_id
  WHERE  ce.on_hold = 0
    AND cc.is_deceased = 0
    AND cc.do_not_email = 0
    AND cc.sort_name LIKE 'jite%'
    AND cc.is_deleted = 0
  LIMIT 10)
UNION (
  SELECT sort_name name, ce.email, cc.id
  FROM   civicrm_email ce
    INNER JOIN civicrm_contact cc ON cc.id = ce.contact_id
  WHERE  ce.on_hold = 0
    AND cc.is_deceased = 0
    AND cc.do_not_email = 0
    AND cc.first_name LIKE 'jite%'
    AND cc.is_deleted = 0
  LIMIT 10)
UNION (
  SELECT sort_name name, ce.email, cc.id
  FROM   civicrm_email ce
    INNER JOIN civicrm_contact cc ON cc.id = ce.contact_id
  WHERE  ce.on_hold = 0
    AND cc.is_deceased = 0
    AND cc.do_not_email = 0
    AND ce.email LIKE 'jite%'
    AND cc.is_deleted = 0
  LIMIT 10)
LIMIT 10

4 rows in set (0.01 sec)

That is surely a huge improvement in fetching the emails when the wildcard setting is off.

I also tested some basic functionality after applying the PR and it looks fine to me. If we're ok to ignore the wildcard performance, I think it is good to merge 👍

@mfb
Contributor Author

mfb commented Dec 2, 2019

I think maybe I should back out the search by first name to avoid that slowdown with the include wildcard setting turned on? This would need an additional index to make it faster, so it could be worked on in a separate PR.

@colemanw
Member

colemanw commented Dec 2, 2019

@mfb thanks for your work on this!
Looking at the previous code I don't see where it was searching first name before. Is that your addition? And if so, did you add that in to mitigate the problem of having to type last name first?

In general, fields like first_name and last_name are not something we perform searches on, preferring to use the calculated fields sort_name and display_name which provide more data for searching and are universal to all contact types not just Individuals.

@mfb
Contributor Author

mfb commented Dec 2, 2019

Yes I don't think I should add this feature here (but I would like to work on this in the future, since typical email UIs do allow users to do a quick search/entry via "{first_name} {last_name}")

  * Remove support for unused $_GET params.
  * Build a UNION query on email and sort_name.
  * Respect includeWildCardInName setting.
@mfb mfb force-pushed the email-search-speed-boost branch from c9e6bc2 to 3c0dc53 Compare December 2, 2019 17:29
@mfb
Contributor Author

mfb commented Dec 2, 2019

@colemanw display_name isn't useful for searching because it looks like Ms. Kiara Parker - whereas we would want to do an indexed search on Ki% as the user types first_name last_name. In any case, ok, I pushed a commit that backs out the first_name feature since it would need a lot more work.

@eileenmcnaughton
Contributor

This was pretty hard to read because it included a big cleanup in the same PR as the main change. However, I worked through it & I agree the removed code is not used and once I'd resolved that it got pretty readable :-)

It looks like you backed out changing fields (as discussed above) and went for respecting the DB setting - which is also respected in quicksearch, so users of DBs with that enabled will be used to it.

There are more questions here - should the function exist at all, the ones above, etc. But I think this change is appropriate to merge as it improves code quality and fixes the performance issue in a way that is consistent with quick search

@eileenmcnaughton eileenmcnaughton merged commit 6e7a095 into civicrm:master Feb 14, 2020