Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Free Listings + Paid Ads: Extend product statistics API to return the number of syncable products #1667

Conversation

ianlin
Copy link
Member

@ianlin ianlin commented Sep 6, 2022

Changes proposed in this Pull Request:

This PR implements the backend task 1 📌 Extends mc/product-statistics API to return # of syncable products from the epic issue #1610. Here are what have changed:

  • Extend an existing API mc/product-statistics and return a new data syncable_products in the response.
  • To calculate syncable_products, we use an existing method find_sync_ready_products in src/Product/ProductRepository.php.

Google account connected and Merchant Centre setup complete

Previously when calling mc/product-statistics API:

  • If Google account is not connected, respond 401 Google account is not connected.
    • By checking the option gla_google_connected.
  • Or, if merchant centre setup is not complete, respond 400 Merchant Center account is not set up.
    • By checking the option gla_mc_setup_completed_at
    • This option will be set after calling the API mc/settings/sync.

In the step 4 of new onboarding flow (fqR0EHi63lWahRcVTKCcba-fi-449%3A80562), we haven't called the API mc/settings/sync so the option gla_mc_setup_completed_at is not being set. If we call mc/product-statistics API at this stage it will respond 400 Merchant Center account is not set up.

This PR also modifies the the above behaviour - if the merchant centre account is not set up (i.e. the option gla_mc_setup_completed_at is not being set), the API will still respond 200 with the data with syncable_products and the empty statistics:

{
    "timestamp": null,
    "statistics": null,
    "scheduled_sync": 0,
    "syncable_products": 29
}

Detailed test instructions:

  1. Disconnect all of the accounts from the settings page: wp-admin/admin.php?page=wc-admin&path=/google/settings.
  2. Use Postman, make a GET request to mc/product-statistics
    • The response should be 401 Google account is not connected.
  3. Go through the onboarding flow, stop at the step 4 Complete your campaign with paid ads.
  4. Use Postman, make a GET request to mc/product-statistics, the response should be something like this:
    {
        "timestamp": null,
        "statistics": null,
        "scheduled_sync": 0,
        "syncable_products": 29
    }
  5. Skip the ads setup or complete it, you should be redirected to the GLA home page
  6. Make a GET request to mc/product-statistics again, the response should be something like this:
    {
        "timestamp": 1662449383,
        "statistics": {
            "active": 0,
            "expiring": 0,
            "pending": 0,
            "disapproved": 23,
            "not_synced": 10
        },
        "scheduled_sync": 0,
        "syncable_products": 29
    }

Bonus test

Repeat the same tests as above on a site with a large amount of products 500+. Can use Smooth Generator to generate a set of test products. Note that this might generate some unsyncable products so the total syncable product count might vary.

The expected result is for the request to take longer, however it should still end up with an accurate count for the syncable products. It's stored in a transient, so the second time round should be faster.

The UI is going to be adjusted to show a message like "calculating product count" so it's not an issue if the request takes a little longer to complete.

@ianlin ianlin added php Pull requests that update Php code changelog: none Skip changelog entry for this PR labels Sep 6, 2022
@ianlin ianlin requested a review from a team September 6, 2022 08:30
@ianlin ianlin self-assigned this Sep 6, 2022
@github-actions github-actions bot added the changelog: update Big changes to something that wasn't broken. label Sep 6, 2022
@eason9487 eason9487 requested review from a team and removed request for a team September 15, 2022 12:24
Copy link
Contributor

@mikkamp mikkamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, it's working and I'm getting the results as expected. However I have 500 products on my site and I'm already seeing slow responses, I added some suggestions for faster product queries.

I was expecting some more unit tests to break with these changes, but it seems we don't have unit tests for ProductStatisticsController or ProductSyncStats. Since we want to keep this moving I've logged a separate issue to get that done. See #1678

* @return int
*/
public function get_syncable_products_count(): int {
$products = $this->product_repository->find_sync_ready_products()->get();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all we are interested in is the count, then maybe we need to make a new helper function find_sync_ready_product_ids. Otherwise by default it will return objects, which means one DB query for the set of products and many more per product to get the full product data.

An even better way would be to get the create a helper function specifically for returning the total count. See the pagination section here: https://github.com/woocommerce/woocommerce/wiki/wc_get_products-and-WC_Product_Query#parameters

This is accomplished by sending the following args to the product query:

$args = [
	'return'   => 'ids',
	'paginate' => true,
	'limit'    => 1,
];

And then getting the total from $results->total. That prevents it from having to assemble a large collection of ID's and then having to count them all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On further inspection this is more of an issue than I expected. The initial DB query is a rough estimate, we then apply filters on the product to allow extensions and special conditions to be excluded as well.

So we'll have to find a balance between either returning a rough estimate (without filtering), or switch to retrieving the products in batches (so we can apply filtering per batch). Might need some caching of the total so we don't do this too often.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this performance problem and provided the solutions for it.

src/MerchantCenter/MerchantStatuses.php Outdated Show resolved Hide resolved
@mikkamp mikkamp requested a review from a team September 16, 2022 13:40
@puntope
Copy link
Contributor

puntope commented Sep 16, 2022

❓ I get this results

{
    "timestamp": 1662449383,
    "statistics": {
        "active": 0,
        "expiring": 0,
        "pending": 0,
        "disapproved": 20,
        "not_synced": 1
    },
    "scheduled_sync": 0,
    "syncable_products": 22
}

How come syncable products are 22 and disapproved + not synced are 21??

@puntope
Copy link
Contributor

puntope commented Sep 16, 2022

I saw the expected result for this PR after the changes. So It looks good for me:

I was not able to test with a huge amount of products.

So I was finally able to test with 1000 products

When I run the call in step 4 I got a fatal error after 10-15 secs wait

PHP message: PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 12288 bytes)

@mikkamp
Copy link
Contributor

mikkamp commented Sep 19, 2022

How come syncable products are 22 and disapproved + not synced are 21?

Good question, I think @ianlin also identified some discrepancies with the amount of counted products. It could be helpful if you check which product is not being included to see if we can track this down further. Probably best to move it to a separate issue.

When I run the call in step 4 I got a fatal error after 10-15 secs wait
PHP message: PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 12288 bytes)

Based on that error message your allowed memory size is set at 128MB, which is below the minimum recommendations. I initially didn't want to make the batch size too small, but can you try changing it to something like 200 here. We might need to keep it at a bit more conservative number in case there are servers with limited resources.

@ianlin
Copy link
Member Author

ianlin commented Sep 19, 2022

Good question, I think @ianlin also identified some discrepancies with the amount of counted products. It could be helpful if you check which product is not being included to see if we can track this down further. Probably best to move it to a separate issue.

Yeah, this problem was listed as a sub-task in the backend task 1 of the epic issue here.

Copy link
Contributor

@puntope puntope left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ LGTM

Seems for me that is is ready to go after all the tweaks.

Thanks @mikkamp and @ianlin

@mikkamp
Copy link
Contributor

mikkamp commented Sep 19, 2022

Seems for me that is is ready to go after all the tweaks.

Can you just confirm if changing the batch size worked for you to clear up the out of memory errors?
I still want to lower it but just wondering what amount works for you.

@mikkamp
Copy link
Contributor

mikkamp commented Sep 20, 2022

I was doing some additional testing and was able to reproduce the out of memory error by reducing the memory limit to 128MB and increasing the amount of products I have. Using a smaller batch size does not fix the issue. Apparently somewhere along the line it's not clearing up the previous set of products which I think has to do with populating the caches, but it would need some further digging.

I also noticed that the find_sync_ready_products finds both single products and variations whereas when we calculate the product statistics we get from the Merchant Center we count each variable product as just one item. So there is going to be a discrepancy there with product count. If we need to group the variations to achieve a proper count then the query would need to be completely adjusted and the filters won't work the same.

@ianlin
Copy link
Member Author

ianlin commented Sep 21, 2022

I did some tests and here's my result:

  • Memory limit: 128M
  • Batch size: 500
  • Total products: 1190

The only case I encountered OOM problem is right after I created 100 products at once using wc-smooth-generator, and calling the API without the cache. I think it's because when products were created the GMC product sync jobs were scheduled as well, so calling the product statistics API at this point would cause OOM. If I manually added the cache to my local DB, the OOM issue didn't occur and the API respond correctly. After all the GMC product sync jobs were completed, the OOM issue didn't occur even when the cache didn't exist. As a result I think it'd be acceptable for setting 500 as batch size.

@ianlin
Copy link
Member Author

ianlin commented Sep 21, 2022

I just found a weird issue that I got 4075 syncable_products while the total products I've got in WC is 1190. I will keep looking into this.

@mikkamp
Copy link
Contributor

mikkamp commented Sep 21, 2022

If I manually added the cache to my local DB, the OOM issue didn't occur and the API respond correctly.

Can you clarify what you mean by adding the cache to your local DB? I traced the memory usage and it seems to be a combination of the WP meta cache as well as the WC_Cache_Helper class which caches the product data.

Here is an overview of memory usage (logging offset, count, peak memory usage, current memory usage)

[20-Sep-2022 16:46:14 UTC] 500 99 58403720 57229432
[20-Sep-2022 16:46:16 UTC] 1000 301 75588288 75512832
[20-Sep-2022 16:46:18 UTC] 1500 782 99822976 99768328
[20-Sep-2022 16:46:19 UTC] 2000 1282 121776120 121731080
[20-Sep-2022 16:46:19 UTC] PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 20480 bytes) in wp-includes/meta.php on line 1189

I can reduce the memory usage by calling wp_cache_flush() within the loop but it still uses a substantial amount:

[20-Sep-2022 16:45:42 UTC] 500 99 58403720 57229432
[20-Sep-2022 16:45:44 UTC] 1000 301 66749288 66519048
[20-Sep-2022 16:45:46 UTC] 1500 782 80233296 80178648
[20-Sep-2022 16:45:48 UTC] 2000 1282 92011224 91966184
[20-Sep-2022 16:45:48 UTC] 2279 1561 100752624 94368768

The WC cache does it per product, so I can't find one function to clear all caching. We could try clearing it in a loop but that would slow down the call a bit more.

I just found a weird issue that I got 4075 syncable_products while the total products I've got in WC is 1190. I will keep looking into this

The query retrieves the product types simple and variation (not variable) so with a lot of product variation this is expected to deviate quite a bit (which is what I mentioned here)

@ianlin
Copy link
Member Author

ianlin commented Sep 21, 2022

Can you clarify what you mean by adding the cache to your local DB?

Sorry for the confusion, what I really meant is the transient (_transient_gla_syncable_products_count and _transient_timeout_gla_syncable_products_count) we stored in the wp_options table.

Based on discussion during the dev call, we decided to close this PR and:

  1. Create a new endpoint to retrieve the count of syncable products.
  2. Create a separate job to calculate the number of syncable products, and trigger it when the user just start the onboarding flow.

For the discrepancies with the amount of counted products, per Mik's finding here (#1667 (comment)), we need to group the product with type variations to get the correct count, but shouldn't affect how many products that are really synced to GMC. This will be a separate PR from the above new endpoint.

@mikkamp
Copy link
Contributor

mikkamp commented Sep 21, 2022

we need to group the product with type variations to get the correct count, but shouldn't affect how many products that are really synced to GMC

Since we want to run the same filter the grouping needs to be done after the filter. So my thoughts would be for the Job storing the count to store (unique) included ID's, instead of storing a count as it runs through the batches. This is because of batch one could contain 5 variations of the same product and batch 2 could contain another 5. So if we store the parent ID as counted it won't count it twice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog: none Skip changelog entry for this PR changelog: update Big changes to something that wasn't broken. php Pull requests that update Php code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants