Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Free Listings + Paid Ads: Group variations type for syncable products #1713

Conversation

ianlin
Copy link
Member

@ianlin ianlin commented Oct 4, 2022

Changes proposed in this Pull Request:

This is part of the backend task 1 📌 Syncable Products Count from the epic issue #1610.

Updates on 6th Oct 2022

I reverted the commits for calculating synced and non synced products of product statistics API (the second part of change listed below), as there still be some performance problems that need to be fixed. For now this PR only solves the syncable products count problem so it's not blocking the release of the project. A P2 post will be created for the discussion about the solution of performance problems.

Original post

This PR has two changes:

  1. The syncable products count returned by the API GET mc/syncable-product-count includes variation products, so the total number will be a lot higher than the total products count in WC. This PR groups the variations as one if they have the same parent so the count becomes more reasonable.
  2. After the above change, the total number returned by the API GET mc/product-statistics is higher than the syncable product counts, confusion may be raised since they are inconsistent. The reason is the same as above, this PR groups the variations as one if they have the same parents when counting product statistics.

The logic of product statistics API contains two parts: counting synced products and counting not synced products. In this PR, counting synced products uses the similar approach like counting syncable products by getting a list of simple and variation products that marked as MC synced first, and group the variation products as one if they have the same parent.

However, counting not synced products uses a different approach by getting a list of simple and variable products that marked as MC not synced first, and check the variable products' children (variation products) if they are syncable, finally group the variation with the same parent and ignore the variable products with no variations. The reason it uses a different approach is it will process a lot faster than using the approach above. If using the above approach we will get a list of simple and variation that are marked as MC not synced, in my test environment it's more than 3000 items and looping through these items to check if the product is syncable takes a lot of time than looping through a list of simple and variable products, which in my local environment it's about 400 products.

Detailed test instructions:

Note: use GET mc/product-statistics/refresh to refresh product statistics, instead of getting the statuses from the transient.

Fully onboarded and products are already synced

  1. Use a fully onboarded account with products that are already synced to GMC.
  2. Schedule a job to calculate syncable products by calling POST mc/syncable-products-count.
  3. After the job is done, get the number of syncable products by calling GET mc/syncable-products-count.
  4. Get product statistics by calling GET mc/product-statistics/refresh.
  5. The syncable product count should be the same as the total number of product statistics.

New merchant center account that products are not synced yet

  1. Delete all your accounts from GLA's settings page.
  2. Start onboarding by clicking Start listing products →.
  3. Schedule a job to calculate syncable products by calling POST mc/syncable-products-count.
  4. Run through the onboarding, create a new merchant center account.
  5. After onboarded, do the above steps 3 to 5.

You may notice that the total number of statistics is sometimes less than syncable products count. This is because when just onboarded all the products are being marked as MC statuses like not synced or pending by the background product sync job. When getting product statistics the product MC statuses may still being written to the DB, so we couldn't get the exact number for it. Try fetching product statistics after a couple of seconds or minutes and the number will become accurate.

Changelog entry

- Find and return an array of WooCommerce product IDs that are syncable but marked as MC not_synced.
- Excludes:
	- variable products without variations, and
	- variable products with all of their variations are not syncable.
@ianlin ianlin added php Pull requests that update Php code changelog: none Skip changelog entry for this PR labels Oct 4, 2022
@ianlin ianlin requested a review from a team October 4, 2022 10:44
@ianlin ianlin self-assigned this Oct 4, 2022
@github-actions github-actions bot added the changelog: update Big changes to something that wasn't broken. label Oct 4, 2022
Comment on lines 285 to 303
// Get syncable products, and swap variation products for its parent.
$args = [
'status' => [ 'publish' ],
'type' => array_diff( ProductSyncer::get_supported_product_types(), [ 'variable' ] ),
'meta_query' => $this->get_sync_ready_products_meta_query(),
];

$syncable_product_ids = $this->product_helper->maybe_swap_for_parent_ids(
array_filter(
$this->find_ids( $args, $limit, $offset ),
function ( $product_id ) {
$product = $this->product_helper->get_wc_product( $product_id );
return $this->product_helper->is_sync_ready( $product );
}
)
);

// Get the product ids which their MC status is 'not synced' from the syncable products.
$find_by_ids_args = [
Copy link
Member Author

@ianlin ianlin Oct 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is being refactored by the next commit eb9f841 so it can be ignored. Keeping it in the commit log in case anyone is interested in seeing the difference between two approaches.

Copy link
Contributor

@mikkamp mikkamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes. For the first part of the PR (combining variations) I think we are good. However as you can see in the comments I'm a bit worried about the way the rest of the calculations are done.

I wonder if we can preserve the syncable product counts (possibly array of ID's) a little longer. Take that count and subtract the products that have been successfully synced, to come up with the number of not_synced products. Although it might be hard to keep that number up to date if products are added/removed later on.

My other thoughts would be to have a DB lookup table that stores product ID's and it's statuses + relation to parent. That would allow direct results, but would need quite a substantial refactor.

$this->options->update( OptionsInterface::SYNCABLE_PRODUCTS_COUNT_INTERMEDIATE_DATA, [ ...$product_ids, ...$items ] );
$grouped_items = $this->product_helper->maybe_swap_for_parent_ids( $items );

$this->options->update( OptionsInterface::SYNCABLE_PRODUCTS_COUNT_INTERMEDIATE_DATA, [ ...$product_ids, ...$grouped_items ] );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not already store unique ID's here, will save us having to do it later when counting.

$args['type'] = array_diff( ProductSyncer::get_supported_product_types(), [ 'variable' ] );
$filtered_product_ids = array_flip( $product_repository->find_synced_product_ids( $args ) );
$filtered_product_ids = array_flip(
$product_helper->maybe_swap_for_parent_ids(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've been working with an array of product ID's, but this call will load the product object for each ID to check if it should get the parent ID. Since caching is in place that can cause a lot of resource usage if we have a large amount of synced products. Have we tested whether this will hit any resource limits?

For aggregating by variable product we previously used product_data_lookup which would store the relation between a product ID and parent ID. This array would also get populated using get_post calls which doesn't load all the product data, and consumed less memory. Is that a pattern we should stick to? There might be some more history in previous PR's why we chose that implementation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For aggregating by variable product we previously used product_data_lookup which would store the relation between a product ID and parent ID.

Thanks for pointing this out, I just did a quick check and looks like I can get the correct synced count from product_data_lookup array. For the synced count I'd lean towards to this solution.

$args = [
'status' => 'publish',
'type' => $types,
// Get simple and variable products that are marked as MC 'not synced'.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have similar thoughts about this section. While switching to variable products can speed it up / limit resources. If we have a site with 10k simple products then this will still be an issue.

}

// Check if a simple product is syncable.
return $this->product_helper->is_sync_ready( $product );
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The is_sync_ready check here and for variable products will exclude some products but we have built in compatibility for bundles which is attached to woocommerce_gla_get_sync_ready_products_pre_filter. That means they'll be excluded from the syncable products, but not excluded here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out, this is something I've missed.

@ianlin
Copy link
Member Author

ianlin commented Oct 6, 2022

@mikkamp Thanks for the review. Agree that it'd be better to pull the part of counting statistics out of the PR to avoid blocking the release. I've reverted three commits and also updated the PR description.

A P2 post about solutions of counting statistics will be created soon.

I wonder if we can preserve the syncable product counts (possibly array of ID's) a little longer. Take that count and subtract the products that have been successfully synced, to come up with the number of not_synced products. Although it might be hard to keep that number up to date if products are added/removed later on.

I had thought of the same way but it's indeed not easy to keep the array of IDs up to date.

My other thoughts would be to have a DB lookup table that stores product ID's and it's statuses + relation to parent. That would allow direct results, but would need quite a substantial refactor.

I'd prefer using this solution as the query should be pretty fast. I will list it down in the P2 post.

@eason9487 eason9487 removed the changelog: none Skip changelog entry for this PR label Oct 7, 2022
@ianlin ianlin requested a review from mikkamp October 7, 2022 15:10
Copy link
Contributor

@mikkamp mikkamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for simplifying the PR. The changes look good, I just noticed we might want to be more careful in case a product gets deleted during the counting process, so we don't get unexpected gaps in the count.

Are we also going to add a changelog entry, or are we content with it using the PR title?

Comment on lines 475 to 476
* @throws InvalidValue If a given ID doesn't reference a valid product. Or if a variation product does not have a
* valid parent ID (i.e. it's an orphan).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was confusing at first. It seems though that an orphan will get excluded when creating the batches of syncable products. So it shouldn't encounter any orphans here. However there is still a small chance that a parent or even just a regular product gets deleted after the batches get created (should that be covered by a unit test).

If that happens then wouldn't we just want to exclude those from the count, instead of making the whole batch fail? Which won't show as an error but instead excludes a large amount of products from the count.

@ianlin ianlin changed the title Free Listings + Paid Ads: Group variations type for syncable products and revise product statistics API to make its total count equals to syncable products Free Listings + Paid Ads: Group variations type for syncable products Oct 11, 2022
@ianlin
Copy link
Member Author

ianlin commented Oct 11, 2022

we might want to be more careful in case a product gets deleted during the counting process, so we don't get unexpected gaps in the count.

@mikkamp Thanks for pointing out this case, I have added new commits for solving this.

@ianlin
Copy link
Member Author

ianlin commented Oct 11, 2022

Are we also going to add a changelog entry, or are we content with it using the PR title?

Yeah, we will be using the PR title for the changelog entry.

Copy link
Contributor

@mikkamp mikkamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the extra changes, I'm getting the results as expected. I just added a minor comment, but it looks ready to go.

*
* @since x.x.x
*/
public function maybe_swap_for_parent_ids( array $product_ids, bool $check_product_status = true, bool $ignore_product_on_error = true ) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is needed for a followup PR, but just curious why it needs options to change the behaviour.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My initial though was this function might be used in other cases in the future, so options could be flexible. We can remove them if there are not much uses tho.

@@ -189,4 +267,83 @@ public function test_update_syncable_products_count_no_syncable_products_multipl
do_action( self::CREATE_BATCH_HOOK, 2 );
do_action( self::CREATE_BATCH_HOOK, 3 );
}

public function test_update_syncable_products_count_product_gets_deleted_after_batches_get_created() {
// This is a product in the first batch that will be deteled after the first batch gets created.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a small typo.

Suggested change
// This is a product in the first batch that will be deteled after the first batch gets created.
// This is a product in the first batch that will be deleted after the first batch gets created.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 3d64f9d.

@ianlin ianlin merged commit 8bf3aa3 into feature/1610-streamlined-onboarding Oct 12, 2022
@ianlin ianlin deleted the update/1610-group-variations-type-for-syncable-products branch October 12, 2022 03:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog: update Big changes to something that wasn't broken. php Pull requests that update Php code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants