
feat: dq_category_specific_ingredient_percent_2 #9606

Merged: benbenben2 merged 1 commit into main from dq_category_specific_ingredient_percent_2 on Jan 14, 2024


benbenben2 (Collaborator)

What

Add quality facets for jams whose fruit quantity is too low:
errors:

  • en:specific-ingredient-fruit-quantity-is-below-the-minimum-value-of-35-for-category-jams
  • en:specific-ingredient-fruit-quantity-is-below-the-minimum-value-of-25-for-category-redcurrants-jams
  • general pattern: en:specific-ingredient-<$specific_ingredient_id>-quantity-is-below-the-minimum-value-of-$quantity_threshold-for-category-<$category_id>

info:

  • en:missing-specific-ingredient-for-this-category
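To make the tag pattern concrete, here is a minimal Python sketch of how a lower-bound check could produce the error and info tags listed above. It is not the code from this PR (the change lives in the Perl module lib/ProductOpener/DataQualityFood.pm and reads thresholds from the taxonomy). The threshold table `MIN_SPECIFIC_INGREDIENT_PERCENT`, the helper names, and the assumption that ids carry an `en:` prefix that is dropped when building the tag are all hypothetical.

```python
# Hypothetical threshold table: category id -> (specific ingredient id, minimum percent).
# Values mirror the examples in this PR; the real thresholds are stored in the taxonomy.
MIN_SPECIFIC_INGREDIENT_PERCENT = {
    "en:jams": ("en:fruit", 35),
    "en:redcurrants-jams": ("en:fruit", 25),
}


def strip_lang_prefix(tag_id: str) -> str:
    """Drop the language prefix before the first colon, e.g. 'en:fruit' -> 'fruit'."""
    return tag_id.split(":", 1)[1] if ":" in tag_id else tag_id


def check_specific_ingredient(
    category_id: str, specific_ingredients: dict[str, float]
) -> tuple[list[str], list[str]]:
    """Return (error_tags, info_tags) for a single category (hypothetical helper)."""
    errors: list[str] = []
    infos: list[str] = []
    rule = MIN_SPECIFIC_INGREDIENT_PERCENT.get(category_id)
    if rule is None:
        return errors, infos  # no threshold defined for this category
    ingredient_id, threshold = rule
    quantity = specific_ingredients.get(ingredient_id)
    if quantity is None:
        # The category has a threshold, but the product declares no quantity for it.
        infos.append("en:missing-specific-ingredient-for-this-category")
    elif quantity < threshold:
        # Only values strictly below the minimum are flagged.
        errors.append(
            f"en:specific-ingredient-{strip_lang_prefix(ingredient_id)}"
            f"-quantity-is-below-the-minimum-value-of-{threshold}"
            f"-for-category-{strip_lang_prefix(category_id)}"
        )
    return errors, infos


if __name__ == "__main__":
    print(check_specific_ingredient("en:jams", {"en:fruit": 30}))
```

For example, `check_specific_ingredient("en:jams", {"en:fruit": 30})` returns the jams error tag and no info tag, while a quantity of exactly 35 returns no tag because the comparison is strictly "below". The sketch evaluates one category at a time and does not decide which threshold applies when a product belongs to both en:jams and en:redcurrants-jams.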

Following comments from @CharlesNepote and @aleene, I tried (perhaps not perfectly) to write the threshold values directly in the taxonomy.

This applies only to specific ingredients, and only to values below the provided threshold. If it works well, a future PR could generalize it to ingredients and nutriments, and to maximum values (for labels, the threshold can be either a minimum or a maximum depending on the nutrient, e.g. sugar or fibers). Another possible future improvement: detect whether a jam/jelly should be categorized as extra-jam/extra-jelly.

Screenshot: Screenshot_20231231_002712

Related issue(s) and discussion

Part of #1414

benbenben2 added the categories, 🧽 Data quality, and 🧽 Data quality - Measure - Quality facets labels on Dec 30, 2023
benbenben2 self-assigned this on Dec 30, 2023
benbenben2 requested a review from a team as a code owner on December 30, 2023 23:36
github-actions bot added the 🧬 Taxonomies and 🧪 tests labels on Dec 30, 2023

Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No data about Coverage
No data about Duplication

See analysis details on SonarCloud

codecov-commenter

Codecov Report

Attention: 1 line in your changes is missing coverage. Please review.

Comparison is base (5b33015) 49.28% compared to head (4a107f4) 49.31%.

File                                   Patch %   Lines
lib/ProductOpener/DataQualityFood.pm   92.85%    0 missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #9606      +/-   ##
==========================================
+ Coverage   49.28%   49.31%   +0.02%     
==========================================
  Files          66       66              
  Lines       20546    20560      +14     
  Branches     4946     4951       +5     
==========================================
+ Hits        10126    10139      +13     
  Misses       9132     9132              
- Partials     1288     1289       +1     


CloCkWeRX (Contributor)

My only suggestion would be to check whether there is an existing way to say "yes, this fails a data quality check for the category, but no, it is not because the data is wrong".

This would cater for scenarios like https://www.npr.org/2020/10/01/919189045/for-subway-a-ruling-not-so-sweet-irish-court-says-its-bread-isnt-bread or https://www.delish.com/food-news/a49216/things-you-didnt-know-about-pringles/ (Potato Chips made from dried potato vs potato crisp vs potato chip)

  • An average person, or an AI, would call it "bread" or "potato chip" when scanning it
  • A regulator would call it something different

Otherwise, this somewhat implies that categories can only be derived from regulations.


stephanegigandet left a comment


Looks good to me

benbenben2 (Collaborator, Author)

Quoting @CloCkWeRX: "My only suggestion would be seeing if there's an existing way to say 'yes, this fails a data quality check for category, but no; it is not because the data is wrong'."

Yes, this is a good point, and we would really like to have that. It is not specific to this PR but a more general need: being able to shut down alerts that are false positives.

benbenben2 merged commit ac6b912 into main on Jan 14, 2024
13 checks passed
benbenben2 deleted the dq_category_specific_ingredient_percent_2 branch on January 14, 2024 10:33

4 participants