Fix zero percentage replacement in get_binned_data Function and update parameter name by boemer00 · Pull Request #1278 · evidentlyai/evidently

boemer00 · 2024-09-02T20:40:14Z

This pull request introduces changes that address the issue: The fixed value for feel_zeroes in get_binned_data may lead to deviation in some case. #334

Dynamic Fill Value Calculation: The fill value used to replace zero percentages is now calculated dynamically based on the actual data, rather than using a fixed value.
Ensuring Correct Fill Value: The fill value is guaranteed to be smaller than the minimum non-zero percentage in both the reference and current datasets. This adjustment ensures that the data distribution remains accurate.
Maintaining Data Distribution: These changes help maintain the correct distribution of data, which is crucial for accurate statistical tests, including Kullback-Leibler divergence drift score calculations.
Parameter Name Update: The parameter name has been updated from "feel_zeroes" to "fill_zeroes" for clarity and consistency.

Hope it helps
boemer00

boemer00 · 2024-09-05T15:39:23Z

Enhance `get_binned_data` Function with Flexible Fill Methods and Dynamic Scaling

This pull request introduces enhancements to my previous PR regarding the get_binned_data function. These changes address issues with handling very small percentages in data drift calculations while providing some flexibility and maintaining its original purpose.

Changes Made:

New Parameters:
- fill_method (str, default='auto'): Allows users to select the method for calculating the fill value for zero percentages.
- dynamic_scale (bool, default=False): Enables dynamic scaling of the fill value based on the data distribution.
Fill Methods: Three fill methods are now supported:
- auto: Calculates a fill value that is the minimum of 1/10 and 1/2 of the smallest non-zero percentage.
- min: Uses the minimum non-zero percentage as the fill value.
- mean: Uses the average of the minimum non-zero percentages from reference and current data.

Dynamic Scaling: When enabled, applies an additional scaling factor to the fill value:

scale_factor = min(min_non_zero_ref, min_non_zero_cur) / max(min_non_zero_ref, min_non_zero_cur)

This is beneficial in cases where there is a significant difference between the minimum non-zero percentages in the reference and current data.

Normalisation: After applying the fill value, the percentages are normalised to ensure they always sum to 1:

reference_percents = reference_percents / np.sum(reference_percents)
current_percents = current_percents / np.sum(current_percents)

Error Handling: Added a ValueError for invalid fill_method values to prevent unexpected behaviour.

Lastly, I am happy to write a test for the new parameters but would appreciate direction on where this should be implemented.

Thanks

emeli-dral · 2025-06-25T17:18:28Z

The linter check has been failing for a while, and now the branch also has merge conflicts with main. Given the current state of the repo, resolving both issues would likely be more work than starting fresh.

If the changes are still relevant, feel free to rebase onto the latest main and open a new PR. Happy to review it then!

bharath03-a · 2025-07-14T23:07:40Z

Hey @boemer00, thank you for the original work here. Since this branch had diverged and had some linter and merge issues, I’ve opened a fresh PR based on your implementation and rebased it on the latest main.

New PR: #1660

Hello, @emeli-dral! Happy to continue iteration there!

boemer00 added 2 commits September 2, 2024 21:24

fixed min non-zero issue v1

63de8d5

name correction feel -> fill

a9bac3f

boemer00 mentioned this pull request Sep 3, 2024

The fixed value for feel_zeroes in get_binned_data may lead to deviation in some case. #334

Open

updated function with new parameters

bd0b12c

emeli-dral closed this Jun 25, 2025

bharath03-a mentioned this pull request Jul 14, 2025

Enhanced get_binned_data Function with Flexible Fill Methods and Dynamic Scaling #1659

Closed

bharath03-a mentioned this pull request Jul 15, 2025

Enhanced get_binned_data Function with Flexible Fill Methods and Dynamic Scaling #1660

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix zero percentage replacement in get_binned_data Function and update parameter name#1278

Fix zero percentage replacement in get_binned_data Function and update parameter name#1278
boemer00 wants to merge 3 commits intoevidentlyai:mainfrom
boemer00:min_non_zero

boemer00 commented Sep 2, 2024

Uh oh!

boemer00 commented Sep 5, 2024

Uh oh!

emeli-dral commented Jun 25, 2025

Uh oh!

bharath03-a commented Jul 14, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

boemer00 commented Sep 2, 2024

Uh oh!

boemer00 commented Sep 5, 2024

Enhance get_binned_data Function with Flexible Fill Methods and Dynamic Scaling

Changes Made:

Uh oh!

emeli-dral commented Jun 25, 2025

Uh oh!

bharath03-a commented Jul 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Enhance `get_binned_data` Function with Flexible Fill Methods and Dynamic Scaling

bharath03-a commented Jul 14, 2025 •

edited

Loading