Avoid generating number variations when not needed #799

Merged
merged 1 commit into from
May 15, 2019

Collaborator

@ClemDoum ClemDoum commented May 10, 2019

Description:

Analysis showed that generating number variations of entity values during dataset validation can take a very long time when an entity has a lot of values. This is mostly due to running Rustling on each entity value.

To validate the dataset in a reasonable amount of time, number variations are now computed only when the entity data contains fewer than 10000 entity values.
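For illustration, the gating described above can be sketched as follows. This is a minimal, self-contained sketch and not the code from this PR: `expensive_number_variations` and `build_variations` are hypothetical names, and the toy digit mapping merely stands in for the costly Rustling call.

```python
# Hypothetical sketch: the names below are illustrative, not the project's API.
NUMBER_VARIATIONS_THRESHOLD = 10000  # value taken from the description above


def expensive_number_variations(value):
    """Stand-in for the costly step (the PR attributes the cost to Rustling).

    Toy behaviour: spell out a few single digits so the example stays runnable.
    """
    digits = {"1": "one", "2": "two", "3": "three"}
    return {" ".join(digits.get(token, token) for token in value.split())}


def build_variations(entity_values):
    """Map each entity value to its set of variations.

    Number variations are computed only when the entity has fewer values
    than NUMBER_VARIATIONS_THRESHOLD; otherwise each value maps to itself.
    """
    use_number_variations = len(entity_values) < NUMBER_VARIATIONS_THRESHOLD
    variations = {}
    for value in entity_values:
        value_variations = {value}
        if use_number_variations:
            value_variations |= expensive_number_variations(value)
        variations[value] = value_variations
    return variations
```

With a small entity the costly step runs for every value; once the entity reaches the threshold it is skipped entirely, so validation time no longer grows with the cost of the per-value call.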

Checklist:

  • My PR is ready for code review
  • I have added some tests, if applicable, and run the whole test suite, including linting tests
  • I have updated the documentation, if applicable

@ClemDoum ClemDoum requested a review from adrienball May 10, 2019 13:12
@ClemDoum ClemDoum force-pushed the task/limit-string-variations branch 2 times, most recently from 3f087c6 to c6c307d Compare May 10, 2019 13:50

codecov-io commented May 10, 2019

Codecov Report

Merging #799 into develop will increase coverage by 0.05%.
The diff coverage is 100%.

@@             Coverage Diff             @@
##           develop     #799      +/-   ##
===========================================
+ Coverage    88.38%   88.43%   +0.05%     
===========================================
  Files           76       76              
  Lines         4570     4573       +3     
  Branches       882      883       +1     
===========================================
+ Hits          4039     4044       +5     
+ Misses         398      397       -1     
+ Partials       133      132       -1

Contributor

@adrienball adrienball left a comment

See comment

# values
number_variations_limit = 0
if len(entity[DATA]) < NUMBER_VARIATIONS_THRESHOLD:
    number_variations_limit = None

What about using a boolean flag for number_variations_limit?
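A minimal sketch of what such a flag could look like, assuming `DATA` is the dictionary key holding the entity values and that the threshold is the 10000 mentioned in the description. The helper names `use_number_variations` and `describe_plan` are hypothetical and only illustrate the idea; they are not taken from the merged commit.

```python
DATA = "data"  # assumed key, mirroring the DATA constant in the snippet above
NUMBER_VARIATIONS_THRESHOLD = 10000  # threshold stated in the PR description


def use_number_variations(entity):
    """Return True when the entity is small enough to compute number variations."""
    return len(entity[DATA]) < NUMBER_VARIATIONS_THRESHOLD


def describe_plan(entity):
    """Hypothetical caller that branches on the flag instead of a 0/None limit."""
    if use_number_variations(entity):
        return "with number variations"
    return "without number variations"
```

A boolean makes the intent explicit at the call site, whereas a limit of `0` or `None` requires the reader to know which sentinel means "disabled" and which means "unbounded".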

@ClemDoum ClemDoum force-pushed the task/limit-string-variations branch 2 times, most recently from 61c7a51 to 786daca Compare May 10, 2019 14:21
@ClemDoum ClemDoum force-pushed the task/limit-string-variations branch from 786daca to 1b1b27c Compare May 10, 2019 14:38
@ClemDoum ClemDoum requested a review from adrienball May 13, 2019 08:37
@ClemDoum ClemDoum merged commit f6e6f9a into develop May 15, 2019
@ClemDoum ClemDoum deleted the task/limit-string-variations branch May 15, 2019 13:31
@ClemDoum ClemDoum mentioned this pull request Jun 20, 2019