Refactor Differential Entropy for Mutual Information Classification#13574
Refactor Differential Entropy for Mutual Information Classification#13574rschlussel merged 7 commits intoprestodb:masterfrom
Conversation
3599500 to
427a506
Compare
rschlussel
left a comment
There was a problem hiding this comment.
I looked over it generally, and found that there was way to much going on in this commit to understand easily.
- I noted a few places, that I felt like could be extracted into separate commits. If there are other bits of refactoring that can stand on their own, you can create separate commits for those too (all commits should be part of this PR, but having the commits more granular makes it way easier to review)
- It would be helpful to have an overview of what you are doing in this refactoring and for what purpose
- It might be difficult to evaluate whether the refactoring makes sense without seeing its intended use, so it could be helpful to have a preliminary pr (or commit added to this PR) for the function you're using the refactoring for.
...ook/presto/operator/aggregation/differentialentropy/FixedHistogramJacknifeStateStrategy.java
Outdated
Show resolved
Hide resolved
...ook/presto/operator/aggregation/differentialentropy/FixedHistogramJacknifeStateStrategy.java
Show resolved
Hide resolved
.../facebook/presto/operator/aggregation/reservoirsample/TestWeightedDoubleReservoirSample.java
Show resolved
Hide resolved
...facebook/presto/operator/aggregation/differentialentropy/FixedHistogramMleStateStrategy.java
Outdated
Show resolved
Hide resolved
|
|
||
| DifferentialEntropyStateStrategy cloneEmpty(); | ||
|
|
||
| double getTotalPopulationWeight(); |
There was a problem hiding this comment.
add the getTotalPopulationWeight() function + implementations + tests in a separate commit. also, what's it for? it's only used in tests.
...cebook/presto/operator/aggregation/differentialentropy/DifferentialEntropyStateStrategy.java
Show resolved
Hide resolved
a9c9d73 to
af27b8e
Compare
|
To clarify, I'm not suggesting moving changes to the other PR. I'm asking if you can separate out the different logical changes in this PR into different commits so it's easier to review. You can have multiple commits in a single PR (all the commits on a branch since master will be included in the PR), and github lets you review a PR commit by commit, so breaking this down into more granular changes would be very very helpful. Examples of independent logical changes for this pr would be: add mehod x to the interface + implementations, the changes in choosing a strategy (and can you provide an overview of what's going on there/ overall goal. i don't quite understand it), random variable name changes to improve clarity. The easiest way to break apart your commit in git would probably be |
af27b8e to
2be888e
Compare
Worked great, thanks! |
rschlussel
left a comment
There was a problem hiding this comment.
The general approach seems good, but it's still not separated into commits based on the independent logical changes.. I know it can be a bit annoying, but I think it's essential here to break the commits down well because this change touches a lot of code, and it's hard to follow otherwise. Also, each commit should compile on its own, and as currently written, Rename + expand differential entropy calculations utility class would not compile on its own because you haven't changed the places that call that method. Can you break down Refactor factory code into differential_entropy_state_strategy base into multiple commits as follows.
-
Extract calculateEntropy() from FixedHistogramJacknifeStrategy- this commit should be combined withRename + expand differential entropy calculations utility class, so that in a single commit you have
a)rename the calculateEntropyFromSamples -> calculateEntropyFromHistogramAggregates
b) delete code from FixedHistogramJacknifeStrategy
c) add it to EntropyCalculations
d) update all the references of both of those methods -
Rename differential entropy related variables for clarity- this commit would have miscellaneous variable renames, such as samples -> reservoir and bucketCount -> size. Putting this into its own commit, just makes it easier to see the important changed lines in the commits that actually changes. (you can combine this with the documentation change commit if you want, or not whatever you prefer) -
Add getTotalPopulationWeight() to DifferentialEntropyStateStrategy-- This would include adding the method, all it's implementations, and also tests -
Add cloneEmpty() to DifferentialEntropyStateStrategy-- including method, it's implementations, and tests -
Move logic for handling different strategy implementations to DifferentialEntropyStateStrategy- this is the main change that moves the logic for handling which strategy to use and their particularities out of DifferentialEntropyAggregates and DifferentialEntropyStateSerializer into DifferentialEntropyStateStrategy.
...facebook/presto/operator/aggregation/differentialentropy/DifferentialEntropyAggregation.java
Outdated
Show resolved
Hide resolved
...cebook/presto/operator/aggregation/differentialentropy/AbstractTestReservoirAggregation.java
Outdated
Show resolved
Hide resolved
...acebook/presto/operator/aggregation/differentialentropy/DifferentialEntropyStateFactory.java
Outdated
Show resolved
Hide resolved
.../facebook/presto/operator/aggregation/reservoirsample/TestWeightedDoubleReservoirSample.java
Outdated
Show resolved
Hide resolved
4e4a5de to
7594256
Compare
1. Make variable names consistent (e.g., sample->reservoir) 2. Differentiate between $H$ (discrete entropy) an $h$ (differential entropy) in docs
9c89adb to
6fb7235
Compare
6fb7235 to
26853fc
Compare
d29e5f5 to
5ce127c
Compare
.../com/facebook/presto/operator/aggregation/reservoirsample/WeightedDoubleReservoirSample.java
Outdated
Show resolved
Hide resolved
...ook/presto/operator/aggregation/differentialentropy/FixedHistogramJacknifeStateStrategy.java
Outdated
Show resolved
Hide resolved
...cebook/presto/operator/aggregation/differentialentropy/AbstractTestReservoirAggregation.java
Outdated
Show resolved
Hide resolved
...facebook/presto/operator/aggregation/differentialentropy/FixedHistogramMleStateStrategy.java
Outdated
Show resolved
Hide resolved
.../presto/operator/aggregation/differentialentropy/UnweightedReservoirSampleStateStrategy.java
Outdated
Show resolved
Hide resolved
5ce127c to
4b0ead1
Compare
rschlussel
left a comment
There was a problem hiding this comment.
just need to move your recent fixes into the logic moving commit to make sure that that one is correct on its own. Otherwise looks good! Thanks!
...facebook/presto/operator/aggregation/differentialentropy/FixedHistogramMleStateStrategy.java
Outdated
Show resolved
Hide resolved
...cebook/presto/operator/aggregation/differentialentropy/AbstractTestReservoirAggregation.java
Outdated
Show resolved
Hide resolved
.../com/facebook/presto/operator/aggregation/reservoirsample/WeightedDoubleReservoirSample.java
Outdated
Show resolved
Hide resolved
...a/com/facebook/presto/operator/aggregation/fixedhistogram/FixedDoubleBreakdownHistogram.java
Outdated
Show resolved
Hide resolved
988497d to
04ca817
Compare
04ca817 to
b2b8e66
Compare
.../main/java/com/facebook/presto/operator/aggregation/fixedhistogram/FixedDoubleHistogram.java
Outdated
Show resolved
Hide resolved
…ntialEntropyStateStrategy
b2b8e66 to
45e37c1
Compare
45e37c1 to
91e6282
Compare
|
@atavory @rschlussel Please enclose the release notes in a pair of triple backticks(```) in the future. Thanks! |
|
Thanks @caithagoras. These change aren't user facing, so actually no release notes needed. Sorry for the trouble! |

== RELEASE NOTES ==
General Changes
Together with #13203, this forms the basis for mutual information classification udfs #13163.