Refactor Differential Entropy for Mutual Information Classification #13574

Merged
rschlussel merged 7 commits into prestodb:master from atavory:refactoring_differential_entropy
Nov 15, 2019
Contributor

@atavory atavory commented Oct 18, 2019

== RELEASE NOTES ==

General Changes

  • Move strategy factory code from udf to base class of strategy
  • Add population counts/weights to reservoir samples

Together with #13203, this forms the basis for mutual information classification udfs #13163.

@atavory atavory changed the title from "refactoring for mutual information classification" to "[WIP] refactoring for mutual information classification" Oct 18, 2019
@atavory atavory changed the title from "[WIP] refactoring for mutual information classification" to "refactoring for mutual information classification" Oct 19, 2019
@atavory atavory changed the title from "refactoring for mutual information classification" to "Refactor for Mutual Information Classification" Oct 19, 2019
@atavory atavory mentioned this pull request Oct 21, 2019
@atavory atavory force-pushed the refactoring_differential_entropy branch from 3599500 to 427a506 on October 23, 2019 14:27
@rschlussel rschlussel self-assigned this Oct 25, 2019
Contributor

@rschlussel rschlussel left a comment

I looked over it generally and found that there was way too much going on in this commit to understand easily.

  1. I noted a few places that I felt could be extracted into separate commits. If there are other bits of refactoring that can stand on their own, you can create separate commits for those too (all commits should be part of this PR, but having the commits more granular makes it way easier to review).
  2. It would be helpful to have an overview of what you are doing in this refactoring and for what purpose.
  3. It might be difficult to evaluate whether the refactoring makes sense without seeing its intended use, so it could be helpful to have a preliminary PR (or a commit added to this PR) for the function you're using the refactoring for.


DifferentialEntropyStateStrategy cloneEmpty();

double getTotalPopulationWeight();
Contributor

Add the getTotalPopulationWeight() function + implementations + tests in a separate commit. Also, what's it for? It's only used in tests.

Contributor Author

It's used in #13163. I could move it there, but, as we discussed, this PR is basically the changes that are needed for #13163, so I'm not sure why this one in particular should move.
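
For readers without #13163 at hand, the following is a minimal Java sketch of what "population weight on a reservoir" could look like. It is an illustration under stated assumptions, not the code in this PR: the class name WeightedReservoirSketch, its constructor, add(), and getRetainedCount() are hypothetical, and only the method name getTotalPopulationWeight() is taken from the diff. The sketch assumes the reservoir keeps a bounded uniform sample (classic reservoir sampling) while separately summing the weights of every value offered, including values that were not retained.

```java
import java.util.Random;

// Hypothetical illustration; this is not the Presto reservoir implementation.
final class WeightedReservoirSketch
{
    private final double[] samples;        // bounded sample of the values seen
    private final Random random;
    private long seenCount;                // number of values offered so far
    private double totalPopulationWeight;  // sum of weights of all offered values

    WeightedReservoirSketch(int maxSamples, long seed)
    {
        if (maxSamples <= 0) {
            throw new IllegalArgumentException("maxSamples must be positive");
        }
        this.samples = new double[maxSamples];
        this.random = new Random(seed);
    }

    void add(double value, double weight)
    {
        if (weight < 0) {
            throw new IllegalArgumentException("weight must be non-negative");
        }
        // The total grows for every offered value, retained or not.
        totalPopulationWeight += weight;
        if (seenCount < samples.length) {
            samples[(int) seenCount] = value;
        }
        else {
            // Uniform reservoir sampling: pick a slot in [0, seenCount] and
            // overwrite only if it falls inside the reservoir. The weight is
            // deliberately ignored here (an assumption of this sketch).
            long slot = (long) (random.nextDouble() * (seenCount + 1));
            if (slot < samples.length) {
                samples[(int) slot] = value;
            }
        }
        seenCount++;
    }

    int getRetainedCount()
    {
        return (int) Math.min(seenCount, samples.length);
    }

    double getTotalPopulationWeight()
    {
        return totalPopulationWeight;
    }

    public static void main(String[] args)
    {
        WeightedReservoirSketch reservoir = new WeightedReservoirSketch(2, 42L);
        reservoir.add(1.0, 0.5);
        reservoir.add(2.0, 1.5);
        reservoir.add(3.0, 2.0);
        System.out.println("retained = " + reservoir.getRetainedCount());
        System.out.println("population weight = " + reservoir.getTotalPopulationWeight());
    }
}
```

The point of the separate counter is that the retained sample alone cannot say how much of the population it stands for, which a caller combining several reservoirs would presumably need.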

@atavory atavory force-pushed the refactoring_differential_entropy branch 3 times, most recently from a9c9d73 to af27b8e on October 26, 2019 19:10
@rschlussel
Contributor

To clarify, I'm not suggesting moving changes to the other PR. I'm asking if you can separate out the different logical changes in this PR into different commits so it's easier to review. You can have multiple commits in a single PR (all the commits on a branch since master will be included in the PR), and GitHub lets you review a PR commit by commit, so breaking this down into more granular changes would be very, very helpful.

Examples of independent logical changes for this PR would be: adding method x to the interface plus its implementations; the changes in choosing a strategy (can you also provide an overview of what's going on there and the overall goal? I don't quite understand it); and miscellaneous variable name changes to improve clarity.

The easiest way to break apart your commit in git would probably be git reset HEAD^ and then git add -p to select the changes you want for each commit.

@atavory atavory force-pushed the refactoring_differential_entropy branch from af27b8e to 2be888e on October 29, 2019 20:25
Contributor Author

atavory commented Oct 30, 2019

> The easiest way to break apart your commit in git would probably be git reset HEAD^ and then git add -p to select the changes you want for each commit.

Worked great, thanks!

Contributor

@rschlussel rschlussel left a comment

The general approach seems good, but it's still not separated into commits based on the independent logical changes. I know it can be a bit annoying, but I think it's essential here to break the commits down well, because this change touches a lot of code and it's hard to follow otherwise. Also, each commit should compile on its own, and as currently written, "Rename + expand differential entropy calculations utility class" would not compile on its own because you haven't changed the places that call that method. Can you break down "Refactor factory code into differential_entropy_state_strategy base" into multiple commits as follows?

  1. Extract calculateEntropy() from FixedHistogramJacknifeStrategy - this commit should be combined with "Rename + expand differential entropy calculations utility class", so that in a single commit you:
    a) rename calculateEntropyFromSamples -> calculateEntropyFromHistogramAggregates
    b) delete the code from FixedHistogramJacknifeStrategy
    c) add it to EntropyCalculations
    d) update all the references to both of those methods

  2. Rename differential entropy related variables for clarity - this commit would have miscellaneous variable renames, such as samples -> reservoir and bucketCount -> size. Putting these into their own commit just makes it easier to see the important changed lines in the commits that actually change behavior. (You can combine this with the documentation change commit if you want, or not, whatever you prefer.)

  3. Add getTotalPopulationWeight() to DifferentialEntropyStateStrategy - this would include adding the method, all its implementations, and also tests.

  4. Add cloneEmpty() to DifferentialEntropyStateStrategy - including the method, its implementations, and tests.

  5. Move logic for handling different strategy implementations to DifferentialEntropyStateStrategy - this is the main change that moves the logic for handling which strategy to use and their particularities out of DifferentialEntropyAggregates and DifferentialEntropyStateSerializer into DifferentialEntropyStateStrategy.
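
The shape of items 3 to 5 can be sketched in Java as follows. This is a hedged illustration only, not the code under review: StrategySketch, UnweightedSketch, WeightedSketch, the create() factory, and the "unweighted"/"weighted" method strings are all hypothetical names invented here; only cloneEmpty() and getTotalPopulationWeight() are named in this PR. The sketch assumes the goal is for callers (for example an aggregation function or a serializer) to ask the base type for a strategy instead of switching on concrete implementations themselves.

```java
// Hypothetical stand-in for a strategy base type; not the Presto class.
abstract class StrategySketch
{
    abstract void add(double value, double weight);

    abstract double getTotalPopulationWeight();

    // Returns a new strategy with the same configuration and no data.
    abstract StrategySketch cloneEmpty();

    // Selection logic lives on the base type, so callers do not need to
    // know which concrete strategies exist.
    static StrategySketch create(String method, int size)
    {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        switch (method) {
            case "unweighted":
                return new UnweightedSketch(size);
            case "weighted":
                return new WeightedSketch(size);
            default:
                throw new IllegalArgumentException("unknown method: " + method);
        }
    }

    public static void main(String[] args)
    {
        StrategySketch strategy = StrategySketch.create("weighted", 10);
        strategy.add(1.0, 2.5);
        strategy.add(2.0, 0.5);
        StrategySketch empty = strategy.cloneEmpty();
        System.out.println("filled = " + strategy.getTotalPopulationWeight());
        System.out.println("empty clone = " + empty.getTotalPopulationWeight());
    }
}

// Counts every value as weight 1; a real strategy would also retain values.
final class UnweightedSketch extends StrategySketch
{
    private final int size;
    private long count;

    UnweightedSketch(int size)
    {
        this.size = size;
    }

    @Override
    void add(double value, double weight)
    {
        count++;
    }

    @Override
    double getTotalPopulationWeight()
    {
        return count;
    }

    @Override
    StrategySketch cloneEmpty()
    {
        return new UnweightedSketch(size);
    }
}

// Sums the supplied weights; a real strategy would also retain values.
final class WeightedSketch extends StrategySketch
{
    private final int size;
    private double totalWeight;

    WeightedSketch(int size)
    {
        this.size = size;
    }

    @Override
    void add(double value, double weight)
    {
        totalWeight += weight;
    }

    @Override
    double getTotalPopulationWeight()
    {
        return totalWeight;
    }

    @Override
    StrategySketch cloneEmpty()
    {
        return new WeightedSketch(size);
    }
}
```

With this layout, adding another strategy touches only the base type's factory and the new subclass, which is one plausible reading of why the selection logic was moved out of the aggregation and serializer classes.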

@atavory atavory changed the title Refactor for Mutual Information Classification Refactor Differential Entropy for Mutual Information Classification Nov 5, 2019
@atavory atavory force-pushed the refactoring_differential_entropy branch 2 times, most recently from 4e4a5de to 7594256 on November 6, 2019 20:22
1. Make variable names consistent (e.g., sample->reservoir)
2. Differentiate between $H$ (discrete entropy) and $h$ (differential entropy) in docs
@atavory atavory force-pushed the refactoring_differential_entropy branch 3 times, most recently from 9c89adb to 6fb7235 on November 7, 2019 08:16
@atavory atavory force-pushed the refactoring_differential_entropy branch from 6fb7235 to 26853fc on November 7, 2019 17:09

linux-foundation-easycla bot commented Nov 7, 2019

CLA Check
The committers are authorized under a signed CLA.

  • ✅ Ami Tavory (25f7c55, 26853fc, 2de474e9bbbf5cb1ceb68965eb4b949b4b0bbe19, 03413b14783938f73d18087c18dcb8f0c9d5ce3a, 6770e3f013907e8f05d977bbd4128a7dbfd4e972, 4b0ead1454bf7876e7f6e423e10d680f8c826825)

@atavory atavory force-pushed the refactoring_differential_entropy branch 2 times, most recently from d29e5f5 to 5ce127c on November 8, 2019 05:56
Contributor

nit: static import

@atavory atavory force-pushed the refactoring_differential_entropy branch from 5ce127c to 4b0ead1 on November 13, 2019 06:46
Contributor

@rschlussel rschlussel left a comment

You just need to move your recent fixes into the logic-moving commit to make sure that one is correct on its own. Otherwise looks good! Thanks!

@atavory atavory force-pushed the refactoring_differential_entropy branch 3 times, most recently from 988497d to 04ca817 on November 14, 2019 08:46
@atavory atavory force-pushed the refactoring_differential_entropy branch from 04ca817 to b2b8e66 on November 14, 2019 14:33
@atavory atavory force-pushed the refactoring_differential_entropy branch from b2b8e66 to 45e37c1 on November 14, 2019 17:15
@atavory atavory force-pushed the refactoring_differential_entropy branch from 45e37c1 to 91e6282 on November 14, 2019 17:18
@caithagoras
Contributor

@atavory @rschlussel Please enclose the release notes in a pair of triple backticks(```) in the future. Thanks!

Contributor

rschlussel commented Jan 2, 2020

Thanks @caithagoras. These changes aren't user-facing, so no release notes are actually needed. Sorry for the trouble!


4 participants