Skip to content

Replace constants with static factory methods#11660

Merged
arhimondr merged 6 commits intoprestodb:masterfrom
arhimondr:replace-constants-with-static-factory-methods
Oct 10, 2018
Merged

Replace constants with static factory methods#11660
arhimondr merged 6 commits intoprestodb:masterfrom
arhimondr:replace-constants-with-static-factory-methods

Conversation

@arhimondr
Copy link
Member

No description provided.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i prefer to keep static factory methods above the instance fields (and below class constants)

Copy link
Member Author

@arhimondr arhimondr Oct 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everywhere else in the code the static factory methods are below the fields.

(Symbol, ExpressionInterpreter, TupleDomain, Domain, MetadataManager, ...)

Some clases define static factor methods after constructor, but i don't think it is right. As we try to define methods in the order of their appearance. So if method A uses method B, method A comes first and the method B comes second. Similar with static factory methods. Since factory methods use constructor, they should come before constructor.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have a build check for that to not to go back and forth

@arhimondr
Copy link
Member Author

@findepi I addressed all the comments, could you please accept this? So i can merge it and open the second refactoring PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NaN

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be 2 separate situations. When rows count is 0, and if it is not, and distinctValuesCount is zero - that means that all the remaining values are null. Added a separate branch for outputRowCount == 0.0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After i added a separate branch for outputRowCount == 0.0 i'm getting a lot of test failures. I'm not fixing them. Removing the branch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NaN if outputRowCount is 0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

average row size is NaN

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If all values are null - row size is zero. If row count is zero - row size is NaN

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NaN and NaN?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like inlining zero is error prone, see number of places where you missed NaN

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to have a build check for that to not to go back and forth

Copy link
Contributor

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM except for "Remove SymbolStatsEstimate#ZERO constant" commit.

We need to verify empirically this change does not inadvertently introduce planning changes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After this change, there is no UNKNOWN_STATS constant, so you could as well return symbolStats

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning SymbolStatsEstimate.unknown() is more explicit. Think of it as of

if(variable == null){
  return null;
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

orElseGet

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any practical difference in doing this? SymbolStatsEstimate.unknown() is a constant, and always be a constant. Why do we need to generate additional bootstrap methods for invokedynamic to pass it as a lambda?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Introduce local variables for these.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why?

@arhimondr
Copy link
Member Author

arhimondr commented Oct 9, 2018

@kokosing @findepi

I was badly trying to remove the zero constant, and was honestly trying to use NaN. But unfortunately it opens Pandora's box, as we thread NaN as an unknown value.

Also i realized that we threat NaN as unknown almost everywhere. Almost, becasue NaN in range estimate is not unknown anymore, but represents an empty range: https://github.com/prestodb/presto/blob/master/presto-main/src/main/java/com/facebook/presto/cost/SymbolStatsEstimate.java#L68. And if the range is empty - we "normalize" the stats estimate in the constructor setting the nulls count to 1.0. I tried to remove this normalization as well, but gave up.

I'm pretty sure that we would have to reopen that box at some point, but for now let's keep it closed. This PR is supposed to be simple refactor, and not something bigger.

I replaced the commit that removes ZERO constant with a commit that changes it to the static factory method.

@arhimondr arhimondr force-pushed the replace-constants-with-static-factory-methods branch from 246a54c to 29cc280 Compare October 9, 2018 19:01
@arhimondr arhimondr force-pushed the replace-constants-with-static-factory-methods branch from 29cc280 to 3ab6145 Compare October 9, 2018 21:34
@arhimondr
Copy link
Member Author

We need to verify empirically this change does not inadvertently introduce planning changes.

I ran the tests from the #11665, the plans are the same.

Copy link
Contributor

@kokosing kokosing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But please wait for others.

Copy link
Contributor

@findepi findepi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM although I would prefer that we don't do this refactors at all.
In the absence of precise rules, this can be viewed as "some style" vs "some other style". In particular, my preference was with the code before.

@arhimondr
Copy link
Member Author

arhimondr commented Oct 10, 2018

@findepi

I think i already tried to explain why am i doing this, but apparently that comment got lost.

My goal is to avoid making estimates out of the blue. We had a real problem with a greatly overestimated joins when column statistics were missing. That over estimated join led to completely wrong join order.

First of all i was trying to fix that without doing any major refactoring. I didn't want to remove Optional from the FilterStatsCalculator. But then i realized that the code is getting really messy, as i effectively need to double check if the stats are unknown. Optional.empty() meant that the stats are uknown. And because of implicit logic in the ComparisonStatsCalculator and in the StatisticRange it was possible that method that returns Optional<PlanNodeStatsEstimate> could've returned effectively Optional.of(PlanNodeStatsEstimate.UNKNOWN_STATS).

At that step i realized that having Optional in the interface is extra, makes the code messy and hard to reason about. So i tried to remove Optional from the interfaces.

Once i removed Optional from the interfaces i realized that i need to return PlanNodeStatsEstimate.UNKNOWN_STATS in the places where Optional.empty() was returned. It introduced an import conflict with SymbolStatsEstimate.UNKNOWN_STATS. Having them imported in non static manner looks tautological and cumbersome. So obviously the refactoring was needed to rename the constants. I could've renamed them to UNKNOWN_PLAN_NODE_STATS and UNKNOWN_SYMBOL_STATS. But it wouldn't have decreased the scope of this change much, as most likely i would've end up renaming the other constants in that class for consistency as well. Because of the reasons explained here and here i decided to do this right, and make it consistent with JDK/Guava and Airlift style.

@arhimondr arhimondr merged commit f112441 into prestodb:master Oct 10, 2018
@arhimondr arhimondr deleted the replace-constants-with-static-factory-methods branch October 10, 2018 14:56
@findepi
Copy link
Contributor

findepi commented Oct 10, 2018

@arhimondr thanks for taking time to explain this to me (once again). Really appreciated.
Also, apologies -- i know that in #11660 (review) i was repeating myself. I am sometimes a bit stubborn, you know. ;)

@arhimondr
Copy link
Member Author

I am sometimes a bit stubborn, you know. ;)

Haha =) We found each other =)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants