Applies the Baseline plugin for iceberg-api only. #143
This reverts commit debdae8.
Appears to only be related to Gradle 5.0.
    test {
      // Only for TestSplitScan as of Gradle 5.0+
      maxHeapSize '1500m'
I verified this on master: if you just change the Gradle version to 5.0+, the TestSplitScan test fails with the same error. So the memory settings must have changed in Gradle 5.0. None of the style changes should have had an effect on memory usage.
Might as well upgrade to latest while we're at it
    - public Builder withSpecId(int specId) {
    -   this.specId = specId;
    + public Builder withSpecId(int newSpecId) {
Seems like this pattern is used throughout the code, specifically in builders and we'd have to come up with creative names for the parameters, even though this.name = name provides good enough disambiguation.
Parameters cannot hide field names.
Again we run into the same problem of the majority of cases being better to follow the standard - generally naming vars and parameters the same as fields is dubious. I think it's ok that in some cases where it might be considered acceptable to break this trend, to nevertheless conform to the global standard. Otherwise, we'd have to add a SuppressWarnings here.
I'd rather use newSpecId than SuppressWarnings for cases like this. But, I agree that this makes things more awkward. I wish we could allow cases where the field is assigned to an argument with the same name.
This rule will not apply to constructors, right?
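For reference, the rename discussed in this thread can be sketched as follows. This is only an illustration: SpecBuilderSketch is a hypothetical name, not a class from this PR, and the sketch takes no position on the constructor question above.

```java
// Hypothetical sketch: SpecBuilderSketch is an illustrative name, not an Iceberg class.
final class SpecBuilderSketch {
  private int specId = 0;

  // Before the change, the parameter was also called specId and hid the field:
  //   SpecBuilderSketch withSpecId(int specId) { this.specId = specId; ... }
  // Giving the parameter a distinct name avoids the hidden-field warning
  // without a SuppressWarnings annotation.
  SpecBuilderSketch withSpecId(int newSpecId) {
    this.specId = newSpecId;
    return this;
  }

  int specId() {
    return specId;
  }

  public static void main(String[] args) {
    SpecBuilderSketch builder = new SpecBuilderSketch().withSpecId(7);
    System.out.println(builder.specId());
  }
}
```

The behavior is the same as with the this.specId = specId form; only the parameter name differs.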
      PartitionField that = (PartitionField) other;
    - return (
    -     sourceId == that.sourceId &&
    + return sourceId == that.sourceId &&
I'm okay with this, but I don't see a lot of benefit to removing unnecessary parens. If extra parens make something more readable (like this) or clarify order of operations even when matching the default, I would say we should keep them.
I don't think the parens added readability. Quite the opposite, in fact: whenever I see parentheses I assume there is some other clause in the boolean expression, e.g. (...) && (...). It's a built-in rule in Checkstyle: http://checkstyle.sourceforge.net/config_coding.html#UnnecessaryParentheses. It doesn't look like it's strictly spelled out in the style guide, though.
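For readers comparing the two forms debated in this thread, here is a minimal side-by-side sketch. FieldSketch is a hypothetical stand-in for a class like PartitionField, and the name field is an assumption made for illustration. Both methods evaluate the same expression; the only difference is the outer parentheses that the UnnecessaryParentheses check flags.

```java
// Hypothetical stand-in for a class like PartitionField; not code from this PR.
final class FieldSketch {
  private final int sourceId;
  private final String name;

  FieldSketch(int id, String label) {
    this.sourceId = id;
    this.name = label;
  }

  // Form flagged by Checkstyle: the outer parentheses do not change evaluation.
  boolean sameAsWithParens(FieldSketch that) {
    return (
        sourceId == that.sourceId &&
        name.equals(that.name));
  }

  // Form after the change: the same expression without the outer parentheses.
  boolean sameAsWithoutParens(FieldSketch that) {
    return sourceId == that.sourceId &&
        name.equals(that.name);
  }

  public static void main(String[] args) {
    FieldSketch a = new FieldSketch(1, "ts");
    FieldSketch b = new FieldSketch(1, "ts");
    System.out.println(a.sameAsWithParens(b) == a.sameAsWithoutParens(b));
  }
}
```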
I usually prefer line breaks at 100 characters. This uses 120? Do we want to go with that?
One more thing: I don't see a lot of changes to method parameter lists. What is the convention for those?
Interestingly, the Google Style Guide wants the column limit to be 100 characters: https://google.github.io/styleguide/javaguide.html#s4.4-column-limit But Baseline wants 120: https://github.com/palantir/gradle-baseline/blob/develop/docs/java-style-guide/readme.md#column-limit-120. In my experience 100 characters is far too short even for Scala (when building on Spark), let alone Java, which is more verbose. I'd vote for the 120-character limit, but I'm open to changing it if the community builds consensus on shorter lines.
I believe the convention I've always seen when using Baseline is this: So, each parameter on its own line, with a 4-space indent from the visibility modifier keyword (
I've not had a problem with 100. I'm fine with either one as long as we enforce something. 120 is probably a better choice because there are cases where merges and refactors have accidentally created longer lines. I think we will have fewer changes to standardize if we use 120.
My vote is to set some reasonable guidelines and not worry about it. Here's what I would do for this project:
That would look like this:

    public void shortMethodArgs(int foo) throws IOException {
    }

    public void longMethodArgs(
        Map<String, String> properties1, Map<String, String> properties2,
        Map<String, String> properties3) throws IOException, ClassNotFoundException {
    }

I don't really care for the Spark style that puts one arg on every line. It wastes a lot of vertical space.
One more comment on line length: I checked the default view for GitHub and it looks like 120 doesn't cause horizontal scrolling, which would be bad. The limit is around 130 characters.
Overall, the commit looks good to me. Everyone else okay with it? Should we also have a
+1 to eventually document anything that a reviewer could just cite, instead of writing it manually all the time.
Looks like Baseline will indeed give us Nice stuff @mccheah. LGTM.
The default project configuration given by Baseline adheres to the default project style guidelines Baseline is opinionated about. I changed the default project configuration to match as many of the style guideline deviations as I could remember here, but there might be cases where the generated IDEA project doesn't match what we changed in
I should also mention that I only changed the project configuration for IntelliJ IDEA. I didn't adjust the style configuration for Eclipse. I can do that now, or in a follow-up patch.
Not sure I follow. I work on other projects where the equivalent to Here is an example from a Maven Scala project I work on:

    <!-- Scala style checking configuration -->
    <plugin>
      <groupId>org.scalastyle</groupId>
      <artifactId>scalastyle-maven-plugin</artifactId>
      <version>${scalastyle-maven-plugin.version}</version>
      <configuration>
        <verbose>false</verbose>
        <failOnViolation>true</failOnViolation>
        <includeTestSourceDirectory>true</includeTestSourceDirectory>
        <failOnWarning>false</failOnWarning>
        <sourceDirectory>${basedir}/src/main/scala</sourceDirectory>
        <testSourceDirectory>${basedir}/src/test/scala</testSourceDirectory>
        <configLocation>${basedir}/../scalastyle-config.xml</configLocation>
        <outputFile>${basedir}/target/scalastyle-output.xml</outputFile>
        <inputEncoding>${project.build.sourceEncoding}</inputEncoding>
      </configuration>
      <executions>
        <execution>
          <id>scalastyle</id>
          <phase>validate</phase>
          <goals>
            <goal>check</goal>
          </goals>
        </execution>
      </executions>
    </plugin>

Are we saying that a similar integration is not possible with Gradle + Baseline?
I think from the discussion here that there is consensus for these changes. There are still some open issues around IDE integration, but I'd rather get this in so we can work on other modules than solve all of those challenges at once. I'm going to commit this.
@xabriel sorry I didn't get to this earlier. In Baseline, the IDE project files are not tied to the checkstyle.xml files. So it's possible to follow the conventions the IDE thinks it should follow, but proceed to fail when CI runs If we notice that our work done in our IDE is causing bad style behaviors, we can look into the specific parts of the IDE projects to fix.
Part of #24. Replaces #28.
Previously, whenever styles were inconsistent across the Iceberg codebase, it was up to code reviewers to catch the inconsistencies and to catch them before merging. This is prone to error, and little code linting problems can add up over time.
Baseline is a code linting toolkit for Gradle. It consists of multiple submodules that when taken together allow automation to enforce consistent coding guidelines. For more information, refer to the Baseline docs.
As a proof of concept, the full Baseline suite minus spotless-java is now only applied for the iceberg-api project, while IntelliJ project configuration is applied for all projects. We apply the Baseline linting changes only to iceberg-api to minimize code churn and to show how we can introduce baseline checks incrementally. Eventually we can add the same tooling to all the submodules.
Baseline's coding conventions are inherited from the Google Style Guide. For more information on the style rules that are given out of the box, refer to this documentation. There were a number of style conventions that we've adopted from Baseline that were not previously enforced in the project. A subset of them are listed as follows.
- … name, no parameters of any methods in the class can be called name.
- … W is no longer a valid field name in Truncate, for example. There were a lot of cases like this.
- … Preconditions must use a constant format string. Variance in what the preconditions message would produce must be given as format string arguments.
- … StandardCharsets instead of Guava's Charsets.
- … LinkedList and ArrayList, is disallowed. Use the interface types, like List and Deque, instead.
- … final, but they were named with all-capital letters and using underscores instead of camelCase, and Checkstyle didn't like that naming convention. We mark all such constants as final.

There's a number of Baseline-opinionated defaults that we don't adopt here to reduce the code churn and also just because Iceberg appears to hold different opinions. Here's some of them:

- … UUID, ID, IO, and UTC. In one case we allow a method to be called flipLR by applying a SuppressWarnings annotation on the method directly.
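Three of the newly enforced conventions from the description (constant format strings for precondition messages, StandardCharsets over Guava's Charsets, and interface types over concrete collection types) can be sketched with the JDK alone. This is a rough illustration, not code from the PR: ConventionsSketch and its methods are hypothetical names, and checkArgument here is a local stand-in written for this example, not Guava's Preconditions.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch; none of these names come from the Iceberg codebase.
final class ConventionsSketch {
  private ConventionsSketch() {
  }

  // Uses the JDK charset constant rather than Guava's Charsets.UTF_8.
  static byte[] encode(String value) {
    return value.getBytes(StandardCharsets.UTF_8);
  }

  // Declared types are interfaces (List, Deque); concrete classes appear only at construction.
  static List<String> reversed(List<String> input) {
    Deque<String> stack = new ArrayDeque<>(input);
    List<String> result = new ArrayList<>();
    while (!stack.isEmpty()) {
      result.add(stack.removeLast());
    }
    return result;
  }

  // Local stand-in for a Preconditions-style check (an assumption for this sketch).
  static void checkArgument(boolean condition, String template, Object... args) {
    if (!condition) {
      throw new IllegalArgumentException(String.format(template, args));
    }
  }

  // The template is a constant; the varying part is passed as an argument
  // instead of being concatenated into the message.
  static int requirePositive(int width) {
    checkArgument(width > 0, "Invalid width: %s (must be > 0)", width);
    return width;
  }

  public static void main(String[] args) {
    System.out.println(reversed(Arrays.asList("a", "b", "c")));
    System.out.println(encode("abc").length);
    System.out.println(requirePositive(4));
  }
}
```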