Skip to content

Conversation

@mccheah
Copy link
Contributor

@mccheah mccheah commented Mar 21, 2019

Part of #24. Replaces #28.

Previously, whenever styles were inconsistent across the Iceberg codebase, it was up to code reviewers to catch the inconsistencies and to catch them before merging. This is prone to error, and little code linting problems can add up over time.

Baseline is a code linting toolkit for Gradle. It consists of multiple submodules that when taken together allow automation to enforce consistent coding guidelines. For more information, refer to the Baseline docs.

As a proof of concept, the full Baseline suite minus spotless-java is now only applied for the iceberg-api project, while IntelliJ project configuration is applied for all projects. We apply the Baseline linting changes only to iceberg-api to minimize code churn and to show how we can introduce baseline checks incrementally. Eventually we can add the same tooling to all the submodules.

Baseline's coding conventions are inherited from the Google Style Guide. For more information on the style rules that are given out of the box, refer to this documentation. There were a number of style conventions that we've adopted from Baseline that were not previously enforced in the project. A subset of them are listed as follows.

  • We no longer allow parameter names and local variable names to hide the names of fields. So if a field is named name, no parameters of any methods in the class can be called name.
  • All field, parameter, and variable names must be more than 1 character long. So W is no longer a valid field name in Truncate, for example. There were a lot of cases like this.
  • All fields must be private with accessor methods. I believe in one case this affects a public API.
  • Utility classes must have a private no-arg constructor.
  • All unnecessary parentheses are disallowed.
  • Preconditions must use a constant format string. Variance in what the preconditions message would produce must be given as format string arguments.
  • Overloaded methods must be grouped together.
  • Use Java's StandardCharsets instead of Guava's Charsets.
  • Declaring collection types using the concrete types, such as LinkedList and ArrayList, is disallowed. Use the interface types, like List and Deque, instead.
  • There were some cases where static constants weren't marked as final, but they were named with all-capital letters and using underscores instead of camelCase, and Checkstyle didn't like that naming convention. We mark all such constants as final.

There's a number of Baseline-opinionated defaults that we don't adopt here to reduce the code churn and also just because Iceberg appears to hold different opinions. Here's some of them:

  • Baseline's spotless plugin doesn't work when we allow static imports, so we just disable it entirely.
  • Indentation is still 2 spaces with 4 space continuation indent.
  • No opinions are held about Javadoc formatting (should we set a standard for this?)
  • We allow a larger variety of static imports, though by default static imports are still forbidden
  • Static imports are listed after non-static imports, not before
  • Cyclomatic complexity is bumped to 12, up from 10
  • We allow some more abbreviations - particularly UUID, ID, IO, and UTC. In one case we allow a method to be called flipLR by applying a SuppressWarnings annotation on the method directly.
  • No opinions are held about the assertion methods we use for tests.


test {
// Only for TestSplitScan as of Gradle 5.0+
maxHeapSize '1500m'
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I verified this on master - if you just change the Gradle version to 5.0+, the TestSplitScan test fails with the same error. So the memory settings must have changed in Gradle 5.0. Nothing that was changed with regards to style should have had an effect on the memory usage.


public Builder withSpecId(int specId) {
this.specId = specId;
public Builder withSpecId(int newSpecId) {
Copy link
Contributor

@rdsr rdsr Mar 22, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like this pattern is used throughout the code, specifically in builders and we'd have to come up with creative names for the parameters, even though this.name = name provides good enough disambiguation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Parameters cannot hide field names.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again we run into the same problem of the majority of cases being better to follow the standard - generally naming vars and parameters the same as fields is dubious. I think it's ok that in some cases where it might be considered acceptable to break this trend, to nevertheless conform to the global standard. Otherwise, we'd have to add a SuppressWarnings here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd rather use newSpecId than SuppressWarnings for cases like this. But, I agree that this makes things more awkward. I wish we could allow cases where the field is assigned to an argument with the same name.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This rule will not apply to constructors, right?

PartitionField that = (PartitionField) other;
return (
sourceId == that.sourceId &&
return sourceId == that.sourceId &&
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm okay with this, but I don't see a lot of benefit to removing unnecessary parens. If extra parens make something more readable (like this) or clarify order of operations even when matching the default, I would say we should keep them.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the parens added readability - quite the opposite in fact, given that whenever I see parentheses I assume there would be some other clause in the boolean clause, e.g. (...) && (...). It's a built-in rule in Checkstyle: http://checkstyle.sourceforge.net/config_coding.html#UnnecessaryParentheses. Doesn't look like it's strictly annotated in the style guide though:

@rdblue
Copy link
Contributor

rdblue commented Mar 22, 2019

I usually prefer line breaks at 100 characters. This uses 120? Do we want to go with that?

@rdblue
Copy link
Contributor

rdblue commented Mar 22, 2019

One more thing: I don't see a lot of changes to method parameter lists. What is the convention for those?

@mccheah
Copy link
Contributor Author

mccheah commented Mar 22, 2019

I usually prefer line breaks at 100 characters. This uses 120? Do we want to go with that?

Interestingly, Google Style Guide wants the column limit to be 100 characters: https://google.github.io/styleguide/javaguide.html#s4.4-column-limit

But Baseline wants 120: https://github.com/palantir/gradle-baseline/blob/develop/docs/java-style-guide/readme.md#column-limit-120.

I've always found in my coding that 100 characters is far too short, even for Scala (when building on Spark), much less so for Java which is more verbose. I'd be a vote for the 120 character per line limit, but open to changing if the community builds consensus on shorter lines.

One more thing: I don't see a lot of changes to method parameter lists. What is the convention for those?

I believe the convention I've always seen when using Baseline is this:

public Type myMethod(
    int param1,
    int param2,
    ...

So, each parameter on its own line, with 4 space indent from the visibility modifier keyword (public in this example. Also, it would be 8 spaces in projects that use 4 spaces as a base for indentation.). But, notably, Baseline checkstyle does not enforce a hard rule on this. I did some brief searching on Google and Checkstyle doesn't have a built-in rule for this, so we'd have to write one ourselves. Sounds a bit involved. What do you think we should do here to proceed @rdblue?

@rdblue
Copy link
Contributor

rdblue commented Mar 22, 2019

I've always found in my coding that 100 characters is far too short, even for Scala

I've not had a problem with 100. I'm fine with either one as long as we enforce something. 120 is probably a better choice because there are cases where merges and refactors have accidentally created longer lines. I think we will have fewer changes to standardize if we use 120.

Checkstyle doesn't have a built-in rule for this, so we'd have to write one ourselves. Sounds a bit involved.

My vote is to set some reasonable guidelines and not worry about it.

Here's what I would do for this project:

  1. If all arguments etc. fit on one line, leave it that way
  2. If not, start a new line for arguments at 2 indents / 4 spaces from the start of the method definition (e.g. public)
  3. When the argument line is full, start a new one at 2 indents / 4 spaces

That would look like this:

  public void shortMethodArgs(int foo) throws IOException {
  }
  public void longMethodArgs(
      Map<String, String> properties1, Map<String, String> properties2,
      Map<String, String> properties3) throws IOException, ClassNotFoundException {
  }

I don't really care for the Spark style that puts one arg on every line. It wastes a lot of vertical space.

@rdblue
Copy link
Contributor

rdblue commented Mar 22, 2019

One more comment on line length: I checked the default view for github and it looks like 120 doesn't cause horizontal scrolling, which would be bad. The limit is around 130 characters.

@mccheah
Copy link
Contributor Author

mccheah commented Mar 25, 2019

@rdblue @rdsr @xabriel - I addressed the comments so far. Let me know if there's anything else to change. Thanks!

@rdblue
Copy link
Contributor

rdblue commented Mar 25, 2019

Overall, the commit looks good to me. Everyone else okay with it?

Should we also have a STYLE.md file somewhere that documents what we decide? If method parameter style isn't enforced, then we should write it down. It would also help to have things like the line limit and import settings documented so that people can configure their editors (or is that automatic using the idea task?). We don't have to add it in this PR though, if you'd like to close this out and get it in.

@xabriel
Copy link
Contributor

xabriel commented Mar 25, 2019

Should we also have a STYLE.md file somewhere that documents what we decide? If method parameter style isn't enforced, then we should write it down.

+1 to eventually document anything that a reviewer could just cite, instead of writing it manually all the time.

(or is that automatic using the idea task?)

Looks like Baseline will indeed give us ./gradlew idea and ./gradlew eclipse to import the guidelines to IDEs: https://github.com/palantir/gradle-baseline#compalantirbaseline-idea

Nice stuff @mccheah. LGTM.

@mccheah
Copy link
Contributor Author

mccheah commented Mar 25, 2019

Looks like Baseline will indeed give us ./gradlew idea and ./gradlew eclipse to import the guidelines to IDEs: https://github.com/palantir/gradle-baseline#compalantirbaseline-idea

The default project configuration given by Baseline adheres to the default project style guidelines Baseline is opinionated about. I changed the default project configuration to match as many of the style guideline deviations as I could remember here, but there might be cases where the generated IDEA project doesn't match what we changed in checkstyle.xml. The main style setup I changed in the IDEA project were license headers, indentation numbers, and import ordering (static vs. non-static). I think we'll have to unfortunately catch the other deviations we made on a case by case basis moving forward.

@mccheah
Copy link
Contributor Author

mccheah commented Mar 26, 2019

I should also mention that I only changed the project configuration for IntelliJ IDEA. I didn't adjust the style configuration for Eclipse. I can do that now, or in a follow-up patch.

@xabriel
Copy link
Contributor

xabriel commented Mar 26, 2019

The main style setup I changed in the IDEA project were license headers, indentation numbers, and import ordering (static vs. non-static). I think we'll have to unfortunately catch the other deviations we made on a case by case basis moving forward.

Not sure I follow. I work on other projects were the equivalent to checkstyle.xml is part of the Maven build, and thus the build will fail if the rules set is not abided by.

Here is an example from a Maven Scala project I work on:

        <!-- Scala style checking configuration -->
        <plugin>
          <groupId>org.scalastyle</groupId>
          <artifactId>scalastyle-maven-plugin</artifactId>
          <version>${scalastyle-maven-plugin.version}</version>
          <configuration>
            <verbose>false</verbose>
            <failOnViolation>true</failOnViolation>
            <includeTestSourceDirectory>true</includeTestSourceDirectory>
            <failOnWarning>false</failOnWarning>
            <sourceDirectory>${basedir}/src/main/scala</sourceDirectory>
            <testSourceDirectory>${basedir}/src/test/scala</testSourceDirectory>
            <configLocation>${basedir}/../scalastyle-config.xml</configLocation>
            <outputFile>${basedir}/target/scalastyle-output.xml</outputFile>
            <inputEncoding>${project.build.sourceEncoding}</inputEncoding>
          </configuration>
          <executions>
            <execution>
              <id>scalastyle</id>
              <phase>validate</phase>
              <goals>
                <goal>check</goal>
              </goals>
            </execution>
          </executions>
        </plugin>

Are we saying that a similar integration is not possible with Gradle + Baseline?

@rdblue
Copy link
Contributor

rdblue commented Apr 3, 2019

I think from the discussion here that there is consensus for these changes. There are still some open issues around IDE integration, but I'd rather get this in so we can work on other modules than solve all of those challenges at once. I'm going to commit this.

@rdblue rdblue merged commit 439a20d into apache:master Apr 3, 2019
@mccheah
Copy link
Contributor Author

mccheah commented Apr 8, 2019

@xabriel sorry I didn't get to this earlier. In Baseline, the IDE project files are not tied to the checkstyle.xml files. So it's possible to follow the conventions the IDE thinks it should follow, but proceed to fail when CI runs ./gradlew checkstyleMain. However, the IDE project settings should be picking up the correct checkstyle.xml files.

If we notice that our work done in our IDE is causing bad style behaviors, we can look into the specific parts of the IDE projects to fix.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants