Skip to content

Conversation

@mccheah
Copy link
Contributor

@mccheah mccheah commented Dec 6, 2018

An experimental proposition for standardizing style across the Iceberg project.

Part of, but not all, #24

Previously, whenever styles were inconsistent across the Iceberg codebase, it was up to code reviewers to catch the inconsistencies and to catch them before merging. This is prone to error, and little code linting problems can add up over time.

Baseline is a code linting toolkit for Gradle. It consists of multiple submodules that when taken together allow automation to enforce consistent coding guidelines. For more information, refer to the Baseline docs.

As a proof of concept, checkstyle is now only applied for the iceberg-api project, while IntelliJ project configuration is applied for all projects. We apply checkstyle changes only to iceberg-api to minimize code churn and to show how we can introduce baseline checks incrementally. Eventually we can add the same checkstyle rules for each submodule, and also, if we desire, apply the other baseline plugins to get additional linting capabilities such as Findbugs and error-prone.

Some opinions that baseline holds, which were often not in line with what Iceberg did - these include, but are not limited to:

  • Avoid static imports in main code
  • Static imports are ordered before regular imports
  • java.* imports are grouped with all other imports
  • Various minor Javadoc tweaks
  • Fields should always be private with accessor methods
  • Utility classes should always have a private no-arg constructor
  • Constructors shouldn't be marked as public if doing so would be redundant (e.g. private nested classes)

Any of the above and others can be changed according to whatever we see fit. In this PR we've specifically adjusted some of the checkstyle rules to be more consistent with what the codebase has historically held to. These include, but are not limited to:

  • Indentation is with 2 spaces and continuation indents are 4 spaces (Baseline normally holds that indentation should be with 4 spaces and continuation indents are 8 spaces)
  • Classes do not have to be final
  • No opinions are held about what kinds of assertion methods are used in tests
  • Javadocs do not require a summary statement before the @returns annotation.

For IntelliJ, the baseline plugin allows the IntelliJ project to be configured via ./gradlew idea with the following desirable settings, regardless of the developer's local IntelliJ settings for their other projects:

  • New files automatically have the correct license header
  • Indentation is 2 spaces, continuation indents is 4 spaces

We lastly note that baseline is not the only way to automate linting. Its main advantage is that it is very comprehensive. But the opinions it holds are numerous, and we can feel free to reject this proposal in favor of developing our own opinions from scratch.

@mccheah mccheah changed the title Apply baseline checkstyle api Apply baseline checkstyle for iceberg-api only Dec 6, 2018
@mccheah
Copy link
Contributor Author

mccheah commented Dec 6, 2018

One thing that's tricky about applying this to iceberg-api is that the style changes can impact public-facing APIs. If we want, we can temporarily suppress the checkstyle warnings on those API methods and fields in order to avoid breaking users. On the other hand, we can hold the opinion that we're choosing to break API now for the express purpose of getting our APIs consistent with the standards across the codebase.

They take up a lot of space and we can choose to add eclipse support later
@rdblue
Copy link
Contributor

rdblue commented Dec 6, 2018

I think most of the changes are fine, but I noticed a few things:

  • Static imports should come after non-static imports.
  • I like using static imports to cut down on code in symbol references, like EQ instead of Expressions.Operators.EQ.
  • Non-static imports should be a single group in alphabetical order. Not sure why some moved.
  • I definitely prefer UUIDLiteral to UuidLiteral. That's also public API so we should avoid changing it.
  • Some short variable name changes are fine (o -> other) but the L and W variables used in Transforms are from the spec, so I'd rather add an exclusion for them.
  • Some variable name changes lose information, like the use of classes instead of javaClasses. That was intentional because the class is the one used for the Java in-memory representation.
  • Is fieldIdToFind really better than fieldId? What rule is this matching? Same with newStruct instead of struct.
  • Seems like it would be easier to allow or require javadoc sentences to end in '.'
  • Like allowing '.', using operators at the end of a line than a the start of the next would require fewer changes.

I like the updates for:

  • Removing public modifiers when the class is package-private or private
  • Requiring a private constructor for classes with only static helpers

Does this enforce a line length of 100 characters?

Why does this require so many Palantir plugins? Are those required?

@mccheah
Copy link
Contributor Author

mccheah commented Dec 6, 2018

@rdblue thanks for the review!

Static imports should come after non-static imports.

Per Google's style guide (actually much of Baseline inherits the Google style guide).

Edit: Think it's fine if we want to be opinionated about moving static imports to be after non-static if we want to minimize code churn. In the long term it would be good to think about changing the import order to be consistent with the Google style guide.

  • I like using static imports to cut down on code in symbol references, like EQ instead of Expressions.Operators.EQ.

Fair enough, let's add exclusions as we see fit. I already added some exclusion, I believe for literals and operators. We can add them for expressions as well. I don't think we want to allow static imports globally though, it leads to difficulty to trace a method call back to its source without an IDE (i.e. in CR).

  • Non-static imports should be a single group in alphabetical order. Not sure why some moved.

That's what checkstyle should be supporting with this patch though? There were some that moved the core JDK imports into the main import block. I'll double check.

  • I definitely prefer UUIDLiteral to UuidLiteral. That's also public API so we should avoid changing it.

Fair enough. Baseline wants to enforce camelCase for everything, so names with consecutive capital letters are not allowed. See here and here. We can add an exclusion for ID and UUID, among others.

  • Some variable name changes lose information, like the use of classes instead of javaClasses. That was intentional because the class is the one used for the Java in-memory representation.
  • Is fieldIdToFind really better than fieldId? What rule is this matching? Same with newStruct instead of struct.

Baseline's checkstyle requires all variables to not hide fields. This meant that for fieldIdToFind, the parameter fieldId was hiding the field fieldId of the same class. All these variable name changes were to deconflict. Can we define parameter and field names that don't conflict but that are sensible on a case by case basis? I made an initial attempt here, can revise these again.

  • Seems like it would be easier to allow or require javadoc sentences to end in '.'

It's for the @return blocks. Opinion from baseline but we can revert it.

  • Like allowing '.', using operators at the end of a line than a the start of the next would require fewer changes.

I think we want to only enforce one or the other; we can find how to make the regex enforce the operators to be on the same line rather than on the next. Or we can just start making all operators move to the next line, as we do here.

Does this enforce a line length of 100 characters?

120 characters by default.

Why does this require so many Palantir plugins? Are those required?

These are all Palantir's open-source tooling! Granted one can get all of them automatically applied by just applying the com.palantir.baseline plugin. The ideal end state is to apply com.palantir.baseline globally, but doing that requires not only fixing checkstyle everywhere but also applying error-prone and other linting plugins that require further patches. For example I noticed error-prone complaining about some of the ways we write Preconditions checks.

@@ -0,0 +1,461 @@
<project version="4">
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@iamdanfox would it be reasonable to allow the name of this file to be configurable? I was looking at the Baseline source code and the file name is hardcoded to this path. But including a reference to Palantir seems strange for projects that want to use baseline that don't find their origins with Palantir itself.

What I'd like to do instead is to run ./gradlew baselineUpdateConfig to get this template, but then be able to rename this file and reference the renamed path in build.gradle.

@rdblue
Copy link
Contributor

rdblue commented Dec 13, 2018

I think we should get this in, but I'd like to try to do it when we move from com.netflix packages to org.apache. That is going to touch every file as well, so we may as well do it at the same time.

@mccheah
Copy link
Contributor Author

mccheah commented Dec 15, 2018

What's the timeline for that? Should we start doing both sets of changes now?

@rdblue
Copy link
Contributor

rdblue commented Dec 18, 2018

I don't have a timeline yet, but I'd like to get it done in the next few weeks. I've added blocking issues to the 0.1.0 release milestone.

@mccheah
Copy link
Contributor Author

mccheah commented Jan 17, 2019

Let's prepare to do this after #69

@mccheah
Copy link
Contributor Author

mccheah commented Mar 20, 2019

I'm closing this in preparation for rewriting this against the most up to date master and on top of work to rename the packages.

@mccheah mccheah closed this Mar 20, 2019
jun-ma-0 pushed a commit to jun-ma-0/incubator-iceberg that referenced this pull request Dec 3, 2019
…nov-15

Merge Iceberg master as of Nov 15
puchengy added a commit to puchengy/iceberg that referenced this pull request Feb 10, 2023
puchengy added a commit to puchengy/iceberg that referenced this pull request Jun 7, 2023
…cked Iceberg Parquet table (apache#28)

(cherry picked from commit 165c5e8)
(cherry picked from commit 13e9a1bc1e544c419676fbdf0457056eb6706b51)
puchengy added a commit to puchengy/iceberg that referenced this pull request Jun 20, 2023
…cked Iceberg Parquet table (apache#28) (apache#52)

(cherry picked from commit 165c5e8)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants