Skip to content

Conversation

okumin
Copy link
Contributor

@okumin okumin commented Oct 10, 2025

What changes were proposed in this pull request?

Normalize the license headers. This pull request reformats files that don't allow the following Javadoc format. FYI: The expected header file is located at checkstyle/asf.header.

^\/\*$
^( \*| \* +.*)$
^ \*\/$

Originally, I attempted to update all at once in #6115 or #6112, but it is likely to be challenging to review. So, I created a separate pull request focusing on the minimal reformat as the first step.

Why are the changes needed?

Our CI checks the license header format of new files. Some (or most) people copy and paste the header from existing Java files. Unfortunately, existing files don't share the same format, and SonarQube often reports a license issue. A reviewer has to say "you must not copy and paste the header from existing files, you must pick it up from checkstyle/asf.header.

Does this PR introduce any user-facing change?

No

How was this patch tested?

I put the following configuration on checkstyle/checkstyle.xml, standalone-metastore/checkstyle/checkstyle.xml, and storage-api/checkstyle/checkstyle.xml. After that, I ran mvn checkstyle:check`.

<?xml version="1.0"?>
<!DOCTYPE module PUBLIC
    "-//Puppy Crawl//DTD Check Configuration 1.2//EN"
    "http://www.puppycrawl.com/dtds/configuration_1_2.dtd">
  <module name="Checker">
    <module name="RegexpHeader">
      <property name="headerFile" value="${config_loc}/asf.header.incremental"/>
      <property name="fileExtensions" value="java"/>
      <property name="multiLines" value="2"/>
    </module>
  </module>

How to reformat

I ran the following script to convert \A\/\*\*$ into \A\/\\*$.

git ls-files '*.java' | grep -v 'src/gen/thrift' | grep -v '^iceberg/' \
  | while read -r f; do
  perl -0777 -i -pe 's/\A[ \t]*\/\*\*(\R)/\/*$1/' "$f"
done

As the number of the other patterns was not so huge, everything else has been hand-maded.

@okumin
Copy link
Contributor Author

okumin commented Oct 11, 2025

Some indent issues will be handled by the next ticket/PR based on the comment in #6115.
https://sonarcloud.io/project/issues?id=apache_hive&pullRequest=6125&issueStatuses=OPEN,CONFIRMED&sinceLeakPeriod=true

Or we may find a reasonable regex that replaces asf.header.

@okumin okumin marked this pull request as ready for review October 11, 2025 01:18
@InvisibleProgrammer
Copy link
Contributor

Hi, I did the following to check this out:

  • checked out master
  • added the checkstyle rules and the asf.header.incremental files to the following:
    -- checkstyle/checkstyle.xml
    -- standalone-metastore/checkstyle/checkstyle.xml
    -- storage-api/checkstyle/checkstyle.xml
  • reviewed and ran the regex script
  • ran mvn checkstyle:check

The interesting thing that I experienced, that I got green checkstyle run just after those changes.
So, at this point, I think I miss something or don't understand a couple of things.
Could you please help answering the following:

  • My run changed only half of the files mentioned in this PR. It is a significant difference. Is there an additional script that collects files with incorrect header format?
  • What is the reason of the suppressions? The checkstyle run for me went well without them.
  • Why do we need the configuration exclusion at storage-api?

Additionally, I pushed my changes into a branch. Could you please check if I missed something in it? https://github.com/apache/hive/compare/master...InvisibleProgrammer:hive:header_checkstyle?expand=1

@okumin
Copy link
Contributor Author

okumin commented Oct 15, 2025

@InvisibleProgrammer

Thanks for your review!

My run changed only half of the files mentioned in this PR. It is a significant difference. Is there an additional script that collects files with incorrect header format?

As I mentioned, some changes are hand-maded. That's because it is overkill to write a bash script for a pattern involving only a couple of files. See the description of this PR for the details.

How to reformat

I ran the following script to convert \A\/\*\*$ into \A\/\\*$.

git ls-files '*.java' | grep -v 'src/gen/thrift' | grep -v '^iceberg/' \
  | while read -r f; do
  perl -0777 -i -pe 's/\A[ \t]*\/\*\*(\R)/\/*$1/' "$f"
done

As the number of the other patterns was not so huge, everything else has been hand-maded.

What is the reason of the suppressions? The checkstyle run for me went well without them.

You added the new check to the existing checkstyle.xml. Our current configuration does not raise an error but shows a warning. Please follow the following part in the description, or you can run mvn checkstyle:checkstyle-aggregate and manually see ./target/reports/checkstyle-aggregate.html. I recommend my way because the latter method shows you a million warnings unrelated to the header issue.

How was this patch tested?

I put the following configuration on checkstyle/checkstyle.xml, standalone-metastore/checkstyle/checkstyle.xml, and storage-api/checkstyle/checkstyle.xml. After that, I ran mvn checkstyle:check`.

<?xml version="1.0"?>
<!DOCTYPE module PUBLIC
    "-//Puppy Crawl//DTD Check Configuration 1.2//EN"
    "http://www.puppycrawl.com/dtds/configuration_1_2.dtd">
  <module name="Checker">
    <module name="RegexpHeader">
      <property name="headerFile" value="${config_loc}/asf.header.incremental"/>
      <property name="fileExtensions" value="java"/>
      <property name="multiLines" value="2"/>
    </module>
  </module>

Why do we need the configuration exclusion at storage-api?

Do you mean this change? It excludes checkstyle/asf.header.incremental from a target of Apache RAT. That's because the following content can not have the ASL header. We don't need to change the root ./pom.xml because it is originally excluded.

^\/\*$
^( \*| \* +.*)$
^ \*\/$

<suppress checks="RegexpHeader" files="serde\/src\/java\/org\/apache\/hadoop\/hive\/serde2\/lazy\/fast\/StringToDouble\.java"/>
<suppress checks="RegexpHeader" files="llap-server\/src\/java\/org\/apache\/hadoop\/hive\/llap\/daemon\/impl\/PriorityBlockingDeque\.java"/>
<!-- Generated by Avro -->
<suppress checks="RegexpHeader" files="hbase-handler\/src\/test\/org\/apache\/hadoop\/hive\/hbase\/avro\/.*\.java"/>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EmployeeAvro.java is modified in this PR. And I see no other file that ends with avro.java.

@InvisibleProgrammer
Copy link
Contributor

I have one last question: what do you think? Does it make sense to set the severity level to error for this rule only?
I'm afraid, if we keep it as a warning, on a long-term it will have no effect. Let's be honest: nobody reads the warnings.

I'm thinking about this:

<module name="RegexpHeader">
    <property name="severity" value="error"/>
    <property name="headerFile" value="${config_loc}/asf.header.incremental"/>
    <property name="fileExtensions" value="java"/>
    <property name="multiLines" value="2"/>
  </module>

@okumin
Copy link
Contributor Author

okumin commented Oct 17, 2025

@InvisibleProgrammer
Thanks for proposing. Probably, I could not explain the problem statement clearly.

As a Hive committer, I review all reports from SonarQube and don't merge pull requests with addressable issues. So, "nobody reads the warnings" doesn't apply to me. My position: SonarQube helps keep the code quality. Changing the severity per rule makes sense to me. Let me try it(it works! Thank you!).

image

As a PR author and reviewer, I spent some time on the header issue, as well as we consumed CI resources. There are two reasons. (1) I copy and paste them from existing files. (2) The error message is a little confusing. As a PR author, I struggled to find the SonarQube's mechanism and fix them. As a reviewer, I put two comments to resolve the header issue completely (first, second).

image

I assume each contributor no longer needs to spend human minutes + CI resources once we spend a couple of hours resolving the issue.

Important points: I'm open to alternative solutions or alternative ideas(e.g., discontinue the SonarQube integration). Also, since this is not urgent or critical, I'm open to voiding this ticket or putting off reviews. We can say "checkstyle/asf.header is the source of truth" on each pull request. We can revisit here once we feel it should be addressed.

Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants