@manuzhang
Member

@manuzhang manuzhang commented Sep 2, 2025

This PR adds a check_markdown_files step to make build. The step scans the markdown files in the following paths with the PyMarkdown linter (pymarkdown scan).

  • site/docs/docs/nightly/docs/*.md (symlinked to versioned docs in docs/docs/*.md)
  • site/docs/*.md (site docs)
  • README.md

I selectively enabled the following rules through the config file markdownlint.yml. I believe they are good to apply, and they can be applied automatically with the pymarkdown fix command (see below).

  • list-indent
  • no-trailing-spaces
  • no-hard-tabs
  • no-multiple-blanks
  • no-multiple-space-atx
  • no-multiple-space-closed-atx
  • heading-start-left
  • no-multiple-space-blockquote
  • list-marker-space
  • hr-style
  • no-space-in-emphasis
  • no-space-in-code
  • no-space-in-links
  • proper-names
  • single-trailing-newline
  • code-fence-style

Issues found during the scan can be fixed with dev/lint.sh --fix, which runs pymarkdown fix to apply the rules.

@github-actions github-actions bot added the docs label Sep 2, 2025
@github-actions github-actions bot added the INFRA label Sep 2, 2025
@github-actions github-actions bot added the Specification label Sep 12, 2025
@github-actions github-actions bot removed the INFRA label Sep 12, 2025
- limitations under the License.
-->

Member Author

MD009: Trailing spaces


### Amazon Data Firehose
You can use [Firehose](https://docs.aws.amazon.com/firehose/latest/dev/apache-iceberg-destination.html) to directly deliver streaming data to Apache Iceberg Tables in Amazon S3. With this feature, you can route records from a single stream into different Apache Iceberg Tables, and automatically apply insert, update, and delete operations to records in the Apache Iceberg Tables. This feature requires using the AWS Glue Data Catalog.
You can use [Firehose](https://docs.aws.amazon.com/firehose/latest/dev/apache-iceberg-destination.html) to directly deliver streaming data to Apache Iceberg Tables in Amazon S3. With this feature, you can route records from a single stream into different Apache Iceberg Tables, and automatically apply insert, update, and delete operations to records in the Apache Iceberg Tables. This feature requires using the AWS Glue Data Catalog.
Member Author

MD047: Each file should end with a single newline character.

2. Dropping a column or field does not change the values in any other column.
3. Updating a column or field does not change values in any other column.
4. Changing the order of columns or fields in a struct does not change the values associated with a column or field name.
1. Added columns never read existing values from another column.
Member Author

MD030: Spaces after list markers

## DDL commands

### `CREATE Catalog`
### `CREATE Catalog`
Member Author

MD019: Multiple spaces are present after hash character on Atx Heading.

@manuzhang manuzhang force-pushed the lint_doc_files branch 2 times, most recently from f326676 to b6b97fd Compare September 13, 2025 14:42
@manuzhang manuzhang marked this pull request as ready for review September 14, 2025 15:53
@manuzhang
Member Author

@kevinjqliu May I get some early review from you?

Contributor

@kevinjqliu kevinjqliu left a comment

Thanks for working on this! Looks like we have to make a lot of changes.

WDYT about incrementally fixing the .md files to make this easier to review? We can add the site/ changes first and lint more files in subsequent PRs.

@kevinjqliu
Contributor

#14154

this might be helpful for development :)

1 a 1.0
2 b 2.0
3 c 3.0
1 a 1.0
Member Author

MD010: Hard Tabs

- `Magic` is four bytes 0x41, 0x47, 0x53, 0x31 ("AGS1", short for: AES GCM Stream, version 1)
- `BlockLength` is four bytes (little endian) integer keeping the length of the equal-size split blocks before encryption. The length is specified in bytes.
- `CipherBlockᵢ` is the i-th enciphered block in the file, with the structure defined below.
* `Magic` is four bytes 0x41, 0x47, 0x53, 0x31 ("AGS1", short for: AES GCM Stream, version 1)
Member Author

MD004: Inconsistent Unordered List Start style.

@manuzhang manuzhang force-pushed the lint_doc_files branch 2 times, most recently from 54288c0 to 715ea1c Compare October 13, 2025 06:25
@manuzhang
Member Author

@kevinjqliu @nastra This is ready for another round of review now. PTAL. Thanks!

Contributor

@nastra nastra left a comment

LGTM but I think we should also set up a CI action so that this runs as part of the build. Also please document how to check/fix markdown files in this section here:

iceberg/README.md

Lines 54 to 59 in fc88614

Iceberg is built using Gradle with Java 11, 17, or 21.
* To invoke a build and run tests: `./gradlew build`
* To skip tests: `./gradlew build -x test -x integrationTest`
* To fix code style for default versions: `./gradlew spotlessApply`
* To fix code style for all versions of Spark/Hive/Flink:`./gradlew spotlessApply -DallModules`

@manuzhang
Member Author

@nastra We already have a Docs Build CI which will run check_markdown_files. I've updated the instructions in site/README.md.

@nastra
Contributor

nastra commented Oct 16, 2025

@nastra We already have a Docs Build CI which will run check_markdown_files.

@manuzhang I don't see this being called anywhere. Could you please point to the file that runs this when CI runs?

@nastra nastra requested a review from Fokko October 16, 2025 09:52
@manuzhang
Member Author

@kevinjqliu @Fokko could you please take another look?

Contributor

@kevinjqliu kevinjqliu left a comment

LGTM, i rendered the site locally and spot checked most of the markdown files.

small nit about restructuring the check_markdown_files and fix_markdown_files command. Instead of calling check_markdown_files in site/dev/setup_env.sh, i would prefer to only lint when make lint is called.

we can add make lint to CI too

jobs:
  build-docs:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: 3.x
      - name: Build Iceberg documentation
        run: make build
        working-directory: ./site

@manuzhang
Member Author

manuzhang commented Oct 27, 2025

@kevinjqliu @nastra Now dev/lint.sh has two modes.

The default mode, dev/lint.sh, checks markdown files for style issues; it can be run on its own and is also run from dev/build.sh and dev/serve.sh. The fix mode, dev/lint.sh --fix, can be run to fix those files.
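
The two modes described above can be sketched roughly as follows. This is a minimal illustration, not the dev/lint.sh merged in this PR: the function names lint_mode and run_lint are hypothetical, and it assumes the pymarkdown CLI is on PATH, accepts --config before the scan/fix subcommand, and reads its rules from markdownlint.yml.

```shell
#!/bin/sh
# Hypothetical sketch of a two-mode lint wrapper (not the script from this PR).
# Assumes pymarkdown is installed and markdownlint.yml is in the working directory.

# Map the optional first argument to a pymarkdown subcommand name.
# No argument selects "scan"; --fix selects "fix"; anything else is an error.
lint_mode() {
  case "${1:-}" in
    "")    echo "scan" ;;
    --fix) echo "fix" ;;
    *)     echo "unknown option: $1" >&2; return 2 ;;
  esac
}

# Run pymarkdown in the given mode ("scan" or "fix") over the given files.
run_lint() {
  mode="$1"
  shift
  if [ "$mode" = "fix" ]; then
    # Pass pymarkdown's own exit status through unchanged.
    pymarkdown --config markdownlint.yml fix "$@"
    return $?
  fi
  if ! pymarkdown --config markdownlint.yml scan "$@"; then
    echo "Markdown style issues found. Please run './dev/lint.sh --fix' to fix them."
    return 1
  fi
}
```

A script built on this sketch might end with mode="$(lint_mode "${1:-}")" || exit 2 followed by run_lint "$mode" README.md, so that the check mode fails a build while the fix mode rewrites files in place.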

Contributor

@kevinjqliu kevinjqliu left a comment

LGTM

have a minor nit but we can address as a follow up

Comment on lines +241 to +242
echo "Markdown style issues found. Please run './dev/lint.sh --fix' to fix them."
exit 1
Contributor

nit: was there a make lint command before? i think that's pretty helpful, so that i can call it like so

make lint
make lint --fix

Member Author

That doesn't work since --fix will be parsed as a Make option. On the other hand, I think dev/lint.sh and dev/lint.sh --fix are already easy to call.
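
For reference, one possible workaround, sketched under assumptions: GNU make treats NAME=value arguments as variable assignments rather than options and passes command-line variables to the recipe's environment, so a hypothetical make lint FIX=1 could carry the intent. Neither a lint target nor a FIX variable is part of this PR, and lint_flag_from_env is an illustrative name.

```shell
# Hypothetical helper: translate a FIX environment variable into the script flag.
# Prints "--fix" when FIX is exactly 1, and prints nothing otherwise.
lint_flag_from_env() {
  if [ "${FIX:-}" = "1" ]; then
    echo "--fix"
  fi
}
```

A wrapper could then treat FIX=1 in the environment the same way as an explicit --fix argument, while plain dev/lint.sh and dev/lint.sh --fix keep working as they do now.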

@kevinjqliu
Contributor

kevinjqliu commented Oct 27, 2025

doing some sanity checks, i added trailing whitespace to site/docs/status.md and docs/docs/api.md
running ./dev/lint.sh and ./dev/lint.sh --fix doesn't affect the changed files.

i'd expect the whitespace to be fixed

EDIT: adding back the changes in docs/docs/api.md and running ./dev/lint.sh worked
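
When repeating a sanity check like this, it can help to confirm independently of the linter that the test edits are actually present in the files. A minimal sketch using plain grep follows; find_trailing_whitespace is a hypothetical helper name, and the pattern flags any trailing space or tab, which may be stricter than the linter's configured trailing-space rule.

```shell
# List lines that end in a space or tab, prefixed with their line numbers.
# Exit status follows grep: 0 if at least one such line exists, 1 if none.
find_trailing_whitespace() {
  grep -n '[[:space:]]$' "$@"
}
```

For example, find_trailing_whitespace docs/docs/api.md before and after ./dev/lint.sh --fix would show whether the fix mode touched that file.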

@kevinjqliu
Contributor

I spot checked many of the markdown files locally. LGTM

@kevinjqliu kevinjqliu merged commit 6a0d4a0 into apache:main Oct 27, 2025
3 checks passed
@kevinjqliu
Contributor

Thank you @manuzhang for adding this and thanks @nastra for the review
