-
Notifications
You must be signed in to change notification settings - Fork 3k
Docs: lint markdown files in site build #13977
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
06484cf to
f88ac9c
Compare
f88ac9c to
15c353d
Compare
15c353d to
99421e9
Compare
99421e9 to
256ff71
Compare
| - limitations under the License. | ||
| --> | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MD009: Trailing spaces
|
|
||
| ### Amazon Data Firehose | ||
| You can use [Firehose](https://docs.aws.amazon.com/firehose/latest/dev/apache-iceberg-destination.html) to directly deliver streaming data to Apache Iceberg Tables in Amazon S3. With this feature, you can route records from a single stream into different Apache Iceberg Tables, and automatically apply insert, update, and delete operations to records in the Apache Iceberg Tables. This feature requires using the AWS Glue Data Catalog. | ||
| You can use [Firehose](https://docs.aws.amazon.com/firehose/latest/dev/apache-iceberg-destination.html) to directly deliver streaming data to Apache Iceberg Tables in Amazon S3. With this feature, you can route records from a single stream into different Apache Iceberg Tables, and automatically apply insert, update, and delete operations to records in the Apache Iceberg Tables. This feature requires using the AWS Glue Data Catalog. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MD047: Each file should end with a single newline character.
| 2. Dropping a column or field does not change the values in any other column. | ||
| 3. Updating a column or field does not change values in any other column. | ||
| 4. Changing the order of columns or fields in a struct does not change the values associated with a column or field name. | ||
| 1. Added columns never read existing values from another column. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MD030: Spaces after list markers
| ## DDL commands | ||
|
|
||
| ### `CREATE Catalog` | ||
| ### `CREATE Catalog` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MD019: Multiple spaces are present after hash character on Atx Heading.
f326676 to
b6b97fd
Compare
b6b97fd to
05622c1
Compare
|
@kevinjqliu May I get some early review from you? |
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this! Looks like we have to make a lot of changes.
WDYT about incrementally fixing the .md files to make this easier to review? We can add the site/ changes first and lint more files in sequent PRs.
|
this might be helpful for development :) |
05622c1 to
28b8e7a
Compare
| 1 a 1.0 | ||
| 2 b 2.0 | ||
| 3 c 3.0 | ||
| 1 a 1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MD010: Hard Tabs
28b8e7a to
c84704d
Compare
| - `Magic` is four bytes 0x41, 0x47, 0x53, 0x31 ("AGS1", short for: AES GCM Stream, version 1) | ||
| - `BlockLength` is four bytes (little endian) integer keeping the length of the equal-size split blocks before encryption. The length is specified in bytes. | ||
| - `CipherBlockᵢ` is the i-th enciphered block in the file, with the structure defined below. | ||
| * `Magic` is four bytes 0x41, 0x47, 0x53, 0x31 ("AGS1", short for: AES GCM Stream, version 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
MD004: Inconsistent Unordered List Start style.
54288c0 to
715ea1c
Compare
715ea1c to
db6d69a
Compare
|
@kevinjqliu @nastra This is ready for another round of review now. PTAL. Thanks! |
nastra
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but I think we should also set up a CI action so that this runs as part of the build. Also please document how to check/fix markdown files in this section here:
Lines 54 to 59 in fc88614
| Iceberg is built using Gradle with Java 11, 17, or 21. | |
| * To invoke a build and run tests: `./gradlew build` | |
| * To skip tests: `./gradlew build -x test -x integrationTest` | |
| * To fix code style for default versions: `./gradlew spotlessApply` | |
| * To fix code style for all versions of Spark/Hive/Flink:`./gradlew spotlessApply -DallModules` |
db6d69a to
a35c2fa
Compare
|
@nastra We already have a Docs Build CI which will run |
@manuzhang I don't see this being called anywhere. Could you please point to the file that runs this when CI runs? |
a35c2fa to
2e6094c
Compare
|
@kevinjqliu @Fokko could you please take another look? |
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, i rendered the site locally and spot checked most of the markdown files.
small nit about restructuring the check_markdown_files and fix_markdown_files command. Instead of calling check_markdown_files in site/dev/setup_env.sh, i would prefer to only lint when make lint is called.
we can add make lint to CI too
iceberg/.github/workflows/docs-ci.yml
Lines 28 to 38 in 68e555b
| jobs: | |
| build-docs: | |
| runs-on: ubuntu-latest | |
| steps: | |
| - uses: actions/checkout@v4 | |
| - uses: actions/setup-python@v5 | |
| with: | |
| python-version: 3.x | |
| - name: Build Iceberg documentation | |
| run: make build | |
| working-directory: ./site |
2e6094c to
3a4c37f
Compare
|
@kevinjqliu @nastra Now The default mode |
kevinjqliu
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
have a minor nit but we can address as a follow up
| echo "Markdown style issues found. Please run './dev/lint.sh --fix' to fix them." | ||
| exit 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: was there a make lint command before? i think thats pretty helpful so that i can call like so
make lint
make lint --fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't work since --fix will be parsed as Make options. On the other hand, I think dev/lint.sh and dev/lint.sh --fix are already easy to call.
|
doing some sanity checks, i added trailing whitespaces to i'd expect the whitespaces to be fixed EDIT: adding back the changes in |
|
I spot checked many of the markdown flies locally. LGTM |
|
Thank you @manuzhang for adding this and thanks @nastra for the review |
In this PR, I add a step
check_markdown_filesinmake build. The step scans markdown files in following paths with Pymarkdown Linter withpymarkdown scan.site/docs/docs/nightly/docs/*.md(symlinked to versioned docs indocs/docs/*.md)site/docs/*.md(site docs)README.mdI selectively enabled the following rules through config file
markdownlint.yml, which I believe are good to apply and can be applied automatically withpymarkdown fixcommand (see below).Issues found during scan can be fixed with
dev/lint.sh --fixwhich actually runspymarkdown fixto apply the rules.