
Conversation

@JakeSCahill
Contributor

@JakeSCahill JakeSCahill commented Apr 7, 2025

Resolves https://redpandadata.atlassian.net/browse/DOC-1275

This pull request adds a doc-tools CLI command that becomes available when you install the docs extensions and macros package. The CLI gives writers and our internal tooling (such as Doc Detective) access to the same tools and lets us manage them in one place.

The CLI incorporates our rpk, cluster configuration property, and metrics automation to make generating docs easier.

To test this change:

  1. Check out the doc-tools branch.
  2. Run npm link.
  3. Run doc-tools -h to list all available commands.
  4. Run doc-tools get-redpanda-version.

You should see the version information.

Make sure to run npm unlink doc-tools after testing.
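
Putting the steps together, a test session might look like the following (output elided; the exact command list comes from doc-tools -h):

git checkout doc-tools
npm link                        # symlink the package so doc-tools is on your PATH
doc-tools -h                    # list all available commands
doc-tools get-redpanda-version  # prints the version information
npm unlink doc-tools            # remove the global symlink when done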

Wiki (won't work until this PR is merged): https://redpandadata.atlassian.net/wiki/spaces/DOC/pages/1185054748/Doc+Tools+CLI

Summary by CodeRabbit

  • New Features

    • Introduced a CLI tool for automating documentation workflows with dependency checks and commands for version retrieval and doc generation.
    • Added a Docker Compose setup for a secure multi-broker Redpanda cluster with tiered storage, audit logging, Iceberg integration, and related services including Console, Connect, MinIO, and Iceberg catalog.
    • Provided data pipelines for synthetic user profiles and example transaction schemas with documentation.
    • Delivered a customizable WebAssembly transform with source code, build metadata, and deployment instructions.
    • Automated extraction and generation of Redpanda CLI (rpk) and configuration property documentation in JSON and AsciiDoc formats.
    • Added scripts for converting Markdown to AsciiDoc, preprocessing and post-processing docs, and appending carets to external links.
    • Included utilities for installing test dependencies, managing Python virtual environments, and managing Redpanda clusters.
    • Added a GitHub file fetch utility to download files or directories from repositories.
  • Bug Fixes

    • Expanded .gitignore and .npmignore to exclude more temporary, generated, and environment-specific files.
  • Documentation

    • Added detailed READMEs for property extraction tools, WebAssembly transform customization, and transaction topic schema.
    • Provided schema and metadata files defining configuration properties and Redpanda models.
  • Tests

    • Added unit tests for multiple property extraction transformers verifying correct parsing and metadata handling.
  • Chores

    • Updated package version and dependencies, introducing new binaries and npm scripts.
    • Added configuration files for new tools, Docker Compose profiles, and property extraction workflows.

@netlify

netlify bot commented Apr 7, 2025

Deploy Preview for docs-extensions-and-macros ready!

| Name | Link |
| --- | --- |
| 🔨 Latest commit | ecde8ba |
| 🔍 Latest deploy log | https://app.netlify.com/sites/docs-extensions-and-macros/deploys/6811e73f284c3200082829b2 |
| 😎 Deploy Preview | https://deploy-preview-99--docs-extensions-and-macros.netlify.app |

To edit notification comments on pull requests, go to your Netlify site configuration.

@JakeSCahill JakeSCahill requested a review from a team April 7, 2025 15:52
@coderabbitai
Contributor

coderabbitai bot commented Apr 28, 2025

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting reviews.review_status to false in the CodeRabbit configuration file.

📝 Walkthrough

Walkthrough

This update introduces a comprehensive suite of new features, scripts, and documentation for automating Redpanda documentation workflows, cluster orchestration, and property extraction. It adds a new CLI tool (doc-tools) for managing documentation-related tasks, a multi-service Docker Compose setup for Redpanda clusters, and several scripts for extracting, processing, and generating documentation from Redpanda source code and runtime metrics. Supporting utilities for environment setup, dependency installation, and data transformation are also included. The changes encompass new Python, Bash, Go, and Node.js scripts, updated configuration files, and extensive documentation for both users and developers.

Changes

| File(s) | Change Summary |
| --- | --- |
| .gitignore, .npmignore | Expanded ignore lists to exclude additional directories and files related to builds, environments, and temporary artifacts. |
| bin/doc-tools.js | Introduced a new CLI tool for automating documentation workflows, dependency checks, and generation commands. |
| package.json | Updated version to 4.4.0, added new binary entry for doc-tools, new scripts, expanded files array, and added/updated dependencies. |
| macros/data-template.js | Updated import paths to use ../extension-utils/ instead of ../util/. |
| docker-compose/bootstrap.yml, docker-compose/docker-compose.yml, docker-compose/generate-profiles.yaml, docker-compose/rpk-profile.yaml, docker-compose/transactions-schema.json, docker-compose/transactions.md | Added new Docker Compose cluster configuration, service orchestration, data generation pipeline, rpk profile, JSON schema, and topic documentation files. |
| docker-compose/transform/README.adoc, docker-compose/transform/go.mod, docker-compose/transform/transform.go, docker-compose/transform/transform.yaml | Added a customizable Go-based WASM transform, its metadata, Go module file, and usage documentation. |
| cli-utils/add-caret-external-links.py, cli-utils/beta-from-antora.js, cli-utils/generate-cluster-docs.sh, cli-utils/install-test-dependencies.sh, cli-utils/python-venv.sh, cli-utils/start-cluster.sh | Introduced various utility scripts for link processing, prerelease detection, cluster doc generation, dependency installation, Python venv setup, and cluster orchestration. |
| tools/docusaurus-to-antora-conversion-scripts/convert-docs.sh, tools/docusaurus-to-antora-conversion-scripts/get-file-changes.sh, tools/docusaurus-to-antora-conversion-scripts/post-process-asciidoc.js, tools/docusaurus-to-antora-conversion-scripts/pre-process-markdown.js | Added scripts for converting Docusaurus docs to Antora, processing AsciiDoc files, and summarizing file changes between branches. |
| tools/get-console-version.js, tools/get-redpanda-version.js | Added scripts to fetch the latest Console and Redpanda versions from GitHub, with support for beta/prerelease detection. |
| tools/metrics/metrics.py, tools/metrics/requirements.txt | Added a script and requirements file for fetching, parsing, and documenting Redpanda metrics as JSON and AsciiDoc. |
| tools/gen-rpk-ascii.py | Added a script to extract and document the rpk CLI command structure and flags as AsciiDoc and JSON. |
| tools/property-extractor/Makefile, tools/property-extractor/README.adoc, tools/property-extractor/definitions.json, tools/property-extractor/file_pair.py, tools/property-extractor/json-to-asciidoc/generate_docs.py, tools/property-extractor/parser.py, tools/property-extractor/property_bag.py, tools/property-extractor/property_extractor.py, tools/property-extractor/requirements.txt, tools/property-extractor/tests/transformers_test.py, tools/property-extractor/transformers.py | Introduced a full property extraction toolchain: build automation, extraction logic, transformers, type definitions, JSON-to-AsciiDoc conversion, tests, and documentation. |
| tools/fetch-from-github.js | Added a module to fetch files or directories from GitHub repositories recursively, saving them locally. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant DocToolsCLI
    participant ShellScript/Makefile
    participant Python/Go Script
    participant Docker/Redpanda Cluster

    User->>DocToolsCLI: Run doc-tools <command>
    DocToolsCLI->>DocToolsCLI: Check dependencies
    DocToolsCLI->>ShellScript/Makefile: Invoke appropriate script (e.g., generate metrics/property/rpk docs)
    ShellScript/Makefile->>Docker/Redpanda Cluster: Start/stop cluster as needed
    ShellScript/Makefile->>Python/Go Script: Run extraction/generation logic
    Python/Go Script->>Docker/Redpanda Cluster: Fetch data (metrics, properties, etc.)
    Python/Go Script->>ShellScript/Makefile: Output JSON/AsciiDoc
    ShellScript/Makefile->>DocToolsCLI: Return status/output
    DocToolsCLI->>User: Print results or errors

Assessment against linked issues

| Objective | Addressed | Explanation |
| --- | --- | --- |
| Provide a dedicated CLI tool for documentation automations as backbone for other automations (DOC-1275) | ✅ | |

Poem

In the warren, scripts now bloom,
Redpanda clusters start with a vroom!
Docs are built, properties found,
Metrics and schemas all neatly bound.
Compose spins up, transforms deploy,
Rabbits hop with coder joy!
🐇✨


🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 32

🧹 Nitpick comments (47)
tools/docusaurus-to-antora-conversion-scripts/get-file-changes.sh (2)

3-6: Support non-interactive usage and validate inputs
Relying solely on read makes automation harder. Accept two positional parameters and ensure they’re provided, falling back to usage help otherwise.

if [[ $# -ne 2 ]]; then
  echo "Usage: $0 <branch1> <branch2>"
  exit 1
fi
branch1=$1
branch2=$2

8-9: Quote branch variables in Git commands
Unquoted variables can lead to word-splitting or pathname expansion. Also specify origin on fetch to avoid ambiguity.

-git fetch
-git diff --summary $branch1..$branch2 -- ./modules/
+git fetch origin
+git diff --summary "$branch1".."$branch2" -- ./modules/
.gitignore (1)

4-10: Normalize directory ignore patterns for clarity
Prefix root-level paths with / and append / to directories. This prevents accidentally ignoring similarly named files elsewhere.

-.vscode
+/.vscode/
-autogenerated/
+/autogenerated/
-tmp/
+/tmp/
-venv/
+/venv/
-gen/
+/gen/
-tree-sitter/
+/tree-sitter/

(__pycache__/ already ends in a slash and matches at any depth, so it needs no change.)
tools/property-extractor/file_pair.py (1)

1-7: Add docstrings to improve code documentation.

The FilePair class implementation is clear and concise, but it would benefit from docstrings to explain its purpose and how it fits into the property extraction workflow. Consider adding class and method docstrings to help future developers understand the intended usage.

 class FilePair:
+    """
+    Represents a pair of related source files: a header file and its corresponding implementation file.
+    
+    This class is used in the property extraction workflow to process pairs of C++ header and
+    implementation files together when extracting configuration properties.
+    """
     def __init__(self, header, implementation) -> None:
+        """
+        Initialize a FilePair with paths to header and implementation files.
+        
+        Args:
+            header: Path to the header file (.h)
+            implementation: Path to the implementation file (.cpp)
+        """
         self.header = header
         self.implementation = implementation

     def __repr__(self) -> str:
+        """Return string representation of the FilePair."""
         return f"(header={self.header}, implementation={self.implementation})"
tools/property-extractor/property_bag.py (1)

1-4: Add docstrings and consider additional utility methods.

This implementation of a nested auto-initializing dictionary is clean, but would benefit from documentation explaining its purpose and behavior. Consider also adding utility methods for common operations like merging, flattening, or deep copying nested property structures.

 class PropertyBag(dict):
+    """
+    An auto-initializing nested dictionary for managing hierarchical property metadata.
+    
+    PropertyBag allows accessing nested keys through chained indexing operations without
+    explicitly creating intermediate dictionaries. When a non-existent key is accessed,
+    a new PropertyBag is automatically created at that key.
+    
+    Example:
+        bag = PropertyBag()
+        bag['a']['b']['c'] = 'value'  # No need to initialize 'a' or 'b' dictionaries
+    """
     def __missing__(self, key):
+        """
+        Create a new PropertyBag when accessing a non-existent key.
+        
+        Args:
+            key: The missing dictionary key
+            
+        Returns:
+            PropertyBag: A newly created PropertyBag instance for the missing key
+        """
         self[key] = PropertyBag()
         return self[key]
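
As a sketch of the kind of merging utility suggested above (not part of the PR; the deep_merge name and semantics are assumptions), a recursive merge for PropertyBag could look like this:

class PropertyBag(dict):
    """Auto-initializing nested dictionary, as implemented in the PR."""
    def __missing__(self, key):
        self[key] = PropertyBag()
        return self[key]

def deep_merge(base, extra):
    """Recursively merge `extra` into `base`; scalar values in `extra` win."""
    for key, value in extra.items():
        if isinstance(value, dict) and isinstance(base.get(key), dict):
            deep_merge(base[key], value)
        else:
            base[key] = value
    return base

bag = PropertyBag()
bag['a']['b'] = 1
deep_merge(bag, {'a': {'c': 2}})
print(bag)  # {'a': {'b': 1, 'c': 2}}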
docker-compose/transactions.md (1)

1-47: LGTM! Well-structured documentation with clear examples.

The transactions topic documentation is comprehensive and well-organized. It clearly explains the schema, provides a helpful example, and outlines potential use cases for the data.

One minor suggestion: In the schema description, consider specifying the exact ISO 8601 format for the timestamp field (as shown in your example) to provide additional clarity for users implementing against this schema.

-  - **timestamp**: The timestamp of when the transaction occurred, formatted in ISO 8601.
+  - **timestamp**: The timestamp of when the transaction occurred, formatted in ISO 8601 (e.g., "2024-08-16T15:51:19.799474084Z").
utils/beta-from-antora.js (2)

20-20: Consider handling various truthy prerelease values

Currently, the function only returns true if antoraConfig.prerelease === true (strict equality). This means other truthy values like string 'true' would be treated as false. Consider whether this strict comparison is intentional, or if you should handle other truthy values.

- return antoraConfig.prerelease === true;
+ return Boolean(antoraConfig.prerelease);

22-23: Consider passing errors to a logger or returning error information

The function currently logs errors to console but silently returns false. This might mask configuration issues for consumers of this function. Consider either using a configurable logger or returning error information to allow callers to handle errors appropriately.

- console.error('Error reading antora.yml:', error.message);
- return false;
+ console.error('Error reading antora.yml:', error.message);
+ // Either return an object with error info:
+ return { success: false, error: error.message };
+ // Or keep returning false but consider adding a logger parameter:
+ // (logger || console).error('Error reading antora.yml:', error.message);
+ // return false;
docker-compose/rpk-profile.yaml (1)

7-9: Security consideration for credentials

The configuration contains hardcoded credentials (superuser/secretpassword). While this may be acceptable for local development or documentation purposes, it's worth adding a comment explicitly stating this is for development use only.

  sasl:
    user: superuser # The username used for authentication
    password: secretpassword # The password associated with the username
+   # NOTE: These are example credentials for local development only. 
+   # Never use hardcoded credentials in production environments.
    mechanism: scram-sha-256 # Authentication mechanism; SCRAM-SHA-256 provides secure password-based authentication
utils/python-venv.sh (3)

11-11: Add a safeguard before removing directories

The rm -rf "$VENV_DIR" command could potentially be dangerous if $VENV_DIR is unexpectedly empty or set to a critical directory. Consider adding a safety check before deletion.

- rm -rf "$VENV_DIR"
+ # Safety check to avoid removing critical directories
+ if [[ -n "$VENV_DIR" && "$VENV_DIR" != "/" && "$VENV_DIR" != "$HOME" ]]; then
+   rm -rf "$VENV_DIR"
+ else
+   echo "Error: Invalid virtual environment directory: $VENV_DIR"
+   exit 1
+ fi

12-12: Consider checking Python availability

The script directly uses python3 without verifying its existence or version. Consider adding a check to ensure the appropriate Python version is available.

+ # Check if python3 is available
+ if ! command -v python3 &> /dev/null; then
+   echo "Error: python3 command not found. Please install Python 3."
+   exit 1
+ fi
+
+ # Optionally check minimum version
+ PYTHON_VERSION=$(python3 --version | cut -d' ' -f2)
+ if [[ "$(printf '%s\n' "3.6" "$PYTHON_VERSION" | sort -V | head -n1)" != "3.6" ]]; then
+   echo "Warning: Python 3.6 or higher recommended. Found: $PYTHON_VERSION"
+ fi
+
python3 -m venv "$VENV_DIR"

13-13: Consider selective verbosity for pip operations

The script uses --quiet for pip operations, which suppresses all output including potential warnings. Consider using a variable to control verbosity or allowing important warnings to be displayed.

- "$VENV_DIR/bin/pip" install --upgrade pip --quiet
+ # Allow controlling verbosity through environment variable
+ PIP_QUIET=${PIP_QUIET:---quiet}
+ "$VENV_DIR/bin/pip" install --upgrade pip $PIP_QUIET
docker-compose/transform/transform.yaml (1)

27-34: Consider adding default or validation for MATCH_VALUE

While PATTERN is correctly marked as required, there's no default value or validation specified for the MATCH_VALUE environment variable. Consider adding a default value to make the configuration more robust.

  PATTERN: '<required>'
+ MATCH_VALUE: 'false'
tools/get-redpanda-version.js (2)

22-27: Consider consistent import style

The script uses dynamic import for @octokit/rest but static imports for other dependencies. While this might be intentional for ES module compatibility, consider using a consistent import style for better maintainability.

- const { Octokit } = await import('@octokit/rest');
+ const { Octokit } = require('@octokit/rest');

Or alternatively, convert all imports to dynamic:

- const GetLatestRedpandaVersion = require('../extensions/version-fetcher/get-latest-redpanda-version.js');
- const { getPrereleaseFromAntora } = require('../utils/beta-from-antora.js');
+ const GetLatestRedpandaVersion = (await import('../extensions/version-fetcher/get-latest-redpanda-version.js')).default;
+ const { getPrereleaseFromAntora } = await import('../utils/beta-from-antora.js');

41-53: Consider adding success logging and version validation

The script logs errors but not successful operations. Adding debug-level logging would improve troubleshooting. Additionally, a simple validation of the version string format would add robustness.

  // Output for downstream consumption
+ console.log(`// Using Redpanda ${version} from ${dockerRepo}`);
  console.log(`REDPANDA_VERSION=${version}`);
  console.log(`REDPANDA_DOCKER_REPO=${dockerRepo}`);
+ 
+ // Simple validation that we have a semantic version
+ if (!version.match(/^\d+\.\d+\.\d+(-rc\.\d+)?$/)) {
+   console.warn(`Warning: Version format ${version} looks unusual`);
+ }
tools/docusaurus-to-antora-conversion-scripts/post-process-asciidoc.js (1)

20-33: Add error handling for directory operations

The directory processing function lacks error handling for filesystem operations and would benefit from some logging.

 function processDirectory(directory) {
-  const files = fs.readdirSync(directory);
+  let files;
+  try {
+    files = fs.readdirSync(directory);
+  } catch (err) {
+    console.error(`Error reading directory ${directory}: ${err.message}`);
+    return;
+  }

   files.forEach((file) => {
     const filePath = path.join(directory, file);
-    const stat = fs.statSync(filePath);
+    let stat;
+    try {
+      stat = fs.statSync(filePath);
+    } catch (err) {
+      console.error(`Error stating file ${filePath}: ${err.message}`);
+      return;
+    }

     if (stat.isFile() && path.extname(file) === '.adoc') {
+      console.log(`Processing file: ${filePath}`);
       processFile(filePath);
     } else if (stat.isDirectory()) {
       processDirectory(filePath);
     }
   });
 }
utils/add-caret-external-links.py (3)

4-19: Improve regex pattern and link replacement handling

The regex pattern only matches HTTPS links and might miss edge cases. Also, the replacement function has duplicate logic that could be simplified.

 # Define the regular expression pattern to match the links
-pattern = r'(https://[^\]]+)\[([^\]]+)\](?!\^)'
+pattern = r'(https?://[^\]]+)\[([^\]^]+)\]'

 # Function to process a single file
 def process_file(file_path):
     with open(file_path, 'r', encoding='utf-8') as file:
         content = file.read()

     def replace_link(match):
         link = match.group(1)
         text = match.group(2)
-        if text.endswith('^'):
-            return match.group(0)  # No modification if caret is already present
-        else:
-            return f"{link}[{text}^]"
+        return f"{link}[{text}^]"

48-57: Improve directory processing with error handling

The directory processing function doesn't handle potential filesystem errors and could benefit from additional logging.

 # Function to process all .adoc files in a directory
 def process_directory(directory_path):
-    for root, _, files in os.walk(directory_path):
-        for file in files:
-            if file.endswith('.adoc'):
-                file_path = os.path.join(root, file)
-                relative_file_path = os.path.relpath(file_path, directory_path)
-                if relative_file_path not in exclusion_list:
-                    process_file(file_path)
-                    print(f"Processed: {file_path}")
+    try:
+        for root, _, files in os.walk(directory_path):
+            for file in files:
+                if file.endswith('.adoc'):
+                    file_path = os.path.join(root, file)
+                    try:
+                        relative_file_path = os.path.relpath(file_path, directory_path)
+                        if relative_file_path not in exclusion_list:
+                            success = process_file(file_path)
+                            if success:
+                                print(f"Processed: {file_path}")
+                            else:
+                                print(f"Failed to process: {file_path}")
+                    except Exception as e:
+                        print(f"Error processing path {file_path}: {str(e)}")
+    except Exception as e:
+        print(f"Error walking directory {directory_path}: {str(e)}")

58-60: Consider adding command-line arguments for improved flexibility

The script would benefit from command-line arguments to customize behavior, such as specifying the modules directory, adding a dry-run mode, or overriding the exclusion list.

+import argparse
+
+# Parse command-line arguments
+def parse_args():
+    parser = argparse.ArgumentParser(description='Add caret to external links in AsciiDoc files')
+    parser.add_argument('--modules-dir', default=os.path.join(script_directory, '..', 'modules'),
+                        help='Path to the modules directory')
+    parser.add_argument('--dry-run', action='store_true',
+                        help='Only print files that would be modified without changing them')
+    return parser.parse_args()
+
+# Get command-line arguments
+args = parse_args()
+
 # Call the function with the constructed directory path
-process_directory(directory_path)
+process_directory(args.modules_dir)
tools/get-console-version.js (3)

15-24: Consider a more robust approach for determining beta usage

The current implementation works well, but you might want to add a fallback or default value if getPrereleaseFromAntora() throws an exception.

  let useBeta = beta;
  if (fromAntora) {
-    useBeta = getPrereleaseFromAntora();
+    try {
+      useBeta = getPrereleaseFromAntora();
+    } catch (err) {
+      console.warn('Failed to determine beta status from Antora:', err.message);
+      console.warn('Falling back to provided beta flag:', beta);
+    }
  }

37-44: Consider handling error recovery

The current implementation exits the process on error, which might not be ideal when this module is imported by other code. Consider either returning an error object or throwing a custom error that can be caught by the caller.

  try {
    data = await GetLatestConsoleVersion(octokit, owner, repo);
  } catch (err) {
    console.error('Failed to fetch Console version:', err.message);
-    process.exit(1);
+    throw new Error(`Failed to fetch Console version: ${err.message}`);
  }

  if (!data) {
    console.error('No version data returned for Console');
-    process.exit(1);
+    throw new Error('No version data returned for Console');
  }

46-50: Add version format validation

Consider adding validation to ensure the version string matches expected format before outputting it.

  // Select the version
  const version = useBeta
    ? (data.latestBetaRelease || data.latestStableRelease)
    : data.latestStableRelease;

+  // Validate version format
+  if (!version || !/^\d+\.\d+\.\d+(-[a-z0-9.]+)?$/.test(version)) {
+    console.error(`Invalid version format: ${version}`);
+    throw new Error(`Invalid version format: ${version}`);
+  }
+
  console.log(`CONSOLE_VERSION=${version}`);
  console.log(`CONSOLE_DOCKER_REPO=${CONSOLE_DOCKER_REPO}`);
docker-compose/bootstrap.yml (1)

56-56: Missing newline at end of file

Add a newline at the end of the file to follow YAML best practices and eliminate the static analysis warning.

 # https://docs.redpanda.com/current/reference/properties/cluster-properties/#enable_host_metrics
-enable_host_metrics: true
\ No newline at end of file
+enable_host_metrics: true
+
🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 56-56: no new line character at the end of file

(new-line-at-end-of-file)

docker-compose/generate-profiles.yaml (1)

24-24: Use correct ISO-639 code for Japanese

"jp" is a country code. The standard language code is "ja".

-      let languages = ["en", "es", "fr", "de", "zh", "jp"]
+      let languages = ["en", "es", "fr", "de", "zh", "ja"]
utils/install-test-dependencies.sh (3)

35-38: Remove unused variable missing_deps

The variable is set but never read, generating SC2034.

-        missing_deps=1
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 37-37: missing_deps appears unused. Verify use (or export if used externally).

(SC2034)


98-100: ARCH is unused – delete to silence ShellCheck

-ARCH="$(uname -m)"
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 99-99: ARCH appears unused. Verify use (or export if used externally).

(SC2034)


146-149: Protect against missing ~/.bashrc

Sourcing a non-existent file trips SC1090 and can break non-bash shells.

-    echo 'export PATH=$HOME/.local/bin:$PATH' >> ~/.bashrc
-    source ~/.bashrc
+    echo 'export PATH=$HOME/.local/bin:$PATH' >> ~/.bashrc
+    [[ -f ~/.bashrc ]] && source ~/.bashrc
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 148-148: ShellCheck can't follow non-constant source. Use a directive to specify location.

(SC1090)

tools/property-extractor/parser.py (2)

37-38: MAX_INITIALIZER_LIST_DEPTH declared but never used

Delete or implement logic that enforces the depth limit.


199-205: Iterate directly over the dict – avoid .keys()

-    for key in header_properties.keys():
+    for key in header_properties:

Slightly faster and clearer; also silences the Ruff SIM118 warning.

🧰 Tools
🪛 Ruff (0.8.2)

199-199: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

tools/docusaurus-to-antora-conversion-scripts/convert-docs.sh (1)

24-24: Address shellcheck warning SC2155.

Multiple variable assignments in the script mask return values, which can hide errors from commands.

Split declaration and assignment for better error detection:

-  local content="$(cat "$mdx_file")"
+  local content
+  content="$(cat "$mdx_file")"

Apply this pattern to lines 27, 48, 50, 54, 57, 58, and 62 as well.

🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 24-24: Declare and assign separately to avoid masking return values.

(SC2155)

utils/generate-cluster-docs.sh (2)

16-21: Consider preserving explicitly provided Docker repo value.

The script overrides DOCKER_REPO when the tag is an RC, even if it was explicitly provided as an argument.

Consider preserving the user's explicit choice:

 # if it's an RC tag, switch Docker repo
 shopt -s nocasematch
-if [[ "$TAG" =~ rc[0-9]+ ]]; then
+# Only switch repo if it wasn't explicitly provided as an argument
+if [[ "$TAG" =~ rc[0-9]+ ]] && [[ "$3" == "" ]]; then
   DOCKER_REPO="redpanda-unstable"
 fi
 shopt -u nocasematch

49-53: Consider checking if Python is installed.

The script relies on Python but doesn't check if it's available on the system.

Add a check for Python:

+# Check if Python is available
+if ! command -v python3 &> /dev/null; then
+  echo "Error: Python 3 is required but not installed."
+  exit 1
+fi
+
 # Ensure Python venv (always create under utils/venv)
 "$SCRIPT_DIR"/python-venv.sh \
   "$SCRIPT_DIR"/venv \
   "$SCRIPT_DIR"/../tools/metrics/requirements.txt
tools/property-extractor/README.adoc (1)

9-12: Mention Node + npm as required prerequisites

Later instructions (make build, npm install, npx tree-sitter) rely on Node.js and npm, but they are not listed here. Without them, the build fails early with a cryptic error.

Add them to the bullet list so writers know to install them upfront.
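
A quick pre-flight check along these lines could also be added (tool names assumed from the steps the README already documents):

# Verify the prerequisites before running `make build`:
node --version    # required by the tree-sitter npm tooling
npm --version
python3 --version
make --version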

tools/docusaurus-to-antora-conversion-scripts/pre-process-markdown.js (3)

37-55: Tab regex is brittle – attribute order is assumed

<TabItem label="Foo" value="bar"> will not match because the current pattern requires value= to precede label=. A safer approach is to parse attributes independently or use a DOM parser.

Minimal regex fix:

-const tabItems = [...match.matchAll(/\s?<TabItem[^>]*value="([^"]+)"[^>]*label="([^"]+)"[^>]*>([\s\S]*?)<\/TabItem>/g)];
+const tabItems = [...match.matchAll(/\s?<TabItem[^>]*?(?:value="([^"]+)")?[^>]*?(?:label="([^"]+)")?[^>]*>([\s\S]*?)<\/TabItem>/g)];

…and then fall back to one attribute when the other is undefined.


47-48: Superfluous argument may confuse future readers

markdownToAsciidoc only accepts one parameter, but a second empty string is passed here. While harmless in JS, it hints at a forgotten feature or refactor.

-const asciidocContent = markdownToAsciidoc(content.trim(), '');
+const asciidocContent = markdownToAsciidoc(content.trim());

58-65: <details> conversion fails when tags are on the same line

The regex insists on a newline after both <details> and <summary>. Many docs contain one-liners like <details><summary>Title</summary>…. Widen the pattern:

-const detailsRegex = /<details>(?:\r?\n)<summary>([\s\S]*?)<\/summary>(?:\r?\n)([\s\S]*?)(?:\r?\n)<\/details>/g;
+const detailsRegex = /<details>\s*?<summary>([\s\S]*?)<\/summary>\s*?([\s\S]*?)\s*?<\/details>/g;
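
A quick sanity check of the widened pattern against a one-line example (hypothetical input, runnable with Node):

const detailsRegex = /<details>\s*?<summary>([\s\S]*?)<\/summary>\s*?([\s\S]*?)\s*?<\/details>/g;
const sample = '<details><summary>Title</summary>Body</details>';
for (const m of sample.matchAll(detailsRegex)) {
  console.log(m[1], '|', m[2]); // Title | Body
}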
tools/property-extractor/Makefile (2)

40-50: Use the venv’s python for pip to ensure isolation

Right now python3 -m venv … is correct, but subsequent pip install should be done via the same interpreter:

-  $(VENV)/bin/pip install --upgrade pip --quiet; \
-  $(VENV)/bin/pip install --no-cache-dir -r $<; \
+  $(PYTHON) -m pip install --upgrade pip --quiet; \
+  $(PYTHON) -m pip install --no-cache-dir -r $<; \

This avoids shebang / PATH mismatches on systems with multiple Python versions.


74-82: treesitter target assumes Node & npm – surface helpful error

If Node/NPM are missing the target fails deep inside npm install. Emit a user-friendly pre-check:

treesitter:
	@command -v node >/dev/null 2>&1 || { echo "❌ Node.js is required for tree-sitter"; exit 1; }
	@command -v npm  >/dev/null 2>&1 || { echo "❌ npm is required for tree-sitter"; exit 1; }
	@echo "🌲 Ensuring tree-sitter-cpp grammar…"
bin/doc-tools.js (1)

185-186: Avoid double-shell invocation for better portability

spawnSync('bash', [scriptPath, ...args], { shell: true }) spawns a shell
and explicitly asks for bash, which is redundant and breaks on systems
where bash is not in PATH (for example, Windows users running PowerShell).
Let Node pick the correct shell automatically:

-const result = spawnSync('bash', [scriptPath, ...args], { stdio: 'inherit', shell: true });
+const result = spawnSync(scriptPath, args, { stdio: 'inherit', shell: true });

The shebang (#!/usr/bin/env bash) in generate-cluster-docs.sh will take care of invoking Bash.

tools/metrics/metrics.py (1)

111-115: Repository root detection is fragile

repo_root = os.getcwd() assumes the caller’s working directory is the repo
root. doc-tools generate metrics-docs may invoke the script from anywhere,
leading to “autogenerated folder not found”.

-repo_root = os.getcwd()
+repo_root = os.path.abspath(os.path.join(os.path.dirname(__file__), "..", ".."))

This resolves the path relative to the script’s location and is immune to
cwd.

tools/property-extractor/property_extractor.py (1)

78-81: Combine nested if into a single condition

The double check can be simplified, improving readability and addressing Ruff
SIM102:

-        if any(path in implementation_value for path in file_paths):
-            if len(properties) > 0:
-                files_with_properties.append((fp, properties))
+        if properties and any(p in implementation_value for p in file_paths):
+            files_with_properties.append((fp, properties))
🧰 Tools
🪛 Ruff (0.8.2)

78-79: Use a single if statement instead of nested if statements

(SIM102)

tools/property-extractor/tests/transformers_test.py (1)

4-4: Replace wildcard import in tests

Even in test code, prefer explicit imports for correctness checking and IDE
support.

-from transformers import *
+from transformers import (
+    BasicInfoTransformer,
+    IsArrayTransformer,
+    NeedsRestartTransformer,
+    VisibilityTransformer,
+    TypeTransformer,
+    DeprecatedTransformer,
+    IsSecretTransformer,
+)
🧰 Tools
🪛 Ruff (0.8.2)

4-4: from transformers import * used; unable to detect undefined names

(F403)

tools/gen-rpk-ascii.py (1)

416-416: Remove stray semicolon

Ruff E703: the semicolon is unnecessary and can be dropped.

-    cmd_dict = build_dict(cmd_dict, executed_command, explanation, usage, it_flags, flag_list);
+    cmd_dict = build_dict(cmd_dict, executed_command, explanation, usage, it_flags, flag_list)
🧰 Tools
🪛 Ruff (0.8.2)

416-416: Statement ends with an unnecessary semicolon

Remove unnecessary semicolon

(E703)

tools/property-extractor/transformers.py (4)

100-112: vector<...> detection can never trigger after vector stripping

get_cpp_type_from_declaration() already removes the outer std::vector<…> wrapper (lines 95-97). Consequently, the regex vector<[^>]+string> here is unreachable.
Options:

  1. Detect the vector before stripping and immediately yield "string[]" (or "array" + items) – preferred.
  2. Remove this mapping entry to avoid dead code.

Leaving the rule intact is misleading and suggests coverage that does not exist.


75-78: Minor: simplify boolean conversion

-        property["visibility"] = re.sub(
-            r"^.*::", "", info["params"][2]["value"]["visibility"]
-        )
+        property["visibility"] = re.sub(r"^.*::", "", info["params"][2]["value"]["visibility"])

…and in IsSecretTransformer later:

-        property["is_secret"] = is_secret == "yes"
+        property["is_secret"] = (is_secret == "yes")

While not functionally critical, applying bool(...) (as Ruff suggests) or an explicit comparison improves readability and satisfies the linter.


6-18: Avoid shadowing the built-in property name

Throughout the module every parse() signature uses a parameter named property, shadowing the Python built-in property() decorator. Although not harmful at runtime, it can confuse IDEs, linters and developers.

Consider a universal rename (e.g. prop or bag) across all transformers:

-    def parse(self, property, info, file_pair):
-        property["name"] = ...
+    def parse(self, prop, info, file_pair):
+        prop["name"] = ...

This change is mechanical and can be executed safely with a global search-and-replace.


1-3: Package-relative imports improve resilience

If tools/property_extractor is turned into a proper Python package, absolute file-system imports can break. Switching to explicit relative imports keeps the module usable both as a script and as part of a package:

-from property_bag import PropertyBag
-from parser import normalize_string
+from .property_bag import PropertyBag
+from .parser import normalize_string

This is a non-breaking change today and future-proofs the codebase.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5d44bae and 4588b07.

⛔ Files ignored due to path filters (3)
  • docker-compose/transform/go.sum is excluded by !**/*.sum
  • docker-compose/transform/regex.wasm is excluded by !**/*.wasm
  • package-lock.json is excluded by !**/package-lock.json
📒 Files selected for processing (40)
  • .gitignore (1 hunks)
  • .npmignore (1 hunks)
  • bin/doc-tools.js (1 hunks)
  • docker-compose/bootstrap.yml (1 hunks)
  • docker-compose/docker-compose.yml (1 hunks)
  • docker-compose/generate-profiles.yaml (1 hunks)
  • docker-compose/rpk-profile.yaml (1 hunks)
  • docker-compose/transactions-schema.json (1 hunks)
  • docker-compose/transactions.md (1 hunks)
  • docker-compose/transform/README.adoc (1 hunks)
  • docker-compose/transform/go.mod (1 hunks)
  • docker-compose/transform/transform.go (1 hunks)
  • docker-compose/transform/transform.yaml (1 hunks)
  • package.json (4 hunks)
  • tools/docusaurus-to-antora-conversion-scripts/convert-docs.sh (1 hunks)
  • tools/docusaurus-to-antora-conversion-scripts/get-file-changes.sh (1 hunks)
  • tools/docusaurus-to-antora-conversion-scripts/post-process-asciidoc.js (1 hunks)
  • tools/docusaurus-to-antora-conversion-scripts/pre-process-markdown.js (1 hunks)
  • tools/gen-rpk-ascii.py (1 hunks)
  • tools/get-console-version.js (1 hunks)
  • tools/get-redpanda-version.js (1 hunks)
  • tools/metrics/metrics.py (1 hunks)
  • tools/metrics/requirements.txt (1 hunks)
  • tools/property-extractor/Makefile (1 hunks)
  • tools/property-extractor/README.adoc (1 hunks)
  • tools/property-extractor/definitions.json (1 hunks)
  • tools/property-extractor/file_pair.py (1 hunks)
  • tools/property-extractor/json-to-asciidoc/generate_docs.py (1 hunks)
  • tools/property-extractor/parser.py (1 hunks)
  • tools/property-extractor/property_bag.py (1 hunks)
  • tools/property-extractor/property_extractor.py (1 hunks)
  • tools/property-extractor/requirements.txt (1 hunks)
  • tools/property-extractor/tests/transformers_test.py (1 hunks)
  • tools/property-extractor/transformers.py (1 hunks)
  • utils/add-caret-external-links.py (1 hunks)
  • utils/beta-from-antora.js (1 hunks)
  • utils/generate-cluster-docs.sh (1 hunks)
  • utils/install-test-dependencies.sh (1 hunks)
  • utils/python-venv.sh (1 hunks)
  • utils/start-cluster.sh (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (7)
utils/beta-from-antora.js (2)
tools/get-console-version.js (4)
  • fs (4-4)
  • require (7-7)
  • path (5-5)
  • yaml (3-3)
tools/get-redpanda-version.js (1)
  • require (4-4)
tools/get-console-version.js (2)
utils/beta-from-antora.js (3)
  • yaml (3-3)
  • fs (1-1)
  • path (2-2)
tools/get-redpanda-version.js (7)
  • require (4-4)
  • owner (13-13)
  • repo (14-14)
  • useBeta (17-17)
  • octokit (24-26)
  • data (29-29)
  • version (45-45)
tools/property-extractor/parser.py (3)
tools/property-extractor/file_pair.py (1)
  • FilePair (1-7)
tools/property-extractor/property_bag.py (1)
  • PropertyBag (1-4)
tools/property-extractor/transformers.py (16)
  • parse (10-17)
  • parse (24-35)
  • parse (45-50)
  • parse (57-63)
  • parse (74-77)
  • parse (120-122)
  • parse (133-135)
  • parse (146-148)
  • parse (159-175)
  • parse (185-203)
  • parse (211-256)
  • parse (272-305)
  • parse (312-313)
  • parse (326-334)
  • parse (343-348)
  • parse (364-382)
bin/doc-tools.js (3)
tools/get-console-version.js (2)
  • require (7-7)
  • path (5-5)
tools/get-redpanda-version.js (1)
  • require (4-4)
utils/beta-from-antora.js (1)
  • path (2-2)
tools/docusaurus-to-antora-conversion-scripts/convert-docs.sh (2)
tools/docusaurus-to-antora-conversion-scripts/pre-process-markdown.js (2)
  • command (25-25)
  • pandoc (3-3)
util/compute-out.js (1)
  • dirname (8-8)
tools/property-extractor/property_extractor.py (4)
tools/property-extractor/file_pair.py (1)
  • FilePair (1-7)
tools/property-extractor/parser.py (2)
  • build_treesitter_cpp_library (228-229)
  • extract_properties_from_file_pair (208-225)
tools/property-extractor/property_bag.py (1)
  • PropertyBag (1-4)
tools/property-extractor/transformers.py (48)
  • TypeTransformer (80-122)
  • EnterpriseTransformer (337-348)
  • MetaParamTransformer (351-382)
  • BasicInfoTransformer (6-17)
  • IsNullableTransformer (20-35)
  • IsArrayTransformer (38-50)
  • NeedsRestartTransformer (53-63)
  • VisibilityTransformer (66-77)
  • DeprecatedTransformer (125-135)
  • IsSecretTransformer (138-148)
  • NumericBoundsTransformer (151-175)
  • DurationBoundsTransformer (178-203)
  • SimpleDefaultValuesTransformer (206-256)
  • FriendlyDefaultTransformer (259-305)
  • ExperimentalTransformer (308-313)
  • AliasTransformer (316-334)
  • accepts (7-8)
  • accepts (21-22)
  • accepts (42-43)
  • accepts (54-55)
  • accepts (67-72)
  • accepts (81-82)
  • accepts (126-131)
  • accepts (139-144)
  • accepts (155-157)
  • accepts (182-183)
  • accepts (207-209)
  • accepts (269-270)
  • accepts (309-311)
  • accepts (317-324)
  • accepts (338-341)
  • accepts (352-362)
  • parse (10-17)
  • parse (24-35)
  • parse (45-50)
  • parse (57-63)
  • parse (74-77)
  • parse (120-122)
  • parse (133-135)
  • parse (146-148)
  • parse (159-175)
  • parse (185-203)
  • parse (211-256)
  • parse (272-305)
  • parse (312-313)
  • parse (326-334)
  • parse (343-348)
  • parse (364-382)
tools/property-extractor/transformers.py (2)
tools/property-extractor/property_bag.py (1)
  • PropertyBag (1-4)
tools/property-extractor/parser.py (1)
  • normalize_string (92-93)
🪛 YAMLlint (1.35.1)
docker-compose/bootstrap.yml

[error] 56-56: no new line character at the end of file

(new-line-at-end-of-file)

🪛 Ruff (0.8.2)
tools/property-extractor/parser.py

9-9: sys imported but unused

Remove unused import: sys

(F401)


199-199: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

tools/metrics/metrics.py

7-7: glob imported but unused

Remove unused import: glob

(F401)

tools/property-extractor/property_extractor.py

14-14: from transformers import * used; unable to detect undefined names

(F403)


78-79: Use a single if statement instead of nested if statements

(SIM102)


88-88: TypeTransformer may be undefined, or defined from star imports

(F405)


90-90: EnterpriseTransformer may be undefined, or defined from star imports

(F405)


91-91: MetaParamTransformer may be undefined, or defined from star imports

(F405)


92-92: BasicInfoTransformer may be undefined, or defined from star imports

(F405)


94-94: IsNullableTransformer may be undefined, or defined from star imports

(F405)


95-95: IsArrayTransformer may be undefined, or defined from star imports

(F405)


96-96: NeedsRestartTransformer may be undefined, or defined from star imports

(F405)


97-97: VisibilityTransformer may be undefined, or defined from star imports

(F405)


98-98: DeprecatedTransformer may be undefined, or defined from star imports

(F405)


99-99: IsSecretTransformer may be undefined, or defined from star imports

(F405)


100-100: NumericBoundsTransformer may be undefined, or defined from star imports

(F405)


101-101: DurationBoundsTransformer may be undefined, or defined from star imports

(F405)


102-102: SimpleDefaultValuesTransformer may be undefined, or defined from star imports

(F405)


103-103: FriendlyDefaultTransformer may be undefined, or defined from star imports

(F405)


104-104: ExperimentalTransformer may be undefined, or defined from star imports

(F405)


105-105: AliasTransformer may be undefined, or defined from star imports

(F405)

tools/property-extractor/tests/transformers_test.py

4-4: from transformers import * used; unable to detect undefined names

(F403)


29-29: BasicInfoTransformer may be undefined, or defined from star imports

(F405)


46-46: IsArrayTransformer may be undefined, or defined from star imports

(F405)


46-46: TypeTransformer may be undefined, or defined from star imports

(F405)


86-86: NeedsRestartTransformer may be undefined, or defined from star imports

(F405)


149-149: VisibilityTransformer may be undefined, or defined from star imports

(F405)


197-197: TypeTransformer may be undefined, or defined from star imports

(F405)


252-252: DeprecatedTransformer may be undefined, or defined from star imports

(F405)


316-316: IsSecretTransformer may be undefined, or defined from star imports

(F405)

tools/property-extractor/transformers.py

69-71: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)


141-143: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)


226-226: Remove unnecessary True if ... else False

Remove unnecessary True if ... else False

(SIM210)


339-341: Return the condition directly

Inline condition

(SIM103)

tools/gen-rpk-ascii.py

142-142: Using .strip() with multi-character strings is misleading

(B005)


369-369: Using .strip() with multi-character strings is misleading

(B005)


393-393: Use enumerate() for index variable quantity in for loop

(SIM113)


406-406: Using .strip() with multi-character strings is misleading

(B005)


416-416: Statement ends with an unnecessary semicolon

Remove unnecessary semicolon

(E703)

🪛 Shellcheck (0.10.0)
utils/install-test-dependencies.sh

[warning] 37-37: missing_deps appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 99-99: ARCH appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 148-148: ShellCheck can't follow non-constant source. Use a directive to specify location.

(SC1090)

tools/docusaurus-to-antora-conversion-scripts/convert-docs.sh

[warning] 24-24: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 27-27: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 48-48: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 50-50: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 54-54: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 57-57: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 58-58: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 62-62: Declare and assign separately to avoid masking return values.

(SC2155)

🔇 Additional comments (35)
tools/property-extractor/requirements.txt (1)

1-2: Dependencies and versions look appropriate
Pinning tree_sitter==0.21.1 ensures compatibility with the C++ grammar APIs, and setuptools>=42.0.0 satisfies packaging requirements.

docker-compose/transform/go.mod (1)

5-5:

✅ Verification successful

Ensure go.sum captures checksums for supply-chain integrity
After updating go.mod, run go mod tidy and check in any changes to go.sum.


🏁 Script executed:

#!/bin/bash
go mod tidy
git diff --exit-code go.sum

Length of output: 324


🏁 Script executed:

#!/bin/bash
# Verify go.mod and go.sum in the transform module directory
cd docker-compose/transform
go mod tidy
git diff --exit-code go.sum

Length of output: 161


No go.sum updates required
Running go mod tidy in docker-compose/transform produced no changes to go.sum, so all checksums (including for github.com/redpanda-data/redpanda/src/transform-sdk/go/transform v1.1.0) are already present.

.npmignore (1)

4-9: LGTM! Properly updated ignore patterns.

These additions to .npmignore appropriately exclude build artifacts, temporary files, tests, and Python cache directories from the npm package, which is good practice for keeping the package clean and minimal.

utils/beta-from-antora.js (1)

1-27: Well-structured and cleanly implemented utility function

The getPrereleaseFromAntora() function is well-designed with good error handling and a clear purpose. It appropriately handles cases where the antora.yml file doesn't exist or can't be parsed.

docker-compose/transactions-schema.json (1)

1-37: Well-structured JSON schema with clear property definitions

The JSON schema is well-formed and provides clear validation rules for a "Transactions" object, with appropriate formatting requirements and descriptive metadata for each field.

docker-compose/rpk-profile.yaml (1)

1-24: Well-documented configuration file with comprehensive setup

The configuration is well-structured with detailed comments explaining each section's purpose. It properly sets up both Kafka API and Admin API connections for a three-broker Redpanda cluster.

utils/python-venv.sh (1)

1-20: Effective virtual environment setup script with good defaults

The script provides a clean, reusable way to create Python virtual environments with proper error handling using set -euo pipefail. The default values and parameter handling are well implemented.

docker-compose/transform/transform.yaml (3)

1-7: Well-structured metadata file with comprehensive comments

The file starts with clear and informative comments explaining the purpose and structure of this transform metadata file, which helps users understand how the file is used by the rpk transform build command.


8-20: Clear transform definition with detailed documentation

The transform definition includes a clear name, comprehensive description, and well-documented environment variables. The reference to RE2 syntax with a link to the documentation is particularly helpful for users implementing regex patterns.


21-26: Good design choice leaving topics configurable

Leaving input and output topics empty by default and mentioning they can be set during deployment provides flexibility for users when implementing this transform.

tools/get-redpanda-version.js (2)

1-21: Well-structured module with clear documentation

The script has proper shebang, imports, and JSDoc comments that clearly document the function parameters and purpose.


28-40: Robust error handling for API requests

The script properly handles errors when fetching version data and exits with an appropriate error code. This is a good practice for CLI tools.

tools/docusaurus-to-antora-conversion-scripts/post-process-asciidoc.js (1)

34-51: Robust command-line argument validation

The input validation is well-implemented, checking for missing arguments and validating that the path exists and is either a file or directory.

package.json (5)

3-3: Version bump looks appropriate for the new functionality

The increment from 4.3.0 to 4.4.0 follows semantic versioning for feature additions.


15-17: LGTM! Great addition of CLI binary

The new doc-tools CLI entry point aligns perfectly with the PR objective of providing a consistent set of tools.
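
For context, the bin entry presumably looks something like this (path taken from the files list above; the exact JSON may differ):

{
  "bin": {
    "doc-tools": "./bin/doc-tools.js"
  }
}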


19-21: Well-structured npm script commands

These scripts provide convenient access to the CLI functionality through npm, following Node.js best practices.


65-69: Good inclusion of new directories in the "files" array

Ensuring these directories are included in the published package is essential for the CLI to function properly.


97-98: Dependencies look appropriate

The addition of tree-sitter (for parsing) and updated yaml dependency support the new functionality while using the caret prefix for compatibility with minor version updates.

tools/get-console-version.js (2)

1-14: Well-structured script with good documentation

The script is properly set up as an executable with a shebang line and includes clear JSDoc comments.


18-18:

✅ Verification successful

Verify Docker repository name

Ensure that setting CONSOLE_DOCKER_REPO to the value of repo ('console') is the correct Docker repository name for the Console software.


🏁 Script executed:

#!/bin/bash
# Verify the Docker repository name for Console

# Check if there are any references to a different Console Docker repo name in the codebase
echo "Searching for Console Docker repository references:"
rg -i "console.*(docker|image|container)" --type yaml --type js

Length of output: 1628


Console Docker repository name confirmed

All occurrences use “console” as the default Docker repository name—no change needed:

  • docker-compose/docker-compose.yml: ${CONSOLE_DOCKER_REPO:-console}
  • bin/doc-tools.js: consoleDockerRepo: 'console'
  • preview/extensions-and-macros/modules/ROOT/attachments/test.yaml: console-image: …/console:v…
docker-compose/bootstrap.yml (3)

1-8: Well-documented configuration file

Excellent header with clear explanation of the file's purpose and link to documentation.


10-22: Good authentication and topic configuration

The security settings (SASL, superusers) and topic auto-creation are well-documented with links to references.
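
For readers who have not opened the file, a bootstrap configuration of this kind typically combines properties like the following (property names from the Redpanda docs; the PR's actual values may differ):

# Illustrative fragment only -- see docker-compose/bootstrap.yml for the real file.
enable_sasl: true
superusers:
  - superuser
auto_create_topics_enabled: true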


23-56: Comprehensive enterprise feature configuration

The configuration for data transforms, audit logging, tiered storage, and metrics collection is thorough and well-documented.

🧰 Tools
🪛 YAMLlint (1.35.1)

[error] 56-56: no new line character at the end of file

(new-line-at-end-of-file)

docker-compose/transform/README.adoc (5)

1-6: Clear introduction and purpose explanation

The README begins with a clear explanation of what the directory contains and why someone might want to customize the transform.


8-13: Well-documented motivations for customization

Good job explaining the potential reasons someone might want to customize the transform.


14-28: Clear prerequisites section

The prerequisites section is thorough and includes commands to verify the requirements.


29-67: Comprehensive step-by-step instructions

The instructions for modifying, building, deploying, and testing the transform are clear and include example commands.


68-74: Helpful suggested reading section

The references to additional resources are relevant and well-formatted.

utils/start-cluster.sh (1)

43-46: ⚠️ Potential issue

set -euo pipefail + grep can exit unexpectedly – guard the probe

If no containers are running, grep -q Up exits 1, causing the whole script
to abort. Redirect the grep status so the check is non-fatal.

-if docker compose ps | grep -q Up; then
+if docker compose ps | grep -q Up || true; then

An alternative is docker compose ps --format '{{.State}}' | grep -q Up which
is cleaner, but the guard above is the minimal change.

Likely an incorrect or invalid review comment.

docker-compose/transform/transform.go (3)

44-51: LGTM! Concise boolean parser implementation.

The isTrueVar function provides a clean way to handle various string representations of boolean values.


57-84: Good initialization setup with proper error handling.

The main() function properly initializes the transform by:

  • Setting up logging with appropriate flags
  • Checking for required environment variables
  • Clearly logging configuration values
  • Properly compiling the regex pattern
  • Registering the transform handler

The error handling for the required PATTERN environment variable is appropriate.


89-122: Well-structured record processing with good logging.

The doRegexFilter function effectively processes each record by:

  • Determining whether to check key or value based on configuration
  • Handling null key/value cases
  • Logging detailed information about the matching process
  • Correctly passing or dropping records based on the regex match

The code is easy to follow and has good debug logging at each step.

utils/generate-cluster-docs.sh (1)

1-3: Good script setup with error handling.

The script uses appropriate shebang and error handling flags (set -euo pipefail), which ensure that errors are caught early and the script exits on any failure.

tools/docusaurus-to-antora-conversion-scripts/pre-process-markdown.js (1)

74-85: Risk of overlapping replacements when multiple tables exist

Because replace is executed inside the loop, index offsets shift and subsequent htmlTableMatch values can become stale, potentially skipping tables. Build the replacements first, then apply them once, or process with a streaming parser.

For a quick fix, rebuild the regex each iteration and call newContent = newContent.replace(htmlTableMatch, …) only on the slice returned by match() to mitigate offset drift.

tools/property-extractor/Makefile (1)

52-55: make clean may wipe artefacts generated by other tooling

Removing the whole $(REPO_ROOT)/autogenerated tree might delete unrelated files produced by CI or other docs tools. Consider restricting the path to the sub-folder this Makefile owns:

-	rm -rf $(VENV) $(TMP_ROOT) $(REPO_ROOT)/autogenerated
+	rm -rf $(VENV) $(TMP_ROOT) $(OUTPUT_DIR)

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

♻️ Duplicate comments (1)
tools/docusaurus-to-antora-conversion-scripts/convert-docs.sh (1)

84-84: Fix path to post-processing script.

Similar to the pre-processing script, the path to the post-processing script should be relative to the current script location.

-  node post-process-asciidoc.js "$output_file"
+  node "$(dirname "$0")/post-process-asciidoc.js" "$output_file"
🧹 Nitpick comments (3)
utils/add-caret-external-links.py (2)

43-44: Remove duplicate comment line.

There's a duplicate comment line here that should be removed.

 # List of excluded file paths (relative paths)
-# List of excluded file paths (relative paths)
 exclusion_list = [

1-5: Add script documentation.

Consider adding a docstring at the beginning of the script to explain its purpose, usage, and behavior.

 import os
 import re
 
+"""
+This script recursively processes all .adoc files in the modules directory,
+adding a caret (^) to external links that don't already have one.
+
+Usage: python add-caret-external-links.py
+"""
+
 # Define the regular expression pattern to match the links
 pattern = r'(https://[^\]]+)\[([^\]]+)\](?!\^)'
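
As a quick sanity check of this pattern, a minimal Python snippet (the replacement template \1[\2^] is an assumption about how the script appends the caret):

import re

pattern = r'(https://[^\]]+)\[([^\]]+)\](?!\^)'
text = "See https://docs.redpanda.com[Redpanda Docs] for details."

# Append a caret to the link text of external links that lack one.
print(re.sub(pattern, r'\1[\2^]', text))
# See https://docs.redpanda.com[Redpanda Docs^] for details.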
tools/docusaurus-to-antora-conversion-scripts/convert-docs.sh (1)

19-19: Correct the script name in usage message.

The usage message refers to "your_script.sh" instead of the actual script name. Replace it with the basename of the script for clarity.

-  echo "Usage: ./your_script.sh /path/to/your/source_directory"
+  echo "Usage: ./$(basename "$0") /path/to/your/source_directory"
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4588b07 and c839f7b.

📒 Files selected for processing (7)
  • bin/doc-tools.js (1 hunks)
  • docker-compose/transactions-schema.json (1 hunks)
  • tools/docusaurus-to-antora-conversion-scripts/convert-docs.sh (1 hunks)
  • tools/docusaurus-to-antora-conversion-scripts/get-file-changes.sh (1 hunks)
  • tools/docusaurus-to-antora-conversion-scripts/post-process-asciidoc.js (1 hunks)
  • tools/docusaurus-to-antora-conversion-scripts/pre-process-markdown.js (1 hunks)
  • utils/add-caret-external-links.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tools/docusaurus-to-antora-conversion-scripts/get-file-changes.sh
🚧 Files skipped from review as they are similar to previous changes (4)
  • tools/docusaurus-to-antora-conversion-scripts/post-process-asciidoc.js
  • docker-compose/transactions-schema.json
  • tools/docusaurus-to-antora-conversion-scripts/pre-process-markdown.js
  • bin/doc-tools.js
🧰 Additional context used
🪛 Shellcheck (0.10.0)
tools/docusaurus-to-antora-conversion-scripts/convert-docs.sh

[warning] 30-30: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 33-33: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 54-54: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 56-56: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 60-60: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 63-63: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 64-64: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 68-68: Declare and assign separately to avoid masking return values.

(SC2155)

🪛 Ruff (0.8.2)
utils/add-caret-external-links.py

28-28: SyntaxError: Expected an identifier, but found a keyword 'with' that cannot be used here


28-28: SyntaxError: Simple statements must be separated by newlines or semicolons


28-28: SyntaxError: Expected a statement


28-29: SyntaxError: Expected an expression


30-30: SyntaxError: Invalid annotated assignment target


30-30: SyntaxError: Expected an identifier, but found a keyword 'try' that cannot be used here


30-31: SyntaxError: Expected an expression


31-31: SyntaxError: Expected an identifier, but found a keyword 'with' that cannot be used here


31-31: SyntaxError: Simple statements must be separated by newlines or semicolons


31-31: SyntaxError: Expected a statement


31-32: SyntaxError: Expected an expression


33-33: SyntaxError: Expected an identifier, but found a keyword 'except' that cannot be used here


33-33: SyntaxError: Simple statements must be separated by newlines or semicolons


33-33: SyntaxError: Expected a statement


33-34: SyntaxError: Expected an expression


35-35: SyntaxError: Expected an identifier, but found a keyword 'return' that cannot be used here


35-35: SyntaxError: Simple statements must be separated by newlines or semicolons


36-36: SyntaxError: Expected an identifier, but found a keyword 'return' that cannot be used here


36-36: SyntaxError: Simple statements must be separated by newlines or semicolons

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Redirect rules - docs-extensions-and-macros
  • GitHub Check: Header rules - docs-extensions-and-macros
  • GitHub Check: Pages changed - docs-extensions-and-macros

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 13

♻️ Duplicate comments (7)
tools/metrics/metrics.py (1)

4-7: ⚠️ Potential issue

Remove unused import

The re module is imported at the file level but then redefined at line 43 inside the parse_metrics function.

import os
import sys
import requests
-import re
import json
import logging
🧰 Tools
🪛 Ruff (0.8.2)

4-4: re imported but unused

Remove unused import: re

(F401)

tools/gen-rpk-ascii.py (4)

142-142: ⚠️ Potential issue

Fix misleading use of lstrip

Using lstrip with a multi-character string doesn't remove the entire prefix, but instead removes any character that appears in the string.

-    return commands.lstrip("Available Commands:")
+    prefix = "Available Commands:"
+    return commands[len(prefix):] if commands.startswith(prefix) else commands
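
A quick illustration of the character-set semantics (the command name is chosen to show the failure):

# lstrip() strips any leading characters found in the argument *set*,
# not the literal prefix, so the 'a' of 'acl' is stripped too:
print("Available Commands: acl".lstrip("Available Commands:"))  # cl

# The prefix-safe slice keeps the command intact:
s, prefix = "Available Commands: acl", "Available Commands:"
print(s[len(prefix):] if s.startswith(prefix) else s)  # ' acl'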
🧰 Tools
🪛 Ruff (0.8.2)

142-142: Using .strip() with multi-character strings is misleading

(B005)


369-370: ⚠️ Potential issue

Fix misleading use of lstrip

Same issue with lstrip as before, which could lead to unexpected behavior.

-available_commmands = commands.lstrip("Available Commands:")
+prefix = "Available Commands:"
+available_commmands = commands[len(prefix):] if commands.startswith(prefix) else commands
it_commands = extract_new_commands(available_commmands, False)
🧰 Tools
🪛 Ruff (0.8.2)

369-369: Using .strip() with multi-character strings is misleading

(B005)


410-411: ⚠️ Potential issue

Fix misleading use of lstrip

Same issue with lstrip again.

-    available_flags = flags.lstrip("Flags:")
+    prefix = "Flags:"
+    available_flags = flags[len(prefix):] if flags.startswith(prefix) else flags

406-407: ⚠️ Potential issue

Fix misleading use of lstrip

Another instance of the same lstrip issue.

-    available_commmands = commands.lstrip("Available Commands:")
+    prefix = "Available Commands:"
+    available_commmands = commands[len(prefix):] if commands.startswith(prefix) else commands
    new_commands = extract_new_commands(available_commmands, False)
🧰 Tools
🪛 Ruff (0.8.2)

406-406: Using .strip() with multi-character strings is misleading

(B005)

tools/property-extractor/transformers.py (2)

227-231: ⚠️ Potential issue

Fix regex pattern for string lists

The current regex pattern for string lists leaves the closing brace unanchored, so it also accepts malformed or unrelated brace literals; anchoring with \}$ makes the match precise.

-        elif re.search("^{[^:]+$", default):  # string lists
+        elif re.search(r"^\{[^:]+\}$", default):  # string lists

221-224: ⚠️ Potential issue

Fix incorrect numeric parsing & regex errors

The current implementation has two issues:

  1. str.replace() is used with regex patterns, but it treats the pattern as a literal substring, so the numeric cleanup never happens
  2. The decimal group in the float regex is mandatory rather than optional ((\.[0-9]+) should be (\.[0-9]+)?)
-        elif re.search("^-?[0-9][0-9']*$", default):  # integers
-            property["default"] = int(default.replace("[^0-9-]", ""))
-        elif re.search(r"^-?[0-9]+(\.[0-9]+)$", default):  # floats
-            property["default"] = float(default.replace("[^0-9]", ""))
+        elif re.search(r"^-?[0-9][0-9']*$", default):  # integers
+            property["default"] = int(re.sub(r"[^0-9-]", "", default))
+        elif re.search(r"^-?[0-9]+(\.[0-9]+)?$", default):  # floats
+            property["default"] = float(re.sub(r"[^0-9.\-]", "", default))
🧹 Nitpick comments (21)
cli-utils/add-caret-external-links.py (2)

44-45: Remove duplicated comment.

There's a duplicated comment "# List of excluded file paths (relative paths)" on consecutive lines.

# List of excluded file paths (relative paths)
-# List of excluded file paths (relative paths)

65-66: Add Python main guard pattern.

Add a main guard to follow Python best practices and make the script more reusable.

# Call the function with the constructed directory path
-process_directory(directory_path)
+if __name__ == "__main__":
+    process_directory(directory_path)
cli-utils/install-test-dependencies.sh (2)

37-37: Remove unused variable.

The missing_deps variable is defined but never used elsewhere in the script.

-        missing_deps=1
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 37-37: missing_deps appears unused. Verify use (or export if used externally).

(SC2034)


99-99: Remove unused variable.

The ARCH variable is defined but never used in the script.

-ARCH="$(uname -m)"
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 99-99: ARCH appears unused. Verify use (or export if used externally).

(SC2034)

cli-utils/start-cluster.sh (2)

19-21: Improve version extraction robustness.

The current sed pattern might not handle all possible tag formats. Consider a more robust approach.

if [[ "$TAG" != "latest" ]]; then
-  MAJOR_MINOR="$(echo "$TAG" | sed -E 's/^v?([0-9]+\.[0-9]+).*$/\1/')"
+  # Extract major.minor from version tags like v23.1.2, 23.1.2-rc1, etc.
+  if [[ "$TAG" =~ ^v?([0-9]+\.[0-9]+) ]]; then
+    MAJOR_MINOR="${BASH_REMATCH[1]}"
+  else
+    echo "⚠️ Warning: Unexpected version format: $TAG. Using 'latest' instead."
+    MAJOR_MINOR="latest"
+  fi
fi

49-49: Ensure compatibility with older Docker versions.

The script uses docker compose (without hyphen) which requires Docker Compose V2. For better compatibility, consider checking if V2 is available and fall back to V1 if needed.

echo "▶️  Starting Redpanda cluster…"
-docker compose up -d
+# Check if docker compose V2 is available, otherwise use docker-compose V1
+if docker compose version >/dev/null 2>&1; then
+  docker compose up -d
+else
+  docker-compose up -d
+fi
cli-utils/generate-cluster-docs.sh (1)

68-68: Ensure Docker Compose compatibility with older versions.

As with start-cluster.sh, ensure compatibility with both Docker Compose V1 and V2.

cd "$SCRIPT_DIR"/../docker-compose
-docker compose down --volumes
+# Check if docker compose V2 is available, otherwise use docker-compose V1
+if docker compose version >/dev/null 2>&1; then
+  docker compose down --volumes
+else
+  docker-compose down --volumes
+fi
tools/property-extractor/parser.py (1)

194-194: Optimize dictionary key lookup

The current code uses header_properties.keys() to check key presence, but Python dictionaries support direct key lookup which is more efficient.

-    for key in header_properties.keys():
+    for key in header_properties:
🧰 Tools
🪛 Ruff (0.8.2)

194-194: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

tools/metrics/metrics.py (3)

43-44: Redundant imports inside function

The re and logging modules are already imported at the file level, so they don't need to be reimported inside the parse_metrics function.

def parse_metrics(metrics_text):
    """
    Parse Prometheus exposition text into a structured dict of:
      metric_name → { description, type, labels: [<label_keys>] }
    Robust against any ordering of # HELP/# TYPE and handles samples
    both with and without labels.
    """
-    import re
-    import logging
🧰 Tools
🪛 Ruff (0.8.2)

43-43: Redefinition of unused re from line 4

Remove definition: re

(F811)


37-103: Add support for Prometheus UNIT metadata

The current metrics parser doesn't handle Prometheus UNIT metadata lines, which provide additional unit information for metrics.

Extend the parser to support UNIT metadata lines by adding the following (a regex sanity check follows the list):

  1. Update the metric dictionary structure to include a "unit" field:
        metrics[name] = {
            "description": desc,
            "type": t,
-           "labels": labels
+           "labels": labels,
+           "unit": None
        }
  2. Add UNIT metadata collection in the metadata extraction loop:
        elif line.startswith("# TYPE"):
            m = re.match(r"# TYPE\s+(\S+)\s+(\S+)", line)
            if m:
                name, t = m.group(1), m.group(2)
                meta.setdefault(name, {})['type'] = t
+       elif line.startswith("# UNIT"):
+           m = re.match(r"# UNIT\s+(\S+)\s+(.+)", line)
+           if m:
+               name, unit = m.group(1), m.group(2)
+               meta.setdefault(name, {})['unit'] = unit
  3. Include the unit in the final structure:
        desc = meta.get(name, {}).get("description")
        t    = meta.get(name, {}).get("type")
+       unit = meta.get(name, {}).get("unit")
        labels = sorted(label_map.get(name, []))

        # ... existing code ...

        metrics[name] = {
            "description": desc,
            "type": t,
+           "unit": unit,
            "labels": labels
        }
  4. Update the AsciiDoc output to include unit information:
        f.write(f"*Type*: {data['type']}")
+       if data.get("unit"):
+           f.write(f"\n\n*Unit*: {data['unit']}")
        if data["labels"]:
            # ... existing code ...
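
A minimal check that the proposed # UNIT regex captures the name and unit as intended (the metric name is hypothetical):

import re

line = "# UNIT redpanda_uptime_seconds seconds"  # hypothetical exposition line
m = re.match(r"# UNIT\s+(\S+)\s+(.+)", line)
if m:
    print(m.group(1), "->", m.group(2))  # redpanda_uptime_seconds -> seconds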
🧰 Tools
🪛 Ruff (0.8.2)

43-43: Redefinition of unused re from line 4

Remove definition: re

(F811)


149-153: Improve error message when metrics endpoint is unavailable

The current error message is generic and doesn't provide information about possible causes or troubleshooting steps.

    if not metrics_text:
-        logging.error("No public metrics retrieved. Exiting.")
+        logging.error("No public metrics retrieved. Exiting. Check if Redpanda is running and the metrics endpoint is accessible at http://localhost:19644/public_metrics/")
        sys.exit(1)
tools/gen-rpk-ascii.py (2)

393-428: Improve command processing loop and progress reporting

The current implementation uses a manual counter and modulo operation for progress reporting. Using enumerate would be more Pythonic, and the progress reporting could be more informative.

-quantity =0
-
-for command in it_commands:
-    quantity+=1
+total_commands = len(it_commands)
+for i, command in enumerate(it_commands, 1):
     
     result = execute_process([command])
     executed_command = "rpk " + command

     # ... existing code ...
     
-    index = it_commands.index(command) + 1
+    index = i
     for new_command in new_commands:
         it_commands.insert(index, command + " " + new_command)
         index += 1

-    if(quantity%20==0):
-        print(f"{quantity}/{len(it_commands)} files written in disk.")
+    if i % 20 == 0:
+        print(f"Progress: {i}/{total_commands} ({i/total_commands:.1%}) files written to disk.")
🧰 Tools
🪛 Ruff (0.8.2)

393-393: Use enumerate() for index variable quantity in for loop

(SIM113)


406-406: Using .strip() with multi-character strings is misleading

(B005)


416-416: Statement ends with an unnecessary semicolon

Remove unnecessary semicolon

(E703)


416-416: Remove unnecessary semicolon

The line ends with a semicolon, which is unnecessary in Python.

-    cmd_dict = build_dict(cmd_dict, executed_command, explanation, usage, it_flags, flag_list);
+    cmd_dict = build_dict(cmd_dict, executed_command, explanation, usage, it_flags, flag_list)
🧰 Tools
🪛 Ruff (0.8.2)

416-416: Statement ends with an unnecessary semicolon

Remove unnecessary semicolon

(E703)

tools/property-extractor/transformers.py (8)

69-71: Simplify the boolean expression

You can make this code more concise by using the bool() function directly.

-    def accepts(self, info, file_pair):
-        return (
-            True
-            if len(info["params"]) > 2 and "visibility" in info["params"][2]["value"]
-            else False
-        )
+    def accepts(self, info, file_pair):
+        return bool(
+            len(info["params"]) > 2 and "visibility" in info["params"][2]["value"]
+        )
🧰 Tools
🪛 Ruff (0.8.2)

69-71: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)


141-143: Simplify the boolean expression

You can make this code more concise by using the bool() function directly.

-    def accepts(self, info, file_pair):
-        return (
-            True
-            if len(info["params"]) > 2 and "secret" in info["params"][2]["value"]
-            else False
-        )
+    def accepts(self, info, file_pair):
+        return bool(
+            len(info["params"]) > 2 and "secret" in info["params"][2]["value"]
+        )
🧰 Tools
🪛 Ruff (0.8.2)

141-143: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)


226-226: Simplify boolean expression

You can replace the ternary expression with a direct equality check.

-            property["default"] = True if default == "true" else False
+            property["default"] = default == "true"
🧰 Tools
🪛 Ruff (0.8.2)

226-226: Remove unnecessary True if ... else False

Remove unnecessary True if ... else False

(SIM210)


234-234: Use raw string for regex pattern

For consistency with other regex patterns in the file, use an r prefix to denote a raw string.

-            matches = re.search("^([0-9]+)_(.)iB$", default)
+            matches = re.search(r"^([0-9]+)_(.)iB$", default)

252-256: Remove redundant conditional and duplicate assignments

This conditional block has identical code in both branches, making it redundant.

-                # For durations, enums, or other default initializations.
-                if not re.search("([0-9]|::|\\()", default):
-                    property["default"] = default
-                else:
-                    property["default"] = default
+                # For durations, enums, or other default initializations.
+                property["default"] = default

312-314: Add missing return statement in parse method

For consistency with other transformers, the parse method should return the property after setting the experimental flag.

-    def parse(self, property, info, file_pair):
-        property["is_experimental_property"] = True
+    def parse(self, property, info, file_pair):
+        property["is_experimental_property"] = True
+        return property

334-335: Add missing return statement

For consistency with other transformers, return the property object after setting the aliases.

-        property['aliases'] = aliases
+        property['aliases'] = aliases
+        return property

363-381: Handle early return and add missing return statement

The parse method should return the property even when modifying meta parameters. This ensures consistency with other transformers and makes it clear that the property is being returned.

-        if 'params' not in info or info['params'] is None:
-            return
+        if 'params' not in info or info['params'] is None:
+            return property

         iterable_params = info['params']
         for param in iterable_params:
             if isinstance(param['value'], str) and param['value'].startswith("meta{"):
                 meta_content = param['value'].strip("meta{ }").strip()
                 meta_dict = {}
                 for item in meta_content.split(','):
                     item = item.strip()
                     if '=' in item:
                         key, value = item.split('=')
                         meta_dict[key.strip().replace('.', '')] = value.strip()
                         meta_dict['type'] = 'initializer_list'  # Enforce required type
                 param['value'] = meta_dict
+        return property
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c839f7b and 9102dff.

📒 Files selected for processing (18)
  • bin/doc-tools.js (1 hunks)
  • cli-utils/add-caret-external-links.py (1 hunks)
  • cli-utils/beta-from-antora.js (1 hunks)
  • cli-utils/generate-cluster-docs.sh (1 hunks)
  • cli-utils/install-test-dependencies.sh (1 hunks)
  • cli-utils/python-venv.sh (1 hunks)
  • cli-utils/start-cluster.sh (1 hunks)
  • macros/data-template.js (1 hunks)
  • package.json (4 hunks)
  • tools/docusaurus-to-antora-conversion-scripts/pre-process-markdown.js (1 hunks)
  • tools/gen-rpk-ascii.py (1 hunks)
  • tools/get-console-version.js (1 hunks)
  • tools/get-redpanda-version.js (1 hunks)
  • tools/metrics/metrics.py (1 hunks)
  • tools/metrics/requirements.txt (1 hunks)
  • tools/property-extractor/definitions.json (1 hunks)
  • tools/property-extractor/parser.py (1 hunks)
  • tools/property-extractor/transformers.py (1 hunks)
✅ Files skipped from review due to trivial changes (3)
  • macros/data-template.js
  • cli-utils/beta-from-antora.js
  • cli-utils/python-venv.sh
🚧 Files skipped from review as they are similar to previous changes (6)
  • tools/metrics/requirements.txt
  • tools/get-redpanda-version.js
  • tools/property-extractor/definitions.json
  • tools/get-console-version.js
  • package.json
  • tools/docusaurus-to-antora-conversion-scripts/pre-process-markdown.js
🧰 Additional context used
🧬 Code Graph Analysis (2)
tools/property-extractor/parser.py (3)
tools/property-extractor/file_pair.py (1)
  • FilePair (1-7)
tools/property-extractor/property_bag.py (1)
  • PropertyBag (1-4)
tools/property-extractor/transformers.py (16)
  • parse (10-17)
  • parse (24-35)
  • parse (45-50)
  • parse (57-63)
  • parse (74-77)
  • parse (120-122)
  • parse (133-135)
  • parse (146-148)
  • parse (159-175)
  • parse (185-203)
  • parse (211-256)
  • parse (272-305)
  • parse (312-313)
  • parse (326-334)
  • parse (341-346)
  • parse (362-380)
tools/property-extractor/transformers.py (2)
tools/property-extractor/property_bag.py (1)
  • PropertyBag (1-4)
tools/property-extractor/parser.py (1)
  • normalize_string (87-88)
🪛 Ruff (0.8.2)
cli-utils/add-caret-external-links.py

28-28: SyntaxError: Expected an identifier, but found a keyword 'with' that cannot be used here


28-28: SyntaxError: Simple statements must be separated by newlines or semicolons


28-28: SyntaxError: Expected a statement


28-29: SyntaxError: Expected an expression


30-30: SyntaxError: Invalid annotated assignment target


30-30: SyntaxError: Expected an identifier, but found a keyword 'try' that cannot be used here


30-31: SyntaxError: Expected an expression


31-31: SyntaxError: Expected an identifier, but found a keyword 'with' that cannot be used here


31-31: SyntaxError: Simple statements must be separated by newlines or semicolons


31-31: SyntaxError: Expected a statement


31-32: SyntaxError: Expected an expression


33-33: SyntaxError: Expected an identifier, but found a keyword 'except' that cannot be used here


33-33: SyntaxError: Simple statements must be separated by newlines or semicolons


33-33: SyntaxError: Expected a statement


33-34: SyntaxError: Expected an expression


35-35: SyntaxError: Expected an identifier, but found a keyword 'return' that cannot be used here


35-35: SyntaxError: Simple statements must be separated by newlines or semicolons


36-36: SyntaxError: Expected an identifier, but found a keyword 'return' that cannot be used here


36-36: SyntaxError: Simple statements must be separated by newlines or semicolons

tools/property-extractor/parser.py

194-194: Use key in dict instead of key in dict.keys()

Remove .keys()

(SIM118)

tools/gen-rpk-ascii.py

142-142: Using .strip() with multi-character strings is misleading

(B005)


369-369: Using .strip() with multi-character strings is misleading

(B005)


393-393: Use enumerate() for index variable quantity in for loop

(SIM113)


406-406: Using .strip() with multi-character strings is misleading

(B005)


416-416: Statement ends with an unnecessary semicolon

Remove unnecessary semicolon

(E703)

tools/property-extractor/transformers.py

69-71: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)


141-143: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)


226-226: Remove unnecessary True if ... else False

Remove unnecessary True if ... else False

(SIM210)

tools/metrics/metrics.py

4-4: re imported but unused

Remove unused import: re

(F401)


43-43: Redefinition of unused re from line 4

Remove definition: re

(F811)

🪛 Shellcheck (0.10.0)
cli-utils/install-test-dependencies.sh

[warning] 37-37: missing_deps appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 99-99: ARCH appears unused. Verify use (or export if used externally).

(SC2034)


[warning] 149-149: ShellCheck can't follow non-constant source. Use a directive to specify location.

(SC1090)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Redirect rules - docs-extensions-and-macros
  • GitHub Check: Header rules - docs-extensions-and-macros
  • GitHub Check: Pages changed - docs-extensions-and-macros
🔇 Additional comments (1)
cli-utils/add-caret-external-links.py (1)

28-36: Good addition of error handling.

The try-except block is a good improvement to handle potential file write errors gracefully.

🧰 Tools
🪛 Ruff (0.8.2)

28-28: SyntaxError: Expected an identifier, but found a keyword 'with' that cannot be used here


28-28: SyntaxError: Simple statements must be separated by newlines or semicolons


28-28: SyntaxError: Expected a statement


28-29: SyntaxError: Expected an expression


30-30: SyntaxError: Invalid annotated assignment target


30-30: SyntaxError: Expected an identifier, but found a keyword 'try' that cannot be used here


30-31: SyntaxError: Expected an expression


31-31: SyntaxError: Expected an identifier, but found a keyword 'with' that cannot be used here


31-31: SyntaxError: Simple statements must be separated by newlines or semicolons


31-31: SyntaxError: Expected a statement


31-32: SyntaxError: Expected an expression


33-33: SyntaxError: Expected an identifier, but found a keyword 'except' that cannot be used here


33-33: SyntaxError: Simple statements must be separated by newlines or semicolons


33-33: SyntaxError: Expected a statement


33-34: SyntaxError: Expected an expression


35-35: SyntaxError: Expected an identifier, but found a keyword 'return' that cannot be used here


35-35: SyntaxError: Simple statements must be separated by newlines or semicolons


36-36: SyntaxError: Expected an identifier, but found a keyword 'return' that cannot be used here


36-36: SyntaxError: Simple statements must be separated by newlines or semicolons

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 10

♻️ Duplicate comments (4)
docker-compose/docker-compose.yml (1)

42-43: Duplicate: Remove hard-coded bootstrap credentials
As noted in a previous review, storing RP_BOOTSTRAP_USER: "superuser:secretpassword" in the compose file exposes credentials in plain text. Please switch to referencing an environment variable or Docker secret, for example:

environment:
  RP_BOOTSTRAP_USER: ${RP_BOOTSTRAP_USER:?err}

Also applies to: 79-80, 111-112

tools/property-extractor/property_extractor.py (3)

14-14: 🛠️ Refactor suggestion

Replace wildcard import with explicit imports

Wildcard imports (from transformers import *) make it difficult to determine where symbols come from and can lead to namespace pollution. The static analyzer has flagged this with F403 and multiple F405 warnings.

Import specific transformer classes instead:

-from transformers import *
+from transformers import (
+    TypeTransformer,
+    EnterpriseTransformer,
+    MetaParamTransformer,
+    BasicInfoTransformer,
+    IsNullableTransformer,
+    IsArrayTransformer,
+    NeedsRestartTransformer,
+    VisibilityTransformer,
+    DeprecatedTransformer,
+    IsSecretTransformer,
+    NumericBoundsTransformer,
+    DurationBoundsTransformer,
+    SimpleDefaultValuesTransformer,
+    FriendlyDefaultTransformer,
+    ExperimentalTransformer,
+    AliasTransformer,
+)
🧰 Tools
🪛 Ruff (0.8.2)

14-14: from transformers import * used; unable to detect undefined names

(F403)


36-36: ⚠️ Potential issue

--recursive flag has no effect

Both branches of the conditional use path.rglob("*.h"), making the --recursive flag ineffective. The non-recursive branch should use path.glob("*.h") instead.

-    file_iter = path.rglob("*.h") if options.recursive else path.rglob("*.h")
+    file_iter = path.rglob("*.h") if options.recursive else path.glob("*.h")
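
For reference, the pathlib distinction the flag should map onto (the directory path is hypothetical):

from pathlib import Path

path = Path("src/v/config")  # hypothetical header directory

shallow = list(path.glob("*.h"))  # headers in this directory only
deep = list(path.rglob("*.h"))    # headers here and in every subdirectory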

90-90: 💡 Verification agent

🧩 Analysis chain

Fix comment about transformer order

The comment suggests EnterpriseTransformer must be first, but the transformer at index 0 is TypeTransformer. This order seems contradictory and could lead to runtime errors.

Based on the code from the transformers.py snippet, EnterpriseTransformer.accepts() checks for info.get('type'), which is set by TypeTransformer.parse(). Please verify the correct order by:

  1. Checking if EnterpriseTransformer accesses properties set by other transformers
  2. Testing whether the current order works correctly with real data

🏁 Script executed:

#!/bin/bash
# Check implementation of EnterpriseTransformer.accepts()
rg -A 5 "class EnterpriseTransformer" tools/property-extractor/transformers.py

# Check if type is used in acceptors
rg "type.*in .*acceptor" tools/property-extractor/transformers.py

Length of output: 376


Fix transformer order comment

EnterpriseTransformer.accepts() relies on info['type'] being set by TypeTransformer.parse(). Update the inline comment (and transformer list if needed) to reflect that EnterpriseTransformer must come after TypeTransformer, not first.

• File: tools/property-extractor/property_extractor.py
• Line: 90

-        EnterpriseTransformer(),  ## this must be the first, as it modifies current data
+        EnterpriseTransformer(),  ## must follow TypeTransformer—relies on info['type'] set by it
🧰 Tools
🪛 Ruff (0.8.2)

90-90: EnterpriseTransformer may be undefined, or defined from star imports

(F405)

🧹 Nitpick comments (12)
docker-compose/bootstrap.yml (1)

32-37: Consider externalizing MinIO credentials
The cloud_storage_access_key and cloud_storage_secret_key are hard-coded in this quickstart config. Even for development setups, it’s safer to allow users to override these via environment variables or Docker secrets to avoid committing credentials in source control.

docker-compose/docker-compose.yml (1)

363-366: Secure MinIO root credentials
The MinIO service’s MINIO_ROOT_USER and MINIO_ROOT_PASSWORD are also hard-coded. Consider externalizing these via environment variables or Docker secrets to prevent leaking credentials:

environment:
  MINIO_ROOT_USER: ${MINIO_ROOT_USER:?err}
  MINIO_ROOT_PASSWORD: ${MINIO_ROOT_PASSWORD:?err}
tools/metrics/metrics.py (7)

1-7: Fix duplicate and unused imports.

The module re is imported twice (lines 4 and 45), but it's actually only used inside the parse_metrics function.

 import os
 import sys
 import requests
-import re
 import json
 import logging
🧰 Tools
🪛 Ruff (0.8.2)

4-4: re imported but unused

Remove unused import: re

(F401)


8-22: Use triple double quotes for docstrings.

PEP 257 recommends using triple double quotes for docstrings. Using the standard format improves consistency and enables better tools support.

-'''
+"""
 ## How it works
 
 Fetches metrics from the brokers and parses out the `# HELP` lines, creating:
 
 - `metrics.json`
 - `metrics.adoc`
 
 These output files are stored in a versioned folder under `docs/autogenerated/<version>/metrics`, based on the major.minor version provided as the first argument.
 
 ## Prerequisites
 
 - **Python 3** & **pip** on your system.
 - A Redpanda cluster running with the `public_metrics` and `metrics` endpoints exposed at http://localhost:19644/
-'''
+"""

45-45: Remove redundant import.

The re module is already imported at the module level (though that import can be removed as suggested in a previous comment).

-    import re, logging
+    import logging
🧰 Tools
🪛 Ruff (0.8.2)

45-45: Redefinition of unused re from line 4

Remove definition: re

(F811)


122-123: Handle the case where metric type is None.

The mtype value could be None if a metric has no # TYPE entry (as indicated in lines 104-105), but there's no handling for this case when writing to the AsciiDoc file.

-            f.write(f"*Type*: {data['type']}")
+            type_value = data['type'] if data['type'] is not None else "unknown"
+            f.write(f"*Type*: {type_value}")

150-151: Consider using a more robust method to find the repository root.

Using the current working directory assumes the script is always run from the repository root. This could lead to issues if the script is run from a different directory.

     # Resolve the base autogenerated folder at the repo root
-    repo_root = os.getcwd()
+    # Find the repository root by looking for common markers
+    repo_root = os.getcwd()
+    while not os.path.isdir(os.path.join(repo_root, ".git")) and not os.path.isfile(os.path.join(repo_root, "package.json")):
+        parent = os.path.dirname(repo_root)
+        if parent == repo_root:  # Reached filesystem root
+            logging.warning("Could not find repository root, using current directory")
+            repo_root = os.getcwd()
+            break
+        repo_root = parent

166-167: Add logging for public metrics count.

It would be helpful to log the number of public metrics processed, similar to what is done in the parse_metrics function.

     public_metrics = parse_metrics(metrics_text)
+    logging.info(f"Processed {len(public_metrics)} public metrics.")

172-175: Add logging for internal metrics count.

It would be helpful to log the number of internal metrics processed, similar to what is done in the parse_metrics function.

         internal_metrics = parse_metrics(internal_metrics_text)
+        logging.info(f"Processed {len(internal_metrics)} internal metrics.")
     else:
         logging.error("No internal metrics retrieved.")
tools/property-extractor/property_extractor.py (3)

78-84: Simplify nested if statements

This code uses nested if statements that can be combined to improve readability.

-        # Check if any of the paths are in fp.implementation
-        if any(path in implementation_value for path in file_paths):
-            if len(properties) > 0:
-                files_with_properties.append((fp, properties))
+        # Check if any of the paths are in fp.implementation and properties exist
+        if any(path in implementation_value for path in file_paths) and len(properties) > 0:
+            files_with_properties.append((fp, properties))
🧰 Tools
🪛 Ruff (0.8.2)

78-79: Use a single if statement instead of nested if statements

(SIM102)


204-206: Add more descriptive error message for missing parser.c

The error message could provide more helpful instructions on how to fix the missing parser.c issue.

     if not os.path.exists(os.path.join(treesitter_dir, "src/parser.c")):
-        logging.error("Missing parser.c. Ensure Tree-sitter submodules are initialized.")
+        logging.error("Missing parser.c. Ensure Tree-sitter submodules are initialized with 'git submodule update --init --recursive'.")
         sys.exit(1)

127-127: Improve documentation with Markdown syntax

Long comments like this would be more readable in Python docstring format using Markdown syntax.

-# The definitions.json file contains type definitions that the extractor uses to standardize and centralize type information. After extracting and transforming the properties from the source code, the function merge_properties_and_definitions looks up each property's type in the definitions. If a property's type (or the type of its items, in the case of arrays) matches one of the definitions, the transformer replaces that type with a JSON pointer ( such as #/definitions/<type>) to the corresponding entry in definitions.json. The final JSON output then includes both a properties section (with types now referencing the definitions) and a definitions section, so that consumers of the output can easily resolve the full type information.
+def merge_properties_and_definitions(properties, definitions):
+    """
+    Merge extracted properties with type definitions from definitions.json.
+    
+    ## Purpose
+    
+    The definitions.json file contains type definitions that the extractor uses to standardize 
+    and centralize type information. After extracting and transforming the properties from the source 
+    code, this function:
+    
+    1. Looks up each property's type in the definitions
+    2. If a property's type (or the type of its items, in the case of arrays) matches one of the definitions,
+       replaces that type with a JSON pointer (such as #/definitions/<type>) to the corresponding entry
+    3. The final JSON output includes both a properties section (with types now referencing the definitions)
+       and a definitions section
+    
+    This enables consumers of the output to easily resolve the full type information.
+    """
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9102dff and ef814d4.

📒 Files selected for processing (4)
  • docker-compose/bootstrap.yml (1 hunks)
  • docker-compose/docker-compose.yml (1 hunks)
  • tools/metrics/metrics.py (1 hunks)
  • tools/property-extractor/property_extractor.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tools/property-extractor/property_extractor.py (4)
tools/property-extractor/file_pair.py (1)
  • FilePair (1-7)
tools/property-extractor/parser.py (2)
  • build_treesitter_cpp_library (223-224)
  • extract_properties_from_file_pair (203-220)
tools/property-extractor/property_bag.py (1)
  • PropertyBag (1-4)
tools/property-extractor/transformers.py (48)
  • TypeTransformer (80-122)
  • EnterpriseTransformer (337-346)
  • MetaParamTransformer (349-380)
  • BasicInfoTransformer (6-17)
  • IsNullableTransformer (20-35)
  • IsArrayTransformer (38-50)
  • NeedsRestartTransformer (53-63)
  • VisibilityTransformer (66-77)
  • DeprecatedTransformer (125-135)
  • IsSecretTransformer (138-148)
  • NumericBoundsTransformer (151-175)
  • DurationBoundsTransformer (178-203)
  • SimpleDefaultValuesTransformer (206-256)
  • FriendlyDefaultTransformer (259-305)
  • ExperimentalTransformer (308-313)
  • AliasTransformer (316-334)
  • accepts (7-8)
  • accepts (21-22)
  • accepts (42-43)
  • accepts (54-55)
  • accepts (67-72)
  • accepts (81-82)
  • accepts (126-131)
  • accepts (139-144)
  • accepts (155-157)
  • accepts (182-183)
  • accepts (207-209)
  • accepts (269-270)
  • accepts (309-311)
  • accepts (317-324)
  • accepts (338-339)
  • accepts (350-360)
  • parse (10-17)
  • parse (24-35)
  • parse (45-50)
  • parse (57-63)
  • parse (74-77)
  • parse (120-122)
  • parse (133-135)
  • parse (146-148)
  • parse (159-175)
  • parse (185-203)
  • parse (211-256)
  • parse (272-305)
  • parse (312-313)
  • parse (326-334)
  • parse (341-346)
  • parse (362-380)
🪛 YAMLlint (1.35.1)
docker-compose/docker-compose.yml

[warning] 317-317: wrong indentation: expected 6 but found 4

(indentation)


[error] 413-413: no new line character at the end of file

(new-line-at-end-of-file)

docker-compose/bootstrap.yml

[error] 65-65: no new line character at the end of file

(new-line-at-end-of-file)

🪛 Ruff (0.8.2)
tools/metrics/metrics.py

4-4: re imported but unused

Remove unused import: re

(F401)


45-45: Redefinition of unused re from line 4

Remove definition: re

(F811)

tools/property-extractor/property_extractor.py

14-14: from transformers import * used; unable to detect undefined names

(F403)


78-79: Use a single if statement instead of nested if statements

(SIM102)


89-89: TypeTransformer may be undefined, or defined from star imports

(F405)


90-90: EnterpriseTransformer may be undefined, or defined from star imports

(F405)


91-91: MetaParamTransformer may be undefined, or defined from star imports

(F405)


92-92: BasicInfoTransformer may be undefined, or defined from star imports

(F405)


93-93: IsNullableTransformer may be undefined, or defined from star imports

(F405)


94-94: IsArrayTransformer may be undefined, or defined from star imports

(F405)


94-94: type_transformer may be undefined, or defined from star imports

(F405)


95-95: NeedsRestartTransformer may be undefined, or defined from star imports

(F405)


96-96: VisibilityTransformer may be undefined, or defined from star imports

(F405)


97-97: DeprecatedTransformer may be undefined, or defined from star imports

(F405)


98-98: IsSecretTransformer may be undefined, or defined from star imports

(F405)


99-99: NumericBoundsTransformer may be undefined, or defined from star imports

(F405)


99-99: type_transformer may be undefined, or defined from star imports

(F405)


100-100: DurationBoundsTransformer may be undefined, or defined from star imports

(F405)


100-100: type_transformer may be undefined, or defined from star imports

(F405)


101-101: SimpleDefaultValuesTransformer may be undefined, or defined from star imports

(F405)


102-102: FriendlyDefaultTransformer may be undefined, or defined from star imports

(F405)


103-103: ExperimentalTransformer may be undefined, or defined from star imports

(F405)


104-104: AliasTransformer may be undefined, or defined from star imports

(F405)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Redirect rules - docs-extensions-and-macros
  • GitHub Check: Header rules - docs-extensions-and-macros
  • GitHub Check: Pages changed - docs-extensions-and-macros

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (7)
tools/property-extractor/property_extractor.py (4)

36-36: ⚠️ Potential issue

--recursive flag has no effect because both branches use the same method.

The recursive option doesn't change the behavior because path.rglob() is used in both cases.

-    file_iter = path.rglob("*.h") if options.recursive else path.rglob("*.h")
+    file_iter = path.rglob("*.h") if options.recursive else path.glob("*.h")

88-106: ⚠️ Potential issue

Fix transformer order and undefined variable issues.

There are several issues with the transformer initialization:

  1. The comment says EnterpriseTransformer must be first, but it's listed second.
  2. type_transformer is defined on line 88 but a new instance is created on line 90.
  3. The same variable is used in multiple places.

Apply this fix:

     type_transformer = TypeTransformer()
     transformers = [
-        TypeTransformer(),
-        EnterpriseTransformer(), ## this must be the first, as it modifies current data
+        EnterpriseTransformer(), ## this must be the first, as it modifies current data
+        type_transformer,        # reuse existing instance
         MetaParamTransformer(),
         BasicInfoTransformer(),
         IsNullableTransformer(),
         IsArrayTransformer(type_transformer),
         NeedsRestartTransformer(),
         VisibilityTransformer(),
         DeprecatedTransformer(),
         IsSecretTransformer(),
         NumericBoundsTransformer(type_transformer),
         DurationBoundsTransformer(type_transformer),
         SimpleDefaultValuesTransformer(),
         FriendlyDefaultTransformer(),
         ExperimentalTransformer(),
         AliasTransformer(),
     ]
🧰 Tools
🪛 Ruff (0.8.2)

88-88: TypeTransformer may be undefined, or defined from star imports

(F405)


90-90: TypeTransformer may be undefined, or defined from star imports

(F405)


91-91: EnterpriseTransformer may be undefined, or defined from star imports

(F405)


92-92: MetaParamTransformer may be undefined, or defined from star imports

(F405)


93-93: BasicInfoTransformer may be undefined, or defined from star imports

(F405)


94-94: IsNullableTransformer may be undefined, or defined from star imports

(F405)


95-95: IsArrayTransformer may be undefined, or defined from star imports

(F405)


96-96: NeedsRestartTransformer may be undefined, or defined from star imports

(F405)


97-97: VisibilityTransformer may be undefined, or defined from star imports

(F405)


98-98: DeprecatedTransformer may be undefined, or defined from star imports

(F405)


99-99: IsSecretTransformer may be undefined, or defined from star imports

(F405)


100-100: NumericBoundsTransformer may be undefined, or defined from star imports

(F405)


101-101: DurationBoundsTransformer may be undefined, or defined from star imports

(F405)


102-102: SimpleDefaultValuesTransformer may be undefined, or defined from star imports

(F405)


103-103: FriendlyDefaultTransformer may be undefined, or defined from star imports

(F405)


104-104: ExperimentalTransformer may be undefined, or defined from star imports

(F405)


105-105: AliasTransformer may be undefined, or defined from star imports

(F405)


199-200: 🛠️ Refactor suggestion

Add error handling for JSON parsing.

JSON parsing could fail if the definitions file is malformed, but there's no error handling.

     if options.definitions:
-        with open(options.definitions) as json_file:
-            definitions = json.load(json_file)
+        try:
+            with open(options.definitions) as json_file:
+                definitions = json.load(json_file)
+        except json.JSONDecodeError as e:
+            logging.error(f"Failed to parse definitions file: {e}")
+            sys.exit(1)

14-14: 🛠️ Refactor suggestion

Avoid wildcard imports to enhance code maintainability and readability.

Using from transformers import * makes it difficult to track which symbols are imported and can lead to naming conflicts. This is also flagged by Ruff (F403).

Import specific transformer classes explicitly:

-from transformers import *
+from transformers import (
+    TypeTransformer,
+    EnterpriseTransformer,
+    MetaParamTransformer,
+    BasicInfoTransformer,
+    IsNullableTransformer,
+    IsArrayTransformer,
+    NeedsRestartTransformer,
+    VisibilityTransformer,
+    DeprecatedTransformer,
+    IsSecretTransformer,
+    NumericBoundsTransformer,
+    DurationBoundsTransformer,
+    SimpleDefaultValuesTransformer,
+    FriendlyDefaultTransformer,
+    ExperimentalTransformer,
+    AliasTransformer,
+)
🧰 Tools
🪛 Ruff (0.8.2)

14-14: from transformers import * used; unable to detect undefined names

(F403)

tools/property-extractor/transformers.py (3)

221-224: ⚠️ Potential issue

Fix incorrect numeric parsing in SimpleDefaultValuesTransformer.

The code uses str.replace() with regex patterns, which doesn't work as expected. str.replace() performs literal replacement rather than regex replacement.

-        elif re.search("^-?[0-9][0-9']*$", default):  # integers
-            property["default"] = int(default.replace("[^0-9-]", ""))
-        elif re.search(r"^-?[0-9]+(\.[0-9]+)$", default):  # floats
-            property["default"] = float(default.replace("[^0-9]", ""))
+        elif re.search(r"^-?[0-9][0-9']*$", default):  # integers (allow digit group separators)
+            property["default"] = int(re.sub(r"[^0-9-]", "", default))
+        elif re.search(r"^-?[0-9]+(\.[0-9]+)?$", default):  # floats
+            property["default"] = float(re.sub(r"[^0-9.\-]", "", default))

227-231: ⚠️ Potential issue

Fix malformed regex for string lists.

The regex pattern for detecting string lists leaves the closing brace unanchored, so it also matches malformed literals without one; anchoring with \}$ restricts it to complete lists like {a,b,c}.

-        elif re.search("^{[^:]+$", default):  # string lists
+        elif re.search(r"^\{[^:]+\}$", default):  # string lists
             property["default"] = [
                 normalize_string(s)
                 for s in re.sub("{([^}]+)}", "\\1", default).split(",")
             ]
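
A quick check of the anchored pattern (the default literal is hypothetical; str.strip stands in for normalize_string):

import re

default = "{produce, consume}"  # hypothetical brace-delimited default

if re.search(r"^\{[^:]+\}$", default):
    items = [s.strip() for s in re.sub(r"\{([^}]+)\}", r"\1", default).split(",")]
    print(items)  # ['produce', 'consume']

# The anchored pattern rejects a literal with no closing brace,
# which the unanchored original still accepts:
print(bool(re.search(r"^\{[^:]+\}$", "{incomplete")))  # False
print(bool(re.search("^{[^:]+$", "{incomplete")))      # True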

340-346: 🛠️ Refactor suggestion

Document side effects and ensure consistent return in EnterpriseTransformer.

This transformer modifies the info parameter by deleting an element, which could have unexpected side effects for other transformers that process the same information later.

     def parse(self, property, info, file_pair):
         if info['params'] is not None:
             enterpriseValue = info['params'][0]['value']
             property['enterprise_value'] = enterpriseValue
             property['is_enterprise'] = True
+            # Note: We're modifying the input info object which could affect other transformers
             del info['params'][0]
             return property
+        return property
🧹 Nitpick comments (5)
tools/property-extractor/property_extractor.py (1)

78-80: Simplify nested if statements to a single condition.

The nested if statements can be combined into a single statement for better readability.

-    # Check if any of the paths are in fp.implementation
-    if any(path in implementation_value for path in file_paths):
-        if len(properties) > 0:
-            files_with_properties.append((fp, properties))
+    # Check if implementation is relevant and contains properties
+    if any(path in implementation_value for path in file_paths) and len(properties) > 0:
+        files_with_properties.append((fp, properties))
🧰 Tools
🪛 Ruff (0.8.2)

78-79: Use a single if statement instead of nested if statements

(SIM102)

tools/property-extractor/transformers.py (4)

69-72: Simplify boolean return expression.

The current conditional expression is unnecessarily verbose and can be simplified.

-    return (
-        True
-        if len(info["params"]) > 2 and "visibility" in info["params"][2]["value"]
-        else False
-    )
+    return len(info["params"]) > 2 and "visibility" in info["params"][2]["value"]
🧰 Tools
🪛 Ruff (0.8.2)

69-71: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)


141-144: Simplify boolean return expression.

Similar to the previous issue, this conditional expression can be simplified.

-    return (
-        True
-        if len(info["params"]) > 2 and "secret" in info["params"][2]["value"]
-        else False
-    )
+    return len(info["params"]) > 2 and "secret" in info["params"][2]["value"]
🧰 Tools
🪛 Ruff (0.8.2)

141-143: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)


226-226: Simplify boolean conversion expression.

This conditional expression can be written more concisely.

-        property["default"] = True if default == "true" else False
+        property["default"] = default == "true"
🧰 Tools
🪛 Ruff (0.8.2)

226-226: Remove unnecessary True if ... else False

Remove unnecessary True if ... else False

(SIM210)


253-256: Remove redundant conditional with identical branches.

Both branches set the same value, making the conditional redundant.

-                # For durations, enums, or other default initializations.
-                if not re.search("([0-9]|::|\\()", default):
-                    property["default"] = default
-                else:
-                    property["default"] = default
+                # For durations, enums, or other default initializations.
+                property["default"] = default
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ef814d4 and bc2c48d.

📒 Files selected for processing (2)
  • tools/property-extractor/property_extractor.py (1 hunks)
  • tools/property-extractor/transformers.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
tools/property-extractor/transformers.py (2)
tools/property-extractor/property_bag.py (1)
  • PropertyBag (1-4)
tools/property-extractor/parser.py (1)
  • normalize_string (87-88)
tools/property-extractor/property_extractor.py (4)
tools/property-extractor/file_pair.py (1)
  • FilePair (1-7)
tools/property-extractor/parser.py (2)
  • build_treesitter_cpp_library (223-224)
  • extract_properties_from_file_pair (203-220)
tools/property-extractor/property_bag.py (1)
  • PropertyBag (1-4)
tools/property-extractor/transformers.py (48)
  • TypeTransformer (80-122)
  • EnterpriseTransformer (336-346)
  • MetaParamTransformer (349-380)
  • BasicInfoTransformer (6-17)
  • IsNullableTransformer (20-35)
  • IsArrayTransformer (38-50)
  • NeedsRestartTransformer (53-63)
  • VisibilityTransformer (66-77)
  • DeprecatedTransformer (125-135)
  • IsSecretTransformer (138-148)
  • NumericBoundsTransformer (151-175)
  • DurationBoundsTransformer (178-203)
  • SimpleDefaultValuesTransformer (206-256)
  • FriendlyDefaultTransformer (259-305)
  • ExperimentalTransformer (308-312)
  • AliasTransformer (315-333)
  • accepts (7-8)
  • accepts (21-22)
  • accepts (42-43)
  • accepts (54-55)
  • accepts (67-72)
  • accepts (81-82)
  • accepts (126-131)
  • accepts (139-144)
  • accepts (155-157)
  • accepts (182-183)
  • accepts (207-209)
  • accepts (269-270)
  • accepts (309-310)
  • accepts (316-323)
  • accepts (337-338)
  • accepts (350-360)
  • parse (10-17)
  • parse (24-35)
  • parse (45-50)
  • parse (57-63)
  • parse (74-77)
  • parse (120-122)
  • parse (133-135)
  • parse (146-148)
  • parse (159-175)
  • parse (185-203)
  • parse (211-256)
  • parse (272-305)
  • parse (311-312)
  • parse (325-333)
  • parse (340-346)
  • parse (362-380)
🪛 Ruff (0.8.2)
tools/property-extractor/transformers.py

69-71: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)


141-143: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)


226-226: Remove unnecessary True if ... else False

Remove unnecessary True if ... else False

(SIM210)

tools/property-extractor/property_extractor.py

14-14: from transformers import * used; unable to detect undefined names

(F403)


78-79: Use a single if statement instead of nested if statements

(SIM102)


88-88: TypeTransformer may be undefined, or defined from star imports

(F405)


90-90: TypeTransformer may be undefined, or defined from star imports

(F405)


91-91: EnterpriseTransformer may be undefined, or defined from star imports

(F405)


92-92: MetaParamTransformer may be undefined, or defined from star imports

(F405)


93-93: BasicInfoTransformer may be undefined, or defined from star imports

(F405)


94-94: IsNullableTransformer may be undefined, or defined from star imports

(F405)


95-95: IsArrayTransformer may be undefined, or defined from star imports

(F405)


96-96: NeedsRestartTransformer may be undefined, or defined from star imports

(F405)


97-97: VisibilityTransformer may be undefined, or defined from star imports

(F405)


98-98: DeprecatedTransformer may be undefined, or defined from star imports

(F405)


99-99: IsSecretTransformer may be undefined, or defined from star imports

(F405)


100-100: NumericBoundsTransformer may be undefined, or defined from star imports

(F405)


101-101: DurationBoundsTransformer may be undefined, or defined from star imports

(F405)


102-102: SimpleDefaultValuesTransformer may be undefined, or defined from star imports

(F405)


103-103: FriendlyDefaultTransformer may be undefined, or defined from star imports

(F405)


104-104: ExperimentalTransformer may be undefined, or defined from star imports

(F405)


105-105: AliasTransformer may be undefined, or defined from star imports

(F405)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Redirect rules - docs-extensions-and-macros
  • GitHub Check: Header rules - docs-extensions-and-macros
  • GitHub Check: Pages changed - docs-extensions-and-macros

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (3)
tools/docusaurus-to-antora-conversion-scripts/convert-docs.sh (1)

28-37: 🛠️ Refactor suggestion

Avoid in-place modifications and SC2155 “masking” – copy the file into a temp buffer first
remove_leading_tabs() still rewrites the original file. If perl/sed fail midway you lose the source. In addition, assigning and using the command substitution on the same line (local content="$(cat "$mdx_file")") triggers Shellcheck SC2155.

-function remove_leading_tabs() {
-  local mdx_file="$1"
-  local content="$(cat "$mdx_file")"
+function remove_leading_tabs() {
+  local mdx_file="$1"
+  local content
+  content="$(cat "$mdx_file")"
@@
-  echo "$updated_content" > "$mdx_file"
+  local tmp_file
+  tmp_file=$(mktemp)
+  echo "$updated_content" > "$tmp_file" || return 1
+  mv -- "$tmp_file" "$mdx_file"
 }
🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 30-30: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 33-33: Declare and assign separately to avoid masking return values.

(SC2155)

tools/gen-rpk-ascii.py (1)

378-379: lstrip("Available Commands:") still deletes characters, not the phrase
This was pointed out earlier; the bug remains. Use removeprefix() or slicing.
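
A quick standalone illustration of the difference (removeprefix() needs Python 3.9+; the sample string is hypothetical):

    # lstrip() treats its argument as a character set, not as a prefix phrase.
    s = "Available Commands: offsets"
    print(s.lstrip("Available Commands:"))         # 'ffsets' -- eats into the word 'offsets'
    print(s.removeprefix("Available Commands: "))  # 'offsets' -- removes only the exact prefix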

🧰 Tools
🪛 Ruff (0.8.2)

378-378: Using .strip() with multi-character strings is misleading

(B005)

cli-utils/add-caret-external-links.py (1)

62-65: Properly handling the return value from process_file.

The code now correctly checks the return value from the process_file function and provides appropriate feedback based on the result, which aligns with best practices for error handling.

🧹 Nitpick comments (11)
tools/docusaurus-to-antora-conversion-scripts/convert-docs.sh (1)

30-60: SC2155 warnings: declare and assign separately
Several local foo="$(cmd)" one-liners (lines 30, 33, 54, 56, 60, 63) shadow $?. Split declaration and assignment to silence the warning and keep exit-status intact.

🧰 Tools
🪛 Shellcheck (0.10.0)

[warning] 30-30: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 33-33: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 54-54: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 56-56: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 60-60: Declare and assign separately to avoid masking return values.

(SC2155)

tools/metrics/metrics.py (3)

1-6: Remove the unused top-level re import to avoid F401 / F811 conflicts
You re-import re inside parse_metrics; the global import is unused and the inner import shadows it, triggering Ruff F401 & F811.

-import re
🧰 Tools
🪛 Ruff (0.8.2)

4-4: re imported but unused

Remove unused import: re

(F401)


46-52: Duplicate import inside function unnecessary
Inside parse_metrics you import re, logging; both are already available. Delete to avoid re-definition warnings and micro-overhead.

🧰 Tools
🪛 Ruff (0.8.2)

46-46: Redefinition of unused re from line 4

Remove definition: re

(F811)


26-35: requests.get without a timeout can block indefinitely
Add a small timeout (e.g. timeout=5) to prevent the docs-generation pipeline from hanging if the metrics endpoint is unreachable.
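
A minimal sketch of the suggested guard, assuming the fetch_metrics() shape used in metrics.py:

    import logging
    import requests

    def fetch_metrics(url):
        """Fetch raw Prometheus text, failing fast instead of hanging."""
        try:
            # timeout bounds both the connect and the read phase
            response = requests.get(url, timeout=5)
            response.raise_for_status()
            return response.text
        except requests.RequestException as exc:
            logging.error("Could not fetch %s: %s", url, exc)
            return None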

tools/gen-rpk-ascii.py (2)

425-425: Unnecessary trailing semicolon
Python does not need a semicolon here – remove to satisfy Ruff E703.

🧰 Tools
🪛 Ruff (0.8.2)

425-425: Statement ends with an unnecessary semicolon

Remove unnecessary semicolon

(E703)


21-23: Non-interactive scripts should avoid docker exec -it
-t allocates a TTY and fails in CI. Prefer docker exec -i for automated docs generation.
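
For example, the script's subprocess call could drop -t and stay CI-safe (a sketch; the container name redpanda-0 is illustrative):

    import subprocess

    # '-i' keeps stdin open without allocating a pseudo-TTY, so this also works in CI pipelines
    result = subprocess.run(
        ["docker", "exec", "-i", "redpanda-0", "rpk", "version"],
        capture_output=True,
        text=True,
        check=True,
    )
    print(result.stdout)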

cli-utils/add-caret-external-links.py (5)

37-37: Fix indentation of the comment line.

This comment appears to be indented as part of the process_file function, but should be at the module level.

-# Get the directory of the current script
+
+# Get the directory of the current script

44-45: Remove duplicate comment line.

There's a duplicate comment "List of excluded file paths (relative paths)".

-# List of excluded file paths (relative paths)
 # List of excluded file paths (relative paths)

1-69: Add main guard to prevent unintended execution on import.

Python best practices recommend using an if __name__ == "__main__": guard to ensure that the script's main functionality only runs when executed directly, not when imported as a module.

 # Call the function with the constructed directory path
-process_directory(directory_path)
+
+if __name__ == "__main__":
+    process_directory(directory_path)
🧰 Tools
🪛 Ruff (0.8.2)

28-28: SyntaxError: Expected an identifier, but found a keyword 'with' that cannot be used here


28-28: SyntaxError: Simple statements must be separated by newlines or semicolons


28-28: SyntaxError: Expected a statement


28-29: SyntaxError: Expected an expression


30-30: SyntaxError: Invalid annotated assignment target


30-30: SyntaxError: Expected an identifier, but found a keyword 'try' that cannot be used here


30-31: SyntaxError: Expected an expression


31-31: SyntaxError: Expected an identifier, but found a keyword 'with' that cannot be used here


31-31: SyntaxError: Simple statements must be separated by newlines or semicolons


31-31: SyntaxError: Expected a statement


31-32: SyntaxError: Expected an expression


33-33: SyntaxError: Expected an identifier, but found a keyword 'except' that cannot be used here


33-33: SyntaxError: Simple statements must be separated by newlines or semicolons


33-33: SyntaxError: Expected a statement


33-34: SyntaxError: Expected an expression


35-35: SyntaxError: Expected an identifier, but found a keyword 'return' that cannot be used here


35-35: SyntaxError: Simple statements must be separated by newlines or semicolons


36-36: SyntaxError: Expected an identifier, but found a keyword 'return' that cannot be used here


36-36: SyntaxError: Simple statements must be separated by newlines or semicolons


30-34: Consider using more specific exception handling.

The code currently catches all exceptions with Exception, which is very broad. For file operations, it would be better to catch more specific exceptions like IOError or PermissionError.

 try:
     with open(file_path, 'w', encoding='utf-8') as file:
         file.write('\n'.join(updated_lines))
-except Exception as e:
+except (IOError, PermissionError) as e:
     print(f"Error writing to {file_path}: {e}")
     return False
🧰 Tools
🪛 Ruff (0.8.2)

30-30: SyntaxError: Invalid annotated assignment target


30-30: SyntaxError: Expected an identifier, but found a keyword 'try' that cannot be used here


30-31: SyntaxError: Expected an expression


31-31: SyntaxError: Expected an identifier, but found a keyword 'with' that cannot be used here


31-31: SyntaxError: Simple statements must be separated by newlines or semicolons


31-31: SyntaxError: Expected a statement


31-32: SyntaxError: Expected an expression


33-33: SyntaxError: Expected an identifier, but found a keyword 'except' that cannot be used here


33-33: SyntaxError: Simple statements must be separated by newlines or semicolons


33-33: SyntaxError: Expected a statement


33-34: SyntaxError: Expected an expression


7-8: Add docstrings to functions for better code documentation.

Adding docstrings to functions improves code readability and helps future developers understand the purpose, parameters, and return values of your functions.

 # Function to process a single file
 def process_file(file_path):
+    """
+    Process a single AsciiDoc file to add carets to external links.
+    
+    Args:
+        file_path (str): Path to the AsciiDoc file to process
+        
+    Returns:
+        bool: True if processing was successful, False otherwise
+    """
     with open(file_path, 'r', encoding='utf-8') as file:

Similarly for the process_directory function:

 # Function to process all .adoc files in a directory
 def process_directory(directory_path):
+    """
+    Recursively process all AsciiDoc files in a directory.
+    
+    Args:
+        directory_path (str): Path to the directory containing AsciiDoc files
+    """
     for root, _, files in os.walk(directory_path):
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bc2c48d and 7b92f26.

📒 Files selected for processing (7)
  • bin/doc-tools.js (1 hunks)
  • cli-utils/add-caret-external-links.py (1 hunks)
  • cli-utils/generate-cluster-docs.sh (1 hunks)
  • docker-compose/bootstrap.yml (1 hunks)
  • tools/docusaurus-to-antora-conversion-scripts/convert-docs.sh (1 hunks)
  • tools/gen-rpk-ascii.py (1 hunks)
  • tools/metrics/metrics.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • docker-compose/bootstrap.yml
  • bin/doc-tools.js
  • cli-utils/generate-cluster-docs.sh
🧰 Additional context used
🪛 Shellcheck (0.10.0)
tools/docusaurus-to-antora-conversion-scripts/convert-docs.sh

[warning] 30-30: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 33-33: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 54-54: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 56-56: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 60-60: Declare and assign separately to avoid masking return values.

(SC2155)


[warning] 63-63: Declare and assign separately to avoid masking return values.

(SC2155)

🪛 Ruff (0.8.2)
tools/gen-rpk-ascii.py

378-378: Using .strip() with multi-character strings is misleading

(B005)


402-402: Use enumerate() for index variable quantity in for loop

(SIM113)


415-415: Using .strip() with multi-character strings is misleading

(B005)


425-425: Statement ends with an unnecessary semicolon

Remove unnecessary semicolon

(E703)

cli-utils/add-caret-external-links.py

28-28: SyntaxError: Expected an identifier, but found a keyword 'with' that cannot be used here


28-28: SyntaxError: Simple statements must be separated by newlines or semicolons


28-28: SyntaxError: Expected a statement


28-29: SyntaxError: Expected an expression


30-30: SyntaxError: Invalid annotated assignment target


30-30: SyntaxError: Expected an identifier, but found a keyword 'try' that cannot be used here


30-31: SyntaxError: Expected an expression


31-31: SyntaxError: Expected an identifier, but found a keyword 'with' that cannot be used here


31-31: SyntaxError: Simple statements must be separated by newlines or semicolons


31-31: SyntaxError: Expected a statement


31-32: SyntaxError: Expected an expression


33-33: SyntaxError: Expected an identifier, but found a keyword 'except' that cannot be used here


33-33: SyntaxError: Simple statements must be separated by newlines or semicolons


33-33: SyntaxError: Expected a statement


33-34: SyntaxError: Expected an expression


35-35: SyntaxError: Expected an identifier, but found a keyword 'return' that cannot be used here


35-35: SyntaxError: Simple statements must be separated by newlines or semicolons


36-36: SyntaxError: Expected an identifier, but found a keyword 'return' that cannot be used here


36-36: SyntaxError: Simple statements must be separated by newlines or semicolons

tools/metrics/metrics.py

4-4: re imported but unused

Remove unused import: re

(F401)


46-46: Redefinition of unused re from line 4

Remove definition: re

(F811)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Redirect rules - docs-extensions-and-macros
  • GitHub Check: Header rules - docs-extensions-and-macros
  • GitHub Check: Pages changed - docs-extensions-and-macros
🔇 Additional comments (2)
tools/docusaurus-to-antora-conversion-scripts/convert-docs.sh (1)

98-109: Function exit status is always 0 – failures are swallowed
convert_markdown_to_asciidoc has no explicit return; Bash therefore returns the status of the final echo, masking earlier errors.
Inside the loop you rely on the function’s exit code to increment failure_count, but that path can never be taken.

Add defensive set -e at top of the script or explicit return 1 on error branches inside the function, and return 0 on success.

cli-utils/add-caret-external-links.py (1)

28-36: Good addition of error handling for file operations.

The try-except block improves robustness by catching errors during file writing operations and providing informative error messages. The return value indicating success/failure is well utilized in the calling function.

🧰 Tools
🪛 Ruff (0.8.2)

28-28: SyntaxError: Expected an identifier, but found a keyword 'with' that cannot be used here


28-28: SyntaxError: Simple statements must be separated by newlines or semicolons


28-28: SyntaxError: Expected a statement


28-29: SyntaxError: Expected an expression


30-30: SyntaxError: Invalid annotated assignment target


30-30: SyntaxError: Expected an identifier, but found a keyword 'try' that cannot be used here


30-31: SyntaxError: Expected an expression


31-31: SyntaxError: Expected an identifier, but found a keyword 'with' that cannot be used here


31-31: SyntaxError: Simple statements must be separated by newlines or semicolons


31-31: SyntaxError: Expected a statement


31-32: SyntaxError: Expected an expression


33-33: SyntaxError: Expected an identifier, but found a keyword 'except' that cannot be used here


33-33: SyntaxError: Simple statements must be separated by newlines or semicolons


33-33: SyntaxError: Expected a statement


33-34: SyntaxError: Expected an expression


35-35: SyntaxError: Expected an identifier, but found a keyword 'return' that cannot be used here


35-35: SyntaxError: Simple statements must be separated by newlines or semicolons


36-36: SyntaxError: Expected an identifier, but found a keyword 'return' that cannot be used here


36-36: SyntaxError: Simple statements must be separated by newlines or semicolons

@JakeSCahill JakeSCahill changed the title Add a doc-tools CLI command so that writers and our tests have access to a consistent set of tools DOC-1275 Add a doc-tools CLI command so that writers and our tests have access to a consistent set of tools Apr 28, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

♻️ Duplicate comments (8)
tools/gen-rpk-ascii.py (5)

382-383: ⚠️ Potential issue

Don't use lstrip with multi-character strings

Similar to the previous comment, using lstrip() with "Flags:" doesn't remove the prefix as expected.

-available_flags = flags.lstrip("Flags:")
+prefix = "Flags:"
+available_flags = flags[len(prefix):] if flags.startswith(prefix) else flags

419-420: ⚠️ Potential issue

Don't use lstrip with multi-character strings

Another instance of using lstrip() with a multi-character string.

-    available_flags = flags.lstrip("Flags:")
+    prefix = "Flags:"
+    available_flags = flags[len(prefix):] if flags.startswith(prefix) else flags

415-417: ⚠️ Potential issue

Don't use lstrip with multi-character strings

Another instance of using lstrip() with a multi-character string.

-    available_commmands = commands.lstrip("Available Commands:")
+    prefix = "Available Commands:"
+    available_commmands = commands[len(prefix):] if commands.startswith(prefix) else commands
     new_commands = extract_new_commands(available_commmands, False)
🧰 Tools
🪛 Ruff (0.8.2)

415-415: Using .strip() with multi-character strings is misleading

(B005)


60-75: ⚠️ Potential issue

Improve command handling for nested commands with spaces

The current implementation in execute_process() splits only the first command, which would break for nested commands like "topic produce" where everything after the first space is lost.

 def execute_process(commands):
     if len(commands) > 0:
-        commands = commands[0].split(" ")
+        # Split on spaces, dropping empty parts so nested commands like "topic produce" keep all their words
+        commands = [cmd for cmd in commands[0].split(" ") if cmd.strip()]
     commands_to_execute = basic_commands_docker + rpk_basic_command + commands
     commands_to_execute.append("-h")
     try:
         process = subprocess.run(
             commands_to_execute,
             stdout=subprocess.PIPE,
             stderr=subprocess.PIPE,
             check=True,
         )
         return process.stdout.decode("utf-8")
     except subprocess.CalledProcessError as e:
         print(f"Error executing command: {' '.join(commands_to_execute)}")
         print(f"Error message: {e.stderr.decode('utf-8')}")
         sys.exit(1)

378-379: ⚠️ Potential issue

Don't use lstrip with multi-character strings

Using lstrip() with a multi-character string like "Available Commands:" doesn't remove the prefix as expected - it removes any characters that appear in the string until a character not in the string is encountered.

-available_commmands = commands.lstrip("Available Commands:")
+prefix = "Available Commands:"
+available_commmands = commands[len(prefix):] if commands.startswith(prefix) else commands
 it_commands = extract_new_commands(available_commmands, False)
🧰 Tools
🪛 Ruff (0.8.2)

378-378: Using .strip() with multi-character strings is misleading

(B005)

tools/property-extractor/property_extractor.py (2)

14-14: 🛠️ Refactor suggestion

Avoid import * – switch to explicit imports

Wildcard imports pollute the namespace, impede static analysis, and make the real dependencies of the module unclear. Please import only the transformer classes that are actually used.

-from transformers import *
+# from transformers import (
+#     TypeTransformer,
+#     EnterpriseTransformer,
+#     MetaParamTransformer,
+#     BasicInfoTransformer,
+#     IsNullableTransformer,
+#     IsArrayTransformer,
+#     NeedsRestartTransformer,
+#     VisibilityTransformer,
+#     DeprecatedTransformer,
+#     IsSecretTransformer,
+#     NumericBoundsTransformer,
+#     DurationBoundsTransformer,
+#     SimpleDefaultValuesTransformer,
+#     FriendlyDefaultTransformer,
+#     ExperimentalTransformer,
+#     AliasTransformer,
+# )
🧰 Tools
🪛 Ruff (0.8.2)

14-14: from transformers import * used; unable to detect undefined names

(F403)


33-37: ⚠️ Potential issue

--recursive flag currently has no effect

Both branches use path.rglob("*.h"), so non-recursive scans are impossible. Use path.glob when --recursive is absent.

-    file_iter = path.rglob("*.h") if options.recursive else path.rglob("*.h")
+    file_iter = path.rglob("*.h") if options.recursive else path.glob("*.h")
tools/property-extractor/transformers.py (1)

341-347: ⚠️ Potential issue

Side-effects on info['params'] may break later transformers

EnterpriseTransformer.parse() deletes the first parameter in-place, which downstream transformers (e.g. BasicInfoTransformer, IsNullableTransformer) expect to still exist.
Either:

  1. Re-order the transformer list so that EnterpriseTransformer runs after those that rely on the original list, or
  2. Work on a copy of the params to avoid mutation.
-            enterpriseValue = info['params'][0]['value']
-            property['enterprise_value'] = enterpriseValue
-            property['is_enterprise'] = True
-            del info['params'][0]
+            enterprise_value = info['params'][0]['value']
+            property['enterprise_value'] = enterprise_value
+            property['is_enterprise'] = True
+            # Work on a copy to avoid corrupting the list for other transformers
+            info['params'] = info['params'][1:]
             return property
🧹 Nitpick comments (14)
tools/metrics/metrics.py (6)

4-4: Remove unused import

The re module is imported globally here but is only used locally inside functions where it's re-imported.

-import re
🧰 Tools
🪛 Ruff (0.8.2)

4-4: re imported but unused

Remove unused import: re

(F401)


50-51: Remove duplicate comment

There's a duplicated comment "Gather HELP/TYPE metadata in any order".

-    # Gather HELP/TYPE metadata in any order
     # Gather HELP/TYPE metadata in any order

160-168: Consider more robust path validation

The script already uses os.path.isdir(), which rejects both missing paths and plain files, but the error message gives users no hint about how to resolve the problem. A more explicit message would help.

     repo_root = os.getcwd()
     gen_path = os.path.join(repo_root, "autogenerated")
     if not os.path.isdir(gen_path):
-        logging.error(f"autogenerated folder not found at: {gen_path}")
+        logging.error(f"autogenerated folder not found or is not a directory at: {gen_path}")
+        logging.error(f"Please create the directory '{gen_path}' before running this script.")
         sys.exit(1)

170-175: Add retries for HTTP connections

The script currently exits immediately if fetching public metrics fails. Consider adding retries for transient network issues, especially since this might be run during cluster startup.

+    # Add retries for potential transient network issues
+    max_retries = 3
+    retry_delay = 2  # seconds
+    retry_count = 0
+    
     METRICS_URL = "http://localhost:19644/public_metrics/"
-    metrics_text = fetch_metrics(METRICS_URL)
-    if not metrics_text:
-        logging.error("No public metrics retrieved. Exiting.")
-        sys.exit(1)
+    metrics_text = None
+    while retry_count < max_retries and not metrics_text:
+        metrics_text = fetch_metrics(METRICS_URL)
+        if not metrics_text:
+            retry_count += 1
+            if retry_count < max_retries:
+                logging.warning(f"Failed to retrieve public metrics, retrying in {retry_delay} seconds (attempt {retry_count}/{max_retries})...")
+                time.sleep(retry_delay)
+            else:
+                logging.error(f"No public metrics retrieved after {max_retries} attempts. Exiting.")
+                sys.exit(1)

Remember to add import time at the top of the file if you implement this suggestion.


188-192: Separate out metric types for easier documentation

Consider separating public and internal metrics more explicitly by adding headers or sections in the AsciiDoc output to improve readability.

     # Merge public and internal metrics.
     merged_metrics = {
-        "public": public_metrics,
-        "internal": internal_metrics
+        "public": {
+            "title": "Public Metrics",
+            "description": "Metrics available at the /public_metrics/ endpoint",
+            "metrics": public_metrics
+        },
+        "internal": {
+            "title": "Internal Metrics",
+            "description": "Metrics available at the /metrics/ endpoint",
+            "metrics": internal_metrics
+        }
     }

198-200: Add headers to AsciiDoc output files

Add explanatory headers to the AsciiDoc files to provide context for readers.

     output_json(merged_metrics, JSON_OUTPUT_FILE)
-    output_asciidoc(public_metrics, ASCIIDOC_OUTPUT_FILE)
-    output_asciidoc(internal_metrics, INTERNAL_ASCIIDOC_OUTPUT_FILE)
+    
+    # Write public metrics with header
+    with open(ASCIIDOC_OUTPUT_FILE, "w") as f:
+        f.write("= Public Metrics\n\n")
+        f.write("This document lists all public metrics exposed by Redpanda at the `/public_metrics/` endpoint.\n\n")
+        f.write("---\n\n")
+    output_asciidoc(public_metrics, ASCIIDOC_OUTPUT_FILE, mode="a")
+    
+    # Write internal metrics with header
+    with open(INTERNAL_ASCIIDOC_OUTPUT_FILE, "w") as f:
+        f.write("= Internal Metrics\n\n")
+        f.write("This document lists all internal metrics exposed by Redpanda at the `/metrics/` endpoint.\n\n")
+        f.write("---\n\n")
+    output_asciidoc(internal_metrics, INTERNAL_ASCIIDOC_OUTPUT_FILE, mode="a")

You would also need to modify the output_asciidoc function to accept a mode parameter:

-def output_asciidoc(metrics, adoc_file):
+def output_asciidoc(metrics, adoc_file, mode="w"):
     """Output metrics as AsciiDoc."""
-    with open(adoc_file, "w") as f:
+    with open(adoc_file, mode) as f:
tools/gen-rpk-ascii.py (4)

425-425: Remove unnecessary semicolon

JavaScript-style semicolons are not needed in Python.

-    cmd_dict = build_dict(cmd_dict, executed_command, explanation, usage, it_flags, flag_list);
+    cmd_dict = build_dict(cmd_dict, executed_command, explanation, usage, it_flags, flag_list)
🧰 Tools
🪛 Ruff (0.8.2)

425-425: Statement ends with an unnecessary semicolon

Remove unnecessary semicolon

(E703)


401-403: Use enumerate() instead of manually incrementing a counter

Use Python's enumerate() function instead of manually incrementing a counter variable.

-quantity =0
-
-for command in it_commands:
-    quantity+=1
+for quantity, command in enumerate(it_commands, start=1):
🧰 Tools
🪛 Ruff (0.8.2)

402-402: Use enumerate() for index variable quantity in for loop

(SIM113)


436-437: Improve progress reporting

The current progress reporting only prints every 20 commands. Consider adding a more informative progress bar or at least showing percentage completion.

-    if(quantity%20==0):
-        print(f"{quantity}/{len(it_commands)} files written in disk.")
+    if quantity % 20 == 0 or quantity == len(it_commands):
+        percentage = (quantity / len(it_commands)) * 100
+        print(f"{quantity}/{len(it_commands)} files written to disk ({percentage:.1f}% complete).")

463-471: Consider error handling for file path creation

Add error handling for the case where directory creation fails.

 # Write the JSON file into a JSON subdirectory within the same base directory.
 json_path = os.path.join(base_dir, "json")
 if not os.path.exists(json_path):
-    os.makedirs(json_path)
+    try:
+        os.makedirs(json_path)
+    except OSError as e:
+        print(f"Error creating directory {json_path}: {e}")
+        sys.exit(1)

 json_file = os.path.join(json_path, "rpk-commands.json")
 try:
     with open(json_file, "w") as filetowrite:
         filetowrite.write(json_object)
     print("The rpk commands have been successfully generated at", json_file)
tools/property-extractor/property_extractor.py (1)

78-81: Collapse nested if and improve readability

A single conditional is sufficient and removes one indentation level.

-        if any(path in implementation_value for path in file_paths):
-            if len(properties) > 0:
-                files_with_properties.append((fp, properties))
+        if len(properties) > 0 and any(p in implementation_value for p in file_paths):
+            files_with_properties.append((fp, properties))
🧰 Tools
🪛 Ruff (0.8.2)

78-79: Use a single if statement instead of nested if statements

(SIM102)

tools/property-extractor/transformers.py (3)

68-72: Simplify boolean expression

accepts() can return the boolean directly.

-        return (
-            True
-            if len(info["params"]) > 2 and "visibility" in info["params"][2]["value"]
-            else False
-        )
+        return len(info["params"]) > 2 and "visibility" in info["params"][2]["value"]
🧰 Tools
🪛 Ruff (0.8.2)

69-71: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)


140-144: Same simplification applies here

-        return (
-            True
-            if len(info["params"]) > 2 and "secret" in info["params"][2]["value"]
-            else False
-        )
+        return len(info["params"]) > 2 and "secret" in info["params"][2]["value"]
🧰 Tools
🪛 Ruff (0.8.2)

141-143: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)


38-51: Return value consistency

parse() mutates property but does not return it, whereas most other transformers do.
Although the current caller ignores the return, having a consistent interface limits surprises and future refactor risks. Suggest return property at the end.
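
A sketch of the suggested convention (the class body is illustrative, not the real IsArrayTransformer logic):

    class ExampleTransformer:
        def accepts(self, info, file_pair):
            return "params" in info  # placeholder condition

        def parse(self, property, info, file_pair):
            property["is_array"] = bool(info.get("params"))  # placeholder logic
            return property  # returning the bag keeps every transformer's interface uniform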

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7b92f26 and 68ee36f.

📒 Files selected for processing (7)
  • cli-utils/generate-cluster-docs.sh (1 hunks)
  • docker-compose/bootstrap.yml (1 hunks)
  • docker-compose/docker-compose.yml (1 hunks)
  • tools/gen-rpk-ascii.py (1 hunks)
  • tools/metrics/metrics.py (1 hunks)
  • tools/property-extractor/property_extractor.py (1 hunks)
  • tools/property-extractor/transformers.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • docker-compose/bootstrap.yml
  • cli-utils/generate-cluster-docs.sh
  • docker-compose/docker-compose.yml
🧰 Additional context used
🧬 Code Graph Analysis (1)
tools/property-extractor/property_extractor.py (4)
tools/property-extractor/file_pair.py (1)
  • FilePair (1-7)
tools/property-extractor/parser.py (2)
  • build_treesitter_cpp_library (223-224)
  • extract_properties_from_file_pair (203-220)
tools/property-extractor/property_bag.py (1)
  • PropertyBag (1-4)
tools/property-extractor/transformers.py (48)
  • TypeTransformer (80-122)
  • EnterpriseTransformer (337-347)
  • MetaParamTransformer (350-381)
  • BasicInfoTransformer (6-17)
  • IsNullableTransformer (20-35)
  • IsArrayTransformer (38-50)
  • NeedsRestartTransformer (53-63)
  • VisibilityTransformer (66-77)
  • DeprecatedTransformer (125-135)
  • IsSecretTransformer (138-148)
  • NumericBoundsTransformer (151-175)
  • DurationBoundsTransformer (178-203)
  • SimpleDefaultValuesTransformer (206-256)
  • FriendlyDefaultTransformer (259-305)
  • ExperimentalTransformer (308-313)
  • AliasTransformer (316-334)
  • accepts (7-8)
  • accepts (21-22)
  • accepts (42-43)
  • accepts (54-55)
  • accepts (67-72)
  • accepts (81-82)
  • accepts (126-131)
  • accepts (139-144)
  • accepts (155-157)
  • accepts (182-183)
  • accepts (207-209)
  • accepts (269-270)
  • accepts (309-310)
  • accepts (317-324)
  • accepts (338-339)
  • accepts (351-361)
  • parse (10-17)
  • parse (24-35)
  • parse (45-50)
  • parse (57-63)
  • parse (74-77)
  • parse (120-122)
  • parse (133-135)
  • parse (146-148)
  • parse (159-175)
  • parse (185-203)
  • parse (211-256)
  • parse (272-305)
  • parse (311-313)
  • parse (326-334)
  • parse (341-347)
  • parse (363-381)
🪛 Ruff (0.8.2)
tools/property-extractor/transformers.py

69-71: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)


141-143: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)


226-226: Remove unnecessary True if ... else False

Remove unnecessary True if ... else False

(SIM210)

tools/gen-rpk-ascii.py

378-378: Using .strip() with multi-character strings is misleading

(B005)


402-402: Use enumerate() for index variable quantity in for loop

(SIM113)


415-415: Using .strip() with multi-character strings is misleading

(B005)


425-425: Statement ends with an unnecessary semicolon

Remove unnecessary semicolon

(E703)

tools/metrics/metrics.py

4-4: re imported but unused

Remove unused import: re

(F401)


46-46: Redefinition of unused re from line 4

Remove definition: re

(F811)

tools/property-extractor/property_extractor.py

14-14: from transformers import * used; unable to detect undefined names

(F403)


78-79: Use a single if statement instead of nested if statements

(SIM102)


88-88: TypeTransformer may be undefined, or defined from star imports

(F405)


90-90: TypeTransformer may be undefined, or defined from star imports

(F405)


91-91: EnterpriseTransformer may be undefined, or defined from star imports

(F405)


92-92: MetaParamTransformer may be undefined, or defined from star imports

(F405)


93-93: BasicInfoTransformer may be undefined, or defined from star imports

(F405)


94-94: IsNullableTransformer may be undefined, or defined from star imports

(F405)


95-95: IsArrayTransformer may be undefined, or defined from star imports

(F405)


96-96: NeedsRestartTransformer may be undefined, or defined from star imports

(F405)


97-97: VisibilityTransformer may be undefined, or defined from star imports

(F405)


98-98: DeprecatedTransformer may be undefined, or defined from star imports

(F405)


99-99: IsSecretTransformer may be undefined, or defined from star imports

(F405)


100-100: NumericBoundsTransformer may be undefined, or defined from star imports

(F405)


101-101: DurationBoundsTransformer may be undefined, or defined from star imports

(F405)


102-102: SimpleDefaultValuesTransformer may be undefined, or defined from star imports

(F405)


103-103: FriendlyDefaultTransformer may be undefined, or defined from star imports

(F405)


104-104: ExperimentalTransformer may be undefined, or defined from star imports

(F405)


105-105: AliasTransformer may be undefined, or defined from star imports

(F405)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Redirect rules - docs-extensions-and-macros
  • GitHub Check: Header rules - docs-extensions-and-macros
  • GitHub Check: Pages changed - docs-extensions-and-macros
🔇 Additional comments (3)
tools/metrics/metrics.py (2)

52-68: Well-implemented UNIT metadata support

Good job implementing the parsing of Prometheus # UNIT metadata lines alongside # HELP and # TYPE. This follows the Prometheus exposition format standards and stores unit information appropriately.
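
For context, parsing those metadata lines looks roughly like this (a self-contained sketch, not the actual metrics.py code; the metric name is illustrative):

    import re

    META_RE = re.compile(r"^# (HELP|TYPE|UNIT) (\S+) (.+)$")

    def parse_metadata(lines):
        """Collect HELP/TYPE/UNIT lines per metric, in any order."""
        metrics = {}
        for line in lines:
            match = META_RE.match(line)
            if match:
                kind, name, value = match.groups()
                metrics.setdefault(name, {})[kind.lower()] = value
        return metrics

    sample = [
        "# HELP redpanda_uptime_seconds_total Redpanda uptime in seconds",
        "# TYPE redpanda_uptime_seconds_total counter",
        "# UNIT redpanda_uptime_seconds_total seconds",
    ]
    print(parse_metadata(sample))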


131-132: Good AsciiDoc output for unit information

The AsciiDoc output correctly includes unit information when available, which improves the documentation quality.

tools/gen-rpk-ascii.py (1)

267-269: ⚠️ Potential issue

Fix regex replacement for stringArray type

There's a mismatch between the regex search pattern and the replacement pattern for stringArray type.

         elif re.search(r"\bstringArray\b", flag):
             flag_type = "stringArray"
-            explanation = re.sub(r"\bstringArray\b", "", explanation)
+            explanation = re.sub(r"\bstringArray\b", "", explanation)

Likely an incorrect or invalid review comment.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

♻️ Duplicate comments (3)
tools/property-extractor/property_extractor.py (3)

36-36: ⚠️ Potential issue

--recursive flag currently has no effect

path.rglob("*.h") is executed in both branches, so non-recursive scanning is impossible.

-    file_iter = path.rglob("*.h") if options.recursive else path.rglob("*.h")
+    file_iter = path.rglob("*.h") if options.recursive else path.glob("*.h")

14-14: 🛠️ Refactor suggestion

Avoid import * from transformers

Wildcard imports pollute the module namespace, hinder static analysis, and are the reason for the Ruff F403/F405 hints. Import the specific transformer classes you need instead.

-from transformers import *
+from transformers import (
+    EnterpriseTransformer,
+    MetaParamTransformer,
+    BasicInfoTransformer,
+    TypeTransformer,
+    IsNullableTransformer,
+    IsArrayTransformer,
+    NeedsRestartTransformer,
+    VisibilityTransformer,
+    DeprecatedTransformer,
+    IsSecretTransformer,
+    NumericBoundsTransformer,
+    DurationBoundsTransformer,
+    SimpleDefaultValuesTransformer,
+    FriendlyDefaultTransformer,
+    ExperimentalTransformer,
+    AliasTransformer,
+)
🧰 Tools
🪛 Ruff (0.8.2)

14-14: from transformers import * used; unable to detect undefined names

(F403)


88-91: ⚠️ Potential issue

Fix duplicate TypeTransformer and transformer order

There are two issues here:

  1. TypeTransformer is instantiated twice (lines 88 and 91), so the first instance is never used in the list
  2. The in-code comment requires EnterpriseTransformer to run first, and any reordering must preserve that

This can cause issues with transformer ordering and potentially incorrect property processing.

-    type_transformer = TypeTransformer()
-    transformers = [
-        EnterpriseTransformer(), ## this must be the first, as it modifies current data
-        TypeTransformer(),
+    # Create TypeTransformer once to ensure consistency
+    type_transformer = TypeTransformer()
+    transformers = [
+        EnterpriseTransformer(),  # this must be the first, as it modifies current data
+        type_transformer,
🧰 Tools
🪛 Ruff (0.8.2)

88-88: TypeTransformer may be undefined, or defined from star imports

(F405)


90-90: EnterpriseTransformer may be undefined, or defined from star imports

(F405)


91-91: TypeTransformer may be undefined, or defined from star imports

(F405)

🧹 Nitpick comments (2)
tools/property-extractor/property_extractor.py (2)

78-80: Simplify nested if statements

The nested if statements can be combined into a single statement for better readability.

-        # Check if any of the paths are in fp.implementation
-        if any(path in implementation_value for path in file_paths):
-            if len(properties) > 0:
-                files_with_properties.append((fp, properties))
+        # Check if any of the paths are in fp.implementation and if properties exist
+        if any(path in implementation_value for path in file_paths) and len(properties) > 0:
+            files_with_properties.append((fp, properties))
🧰 Tools
🪛 Ruff (0.8.2)

78-79: Use a single if statement instead of nested if statements

(SIM102)


82-82: Remove empty line with whitespace

There's an empty line containing whitespace that should be removed.

-    
+
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68ee36f and db15af2.

📒 Files selected for processing (2)
  • tools/metrics/metrics.py (1 hunks)
  • tools/property-extractor/property_extractor.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tools/metrics/metrics.py
🧰 Additional context used
🪛 Ruff (0.8.2)
tools/property-extractor/property_extractor.py

14-14: from transformers import * used; unable to detect undefined names

(F403)


78-79: Use a single if statement instead of nested if statements

(SIM102)


88-88: TypeTransformer may be undefined, or defined from star imports

(F405)


90-90: EnterpriseTransformer may be undefined, or defined from star imports

(F405)


91-91: TypeTransformer may be undefined, or defined from star imports

(F405)


92-92: MetaParamTransformer may be undefined, or defined from star imports

(F405)


93-93: BasicInfoTransformer may be undefined, or defined from star imports

(F405)


94-94: IsNullableTransformer may be undefined, or defined from star imports

(F405)


95-95: IsArrayTransformer may be undefined, or defined from star imports

(F405)


96-96: NeedsRestartTransformer may be undefined, or defined from star imports

(F405)


97-97: VisibilityTransformer may be undefined, or defined from star imports

(F405)


98-98: DeprecatedTransformer may be undefined, or defined from star imports

(F405)


99-99: IsSecretTransformer may be undefined, or defined from star imports

(F405)


100-100: NumericBoundsTransformer may be undefined, or defined from star imports

(F405)


101-101: DurationBoundsTransformer may be undefined, or defined from star imports

(F405)


102-102: SimpleDefaultValuesTransformer may be undefined, or defined from star imports

(F405)


103-103: FriendlyDefaultTransformer may be undefined, or defined from star imports

(F405)


104-104: ExperimentalTransformer may be undefined, or defined from star imports

(F405)


105-105: AliasTransformer may be undefined, or defined from star imports

(F405)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Redirect rules - docs-extensions-and-macros
  • GitHub Check: Header rules - docs-extensions-and-macros
  • GitHub Check: Pages changed - docs-extensions-and-macros
🔇 Additional comments (5)
tools/property-extractor/property_extractor.py (5)

95-95: Verify IsArrayTransformer usage of shared type_transformer

IsArrayTransformer uses type_transformer which is correct, but ensure it's the same instance used later by NumericBoundsTransformer and DurationBoundsTransformer.

🧰 Tools
🪛 Ruff (0.8.2)

95-95: IsArrayTransformer may be undefined, or defined from star imports

(F405)


100-101: Shared type_transformer instance is defined and used correctly

The type_transformer variable was created on line 88 and is correctly used by multiple transformers. This design ensures consistent behavior among transformers that process type information.

🧰 Tools
🪛 Ruff (0.8.2)

100-100: NumericBoundsTransformer may be undefined, or defined from star imports

(F405)


101-101: DurationBoundsTransformer may be undefined, or defined from star imports

(F405)


134-142: Safe property access with get() method

Great use of .get() method and dictionary default values to prevent KeyError exceptions when properties may not have expected keys.
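
For readers unfamiliar with the pattern, a small standalone illustration (the dictionary is hypothetical):

    prop = {"name": "retention_ms"}                 # no "type" key present
    prop_type = prop.get("type")                    # None instead of a KeyError
    items_type = prop.get("items", {}).get("type")  # safe nested lookup, also None
    print(prop_type, items_type)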


202-208: Good error handling for JSON parsing

The code correctly implements error handling for JSON parsing, which captures potential file format issues and provides clear error messages.


231-237: Good error handling for file output operations

The code correctly handles potential file operation errors, such as permission issues or disk space limitations, with appropriate logging and error reporting.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

♻️ Duplicate comments (4)
tools/property-extractor/property_extractor.py (4)

36-36: ⚠️ Potential issue

--recursive flag has no effect

The --recursive flag isn't working as intended because path.rglob("*.h") is used in both conditional branches, making the recursive scanning always enabled.

-    file_iter = path.rglob("*.h") if options.recursive else path.rglob("*.h")
+    file_iter = path.rglob("*.h") if options.recursive else path.glob("*.h")

134-143: ⚠️ Potential issue

Add defensive checks to prevent KeyError exceptions

The current implementation may raise KeyError exceptions if properties are missing expected keys. Use .get() with default values to handle these cases safely.

         # guard against missing "type"
         prop_type = property.get("type")
         if prop_type and prop_type in definitions:
-            properties[name]["type"] = "#/definitions/{prop_type}"
+            properties[name]["type"] = f"#/definitions/{prop_type}"
         elif (
             prop_type == "array"
             and property.get("items", {}).get("type") in definitions
         ):
             properties[name]["items"]["type"] = (
                 f"#/definitions/{property['items']['type']}"
             )

14-14: 🛠️ Refactor suggestion

Avoid using wildcard imports

Importing with * pollutes the namespace and makes it difficult to determine which symbols are being used, leading to potential naming conflicts.

-from transformers import *
+from transformers import (
+    EnterpriseTransformer,
+    TypeTransformer,
+    MetaParamTransformer,
+    BasicInfoTransformer,
+    IsNullableTransformer,
+    IsArrayTransformer,
+    NeedsRestartTransformer,
+    GetsRestoredTransformer,
+    VisibilityTransformer,
+    DeprecatedTransformer,
+    IsSecretTransformer,
+    NumericBoundsTransformer,
+    DurationBoundsTransformer,
+    SimpleDefaultValuesTransformer,
+    FriendlyDefaultTransformer,
+    ExperimentalTransformer,
+    AliasTransformer,
+)
🧰 Tools
🪛 Ruff (0.8.2)

14-14: from transformers import * used; unable to detect undefined names

(F403)


88-107: ⚠️ Potential issue

Fix transformer initialization issues

There are several issues with the transformer setup:

  1. The in-code comment says EnterpriseTransformer must run first, so the shared instance has to replace the duplicate rather than precede it
  2. TypeTransformer is instantiated twice (lines 88 and 91)
  3. The transformers using type_transformer need to use the instance created at line 88
     type_transformer = TypeTransformer()
     transformers = [
         EnterpriseTransformer(), ## this must be the first, as it modifies current data
-        TypeTransformer(),
+        type_transformer,
         MetaParamTransformer(),
         BasicInfoTransformer(),
         IsNullableTransformer(),
         IsArrayTransformer(type_transformer),
         NeedsRestartTransformer(),
         GetsRestoredTransformer(),
         VisibilityTransformer(),
         DeprecatedTransformer(),
         IsSecretTransformer(),
         NumericBoundsTransformer(type_transformer),
         DurationBoundsTransformer(type_transformer),
         SimpleDefaultValuesTransformer(),
         FriendlyDefaultTransformer(),
         ExperimentalTransformer(),
         AliasTransformer(),
     ]
🧰 Tools
🪛 Ruff (0.8.2)

88-88: TypeTransformer may be undefined, or defined from star imports

(F405)


90-90: EnterpriseTransformer may be undefined, or defined from star imports

(F405)


91-91: TypeTransformer may be undefined, or defined from star imports

(F405)


92-92: MetaParamTransformer may be undefined, or defined from star imports

(F405)


93-93: BasicInfoTransformer may be undefined, or defined from star imports

(F405)


94-94: IsNullableTransformer may be undefined, or defined from star imports

(F405)


95-95: IsArrayTransformer may be undefined, or defined from star imports

(F405)


96-96: NeedsRestartTransformer may be undefined, or defined from star imports

(F405)


97-97: GetsRestoredTransformer may be undefined, or defined from star imports

(F405)


98-98: VisibilityTransformer may be undefined, or defined from star imports

(F405)


99-99: DeprecatedTransformer may be undefined, or defined from star imports

(F405)


100-100: IsSecretTransformer may be undefined, or defined from star imports

(F405)


101-101: NumericBoundsTransformer may be undefined, or defined from star imports

(F405)


102-102: DurationBoundsTransformer may be undefined, or defined from star imports

(F405)


103-103: SimpleDefaultValuesTransformer may be undefined, or defined from star imports

(F405)


104-104: FriendlyDefaultTransformer may be undefined, or defined from star imports

(F405)


105-105: ExperimentalTransformer may be undefined, or defined from star imports

(F405)


106-106: AliasTransformer may be undefined, or defined from star imports

(F405)

🧹 Nitpick comments (5)
tools/property-extractor/property_extractor.py (1)

78-80: Simplify nested conditional

This nested conditional can be simplified with a single if statement using and to combine the conditions.

-        # Check if any of the paths are in fp.implementation
-        if any(path in implementation_value for path in file_paths):
-            if len(properties) > 0:
-                files_with_properties.append((fp, properties))
+        # Check if any of the paths are in fp.implementation and properties exist
+        if any(path in implementation_value for path in file_paths) and len(properties) > 0:
+            files_with_properties.append((fp, properties))
🧰 Tools
🪛 Ruff (0.8.2)

78-79: Use a single if statement instead of nested if statements

(SIM102)

tools/property-extractor/transformers.py (4)

82-94: Simplify boolean expression and add missing return

The accepts method uses an unnecessarily verbose conditional expression. Also, the parse method is missing a return statement.

     def accepts(self, info, file_pair):
-        return (
-            True
-            if len(info["params"]) > 2 and "visibility" in info["params"][2]["value"]
-            else False
-        )
+        return bool(len(info["params"]) > 2 and "visibility" in info["params"][2]["value"])

     def parse(self, property, info, file_pair):
         property["visibility"] = re.sub(
             r"^.*::", "", info["params"][2]["value"]["visibility"]
         )
+        return property
🧰 Tools
🪛 Ruff (0.8.2)

85-87: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)


154-165: Simplify boolean expression and add missing return

The accepts method uses an unnecessarily verbose conditional expression. Also, the parse method is missing a return statement.

     def accepts(self, info, file_pair):
-        return (
-            True
-            if len(info["params"]) > 2 and "secret" in info["params"][2]["value"]
-            else False
-        )
+        return bool(len(info["params"]) > 2 and "secret" in info["params"][2]["value"])

     def parse(self, property, info, file_pair):
         is_secret = re.sub(r"^.*::", "", info["params"][2]["value"]["secret"])
         property["is_secret"] = is_secret == "yes"
+        return property
🧰 Tools
🪛 Ruff (0.8.2)

157-159: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)


242-242: Simplify boolean expression

This can be simplified to directly assign the boolean result.

-            property["default"] = True if default == "true" else False
+            property["default"] = default == "true"
🧰 Tools
🪛 Ruff (0.8.2)

242-242: Remove unnecessary True if ... else False

Remove unnecessary True if ... else False

(SIM210)


353-364: Document side effect in EnterpriseTransformer.parse

This transformer modifies the input info object by removing an element from info['params'], which could affect other transformers. Consider documenting this side effect for maintainability.

     def parse(self, property, info, file_pair):
         if info['params'] is not None:
             enterpriseValue = info['params'][0]['value']
             property['enterprise_value'] = enterpriseValue
             property['is_enterprise'] = True
+            # Note: modifying the input info object - this affects other transformers
             del info['params'][0]
             return property
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between db15af2 and 4936687.

📒 Files selected for processing (3)
  • tools/property-extractor/json-to-asciidoc/generate_docs.py (1 hunks)
  • tools/property-extractor/property_extractor.py (1 hunks)
  • tools/property-extractor/transformers.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tools/property-extractor/json-to-asciidoc/generate_docs.py
🧰 Additional context used
🧬 Code Graph Analysis (2)
tools/property-extractor/transformers.py (2)
tools/property-extractor/property_bag.py (1)
  • PropertyBag (1-4)
tools/property-extractor/parser.py (1)
  • normalize_string (87-88)
tools/property-extractor/property_extractor.py (4)
tools/property-extractor/file_pair.py (1)
  • FilePair (1-7)
tools/property-extractor/parser.py (2)
  • build_treesitter_cpp_library (223-224)
  • extract_properties_from_file_pair (203-220)
tools/property-extractor/property_bag.py (1)
  • PropertyBag (1-4)
tools/property-extractor/transformers.py (49)
  • TypeTransformer (96-138)
  • EnterpriseTransformer (353-363)
  • MetaParamTransformer (366-397)
  • BasicInfoTransformer (6-17)
  • IsNullableTransformer (20-35)
  • IsArrayTransformer (38-50)
  • NeedsRestartTransformer (53-63)
  • GetsRestoredTransformer (65-79)
  • VisibilityTransformer (82-93)
  • DeprecatedTransformer (141-151)
  • IsSecretTransformer (154-164)
  • NumericBoundsTransformer (167-191)
  • DurationBoundsTransformer (194-219)
  • SimpleDefaultValuesTransformer (222-272)
  • FriendlyDefaultTransformer (275-321)
  • ExperimentalTransformer (324-329)
  • AliasTransformer (332-350)
  • accepts (7-8)
  • accepts (21-22)
  • accepts (42-43)
  • accepts (54-55)
  • accepts (66-72)
  • accepts (83-88)
  • accepts (97-98)
  • accepts (142-147)
  • accepts (155-160)
  • accepts (171-173)
  • accepts (198-199)
  • accepts (223-225)
  • accepts (285-286)
  • accepts (325-326)
  • accepts (333-340)
  • accepts (354-355)
  • parse (10-17)
  • parse (24-35)
  • parse (45-50)
  • parse (57-63)
  • parse (74-79)
  • parse (90-93)
  • parse (136-138)
  • parse (149-151)
  • parse (162-164)
  • parse (175-191)
  • parse (201-219)
  • parse (227-272)
  • parse (288-321)
  • parse (327-329)
  • parse (342-350)
  • parse (357-363)
🪛 Ruff (0.8.2)
tools/property-extractor/transformers.py

85-87: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)


157-159: Use bool(...) instead of True if ... else False

Replace with `bool(...)`

(SIM210)


242-242: Remove unnecessary True if ... else False

Remove unnecessary True if ... else False

(SIM210)

tools/property-extractor/property_extractor.py

14-14: from transformers import * used; unable to detect undefined names

(F403)


78-79: Use a single if statement instead of nested if statements

(SIM102)


88-88: TypeTransformer may be undefined, or defined from star imports

(F405)


90-90: EnterpriseTransformer may be undefined, or defined from star imports

(F405)


91-91: TypeTransformer may be undefined, or defined from star imports

(F405)


92-92: MetaParamTransformer may be undefined, or defined from star imports

(F405)


93-93: BasicInfoTransformer may be undefined, or defined from star imports

(F405)


94-94: IsNullableTransformer may be undefined, or defined from star imports

(F405)


95-95: IsArrayTransformer may be undefined, or defined from star imports

(F405)


96-96: NeedsRestartTransformer may be undefined, or defined from star imports

(F405)


97-97: GetsRestoredTransformer may be undefined, or defined from star imports

(F405)


98-98: VisibilityTransformer may be undefined, or defined from star imports

(F405)


99-99: DeprecatedTransformer may be undefined, or defined from star imports

(F405)


100-100: IsSecretTransformer may be undefined, or defined from star imports

(F405)


101-101: NumericBoundsTransformer may be undefined, or defined from star imports

(F405)


102-102: DurationBoundsTransformer may be undefined, or defined from star imports

(F405)


103-103: SimpleDefaultValuesTransformer may be undefined, or defined from star imports

(F405)


104-104: FriendlyDefaultTransformer may be undefined, or defined from star imports

(F405)


105-105: ExperimentalTransformer may be undefined, or defined from star imports

(F405)


106-106: AliasTransformer may be undefined, or defined from star imports

(F405)

⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Redirect rules - docs-extensions-and-macros
  • GitHub Check: Header rules - docs-extensions-and-macros
  • GitHub Check: Pages changed - docs-extensions-and-macros
🔇 Additional comments (5)
tools/property-extractor/property_extractor.py (2)

203-210: Good error handling for JSON parsing

Proper error handling around JSON parsing is excellent practice. It prevents the script from failing with a cryptic traceback when the definitions file contains malformed JSON.
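
For readers following along, the guard being praised looks roughly like this sketch (written here in Node.js for illustration; the reviewed script is Python, and the names and messages are invented):

const fs = require('fs');

// Load a JSON definitions file, failing fast with a readable message
// instead of a raw stack trace when the file is missing or malformed.
function loadDefinitions(filePath) {
  let raw;
  try {
    raw = fs.readFileSync(filePath, 'utf8');
  } catch (err) {
    console.error(`Failed to read ${filePath}: ${err.message}`);
    process.exit(1);
  }
  try {
    return JSON.parse(raw);
  } catch (err) {
    console.error(`Malformed JSON in ${filePath}: ${err.message}`);
    process.exit(1);
  }
}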


232-239: Proper error handling for file operations

The try-except block for file output operations is a good practice to handle potential IO errors like permission issues or disk space constraints.
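
A minimal Node.js analog of that guarded write, with invented names:

const fs = require('fs');
const path = require('path');

// Write generated output, surfacing permission or disk-space errors with context.
function writeOutput(outputPath, data) {
  try {
    fs.mkdirSync(path.dirname(outputPath), { recursive: true });
    fs.writeFileSync(outputPath, JSON.stringify(data, null, 2));
    console.log(`Data written to ${outputPath} successfully.`);
  } catch (err) {
    console.error(`Failed to write ${outputPath}: ${err.message}`);
    process.exit(1);
  }
}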

tools/property-extractor/transformers.py (3)

1-4: Well-organized imports

The imports are clean, specific, and well-organized. Using local imports from other modules in the package is properly implemented.


222-273: Well-implemented default value parsing

The SimpleDefaultValuesTransformer provides comprehensive handling of various default value types with appropriate type conversions and formatting. The regex patterns are correctly written, and the transformations are well-structured.

🧰 Tools
🪛 Ruff (0.8.2)

242-242: Remove unnecessary True if ... else False

Remove unnecessary True if ... else False

(SIM210)
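
To make the kind of normalization described above concrete, here is a toy JavaScript version; the cases and regexes are invented for illustration and are not taken from the transformer:

// Normalize a raw default-value string into a typed value.
function normalizeDefault(raw) {
  if (raw == null) return null;
  const text = String(raw).trim();
  if (/^(true|false)$/i.test(text)) return text.toLowerCase() === 'true'; // booleans
  if (/^-?\d+$/.test(text)) return Number.parseInt(text, 10);             // integers
  if (/^-?\d*\.\d+$/.test(text)) return Number.parseFloat(text);          // floats
  const duration = text.match(/^(\d+)\s*ms$/);                            // simple durations
  if (duration) return { milliseconds: Number(duration[1]) };
  return text.replace(/^"(.*)"$/, '$1');                                  // strip surrounding quotes
}

normalizeDefault('3000ms');  // => { milliseconds: 3000 }
normalizeDefault('"warn"');  // => 'warn'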


380-386: Good defensive programming in MetaParamTransformer.parse

The method properly checks for the existence of 'params' and handles the case when it's None, returning the property gracefully. This prevents potential runtime errors.
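
Sketched in JavaScript with hypothetical names, the defensive shape is:

// Return the property untouched when there is nothing to parse.
function parseMetaParams(property, info) {
  if (!info || !Array.isArray(info.params) || info.params.length === 0) {
    return property;
  }
  property.params = info.params.map((p) => ({ ...p })); // copy entries rather than mutating the input
  return property;
}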

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
bin/doc-tools.js (1)

222-234: Create a dedicated dependency verification function for rpk-docs

The rpk-docs command uses verifyMetricsDependencies(), which checks for dependencies that may not all be required for RPK documentation generation.

As mentioned in a previous review, the generate-cluster-docs.sh script invoked by rpk-docs requires Python and Docker Compose support, but not necessarily curl or tar. A dedicated verification function would be more accurate.

-    verifyMetricsDependencies();
+    function verifyClusterDocsDependencies() {
+      checkPython();
+      checkDocker();
+    }
+    
+    verifyClusterDocsDependencies();
🧹 Nitpick comments (7)
tools/fetch-from-github.js (2)

17-22: Add error handling to the saveFile function

The saveFile function could fail if there are file system permission issues or other IO errors, but these aren't currently handled.

 async function saveFile(content, saveDir, filename) {
-  await fs.promises.mkdir(saveDir, { recursive: true });
-  const target = path.join(saveDir, filename);
-  await fs.promises.writeFile(target, content);
-  console.log(`Saved: ${target}`);
+  try {
+    await fs.promises.mkdir(saveDir, { recursive: true });
+    const target = path.join(saveDir, filename);
+    await fs.promises.writeFile(target, content);
+    console.log(`Saved: ${target}`);
+  } catch (error) {
+    throw new Error(`Failed to save file ${filename}: ${error.message}`);
+  }
 }

41-41: Add JSDoc documentation for the exported function

Adding JSDoc comments would make the module more maintainable and easier to use for other developers.

+/**
+ * Fetches a file or directory from a GitHub repository and saves it locally.
+ * 
+ * @param {string} owner - GitHub repository owner or organization
+ * @param {string} repo - GitHub repository name
+ * @param {string} remotePath - Path to the file or directory in the repository
+ * @param {string} saveDir - Local directory to save the file(s) to
+ * @param {string} [customFilename] - Optional custom filename for the saved file (only used when remotePath is a file)
+ * @returns {Promise<void>} - Resolves when all files have been fetched and saved
+ * @throws {Error} - If API requests fail or file saving fails
+ */
 module.exports = fetchFromGithub
bin/doc-tools.js (5)

109-109: Read package version from package.json instead of hardcoding

The CLI version is hardcoded as '1.0.0'. It would be better to read this from the package.json file to ensure it stays in sync with the actual package version.

+const packageJson = require('../package.json');

 programCli
   .name('doc-tools')
   .description('Redpanda Document Automation CLI')
-  .version('1.0.0');
+  .version(packageJson.version);

246-254: Add error logging for each step of the GitHub fetch process

The fetch command would benefit from more informative logging during the fetch process to help troubleshoot issues.

   try {
     const fetchFromGithub = await require('../tools/fetch-from-github.js');
+    console.log(`Fetching ${options.remotePath} from ${options.owner}/${options.repo}...`);
     // options.owner, options.repo, options.remotePath, options.saveDir, options.filename
     await fetchFromGithub(
       options.owner,
       options.repo,
       options.remotePath,
       options.saveDir,
       options.filename
     );
+    console.log(`Successfully fetched to ${options.saveDir}`);
   } catch (err) {

21-29: Improve checkCommandExists to return descriptive errors

The current checkCommandExists function only prints a generic error message. It would be more useful if it returned the actual error message for better debugging.

 function checkCommandExists(command) {
   try {
     execSync(`which ${command}`, { stdio: 'ignore' });
     return true;
   } catch (error) {
-    console.error(`Error: \`${command}\` is required but not found. Please install \`${command}\` and try again.`);
+    const errorMsg = `Error: \`${command}\` is required but not found. Please install \`${command}\` and try again.`;
+    console.error(errorMsg);
+    // Return the error message along with false for better error reporting
+    return { found: false, error: errorMsg };
   }
 }

95-98: Update usage of checkCommandExists to handle potential return object

If you implement the suggestion to improve checkCommandExists, you'll need to update this usage to handle the new return value format.

 function verifyMetricsDependencies() {
   checkPython();
-  if (!checkCommandExists('curl') || !checkCommandExists('tar')) {
+  const curlCheck = checkCommandExists('curl');
+  const tarCheck = checkCommandExists('tar');
+  if ((curlCheck !== true && (!curlCheck || !curlCheck.found)) || 
+      (tarCheck !== true && (!tarCheck || !tarCheck.found))) {
     // `checkCommandExists` already prints a helpful message.
     process.exit(1);
   }

68-75: Improve C++ compiler verification

The current implementation checks for the existence of gcc or clang but doesn't verify if they're actually working correctly. Consider adding a simple compile test.

 function checkCompiler() {
   const gccInstalled = checkCommandExists('gcc');
   const clangInstalled = checkCommandExists('clang');
   if (!gccInstalled && !clangInstalled) {
     console.error('Error: A C++ compiler (such as gcc or clang) is required but not found. Please install one.');
     process.exit(1);
+  } else {
+    // Verify the compiler actually works by trying to compile a simple program
+    try {
+      const compilerCmd = gccInstalled ? 'gcc' : 'clang';
+      const testFile = path.join('/tmp', `compiler-test-${Date.now()}.cpp`);
+      fs.writeFileSync(testFile, '#include <iostream>\nint main() { return 0; }');
+      execSync(`${compilerCmd} -o /dev/null ${testFile}`, { stdio: 'ignore' });
+      fs.unlinkSync(testFile);
+    } catch (error) {
+      console.error(`Error: C++ compiler found but failed to compile a test program. Please check your compiler installation.`);
+      process.exit(1);
+    }
   }
 }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4936687 and 8dfd458.

📒 Files selected for processing (3)
  • .gitignore (1 hunks)
  • bin/doc-tools.js (1 hunks)
  • tools/fetch-from-github.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • .gitignore
🧰 Additional context used
🧬 Code Graph Analysis (1)
tools/fetch-from-github.js (1)
bin/doc-tools.js (3)
  • require (3-3)
  • require (4-4)
  • fetchFromGithub (246-246)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: Redirect rules - docs-extensions-and-macros
  • GitHub Check: Header rules - docs-extensions-and-macros
  • GitHub Check: Pages changed - docs-extensions-and-macros
🔇 Additional comments (1)
bin/doc-tools.js (1)

206-211: ⚠️ Potential issue

Remove redundant process.exit call

The trailing process.exit(result.status) is redundant: when result.status is non-zero, the else if branch has already exited, and when it is zero, the call only exits with the success code that normal termination would produce anyway.

    if (result.error) {
      console.error('Failed to run `make build`:', result.error.message);
      process.exit(1);
    } else if (result.status !== 0) {
      console.error(`make build exited with status code ${result.status}`);
      process.exit(result.status);
    }
-    process.exit(result.status);

Likely an incorrect or invalid review comment.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (3)
tools/fetch-from-github.js (3)

8-18: Good handling of GitHub API token with appropriate warnings.

The code correctly uses the GitHub token when available and warns users about rate limits when no token is provided. This matches the previous review suggestion.
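
As a sketch of that token handling (the loader's exact shape is assumed here, not quoted from the file):

// Build an Octokit client, authenticating when GITHUB_TOKEN is set.
async function loadOctokit() {
  const { Octokit } = await import('@octokit/rest');
  const token = process.env.GITHUB_TOKEN;
  if (!token) {
    console.warn('Warning: GITHUB_TOKEN is not set; unauthenticated requests are limited to 60 per hour.');
  }
  return new Octokit(token ? { auth: token } : {});
}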


33-60: Comprehensive error handling with specific messages.

The try/catch block with specific handling for rate limits (403) and not found errors (404) is well-implemented. This addresses the previous review suggestion about improving error handling.


35-45: Effective recursive directory handling.

The implementation correctly handles both files and nested directories, maintaining the directory structure when fetching. This addresses the previous review suggestion about directory recursion.
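
Taken together, the two patterns look roughly like this condensed sketch (it assumes the module's path import and saveFile helper are in scope, and omits the custom-filename handling):

// Walk a repo path: recurse into directories, save files, and map API errors.
async function fetchPath(octokit, owner, repo, remotePath, saveDir) {
  try {
    const resp = await octokit.repos.getContent({ owner, repo, path: remotePath });
    if (Array.isArray(resp.data)) {
      for (const item of resp.data) {
        const nested = item.type === 'dir' ? path.join(saveDir, path.basename(item.path)) : saveDir;
        await fetchPath(octokit, owner, repo, item.path, nested);
      }
    } else {
      const content = Buffer.from(resp.data.content, 'base64');
      await saveFile(content, saveDir, path.basename(remotePath));
    }
  } catch (error) {
    if (error.status === 403) throw new Error('GitHub API rate limit exceeded; set GITHUB_TOKEN to continue.');
    if (error.status === 404) throw new Error(`Not found: ${owner}/${repo}/${remotePath}`);
    throw error;
  }
}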

🧹 Nitpick comments (3)
tools/fetch-from-github.js (3)

23-28: Consider adding overwrite protection for existing files.

The current implementation will overwrite files without warning if they already exist at the destination path.

 async function saveFile(content, saveDir, filename) {
   await fs.promises.mkdir(saveDir, { recursive: true });
   const target = path.join(saveDir, filename);
+  
+  // Check if file exists and handle accordingly
+  try {
+    await fs.promises.access(target, fs.constants.F_OK);
+    console.log(`Warning: Overwriting existing file at ${target}`);
+  } catch (error) {
+    // File doesn't exist, which is fine
+  }
+  
   await fs.promises.writeFile(target, content);
   console.log(`Saved: ${target}`);
 }

30-61: Consider adding support for specific branches or tags.

The current implementation fetches from the default branch, but it might be useful to specify a branch or tag.

-async function fetchFromGithub(owner, repo, remotePath, saveDir, customFilename) {
+async function fetchFromGithub(owner, repo, remotePath, saveDir, customFilename, ref = null) {
   const octokit = await loadOctokit();

   try {
-    const resp = await octokit.repos.getContent({ owner, repo, path: remotePath });
+    const params = { owner, repo, path: remotePath };
+    if (ref) {
+      params.ref = ref;
+    }
+    const resp = await octokit.repos.getContent(params);
     if (Array.isArray(resp.data)) {
       // directory
       for (const item of resp.data) {
         if (item.type === 'file') {
-          await fetchFromGithub(owner, repo, item.path, saveDir, customFilename);
+          await fetchFromGithub(owner, repo, item.path, saveDir, customFilename, ref);
         } else if (item.type === 'dir') {
           // For directories, maintain the directory structure
           const nestedDir = path.join(saveDir, path.basename(item.path));
-          await fetchFromGithub(owner, repo, item.path, nestedDir);
+          await fetchFromGithub(owner, repo, item.path, nestedDir, null, ref);
         }
       }
     } else {

63-64: Add the missing semicolon; consider ES module exports for consistency.

Since the file already loads its dependency with a dynamic await import(...), migrating to ES module exports would make the module style consistent. At minimum, terminate the export statement with a semicolon:

-module.exports = fetchFromGithub
+module.exports = fetchFromGithub;

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8dfd458 and 11f1b65.

📒 Files selected for processing (1)
  • tools/fetch-from-github.js (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
tools/fetch-from-github.js (1)
bin/doc-tools.js (3)
  • require (3-3)
  • require (4-4)
  • fetchFromGithub (246-246)
🔇 Additional comments (1)
tools/fetch-from-github.js (1)

1-64: Well-implemented GitHub fetching utility that addresses previous feedback.

This utility provides a robust way to fetch files or directories from GitHub repositories with proper error handling and directory structure preservation. I can see that you've already incorporated previous review suggestions about rate limit handling and nested directory support.

The implementation uses modern Node.js practices with async/await patterns and promise-based file operations.
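
As a usage sketch, fetching a single file into a local folder could look like this (the repository and path below are placeholders, not taken from the PR):

const fetchFromGithub = require('./tools/fetch-from-github.js');

fetchFromGithub('example-org', 'example-repo', 'docs/config.yaml', './downloads')
  .then(() => console.log('Fetch complete'))
  .catch((err) => console.error(err.message));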

@kbatuigas kbatuigas left a comment

In testing this for rpk reference docs generation, this works well, with the diff option being key for me. Thanks Jake!

@Feediver1
Copy link

Really like the summary!

Data written to /Users/joycefee/test/docs-extensions-and-macros/autogenerated/dev/properties/all_properties.txt successfully.
You have 2 properties with empty description. Percentage of errors: 0.36%. Data written in 'empty_description.txt'.
You have 46 deprecated properties. Percentage of errors: 8.19%. Data written in 'deprecated_properties.txt'.
Total properties read: 562
Total Broker properties: 26
Total Cluster properties: 355
Total Cloud properties: 102

@Feediver1 Feediver1 left a comment

Worked for me!

@JakeSCahill JakeSCahill merged commit b50adcd into main May 2, 2025
5 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Jul 22, 2025