Initial set of user docs translated xliff files from Crowdin #17106

michaelDCurran · 2024-09-03T03:52:43Z

This PR adds a newly generated changes.xliff for English, which has also been uploaded to Crowdin.

This PR updates the user docs GitHub action to upload the English changes.xliff to Crowdin if it has changed.

This PR also includes the initial set of translated user docs XLIFF files from Crowdin.
So far that is 20 translations of the user guide and 7 translations of the changes file (what's new).

SCons will see that these XLIFF files are newer than their markdown files, rebuild the markdown files from them, and then build the HTML from the rebuilt markdown files.
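
The "newer than" idea can be sketched as a plain timestamp comparison. This is only an illustration: `needsRebuild` is a hypothetical helper, not part of the sconstruct, and SCons's own up-to-date check is configurable and may rely on content signatures rather than modification times.

```python
import os


def needsRebuild(sourcePath: str, targetPath: str) -> bool:
	"""Return True if targetPath is missing or older than sourcePath.

	Hypothetical helper for illustration; not taken from the sconstruct.
	"""
	if not os.path.exists(targetPath):
		return True
	# Compare modification times: a newer source means the target is stale.
	return os.path.getmtime(sourcePath) > os.path.getmtime(targetPath)
```

Under this assumption, a translated xliff file pulled from Crowdin after the last build would cause its markdown file to be regenerated, and the html would follow from the regenerated markdown.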

coderabbitai · 2024-09-03T03:52:52Z

Walkthrough

The changes introduce a new GitHub Actions workflow for automating the synchronization of English user documentation with translation files. Additionally, a Python script is added to manage markdown translations through XLIFF files, including functionalities for generating, updating, and translating markdown content. Comprehensive unit tests for the translation module are also introduced to validate its functionality.

Changes

File(s)	Change Summary
`.github/workflows/regenerate_english_userDocs_translation_source.yml`	New workflow to update English user documentation for translation, handling Markdown and XLIFF files.
`sconstruct`	Added functionality to generate Markdown files from localized XLIFF files, excluding English.
`tests/unit/test_markdownTranslate.py`	Introduced unit tests for the `markdownTranslate` module, validating various translation functionalities.
`user_docs/markdownTranslate.py`	New script for managing markdown translations through XLIFF files, including generation and updating functions.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant GitHub Actions
    participant Python Script
    participant Crowdin

    User->>GitHub Actions: Push changes to beta branch
    GitHub Actions->>Python Script: Check modified Markdown files
    Python Script->>Python Script: Update corresponding XLIFF files
    Python Script-->>GitHub Actions: Commit changes to XLIFF
    GitHub Actions->>Crowdin: Upload updated XLIFF file

michaelDCurran · 2024-09-03T03:56:22Z

Err, this should be for beta.

michaelDCurran · 2024-09-03T03:58:08Z

Fixed now.

coderabbitai

Caution

Inline review comments failed to post

Actionable comments posted: 14

Outside diff range, codebase verification and nitpick comments (2)

tests/unit/test_markdownTranslate.py (1)
15-39: LGTM: Well-structured test class setup with a minor suggestion.

The test class is well-organized with appropriate setup and teardown methods. The helper method runMarkdownTranslateCommand is a good practice for running commands and handling errors.

Consider capturing and including the command output in the error message for easier debugging:
	def runMarkdownTranslateCommand(self, description: str, args: list[str]):
		try:
			result = subprocess.run([sys.executable, self.markdownTranslateScriptPath, *args], check=True, capture_output=True, text=True)
		except subprocess.CalledProcessError as e:
			message = f"Failed when trying to {description} with command: {' '.join(args)}\nOutput: {e.output}\nError: {e.stderr}"
			self.fail(message)
user_docs/markdownTranslate.py (1)
50-61: Enhance security by sanitizing filePath in getLastCommitID.

The getLastCommitID function does not sanitize the filePath argument before using it in a subprocess.run call. This poses a potential security risk for command injection. Consider adding validation or sanitization to ensure filePath is safe to use.

Location: user_docs/markdownTranslate.py, lines 50-61

Analysis chain

Enhance security in git command execution.

The getLastCommitID function uses subprocess.run to execute a git command. While it uses check=True to raise an exception on non-zero exit codes, it's important to ensure that the filePath argument is properly sanitized to prevent command injection.

Verify that filePath is properly sanitized before being used in the git command:
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Check if filePath is properly sanitized before being used in git commands
rg -n 'subprocess\.run\(\["git".*\$\{?filePath' user_docs/markdownTranslate.py
Length of output: 80

Script:
#!/bin/bash
# Search for the getLastCommitID function definition and examine the usage of filePath
ast-grep --lang python --pattern $'def getLastCommitID(filePath: $_) {\n  $$$\n}' user_docs/markdownTranslate.py
Length of output: 577

Script:
#!/bin/bash
# Correct the pattern to search for the getLastCommitID function definition and examine the usage of filePath
ast-grep --lang python --pattern 'def getLastCommitID(filePath):' user_docs/markdownTranslate.py -A 10
Length of output: 104

Script:
#!/bin/bash
# Use rg to search for the getLastCommitID function definition in the file
rg -A 10 'def getLastCommitID' user_docs/markdownTranslate.py
Length of output: 480
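
For context on the concern above, here is a minimal sketch of how a path can be handed to git without going through a shell. The exact git arguments used by getLastCommitID are not shown in this thread, so `buildLastCommitCommand` and `getLastCommitIDSketch` are hypothetical names and the argument list is an assumption.

```python
import subprocess


def buildLastCommitCommand(filePath: str) -> list[str]:
	# Hypothetical helper: the real getLastCommitID arguments are not shown in this thread.
	# "--" marks the end of options, so a value such as "--output=x" is treated as a path.
	return ["git", "log", "-n", "1", "--pretty=format:%H", "--", filePath]


def getLastCommitIDSketch(filePath: str) -> str:
	# The command is a list and shell=True is not used, so shell metacharacters
	# in filePath are passed to git as literal text rather than interpreted.
	result = subprocess.run(
		buildLastCommitCommand(filePath),
		check=True,
		capture_output=True,
		text=True,
	)
	return result.stdout.strip()
```

When arguments are passed as a list like this, the remaining risk is mostly a path being mistaken for an option, which the `--` separator addresses.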

Comments failed to post (14)

.github/workflows/regenerate_english_userDocs_translation_source.yml (4)

11-29: Consider using a requirements.txt file for dependency management.

The current setup installs the required Python packages directly in the workflow. While this works, using a requirements.txt file would be a more maintainable approach, especially if the number of dependencies grows in the future.

Consider creating a requirements.txt file in your repository with the following content:
lxml
requests
Then, modify the "Install dependencies" step as follows:
- name: Install dependencies
  run: |
    python -m pip install --upgrade pip
    pip install -r requirements.txt
This approach centralizes dependency management and makes it easier to maintain and update dependencies in the future.

30-53: Enhance error handling and logging in the XLIFF update process.

While the current implementation is functional, it could benefit from improved error handling and more detailed logging. This would make troubleshooting easier if issues arise during the XLIFF update process.

Consider the following enhancements:

Add error handling around the Python script execution:
try {
    python user_docs/markdownTranslate.py updateXliff -x $xliff -m $file -o $tempXliff
    if ($LASTEXITCODE -ne 0) {
        throw "Python script failed with exit code $LASTEXITCODE"
    }
} catch {
    Write-Error "Failed to update XLIFF file: $_"
    exit 1
}
Add more detailed logging:
Write-Host "Starting XLIFF update process for $file"
# ... existing code ...
Write-Host "XLIFF update process completed successfully for $file"
Consider adding a summary at the end of the process:
$updatedFiles = @()
# ... in the foreach loop ...
$updatedFiles += $xliff
# ... after the loop ...
Write-Host "XLIFF update process completed. Updated files: $($updatedFiles -join ', ')"
These changes will provide more visibility into the process and make it easier to identify and resolve any issues that may occur.

55-83: Improve security handling of SSH key.

The current implementation writes the SSH private key to a file, which could potentially be a security risk if the runner is compromised.

Consider using the ssh-agent to manage the SSH key more securely. Here's a suggested improvement:
- name: Set up SSH key
  env:
    SSH_PRIVATE_KEY: ${{ secrets.XLIFF_DEPLOY_PRIVATE_KEY }}
    # Both variables live under a single env key; a step cannot declare env twice.
    SSH_AUTH_SOCK: /tmp/ssh_agent.sock
  run: |
    mkdir -p ~/.ssh
    ssh-keyscan github.com >> ~/.ssh/known_hosts
    ssh-agent -a $SSH_AUTH_SOCK > /dev/null
    echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add -

- name: Commit and Push changes
  env:
    # Reuse the agent socket so git can authenticate without a key file on disk.
    SSH_AUTH_SOCK: /tmp/ssh_agent.sock
  run: |
    # ... rest of your existing script ...
This approach uses ssh-agent to manage the key in memory, reducing the risk of the key being exposed on disk.

Additionally, consider using a GitHub Action like webfactory/ssh-agent which handles SSH key setup securely:
- uses: webfactory/ssh-agent@<version>
  with:
    ssh-private-key: ${{ secrets.XLIFF_DEPLOY_PRIVATE_KEY }}
This would replace the manual SSH key setup and provide a more secure, tested solution.

85-103: Enhance Crowdin upload process.

While the current implementation works, it could benefit from improved error handling and more flexibility.

Consider the following improvements:

Add error handling for the Python script execution:
try {
    python appVeyor/crowdinSync.py uploadSourceFile 18 user_docs/en/userguide.xliff
    if ($LASTEXITCODE -ne 0) {
        throw "Crowdin upload failed with exit code $LASTEXITCODE"
    }
    Write-Host "Successfully uploaded userGuide.xliff to Crowdin"
} catch {
    Write-Error "Failed to upload to Crowdin: $_"
    exit 1
}
Make the file ID configurable:

Instead of hardcoding the file ID (18), consider storing it in a GitHub secret or variable:
env:
  crowdinProjectID: ${{ vars.CROWDIN_PROJECT_ID }}
  crowdinAuthToken: ${{ secrets.CROWDIN_AUTH_TOKEN }}
  crowdinUserGuideFileID: ${{ vars.CROWDIN_USERGUIDE_FILE_ID }}
Then use this variable in your script:
python appVeyor/crowdinSync.py uploadSourceFile $env:crowdinUserGuideFileID user_docs/en/userguide.xliff
Consider adding a mechanism to upload other XLIFF files if needed in the future:
$xliffFiles = @{
    "userGuide.xliff" = $env:crowdinUserGuideFileID
    # Add more files here as needed
}

foreach ($file in $xliffFiles.Keys) {
    $changed = git diff --name-only ${{GITHUB.SHA}}.. -- "user_docs/en/$file"
    if ($changed) {
        Write-Host "Uploading $file to Crowdin"
        python appVeyor/crowdinSync.py uploadSourceFile $xliffFiles[$file] "user_docs/en/$file"
    } else {
        Write-Host "Not uploading $file to Crowdin as it has not changed"
    }
}
These changes will make the Crowdin upload process more robust, flexible, and easier to maintain as the project grows.

tests/unit/test_markdownTranslate.py (1)

41-135: LGTM: Comprehensive test method with suggestions for improvement.

The test_markdownTranslate method covers various functionalities of the markdownTranslate script, including generating XLIFF files, regenerating markdown files, updating XLIFF files, and translating to French. Each step is followed by a verification, which is a good practice.

Consider the following improvements for better readability and maintainability:

Break down the large test method into smaller, focused test methods. This will make it easier to identify which specific functionality fails if a test doesn't pass.

Use parameterized tests to reduce code duplication for similar test cases.

Create helper methods for common operations, such as file path creation.

Here's an example of how you could refactor a part of the test:
import unittest
from parameterized import parameterized

class TestMarkdownTranslate(unittest.TestCase):
    # ... (existing setup code) ...

    def _get_file_path(self, filename):
        return os.path.join(self.outDir.name if filename.startswith("rebuilt_") else self.testDir, filename)

    @parameterized.expand([
        ("2024.2", "en_2024.2_userGuide"),
        ("2024.3beta6", "en_2024.3beta6_userGuide"),
    ])
    def test_generate_and_verify_markdown(self, version, file_prefix):
        xliff_file = f"{file_prefix}.xliff"
        md_file = f"{file_prefix}.md"
        rebuilt_md_file = f"rebuilt_{md_file}"

        self.runMarkdownTranslateCommand(
            f"Generate an xliff file from the English {version} user guide markdown file",
            ["generateXliff", "-m", self._get_file_path(md_file), "-o", self._get_file_path(xliff_file)],
        )

        self.runMarkdownTranslateCommand(
            f"Regenerate the {version} markdown file from the generated {version} xliff file",
            ["generateMarkdown", "-x", self._get_file_path(xliff_file), "-o", self._get_file_path(rebuilt_md_file), "-u"],
        )

        self.runMarkdownTranslateCommand(
            f"Ensure the regenerated {version} markdown file matches the original {version} markdown file",
            ["ensureMarkdownFilesMatch", self._get_file_path(rebuilt_md_file), self._get_file_path(md_file)],
        )

    # ... (other test methods) ...
This refactoring improves readability, reduces duplication, and makes it easier to add new test cases in the future.
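
If adding the third-party `parameterized` package is not desirable, the standard library's `subTest` gives a similar effect. The sketch below is an assumption-laden illustration: `fileNamesForPrefix` and `TestFileNames` are hypothetical names, and the real test would call runMarkdownTranslateCommand inside the loop instead of checking file names.

```python
import unittest


def fileNamesForPrefix(filePrefix: str) -> tuple[str, str, str]:
	"""Hypothetical helper: derive the xliff, markdown and rebuilt markdown file names."""
	mdFile = f"{filePrefix}.md"
	return f"{filePrefix}.xliff", mdFile, f"rebuilt_{mdFile}"


class TestFileNames(unittest.TestCase):
	def test_fileNames(self):
		for prefix in ("en_2024.2_userGuide", "en_2024.3beta6_userGuide"):
			# Each iteration is reported separately, so one failing version
			# does not hide the result for the other.
			with self.subTest(prefix=prefix):
				xliffFile, mdFile, rebuiltMdFile = fileNamesForPrefix(prefix)
				self.assertEqual(xliffFile, f"{prefix}.xliff")
				self.assertEqual(mdFile, f"{prefix}.md")
				self.assertEqual(rebuiltMdFile, f"rebuilt_{prefix}.md")
```

The trade-off is that `subTest` keeps everything in one test method, whereas `parameterized.expand` generates a separate test method per case.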

user_docs/markdownTranslate.py (9)

544-649: Consider organizing CLI setup into a separate function.

The command-line interface setup is quite lengthy and could benefit from being organized into a separate function for better readability and maintainability.

Consider refactoring the CLI setup into a separate function:

+def setup_cli():
+    mainParser = argparse.ArgumentParser()
+    commandParser = mainParser.add_subparsers(title="commands", dest="command", required=True)
+    
+    # Setup for generateXliff
+    generateXliffParser = commandParser.add_parser("generateXliff")
+    generateXliffParser.add_argument(
+        "-m",
+        "--markdown",
+        dest="md",
+        type=str,
+        required=True,
+        help="The markdown file to generate the xliff file for",
+    )
+    generateXliffParser.add_argument(
+        "-o", "--output", dest="output", type=str, required=True, help="The file to output the xliff file to"
+    )
+    
+    # Setup for other commands...
+    
+    return mainParser

 if __name__ == "__main__":
-    mainParser = argparse.ArgumentParser()
-    commandParser = mainParser.add_subparsers(title="commands", dest="command", required=True)
-    # ... (rest of the CLI setup)
+    parser = setup_cli()
+    args = parser.parse_args()
     
-    args = mainParser.parse_args()
     match args.command:
         case "generateXliff":
             generateXliff(mdPath=args.md, outputPath=args.output)
         # ... (rest of the command handling)

Committable suggestion was skipped due to low confidence.

38-47: Consider adding error handling for file operations.

The createAndDeleteTempFilePath_contextManager function is well-implemented as a context manager. However, it might be beneficial to add error handling for file creation and deletion operations.

Consider wrapping the file operations in try-except blocks to handle potential IOErrors:

 @contextlib.contextmanager
 def createAndDeleteTempFilePath_contextManager(
 	dir: str | None = None, prefix: str | None = None, suffix: str | None = None
 ) -> Generator[str, None, None]:
 	"""A context manager that creates a temporary file and deletes it when the context is exited"""
 	with tempfile.NamedTemporaryFile(dir=dir, prefix=prefix, suffix=suffix, delete=False) as tempFile:
 		tempFilePath = tempFile.name
 		tempFile.close()
 		yield tempFilePath
+		try:
 			os.remove(tempFilePath)
+		except OSError as e:
+			print(f"Error deleting temporary file {tempFilePath}: {e}")

Committable suggestion was skipped due to low confidence.

397-442: Consider adding a progress bar for better user feedback.

The generateMarkdown function processes potentially large files. Adding a progress bar could provide better feedback to the user.

Consider using the tqdm library to add a progress bar:

+from tqdm import tqdm
+
 def generateMarkdown(xliffPath: str, outputPath: str, translated: bool = True) -> Result_generateMarkdown:
 	print(f"Generating markdown file {prettyPathString(outputPath)} from {prettyPathString(xliffPath)}...")
 	res = Result_generateMarkdown()
 	with contextlib.ExitStack() as stack:
 		outputFile = stack.enter_context(open(outputPath, "w", encoding="utf8", newline=""))
 		xliff = lxml.etree.parse(xliffPath)
 		xliffRoot = xliff.getroot()
 		namespace = {"xliff": "urn:oasis:names:tc:xliff:document:2.0"}
 		if xliffRoot.tag != "{urn:oasis:names:tc:xliff:document:2.0}xliff":
 			raise ValueError("Not an xliff file")
 		skeletonNode = xliffRoot.find("./xliff:file/xliff:skeleton", namespaces=namespace)
 		if skeletonNode is None:
 			raise ValueError("No skeleton found in xliff file")
 		skeletonContent = xmlUnescape(skeletonNode.text).strip()
+		total_lines = len(skeletonContent.splitlines())
+		pbar = tqdm(total=total_lines, desc="Generating Markdown")
 		for line in skeletonContent.splitlines(keepends=True):
 			res.numTotalLines += 1
 			if m := re_translationID.match(line):
 				prefix, ID, suffix = m.groups()
 				res.numTranslatableStrings += 1
 				unit = xliffRoot.find(f'./xliff:file/xliff:unit[@id="{ID}"]', namespaces=namespace)
 				if unit is not None:
 					segment = unit.find("./xliff:segment", namespaces=namespace)
 					if segment is not None:
 						source = segment.find("./xliff:source", namespaces=namespace)
 						if translated:
 							target = segment.find("./xliff:target", namespaces=namespace)
 						else:
 							target = None
 						if target is not None and target.text:
 							res.numTranslatedStrings += 1
 							translation = xmlUnescape(target.text)
 						elif source is not None and source.text:
 							translation = xmlUnescape(source.text)
 						else:
 							raise ValueError(f"No source or target found for unit {ID}")
 					else:
 						raise ValueError(f"No segment found for unit {ID}")
 				else:
 					raise ValueError(f"Cannot locate Unit {ID} in xliff file")
 				outputFile.write(f"{prefix}{translation}{suffix}\n")
 			else:
 				outputFile.write(line)
+			pbar.update(1)
+		pbar.close()
 		print(
 			f"Generated markdown file with {res.numTotalLines} total lines, {res.numTranslatableStrings} translatable strings, and {res.numTranslatedStrings} translated strings"
 		)
 		return res

Committable suggestion was skipped due to low confidence.
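
The target-then-source fallback in generateMarkdown can be shown in isolation. This is a simplified sketch under stated assumptions: it uses the standard library's `xml.etree.ElementTree` instead of lxml, omits the xmlUnescape step, and builds a dictionary of units up front; `buildTranslationIndex` and `SAMPLE_XLIFF` are hypothetical names.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = {"xliff": "urn:oasis:names:tc:xliff:document:2.0"}

# Hypothetical two-unit document: unit 1 is translated, unit 2 is not.
SAMPLE_XLIFF = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en" trgLang="fr">
<file id="sample.md">
<unit id="1"><segment><source>Hello</source><target>Bonjour</target></segment></unit>
<unit id="2"><segment><source>World</source></segment></unit>
</file>
</xliff>"""


def buildTranslationIndex(xliffText: str, translated: bool = True) -> dict[str, str]:
	"""Map each unit ID to its target text, falling back to the source text."""
	root = ET.fromstring(xliffText)
	index: dict[str, str] = {}
	for unit in root.iterfind("./xliff:file/xliff:unit", XLIFF_NS):
		unitID = unit.get("id", "")
		source = unit.find("./xliff:segment/xliff:source", XLIFF_NS)
		# Only look at the target when translated output is requested.
		target = unit.find("./xliff:segment/xliff:target", XLIFF_NS) if translated else None
		if target is not None and target.text:
			index[unitID] = target.text
		elif source is not None and source.text:
			index[unitID] = source.text
		else:
			raise ValueError(f"No source or target found for unit {unitID}")
	return index


print(buildTranslationIndex(SAMPLE_XLIFF))
```

Indexing the units once means each skeleton line needs only a dictionary lookup, rather than a separate search of the tree per translatable line.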

75-80: Consider adding error handling for file path operations.

The getRawGithubURLForPath function combines multiple operations. It might be beneficial to add error handling for cases where the file path is invalid or not within the git repository.

Consider adding a try-except block to handle potential OSError or ValueError exceptions:

 def getRawGithubURLForPath(filePath: str) -> str:
-	gitDirPath = getGitDir()
-	commitID = getLastCommitID(filePath)
-	relativePath = os.path.relpath(os.path.abspath(filePath), gitDirPath)
-	relativePath = relativePath.replace("\\", "/")
-	return f"{RAW_GITHUB_REPO_URL}/{commitID}/{relativePath}"
+	try:
+		gitDirPath = getGitDir()
+		commitID = getLastCommitID(filePath)
+		relativePath = os.path.relpath(os.path.abspath(filePath), gitDirPath)
+		relativePath = relativePath.replace("\\", "/")
+		return f"{RAW_GITHUB_REPO_URL}/{commitID}/{relativePath}"
+	except (OSError, ValueError) as e:
+		raise ValueError(f"Error generating GitHub URL for {filePath}: {e}")

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

def getRawGithubURLForPath(filePath: str) -> str:
	try:
		gitDirPath = getGitDir()
		commitID = getLastCommitID(filePath)
		relativePath = os.path.relpath(os.path.abspath(filePath), gitDirPath)
		relativePath = relativePath.replace("\\", "/")
		return f"{RAW_GITHUB_REPO_URL}/{commitID}/{relativePath}"
	except (OSError, ValueError) as e:
		raise ValueError(f"Error generating GitHub URL for {filePath}: {e}")
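
One detail worth considering with this kind of wrapper is exception chaining. The sketch below covers only the relative-path step and is an illustration, not the module's code: `toRepoRelativePosixPath` is a hypothetical helper, and it uses `raise ... from e` so the original traceback stays attached to the new error.

```python
import os


def toRepoRelativePosixPath(filePath: str, repoRoot: str) -> str:
	"""Return filePath relative to repoRoot, using forward slashes."""
	try:
		relativePath = os.path.relpath(os.path.abspath(filePath), repoRoot)
	except ValueError as e:
		# On Windows, relpath raises ValueError when the paths are on different drives.
		raise ValueError(f"Cannot express {filePath} relative to {repoRoot}") from e
	# A result that climbs out of repoRoot would produce a meaningless URL.
	if relativePath == os.pardir or relativePath.startswith(os.pardir + os.sep):
		raise ValueError(f"{filePath} is outside {repoRoot}")
	return relativePath.replace("\\", "/")
```

Checking for a path outside the repository is an extra assumption here; the original function does not state how it treats such paths.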

650-673: Consider adding global error handling and logging.

The main execution block could benefit from a global try-except block to catch and log any unexpected errors that might occur during command execution.

Consider adding a global try-except block and logging:

+import logging
+
+logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
+
 if __name__ == "__main__":
-    match args.command:
-        case "generateXliff":
-            generateXliff(mdPath=args.md, outputPath=args.output)
-        case "updateXliff":
-            updateXliff(
-                xliffPath=args.xliff,
-                mdPath=args.md,
-                outputPath=args.output,
-            )
-        case "generateMarkdown":
-            generateMarkdown(xliffPath=args.xliff, outputPath=args.output, translated=args.translated)
-        case "translateXliff":
-            translateXliff(
-                xliffPath=args.xliff,
-                lang=args.lang,
-                pretranslatedMdPath=args.pretranslatedMd,
-                outputPath=args.output,
-            )
-        case "pretranslateLangs":
-            pretranslateAllPossibleLanguages(langsDir=args.langsDir, mdBaseName=args.mdBaseName)
-        case "ensureMarkdownFilesMatch":
-            ensureMarkdownFilesMatch(path1=args.path1, path2=args.path2)
-        case _:
-            raise ValueError(f"Unknown command: {args.command}")
+    try:
+        match args.command:
+            case "generateXliff":
+                generateXliff(mdPath=args.md, outputPath=args.output)
+            case "updateXliff":
+                updateXliff(
+                    xliffPath=args.xliff,
+                    mdPath=args.md,
+                    outputPath=args.output,
+                )
+            case "generateMarkdown":
+                generateMarkdown(xliffPath=args.xliff, outputPath=args.output, translated=args.translated)
+            case "translateXliff":
+                translateXliff(
+                    xliffPath=args.xliff,
+                    lang=args.lang,
+                    pretranslatedMdPath=args.pretranslatedMd,
+                    outputPath=args.output,
+                )
+            case "pretranslateLangs":
+                pretranslateAllPossibleLanguages(langsDir=args.langsDir, mdBaseName=args.mdBaseName)
+            case "ensureMarkdownFilesMatch":
+                ensureMarkdownFilesMatch(path1=args.path1, path2=args.path2)
+            case _:
+                raise ValueError(f"Unknown command: {args.command}")
+    except Exception as e:
+        logging.error(f"An error occurred: {e}")
+        raise

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

import logging

logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

if __name__ == "__main__":
	try:
		match args.command:
			case "generateXliff":
				generateXliff(mdPath=args.md, outputPath=args.output)
			case "updateXliff":
				updateXliff(
					xliffPath=args.xliff,
					mdPath=args.md,
					outputPath=args.output,
				)
			case "generateMarkdown":
				generateMarkdown(xliffPath=args.xliff, outputPath=args.output, translated=args.translated)
			case "translateXliff":
				translateXliff(
					xliffPath=args.xliff,
					lang=args.lang,
					pretranslatedMdPath=args.pretranslatedMd,
					outputPath=args.output,
				)
			case "pretranslateLangs":
				pretranslateAllPossibleLanguages(langsDir=args.langsDir, mdBaseName=args.mdBaseName)
			case "ensureMarkdownFilesMatch":
				ensureMarkdownFilesMatch(path1=args.path1, path2=args.path2)
			case _:
				raise ValueError(f"Unknown command: {args.command}")
	except Exception as e:
		logging.error(f"An error occurred: {e}")
		raise
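
A variation on the same idea is to log the full traceback and turn failures into an exit code. This is a hedged sketch, not the script's behaviour: `runCommand`, the handler dictionary and the exit codes 1 and 2 are all hypothetical, chosen only to show `logging.exception` in use.

```python
import logging
from collections.abc import Callable

log = logging.getLogger("markdownTranslateSketch")


def runCommand(handlers: dict[str, Callable[[], None]], command: str) -> int:
	"""Run the handler registered for command and return a process exit code."""
	handler = handlers.get(command)
	if handler is None:
		log.error("Unknown command: %s", command)
		return 2
	try:
		handler()
	except Exception:
		# logging.exception records the message together with the traceback.
		log.exception("Command %s failed", command)
		return 1
	return 0
```

A caller could pass the result to `sys.exit`, so a CI step fails cleanly while the log still shows where the error came from.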

117-135: Consider adding more detailed logging.

The generateSkeleton function could benefit from more detailed logging, especially for larger files where the process might take some time.

Consider adding more granular logging:

 def generateSkeleton(mdPath: str, outputPath: str) -> Result_generateSkeleton:
 	print(f"Generating skeleton file {prettyPathString(outputPath)} from {prettyPathString(mdPath)}...")
 	res = Result_generateSkeleton()
 	with (
 		open(mdPath, "r", encoding="utf8") as mdFile,
 		open(outputPath, "w", encoding="utf8", newline="") as outputFile,
 	):
+		total_lines = sum(1 for _ in mdFile)
+		mdFile.seek(0)
 		for mdLine in mdFile.readlines():
 			res.numTotalLines += 1
 			skelLine = skeletonizeLine(mdLine)
 			if skelLine:
 				res.numTranslationPlaceholders += 1
 			else:
 				skelLine = mdLine
 			outputFile.write(skelLine)
+			if res.numTotalLines % 1000 == 0:
+				print(f"Processed {res.numTotalLines}/{total_lines} lines...")
 		print(
 			f"Generated skeleton file with {res.numTotalLines} total lines and {res.numTranslationPlaceholders} translation placeholders"
 		)
 		return res

Committable suggestion was skipped due to low confidence.

165-209: Consider breaking down the updateSkeleton function.

The updateSkeleton function is quite long and complex. Consider breaking it down into smaller, more manageable functions to improve readability and maintainability.

Consider extracting the main loop into a separate function:

 def updateSkeleton(
 	origMdPath: str, newMdPath: str, origSkelPath: str, outputPath: str
 ) -> Result_updateSkeleton:
 	print(
 		f"Creating updated skeleton file {prettyPathString(outputPath)} from {prettyPathString(origSkelPath)} with changes from {prettyPathString(origMdPath)} to {prettyPathString(newMdPath)}..."
 	)
 	res = Result_updateSkeleton()
 	with contextlib.ExitStack() as stack:
 		origMdFile = stack.enter_context(open(origMdPath, "r", encoding="utf8"))
 		newMdFile = stack.enter_context(open(newMdPath, "r", encoding="utf8"))
 		origSkelFile = stack.enter_context(open(origSkelPath, "r", encoding="utf8"))
 		outputFile = stack.enter_context(open(outputPath, "w", encoding="utf8", newline=""))
 		mdDiff = difflib.ndiff(origMdFile.readlines(), newMdFile.readlines())
 		origSkelLines = iter(origSkelFile.readlines())
+		res = process_diff_lines(mdDiff, origSkelLines, outputFile)
+		print(
+			f"Updated skeleton file with {res.numAddedLines} added lines "
+			f"({res.numAddedTranslationPlaceholders} translation placeholders), "
+			f"{res.numRemovedLines} removed lines ({res.numRemovedTranslationPlaceholders} translation placeholders), "
+			f"and {res.numUnchangedLines} unchanged lines ({res.numUnchangedTranslationPlaceholders} translation placeholders)"
+		)
+		return res
+
+def process_diff_lines(mdDiff, origSkelLines, outputFile) -> Result_updateSkeleton:
+	res = Result_updateSkeleton()
 		for mdDiffLine in mdDiff:
 			if mdDiffLine.startswith("?"):
 				continue
 			if mdDiffLine.startswith(" "):
 				res.numUnchangedLines += 1
 				skelLine = next(origSkelLines)
 				if re_translationID.match(skelLine):
 					res.numUnchangedTranslationPlaceholders += 1
 				outputFile.write(skelLine)
 			elif mdDiffLine.startswith("+"):
 				res.numAddedLines += 1
 				skelLine = skeletonizeLine(mdDiffLine[2:])
 				if skelLine:
 					res.numAddedTranslationPlaceholders += 1
 				else:
 					skelLine = mdDiffLine[2:]
 				outputFile.write(skelLine)
 			elif mdDiffLine.startswith("-"):
 				res.numRemovedLines += 1
 				origSkelLine = next(origSkelLines)
 				if re_translationID.match(origSkelLine):
 					res.numRemovedTranslationPlaceholders += 1
 			else:
 				raise ValueError(f"Unexpected diff line: {mdDiffLine}")
-		print(
-			f"Updated skeleton file with {res.numAddedLines} added lines "
-			f"({res.numAddedTranslationPlaceholders} translation placeholders), "
-			f"{res.numRemovedLines} removed lines ({res.numRemovedTranslationPlaceholders} translation placeholders), "
-			f"and {res.numUnchangedLines} unchanged lines ({res.numUnchangedTranslationPlaceholders} translation placeholders)"
-		)
 		return res

Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

def updateSkeleton(
	origMdPath: str, newMdPath: str, origSkelPath: str, outputPath: str
) -> Result_updateSkeleton:
	print(
		f"Creating updated skeleton file {prettyPathString(outputPath)} from {prettyPathString(origSkelPath)} with changes from {prettyPathString(origMdPath)} to {prettyPathString(newMdPath)}..."
	)
	res = Result_updateSkeleton()
	with contextlib.ExitStack() as stack:
		origMdFile = stack.enter_context(open(origMdPath, "r", encoding="utf8"))
		newMdFile = stack.enter_context(open(newMdPath, "r", encoding="utf8"))
		origSkelFile = stack.enter_context(open(origSkelPath, "r", encoding="utf8"))
		outputFile = stack.enter_context(open(outputPath, "w", encoding="utf8", newline=""))
		mdDiff = difflib.ndiff(origMdFile.readlines(), newMdFile.readlines())
		origSkelLines = iter(origSkelFile.readlines())
		res = process_diff_lines(mdDiff, origSkelLines, outputFile)
		print(
			f"Updated skeleton file with {res.numAddedLines} added lines "
			f"({res.numAddedTranslationPlaceholders} translation placeholders), "
			f"{res.numRemovedLines} removed lines ({res.numRemovedTranslationPlaceholders} translation placeholders), "
			f"and {res.numUnchangedLines} unchanged lines ({res.numUnchangedTranslationPlaceholders} translation placeholders)"
		)
		return res

def process_diff_lines(mdDiff, origSkelLines, outputFile) -> Result_updateSkeleton:
	res = Result_updateSkeleton()
	for mdDiffLine in mdDiff:
		if mdDiffLine.startswith("?"):
			continue
		if mdDiffLine.startswith(" "):
			res.numUnchangedLines += 1
			skelLine = next(origSkelLines)
			if re_translationID.match(skelLine):
				res.numUnchangedTranslationPlaceholders += 1
			outputFile.write(skelLine)
		elif mdDiffLine.startswith("+"):
			res.numAddedLines += 1
			skelLine = skeletonizeLine(mdDiffLine[2:])
			if skelLine:
				res.numAddedTranslationPlaceholders += 1
			else:
				skelLine = mdDiffLine[2:]
			outputFile.write(skelLine)
		elif mdDiffLine.startswith("-"):
			res.numRemovedLines += 1
			origSkelLine = next(origSkelLines)
			if re_translationID.match(origSkelLine):
				res.numRemovedTranslationPlaceholders += 1
		else:
			raise ValueError(f"Unexpected diff line: {mdDiffLine}")
	return res
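
For readers unfamiliar with the line prefixes that `process_diff_lines` dispatches on, the following is a minimal, self-contained sketch of how `difflib.ndiff` marks lines. The helper name `classify_ndiff` and the sample lines are hypothetical and are not part of `markdownTranslate.py`; the sketch only counts lines and does not touch skeleton files.

```python
import difflib


def classify_ndiff(orig_lines: list[str], new_lines: list[str]) -> dict[str, int]:
	"""Count unchanged, added and removed lines in an ndiff, skipping '?' hint lines."""
	counts = {"unchanged": 0, "added": 0, "removed": 0}
	for line in difflib.ndiff(orig_lines, new_lines):
		if line.startswith("?"):
			# Intraline hint emitted by ndiff; it does not correspond to a real line.
			continue
		if line.startswith(" "):
			counts["unchanged"] += 1
		elif line.startswith("+"):
			counts["added"] += 1
		elif line.startswith("-"):
			counts["removed"] += 1
		else:
			raise ValueError(f"Unexpected diff line: {line!r}")
	return counts


# Hypothetical sample input: one line revised, one line appended.
orig = ["# Title\n", "First paragraph.\n", "Second paragraph.\n"]
new = ["# Title\n", "First paragraph, revised.\n", "Second paragraph.\n", "Third paragraph.\n"]
counts = classify_ndiff(orig, new)
print(counts)
```

A revised line shows up as one removal plus one addition, which is why the skeleton update consumes an original skeleton line on "-" and writes a freshly skeletonized line on "+".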

217-326: Consider adding progress logging for long-running operations.

The generateXliff function might benefit from progress logging, especially when processing large files.

Consider adding progress logging:

 def generateXliff(
 	mdPath: str,
 	outputPath: str,
 	skelPath: str | None = None,
 ) -> Result_generateXliff:
 	# If a skeleton file is not provided, first generate one
 	with contextlib.ExitStack() as stack:
 		if not skelPath:
 			skelPath = stack.enter_context(
 				createAndDeleteTempFilePath_contextManager(
 					dir=os.path.dirname(outputPath),
 					prefix=os.path.basename(mdPath),
 					suffix=".skel",
 				)
 			)
 			generateSkeleton(mdPath=mdPath, outputPath=skelPath)
 		with open(skelPath, "r", encoding="utf8") as skelFile:
 			skelContent = skelFile.read()
 	res = Result_generateXliff()
 	print(
 		f"Generating xliff file {prettyPathString(outputPath)} from {prettyPathString(mdPath)} and {prettyPathString(skelPath)}..."
 	)
 	with contextlib.ExitStack() as stack:
 		mdFile = stack.enter_context(open(mdPath, "r", encoding="utf8"))
 		outputFile = stack.enter_context(open(outputPath, "w", encoding="utf8", newline=""))
 		fileID = os.path.basename(mdPath)
 		mdUri = getRawGithubURLForPath(mdPath)
 		print(f"Including Github raw URL: {mdUri}")
 		outputFile.write(
 			'<?xml version="1.0"?>\n'
 			f'<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en">\n'
 			f'<file id="{fileID}" original="{mdUri}">\n'
 		)
 		outputFile.write(f"<skeleton>\n{xmlEscape(skelContent)}\n</skeleton>\n")
 		res.numTranslatableStrings = 0
+		total_lines = sum(1 for _ in mdFile)
+		mdFile.seek(0)
 		for lineNo, (mdLine, skelLine) in enumerate(
 			zip_longest(mdFile.readlines(), skelContent.splitlines(keepends=True)), start=1
 		):
 			mdLine = mdLine.rstrip()
 			skelLine = skelLine.rstrip()
 			if m := re_translationID.match(skelLine):
 				res.numTranslatableStrings += 1
 				prefix, ID, suffix = m.groups()
 				if prefix and not mdLine.startswith(prefix):
 					raise ValueError(f'Line {lineNo}: does not start with "{prefix}", {mdLine=}, {skelLine=}')
 				if suffix and not mdLine.endswith(suffix):
 					raise ValueError(f'Line {lineNo}: does not end with "{suffix}", {mdLine=}, {skelLine=}')
 				source = mdLine[len(prefix) : len(mdLine) - len(suffix)]
 				outputFile.write(
 					f'<unit id="{ID}">\n' "<notes>\n" f'<note appliesTo="source">line: {lineNo + 1}</note>\n'
 				)
 				if prefix:
 					outputFile.write(f'<note appliesTo="source">prefix: {xmlEscape(prefix)}</note>\n')
 				if suffix:
 					outputFile.write(f'<note appliesTo="source">suffix: {xmlEscape(suffix)}</note>\n')
 				outputFile.write(
 					"</notes>\n"
 					f"<segment>\n"
 					f"<source>{xmlEscape(source)}</source>\n"
 					"</segment>\n"
 					"</unit>\n"
 				)
 			else:
 				if mdLine != skelLine:
 					raise ValueError(f"Line {lineNo}: {mdLine=} does not match {skelLine=}")
+			if lineNo % 1000 == 0:
+				print(f"Processed {lineNo}/{total_lines} lines...")
 		outputFile.write("</file>\n" "</xliff>")
 		print(f"Generated xliff file with {res.numTranslatableStrings} translatable strings")
 		return res

Committable suggestion

def generateXliff(
	mdPath: str,
	outputPath: str,
	skelPath: str | None = None,
) -> Result_generateXliff:
	# If a skeleton file is not provided, first generate one
	with contextlib.ExitStack() as stack:
		if not skelPath:
			skelPath = stack.enter_context(
				createAndDeleteTempFilePath_contextManager(
					dir=os.path.dirname(outputPath),
					prefix=os.path.basename(mdPath),
					suffix=".skel",
				)
			)
			generateSkeleton(mdPath=mdPath, outputPath=skelPath)
		with open(skelPath, "r", encoding="utf8") as skelFile:
			skelContent = skelFile.read()
	res = Result_generateXliff()
	print(
		f"Generating xliff file {prettyPathString(outputPath)} from {prettyPathString(mdPath)} and {prettyPathString(skelPath)}..."
	)
	with contextlib.ExitStack() as stack:
		mdFile = stack.enter_context(open(mdPath, "r", encoding="utf8"))
		outputFile = stack.enter_context(open(outputPath, "w", encoding="utf8", newline=""))
		fileID = os.path.basename(mdPath)
		mdUri = getRawGithubURLForPath(mdPath)
		print(f"Including Github raw URL: {mdUri}")
		outputFile.write(
			'<?xml version="1.0"?>\n'
			f'<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en">\n'
			f'<file id="{fileID}" original="{mdUri}">\n'
		)
		outputFile.write(f"<skeleton>\n{xmlEscape(skelContent)}\n</skeleton>\n")
		res.numTranslatableStrings = 0
		total_lines = sum(1 for _ in mdFile)
		mdFile.seek(0)
		for lineNo, (mdLine, skelLine) in enumerate(
			zip_longest(mdFile.readlines(), skelContent.splitlines(keepends=True)), start=1
		):
			mdLine = mdLine.rstrip()
			skelLine = skelLine.rstrip()
			if m := re_translationID.match(skelLine):
				res.numTranslatableStrings += 1
				prefix, ID, suffix = m.groups()
				if prefix and not mdLine.startswith(prefix):
					raise ValueError(f'Line {lineNo}: does not start with "{prefix}", {mdLine=}, {skelLine=}')
				if suffix and not mdLine.endswith(suffix):
					raise ValueError(f'Line {lineNo}: does not end with "{suffix}", {mdLine=}, {skelLine=}')
				source = mdLine[len(prefix) : len(mdLine) - len(suffix)]
				outputFile.write(
					f'<unit id="{ID}">\n' "<notes>\n" f'<note appliesTo="source">line: {lineNo + 1}</note>\n'
				)
				if prefix:
					outputFile.write(f'<note appliesTo="source">prefix: {xmlEscape(prefix)}</note>\n')
				if suffix:
					outputFile.write(f'<note appliesTo="source">suffix: {xmlEscape(suffix)}</note>\n')
				outputFile.write(
					"</notes>\n"
					f"<segment>\n"
					f"<source>{xmlEscape(source)}</source>\n"
					"</segment>\n"
					"</unit>\n"
				)
			else:
				if mdLine != skelLine:
					raise ValueError(f"Line {lineNo}: {mdLine=} does not match {skelLine=}")
			if lineNo % 1000 == 0:
				print(f"Processed {lineNo}/{total_lines} lines...")
		outputFile.write("</file>\n" "</xliff>")
		print(f"Generated xliff file with {res.numTranslatableStrings} translatable strings")
		return res
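
As a side note on the suggestion above: it counts the lines by iterating the file, rewinds with `seek(0)`, and then reads the file again with `readlines()`. One possible alternative, sketched below under the assumption that the lines are already held in a list, is to take the total from `len()` and wrap the iteration in a small generator. The name `iter_with_progress` and its `every` and `report` parameters are hypothetical and do not exist in `markdownTranslate.py`.

```python
from collections.abc import Callable, Iterator


def iter_with_progress(
	lines: list[str],
	every: int = 1000,
	report: Callable[[str], None] = print,
) -> Iterator[tuple[int, str]]:
	"""Yield (lineNo, line) pairs starting at 1, reporting progress every `every` lines."""
	total = len(lines)  # No second pass over the file is needed.
	for lineNo, line in enumerate(lines, start=1):
		yield lineNo, line
		if lineNo % every == 0:
			report(f"Processed {lineNo}/{total} lines...")


# Usage with a hypothetical in-memory document of 2500 lines.
messages: list[str] = []
processed = list(iter_with_progress(["x\n"] * 2500, every=1000, report=messages.append))
print(messages)
```

Passing `report` as a parameter keeps the helper testable; in the real function it could simply default to `print`, matching the existing logging style.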

328-387: Consider adding input validation for the lang parameter.

In the translateXliff function, it might be beneficial to add input validation for the lang parameter to ensure it's a valid language code.

Consider adding a validation step for the lang parameter:

 def translateXliff(
 	xliffPath: str,
 	lang: str,
 	pretranslatedMdPath: str,
 	outputPath: str,
 	allowBadAnchors: bool = False,
 ) -> Result_translateXliff:
+	# Validate lang parameter
+	if not re.match(r'^[a-z]{2,3}(-[A-Z]{2,3})?$', lang):
+		raise ValueError(f"Invalid language code: {lang}")
 	print(
 		f"Creating {lang} translated xliff file {prettyPathString(outputPath)} from {prettyPathString(xliffPath)} using {prettyPathString(pretranslatedMdPath)}..."
 	)
 	res = Result_translateXliff()
 	# ... rest of the function ...

Committable suggestion was skipped due to low confidence.
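
One caveat if such a check is ever adopted: the pattern in the suggestion only accepts a hyphen between the language and region parts, whereas locale folder names such as `pt_BR` or `zh_CN` use an underscore. The following is a minimal sketch of a more permissive check, assuming codes consist of a 2 or 3 letter lowercase language part and an optional 2 or 3 letter uppercase region part. The function name `isValidLangCode` is hypothetical, and the exact set of codes the project accepts is an assumption.

```python
import re

# Assumption: codes look like "de", "ckb", "pt_BR" or "pt-BR".
RE_LANG_CODE = re.compile(r"[a-z]{2,3}(?:[_-][A-Z]{2,3})?")


def isValidLangCode(lang: str) -> bool:
	"""Return True if lang looks like a language code with an optional region part."""
	# fullmatch avoids the trailing-newline leniency of "$" with re.match.
	return RE_LANG_CODE.fullmatch(lang) is not None


for code in ("de", "pt_BR", "pt-BR", "english", "PT_br"):
	print(code, isValidLangCode(code))
```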

zstanecic · 2024-09-03T07:21:29Z

Hi, as one of the translators, let me add my two cents to this discussion *smiles*. Can we remove the changes for developers altogether from the changes file? In the past, I haven't translated them for my languages, because developer changes are just for developers, and developers know English well enough to read the English developer changes. So I always cross-linked to the English version.

michaelDCurran · 2024-09-03T07:29:57Z

@zstanecic For now at least, we are going to keep the changes file as is. However, now that it is on Crowdin, translators can choose to simply not translate those strings if they wish.
We may revisit the structure of the changes file at some point, but only after we have completed the move to Crowdin.
I will have more to say on the translators list once this pr is merged.

wmhn1872265132 · 2024-09-03T07:39:44Z

Please note that the .xliff file is now included in the installer. Should this file be excluded?

michaelDCurran · 2024-09-03T07:49:52Z

@wmhn1872265132 thanks for catching this. I've excluded xliff files now.

cary-rowen · 2024-09-03T08:16:11Z

Hi @michaelDCurran

Just wanted to ask:
Will we still have a way to directly edit the markdown version of the user guide in the future?

Since we plan to make extensive revisions to the Simplified Chinese version of the User Guide in the future, I would like to see a way to directly edit the Markdown version of the User Guide instead of a po file.

For large-scale changes, we may prefer to use a text editor to edit the markdown file.

We would also like a script that uses structure comparison to ensure that the structure is not broken.

Thanks

michaelDCurran · 2024-09-03T08:46:37Z

@cary-rowen I'm sorry, but going forward documentation such as the user guide and changes files can only be translated on Crowdin, either through its interface, or using poedit to translate the xliff files.
If there are specific sections that translations require but that are not yet in the English source, we can consider adding a placeholder in the English version.
I will have more to say on the translators list in the coming days once I have merged this pr.

cary-rowen · 2024-09-03T09:00:49Z

Thanks Mic,

Will NV Access open source the script for converting markdown in the future? This would allow us to convert edited markdown to xliff in a highly customizable way.

Best,
Cary

hwf1324 · 2024-09-03T09:07:44Z

Or can I use a project like https://github.com/cataria-rocks/md2xliff to do so?

Initial set of user docs translated xliff files from Crowdin

12f080d

michaelDCurran requested a review from a team as a code owner September 3, 2024 03:52

michaelDCurran requested a review from SaschaCowley September 3, 2024 03:52

seanbudd changed the base branch from master to beta September 3, 2024 03:54

seanbudd approved these changes Sep 3, 2024

seanbudd added this to the 2024.4 milestone Sep 3, 2024

coderabbitai bot reviewed Sep 3, 2024

michaelDCurran and others added 7 commits September 3, 2024 15:13

Soe more translated xliff files.

1bcca7c

Add initial English changes.xliff file generated from markdown.

a89e067

Add initial translated xliff files for changes.md

c650f5f

User docs github action: upload English changes.xliff to Crowdin if it has changed.

abba045

user docs github action: temporarily run for this branch

df23f88

English changes file: add test string

34acf09

Update user_docs/en/changes.xliff

e2b3dc3

seanbudd reviewed Sep 3, 2024

user_docs/en/changes.md (outdated, resolved)

michaelDCurran and others added 2 commits September 3, 2024 17:19

English changes file: remove test string

779e13f

Update user_docs/en/changes.xliff

5207433

User docs github action: remove rule for this branch

ea67689

michaelDCurran requested a review from seanbudd September 3, 2024 07:33

setup.py: exclude xliff files.

fbfddb0

seanbudd approved these changes Sep 3, 2024

michaelDCurran merged commit 51cd079 into beta Sep 3, 2024
4 checks passed

michaelDCurran deleted the updatetranslations branch September 3, 2024 10:05

cary-rowen mentioned this pull request Sep 5, 2024

Many Challenges Faced with Translating NVDA Documentation #17118

Closed
