Initial set of user docs translated xliff files from Crowdin #17106
Walkthrough: The changes introduce a new GitHub Actions workflow that automates synchronizing the English user documentation with its translation (XLIFF) files. A Python script is also added to manage markdown translations through XLIFF files, including generating, updating, and translating markdown content. Comprehensive unit tests for the translation module validate its functionality.
Sequence diagram:

```mermaid
sequenceDiagram
    participant User
    participant GitHub Actions
    participant Python Script
    participant Crowdin
    User->>GitHub Actions: Push changes to beta branch
    GitHub Actions->>Python Script: Check modified Markdown files
    Python Script->>Python Script: Update corresponding XLIFF files
    Python Script-->>GitHub Actions: Commit changes to XLIFF
    GitHub Actions->>Crowdin: Upload updated XLIFF file
```
Err, this should be for beta.
Fixed now.
Caution: Inline review comments failed to post.
Actionable comments posted: 14
Outside diff range, codebase verification and nitpick comments (2)
tests/unit/test_markdownTranslate.py (1)
15-39: LGTM: Well-structured test class setup, with a minor suggestion.
The test class is well organized, with appropriate setup and teardown methods. The helper method `runMarkdownTranslateCommand` is good practice for running commands and handling errors.
Consider capturing the command output and including it in the error message for easier debugging:

```python
def runMarkdownTranslateCommand(self, description: str, args: list[str]):
    command = [sys.executable, self.markdownTranslateScriptPath, *args]
    try:
        subprocess.run(command, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError as e:
        # Include the full command line and both captured streams in the failure message
        message = (
            f"Failed when trying to {description} with command: {' '.join(command)}\n"
            f"Output: {e.stdout}\nError: {e.stderr}"
        )
        self.fail(message)
```
user_docs/markdownTranslate.py (1)
50-61: Enhance security by sanitizing `filePath` in `getLastCommitID`.
The `getLastCommitID` function does not sanitize the `filePath` argument before using it in a `subprocess.run` call. This poses a potential security risk for command injection. Consider adding validation or sanitization to ensure `filePath` is safe to use.
- Location: user_docs/markdownTranslate.py, lines 50-61

Analysis chain
Enhance security in git command execution.
The `getLastCommitID` function uses `subprocess.run` to execute a git command. While it uses `check=True` to raise an exception on non-zero exit codes, it's important to ensure that the `filePath` argument is properly sanitized to prevent command injection. Verify that `filePath` is properly sanitized before being used in the git command.
Scripts executed
The following scripts were executed for the analysis:

Script:
```bash
#!/bin/bash
# Check if filePath is properly sanitized before being used in git commands
rg -n 'subprocess\.run\(\["git".*\$\{?filePath' user_docs/markdownTranslate.py
```
Length of output: 80

Script:
```bash
#!/bin/bash
# Search for the getLastCommitID function definition and examine the usage of filePath
ast-grep --lang python --pattern $'def getLastCommitID(filePath: $_) {\n $$$\n}' user_docs/markdownTranslate.py
```
Length of output: 577

Script:
```bash
#!/bin/bash
# Correct the pattern to search for the getLastCommitID function definition and examine the usage of filePath
ast-grep --lang python --pattern 'def getLastCommitID(filePath):' user_docs/markdownTranslate.py -A 10
```
Length of output: 104

Script:
```bash
#!/bin/bash
# Use rg to search for the getLastCommitID function definition in the file
rg -A 10 'def getLastCommitID' user_docs/markdownTranslate.py
```
Length of output: 480
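A note on the analysis above: because `subprocess.run` receives an argument list and `shell=True` is not set, `filePath` is never parsed by a shell, so shell metacharacters cannot inject commands. The residual risk is a path beginning with `-`, which git may read as an option; a `--` separator before the path closes that gap. A minimal sketch, assuming `getLastCommitID` wraps something like `git log` (its actual body is not shown in this review); both function names are hypothetical:

```python
import subprocess


def buildLastCommitArgs(filePath: str) -> list[str]:
    """Build argv for querying the last commit that touched filePath (hypothetical helper).

    The "--" tells git that everything after it is a path, so a file named
    "-p" or "--output=x" cannot be parsed as an option.
    """
    return ["git", "log", "-n", "1", "--format=%H", "--", filePath]


def getLastCommitIDSketch(filePath: str) -> str:
    # A list argv without shell=True: no shell ever interprets filePath.
    result = subprocess.run(buildLastCommitArgs(filePath), capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Keeping argv construction in its own function lets the `--` placement be unit-tested without a git repository.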
Comments failed to post (14)
.github/workflows/regenerate_english_userDocs_translation_source.yml (4)
11-29: Consider using a requirements.txt file for dependency management.
The current setup installs the required Python packages directly in the workflow. While this works, a `requirements.txt` file would be more maintainable, especially if the number of dependencies grows. Consider creating a `requirements.txt` file in your repository with the following content:

```
lxml
requests
```

Then modify the "Install dependencies" step as follows:

```yaml
- name: Install dependencies
  run: |
    python -m pip install --upgrade pip
    pip install -r requirements.txt
```

This approach centralizes dependency management and makes dependencies easier to maintain and update.
30-53: Enhance error handling and logging in the XLIFF update process.
While the current implementation is functional, it could benefit from improved error handling and more detailed logging. This would make troubleshooting easier if issues arise during the XLIFF update process.
Consider the following enhancements:
- Add error handling around the Python script execution:

```powershell
try {
    python user_docs/markdownTranslate.py updateXliff -x $xliff -m $file -o $tempXliff
    if ($LASTEXITCODE -ne 0) {
        throw "Python script failed with exit code $LASTEXITCODE"
    }
} catch {
    Write-Error "Failed to update XLIFF file: $_"
    exit 1
}
```

- Add more detailed logging:

```powershell
Write-Host "Starting XLIFF update process for $file"
# ... existing code ...
Write-Host "XLIFF update process completed successfully for $file"
```

- Consider adding a summary at the end of the process:

```powershell
$updatedFiles = @()
# ... in the foreach loop ...
$updatedFiles += $xliff
# ... after the loop ...
Write-Host "XLIFF update process completed. Updated files: $($updatedFiles -join ', ')"
```

These changes provide more visibility into the process and make it easier to identify and resolve any issues that occur.
55-83: Improve security handling of SSH key.
The current implementation writes the SSH private key to a file, which could potentially be a security risk if the runner is compromised.
Consider using `ssh-agent` to manage the SSH key more securely. Here's a suggested improvement:

```yaml
- name: Set up SSH key
  env:
    SSH_PRIVATE_KEY: ${{ secrets.XLIFF_DEPLOY_PRIVATE_KEY }}
    SSH_AUTH_SOCK: /tmp/ssh_agent.sock
  run: |
    mkdir -p ~/.ssh
    ssh-keyscan github.com >> ~/.ssh/known_hosts
    ssh-agent -a $SSH_AUTH_SOCK > /dev/null
    echo "$SSH_PRIVATE_KEY" | tr -d '\r' | ssh-add -
- name: Commit and Push changes
  env:
    # Reuse the agent started above; the key only lives in the agent's memory
    SSH_AUTH_SOCK: /tmp/ssh_agent.sock
  run: |
    # ... rest of your existing script ...
```

This approach uses `ssh-agent` to hold the key in memory, reducing the risk of the key being exposed on disk.
Additionally, consider using a GitHub Action such as `webfactory/ssh-agent`, which handles SSH key setup securely:

```yaml
- uses: webfactory/ssh-agent@<version>
  with:
    ssh-private-key: ${{ secrets.XLIFF_DEPLOY_PRIVATE_KEY }}
```

This would replace the manual SSH key setup with a more secure, tested solution.
85-103: Enhance Crowdin upload process.
While the current implementation works, it could benefit from improved error handling and more flexibility.
Consider the following improvements:
- Add error handling for the Python script execution:

```powershell
try {
    python appVeyor/crowdinSync.py uploadSourceFile 18 user_docs/en/userguide.xliff
    if ($LASTEXITCODE -ne 0) {
        throw "Crowdin upload failed with exit code $LASTEXITCODE"
    }
    Write-Host "Successfully uploaded userGuide.xliff to Crowdin"
} catch {
    Write-Error "Failed to upload to Crowdin: $_"
    exit 1
}
```

- Make the file ID configurable. Instead of hardcoding the file ID (18), consider storing it in a GitHub secret or variable:

```yaml
env:
  crowdinProjectID: ${{ vars.CROWDIN_PROJECT_ID }}
  crowdinAuthToken: ${{ secrets.CROWDIN_AUTH_TOKEN }}
  crowdinUserGuideFileID: ${{ vars.CROWDIN_USERGUIDE_FILE_ID }}
```

Then use this variable in your script:

```powershell
python appVeyor/crowdinSync.py uploadSourceFile $env:crowdinUserGuideFileID user_docs/en/userguide.xliff
```

- Consider adding a mechanism to upload other XLIFF files if needed in the future:

```powershell
$xliffFiles = @{
    "userGuide.xliff" = $env:crowdinUserGuideFileID
    # Add more files here as needed
}
foreach ($file in $xliffFiles.Keys) {
    $changed = git diff --name-only ${{GITHUB.SHA}}.. -- "user_docs/en/$file"
    if ($changed) {
        Write-Host "Uploading $file to Crowdin"
        python appVeyor/crowdinSync.py uploadSourceFile $xliffFiles[$file] "user_docs/en/$file"
    } else {
        Write-Host "Not uploading $file to Crowdin as it has not changed"
    }
}
```

These changes make the Crowdin upload process more robust, flexible, and easier to maintain as the project grows.
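The selection logic in that loop can also be expressed as a pure function, which makes it unit-testable. A sketch in Python; the helper name, the mapping, and any file IDs other than the user guide's 18 are hypothetical, and the actual upload call is left out:

```python
def selectXliffUploads(
    changedPaths: list[str], fileIDs: dict[str, str], baseDir: str = "user_docs/en"
) -> list[tuple[str, str]]:
    """Return (crowdinFileID, path) pairs for tracked XLIFF files that changed.

    changedPaths would come from `git diff --name-only <sha>..`; fileIDs maps an
    XLIFF file name to its Crowdin file ID (both are assumptions here).
    """
    changed = set(changedPaths)
    uploads = []
    # Sort for a deterministic upload order regardless of dict insertion order.
    for fileName, fileID in sorted(fileIDs.items()):
        path = f"{baseDir}/{fileName}"
        if path in changed:
            uploads.append((fileID, path))
    return uploads
```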
tests/unit/test_markdownTranslate.py (1)
41-135: LGTM: Comprehensive test method with suggestions for improvement.
The `test_markdownTranslate` method covers various functionalities of the markdownTranslate script, including generating XLIFF files, regenerating markdown files, updating XLIFF files, and translating to French. Each step is followed by a verification, which is good practice. Consider the following improvements for better readability and maintainability:
- Break down the large test method into smaller, focused test methods. This makes it easier to identify which specific functionality fails when a test doesn't pass.
- Use parameterized tests to reduce code duplication for similar test cases.
- Create helper methods for common operations, such as file path creation.

Here's an example of how you could refactor part of the test:

```python
import os
import unittest

from parameterized import parameterized


class TestMarkdownTranslate(unittest.TestCase):
    # ... (existing setup code) ...

    def _get_file_path(self, filename):
        return os.path.join(self.outDir.name if filename.startswith("rebuilt_") else self.testDir, filename)

    @parameterized.expand([
        ("2024.2", "en_2024.2_userGuide"),
        ("2024.3beta6", "en_2024.3beta6_userGuide"),
    ])
    def test_generate_and_verify_markdown(self, version, file_prefix):
        xliff_file = f"{file_prefix}.xliff"
        md_file = f"{file_prefix}.md"
        rebuilt_md_file = f"rebuilt_{md_file}"
        self.runMarkdownTranslateCommand(
            f"Generate an xliff file from the English {version} user guide markdown file",
            ["generateXliff", "-m", self._get_file_path(md_file), "-o", self._get_file_path(xliff_file)],
        )
        self.runMarkdownTranslateCommand(
            f"Regenerate the {version} markdown file from the generated {version} xliff file",
            ["generateMarkdown", "-x", self._get_file_path(xliff_file), "-o", self._get_file_path(rebuilt_md_file), "-u"],
        )
        self.runMarkdownTranslateCommand(
            f"Ensure the regenerated {version} markdown file matches the original {version} markdown file",
            ["ensureMarkdownFilesMatch", self._get_file_path(rebuilt_md_file), self._get_file_path(md_file)],
        )

    # ... (other test methods) ...
```

This refactoring improves readability, reduces duplication, and makes it easier to add new test cases in the future.
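If adding the third-party `parameterized` package is undesirable, the standard library's `unittest` offers `subTest`, which reports each case separately within one test method. A minimal sketch of that swap; `getFilePath` is a hypothetical free-function version of `_get_file_path`, and the directory names are placeholders:

```python
import os
import unittest


def getFilePath(testDir: str, outDir: str, filename: str) -> str:
    """Rebuilt files go to the output directory; everything else lives in the test data directory."""
    return os.path.join(outDir if filename.startswith("rebuilt_") else testDir, filename)


class TestFilePaths(unittest.TestCase):
    CASES = [
        ("2024.2", "en_2024.2_userGuide"),
        ("2024.3beta6", "en_2024.3beta6_userGuide"),
    ]

    def test_paths_per_version(self):
        for version, prefix in self.CASES:
            # Each iteration is reported as its own sub-test if it fails.
            with self.subTest(version=version):
                md = getFilePath("data", "out", f"{prefix}.md")
                rebuilt = getFilePath("data", "out", f"rebuilt_{prefix}.md")
                self.assertEqual(md, os.path.join("data", f"{prefix}.md"))
                self.assertEqual(rebuilt, os.path.join("out", f"rebuilt_{prefix}.md"))
```

The trade-off: `subTest` avoids a new dependency, but all cases share one test method name in reports, whereas `parameterized.expand` generates separately named methods.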
user_docs/markdownTranslate.py (9)
544-649: Consider organizing CLI setup into a separate function.
The command-line interface setup is quite lengthy and could benefit from being organized into a separate function for better readability and maintainability.
Consider refactoring the CLI setup into a separate function:
```diff
+def setup_cli():
+    mainParser = argparse.ArgumentParser()
+    commandParser = mainParser.add_subparsers(title="commands", dest="command", required=True)
+
+    # Setup for generateXliff
+    generateXliffParser = commandParser.add_parser("generateXliff")
+    generateXliffParser.add_argument(
+        "-m",
+        "--markdown",
+        dest="md",
+        type=str,
+        required=True,
+        help="The markdown file to generate the xliff file for",
+    )
+    generateXliffParser.add_argument(
+        "-o", "--output", dest="output", type=str, required=True, help="The file to output the xliff file to"
+    )
+
+    # Setup for other commands...
+
+    return mainParser

 if __name__ == "__main__":
-    mainParser = argparse.ArgumentParser()
-    commandParser = mainParser.add_subparsers(title="commands", dest="command", required=True)
-    # ... (rest of the CLI setup)
+    parser = setup_cli()
+    args = parser.parse_args()

-    args = mainParser.parse_args()
     match args.command:
         case "generateXliff":
             generateXliff(mdPath=args.md, outputPath=args.output)
         # ... (rest of the command handling)
```

Committable suggestion was skipped due to low confidence.
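One practical benefit of moving parser construction into a function is that argument parsing becomes testable in isolation. A trimmed sketch covering two of the commands, with option names taken from the snippets in this review; the positional arguments for `ensureMarkdownFilesMatch` are inferred from `args.path1`/`args.path2` and are an assumption:

```python
import argparse


def buildParser() -> argparse.ArgumentParser:
    """Build the CLI in one place so it can be tested without running any command."""
    mainParser = argparse.ArgumentParser()
    commandParser = mainParser.add_subparsers(title="commands", dest="command", required=True)

    generateXliffParser = commandParser.add_parser("generateXliff")
    generateXliffParser.add_argument("-m", "--markdown", dest="md", type=str, required=True)
    generateXliffParser.add_argument("-o", "--output", dest="output", type=str, required=True)

    # Assumed to take two positional paths, matching args.path1 / args.path2.
    ensureParser = commandParser.add_parser("ensureMarkdownFilesMatch")
    ensureParser.add_argument("path1", type=str)
    ensureParser.add_argument("path2", type=str)
    return mainParser
```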
38-47: Consider adding error handling for file operations.
The `createAndDeleteTempFilePath_contextManager` function is well implemented as a context manager. However, it might be beneficial to add error handling for the file creation and deletion operations. Consider wrapping the file operations in try-except blocks to handle potential `IOError`s:

```diff
 @contextlib.contextmanager
 def createAndDeleteTempFilePath_contextManager(
     dir: str | None = None, prefix: str | None = None, suffix: str | None = None
 ) -> Generator[str, None, None]:
     """A context manager that creates a temporary file and deletes it when the context is exited"""
     with tempfile.NamedTemporaryFile(dir=dir, prefix=prefix, suffix=suffix, delete=False) as tempFile:
         tempFilePath = tempFile.name
         tempFile.close()
     yield tempFilePath
-    os.remove(tempFilePath)
+    try:
+        os.remove(tempFilePath)
+    except OSError as e:
+        print(f"Error deleting temporary file {tempFilePath}: {e}")
```

Committable suggestion was skipped due to low confidence.
397-442: Consider adding a progress bar for better user feedback.
The `generateMarkdown` function processes potentially large files, so adding a progress bar could provide better feedback to the user. Consider using the `tqdm` library to add a progress bar:

```diff
+from tqdm import tqdm
+
 def generateMarkdown(xliffPath: str, outputPath: str, translated: bool = True) -> Result_generateMarkdown:
     print(f"Generating markdown file {prettyPathString(outputPath)} from {prettyPathString(xliffPath)}...")
     res = Result_generateMarkdown()
     with contextlib.ExitStack() as stack:
         outputFile = stack.enter_context(open(outputPath, "w", encoding="utf8", newline=""))
         xliff = lxml.etree.parse(xliffPath)
         xliffRoot = xliff.getroot()
         namespace = {"xliff": "urn:oasis:names:tc:xliff:document:2.0"}
         if xliffRoot.tag != "{urn:oasis:names:tc:xliff:document:2.0}xliff":
             raise ValueError("Not an xliff file")
         skeletonNode = xliffRoot.find("./xliff:file/xliff:skeleton", namespaces=namespace)
         if skeletonNode is None:
             raise ValueError("No skeleton found in xliff file")
         skeletonContent = xmlUnescape(skeletonNode.text).strip()
+        total_lines = len(skeletonContent.splitlines())
+        pbar = tqdm(total=total_lines, desc="Generating Markdown")
         for line in skeletonContent.splitlines(keepends=True):
             res.numTotalLines += 1
             if m := re_translationID.match(line):
                 prefix, ID, suffix = m.groups()
                 res.numTranslatableStrings += 1
                 unit = xliffRoot.find(f'./xliff:file/xliff:unit[@id="{ID}"]', namespaces=namespace)
                 if unit is not None:
                     segment = unit.find("./xliff:segment", namespaces=namespace)
                     if segment is not None:
                         source = segment.find("./xliff:source", namespaces=namespace)
                         if translated:
                             target = segment.find("./xliff:target", namespaces=namespace)
                         else:
                             target = None
                         if target is not None and target.text:
                             res.numTranslatedStrings += 1
                             translation = xmlUnescape(target.text)
                         elif source is not None and source.text:
                             translation = xmlUnescape(source.text)
                         else:
                             raise ValueError(f"No source or target found for unit {ID}")
                     else:
                         raise ValueError(f"No segment found for unit {ID}")
                 else:
                     raise ValueError(f"Cannot locate Unit {ID} in xliff file")
                 outputFile.write(f"{prefix}{translation}{suffix}\n")
             else:
                 outputFile.write(line)
+            pbar.update(1)
+        pbar.close()
     print(
         f"Generated markdown file with {res.numTotalLines} total lines, {res.numTranslatableStrings} translatable strings, and {res.numTranslatedStrings} translated strings"
     )
     return res
```

Committable suggestion was skipped due to low confidence.
75-80: Consider adding error handling for file path operations.
The `getRawGithubURLForPath` function combines multiple operations. It might be beneficial to add error handling for cases where the file path is invalid or not within the git repository. Consider adding a try-except block to handle potential `OSError` or `ValueError` exceptions, plus `subprocess.CalledProcessError`, which a failing git call raises under `check=True` and which is not a subclass of either:

```diff
 def getRawGithubURLForPath(filePath: str) -> str:
-    gitDirPath = getGitDir()
-    commitID = getLastCommitID(filePath)
-    relativePath = os.path.relpath(os.path.abspath(filePath), gitDirPath)
-    relativePath = relativePath.replace("\\", "/")
-    return f"{RAW_GITHUB_REPO_URL}/{commitID}/{relativePath}"
+    try:
+        gitDirPath = getGitDir()
+        commitID = getLastCommitID(filePath)
+        relativePath = os.path.relpath(os.path.abspath(filePath), gitDirPath)
+        relativePath = relativePath.replace("\\", "/")
+        return f"{RAW_GITHUB_REPO_URL}/{commitID}/{relativePath}"
+    except (OSError, ValueError, subprocess.CalledProcessError) as e:
+        raise ValueError(f"Error generating GitHub URL for {filePath}: {e}") from e
```
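Separating the pure path arithmetic from the git calls would also make this function testable without a repository. A sketch with a hypothetical `buildRawGithubURL` helper; it additionally rejects files outside the repository, which `os.path.relpath` would otherwise silently turn into a `../` URL:

```python
import os


def buildRawGithubURL(baseURL: str, commitID: str, gitDirPath: str, filePath: str) -> str:
    """Combine a raw-content base URL, a commit ID and a repository-relative path."""
    relativePath = os.path.relpath(os.path.abspath(filePath), gitDirPath)
    if relativePath == os.pardir or relativePath.startswith(os.pardir + os.sep):
        # Fail early with a clear message instead of producing a broken URL.
        raise ValueError(f"{filePath} is not inside the repository at {gitDirPath}")
    # URLs always use forward slashes, whatever the local OS separator is.
    return f"{baseURL}/{commitID}/{relativePath.replace(os.sep, '/')}"
```

In the usage check below, `https://example.invalid/raw` is a placeholder, not the script's actual `RAW_GITHUB_REPO_URL`.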
650-673: Consider adding global error handling and logging.
The main execution block could benefit from a global try-except block to catch and log any unexpected errors that might occur during command execution.
Consider adding a global try-except block and logging:
```diff
+import logging
+
+logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')
+
 if __name__ == "__main__":
-    match args.command:
-        case "generateXliff":
-            generateXliff(mdPath=args.md, outputPath=args.output)
-        case "updateXliff":
-            updateXliff(
-                xliffPath=args.xliff,
-                mdPath=args.md,
-                outputPath=args.output,
-            )
-        case "generateMarkdown":
-            generateMarkdown(xliffPath=args.xliff, outputPath=args.output, translated=args.translated)
-        case "translateXliff":
-            translateXliff(
-                xliffPath=args.xliff,
-                lang=args.lang,
-                pretranslatedMdPath=args.pretranslatedMd,
-                outputPath=args.output,
-            )
-        case "pretranslateLangs":
-            pretranslateAllPossibleLanguages(langsDir=args.langsDir, mdBaseName=args.mdBaseName)
-        case "ensureMarkdownFilesMatch":
-            ensureMarkdownFilesMatch(path1=args.path1, path2=args.path2)
-        case _:
-            raise ValueError(f"Unknown command: {args.command}")
+    try:
+        match args.command:
+            case "generateXliff":
+                generateXliff(mdPath=args.md, outputPath=args.output)
+            case "updateXliff":
+                updateXliff(
+                    xliffPath=args.xliff,
+                    mdPath=args.md,
+                    outputPath=args.output,
+                )
+            case "generateMarkdown":
+                generateMarkdown(xliffPath=args.xliff, outputPath=args.output, translated=args.translated)
+            case "translateXliff":
+                translateXliff(
+                    xliffPath=args.xliff,
+                    lang=args.lang,
+                    pretranslatedMdPath=args.pretranslatedMd,
+                    outputPath=args.output,
+                )
+            case "pretranslateLangs":
+                pretranslateAllPossibleLanguages(langsDir=args.langsDir, mdBaseName=args.mdBaseName)
+            case "ensureMarkdownFilesMatch":
+                ensureMarkdownFilesMatch(path1=args.path1, path2=args.path2)
+            case _:
+                raise ValueError(f"Unknown command: {args.command}")
+    except Exception as e:
+        logging.error(f"An error occurred: {e}")
+        raise
```
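An alternative to logging and re-raising is to record the full traceback once and turn the failure into a non-zero exit code, which CI treats as a failed step. A sketch with a hypothetical `runCommand` helper and a dispatch table standing in for the `match` statement:

```python
import logging
from collections.abc import Callable

logger = logging.getLogger("markdownTranslate")


def runCommand(command: str, handlers: dict[str, Callable[[], None]]) -> int:
    """Run one handler; return 0 on success and 1 on any failure (hypothetical helper)."""
    handler = handlers.get(command)
    if handler is None:
        logger.error("Unknown command: %s", command)
        return 1
    try:
        handler()
    except Exception:
        # logger.exception records the full traceback, not just str(e).
        logger.exception("Command %s failed", command)
        return 1
    return 0
```

In the script, the entry point would then end with something like `sys.exit(runCommand(args.command, handlers))`, where each handler is a small lambda calling the existing function with the parsed arguments.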
117-135: Consider adding more detailed logging.
The `generateSkeleton` function could benefit from more detailed logging, especially for larger files where processing might take some time. Consider adding more granular logging:

```diff
 def generateSkeleton(mdPath: str, outputPath: str) -> Result_generateSkeleton:
     print(f"Generating skeleton file {prettyPathString(outputPath)} from {prettyPathString(mdPath)}...")
     res = Result_generateSkeleton()
     with (
         open(mdPath, "r", encoding="utf8") as mdFile,
         open(outputPath, "w", encoding="utf8", newline="") as outputFile,
     ):
+        total_lines = sum(1 for _ in mdFile)
+        mdFile.seek(0)
         for mdLine in mdFile.readlines():
             res.numTotalLines += 1
             skelLine = skeletonizeLine(mdLine)
             if skelLine:
                 res.numTranslationPlaceholders += 1
             else:
                 skelLine = mdLine
             outputFile.write(skelLine)
+            if res.numTotalLines % 1000 == 0:
+                print(f"Processed {res.numTotalLines}/{total_lines} lines...")
     print(
         f"Generated skeleton file with {res.numTotalLines} total lines and {res.numTranslationPlaceholders} translation placeholders"
     )
     return res
```

Committable suggestion was skipped due to low confidence.
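Counting with `sum(1 for _ in mdFile)` and then seeking back reads the file twice. Since the loop already calls `readlines()`, the list can be read once and its length used as the total. A sketch of that variant as a reusable wrapper; the name `withProgress`, the `reportEvery` interval and the `report` callback (standing in for `print`) are hypothetical:

```python
from collections.abc import Callable, Iterator


def withProgress(
    lines: list[str], reportEvery: int = 1000, report: Callable[[str], None] = print
) -> Iterator[str]:
    """Yield lines unchanged, reporting progress every `reportEvery` lines and at the end."""
    total = len(lines)
    for lineNo, line in enumerate(lines, start=1):
        yield line
        # Runs after the caller has processed `line`, so the count reflects finished work.
        if lineNo % reportEvery == 0 or lineNo == total:
            report(f"Processed {lineNo}/{total} lines...")
```

Usage inside `generateSkeleton` would be `for mdLine in withProgress(mdFile.readlines()):`, leaving the loop body unchanged.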
165-209: Consider breaking down the updateSkeleton function.
The `updateSkeleton` function is quite long and complex. Consider breaking it down into smaller, more manageable functions to improve readability and maintainability, for example by extracting the main loop into a separate function:

```diff
 def updateSkeleton(
     origMdPath: str, newMdPath: str, origSkelPath: str, outputPath: str
 ) -> Result_updateSkeleton:
     print(
         f"Creating updated skeleton file {prettyPathString(outputPath)} from {prettyPathString(origSkelPath)} with changes from {prettyPathString(origMdPath)} to {prettyPathString(newMdPath)}..."
     )
     res = Result_updateSkeleton()
     with contextlib.ExitStack() as stack:
         origMdFile = stack.enter_context(open(origMdPath, "r", encoding="utf8"))
         newMdFile = stack.enter_context(open(newMdPath, "r", encoding="utf8"))
         origSkelFile = stack.enter_context(open(origSkelPath, "r", encoding="utf8"))
         outputFile = stack.enter_context(open(outputPath, "w", encoding="utf8", newline=""))
         mdDiff = difflib.ndiff(origMdFile.readlines(), newMdFile.readlines())
         origSkelLines = iter(origSkelFile.readlines())
+        res = process_diff_lines(mdDiff, origSkelLines, outputFile)
+    print(
+        f"Updated skeleton file with {res.numAddedLines} added lines "
+        f"({res.numAddedTranslationPlaceholders} translation placeholders), "
+        f"{res.numRemovedLines} removed lines ({res.numRemovedTranslationPlaceholders} translation placeholders), "
+        f"and {res.numUnchangedLines} unchanged lines ({res.numUnchangedTranslationPlaceholders} translation placeholders)"
+    )
+    return res
+
+def process_diff_lines(mdDiff, origSkelLines, outputFile) -> Result_updateSkeleton:
+    res = Result_updateSkeleton()
     for mdDiffLine in mdDiff:
         if mdDiffLine.startswith("?"):
             continue
         if mdDiffLine.startswith(" "):
             res.numUnchangedLines += 1
             skelLine = next(origSkelLines)
             if re_translationID.match(skelLine):
                 res.numUnchangedTranslationPlaceholders += 1
             outputFile.write(skelLine)
         elif mdDiffLine.startswith("+"):
             res.numAddedLines += 1
             skelLine = skeletonizeLine(mdDiffLine[2:])
             if skelLine:
                 res.numAddedTranslationPlaceholders += 1
             else:
                 skelLine = mdDiffLine[2:]
             outputFile.write(skelLine)
         elif mdDiffLine.startswith("-"):
             res.numRemovedLines += 1
             origSkelLine = next(origSkelLines)
             if re_translationID.match(origSkelLine):
                 res.numRemovedTranslationPlaceholders += 1
         else:
             raise ValueError(f"Unexpected diff line: {mdDiffLine}")
-    print(
-        f"Updated skeleton file with {res.numAddedLines} added lines "
-        f"({res.numAddedTranslationPlaceholders} translation placeholders), "
-        f"{res.numRemovedLines} removed lines ({res.numRemovedTranslationPlaceholders} translation placeholders), "
-        f"and {res.numUnchangedLines} unchanged lines ({res.numUnchangedTranslationPlaceholders} translation placeholders)"
-    )
     return res
```
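For reference when reviewing `process_diff_lines`: `difflib.ndiff` prefixes each line with two characters, `"  "` for unchanged, `"- "` for removed and `"+ "` for added, plus `"? "` intraline hint lines that exist in neither input. That is why the loop skips `?` lines and slices `[2:]`. A small sketch that only tallies the categories (the helper name `tallyNdiff` is hypothetical):

```python
import difflib
from collections import Counter


def tallyNdiff(oldLines: list[str], newLines: list[str]) -> Counter:
    """Count unchanged, removed and added lines in an ndiff, ignoring '?' hint lines."""
    labels = {"  ": "unchanged", "- ": "removed", "+ ": "added"}
    counts = Counter()
    for diffLine in difflib.ndiff(oldLines, newLines):
        tag = diffLine[:2]
        if tag == "? ":
            continue  # Hint lines mark intraline changes; they are not document lines.
        counts[labels[tag]] += 1
    return counts
```

A near-identical pair such as `"hello world\n"` and `"hello world!\n"` produces a `?` hint line, so skipping it keeps the counts aligned with the skeleton's line iterator.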
217-326: Consider adding progress logging for long-running operations.
The `generateXliff` function might benefit from progress logging, especially when processing large files. Consider adding progress logging:

```diff
 def generateXliff(
     mdPath: str,
     outputPath: str,
     skelPath: str | None = None,
 ) -> Result_generateXliff:
     # If a skeleton file is not provided, first generate one
     with contextlib.ExitStack() as stack:
         if not skelPath:
             skelPath = stack.enter_context(
                 createAndDeleteTempFilePath_contextManager(
                     dir=os.path.dirname(outputPath),
                     prefix=os.path.basename(mdPath),
                     suffix=".skel",
                 )
             )
             generateSkeleton(mdPath=mdPath, outputPath=skelPath)
         with open(skelPath, "r", encoding="utf8") as skelFile:
             skelContent = skelFile.read()
     res = Result_generateXliff()
     print(
         f"Generating xliff file {prettyPathString(outputPath)} from {prettyPathString(mdPath)} and {prettyPathString(skelPath)}..."
     )
     with contextlib.ExitStack() as stack:
         mdFile = stack.enter_context(open(mdPath, "r", encoding="utf8"))
         outputFile = stack.enter_context(open(outputPath, "w", encoding="utf8", newline=""))
         fileID = os.path.basename(mdPath)
         mdUri = getRawGithubURLForPath(mdPath)
         print(f"Including Github raw URL: {mdUri}")
         outputFile.write(
             '<?xml version="1.0"?>\n'
             f'<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en">\n'
             f'<file id="{fileID}" original="{mdUri}">\n'
         )
         outputFile.write(f"<skeleton>\n{xmlEscape(skelContent)}\n</skeleton>\n")
         res.numTranslatableStrings = 0
+        total_lines = sum(1 for _ in mdFile)
+        mdFile.seek(0)
         for lineNo, (mdLine, skelLine) in enumerate(
             zip_longest(mdFile.readlines(), skelContent.splitlines(keepends=True)), start=1
         ):
             mdLine = mdLine.rstrip()
             skelLine = skelLine.rstrip()
             if m := re_translationID.match(skelLine):
                 res.numTranslatableStrings += 1
                 prefix, ID, suffix = m.groups()
                 if prefix and not mdLine.startswith(prefix):
                     raise ValueError(f'Line {lineNo}: does not start with "{prefix}", {mdLine=}, {skelLine=}')
                 if suffix and not mdLine.endswith(suffix):
                     raise ValueError(f'Line {lineNo}: does not end with "{suffix}", {mdLine=}, {skelLine=}')
                 source = mdLine[len(prefix) : len(mdLine) - len(suffix)]
                 outputFile.write(
                     f'<unit id="{ID}">\n'
                     "<notes>\n"
                     f'<note appliesTo="source">line: {lineNo + 1}</note>\n'
                 )
                 if prefix:
                     outputFile.write(f'<note appliesTo="source">prefix: {xmlEscape(prefix)}</note>\n')
                 if suffix:
                     outputFile.write(f'<note appliesTo="source">suffix: {xmlEscape(suffix)}</note>\n')
                 outputFile.write(
                     "</notes>\n"
                     f"<segment>\n"
                     f"<source>{xmlEscape(source)}</source>\n"
                     "</segment>\n"
                     "</unit>\n"
                 )
             else:
                 if mdLine != skelLine:
                     raise ValueError(f"Line {lineNo}: {mdLine=} does not match {skelLine=}")
+            if lineNo % 1000 == 0:
+                print(f"Processed {lineNo}/{total_lines} lines...")
         outputFile.write("</file>\n" "</xliff>")
     print(f"Generated xliff file with {res.numTranslatableStrings} translatable strings")
     return res
```
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
def generateXliff(
	mdPath: str,
	outputPath: str,
	skelPath: str | None = None,
) -> Result_generateXliff:
	# If a skeleton file is not provided, first generate one
	with contextlib.ExitStack() as stack:
		if not skelPath:
			skelPath = stack.enter_context(
				createAndDeleteTempFilePath_contextManager(
					dir=os.path.dirname(outputPath),
					prefix=os.path.basename(mdPath),
					suffix=".skel",
				)
			)
			generateSkeleton(mdPath=mdPath, outputPath=skelPath)
		with open(skelPath, "r", encoding="utf8") as skelFile:
			skelContent = skelFile.read()
	res = Result_generateXliff()
	print(
		f"Generating xliff file {prettyPathString(outputPath)} from {prettyPathString(mdPath)} and {prettyPathString(skelPath)}..."
	)
	with contextlib.ExitStack() as stack:
		mdFile = stack.enter_context(open(mdPath, "r", encoding="utf8"))
		outputFile = stack.enter_context(open(outputPath, "w", encoding="utf8", newline=""))
		fileID = os.path.basename(mdPath)
		mdUri = getRawGithubURLForPath(mdPath)
		print(f"Including Github raw URL: {mdUri}")
		outputFile.write(
			'<?xml version="1.0"?>\n'
			f'<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en">\n'
			f'<file id="{fileID}" original="{mdUri}">\n'
		)
		outputFile.write(f"<skeleton>\n{xmlEscape(skelContent)}\n</skeleton>\n")
		res.numTranslatableStrings = 0
		# Count lines once so progress can be reported, then rewind
		total_lines = sum(1 for _ in mdFile)
		mdFile.seek(0)
		for lineNo, (mdLine, skelLine) in enumerate(
			zip_longest(mdFile.readlines(), skelContent.splitlines(keepends=True)), start=1
		):
			mdLine = mdLine.rstrip()
			skelLine = skelLine.rstrip()
			if m := re_translationID.match(skelLine):
				res.numTranslatableStrings += 1
				prefix, ID, suffix = m.groups()
				if prefix and not mdLine.startswith(prefix):
					raise ValueError(f'Line {lineNo}: does not start with "{prefix}", {mdLine=}, {skelLine=}')
				if suffix and not mdLine.endswith(suffix):
					raise ValueError(f'Line {lineNo}: does not end with "{suffix}", {mdLine=}, {skelLine=}')
				source = mdLine[len(prefix) : len(mdLine) - len(suffix)]
				outputFile.write(
					f'<unit id="{ID}">\n'
					"<notes>\n"
					f'<note appliesTo="source">line: {lineNo + 1}</note>\n'
				)
				if prefix:
					outputFile.write(f'<note appliesTo="source">prefix: {xmlEscape(prefix)}</note>\n')
				if suffix:
					outputFile.write(f'<note appliesTo="source">suffix: {xmlEscape(suffix)}</note>\n')
				outputFile.write(
					"</notes>\n"
					f"<segment>\n"
					f"<source>{xmlEscape(source)}</source>\n"
					"</segment>\n"
					"</unit>\n"
				)
			else:
				# Non-translatable lines must be identical in the markdown and the skeleton
				if mdLine != skelLine:
					raise ValueError(f"Line {lineNo}: {mdLine=} does not match {skelLine=}")
			if lineNo % 1000 == 0:
				print(f"Processed {lineNo}/{total_lines} lines...")
		outputFile.write("</file>\n" "</xliff>")
		print(f"Generated xliff file with {res.numTranslatableStrings} translatable strings")
		return res
```
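For reference, a file in the shape written by the suggested `generateXliff` can be read back with Python's standard library, which is handy for spot-checking which strings will reach Crowdin. This is a minimal sketch, assuming only the XLIFF 2.0 namespace and the `file` > `unit` > `segment` > `source` layout visible in the code; `listSourceStrings` is a hypothetical helper, not part of the PR's script, and the sample document is made up.

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <xliff> root element, as in the generated header
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:2.0"}


def listSourceStrings(xliffText: str) -> list[tuple[str, str]]:
	"""Hypothetical helper: return (unit id, source text) pairs from an XLIFF 2.0 string."""
	root = ET.fromstring(xliffText)
	pairs = []
	# Each translatable string is written as <file>/<unit>/<segment>/<source>
	for unit in root.iterfind("x:file/x:unit", XLIFF_NS):
		source = unit.find("x:segment/x:source", XLIFF_NS)
		# ElementTree has already turned escaped entities such as &amp; back into plain text
		text = source.text if source is not None and source.text else ""
		pairs.append((unit.get("id", ""), text))
	return pairs


# Made-up sample in the same shape as the generated output
sample = (
	'<?xml version="1.0"?>\n'
	'<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en">\n'
	'<file id="example.md" original="example.md">\n'
	"<skeleton>\n(skeleton content)\n</skeleton>\n"
	'<unit id="abc">\n'
	"<notes>\n"
	'<note appliesTo="source">line: 1</note>\n'
	"</notes>\n"
	"<segment>\n"
	"<source>NVDA &amp; you</source>\n"
	"</segment>\n"
	"</unit>\n"
	"</file>\n"
	"</xliff>"
)
print(listSourceStrings(sample))  # [('abc', 'NVDA & you')]
```

The `<notes>` and `<skeleton>` elements are simply skipped here, since only the source text matters for this check.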
328-387: Consider adding input validation for the `lang` parameter.

In `translateXliff`, validating the `lang` parameter would ensure it is a valid language code before any work is done. For example:

```diff
 def translateXliff(
 	xliffPath: str,
 	lang: str,
 	pretranslatedMdPath: str,
 	outputPath: str,
 	allowBadAnchors: bool = False,
 ) -> Result_translateXliff:
+	# Validate lang parameter
+	if not re.match(r'^[a-z]{2,3}(-[A-Z]{2,3})?$', lang):
+		raise ValueError(f"Invalid language code: {lang}")
 	print(
 		f"Creating {lang} translated xliff file {prettyPathString(outputPath)} from {prettyPathString(xliffPath)} using {prettyPathString(pretranslatedMdPath)}..."
 	)
 	res = Result_translateXliff()
 	# ... rest of the function ...
```

Committable suggestion was skipped due to low confidence.
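One caveat with the pattern in the suggestion above: NVDA's translation directories use underscore-separated locale names such as `pt_BR` and `zh_CN`, which a hyphen-only pattern rejects. Below is a minimal sketch of a validator that accepts either separator. The names `isValidLangCode` and `validateLang` and the exact pattern are assumptions for illustration, not part of the PR.

```python
import re

# Assumed pattern: 2-3 lowercase letters, optionally followed by "_" or "-"
# and a 2-3 letter uppercase region code (e.g. "de", "pt_BR", "zh-TW").
_LANG_CODE_RE = re.compile(r"[a-z]{2,3}(?:[_-][A-Z]{2,3})?")


def isValidLangCode(lang: str) -> bool:
	# fullmatch avoids the pitfall where "$" in re.match(...$) also matches before a trailing newline
	return _LANG_CODE_RE.fullmatch(lang) is not None


def validateLang(lang: str) -> None:
	# Hypothetical guard to call at the top of translateXliff
	if not isValidLangCode(lang):
		raise ValueError(f"Invalid language code: {lang}")


for code in ("de", "pt_BR", "zh-TW", "english", "pt_br"):
	print(code, isValidLangCode(code))
```

Using `fullmatch` rather than `re.match` with `^...$` means a value such as `"de\n"` is rejected instead of slipping through.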
Hi,
As one of the translators, let me add my two cents to this discussion
*smiles*
Can we remove the changes for developers from the changes file altogether?
In the past, I haven't translated that section for my languages, because the developer
changes are meant only for developers, who know enough English to read them
and have access to them anyway.
So I have always cross-linked to the English version.
@zstanecic For now at least, we are going to keep the changes file as is. However, now that it is on Crowdin, translators can choose to simply not translate those strings if they wish.
Please note that the .xliff file is now included in the installer. Should this file be excluded?
@wmhn1872265132 Thanks for catching this. I've now excluded xliff files.
Just wanted to ask: since we plan to make extensive revisions to the Simplified Chinese version of the User Guide in the future, I would like a way to edit the Markdown version of the User Guide directly instead of a po file. For large-scale changes, we may prefer to edit the Markdown file in a text editor. We would also like a script that compares structure to ensure that the structure is not broken. Thanks
@cary-rowen I'm sorry, but going forward, documentation such as the User Guide and changes files can only be translated on Crowdin, either through its interface or by using Poedit to translate the xliff files.
Thanks, Mick. Will NV Access open-source the script for converting Markdown in the future? That would let us edit the Markdown and convert it to xliff in a highly customizable way. Best,
Or can I use a project like https://github.com/cataria-rocks/md2xliff to do so?
This PR adds a newly generated changes.xliff for English, which has also been uploaded to Crowdin.
This PR updates the user docs GitHub Action to upload the English changes.xliff to Crowdin whenever it changes.
This PR also includes the initial set of translated user docs xliff files from Crowdin.
So far that is 20 translations of the User Guide and 7 translations of the changes file (What's New).
SCons will see that these xliff files are newer than their Markdown files, rebuild the Markdown files from them, and then build the HTML from the rebuilt Markdown files.
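The rebuild step described above comes down to a "target is older than its source" check. Here is a minimal sketch of that check using file modification times, for illustration only. `needsRebuild` is a hypothetical name, and SCons's own dependency decision is configurable (its `Decider` can compare timestamps or content signatures), so NVDA's build scripts may not work exactly this way.

```python
import os
import tempfile
import time


def needsRebuild(sourcePath: str, targetPath: str) -> bool:
	"""Hypothetical helper: True if targetPath is missing or older than sourcePath."""
	if not os.path.exists(targetPath):
		return True
	return os.path.getmtime(sourcePath) > os.path.getmtime(targetPath)


with tempfile.TemporaryDirectory() as tmp:
	mdPath = os.path.join(tmp, "userGuide.md")
	xliffPath = os.path.join(tmp, "userGuide.xliff")
	for path in (mdPath, xliffPath):
		with open(path, "w", encoding="utf8"):
			pass
	now = time.time()
	os.utime(mdPath, (now - 60, now - 60))  # markdown last written a minute ago
	os.utime(xliffPath, (now, now))  # translated xliff just arrived from Crowdin
	print(needsRebuild(xliffPath, mdPath))  # True: the xliff is newer, so regenerate the markdown
	print(needsRebuild(mdPath, xliffPath))  # False
```

Setting the modification times explicitly with `os.utime` keeps the example deterministic, regardless of the file system's timestamp resolution.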