Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve translation system workflow #17214

Merged
merged 39 commits into from
Oct 4, 2024
Merged

Improve translation system workflow #17214

merged 39 commits into from
Oct 4, 2024

Conversation

michaelDCurran
Copy link
Member

@michaelDCurran michaelDCurran commented Sep 25, 2024

Link to issue number:

None.

Summary of the issue:

There are several limitations to translating with the Crowdin workflow such as:

  • No way to locally verify html output for documentation
  • Crowdin is confused by empty target tags
  • Incorrect double escaping of xml.

Description of user facing changes

None.

Description of development approach

markdownTranslate:

  • No longer incorrectly xml escape content before inserting it into the lxml tree. lxml already does this.
  • No longer xml unescaped content coming from the lxml tree - lxml already does this.translateXliff: add a note to certain exceptions showing the line number of the markdown file to find markdown syntax errors easier.
  • translateXliff: for lines that are purely structural (no translaton still ensure that the line matches the skeleton, to catch markdown syntax errors much earlier.

md2html.py

  • Restructured no longer to be an SCons tool, rather it is now a standalone script in user_docs, so that other code such as the nvdaL10nUtil script can use it.

nvdaL10util:

Added a new nvdal10nUtil program, which provides the following commands:

  • xliff2md: converts a translated xliff file back into its original markdown file
nvdaL10util xliff2md <xliff file> <output xliff file>
  • md2html: converts a markdown file into html, including support for the keyCommands document.
nvdaL10util md2html -t {userGuide|changes|keyCommands} -l <lang> <markdown file> <output html file>
  • xliff2html: converts a xliff file all the way to html, including support for the keyCommands document.
nvdaL10util xliff2html -t {userGuide|changes|keyCommands} -l <lang> <xliff file> <output html file>
  • stripXliff: takes a translated xliff file and generates a new xliff file that has all unnecessary content removed, so that when uploaded, only valid, added/changed translation trings are submitted to Crowdin. It removes:
    • empty target tags
    • initial English source target tags
  • Optionally removes any translation that already appears in the old xliff, so that the new file only contains added / changed translations since last download.
nvdaL10nUtil stripXliff [-o <old xliff>] <translated xliff file> <output stripped xliff file>

Documentation updates

  • No longer talk about Crowdin approvals - this has been disabled due to inaccessibility
  • No longer recommend Poedit's Crowdin cloud translation feature as it is buggy. Instead just recommend ownloading, translating with Poedit, and uploding again.

Testing strategy:

  • Original unit tests.
  • Tested each new command on at least two languages.

Known issues with pull request:

None known.

Code Review Checklist:

  • Documentation:
    • Change log entry
    • User Documentation
    • Developer / Technical Documentation
    • Context sensitive help for GUI changes
  • Testing:
    • Unit tests
    • System (end to end) tests
    • Manual testing
  • UX of all users considered:
    • Speech
    • Braille
    • Low Vision
    • Different web browsers
    • Localization in other languages / culture than English
  • API is compatible with existing add-ons.
  • Security precautions taken.

Summary by CodeRabbit

Release Notes

  • New Features

    • Enhanced documentation build process with a new markdown-to-HTML conversion method.
    • Introduced the nvdaL10nUtil script for improved localization handling, including commands for converting XLIFF to markdown and HTML.
    • Added new sections in user documentation for Crowdin translation processes.
  • Bug Fixes

    • Improved error handling and reporting during markdown generation and translation processes.
  • Chores

    • Updated dependencies for better functionality and performance across the project.

michaelDCurran and others added 15 commits September 9, 2024 15:35
…Fail if purely structual lines do not match the skeleton.
… few things such as xliff to markdown, and markdown to html. This required slightly refactoring md2html.py and sconstruct.
…g '<target/>' where Crowdin has taken empty target tags as literal translations.
…string - lxml does this itself. We were doing it twice.
…etting / getting from lxml as this automatically happens. Otherwise, content is doubly escaped.
… xliff except for valid target translations, ready for upload to Crowdin. It also optionally ignores all existing target translations froma n existing xliff file so that only a delta can be uploaded.
@AppVeyorBot
Copy link

See test results for failed build of commit e5b81c15e9

@hwf1324
Copy link
Contributor

hwf1324 commented Sep 25, 2024

Hi,

Could be done in this PR by the way.

Fix the bug that scons -c deletes a portion of markdown files in non-English languages?

Thanks.

projectDocs/translating/crowdin.md Outdated Show resolved Hide resolved
projectDocs/translating/crowdin.md Outdated Show resolved Hide resolved
@CyrilleB79
Copy link
Collaborator

By the way, could you also add nvdaL10util.exe to the artifacts generated by appVeyor? Thanks.

@AppVeyorBot
Copy link

See test results for failed build of commit 999dc8fd52

@michaelDCurran michaelDCurran marked this pull request as ready for review September 30, 2024 22:49
@michaelDCurran michaelDCurran requested a review from a team as a code owner September 30, 2024 22:49
Copy link
Contributor

coderabbitai bot commented Sep 30, 2024

Walkthrough

The pull request includes updates to various files, primarily focusing on enhancing the documentation and localization processes for the NVDA project. Key changes involve modifications to the .gitignore file, updates to the build process in the sconscript and SConstruct files, improvements in translation handling within markdownTranslate.py, and the introduction of a new utility script for localization tasks. Additionally, the requirements.txt file has been updated to include several new dependencies relevant to the build and documentation processes.

Changes

File/Path Change Summary
.gitignore Updated to include new patterns for ignoring additional files and directories related to documentation and localization.
projectDocs/dev/developerGuide/sconscript Modified the build process to include a variable substitution step for markdown to HTML conversion.
projectDocs/translating/crowdin.md Expanded to include instructions for translating user documentation and updated the translation review process.
projectDocs/translating/readme.md Clarified guidance on the translation and localization process, emphasizing the use of Crowdin.
requirements.txt Added multiple new dependencies for build tools, documentation, and localization utilities.
SConstruct Updated to remove the md2html tool, streamline markdown processing, and enhance error handling.
user_docs/markdownTranslate.py Simplified XML handling and improved translation tracking with new attributes and exception handling.
user_docs/md2html.py Restructured to use argparse for command-line arguments, simplifying the markdown to HTML conversion process.
user_docs/nvdaL10nUtil.py Introduced new utility functions for handling XLIFF files and added CLI commands for localization tasks.

Possibly related issues

Possibly related PRs

Suggested labels

conceptApproved, merge-early

Suggested reviewers

  • SaschaCowley
  • Qchristensen

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🧹 Outside diff range and nitpick comments (11)
.gitignore (1)

Line range hint 30-35: LGTM! Consider adding comments for clarity.

The new entries in the .gitignore file are appropriate and align with the PR objectives. They effectively ignore generated documentation files and architecture-specific controller client files, which helps keep the repository clean and prevents potential conflicts.

Consider adding comments to explain the purpose of these new ignore patterns. This can help other developers understand why these files are being ignored. For example:

+# Ignore generated documentation files
 user_docs/*/*.md.sub
 user_docs/*/*.html
 user_docs/*/*.css
+# Ignore architecture-specific controller client files
 extras/controllerClient/x86/nvdaController.h
 extras/controllerClient/x64
 extras/controllerClient/arm64
projectDocs/dev/developerGuide/sconscript (1)

15-28: Approve changes with a minor suggestion for clarity

The introduction of the variable substitution step is a good improvement to the build process. It allows for dynamic content injection, such as the NVDA version, into the markdown file before HTML generation. This change enhances the flexibility and maintainability of the documentation.

For improved clarity, consider adding a brief comment explaining the purpose of the substitution step. For example:

# Substitute variables (e.g., NVDA version) into the markdown file
mdFileSub = env.Substfile(

This will help future developers quickly understand the purpose of this step in the build process.

projectDocs/translating/readme.md (1)

55-56: LGTM: Clear instructions for translating documentation

The changes accurately reflect the use of Crowdin for translating both the User Guide and the optional changes.md file. This is consistent with the PR objectives and provides clear guidance for translators.

Consider adding a brief explanation of what the changes.md file contains to provide more context for translators. For example:

- changes.md (optional): a list of changes between NVDA releases, see [Translating using Crowdin](./crowdin.md) for more information.
projectDocs/translating/crowdin.md (6)

Line range hint 3-31: LGTM! Comprehensive update to introduction and setup.

The changes effectively broaden the scope of Crowdin usage to include user documentation and provide updated setup instructions. The note about disabled translation approvals is crucial for translators to understand the current workflow.

Consider adding a brief explanation of why translation approvals have been disabled due to accessibility issues. This could help translators better understand the situation and potentially contribute to finding a solution.

Due to accessibility issues with the approval process, translation approvals have been temporarily disabled on Crowdin.
🧰 Tools
🪛 LanguageTool

[typographical] ~29-~29: Consider adding a comma here.
Context: ...on reviews Due to accessibility issues, for now translation approvals have been disable...

(FOR_NOW_COMMA)


[uncategorized] ~42-~42: Possible missing comma found.
Context: ...g PoEdit After opening a .po or .xliff file you will be placed on a list with all o...

(AI_HYDRA_LEO_MISSING_COMMA)


[style] ~42-~42: Consider removing “of” to be more concise
Context: ... file you will be placed on a list with all of the strings to translate. You can read the...

(ALL_OF_THE)


[typographical] ~45-~45: The word “thus” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ... a message which has been automatically translated, thus it may be wrong. PoEdit will collect th...

(THUS_SENTENCE)


Line range hint 42-62: LGTM! Updated PoEdit instructions for .xliff files.

The inclusion of .xliff file handling is a valuable addition, reflecting the expanded scope of translation to include user documentation.

There's a minor grammatical issue in the first sentence. Consider adding a comma for better readability:

After opening a .po or .xliff file, you will be placed on a list with all the strings to translate.
🧰 Tools
🪛 LanguageTool

[typographical] ~29-~29: Consider adding a comma here.
Context: ...on reviews Due to accessibility issues, for now translation approvals have been disable...

(FOR_NOW_COMMA)


[uncategorized] ~42-~42: Possible missing comma found.
Context: ...g PoEdit After opening a .po or .xliff file you will be placed on a list with all o...

(AI_HYDRA_LEO_MISSING_COMMA)


[style] ~42-~42: Consider removing “of” to be more concise
Context: ... file you will be placed on a list with all of the strings to translate. You can read the...

(ALL_OF_THE)


[typographical] ~45-~45: The word “thus” is an adverb that can’t be used like a conjunction, and therefore needs to be separated from the sentence.
Context: ... a message which has been automatically translated, thus it may be wrong. PoEdit will collect th...

(THUS_SENTENCE)


157-176: LGTM! Comprehensive instructions for translating user documentation.

This new section is a valuable addition, providing clear guidance for translating NVDA's user documentation. The instructions for handling .xliff files and using the nvdaL10nUtil program are well-explained and will be very helpful for translators.

Consider adding a brief explanation of what the nvdaL10nUtil program is and where translators can obtain it. This would help new translators who might not be familiar with this tool.

* Use the nvdaL10nUtil program (a utility provided by NVDA for localization tasks) to strip the xliff so that it only contains translations that were added / changed. E.g.

177-192: LGTM! Clear guidelines for translating markdown content.

This new section provides crucial information for translators on how to handle markdown content in .xliff files. The explanations about structural elements and inline syntax are clear and will help maintain the integrity of the documentation structure during translation.

For consistency and proper noun usage, consider capitalizing "Markdown" throughout the section:

### Translating Markdown
The English NVDA user documentation is written in Markdown syntax.
Content may still however contain inline Markdown syntax such as links, inline code fences (`\``), and table column separators (`|`).
🧰 Tools
🪛 LanguageTool

[grammar] ~178-~178: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...h NVDA user documentation is written in markdown syntax. The xliff file you are directly...

(MARKDOWN_NNP)


[style] ~182-~182: In American English, abbreviations like “etc.” require a period.
Context: ...rows, table header body separator lines etc) are not included here. Structural syn...

(ETC_PERIOD)


[grammar] ~185-~185: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...ontent may still however contain inline markdown syntax such as links, inline code fence...

(MARKDOWN_NNP)


193-197: LGTM! Useful instructions for verifying translations.

This new section provides valuable guidance for translators to verify their work by generating HTML versions of the translated documentation. The example command using nvdaL10nUtil is particularly helpful.

There's a minor typo in the introductory sentence. Consider fixing it for better readability:

Whenever you have saved the xliff file with Poedit, you can use the nvdaL10nUtil program to generate the HTML version of the documentation file. E.g.
🧰 Tools
🪛 LanguageTool

[misspelling] ~193-~193: You can use ‘ever’ as an intensifier, but if the phrase is a pronoun or an adverb, use “Whenever” without a space.
Context: ...ontent. ### Verifying your translation When ever you have saved the xliff file with Poed...

(EVER_INTENSIFIER)


Line range hint 1-197: Excellent update to the translation documentation!

This comprehensive revision significantly improves the guidance for NVDA translators. The addition of sections on translating user documentation, handling Markdown content, and verifying translations are particularly valuable. The overall structure and clarity of the document have been enhanced.

Consider adding a table of contents at the beginning of the document to help translators quickly navigate to specific sections. This would be especially helpful given the expanded scope of the document.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~153-~153: Possible missing comma found.
Context: ...now be heard or brailled in your native language provided that the synthesizer you are u...

(AI_HYDRA_LEO_MISSING_COMMA)


[grammar] ~178-~178: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...h NVDA user documentation is written in markdown syntax. The xliff file you are directly...

(MARKDOWN_NNP)


[style] ~182-~182: In American English, abbreviations like “etc.” require a period.
Context: ...rows, table header body separator lines etc) are not included here. Structural syn...

(ETC_PERIOD)


[grammar] ~185-~185: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...ontent may still however contain inline markdown syntax such as links, inline code fence...

(MARKDOWN_NNP)


[misspelling] ~193-~193: You can use ‘ever’ as an intensifier, but if the phrase is a pronoun or an adverb, use “Whenever” without a space.
Context: ...ontent. ### Verifying your translation When ever you have saved the xliff file with Poed...

(EVER_INTENSIFIER)

user_docs/nvdaL10nUtil.py (1)

55-58: Ensure consistent variable naming conventions.

The variables corruptTargetcount and sourceTargetcount have inconsistent capitalization in count. To maintain consistency, consider renaming them to corruptTargetCount and sourceTargetCount.

Apply this diff to correct the variable names:

-    	corruptTargetcount = 0
-    	sourceTargetcount = 0
+    	corruptTargetCount = 0
+    	sourceTargetCount = 0

Ensure all occurrences of these variables are updated throughout the code.

user_docs/markdownTranslate.py (1)

450-450: Recommend using logging module instead of print for warnings

In line 450, using print() to display a warning message is not ideal for production code. It's better to use the logging module, which offers more flexibility and control over log levels and output.

Apply this diff to switch to the logging module:

+import logging
...
-    print(f"Warning: line {lineNum} contained a corrupt empty translation. Using source")
+    logging.warning(f"Line {lineNum}: contained a corrupt empty translation. Using source")

Also, ensure that logging is configured appropriately in the module.

📜 Review details

Configuration used: .coderabbit.yml
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 9fe4541 and 90f9385.

📒 Files selected for processing (9)
  • .gitignore (1 hunks)
  • projectDocs/dev/developerGuide/sconscript (1 hunks)
  • projectDocs/translating/crowdin.md (4 hunks)
  • projectDocs/translating/readme.md (2 hunks)
  • requirements.txt (1 hunks)
  • sconstruct (3 hunks)
  • user_docs/markdownTranslate.py (5 hunks)
  • user_docs/md2html.py (6 hunks)
  • user_docs/nvdaL10nUtil.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (8)
projectDocs/dev/developerGuide/sconscript (1)

Pattern **/*: Focus on code smells, logic errors, edge cases, missing test cases, security flaws and serious issues. Avoid commenting on minor issues such as linting, formatting and style issues. This project uses tabs instead of spaces, do not suggest usage of spaces over tabs. Are there any 'red flags' in this code that might warrant closer investigation from a security standpoint? Explain what makes them suspicious. When providing code suggestions, particularly when requested, ensure GitHub's suggestion format is used, i.e.: suggestion <code changes>

projectDocs/translating/crowdin.md (2)

Pattern **/*: Focus on code smells, logic errors, edge cases, missing test cases, security flaws and serious issues. Avoid commenting on minor issues such as linting, formatting and style issues. This project uses tabs instead of spaces, do not suggest usage of spaces over tabs. Are there any 'red flags' in this code that might warrant closer investigation from a security standpoint? Explain what makes them suspicious. When providing code suggestions, particularly when requested, ensure GitHub's suggestion format is used, i.e.: suggestion <code changes>


Pattern **/*.md: Focus on grammar, spelling, and punctuation. Also consider readability and clarity of contents. Ensure that changes follow the style of one sentence per line, suggest changes if this is not met.

projectDocs/translating/readme.md (2)

Pattern **/*: Focus on code smells, logic errors, edge cases, missing test cases, security flaws and serious issues. Avoid commenting on minor issues such as linting, formatting and style issues. This project uses tabs instead of spaces, do not suggest usage of spaces over tabs. Are there any 'red flags' in this code that might warrant closer investigation from a security standpoint? Explain what makes them suspicious. When providing code suggestions, particularly when requested, ensure GitHub's suggestion format is used, i.e.: suggestion <code changes>


Pattern **/*.md: Focus on grammar, spelling, and punctuation. Also consider readability and clarity of contents. Ensure that changes follow the style of one sentence per line, suggest changes if this is not met.

requirements.txt (1)

Pattern **/*: Focus on code smells, logic errors, edge cases, missing test cases, security flaws and serious issues. Avoid commenting on minor issues such as linting, formatting and style issues. This project uses tabs instead of spaces, do not suggest usage of spaces over tabs. Are there any 'red flags' in this code that might warrant closer investigation from a security standpoint? Explain what makes them suspicious. When providing code suggestions, particularly when requested, ensure GitHub's suggestion format is used, i.e.: suggestion <code changes>

sconstruct (1)

Pattern **/*: Focus on code smells, logic errors, edge cases, missing test cases, security flaws and serious issues. Avoid commenting on minor issues such as linting, formatting and style issues. This project uses tabs instead of spaces, do not suggest usage of spaces over tabs. Are there any 'red flags' in this code that might warrant closer investigation from a security standpoint? Explain what makes them suspicious. When providing code suggestions, particularly when requested, ensure GitHub's suggestion format is used, i.e.: suggestion <code changes>

user_docs/markdownTranslate.py (2)

Pattern **/*: Focus on code smells, logic errors, edge cases, missing test cases, security flaws and serious issues. Avoid commenting on minor issues such as linting, formatting and style issues. This project uses tabs instead of spaces, do not suggest usage of spaces over tabs. Are there any 'red flags' in this code that might warrant closer investigation from a security standpoint? Explain what makes them suspicious. When providing code suggestions, particularly when requested, ensure GitHub's suggestion format is used, i.e.: suggestion <code changes>


Pattern **/*.py: _, pgettext, ngettext, and ngettext are defined globally, errors for this being undefined can be ignored.

user_docs/md2html.py (2)

Pattern **/*: Focus on code smells, logic errors, edge cases, missing test cases, security flaws and serious issues. Avoid commenting on minor issues such as linting, formatting and style issues. This project uses tabs instead of spaces, do not suggest usage of spaces over tabs. Are there any 'red flags' in this code that might warrant closer investigation from a security standpoint? Explain what makes them suspicious. When providing code suggestions, particularly when requested, ensure GitHub's suggestion format is used, i.e.: suggestion <code changes>


Pattern **/*.py: _, pgettext, ngettext, and ngettext are defined globally, errors for this being undefined can be ignored.

user_docs/nvdaL10nUtil.py (2)

Pattern **/*: Focus on code smells, logic errors, edge cases, missing test cases, security flaws and serious issues. Avoid commenting on minor issues such as linting, formatting and style issues. This project uses tabs instead of spaces, do not suggest usage of spaces over tabs. Are there any 'red flags' in this code that might warrant closer investigation from a security standpoint? Explain what makes them suspicious. When providing code suggestions, particularly when requested, ensure GitHub's suggestion format is used, i.e.: suggestion <code changes>


Pattern **/*.py: _, pgettext, ngettext, and ngettext are defined globally, errors for this being undefined can be ignored.

🪛 LanguageTool
projectDocs/translating/crowdin.md

[typographical] ~29-~29: Consider adding a comma here.
Context: ...on reviews Due to accessibility issues, for now translation approvals have been disable...

(FOR_NOW_COMMA)


[uncategorized] ~42-~42: Possible missing comma found.
Context: ...g PoEdit After opening a .po or .xliff file you will be placed on a list with all o...

(AI_HYDRA_LEO_MISSING_COMMA)


[style] ~42-~42: Consider removing “of” to be more concise
Context: ... file you will be placed on a list with all of the strings to translate. You can read the...

(ALL_OF_THE)


[uncategorized] ~153-~153: Possible missing comma found.
Context: ...now be heard or brailled in your native language provided that the synthesizer you are u...

(AI_HYDRA_LEO_MISSING_COMMA)


[grammar] ~178-~178: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...h NVDA user documentation is written in markdown syntax. The xliff file you are directly...

(MARKDOWN_NNP)


[style] ~182-~182: In American English, abbreviations like “etc.” require a period.
Context: ...rows, table header body separator lines etc) are not included here. Structural syn...

(ETC_PERIOD)


[grammar] ~185-~185: Did you mean the formatting language “Markdown” (= proper noun)?
Context: ...ontent may still however contain inline markdown syntax such as links, inline code fence...

(MARKDOWN_NNP)


[misspelling] ~193-~193: You can use ‘ever’ as an intensifier, but if the phrase is a pronoun or an adverb, use “Whenever” without a space.
Context: ...ontent. ### Verifying your translation When ever you have saved the xliff file with Poed...

(EVER_INTENSIFIER)

🪛 Ruff
user_docs/md2html.py

6-6: sys imported but unused

Remove unused import: sys

(F401)

user_docs/nvdaL10nUtil.py

22-22: Local variable namespace is assigned to but never used

Remove assignment to unused variable namespace

(F841)


42-42: f-string without any placeholders

Remove extraneous f prefix

(F541)


52-52: f-string without any placeholders

Remove extraneous f prefix

(F541)


53-53: f-string without any placeholders

Remove extraneous f prefix

(F541)

🔇 Additional comments (17)
requirements.txt (6)

Line range hint 11-12: LGTM: Addition of fast-diff-match-patch is appropriate.

The addition of fast-diff-match-patch==2.1.0 is well-documented and aligns with the PR objectives. Pinning the specific version ensures reproducibility.


Line range hint 14-15: LGTM: Addition of typing-extensions enhances type hinting capabilities.

The inclusion of typing-extensions==4.9.0 is well-justified and supports improved type hinting. The comment provides clear context, referencing PEP 696.


Line range hint 17-18: Approved, but please clarify the relevance to translation improvements.

The addition of pycaw==20240210 for Core Audio Windows functionality looks good. However, could you please clarify how this relates to the translation system improvements mentioned in the PR objectives?


24-25: LGTM: Addition of nuitka for xliff2html packaging.

The inclusion of nuitka==2.4.8 for packaging xliff2html is appropriate and aligns with the PR objectives.

Consider documenting the compilation process for xliff2html in the project's build documentation to ensure maintainability.


Line range hint 35-36: Excellent addition: nh3 for HTML sanitization.

The inclusion of nh3==0.2.18 for sanitizing HTML documentation output is a crucial security measure. This directly addresses potential XSS vulnerabilities from translated content.

To ensure proper usage across the project, consider documenting how and where nh3 should be applied in the translation workflow.


Line range hint 1-54: Overall, the dependency updates align well with the PR objectives.

The changes to requirements.txt effectively support the improvements to the translation system workflow. Key additions like fast-diff-match-patch, typing-extensions, nuitka, and nh3 directly contribute to the stated goals. The security aspect has been addressed with HTML sanitization.

Other additions and updates, while not requiring specific comments, appear to enhance the development environment and testing capabilities. These changes should contribute to a more robust and maintainable codebase.

projectDocs/translating/readme.md (2)

32-33: LGTM: Clear explanation of the translation process

The changes provide a clear and concise explanation of the current translation process, accurately reflecting the split between Crowdin and GitHub for different components. This aligns well with the PR objectives and should help translators understand the workflow.


Line range hint 1-56: Overall: Excellent updates to the translation documentation

The changes made to this document significantly improve the clarity and accuracy of the NVDA translation and localization process. The updates reflect the consolidation of translation processes using Crowdin for the NVDA interface and user documentation, while also mentioning the transition to GitHub for Character Descriptions, Symbols, and Gestures.

These changes align well with the PR objectives and should help translators better understand the current workflow. The document now provides a more streamlined and up-to-date guide for both new and existing translators.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~23-~23: It appears that hyphens are missing in the adjective “up-to-date”.
Context: ...ttps://crowdin.com/project/nvda) for an up to date report on the status of translating the...

(UP_TO_DATE_HYPHEN)

projectDocs/translating/crowdin.md (3)

35-38: LGTM! Simplified workflow description.

The reduction from three to two common workflows and the more concise descriptions improve clarity and make the process easier to understand for translators.


65-69: LGTM! Clear instructions for translating NVDA's interface.

This new section effectively separates interface translation from documentation translation, improving the overall organization of the document. The steps provided are clear and concise, making it easy for translators to understand the process.


148-156: LGTM! Improved instructions for testing interface translations.

The updated instructions provide a clearer and more detailed guide for translators to test their work. This improvement is crucial for ensuring the quality of translations and will help translators verify their work more effectively.

🧰 Tools
🪛 LanguageTool

[uncategorized] ~153-~153: Possible missing comma found.
Context: ...now be heard or brailled in your native language provided that the synthesizer you are u...

(AI_HYDRA_LEO_MISSING_COMMA)

user_docs/nvdaL10nUtil.py (1)

107-147: Verify Python version compatibility for the match statement.

The use of the match statement requires Python 3.10 or higher. Please verify that the codebase and deployment environments support Python 3.10+. If compatibility with earlier versions is needed, consider replacing match with if-elif-else statements.

sconstruct (2)

348-358: Variables substituted into markdown files

The code correctly substitutes variables such as NVDA_VERSION, NVDA_URL, and NVDA_COPYRIGHT_YEARS into the markdown files before converting them to HTML. This ensures that the generated documentation reflects the current build information.


380-387: Generating keyCommands.html files from userGuide.md.sub

The code successfully generates keyCommands.html files using the updated md2html.py script. It correctly specifies the language and document type, ensuring that the key commands documentation is accurately built for each locale.

user_docs/markdownTranslate.py (3)

160-160: Approved change: Removal of unnecessary xmlUnescape in extractSkeleton

In line 160, replacing outputFile.write(xmlUnescape(skeletonContent)) with outputFile.write(skeletonContent) is acceptable since lxml.etree handles unescaping when parsing the XML. This change simplifies the code by avoiding redundant unescaping.


349-349: Approved change: Simplification of skeleton content handling

In line 349, the removal of xmlUnescape when processing skeletonContent is appropriate, as lxml.etree correctly handles unescaping. Writing the skeleton content directly enhances code clarity.


418-419: Approved change: Simplification in generateMarkdown function

In lines 418-419, removing unnecessary unescaping of skeletonContent is appropriate, as lxml.etree handles the unescaping of XML content. Directly processing skeletonContent enhances performance and code readability.

user_docs/md2html.py Outdated Show resolved Hide resolved
Comment on lines 136 to 151
def main(source: str, dest: str, lang="en", docType=None):
print(f"Converting {docType or 'document'} at {source} to {dest}, {lang=}")
isUserGuide = docType == "userGuide"
isDevGuide = docType == "developerGuide"
isChanges = docType == "changes"
isKeyCommands = docType == "keyCommands"
if docType and not any([isUserGuide, isDevGuide, isChanges, isKeyCommands]):
raise ValueError(f"Unknown docType {docType}")
with open(source, "r", encoding="utf-8") as mdFile:
mdStr = mdFile.read()

mdStr = _replaceNVDATags(mdStr, env)

with io.StringIO() as mdBuffer:
mdBuffer.write(mdStr)
title = _getTitle(mdBuffer, isKeyCommands)

lang = pathlib.Path(source[0].path).parent.name
if isDevGuide and lang == "developerGuide":
# Parent folder in this case is the developerGuide folder in project docs
lang = "en"
lang = pathlib.Path(source).parent.name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Potential conflict with 'lang' parameter in function 'main'

The lang parameter passed to the main function is being overwritten on line 151 by lang = pathlib.Path(source).parent.name. This could lead to unexpected behavior if a different language code is provided via the --lang command-line argument.

Consider the following options to resolve this conflict:

  • Option 1: Remove the lang parameter from the function and the command-line argument if the language is always determined by the source file path.
  • Option 2: Use the provided lang argument and only default to the path-derived language if lang is not supplied.

Example modification for Option 2:

if not lang:
    lang = pathlib.Path(source).parent.name

Comment on lines 154 to 158
extraStylesheet = ""
elif isChanges or isKeyCommands:
extraStylesheet = ""
else:
raise ValueError(f"Unknown target type for {target[0].path}")
raise ValueError(f"Unknown target type for {dest}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Redundant assignment of 'extraStylesheet'

The variable extraStylesheet is assigned an empty string in all cases. This makes the conditional blocks unnecessary and adds complexity without functional benefit.

Simplify the code by removing the conditional statements and assigning extraStylesheet directly:

extraStylesheet = ""

If different stylesheets are intended for different document types, ensure that the appropriate stylesheets are assigned within the conditional blocks.

Comment on lines 189 to 193
args.add_argument("-l", "--lang", help="Language code", action="store", default="en")
args.add_argument("-t", "--docType", help="Type of document", action="store", choices=["userGuide", "developerGuide", "changes", "keyCommands"])
args.add_argument("source", help="Path to the markdown file")
args.add_argument("dest", help="Path to the resulting html file")
args = args.parse_args()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Make 'docType' a required command-line argument

The docType argument is optional in the command-line interface, but the code behaves incorrectly or raises an error when it is not provided. This can lead to a ValueError at runtime.

Modify the argument parser to make docType a required argument:

args.add_argument(
    "-t",
    "--docType",
    help="Type of document",
    choices=["userGuide", "developerGuide", "changes", "keyCommands"],
    required=True
)

Comment on lines 188 to 194
args = argparse.ArgumentParser()
args.add_argument("-l", "--lang", help="Language code", action="store", default="en")
args.add_argument("-t", "--docType", help="Type of document", action="store", choices=["userGuide", "developerGuide", "changes", "keyCommands"])
args.add_argument("source", help="Path to the markdown file")
args.add_argument("dest", help="Path to the resulting html file")
args = args.parse_args()
main(source=args.source, dest=args.dest, lang=args.lang, docType=args.docType)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Avoid reusing variable name 'args'

The variable args is used for both the ArgumentParser instance and the parsed arguments. This reuse can cause confusion and potential bugs.

Rename the ArgumentParser instance to parser for clarity:

parser = argparse.ArgumentParser()
parser.add_argument(...)
args = parser.parse_args()

user_docs/md2html.py Show resolved Hide resolved
user_docs/nvdaL10nUtil.py Show resolved Hide resolved
:param source: If True, fetch the source language, otherwise fetch the target language
:return: The language code
"""
namespace = {"xliff": "urn:oasis:names:tc:xliff:document:2.0"}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Remove unused variable namespace.

In the fetchLanguageFromXliff function, the variable namespace is assigned but never used. Removing it will clean up the code.

Apply this diff to remove the unused variable:

-    	namespace = {"xliff": "urn:oasis:names:tc:xliff:document:2.0"}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
namespace = {"xliff": "urn:oasis:names:tc:xliff:document:2.0"}
🧰 Tools
🪛 Ruff

22-22: Local variable namespace is assigned to but never used

Remove assignment to unused variable namespace

(F841)

sconstruct Outdated
Comment on lines 735 to 740
nvdaL10nUtil = env.Command(
target=outputDir.File("nvdaL10nUtil.exe"),
source="user_docs/nvdaL10nUtil.py",
ENV=os.environ,
action=f"nuitka --standalone --onefile --output-dir=./output --include-module=mdx_truly_sane_lists --include-module=markdown_link_attr_modifier --include-module=mdx_gh_links user_docs/nvdaL10nUtil.py",
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Use outputDir variable instead of hardcoded ./output in nvdaL10nUtil command

To ensure consistency and flexibility, replace the hardcoded ./output directory with outputDir.abspath in the --output-dir parameter of the nuitka command. This change adheres to the existing build configuration and avoids potential issues with path resolution.

 nvdaL10nUtil = env.Command(
     target=outputDir.File("nvdaL10nUtil.exe"),
     source="user_docs/nvdaL10nUtil.py",
     ENV=os.environ,
-    action=f"nuitka --standalone --onefile --output-dir=./output --include-module=mdx_truly_sane_lists --include-module=markdown_link_attr_modifier --include-module=mdx_gh_links user_docs/nvdaL10nUtil.py",
+    action=f"nuitka --standalone --onefile --output-dir={outputDir.abspath} --include-module=mdx_truly_sane_lists --include-module=markdown_link_attr_modifier --include-module=mdx_gh_links user_docs/nvdaL10nUtil.py",
 )
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
nvdaL10nUtil = env.Command(
target=outputDir.File("nvdaL10nUtil.exe"),
source="user_docs/nvdaL10nUtil.py",
ENV=os.environ,
action=f"nuitka --standalone --onefile --output-dir=./output --include-module=mdx_truly_sane_lists --include-module=markdown_link_attr_modifier --include-module=mdx_gh_links user_docs/nvdaL10nUtil.py",
)
nvdaL10nUtil = env.Command(
target=outputDir.File("nvdaL10nUtil.exe"),
source="user_docs/nvdaL10nUtil.py",
ENV=os.environ,
action=f"nuitka --standalone --onefile --output-dir={outputDir.abspath} --include-module=mdx_truly_sane_lists --include-module=markdown_link_attr_modifier --include-module=mdx_gh_links user_docs/nvdaL10nUtil.py",
)

user_docs/markdownTranslate.py Show resolved Hide resolved
@AppVeyorBot
Copy link

See test results for failed build of commit 6b9d864137

@seanbudd seanbudd added the conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review. label Oct 1, 2024
@AppVeyorBot
Copy link

See test results for failed build of commit f4753e7c3e

Copy link
Member

@SaschaCowley SaschaCowley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left a comment in https://github.com/nvaccess/nvda/pull/17214/files#r1785621397 which I think should fix the builds failing. Apologies for not leaving it as a review comment.

@AppVeyorBot
Copy link

See test results for failed build of commit d479586fdf

@AppVeyorBot
Copy link

  • Build execution time has reached the maximum allowed time for your plan (60 minutes).

See test results for failed build of commit 06d463da6a

… not block when asking to download dependencyWalker on appveyor.
@dpy013
Copy link
Contributor

dpy013 commented Oct 4, 2024

It is recommended to compress the uploaded file to .zip before uploading output\nvdaL10nUtil.build.

@michaelDCurran
Copy link
Member Author

@dpy013 hmm I'll change the build directory so the *.build and *.dist dirs don't go to the output directory. We only need the exe.

@wmhn1872265132
Copy link
Contributor

The nvdaL10nUtil program currently requires that it be run on the C drive, otherwise an error will occur.

Traceback (most recent call last):
File "C:\Users\WMHN1\AppData\Local\Temp\ONEFIL1\nvdaL10nUtil.py", line 192, in
File "C:\Users\WMHN1\AppData\Local\Temp\ONEFIL
1\markdownTranslate.py", line 415, in generateMarkdown
File "C:\Users\WMHN1\AppData\Local\Temp\ONEFIL~1\markdownTranslate.py", line 34, in prettyPathString
File "ntpath.py", line 766, in relpath
ValueError: path is on mount 'C:', start on mount 'F:'

@michaelDCurran michaelDCurran merged commit 281e646 into beta Oct 4, 2024
4 checks passed
@michaelDCurran michaelDCurran deleted the md2html branch October 4, 2024 07:11
@github-actions github-actions bot added this to the 2025.1 milestone Oct 4, 2024
seanbudd pushed a commit that referenced this pull request Nov 17, 2024
Fixes #17214

Summary of the issue:
#17214 changed the method of building html documents. Many .md.sub files will be generated in the process. These files are incorrectly included in the installer, increasing the installer size by about 2MB. After installing NVDA, it takes up an additional 30MB of space.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
conceptApproved Similar 'triaged' for issues, PR accepted in theory, implementation needs review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants