Japanese Text Analyzer

Analysis tool for the OCR files of Mokuro-processed manga. Other file types are also supported through the --any options.

Usage

japanese_text_analyzer directory_or_file_path OPTIONS

Options

--mokurojson (default): Searches only for .json files in the specified path.
    Note: The Mokuro _ocr json files must be present.

--mokuro: Searches only for .mokuro files in the specified path.
    Note: The Mokuro .mokuro files must be present.

--any: Searches for all files in the specified path.

--any=EXTENSION: Searches only for files with the given extension in the specified path.
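
To make the difference between the four options concrete, here is a minimal sketch of how an option string could be mapped to a file filter. It is not taken from the tool's source: the `FileFilter` enum and the `parse_filter` and `is_selected` functions are hypothetical names, and the case-insensitive extension comparison is an assumption.

```rust
use std::path::Path;

/// Which files a run should pick up. Hypothetical type, not the tool's own.
#[derive(Debug, PartialEq)]
enum FileFilter {
    MokuroJson,
    Mokuro,
    Any,
    AnyWithExtension(String),
}

/// Maps the optional command-line flag to a filter.
/// No flag means the documented default, --mokurojson.
/// Returns None for a flag this sketch does not recognize.
fn parse_filter(option: Option<&str>) -> Option<FileFilter> {
    match option {
        None | Some("--mokurojson") => Some(FileFilter::MokuroJson),
        Some("--mokuro") => Some(FileFilter::Mokuro),
        Some("--any") => Some(FileFilter::Any),
        // "--any=.html" and "--any=html" are treated the same here (assumption).
        Some(other) => other.strip_prefix("--any=").map(|ext| {
            FileFilter::AnyWithExtension(ext.trim_start_matches('.').to_lowercase())
        }),
    }
}

/// Returns true if `path` should be analyzed under `filter`.
/// Extensions are compared case-insensitively (assumption).
fn is_selected(filter: &FileFilter, path: &Path) -> bool {
    let ext = path
        .extension()
        .and_then(|e| e.to_str())
        .map(|e| e.to_lowercase());
    match filter {
        FileFilter::MokuroJson => ext.as_deref() == Some("json"),
        FileFilter::Mokuro => ext.as_deref() == Some("mokuro"),
        FileFilter::Any => true,
        FileFilter::AnyWithExtension(wanted) => ext.as_deref() == Some(wanted.as_str()),
    }
}
```

Under these assumptions, `parse_filter(Some("--any=.html"))` selects `page.html` but not `page.json`, while `parse_filter(None)` behaves like `--mokurojson`.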

Examples

japanese_text_analyzer ./mokuro_manga_path/

japanese_text_analyzer "./example path/" --any

japanese_text_analyzer "./example path/" --any=.html

Sample Output

analysis.txt (Stats on the analyzed text)

./sample_manga/
----------------------------------------------------------------------------
Number of Japanese characters: 43811
Number of kanji characters: 10952
Number of unique kanji: 1082
Number of unique kanji appearing only once: 285 (26.34% of unique kanji)
Number of words in total: 25204
Number of unique words: 3519 (13.96% of all words)
Number of words appearing only once: 2018 (57.35% of unique words)
Average volume length in characters: 14603 (3 total volumes)
Average page length in characters: 103 (422 total pages)
Average textbox length in characters: 11 (shortest: 1) (longest: 254) (4302 total textboxes)
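
As an illustration of how the kanji figures and percentages above relate to each other, the following sketch counts characters in a string. It is an assumption-laden example rather than the tool's implementation: the Unicode ranges used for "kanji" and "Japanese characters" are a guess at a reasonable definition, and `KanjiStats`, `kanji_stats`, and `percent` are hypothetical names.

```rust
use std::collections::HashMap;

/// CJK Unified Ideographs plus Extension A (assumed definition of "kanji").
fn is_kanji(c: char) -> bool {
    matches!(c, '\u{4E00}'..='\u{9FFF}' | '\u{3400}'..='\u{4DBF}')
}

/// Hiragana and Katakana blocks (assumed to count as Japanese characters).
fn is_kana(c: char) -> bool {
    matches!(c, '\u{3040}'..='\u{309F}' | '\u{30A0}'..='\u{30FF}')
}

/// Hypothetical container for the character-level figures.
struct KanjiStats {
    japanese_chars: usize,
    kanji_chars: usize,
    unique_kanji: usize,
    kanji_once: usize,
}

fn kanji_stats(text: &str) -> KanjiStats {
    // Occurrences of each distinct kanji.
    let mut counts: HashMap<char, usize> = HashMap::new();
    let mut japanese_chars = 0;
    for c in text.chars() {
        if is_kanji(c) {
            *counts.entry(c).or_insert(0) += 1;
            japanese_chars += 1;
        } else if is_kana(c) {
            japanese_chars += 1;
        }
    }
    KanjiStats {
        japanese_chars,
        kanji_chars: counts.values().sum(),
        unique_kanji: counts.len(),
        kanji_once: counts.values().filter(|&&n| n == 1).count(),
    }
}

/// Percentage of `part` in `whole`, 0.0 when `whole` is zero.
fn percent(part: usize, whole: usize) -> f64 {
    if whole == 0 {
        0.0
    } else {
        part as f64 / whole as f64 * 100.0
    }
}
```

With the sample figures, `format!("{:.2}", percent(285, 1082))` gives `26.34`, matching the "26.34% of unique kanji" line: the percentage is taken over unique kanji, not over all kanji characters.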

word_list.csv (Deduplicated list of words, each with the number of times it was found in the analyzed text)

て	831
の	805
に	710
た	702
です	555
は	528
で	521
が	508
ん	504
... (3510 more lines)
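
A list like this can be produced from an already segmented word sequence with a frequency map. The sketch below assumes segmentation has been done elsewhere and only shows the counting step; `word_counts` and `to_tsv` are hypothetical names, and the descending sort with an alphabetical tie-break is an assumption based on the sample above, not a documented guarantee.

```rust
use std::collections::HashMap;

/// Counts each distinct word and returns (word, count) pairs,
/// most frequent first; ties are broken by the word itself so the
/// output is deterministic (assumption, not documented behavior).
fn word_counts(words: &[&str]) -> Vec<(String, usize)> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for &w in words {
        *counts.entry(w).or_insert(0) += 1;
    }
    let mut sorted: Vec<(String, usize)> = counts
        .into_iter()
        .map(|(w, n)| (w.to_string(), n))
        .collect();
    sorted.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    sorted
}

/// Renders the pairs as one "word<TAB>count" line each,
/// the layout shown in the sample output.
fn to_tsv(counts: &[(String, usize)]) -> String {
    counts
        .iter()
        .map(|(w, n)| format!("{}\t{}\n", w, n))
        .collect()
}
```

The length of the returned vector corresponds to "Number of unique words", and the number of pairs with a count of 1 corresponds to "Number of words appearing only once" in analysis.txt.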

word_list_raw.csv (Unsorted list of every word found in the analyzed text, duplicates included)

まぁ
まぁ
話し
て
き
まし
... (25198 more lines)

Building

Linux:

./setup.sh
cargo build --release

Windows:

setup.bat
cargo build --release

Repository: Kuuuube/japanese_text_analyzer
