A simple command-line script for getting a more accurate word count on LaTeX projects. This is basically a wrapper for `detex | wc` with support for per-project configuration files, so that a word count can be obtained from the terminal simply by entering `texwc`.
Put `texwc` on your path (e.g. `~/bin`).

Requires `detex` (`sudo apt install texlive-extra-utils`, or see OpenDetex for a more recent version).
- From the terminal, run `texwc [path]`, where `path` is the path of a `.tex` file.
- To get a word count from multiple files, specify the path of a `.texwc` config file for `path`, or the path of a directory containing a config file. If no value is specified for `path`, the current working directory is used.
- Config files can be generated with the `-i` option (see below for details).
- By default, the `\input` and `\include` LaTeX commands are ignored. This is to allow control over which included files should be counted (e.g., appendices and title pages are usually not included in a word count). To expand these commands, use the `--with-includes` option.
- The output will show line, word and character counts for each specified file as well as a total:
    $ texwc
    LINES  WORDS  CHARS  FILE
       35    595   3965  chapters/01_introduction
      285   5370  33619  chapters/02_background
      220   3002  18913  chapters/03_methodology
      339   4106  25191  chapters/04_implementation
      305   1669  10659  chapters/05_results
       25    814   5156  chapters/06_conclusion
     1209  15556  97503  TOTAL
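The table above has one row per file plus a summed `TOTAL` row. As a minimal sketch (not the script's actual code), assuming the de-TeXed text of each file is already available as a string, the rows could be computed roughly as follows. The names `wc_counts` and `format_table` are hypothetical, and the column widths are chosen for illustration only.

```python
def wc_counts(text):
    """Return (lines, words, chars) for a string, in the spirit of `wc`."""
    lines = text.count("\n")   # like wc, count newline characters
    words = len(text.split())  # whitespace-separated tokens
    chars = len(text)          # characters (plain `wc` reports bytes instead)
    return lines, words, chars


def format_table(texts):
    """Build table lines from a mapping of file name -> de-TeXed text."""
    rows = [(*wc_counts(text), name) for name, text in texts.items()]
    # The TOTAL row is the column-wise sum of the per-file rows.
    total = tuple(sum(row[i] for row in rows) for i in range(3))
    rows.append((*total, "TOTAL"))
    header = f"{'LINES':>6} {'WORDS':>6} {'CHARS':>6}  FILE"
    body = [f"{l:>6} {w:>6} {c:>6}  {name}" for l, w, c, name in rows]
    return [header] + body


if __name__ == "__main__":
    sample = {"intro": "Hello world\n", "body": "One two three\nfour\n"}
    print("\n".join(format_table(sample)))
```

Keeping the counting separate from the formatting makes it easy to check the totals independently of how the table is printed.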
The following options can be specified to modify the behaviour of the script:
- `-h` / `--help`: Print help message with usage information.
- `-i` / `--init`: Initialise directory with a default config file containing all `.tex` files in this directory.
- `-r` / `--recursive`: Recursively include `.tex` files in subdirectories when initialising (only with `-i`).
- `-w` / `--with-includes`: Expand `\input` and `\include` commands. (This takes precedence over `detex-options`.)
- `-p` / `--print-text`: Print output of `detex` instead of word count. This can be useful to ensure that the correct text is included in the word count, e.g. that the right environments are being ignored.
- `--plain`: Print in plain text, without formatting by ANSI escape sequences.
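To illustrate what `-i` and `-r` do, here is a rough sketch of how a default config could be generated. It is not the script's own implementation. It assumes the config file is literally named `.texwc` and that only the `"files"` field is written; the real script may add other defaults. The function names are hypothetical.

```python
import json
from pathlib import Path


def build_default_config(directory, recursive=False):
    """Collect .tex files under `directory` as relative paths."""
    root = Path(directory)
    # "**/*.tex" also matches files directly inside `root`.
    pattern = "**/*.tex" if recursive else "*.tex"
    files = sorted(
        p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
    )
    return {"files": files}


def init_directory(directory, recursive=False):
    """Write the default config to `<directory>/.texwc` (assumed file name)."""
    config = build_default_config(directory, recursive)
    target = Path(directory) / ".texwc"
    target.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return target
```

Calling `init_directory(".", recursive=True)` would correspond roughly to running `texwc -i -r` in the current directory.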
A `.texwc` config file contains a JSON object representing configuration options. The fields of this file are:
- `"files"` (required): A list of relative `.tex` file paths to be included in the word count.
- `"detex-options"`: A list of options to be passed to `detex`. See the `detex` documentation for details.
- `"ignore-envs"`: A list of environments to exclude from the word count. (Please note that `detex` only allows 10 environments to be included in this list, so you may need to remove environments you don't use.)
Here is an example of a typical `.texwc` file:
    {
      "detex-options": [
        "-l",
        "-n"
      ],
      "ignore-envs": [
        "array",
        "eqnarray",
        "equation",
        "figure",
        "table",
        "verbatim",
        "lstlisting",
        "sidewaystable"
      ],
      "files": [
        "chapters/01_introduction",
        "chapters/02_background",
        "chapters/03_methodology",
        "chapters/04_implementation",
        "chapters/05_results",
        "chapters/06_conclusion"
      ]
    }
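The sketch below shows one way such a config could be turned into a `detex` command line. It is an illustration under stated assumptions, not the script's actual logic: it assumes `"ignore-envs"` is passed through detex's `-e` option as a comma-separated list, and that include handling is controlled by adding or dropping detex's `-n` flag. `load_config` and `build_detex_command` are hypothetical names.

```python
import json

MAX_IGNORE_ENVS = 10  # limit on the environment list mentioned above


def load_config(path):
    """Read a .texwc file and check that the required field is present."""
    with open(path, encoding="utf-8") as handle:
        config = json.load(handle)
    if "files" not in config:
        raise ValueError('config is missing the required "files" field')
    return config


def build_detex_command(config, tex_file, with_includes=False):
    """Return the argument list for running detex on one file."""
    options = [o for o in config.get("detex-options", []) if o != "-n"]
    if not with_includes:
        # Assumption: -n is what keeps \input and \include from being expanded.
        options.append("-n")
    envs = config.get("ignore-envs", [])
    if len(envs) > MAX_IGNORE_ENVS:
        raise ValueError(f"at most {MAX_IGNORE_ENVS} environments can be ignored")
    command = ["detex", *options]
    if envs:
        # Assumption: environments are passed as one comma-separated -e value.
        command += ["-e", ",".join(envs)]
    command.append(tex_file)
    return command
```

The resulting list can be handed to `subprocess.run` and its output counted, which mirrors the `detex | wc` pipeline the script wraps.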