Commit 45ab705

Add text stats to manuscript tool (#1717)
2 parents 30bedb2 + acf8170 commit 45ab705

29 files changed

+1063
-226
lines changed

.gitignore

+3
Original file line numberDiff line numberDiff line change
@@ -50,3 +50,6 @@ ToC.txt
5050
# Coverage
5151
/.coverage
5252
/coverage.*
53+
54+
# Other
55+
/test.py

docs/source/index.rst

+9-2
Original file line numberDiff line numberDiff line change
@@ -66,7 +66,6 @@ with pip. See :ref:`a_started` for more details.
6666
usage_format
6767
usage_shortcuts
6868
usage_typography
69-
usage_projectformat
7069

7170
.. toctree::
7271
:maxdepth: 1
@@ -80,7 +79,15 @@ with pip. See :ref:`a_started` for more details.
8079

8180
.. toctree::
8281
:maxdepth: 1
83-
:caption: Additional Topics
82+
:caption: Additional Details
83+
:hidden:
84+
85+
more_projectformat
86+
more_counting
87+
88+
.. toctree::
89+
:maxdepth: 1
90+
:caption: Technical Topics
8491
:hidden:
8592

8693
tech_locations

docs/source/more_counting.rst

+98
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,98 @@
1+
.. _a_counting:
2+
3+
********************
4+
Word and Text Counts
5+
********************
6+
7+
This is an overview of how word counts and other text counts are performed. The counting rules
8+
should be relatively standard, and are compared to LibreOffice Writer rules.
9+
10+
The counts provided in the app for the raw text are meant to be approximate. For more accurate
11+
counts, you need to build your manuscript in the :guilabel:`Manuscript Tool` and check the counts
12+
on the generated preview.
13+
14+
15+
Text Word Counts and Stats
16+
==========================
17+
18+
These are the rules for the main counts available for each document in a project.
19+
20+
For all counts, the following rules apply.
21+
22+
#. Short (–) and long (—) dashes are considered word separators.
23+
#. Any line starting with ``%`` or ``@`` is ignored.
24+
#. Trailing white spaces are ignored, including line breaks.
25+
#. Leading ``>`` and trailing ``<`` are ignored with any spaces next to them.
26+
#. Valid shortcodes and other commands wrapped in brackets ``[]`` are ignored.
27+
#. In-line Markdown syntax in text paragraphs is treated as part of the text.
28+
29+
After the above preparation of the text, the following counts are available.
30+
31+
**Character Count**
32+
The character count is the sum of characters per line, including leading and in-text white space
33+
characters, but excluding trailing white space characters. Shortcodes in the text are not
34+
included, but Markdown codes are. Only headers and text are counted.
35+
36+
**Word Count**
37+
The word count is the sum of blocks of contiguous characters per line separated by any number of
38+
white space characters or dashes. Only headers and text are counted.
39+
40+
**Paragraph Count**
41+
The paragraph count is the number of text blocks separated by one or more empty lines. A line
42+
consisting only of white spaces is considered empty.
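
The rules above can be sketched in a few lines of Python. This is only an illustration of the documented rules, loosely modelled on the countWords function removed from novelwriter/core/index.py in this commit; it is not the standardCounter implementation, which is not shown in this part of the diff. The function name sketch_counter and the helper patterns HEADING, COMMAND, ALIGN_L and ALIGN_R are hypothetical. Only the shortcode pattern is copied from FMT_SC in novelwriter/constants.py.

```python
import re

# Shortcode pattern copied from FMT_SC in the diff; all other names are hypothetical
SHORTCODE = re.compile(r"(?i)(?<!\\)(\[[\/\!]?(?:i|b|s|u|m|sup|sub)\])")
HEADING = re.compile(r"^(?:#{1,4}|#!|##!) ")
COMMAND = re.compile(r"(?i)^\[(?:newpage|new page|vspace(?::[^\]]*)?)\]")
ALIGN_L = re.compile(r"^>{1,2} *")
ALIGN_R = re.compile(r" *<{1,2}$")


def sketch_counter(text: str) -> tuple[int, int, int]:
    """Return (characters, words, paragraphs) for raw project text."""
    chars = words = paras = 0
    prev_empty = True
    # Rule 5: drop inline shortcodes such as [b] and [/b]
    text = SHORTCODE.sub("", text)
    for raw in text.splitlines():
        # Rule 3: trailing white space is ignored
        line = raw.rstrip()
        if not line:
            # An empty (or white space only) line ends the current paragraph
            prev_empty = True
            continue
        if line[0] in "%@":
            # Rule 2: comment and meta data lines are skipped
            continue
        if COMMAND.match(line):
            # Rule 5: standalone commands such as [newpage] are skipped
            continue
        is_heading = HEADING.match(line) is not None
        if is_heading:
            # Count the heading text, not the heading marker
            line = HEADING.sub("", line, count=1)
        else:
            # Rule 4: strip alignment markers and the spaces next to them
            line = ALIGN_R.sub("", ALIGN_L.sub("", line))
        # Rule 1: en and em dashes act as word separators
        line = line.replace("\u2013", " ").replace("\u2014", " ")
        chars += len(line)
        words += len(line.split())
        if not is_heading and prev_empty:
            paras += 1
        # A heading does not start a paragraph, so the next text line does
        prev_empty = is_heading
    return chars, words, paras


sample = "# Title\n\nHello world\u2014again\n% a comment\n\n>> Centred text <<\n"
print(sketch_counter(sample))  # (34, 6, 2)
```

In-line Markdown is left untouched here, so a word written as **bold** contributes eight characters, in line with rule 6.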
43+
44+
45+
Manuscript Counts
46+
=================
47+
48+
These are the rules for the counts available for a manuscript in the :guilabel:`Manuscript Tool`.
49+
The rules have been tuned to agree with LibreOffice Writer, but will vary slightly depending on the
50+
content of your text. LibreOffice Writer also counts the text in the page header, which the
51+
Manuscript Tool does not.
52+
53+
The content of each line is counted after all formatting has been processed, so the result will be
54+
more accurate than the counts for text documents elsewhere in the app. The following rules apply:
55+
56+
#. Short (–) and long (—) dashes are considered word separators.
57+
#. Leading and trailing white spaces are generally included, but paragraph breaks are not.
58+
#. Hard line breaks within a paragraph are considered white space characters.
59+
#. All formatting codes are ignored, including shortcodes, commands and Markdown.
60+
#. Scene and section separators are counted.
61+
#. Comments and meta data lines are counted after they are formatted.
62+
#. Headers are counted after they are formatted with custom formats.
63+
64+
The following counts are available:
65+
66+
**Header Count**
67+
The number of headers in the manuscript.
68+
69+
**Paragraph Count**
70+
The number of body text paragraphs in the manuscript.
71+
72+
**Total Word Count**
73+
The number of words in the manuscript, including any comments and meta data text.
74+
75+
**Text Word Count**
76+
The number of words in body text paragraphs, excluding all other text.
77+
78+
**Header Word Count**
79+
The number of words in headers, including inserted formatting like chapter numbers, etc.
80+
81+
**Total Character Count**
82+
The number of characters on all lines, including any comments and meta data text. Paragraph
83+
breaks are not counted, but in-paragraph hard line breaks are.
84+
85+
**Text Character Count**
86+
The number of characters in body text paragraphs. Paragraph breaks are not counted, but
87+
in-paragraph hard line breaks are.
88+
89+
**Header Character Count**
90+
The number of characters in headings.
91+
92+
**Text Words Character Count**
93+
The number of characters in body text paragraphs considered part of a word or punctuation. That
94+
is, white space characters are not counted.
95+
96+
**Header Words Character Count**
97+
The number of characters in headers considered part of a word or punctuation. That is, white
98+
space characters are not counted.
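
As a rough illustration of how the body text counts listed above relate to each other, the sketch below computes them for a list of already formatted paragraphs. It is not the countStats method called from docbuild.py, which is not shown in this part of the diff. The function name sketch_manuscript_stats, the dictionary keys, and the assumption that each list item is one paragraph with formatting codes already removed are all hypothetical.

```python
def sketch_manuscript_stats(paragraphs: list[str]) -> dict[str, int]:
    """Count body text stats for paragraphs that are already formatted.

    Assumption: each item is one paragraph, with shortcodes, commands
    and Markdown already stripped by the build step.
    """
    stats = {"paragraphs": 0, "words": 0, "chars": 0, "word_chars": 0}
    for para in paragraphs:
        # En and em dashes are word separators; str.split() also treats
        # in-paragraph hard line breaks as white space
        tokens = para.replace("\u2013", " ").replace("\u2014", " ").split()
        stats["paragraphs"] += 1
        stats["words"] += len(tokens)
        # Paragraph breaks are not part of the string, so they are not
        # counted, but hard line breaks inside the paragraph are
        stats["chars"] += len(para)
        # Characters that belong to a word or punctuation: everything
        # except white space (dashes count as punctuation here)
        stats["word_chars"] += sum(1 for c in para if not c.isspace())
    return stats


print(sketch_manuscript_stats(["Hello world\u2014again", "Line one\nline two"]))
# {'paragraphs': 2, 'words': 7, 'chars': 34, 'word_chars': 30}
```

The header counts would follow the same pattern, applied to the formatted heading text instead of the body paragraphs.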

novelwriter/assets/icons/typicons_dark/icons.conf

+2
Original file line numberDiff line numberDiff line change
@@ -100,6 +100,8 @@ status_time = typ_stopwatch-grey.svg
100100
sticky-off = typ_pin-outline.svg
101101
sticky-on = typ_pin.svg
102102
unchecked = mixed_input-unchecked.svg
103+
unfold-hide = typ_arrow-right.svg
104+
unfold-show = typ_arrow-down.svg
103105
up = typ_chevron-up.svg
104106
view = typ_eye.svg
105107
view_build = typ_export-grey.svg

novelwriter/assets/icons/typicons_light/icons.conf

+2
Original file line numberDiff line numberDiff line change
@@ -100,6 +100,8 @@ status_time = typ_stopwatch-grey.svg
100100
sticky-off = typ_pin-outline.svg
101101
sticky-on = typ_pin.svg
102102
unchecked = mixed_input-unchecked.svg
103+
unfold-hide = typ_arrow-right.svg
104+
unfold-show = typ_arrow-down.svg
103105
up = typ_chevron-up.svg
104106
view = typ_eye.svg
105107
view_build = typ_export-grey.svg

novelwriter/constants.py

-5
Original file line numberDiff line numberDiff line change
@@ -23,8 +23,6 @@
2323
"""
2424
from __future__ import annotations
2525

26-
import re
27-
2826
from PyQt5.QtCore import QCoreApplication, QT_TRANSLATE_NOOP
2927

3028
from novelwriter.enum import nwBuildFmt, nwItemClass, nwItemLayout, nwOutline
@@ -70,9 +68,6 @@ class nwRegEx:
7068
FMT_SC = r"(?i)(?<!\\)(\[[\/\!]?(?:i|b|s|u|m|sup|sub)\])"
7169
FMT_SV = r"(?<!\\)(\[(?i)(?:fn|footnote):)(.+?)(?<!\\)(\])"
7270

73-
# Pre-Compiled RegEx
74-
RX_SC = re.compile(FMT_SC)
75-
7671
# END Class nwRegEx
7772

7873

novelwriter/core/docbuild.py

+7-2
Original file line numberDiff line numberDiff line change
@@ -52,14 +52,15 @@ class NWBuildDocument:
5252
manuscript, based on a build definition object (BuildSettings).
5353
"""
5454

55-
__slots__ = ("_project", "_build", "_queue", "_error", "_cache")
55+
__slots__ = ("_project", "_build", "_queue", "_error", "_cache", "_count")
5656

57-
def __init__(self, project: NWProject, build: BuildSettings) -> None:
57+
def __init__(self, project: NWProject, build: BuildSettings, doCount: bool = False) -> None:
5858
self._project = project
5959
self._build = build
6060
self._queue = []
6161
self._error = None
6262
self._cache = None
63+
self._count = doCount
6364
return
6465

6566
##
@@ -314,11 +315,15 @@ def _doBuild(self, bldObj: Tokenizer, tHandle: str, convert: bool = True) -> boo
314315
bldObj.addRootHeading(tHandle)
315316
if convert:
316317
bldObj.doConvert()
318+
if self._count:
319+
bldObj.countStats()
317320
elif tItem.isFileType():
318321
bldObj.setText(tHandle)
319322
bldObj.doPreProcessing()
320323
bldObj.tokenizeText()
321324
bldObj.doHeaders()
325+
if self._count:
326+
bldObj.countStats()
322327
if convert:
323328
bldObj.doConvert()
324329
else:

novelwriter/core/index.py

+4-87
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,6 @@
33
===========================
44
55
File History:
6-
Created: 2019-04-22 [0.0.1] countWords
76
Created: 2019-05-27 [0.1.4] NWIndex
87
Created: 2022-05-28 [2.0rc1] IndexItem
98
Created: 2022-05-28 [2.0rc1] IndexHeading
@@ -40,7 +39,8 @@
4039
from novelwriter.enum import nwComment, nwItemClass, nwItemType, nwItemLayout
4140
from novelwriter.error import logException
4241
from novelwriter.common import checkInt, isHandle, isItemClass, isTitleTag, jsonEncode
43-
from novelwriter.constants import nwFiles, nwKeyWords, nwRegEx, nwUnicode, nwHeaders
42+
from novelwriter.constants import nwFiles, nwKeyWords, nwHeaders
43+
from novelwriter.text.counting import standardCounter
4444

4545
if TYPE_CHECKING: # pragma: no cover
4646
from novelwriter.core.item import NWItem
@@ -266,7 +266,7 @@ def scanText(self, tHandle: str, text: str, blockSignal: bool = False) -> bool:
266266
self._itemIndex.add(tHandle, tItem)
267267

268268
# Run word counter for the whole text
269-
cC, wC, pC = countWords(text)
269+
cC, wC, pC = standardCounter(text)
270270
tItem.setCharCount(cC)
271271
tItem.setWordCount(wC)
272272
tItem.setParaCount(pC)
@@ -400,7 +400,7 @@ def _splitHeading(self, line: str) -> tuple[str, str]:
400400

401401
def _indexWordCounts(self, tHandle: str, text: str, sTitle: str) -> None:
402402
"""Count text stats and save the counts to the index."""
403-
cC, wC, pC = countWords(text)
403+
cC, wC, pC = standardCounter(text)
404404
self._itemIndex.setHeadingCounts(tHandle, sTitle, cC, wC, pC)
405405
return
406406

@@ -1315,86 +1315,3 @@ def processComment(text: str) -> tuple[nwComment, str, int]:
13151315
if content and (clean := classifier.strip().lower()) in CLASSIFIERS:
13161316
return CLASSIFIERS[clean], content.strip(), text.find(":") + 1
13171317
return nwComment.PLAIN, check, 0
1318-
1319-
1320-
def countWords(text: str) -> tuple[int, int, int]:
1321-
"""Count words in a piece of text, skipping special syntax and
1322-
comments.
1323-
"""
1324-
charCount = 0
1325-
wordCount = 0
1326-
paraCount = 0
1327-
prevEmpty = True
1328-
1329-
if not isinstance(text, str):
1330-
return charCount, wordCount, paraCount
1331-
1332-
# We need to treat dashes as word separators for counting words.
1333-
# The check+replace approach is much faster than direct replace for
1334-
# large texts, and a bit slower for small texts, but in the latter
1335-
# case it doesn't really matter.
1336-
if nwUnicode.U_ENDASH in text:
1337-
text = text.replace(nwUnicode.U_ENDASH, " ")
1338-
if nwUnicode.U_EMDASH in text:
1339-
text = text.replace(nwUnicode.U_EMDASH, " ")
1340-
1341-
# Strip shortcodes
1342-
if "[" in text:
1343-
text = nwRegEx.RX_SC.sub("", text)
1344-
1345-
for line in text.splitlines():
1346-
1347-
countPara = True
1348-
1349-
if not line:
1350-
prevEmpty = True
1351-
continue
1352-
1353-
if line[0] == "@" or line[0] == "%":
1354-
continue
1355-
1356-
if line[0] == "[":
1357-
check = line.lower()
1358-
if check.startswith(("[newpage]", "[new page]", "[vspace]")):
1359-
continue
1360-
elif check.startswith("[vspace:") and line.endswith("]"):
1361-
continue
1362-
1363-
elif line[0] == "#":
1364-
if line[:5] == "#### ":
1365-
line = line[5:]
1366-
countPara = False
1367-
elif line[:4] == "### ":
1368-
line = line[4:]
1369-
countPara = False
1370-
elif line[:3] == "## ":
1371-
line = line[3:]
1372-
countPara = False
1373-
elif line[:2] == "# ":
1374-
line = line[2:]
1375-
countPara = False
1376-
elif line[:3] == "#! ":
1377-
line = line[3:]
1378-
countPara = False
1379-
elif line[:4] == "##! ":
1380-
line = line[4:]
1381-
countPara = False
1382-
1383-
elif line[0] == ">" or line[-1] == "<":
1384-
if line[:2] == ">>":
1385-
line = line[2:].lstrip(" ")
1386-
elif line[:1] == ">":
1387-
line = line[1:].lstrip(" ")
1388-
if line[-2:] == "<<":
1389-
line = line[:-2].rstrip(" ")
1390-
elif line[-1:] == "<":
1391-
line = line[:-1].rstrip(" ")
1392-
1393-
wordCount += len(line.split())
1394-
charCount += len(line)
1395-
if countPara and prevEmpty:
1396-
paraCount += 1
1397-
1398-
prevEmpty = not countPara
1399-
1400-
return charCount, wordCount, paraCount

novelwriter/core/tohtml.py

+4-4
Original file line numberDiff line numberDiff line change
@@ -368,13 +368,12 @@ def replaceTabs(self, nSpaces: int = 8, spaceChar: str = "&nbsp;") -> None:
368368

369369
def getStyleSheet(self) -> list[str]:
370370
"""Generate a stylesheet for the current settings."""
371-
styles = []
372371
if not self._cssStyles:
373-
return styles
372+
return []
374373

375374
mScale = self._lineHeight/1.15
376-
textAlign = "justify" if self._doJustify else "left"
377375

376+
styles = []
378377
styles.append("body {{font-family: '{0:s}'; font-size: {1:d}pt;}}".format(
379378
self._textFont, self._textSize
380379
))
@@ -384,7 +383,7 @@ def getStyleSheet(self) -> list[str]:
384383
"margin-top: {2:.2f}em; margin-bottom: {3:.2f}em;"
385384
"}}"
386385
).format(
387-
textAlign,
386+
"justify" if self._doJustify else "left",
388387
round(100 * self._lineHeight),
389388
mScale * self._marginText[0],
390389
mScale * self._marginText[1],
@@ -449,6 +448,7 @@ def getStyleSheet(self) -> list[str]:
449448
))
450449

451450
styles.append("a {color: rgb(66, 113, 174);}")
451+
styles.append("mark {background: rgb(255, 255, 166);}")
452452
styles.append(".tags {color: rgb(245, 135, 31); font-weight: bold;}")
453453
styles.append(".break {text-align: left;}")
454454
styles.append(".synopsis {font-style: italic;}")
