small fixes
dirkroorda committed Jun 26, 2020
1 parent 0f9f635 commit b5a5e40
Showing 25 changed files with 426 additions and 320 deletions.
97 changes: 19 additions & 78 deletions build.py
@@ -3,15 +3,20 @@
import re
from glob import glob

from time import sleep
from shutil import rmtree
from subprocess import run, call, Popen, PIPE

import errno
import time
import unicodedata

from tf.core.helpers import console
from pdocs import console, pdoc3serve, pdoc3, shipDocs

ORG = "annotation"
REPO = "text-fabric"
PKG = "tf"
PACKAGE = "text-fabric"
SCRIPT = "/Library/Frameworks/Python.framework/Versions/3.7/bin/{PACKAGE}"

DIST = "dist"

@@ -28,14 +33,11 @@
),
)

URL = "https://annotation.github.io/text-fabric/"
AN_BASE = os.path.expanduser("~/github/annotation")
AN_BASE = os.path.expanduser(f"~/github/{ORG}")
TUT_BASE = f"{AN_BASE}/tutorials"
TF_BASE = f"{AN_BASE}/text-fabric"
TF_BASE = f"{AN_BASE}/{REPO}"
TEST_BASE = f"{TF_BASE}/test"
APP_BASE = f"{TF_BASE}/apps"
PACKAGE = "text-fabric"
SCRIPT = "/Library/Frameworks/Python.framework/Versions/3.7/bin/text-fabric"

SRC = "site"
REMOTE = "origin"
@@ -78,7 +80,7 @@
ship : build for shipping
apps : commit and push all tf apps
tut : commit and push the tutorials repo
a : open text-fabric browser on specific dataset
a : open {PACKAGE} browser on specific dataset
({appStr})
t : run test suite (relations, qperf)
data : build data files for github release
@@ -379,73 +381,12 @@ def ghp_import():
return result, dec(err)


def gh_deploy():
(result, error) = ghp_import()
if not result:
print("Failed to deploy to GitHub with error: \n%s", error)
raise SystemExit(1)
else:
print("Your documentation should shortly be available at: " + URL)


# END COPIED FROM MKDOCS AND MODIFIED


PDOC3 = [
"pdoc3",
"--force",
"--html",
"--output-dir",
"site",
"--template-dir",
"docs/templates",
]
PDOC3STR = " ".join(PDOC3)


def pdoc3serve():
"""Build the docs into site and serve them.
"""

proc = Popen([*PDOC3, "--http", ":", "tf"])
sleep(1)
run("open http://localhost:8080/tf", shell=True)
try:
proc.wait()
except KeyboardInterrupt:
pass
proc.terminate()


def pdoc3():
"""Build the docs into site.
"""

cmdLines = [
"rm -rf site",
f"{PDOC3STR} tf",
"mv site/tf/* site",
"rmdir site/tf",
"cp -r docs/images site",
"touch site/.nojekyll",
]
console("Build docs")
for cmdLine in cmdLines:
print(cmdLine)
run(cmdLine, shell=True)


def shipDocs():
"""Build the docs into site and ship them.
"""

pdoc3()
gh_deploy()


def tfbrowse(dataset, remaining):
rargs = " ".join(remaining)
cmdLine = f"text-fabric {dataset} {rargs}"
cmdLine = f"{PACKAGE} {dataset} {rargs}"
try:
run(cmdLine, shell=True)
except KeyboardInterrupt:
@@ -478,7 +419,7 @@ def clean():
run(["python3", "setup.py", "develop", "-u"])
if os.path.exists(SCRIPT):
os.unlink(SCRIPT)
run(["pip3", "uninstall", "-y", "text-fabric"])
run(["pip3", "uninstall", "-y", PACKAGE])


def main():
@@ -490,11 +431,11 @@ def main():
elif task == "t":
tftest(msg, remaining)
elif task == "docs":
pdoc3serve()
pdoc3serve(PKG)
elif task == "pdocs":
pdoc3()
pdoc3(PKG)
elif task == "sdocs":
shipDocs()
shipDocs(ORG, REPO, PKG)
elif task == "clean":
clean()
elif task == "l":
@@ -503,7 +444,7 @@ def main():
elif task == "lp":
clean()
run(["python3", "setup.py", "sdist"])
distFiles = glob("dist/text-fabric-*.tar.gz")
distFiles = glob(f"dist/{PACKAGE}-*.tar.gz")
run(["pip3", "install", distFiles[0]])
elif task == "i":
clean
@@ -516,11 +457,11 @@ def main():
"--no-index",
"--find-links",
f'file://{TF_BASE}/dist"',
"text-fabric",
PACKAGE,
]
)
elif task == "g":
shipDocs()
shipDocs(ORG, REPO, PKG)
commit(task, msg)
elif task == "apps":
commitApps(msg)
@@ -539,7 +480,7 @@ def main():
answer = input("right version ? [yn]")
if answer != "y":
return
shipDocs()
shipDocs(ORG, REPO, PKG)
makeDist()
commit(task, msg)

Binary file not shown.
161 changes: 161 additions & 0 deletions docs/About/displaydesign.md
@@ -0,0 +1,161 @@
# Display design

In Text-Fabric we want to display pieces of corpus material in insightful ways.

We have implemented two types of display:

* **plain**: almost the plain text of nodes, but with optional in-line
  formatting
* **pretty**: almost a tree-bank view, except that the text objects
  are not merely trees, but graphs.

In both types of display it is possible to show node numbers next to the
relevant pieces of text, and to highlight nodes.

In addition to highlighting, the displays can be tweaked by passing a fair number of options,
in order to show or hide features of nodes, call in additional graphics, show or hide
intermediate levels, etc.

Whatever we want to display, we have to display in HTML, which is basically a
hierarchically organized set of presentation elements.

But a node and its constellation of relevant neighbourhood nodes
do not, in general, have a hierarchical structure.

The unravel algorithm solves the problem of turning a node and its associated piece
of the textual graph into a tree of node fragments in such a way that the order
of the slots is preserved.

![unravel](../images/DisplayDesign/DisplayDesign.001.png)

Unravelling is the core of the display algorithm in Text-Fabric.
When nodes violate the hierarchy, the algorithm *chunks* and *fragments* them
and *stacks* the fragments into a tree.

See `tf.advanced.unravel`.

This tree of fragments can then be transformed into various kinds of HTML with rather
straightforward code; see `tf.advanced.render.render`.


Here is an account of how *unravel* works and which challenges it has to meet.

## Neighbourhood

When we display a node, we consider all the slots to which this node is linked,
and then collect all other nodes in the corpus that share one or more of these slots
(see `tf.core.locality.Locality.i`).

!!! explanation "with some subtleties"
    We exclude some nodes from the neighbourhood, such as lexeme nodes, which have
    characteristics that require special treatment.
    We also exclude nodes of types that have a higher rank (read on).
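
The collection step can be sketched in plain Python. This is only an illustration and not the Text-Fabric implementation: `slots_of` is a hypothetical mapping from nodes to their slots, and the exclusions mentioned above (lexeme nodes, higher-ranked types) are left out.

```python
from collections import defaultdict


def neighbourhood(node, slots_of):
    """Return all other nodes that share at least one slot with `node`.

    `slots_of` is a hypothetical mapping: node -> set of slots linked to it.
    """
    # Invert the mapping: slot -> nodes linked to that slot.
    nodes_at = defaultdict(set)
    for n, slots in slots_of.items():
        for s in slots:
            nodes_at[s].add(n)

    # Union of all nodes found at the slots of our node, minus the node itself.
    result = set()
    for s in slots_of[node]:
        result |= nodes_at[s]
    result.discard(node)
    return result


# Toy data (made up): node 10 has a gap, node 13 is isolated.
slots_of = {
    10: {1, 2, 5},
    11: {3, 4},
    12: {2, 3},
    13: {6},
}

print(sorted(neighbourhood(10, slots_of)))  # [12]
print(sorted(neighbourhood(12, slots_of)))  # [10, 11]
```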

### Descendant types

Node types are ranked: node types whose nodes occupy more slots on average have a higher rank
than types whose nodes occupy fewer slots on average.
You can inspect the ranking of the types in your dataset with `tf.core.nodes.Nodes.otypeRank`.

For each node type, we collect the set of descendant types: the types with lower or equal rank.
So each type is its own descendant. But we prevent the slot type from being its own
descendant.
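
As a minimal sketch of ranking and descendant types, assuming we already know the average number of slots per node type: the function names and the numbers below are made up for illustration and are not taken from Text-Fabric.

```python
def type_ranks(avg_slots):
    """Rank types by average slots per node: a bigger average gives a higher rank."""
    ordered = sorted(avg_slots, key=avg_slots.get)
    return {node_type: rank for rank, node_type in enumerate(ordered)}


def descendant_types(ranks, slot_type):
    """For each type, collect the types with lower or equal rank.

    Every type is its own descendant, except the slot type.
    """
    descendants = {}
    for t, r in ranks.items():
        descendants[t] = {
            u
            for u, ru in ranks.items()
            if ru <= r and not (t == slot_type and u == slot_type)
        }
    return descendants


# Made-up averages for a toy corpus whose slot type is "word".
avg_slots = {"word": 1.0, "phrase": 2.3, "clause": 6.1, "sentence": 9.8}

ranks = type_ranks(avg_slots)
desc = descendant_types(ranks, "word")

print(ranks)                    # {'word': 0, 'phrase': 1, 'clause': 2, 'sentence': 3}
print(sorted(desc["clause"]))   # ['clause', 'phrase', 'word']
print(sorted(desc["word"]))     # []
```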

## Discontinuity and chunking

The first problem with nodes is that they can be linked to discontinuous sets
of slots, in other words: nodes may have gaps.
When nodes have gaps, and their gaps are filled with other nodes, there is no way of
walking through the nodes one by one and generating HTML boxes for them without
mixing up the order of the slots in the final display.

Here is an example from the Hebrew Bible:

![discontinuity](../images/DisplayDesign/DisplayDesign.002.png)

> [Genesis 4:14](https://shebanq.ancient-data.org/hebrew/text?book=Genesis&chapter=4&verse=14&version=c&mr=m&qw=q&tp=txt_p&tr=hb&wget=v&qget=v&nget=vt),
> example taken from this [notebook](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/zz_test/030-bhsa.ipynb).

You see a sentence fragment with two clauses, of which the second is engulfed by the first
one, while remaining completely disjoint.

![chunking](../images/DisplayDesign/DisplayDesign.003.png)

We divide each node in our neighbourhood into its maximal contiguous chunks.
Such chunks are specified by tuples `(n, b, e)`, where `n` is the node (an integer),
`b` is the first slot of the chunk and `e` its last slot.
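
Chunking a single node can be sketched as follows. This is an illustration under the assumption that the slots of a node are available as a set of integers; the function name `chunks` is ours, not a Text-Fabric API.

```python
def chunks(n, slots):
    """Split the slots of node `n` into maximal contiguous chunks (n, b, e)."""
    result = []
    b = e = None
    for s in sorted(slots):
        if b is None:
            # Start the first chunk.
            b = e = s
        elif s == e + 1:
            # The slot continues the current chunk.
            e = s
        else:
            # A gap: close the current chunk and start a new one.
            result.append((n, b, e))
            b = e = s
    if b is not None:
        result.append((n, b, e))
    return result


# A node linked to slots 1-3 and 7-8 has one gap, hence two chunks.
print(chunks(100, {1, 2, 3, 7, 8}))  # [(100, 1, 3), (100, 7, 8)]
```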

When we display nodes, we will typically generate solid borders at node boundaries and
dotted borders at internal chunk boundaries.
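
One way to decide the border styles, sketched under the assumption that we have the chunks of one node in slot order. The style names and the function are hypothetical and do not reflect the CSS that Text-Fabric actually emits.

```python
def border_styles(node_chunks):
    """Give each chunk (n, b, e) of one node a left and a right border style.

    Solid at the outer boundaries of the node, dotted where a gap interrupts it.
    """
    first_slot = node_chunks[0][1]
    last_slot = node_chunks[-1][2]
    return [
        (
            (n, b, e),
            "solid" if b == first_slot else "dotted",
            "solid" if e == last_slot else "dotted",
        )
        for (n, b, e) in node_chunks
    ]


for chunk, left, right in border_styles([(100, 1, 3), (100, 7, 8)]):
    print(chunk, left, right)
# (100, 1, 3) solid dotted
# (100, 7, 8) dotted solid
```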

## Overlapping hierarchy and fragmenting

Chunks of nodes do not necessarily respect the borders of chunks of nodes that are higher in the
tentative hierarchy.

Here is an example from a corpus of Old Babylonian letters (cuneiform):

![overlap](../images/DisplayDesign/DisplayDesign.004.png)

> [Tablet P509373 reverse:6'](https://cdli.ucla.edu/search/search_results.php?SearchMode=Text&ObjectID=P509373),
> example taken from [notebook](https://nbviewer.jupyter.org/github/annotation/tutorials/blob/master/zz_test/062-obb-clusters.ipynb).

Here you see a cluster that does not respect a word boundary.

![fragmenting](../images/DisplayDesign/DisplayDesign.005.png)

We use the word boundary to break up the cluster in question into two *fragments*.
A *fragment* is, like a chunk, a continuous part of a node, but not necessarily maximal.

We fragment all node chunks in our neighbourhood.

!!! explanation "with some subtleties"
    We work from higher levels to lower levels: node chunks of higher levels fragment
    node chunks of lower levels, not vice versa.
    And for nodes at the same level: bigger nodes fragment smaller nodes, not vice versa.
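
The basic step, breaking one chunk at the boundaries of another, can be sketched like this. It is a simplified illustration with made-up node numbers; the function `fragment` is hypothetical and handles only a single pair of chunks, not the level-by-level bookkeeping.

```python
def fragment(chunk, boundary_chunk):
    """Split `chunk` at the edges of `boundary_chunk` that fall inside it.

    Both arguments are tuples (n, b, e); the result is a list of fragments
    of `chunk`, in slot order.
    """
    n, b, e = chunk
    _, bb, be = boundary_chunk

    # Slots at which a fragment starts.
    starts = {b}
    if b < bb <= e:
        starts.add(bb)        # the other chunk begins inside ours
    if b <= be < e:
        starts.add(be + 1)    # the other chunk ends inside ours
    starts = sorted(starts)

    # Each fragment ends just before the next one starts.
    ends = [s - 1 for s in starts[1:]] + [e]
    return [(n, s, t) for s, t in zip(starts, ends)]


# A cluster on slots 3-6 crossing the start of a word on slots 5-8.
print(fragment((200, 3, 6), (300, 5, 8)))   # [(200, 3, 4), (200, 5, 6)]
# A chunk that fully contains the other one is cut into three fragments.
print(fragment((200, 1, 10), (300, 4, 6)))  # [(200, 1, 3), (200, 4, 6), (200, 7, 10)]
```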

### Levels

As an example of why levels are important, see Genesis 4:14 again.

![levels](../images/DisplayDesign/DisplayDesign.006.png)

In the Hebrew Bible, as encoded in the
[BHSA](https://github.com/ETCBC/bhsa), the usual sequence of division is:
sentence, sentence atom, clause, clause atom, phrase, phrase atom, word.
Look at the middle clause. It coincides with its clause atom, phrase and phrase atom.
Without ranking information, Text-Fabric cannot know which of these is the outer node and which
the inner node.
Text-Fabric has computed this information when it loaded the corpus for the first time,
based on the average size of nodes. It is also possible that the corpus designer has overridden
this by an explicit ranking in the settings of the TF-app of the corpus.

We end up with a rather fine partition of all nodes in fragments, in such a way
that no fragment crosses the boundaries of enclosing fragments.

### Canonical order

Before we feed fragments to the display, we sort them in *canonical order*, based on their
slots and node type. The following criteria are checked *in this order*:

* Chunks that begin at an earlier slot have precedence.
* Chunks whose nodes have a higher-ranked type have precedence.
* Look at the slots the chunks do *not* have in common:
  the chunk with the earliest such slot has precedence.
* Chunks whose node is the smaller integer have precedence.

See `tf.core.nodes.Nodes.sortKeyChunk`.
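
The criteria above can be expressed as a sort key. The sketch below is our reading of them, not the code of `sortKeyChunk`. It assumes hypothetical mappings `type_of` (node to type) and `rank` (type to rank), and it uses the observation that for two contiguous chunks with the same first slot, the earliest slot they do not share belongs to the longer chunk.

```python
def make_sort_key(type_of, rank):
    """Build a key function for chunks (n, b, e)."""

    def sort_key(chunk):
        n, b, e = chunk
        return (
            b,                   # earlier first slot comes first
            -rank[type_of[n]],   # then the higher-ranked type
            -e,                  # then the longer chunk (it owns the earliest unshared slot)
            n,                   # finally the smaller node number
        )

    return sort_key


# Toy data (made up).
type_of = {1: "sentence", 2: "clause", 3: "phrase", 4: "phrase"}
rank = {"phrase": 1, "clause": 2, "sentence": 3}

fragments = [(3, 1, 2), (4, 3, 5), (2, 1, 5), (1, 1, 5)]
ordered = sorted(fragments, key=make_sort_key(type_of, rank))
print(ordered)  # [(1, 1, 5), (2, 1, 5), (3, 1, 2), (4, 3, 5)]
```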

### Stacking

When we have a list of canonically ordered fragments, we can stack them into a tree.
Each new fragment is tried against the right-most branch of the tree under construction,
from bottom to top.
If there is no place on that branch, a new right-most branch is started.

![stack](../images/DisplayDesign/DisplayDesign.007.png)
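
Stacking can be sketched with an explicit right-most branch. This is a minimal illustration, assuming the fragments are already in canonical order and that "having a place" on the branch means being enclosed by a fragment on it; the data structure (lists of `[fragment, children]`) is our own choice.

```python
def stack(fragments):
    """Stack canonically ordered fragments (n, b, e) into a list of trees.

    Each tree node is a list [fragment, children].
    """
    roots = []
    branch = []  # the right-most branch: top first, bottom last

    for frag in fragments:
        _, b, e = frag
        item = [frag, []]
        # Try the right-most branch from bottom to top.
        while branch:
            (_, pb, pe), children = branch[-1]
            if pb <= b and e <= pe:
                # The fragment fits inside this one: attach it here.
                children.append(item)
                break
            branch.pop()
        else:
            # No place on the branch: start a new right-most branch.
            roots.append(item)
        branch.append(item)
    return roots


def show(trees, depth=0):
    """Return the trees as indented lines, for inspection."""
    lines = []
    for frag, children in trees:
        lines.append("  " * depth + str(frag))
        lines.extend(show(children, depth + 1))
    return lines


fragments = [(1, 1, 5), (2, 1, 5), (3, 1, 2), (4, 3, 5)]
print("\n".join(show(stack(fragments))))
# (1, 1, 5)
#   (2, 1, 5)
#     (3, 1, 2)
#     (4, 3, 5)
```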

### Output

When we render a tree of fragments, we produce output for the fragments, one by one.
For each fragment, the output consists of a contribution by the node of the fragment.
13 changes: 12 additions & 1 deletion docs/About/releases.md
@@ -46,12 +46,23 @@ text-fabric appName:hot

### 8.3

#### 8.3.4

2020-06-26

Various small fixes:

* Fix in result display in TF browser: the members of a result
form a row again instead of a column.
* Better error message in some cases in `tf.convert.walker`.
* Moved documentation of the walker functions into the docstrings of those functions.

#### 8.3.3

Small fix by Cody Kingham: when calling `use(api=...)` with a TF api constructed
before, the `TF` attribute of this api is not transported to the app object.

2020-06-11
2020-06-13

#### 8.3.1, 8.3.2

5 changes: 5 additions & 0 deletions docs/advanced/display.md
@@ -39,3 +39,8 @@ See `tf.advanced.options` for a list of display parameters.

Both `pretty` and `plain` are implemented as a call to the
`tf.advanced.render.render` function.

## See also

The nature and implementation of the display algorithm are described in
`tf.about.displaydesign`.