Commit 5eee82d

addresses issue #310 (#311)
* addresses issue #310 add a prettyprinter for NFAs
  Signed-off-by: Tim Bray <[email protected]>
* update nullPrinter and CONTRIBUTING.md per feedback
  Signed-off-by: Tim Bray <[email protected]>
* improve CONTRIBUTING.md per feedback
  Signed-off-by: Tim Bray <[email protected]>

---------

Signed-off-by: Tim Bray <[email protected]>
1 parent b74612e commit 5eee82d

14 files changed, +330 -319 lines changed

Diff for: CONTRIBUTING.md (+84 -3)

```diff
@@ -2,6 +2,12 @@
 
 ## Basics
 
+Most of this document is concerned with the mechanics of raising issues
+and posting Pull Requests to offer improvements to Quamina. Following
+this, there is a section entitled **Developing** that describes
+technology issues that potential contributors will face
+and tools that might be helpful.
+
 Quamina is hosted in this GitHub repository
 at `github.com/timbray/quamina` and welcomes
 contributions.
@@ -12,7 +18,7 @@ This is important because possibly Quamina already
 does what you want, in which case perhaps what’s
 needed is a documentation fix. Possibly the idea
 has been raised before but failed to convince Quamina’s
-maintainers. (Doesnt mean it won’t find favor now;
+maintainers. (Doesn't mean it won’t find favor now;
 times change.)
 
 Assuming there is agreement that a change in Quamina
@@ -27,7 +33,7 @@ The coding style suggested by the Go community is
 used in Quamina. See the
 [style doc](https://github.com/golang/go/wiki/CodeReviewComments) for details.
 
-Try to limit column width to 120 characters for both code and markdown documents
+Try to limit column width to 120 characters for both code and Markdown documents
 such as this one.
 
 ### Format of the Commit Message
@@ -64,7 +70,7 @@ is recommended to break up your commits using distinct prefixes.
 
 ### Signing commits
 
-Commits should be signed (not just the `-s` “signd off on”) with
+Commits should be signed (not just the `-s` “signed off on”) with
 any of the [styles GitHub supports](https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits).
 Note that you can use `git config` to arrange that your commits are
 automatically signed with the right key.
@@ -99,3 +105,78 @@ instructions for installing it.
 
 When opening a new issue, try to roughly follow the commit message format
 conventions above.
+
+## Developing
+
+### Automata
+
+Quamina works by compiling the Patterns together into a Nondeterministic
+Finite Automaton (NFA) which proceeds byte-at-a-time through the UTF-8-encoded
+fields and values. NFAs are nondeterministic in the sense that a byte value
+may cause multiple transitions to different states.
+
+The general workflow, for some specific pattern type, is to write code to build
+an automaton that matches that type. Examples are the functions `makeStringFA()` in
+`value_matcher.go` and `makeShellStyleAutomaton()` in `shell_style.go`. Then,
+insert calls to the automaton builder in `value_matcher.go`, which is reasonably
+straightforward code. It takes care of merging new automata with existing ones
+as required.
```
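
The byte-at-a-time nondeterminism described in the Automata section can be illustrated with a small standalone sketch. It is not Quamina's implementation: the `nfa` type, its map-based tables, and the hand-built automaton for the shell-style pattern `*at` are hypothetical, and the surrounding JSON quotes are ignored. State 0 stays put on every byte, but on `'a'` it can also move to state 1, so one byte leads to two states and the matcher has to track a set of current states.

```go
package main

import "fmt"

// nfa is a hypothetical, minimal NFA over bytes. For each state it holds a map
// from input byte to the possible next states; a byte may map to several
// states, which is what makes the automaton nondeterministic.
type nfa struct {
	transitions []map[byte][]int // transitions[state][b] = next states on byte b
	anyByte     [][]int          // anyByte[state] = next states on every byte
	accepting   map[int]bool
}

// matches runs the NFA byte-at-a-time, tracking the set of current states.
func (n *nfa) matches(input []byte) bool {
	current := map[int]bool{0: true} // state 0 is the start state
	for _, b := range input {
		next := map[int]bool{}
		for s := range current {
			for _, t := range n.transitions[s][b] {
				next[t] = true
			}
			for _, t := range n.anyByte[s] {
				next[t] = true
			}
		}
		if len(next) == 0 {
			return false // no live states left: the input cannot match
		}
		current = next
	}
	for s := range current {
		if n.accepting[s] {
			return true
		}
	}
	return false
}

// globAt builds an NFA for the shell-style pattern "*at" (any prefix, then "at").
func globAt() *nfa {
	return &nfa{
		transitions: []map[byte][]int{
			{'a': {1}}, // state 0: 'a' may begin the "at" suffix
			{'t': {2}}, // state 1: 't' completes it
			{},         // state 2: accepting, no outgoing transitions
		},
		anyByte:   [][]int{{0}, nil, nil}, // state 0 also stays put on any byte
		accepting: map[int]bool{2: true},
	}
}

func main() {
	m := globAt()
	for _, s := range []string{"cat", "attack", "at", "batat", "ta"} {
		fmt.Printf("%q matches *at: %v\n", s, m.matches([]byte(s)))
	}
}
```

Quamina's own automata are built from `smallTable` values (with `ceilings` and `steps`, as the debugging code removed from `nfa.go` shows) rather than maps; the set-of-current-states bookkeeping is the part this sketch is meant to show.
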

```diff
+
+### Testing
+
+A straightforward way to test a new feature is exemplified by `TestLongCase()` in
+`shell_style_test.go`:
+
+1. Make a `coreMatcher` by calling `newCoreMatcher()`
+2. Add patterns to it by calling `addPattern()`
+3. Make test data and examine matching behavior by calling `matchesForJSONEvent()`
+
+### Prettyprinting NFAs
+
+NFAs can be difficult to build and to debug. For this reason, code
+is provided in `prettyprinter.go` which produces human-readable NFA
+representations.
+
+To use the prettyprinter, make an instance with `newPrettyPrinter()`; the only
+argument is a seed used to generate state numbers. Then, instead of calling
+`addPattern()`, call `addPatternWithPrinter()`, passing your prettyprinter into
+the automaton-building code. New automata are created by `valueMatcher` calls;
+see `value_matcher.go`. Ensure that the prettyprinter is passed to your
+automaton-building code; an example of this is in the `makeShellStyleAutomaton()`
+function. Then, in your automaton-building code, use `prettyprinter.labelTable()`
+to attach meaningful labels to the states of your automaton. Then at
+some convenient point, call `prettyprinter.printNFA()` to generate the NFA printout;
+real programmers debug with Print statements.
```

```diff
+
+### Prettyprinter output
+
+`makeShellStyleAutomaton()` code has `prettyprinter` call-outs to
+label the states and transitions it creates, and the `TestPP()` test in
+`prettyprinter_test.go` uses this. The pattern being matched is `"x*9"` and
+the prettyprinter output is:
+
+```
+758 [START HERE] '"' → [910 on " at 0]
+910 [on " at 0] 'x' → [821 gS at 2]
+821 [gS at 2] '9' → [551 gX on 9 at 3] / ★ → [821 gS at 2]
+551 [gX on 9 at 3] '"' → [937 on " at 4] / '9' → [551 gX on 9 at 3] / ★ → [821 gS at 2]
+937 [on " at 4] '9' → [551 gX on 9 at 3] / 'ℵ' → [820 last step at 5] / ★ → [821 gS at 2]
+820 [last step at 5] [1 transition(s)]
+```
+
+Each line represents one state.
+
+Each state gets a 3-digit number and a text description. The construct `★ →` represents
+a default transition, which occurs in the case that none of the other transitions match. The
+symbol `ℵ` represents the end of the input value.
+
+In this particular NFA, the `makeShellStyleAutomaton` code labels states corresponding to
+the `*` "glob" character with text including `gS` for "glob spin" and states that escape the
+"glob spin" state with `gX` for "glob exit".
+
+Most of the NFA-building code does not exercise the prettyprinter. Normally, you would insert
+such code while debugging a particular builder and remove it after completion. Since the
+shell-style builder is unusually complex, the prettyprinting code is retained in anticipation
+of future issues and progress to full regular-expression NFAs.
+
+
```
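
One way to check a reading of this printout is to transcribe its six states by hand into a table-driven matcher. This is a reading aid, not Quamina code: the `state` type and `matchValue()` are hypothetical, and `ℵ` is represented by the byte 0xF5, chosen here only because it never occurs in valid UTF-8. As the text says, `★` is taken only when no explicit transition matches, and the input includes the surrounding quotes, as the printout's first transition on `"` implies.

```go
package main

import "fmt"

const endOfValue byte = 0xF5 // stands in for ℵ; 0xF5 never appears in valid UTF-8

// state is a hypothetical transcription of one line of the printout.
type state struct {
	on       map[byte]int // explicit transitions: byte → state number
	fallback int          // ★ default transition; 0 means none
	final    bool         // the "last step" state
}

// xStar9 transcribes the prettyprinter output for the pattern "x*9".
var xStar9 = map[int]state{
	758: {on: map[byte]int{'"': 910}},                                 // START HERE
	910: {on: map[byte]int{'x': 821}},                                 // on " at 0
	821: {on: map[byte]int{'9': 551}, fallback: 821},                  // gS at 2
	551: {on: map[byte]int{'"': 937, '9': 551}, fallback: 821},        // gX on 9 at 3
	937: {on: map[byte]int{'9': 551, endOfValue: 820}, fallback: 821}, // on " at 4
	820: {final: true},                                                // last step at 5
}

// matchValue feeds a quoted value, then the end marker, through the table.
func matchValue(value string) bool {
	cur := 758
	input := append([]byte(value), endOfValue)
	for _, b := range input {
		st := xStar9[cur]
		if next, ok := st.on[b]; ok {
			cur = next // an explicit transition wins
		} else if st.fallback != 0 {
			cur = st.fallback // otherwise take ★, if the state has one
		} else {
			return false // no transition: the value cannot match
		}
	}
	return xStar9[cur].final
}

func main() {
	for _, v := range []string{`"x9"`, `"xab9"`, `"x9a9"`, `"x9a"`, `"y9"`} {
		fmt.Printf("%s matches x*9: %v\n", v, matchValue(v))
	}
}
```

Tracing `"x9a9"` by hand shows why state 551 needs its `★` transition: after the first `9`, the `a` sends the automaton back to the glob-spin state 821, and the final `9` moves it to 551 again.
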

Diff for: core_matcher.go (+7 -1)

```diff
@@ -52,6 +52,12 @@ func (m *coreMatcher) fields() *coreFields {
 // addPattern - the patternBytes is a JSON text which must be an object. The X is what the matcher returns to indicate
 // that the provided pattern has been matched. In many applications it might be a string which is the pattern's name.
 func (m *coreMatcher) addPattern(x X, patternJSON string) error {
+return m.addPatternWithPrinter(x, patternJSON, sharedNullPrinter)
+}
+
+// addPatternWithPrinter can be called from debugging and under-development code to allow viewing pretty-printed
+// NFAs
+func (m *coreMatcher) addPatternWithPrinter(x X, patternJSON string, printer printer) error {
 patternFields, err := patternFromJSON([]byte(patternJSON))
 if err != nil {
 return err
@@ -97,7 +103,7 @@ func (m *coreMatcher) addPattern(x X, patternJSON string) error {
 case existsFalseType:
 ns = state.addExists(false, field)
 default:
-ns = state.addTransition(field)
+ns = state.addTransition(field, printer)
 }
 
 nextStates = append(nextStates, ns...)
```

Diff for: field_matcher.go (+2 -2)

```diff
@@ -93,7 +93,7 @@ func (m *fieldMatcher) addExists(exists bool, field *patternField) []*fieldMatch
 return []*fieldMatcher{trans}
 }
 
-func (m *fieldMatcher) addTransition(field *patternField) []*fieldMatcher {
+func (m *fieldMatcher) addTransition(field *patternField, printer printer) []*fieldMatcher {
 // we build the new updateable state in freshStart so that we can blast it in atomically once computed
 current := m.fields()
 freshStart := &fmFields{
@@ -119,7 +119,7 @@ func (m *fieldMatcher) addTransition(field *patternField) []*fieldMatcher {
 // cases where this doesn't happen and reduce the number of fieldMatchStates
 var nextFieldMatchers []*fieldMatcher
 for _, val := range field.vals {
-nextFieldMatchers = append(nextFieldMatchers, vm.addTransition(val))
+nextFieldMatchers = append(nextFieldMatchers, vm.addTransition(val, printer))
 
 // if the val is a number, let's add a transition on the canonicalized number
 // TODO: Only do this if asked
```

Diff for: nfa.go (+4 -63)

```diff
@@ -21,6 +21,10 @@ func traverseOneFAStep(table *smallTable, index int, val []byte, transitions []*
 return transitions
 }
 index++
+// 1. Note no effort to traverse multiple next-steps in parallel. The traversal compute is tiny and the
+// necessary concurrency apparatus would almost certainly outweigh it
+// 2. TODO: It would probably be better to implement this iteratively rather than recursively.
+// The recursion will potentially go as deep as the val argument is long.
 for _, nextStep := range nextSteps.steps {
 transitions = append(transitions, nextStep.fieldTransitions...)
 transitions = traverseOneFAStep(nextStep.table, index, val, transitions)
```

```diff
@@ -101,66 +105,3 @@ func mergeFAStates(state1, state2 *faState, keyMemo map[faStepKey]*faState) *faS
 
 return combined
 }
-
-/**************************************/
-/* debugging apparatus from here down */
-/**************************************/
-/*
-func (t *smallTable) dump() string {
-return dump1(&faState{table: t}, 0, make(map[*smallTable]bool))
-}
-func dump1(fas *faState, indent int, already map[*smallTable]bool) string {
-t := fas.table
-s := " " + st2(t) + "\n"
-for _, step := range t.steps {
-if step != nil {
-for _, state := range step.steps {
-_, ok := already[state.table]
-if !ok {
-already[state.table] = true
-s += dump1(state, indent+1, already)
-}
-}
-}
-}
-return s
-}
-func (t *smallTable) shortDump() string {
-return fmt.Sprintf("%d-%s", t.serial, t.label)
-}
-
-func (n *faNext) String() string {
-var snames []string
-for _, step := range n.steps {
-snames = append(snames, fmt.Sprintf("%d %s", step.table.serial, step.table.label))
-}
-return "[" + strings.Join(snames, " · ") + "]"
-}
-
-func stString(t *smallTable) string {
-var rows []string
-
-for i := range t.ceilings {
-c := t.ceilings[i]
-if i == 0 {
-c = 0
-} else {
-if c != valueTerminator && c != byte(byteCeiling) {
-c = t.ceilings[i-1]
-}
-}
-var trailer string
-if i == len(t.ceilings)-1 && c != valueTerminator && c != byte(byteCeiling) {
-trailer = "…"
-} else {
-trailer = ""
-}
-if t.steps[i] != nil {
-rows = append(rows, fmt.Sprintf("%s%s:%s ", branchChar(c), trailer, t.steps[i].String()))
-} else {
-rows = append(rows, fmt.Sprintf("%s%s:nil ", branchChar(c), trailer))
-}
-}
-return fmt.Sprintf("s%d [%s] ", t.serial, t.label) + strings.Join(rows, "/ ")
-}
-*/
```

Diff for: nfa_test.go (+2 -2)

```diff
@@ -58,7 +58,7 @@ func TestFocusedMerge(t *testing.T) {
 
 for _, shellStyle := range shellStyles {
 str := `"` + shellStyle + `"`
-automaton, matcher := makeShellStyleAutomaton([]byte(str))
+automaton, matcher := makeShellStyleAutomaton([]byte(str), &nullPrinter{})
 automata = append(automata, automaton)
 matchers = append(matchers, matcher)
 }
@@ -76,7 +76,7 @@ func TestFocusedMerge(t *testing.T) {
 s := statsAccum{
 fmVisited: make(map[*fieldMatcher]bool),
 vmVisited: make(map[*valueMatcher]bool),
-stVisited: make(map[any]bool),
+stVisited: make(map[*smallTable]bool),
 }
 faStats(merged, &s)
 fmt.Println(s.stStats())
```
