* addresses issue #310
add a prettyprinter for NFAs
Signed-off-by: Tim Bray <[email protected]>
* update nullPrinter and CONTRIBUTING.md per feedback
Signed-off-by: Tim Bray <[email protected]>
* improve CONTRIBUTING.md per feedback
Signed-off-by: Tim Bray <[email protected]>
---------
Signed-off-by: Tim Bray <[email protected]>
CONTRIBUTING.md
+84-3
@@ -2,6 +2,12 @@
2
2
3
3
## Basics
4
4
5
+
Most of this document is concerned with the mechanics of raising issues
6
+
and posting Pull Requests to offer improvements to Quamina. Following
7
+
this, there is a section entitled **Developing** that describes
8
+
technology issues that potential contributors will face
9
+
and tools that might be helpful.
10
+
5
11
Quamina is hosted in this GitHub repository
6
12
at `github.com/timbray/quamina` and welcomes
7
13
contributions.
@@ -12,7 +18,7 @@ This is important because possibly Quamina already
12
18
does what you want, in which case perhaps what’s
13
19
needed is a documentation fix. Possibly the idea
14
20
has been raised before but failed to convince Quamina’s
15
-
maintainers. (Doesn’t mean it won’t find favor now;
21
+
maintainers. (Doesn't mean it won’t find favor now;
16
22
times change.)
17
23
18
24
Assuming there is agreement that a change in Quamina
@@ -27,7 +33,7 @@ The coding style suggested by the Go community is
27
33
used in Quamina. See the
28
34
[style doc](https://github.com/golang/go/wiki/CodeReviewComments) for details.
29
35
30
-
Try to limit column width to 120 characters for both code and markdown documents
36
+
Try to limit column width to 120 characters for both code and Markdown documents
31
37
such as this one.
32
38
33
39
### Format of the Commit Message
@@ -64,7 +70,7 @@ is recommended to break up your commits using distinct prefixes.
64
70
65
71
### Signing commits
66
72
67
-
Commits should be signed (not just the `-s` “signd off on”) with
73
+
Commits should be signed (not just the `-s` “signed off on”) with
68
74
any of the [styles GitHub supports](https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits).
69
75
Note that you can use `git config` to arrange that your commits are
70
76
automatically signed with the right key.
@@ -99,3 +105,78 @@ instructions for installing it.
99
105
100
106
When opening a new issue, try to roughly follow the commit message format
101
107
conventions above.
108
+
109
+
## Developing
110
+
111
+
### Automata
112
+
113
+
Quamina works by compiling the Patterns together into a Nondeterministic
114
+
Finite Automaton (NFA) which proceeds byte-at-a-time through the UTF-8-encoded
115
+
fields and values. NFAs are nondeterministic in the sense that a byte value
116
+
may cause multiple transitions to different states.
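
To illustrate what "multiple transitions" means in practice, here is a minimal sketch of stepping an NFA one byte at a time while tracking a set of active states. The `nfa` type and the `endsInAB` and `matches` names are hypothetical and exist only for this example; Quamina's internal representation is different.

```go
package main

import "fmt"

// nfa is a hypothetical toy representation, not Quamina's internal one:
// steps[s][b] lists every state reachable from state s on byte b.
type nfa struct {
	steps  []map[byte][]int
	accept int
}

// endsInAB builds an NFA accepting strings over {a, b} that end in "ab".
func endsInAB() nfa {
	return nfa{
		steps: []map[byte][]int{
			{'a': {0, 1}, 'b': {0}}, // state 0: 'a' fans out to two states
			{'b': {2}},              // state 1: saw 'a', needs 'b' next
			{},                      // state 2: accepting, no transitions out
		},
		accept: 2,
	}
}

// matches feeds the input one byte at a time, keeping the set of states
// the NFA could be in after each byte.
func (n nfa) matches(input []byte) bool {
	active := map[int]bool{0: true}
	for _, b := range input {
		next := map[int]bool{}
		for s := range active {
			for _, t := range n.steps[s][b] {
				next[t] = true
			}
		}
		active = next
	}
	return active[n.accept]
}

func main() {
	n := endsInAB()
	fmt.Println(n.matches([]byte("abab")), n.matches([]byte("aba"))) // true false
}
```

In state 0 the byte `a` leads to both state 0 and state 1, so the traversal has to carry a set of states rather than a single one.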
117
+
118
+
The general workflow, for some specific pattern type, is to write code to build
119
+
an automaton that matches that type. Examples are the functions `makeStringFA()` in
120
+
`value_matcher.go` and `makeShellStyleAutomaton()` in `shell_style.go`. Then,
121
+
insert calls to the automaton builder in `value_matcher.go`, which is reasonably
122
+
straightforward code. It takes care of merging new automata with existing ones
123
+
as required.
124
+
125
+
### Testing
126
+
127
+
A straightforward way to test a new feature is exemplified by `TestLongCase()` in
128
+
`shell_style_test.go`:
129
+
130
+
1. Make a `coreMatcher` by calling `newCoreMatcher()`
131
+
2. Add patterns to it by calling `addPattern()`
132
+
3. Make test data and examine matching behavior by calling `matchesForJSONEvent()`
133
+
134
+
### Prettyprinting NFAs
135
+
136
+
NFAs can be difficult to build and to debug. For this reason, code
137
+
is provided in `prettyprinter.go` which produces human-readable NFA
138
+
representations.
139
+
140
+
To use the prettyprinter, make an instance with `newPrettyPrinter()` - the only
141
+
argument is a seed used to generate state numbers. Then, instead of calling
142
+
`addPattern()`, call `addPatternWithPrinter()`, passing your prettyprinter into
143
+
the automaton-building code. New automata are created by `valueMatcher` calls,
144
+
see `value_matcher.go`. Ensure that the prettyprinter is passed to your
145
+
automaton-building code; an example of this is in the `makeShellStyleAutomaton()`
146
+
function. Then, in your automaton-building code, use `prettyprinter.labelTable()`
147
+
to attach meaningful labels to the states of your automaton. Then at
148
+
some convenient point, call `prettyprinter.printNFA()` to generate the NFA printout;
149
+
real programmers debug with Print statements.
150
+
151
+
### Prettyprinter output
152
+
153
+
The `makeShellStyleAutomaton()` code has `prettyprinter` call-outs to
154
+
label the states and transitions it creates, and the `TestPP()` test in
155
+
`prettyprinter_test.go` uses this. The pattern being matched is `"x*9"` and
156
+
the prettyprinter output is:
157
+
158
+
```
159
+
758 [START HERE] '"' → [910 on " at 0]
160
+
910 [on " at 0] 'x' → [821 gS at 2]
161
+
821 [gS at 2] '9' → [551 gX on 9 at 3] / ★ → [821 gS at 2]
162
+
551 [gX on 9 at 3] '"' → [937 on " at 4] / '9' → [551 gX on 9 at 3] / ★ → [821 gS at 2]
163
+
937 [on " at 4] '9' → [551 gX on 9 at 3] / 'ℵ' → [820 last step at 5] / ★ → [821 gS at 2]
164
+
820 [last step at 5] [1 transition(s)]
165
+
```
166
+
167
+
Each line represents one state.
168
+
169
+
Each state gets a 3-digit number and a text description. The construct `★ →` represents
170
+
a default transition, which occurs in the case that none of the other transitions match. The
171
+
symbol `ℵ` represents the end of the input value.
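
As a reading aid, the printout above can be transcribed into a small table and walked by hand. The sketch below is hypothetical and is not Quamina code: the `state` type, the `star` field standing in for `★`, and the `endOfValue` byte standing in for `ℵ` are all assumptions made for illustration. It also treats each byte as having a single target, which happens to hold for this particular printout.

```go
package main

import "fmt"

// endOfValue is an assumed stand-in for ℵ; 0xF5 never occurs in valid UTF-8.
const endOfValue byte = 0xF5

// state is a hypothetical transcription of one printed line.
type state struct {
	on   map[byte]int // explicit byte transitions
	star int          // target of the ★ default, or 0 if there is none
}

// table mirrors the six printed states for the pattern "x*9".
var table = map[int]state{
	758: {on: map[byte]int{'"': 910}},
	910: {on: map[byte]int{'x': 821}},
	821: {on: map[byte]int{'9': 551}, star: 821},
	551: {on: map[byte]int{'"': 937, '9': 551}, star: 821},
	937: {on: map[byte]int{'9': 551, endOfValue: 820}, star: 821},
	820: {},
}

// matches walks the table over the quoted value followed by the end marker.
// The default transition is taken only when no explicit transition applies.
func matches(value string) bool {
	cur := 758
	for _, b := range append([]byte(value), endOfValue) {
		st := table[cur]
		next, ok := st.on[b]
		if !ok {
			next = st.star
		}
		if next == 0 {
			return false // no explicit transition and no default
		}
		cur = next
	}
	return cur == 820
}

func main() {
	fmt.Println(matches(`"x19"`), matches(`"x9a"`), matches(`"y9"`)) // true false false
}
```

Walking `"x19"` visits 758, 910, 821, 821 (via `★` on `1`), 551, 937, and finally 820 on the end marker. That final state is the one labeled "last step" in the printout.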
172
+
173
+
In this particular NFA, the `makeShellStyleAutomaton` code labels states corresponding to
174
+
the `*` "glob" character with text including `gS` for "glob spin" and states that escape the
175
+
"glob spin" state with `gX` for "glob exit".
176
+
177
+
Most of the NFA-building code does not exercise the prettyprinter. Normally, you would insert
178
+
such code while debugging a particular builder and remove it after completion. Since the
179
+
shell-style builder is unusually complex, the prettyprinting code is retained in anticipation
180
+
of future issues and progress to full regular-expression NFAs.