The Two Stage Solution #24
Comments
Sounds excellent. I have some alternative preferences about names for the options, but that's trivial and can be worked out as implementations fall in place.
Notice that we can do preprocessing before the tokenize_dcg call, if it's useful. E.g., say the string handling doesn't handle a backslash escape that's in the language being parsed:
annies_tokenize -->
    fix_oddball_string_escape,
    tokenize_dcg(TokenizeOpts),
    strip_comments(CommentsOpts),
    make_strings(StringOpts),
    make_numbers(NumOpts).
Late last night Shon and I ended with a discussion that 'tokenize' itself might go away, and we'd just have a chain of these stages. Each stage passes on anything that's not 'what it wants', which includes anything not a number for most stages. So, raw input
after passing through tokenize_words
Eventually we're rehashing already tokenized stuff. E.g. we might have a
We may still want
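To make the "passes on anything that's not what it wants" idea concrete, here is a minimal sketch of one such stage. It assumes an earlier stage already produced `word/1` tokens holding atoms; the token shapes and the helper `number_or_same/2` are made up for illustration and are not the pack's API.

```prolog
:- use_module(library(apply)).

% make_numbers(+Opts, +In, -Out)
% Hypothetical stage: word(W) tokens whose text reads as a number become
% number(N). Every other token, including raw untokenized items, is passed
% on unchanged. Opts is ignored in this sketch.
make_numbers(_Opts, In, Out) :-
    maplist(number_or_same, In, Out).

% atom_number/2 fails quietly when W is not numeric, so we fall through
% to the pass-through clause.
number_or_same(word(W), number(N)) :-
    atom_number(W, N),
    !.
number_or_same(Token, Token).
```

Because the last two arguments are the input and output lists, the stage can be called through `phrase/3`, e.g. `phrase(make_numbers([]), [word(x), punct('='), word('42')], Out)` gives `Out = [word(x), punct('='), number(42)]`.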
Because of the size of this, we decided not to move forward during the hack day. Work on this is checked into the twostage branch, which is expected to hang around a while.
A big issue that came up is that it's an offline solution: you don't have the original form, and you don't have file location information. The parser will need location info when it prints error messages.
This turns out to be what the current solution does, sorta. It's offline as well.
Radical idea - we leave tokenize doing more or less what it's doing - perhaps less
It doesn't recognize strings, comments, numbers, etc.
We provide a separate set of downstream filters as part of the pack.
strip_comments takes some options and will either strip comments or turn them into tokens, or perhaps filter them. Maybe it returns the noncomments on a similar stream.
strip_whitespace takes some options, and will do things like strip the space tokens, pack just them, provide the indent level, etc.
make_strings takes some options and makes a string/1 token out of bits of the string. This might be painful, as we've already lost authorial form.
make_numbers parses numbers
and so on
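As a rough sketch of what a make_strings filter could look like when it only sees tokens: it glues everything between two double-quote `punct/1` tokens into one `string/1` token and passes everything else on. The token shapes (`punct/1`, `word/1`, `space/1`) and the helpers `strings//1` and `string_body//1` are assumptions for illustration, not the pack's implementation.

```prolog
% make_strings(+Opts, +In, -Out)
% Hypothetical filter over a token list. Opts is ignored in this sketch.
make_strings(_Opts, In, Out) :-
    phrase(strings(Out), In).

% An opening quote followed by a closed body becomes one string/1 token.
strings([string(S)|Ts]) -->
    [punct('"')], string_body(Parts), !,
    { atomic_list_concat(Parts, A), atom_string(A, S) },
    strings(Ts).
% Anything else (including an unterminated quote) is passed on untouched.
strings([T|Ts]) --> [T], !, strings(Ts).
strings([]) --> [].

% Collect the single argument of each token up to the closing quote.
string_body([]) --> [punct('"')], !.
string_body([P|Ps]) --> [T], { T =.. [_, P] }, string_body(Ps).
```

With `[punct('"'), word(hi), space(' '), word(there), punct('"')]` this yields `[string("hi there")]`. If an earlier stage had already stripped the `space(' ')` token, the result would be `"hithere"`, which is the lost authorial form problem in miniature.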
If we make the options the first argument for all these, and make a tokenize_dcg//1 that just moves the options arg to front, we can do chains with dcgs
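A minimal sketch of why options-first matters: a DCG body goal `stage(Opts)` expands to `stage(Opts, In, Out)`, so the hidden list pair can carry the stream from stage to stage. Everything below is a toy stand-in. `toy_tokenize/3` only mimics an assumed `tokenize(Text, Tokens, Options)` argument order, and the token shapes are invented for the example.

```prolog
:- use_module(library(apply)).

% Toy stand-in for a tokenizer with options as the LAST argument:
% each character code becomes space(C), punct(C) or char(C).
toy_tokenize(Codes, Tokens, _Opts) :-
    maplist(code_token, Codes, Tokens).

code_token(Code, Token) :-
    char_code(Char, Code),
    (   char_type(Char, space) -> Token = space(Char)
    ;   char_type(Char, punct) -> Token = punct(Char)
    ;   Token = char(Char)
    ).

% The wrapper only moves the options to the front, so it can sit in a chain.
toy_tokenize_dcg(Opts, Codes, Tokens) :-
    toy_tokenize(Codes, Tokens, Opts).

% A downstream filter with the same shape: drop the space tokens.
strip_whitespace(_Opts, In, Out) :-
    exclude(is_space_token, In, Out).

is_space_token(space(_)).

% The chain itself; each stage's output list is the next stage's input.
toy_pipeline -->
    toy_tokenize_dcg([]),
    strip_whitespace([]).
```

Usage: `string_codes("a, b", Cs), phrase(toy_pipeline, Cs, Ts)` gives `Ts = [char(a), punct(','), char(b)]`. Note the DCG here is only threading state, not consuming a prefix of the input the way a parsing rule normally would.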