-
-
Notifications
You must be signed in to change notification settings - Fork 406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix Dollar-Quoted Strings (postgres + cql) #149
Conversation
regexp.go
Outdated
for i, group := range groups[1:] { | ||
iterators = append(iterators, emitters[i].Emit([]string{group}, lexer)) | ||
} | ||
return Concaterator(iterators...) | ||
}) | ||
} | ||
|
||
// BySubBlockGroups uses the specified code and sublexerNameGroup from the | ||
// passed groups to emit tokens. | ||
func BySubBlockGroups(sublexerGetFunc func(string) Lexer, sublexerNameGroup, codeGroup int, emitters ...Emitter) Emitter { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename this to UsingByGroup()
as that more accurately reflects its intent. Also, can you describe in more detail what the parameters do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
lexers/c/cql.go
Outdated
@@ -23,6 +23,15 @@ var CassandraCQL = internal.Register(MustNewLexer( | |||
{`(ascii|bigint|blob|boolean|counter|date|decimal|double|float|frozen|inet|int|list|map|set|smallint|text|time|timestamp|timeuuid|tinyint|tuple|uuid|varchar|varint)\b`, NameBuiltin, nil}, | |||
{Words(``, `\b`, `ADD`, `AGGREGATE`, `ALL`, `ALLOW`, `ALTER`, `AND`, `ANY`, `APPLY`, `AS`, `ASC`, `AUTHORIZE`, `BATCH`, `BEGIN`, `BY`, `CLUSTERING`, `COLUMNFAMILY`, `COMPACT`, `CONSISTENCY`, `COUNT`, `CREATE`, `CUSTOM`, `DELETE`, `DESC`, `DISTINCT`, `DROP`, `EACH_QUORUM`, `ENTRIES`, `EXISTS`, `FILTERING`, `FROM`, `FULL`, `GRANT`, `IF`, `IN`, `INDEX`, `INFINITY`, `INSERT`, `INTO`, `KEY`, `KEYS`, `KEYSPACE`, `KEYSPACES`, `LEVEL`, `LIMIT`, `LOCAL_ONE`, `LOCAL_QUORUM`, `MATERIALIZED`, `MODIFY`, `NAN`, `NORECURSIVE`, `NOSUPERUSER`, `NOT`, `OF`, `ON`, `ONE`, `ORDER`, `PARTITION`, `PASSWORD`, `PER`, `PERMISSION`, `PERMISSIONS`, `PRIMARY`, `QUORUM`, `RENAME`, `REVOKE`, `SCHEMA`, `SELECT`, `STATIC`, `STORAGE`, `SUPERUSER`, `TABLE`, `THREE`, `TO`, `TOKEN`, `TRUNCATE`, `TTL`, `TWO`, `TYPE`, `UNLOGGED`, `UPDATE`, `USE`, `USER`, `USERS`, `USING`, `VALUES`, `VIEW`, `WHERE`, `WITH`, `WRITETIME`, `REPLICATION`, `OR`, `REPLACE`, `FUNCTION`, `CALLED`, `INPUT`, `RETURNS`, `LANGUAGE`, `ROLE`, `ROLES`, `TRIGGER`, `DURABLE_WRITES`, `LOGIN`, `OPTIONS`, `LOGGED`, `SFUNC`, `STYPE`, `FINALFUNC`, `INITCOND`, `IS`, `CONTAINS`, `JSON`, `PAGING`, `OFF`), Keyword, nil}, | |||
{"[+*/<>=~!@#%^&|`?-]+", Operator, nil}, | |||
{`(?s)(java|javascript)(\s+)(AS)(\s+)('|\$\$)(.*?)(\5)`, | |||
BySubBlockGroups( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you pull the invocations of these out into functions with descriptive names. The number of arguments is quite long, so the intent gets a bit lost.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it could be a separate func
, but I think that would be more problematic, because the declaration of the regex groups and the corresponding emitters would thus be in different locations.
I don't believe I can reduce the complexity/number of groups in the regex's as it's how the "lookahead" matching gets accomplished. The original pygments code used a function callback to achieve this behavior, but a "simple" regex in the right location results in significantly less complexity, ultimately.
(compare: the pygments postgresql language_callback is roughly ~50 lines of code, while this is only 8 lines, is inline to the regexp, and is reused across multiple lexers. I'm not an advocate for "less lines is better", and I'm definitely not playing code golf here -- but I do think the version I wrote is easier to understand, even if it causes the regex to be more complex/hard-to-read)
@@ -21,8 +21,15 @@ var Markdown = internal.Register(MustNewLexer( | |||
{`^(\s*)([*-])(\s)(.+\n)`, ByGroups(Text, Keyword, Text, UsingSelf("inline")), nil}, | |||
{`^(\s*)([0-9]+\.)( .+\n)`, ByGroups(Text, Keyword, UsingSelf("inline")), nil}, | |||
{`^(\s*>\s)(.+\n)`, ByGroups(Keyword, GenericEmph), nil}, | |||
{"^(```\\n)([\\w\\W]*?)(^```$)", ByGroups(LiteralString, Text, LiteralString), nil}, | |||
{"^(```)(\\w+)(\\n)([\\w\\W]*?)(^```$)", EmitterFunc(markdownCodeBlock), nil}, | |||
{"^(```\\n)([\\w\\W]*?)(^```$)", ByGroups(String, Text, String), nil}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should still be LiteralString
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String
is an alias for LiteralString
. Is the alias being deprecated?
This commit refactors code from the markdown lexer into the chroma package, and alters the PostgreSQL and CQL lexers to make use of it. Additionally, an example markdown with the various sublexers is added.
// See the lexers/m/markdown.go for the complete example. | ||
// | ||
// Note: panic's if the number emitters does not equal the number of matched | ||
// groups in the regex. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great, thanks!
Fix Dollar-Quoted Strings (postgres + cql)
This commit refactors code from the markdown lexer into the chroma
package, and alters the PostgreSQL and CQL lexers to make use of it.
Additionally, an example markdown with the various sublexers is added.