Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Dollar-Quoted Strings (postgres + cql) #149

Merged
merged 1 commit into from
Jun 12, 2018
Merged

Fix Dollar-Quoted Strings (postgres + cql) #149

merged 1 commit into from
Jun 12, 2018

Conversation

kenshaw
Copy link
Contributor

@kenshaw kenshaw commented Jun 10, 2018

This commit refactors code from the markdown lexer into the chroma
package, and alters the PostgreSQL and CQL lexers to make use of it.

Additionally, an example markdown with the various sublexers is added.

regexp.go Outdated
for i, group := range groups[1:] {
iterators = append(iterators, emitters[i].Emit([]string{group}, lexer))
}
return Concaterator(iterators...)
})
}

// BySubBlockGroups uses the specified code and sublexerNameGroup from the
// passed groups to emit tokens.
func BySubBlockGroups(sublexerGetFunc func(string) Lexer, sublexerNameGroup, codeGroup int, emitters ...Emitter) Emitter {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename this to UsingByGroup() as that more accurately reflects its intent. Also, can you describe in more detail what the parameters do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

lexers/c/cql.go Outdated
@@ -23,6 +23,15 @@ var CassandraCQL = internal.Register(MustNewLexer(
{`(ascii|bigint|blob|boolean|counter|date|decimal|double|float|frozen|inet|int|list|map|set|smallint|text|time|timestamp|timeuuid|tinyint|tuple|uuid|varchar|varint)\b`, NameBuiltin, nil},
{Words(``, `\b`, `ADD`, `AGGREGATE`, `ALL`, `ALLOW`, `ALTER`, `AND`, `ANY`, `APPLY`, `AS`, `ASC`, `AUTHORIZE`, `BATCH`, `BEGIN`, `BY`, `CLUSTERING`, `COLUMNFAMILY`, `COMPACT`, `CONSISTENCY`, `COUNT`, `CREATE`, `CUSTOM`, `DELETE`, `DESC`, `DISTINCT`, `DROP`, `EACH_QUORUM`, `ENTRIES`, `EXISTS`, `FILTERING`, `FROM`, `FULL`, `GRANT`, `IF`, `IN`, `INDEX`, `INFINITY`, `INSERT`, `INTO`, `KEY`, `KEYS`, `KEYSPACE`, `KEYSPACES`, `LEVEL`, `LIMIT`, `LOCAL_ONE`, `LOCAL_QUORUM`, `MATERIALIZED`, `MODIFY`, `NAN`, `NORECURSIVE`, `NOSUPERUSER`, `NOT`, `OF`, `ON`, `ONE`, `ORDER`, `PARTITION`, `PASSWORD`, `PER`, `PERMISSION`, `PERMISSIONS`, `PRIMARY`, `QUORUM`, `RENAME`, `REVOKE`, `SCHEMA`, `SELECT`, `STATIC`, `STORAGE`, `SUPERUSER`, `TABLE`, `THREE`, `TO`, `TOKEN`, `TRUNCATE`, `TTL`, `TWO`, `TYPE`, `UNLOGGED`, `UPDATE`, `USE`, `USER`, `USERS`, `USING`, `VALUES`, `VIEW`, `WHERE`, `WITH`, `WRITETIME`, `REPLICATION`, `OR`, `REPLACE`, `FUNCTION`, `CALLED`, `INPUT`, `RETURNS`, `LANGUAGE`, `ROLE`, `ROLES`, `TRIGGER`, `DURABLE_WRITES`, `LOGIN`, `OPTIONS`, `LOGGED`, `SFUNC`, `STYPE`, `FINALFUNC`, `INITCOND`, `IS`, `CONTAINS`, `JSON`, `PAGING`, `OFF`), Keyword, nil},
{"[+*/<>=~!@#%^&|`?-]+", Operator, nil},
{`(?s)(java|javascript)(\s+)(AS)(\s+)('|\$\$)(.*?)(\5)`,
BySubBlockGroups(
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you pull the invocations of these out into functions with descriptive names. The number of arguments is quite long, so the intent gets a bit lost.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it could be a separate func, but I think that would be more problematic, because the declaration of the regex groups and the corresponding emitters would thus be in different locations.

I don't believe I can reduce the complexity/number of groups in the regex's as it's how the "lookahead" matching gets accomplished. The original pygments code used a function callback to achieve this behavior, but a "simple" regex in the right location results in significantly less complexity, ultimately.

(compare: the pygments postgresql language_callback is roughly ~50 lines of code, while this is only 8 lines, is inline to the regexp, and is reused across multiple lexers. I'm not an advocate for "less lines is better", and I'm definitely not playing code golf here -- but I do think the version I wrote is easier to understand, even if it causes the regex to be more complex/hard-to-read)

@@ -21,8 +21,15 @@ var Markdown = internal.Register(MustNewLexer(
{`^(\s*)([*-])(\s)(.+\n)`, ByGroups(Text, Keyword, Text, UsingSelf("inline")), nil},
{`^(\s*)([0-9]+\.)( .+\n)`, ByGroups(Text, Keyword, UsingSelf("inline")), nil},
{`^(\s*>\s)(.+\n)`, ByGroups(Keyword, GenericEmph), nil},
{"^(```\\n)([\\w\\W]*?)(^```$)", ByGroups(LiteralString, Text, LiteralString), nil},
{"^(```)(\\w+)(\\n)([\\w\\W]*?)(^```$)", EmitterFunc(markdownCodeBlock), nil},
{"^(```\\n)([\\w\\W]*?)(^```$)", ByGroups(String, Text, String), nil},
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should still be LiteralString

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

String is an alias for LiteralString. Is the alias being deprecated?

This commit refactors code from the markdown lexer into the chroma
package, and alters the PostgreSQL and CQL lexers to make use of it.

Additionally, an example markdown with the various sublexers is added.
// See the lexers/m/markdown.go for the complete example.
//
// Note: panic's if the number emitters does not equal the number of matched
// groups in the regex.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great, thanks!

@alecthomas alecthomas merged commit d7b2ed2 into alecthomas:master Jun 12, 2018
mrsdizzie pushed a commit to mrsdizzie/chroma that referenced this pull request Jul 15, 2020
Fix Dollar-Quoted Strings (postgres + cql)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants