Add Org mode lexer (https://orgmode.org) #156

Merged
merged 1 commit into from
Jul 31, 2018

Contributor

@kaushalmodi kaushalmodi commented Jul 27, 2018

Hello,

I believe I finally have a somewhat workable Org mode lexer.

It is quite rough around the edges, and I would need your help to refine it. I will post my questions inline in this PR. Update: this PR now covers most of Org syntax.

(Also, how do I verify that the test I added passes? :) I simply took the input sample Org file, ran `chroma -f tokens sample.org`, and manually converted the output to JSON.) Update: thanks for your comment; I now run `chroma --json -l org org.actual > org.expected`.

Contributor Author

@kaushalmodi kaushalmodi left a comment

Can you please help me out with those questions?

@@ -29,7 +29,7 @@ translators for Pygments lexers and styles.

## Supported languages

ABNF, ANTLR, APL, ActionScript, ActionScript 3, Ada, Angular2, ApacheConf, AppleScript, Awk, BNF, Ballerina, Base Makefile, Bash, Batchfile, BlitzBasic, Brainfuck, C, C#, C++, CFEngine3, CMake, COBOL, CSS, Cap'n Proto, Ceylon, ChaiScript, Cheetah, Clojure, CoffeeScript, Common Lisp, Coq, Crystal, Cython, DTD, Dart, Diff, Django/Jinja, Docker, EBNF, Elixir, Elm, EmacsLisp, Erlang, FSharp, Factor, Fish, Forth, Fortran, GAS, GDScript, GLSL, Genshi, Genshi HTML, Genshi Text, Gnuplot, Go, Go HTML Template, Go Text Template, Groovy, HTML, HTTP, Handlebars, Haskell, Haxe, Hexdump, Hy, INI, Idris, Io, JSON, JSX, Java, JavaScript, Julia, Kotlin, LLVM, Lighttpd configuration file, Lua, Mako, Mason, Mathematica, MiniZinc, Modula-2, MorrowindScript, MySQL, Myghty, NASM, Newspeak, Nginx configuration file, Nim, Nix, OCaml, Objective-C, Octave, PHP, PL/pgSQL, POVRay, PacmanConf, Perl, Pig, PkgConfig, PostScript, PostgreSQL SQL dialect, PowerShell, Prolog, Protocol Buffer, Puppet, Python, Python 3, QBasic, R, Racket, Ragel, Rexx, Ruby, Rust, SCSS, SPARQL, SQL, Sass, Scala, Scheme, Scilab, Smalltalk, Smarty, Snobol, Solidity, SquidConf, Swift, TASM, TOML, Tcl, Tcsh, TeX, Termcap, Terminfo, Terraform, Thrift, Transact-SQL, Turtle, Twig, TypeScript, TypoScript, TypoScriptCssData, TypoScriptHtmlData, VHDL, VimL, WDTE, XML, Xorg, YAML, cfstatement, markdown, reStructuredText, reg, systemverilog, verilog
ABNF, ANTLR, APL, ActionScript, ActionScript 3, Ada, Angular2, ApacheConf, AppleScript, Awk, BNF, Ballerina, Base Makefile, Bash, Batchfile, BlitzBasic, Brainfuck, C, C#, C++, CFEngine3, CMake, COBOL, CSS, Cap'n Proto, Ceylon, ChaiScript, Cheetah, Clojure, CoffeeScript, Common Lisp, Coq, Crystal, Cython, DTD, Dart, Diff, Django/Jinja, Docker, EBNF, Elixir, Elm, EmacsLisp, Erlang, FSharp, Factor, Fish, Forth, Fortran, GAS, GDScript, GLSL, Genshi, Genshi HTML, Genshi Text, Gnuplot, Go, Go HTML Template, Go Text Template, Groovy, HTML, HTTP, Handlebars, Haskell, Haxe, Hexdump, Hy, INI, Idris, Io, JSON, JSX, Java, JavaScript, Julia, Kotlin, LLVM, Lighttpd configuration file, Lua, Mako, Mason, Mathematica, MiniZinc, Modula-2, MorrowindScript, MySQL, Myghty, NASM, Newspeak, Nginx configuration file, Nim, Nix, OCaml, Objective-C, Octave, Org Mode, PHP, PL/pgSQL, POVRay, PacmanConf, Perl, Pig, PkgConfig, PostScript, PostgreSQL SQL dialect, PowerShell, Prolog, Protocol Buffer, Puppet, Python, Python 3, QBasic, R, Racket, Ragel, Rexx, Ruby, Rust, SCSS, SPARQL, SQL, Sass, Scala, Scheme, Scilab, Smalltalk, Smarty, Snobol, Solidity, SquidConf, Swift, TASM, TOML, Tcl, Tcsh, TeX, Termcap, Terminfo, Terraform, Thrift, Transact-SQL, Turtle, Twig, TypeScript, TypoScript, TypoScriptCssData, TypoScriptHtmlData, VHDL, VimL, WDTE, XML, Xorg, YAML, cfstatement, markdown, reStructuredText, reg, systemverilog, verilog
Contributor Author

@kaushalmodi kaushalmodi Jul 27, 2018

Do you want me to break this up into multiple lines auto-filled at 70 chars (or something like that)? From the diff it's not quickly evident as to what language actually got added.

lexers/o/org.go Outdated
"root": {
{`^#\s.*$`, Comment, nil},
// Headings
{`^(\*+)( COMMENT)( .*)$`, ByGroups(GenericHeading, NameBuiltin, Text), nil},
Contributor Author

@kaushalmodi kaushalmodi Jul 27, 2018

I was unable to get something like `^(\*)( (COMMENT|DONE))( .*)$` working. That pattern has 4 groups, so I changed the latter part to ByGroups(GenericHeading, Text, NameBuiltin, Text), but that did not work. Strangely, it duplicated group 3 in the final output, so "* DONE foo" shows up as "* DONEDONE".

So what's the best way to optimize these six lines?

Update: Unrelated to this, but I have a typo here which I have fixed locally. I meant ^(\*) .. instead of ^(\*+) ..

Owner

Nested sub-groups need to be non-capturing (i.e. `( (?:COMMENT|DONE))`).
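
To illustrate the point about non-capturing groups, here is a minimal sketch that counts capturing groups with Go's standard `regexp` package. This assumes group counting behaves the same way in the regexp engine Chroma uses, which may not be this package. The helper names `groupCount` and `headingGroups` are made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// groupCount reports how many capturing groups a pattern has.
// An emitter such as ByGroups needs one token type per capturing group.
func groupCount(pattern string) int {
	return regexp.MustCompile(pattern).NumSubexp()
}

// headingGroups returns the captured groups (without the full match)
// for a heading line, or nil if the line does not match.
func headingGroups(pattern, line string) []string {
	m := regexp.MustCompile(pattern).FindStringSubmatch(line)
	if m == nil {
		return nil
	}
	return m[1:]
}

func main() {
	nested := `^(\*+)( (COMMENT|DONE))( .*)$` // inner group also captures
	flat := `^(\*+)( (?:COMMENT|DONE))( .*)$` // inner group is non-capturing

	fmt.Println(groupCount(nested), groupCount(flat)) // 4 3
	fmt.Printf("%q\n", headingGroups(flat, "* DONE foo"))
}
```

With the non-capturing form, the three groups line up with three token types, for example ByGroups(GenericHeading, NameBuiltin, Text).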

lexers/o/org.go Outdated
{`^(\s*)([0-9]+[.)])( .+)$`, ByGroups(Text, Keyword, UsingSelf("inline")), nil},
// Blocks
{`^(\s*#\+begin_src)( [^ \n]+)(.*\n)([\w\W]*?)(^\s*#\+end_src$)`,
UsingByGroup(
Contributor Author

This whole lexer borrows its structure from the Markdown lexer, so I retained this "UsingByGroup" part from there. While it seems to work, I don't understand exactly what it's doing. Why isn't ByGroups used like elsewhere? What is internal.Get doing? Do the later "2, 4" refer to those regex groups, and what do they mean? The last part ("Comment, Comment, ..") makes sense, as those look like the ByGroups args.

Owner

Refer to the docs on UsingByGroup(), it should explain things.
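
As a rough illustration of the idea only (this is not Chroma's implementation), my understanding is that "2, 4" are 1-based group numbers: group 2 names the language, and group 4 holds the code to hand to that language's lexer. The other groups get the listed token types, which also serve as the fallback when no lexer is found. The sketch below uses Go's standard `regexp` package, adds `(?m)` so that `^` and `$` work per line there, and uses a hypothetical `knownLexers` map in place of a real registry lookup such as internal.Get.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// srcBlock mirrors the five groups of the begin_src rule in this PR.
// (?m) is added here so ^ and $ match at line boundaries in Go's regexp package.
var srcBlock = regexp.MustCompile(`(?m)^(\s*#\+begin_src)( [^ \n]+)(.*\n)([\w\W]*?)(^\s*#\+end_src$)`)

// knownLexers is a hypothetical stand-in for a lexer registry lookup.
var knownLexers = map[string]bool{"go": true, "python": true}

// describe shows which group names the sub-lexer and which group holds the code.
func describe(text string, nameGroup, codeGroup int) string {
	m := srcBlock.FindStringSubmatch(text)
	if m == nil {
		return "no match"
	}
	name := strings.TrimSpace(m[nameGroup]) // group 2 carries a leading space
	if knownLexers[name] {
		return fmt.Sprintf("lex %q with the %s lexer", m[codeGroup], name)
	}
	return fmt.Sprintf("no %s lexer; fall back to the per-group token types", name)
}

func main() {
	org := "#+begin_src go\nx := 1\n#+end_src"
	fmt.Println(describe(org, 2, 4))
}
```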

lexers/o/org.go Outdated
// Keywords
// Fri Jul 27 00:36:20 EDT 2018 - kmodi
// Unable to get the below to work.
// {`^(#\+options)(:)(((\s)([^:\n]+?)(:[^:\s\n]+))+)()$`, ByGroups(GenericStrong, Text, None, None, Text, GenericEmph, Text, Text), nil}, // Org keyword #+options
Contributor Author

I would like to parse a line like:

#+options: foo1:val1 foo2:val2 ..

Here we don't know how many fooN:valN pairs we will have, so I tried to match them with (((\s)([^:\n]+?)(:[^:\s\n]+))+), meaning that there is at least one fooN:valN pair. But I couldn't figure out how to set a different category for each of those subgroups.

Owner

You'll need to introduce a new lexer state and use Push("options"). You can read the Pygments docs for more information; Chroma is identical in this regard.
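
A minimal sketch of the state idea, written against the standard library only. It is a toy state-stack tokenizer, not Chroma's API: the `rule` type, the `states` table, the "Key"/"Value" labels, and `tokenize` are all hypothetical. What it shows is that, once the keyword pushes an "options" state, each piece of a key:value pair is matched by its own small rule, and the repetition comes from the lexer loop rather than from a repeated regex group.

```go
package main

import (
	"fmt"
	"regexp"
)

// rule is a simplified stand-in for a lexer rule: a pattern, a label for the
// matched text, and an optional change to the state stack.
type rule struct {
	re    *regexp.Regexp
	label string
	push  string // state to enter after a match ("" for none)
	pop   bool   // leave the current state after a match
}

// states is a hypothetical two-state table. "root" recognises the keyword and
// pushes "options"; "options" matches one piece of a key:value pair at a time.
var states = map[string][]rule{
	"root": {
		{regexp.MustCompile(`^#\+options:`), "Keyword", "options", false},
		{regexp.MustCompile(`^[^\n]+`), "Text", "", false},
		{regexp.MustCompile(`^\n`), "Text", "", false},
	},
	"options": {
		{regexp.MustCompile(`^[ \t]+`), "Text", "", false},
		{regexp.MustCompile(`^[^:\s]+`), "Key", "", false},
		{regexp.MustCompile(`^:[^:\s]+`), "Value", "", false},
		{regexp.MustCompile(`^\n`), "Text", "", true},
	},
}

// tokenize applies the first matching rule of the state on top of the stack,
// over and over, so any number of key:value pairs is handled by the loop.
func tokenize(input string) []string {
	stack := []string{"root"}
	var out []string
	for pos := 0; pos < len(input); {
		matched := false
		for _, r := range states[stack[len(stack)-1]] {
			loc := r.re.FindStringIndex(input[pos:])
			if loc == nil {
				continue // this rule does not match here
			}
			out = append(out, fmt.Sprintf("%s(%q)", r.label, input[pos:pos+loc[1]]))
			pos += loc[1]
			if r.pop {
				stack = stack[:len(stack)-1]
			}
			if r.push != "" {
				stack = append(stack, r.push)
			}
			matched = true
			break
		}
		if !matched {
			// No rule applies: emit one byte as an error and keep going.
			out = append(out, fmt.Sprintf("Error(%q)", input[pos:pos+1]))
			pos++
		}
	}
	return out
}

func main() {
	for _, tok := range tokenize("#+options: toc:nil num:t\n") {
		fmt.Println(tok)
	}
}
```

In a real lexer the "options" rules would carry token types such as GenericEmph and Text instead of string labels, and the newline rule would pop back to the root state.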

Contributor Author

I might have to postpone this feature for a future PR until I study up on those docs and Push examples from other lexers in chroma. Thanks.

lexers/o/org.go Outdated
// {`^(#\+options)(:)(((\s)([^:\n]+?)(:[^:\s\n]+))+)()$`, ByGroups(GenericStrong, Text, None, None, Text, GenericEmph, Text, Text), nil}, // Org keyword #+options
{`^(#\+\w+)(:)(.*)$`, ByGroups(GenericStrong, Text, Comment), nil}, // Other Org keywords like #+title
// Properties
{`^\s*:PROPERTIES:\n`, Comment, nil},
Contributor Author

Is there a way to do case-insensitive regex matches for cases like these, where I would want to match either ":PROPERTIES:" or ":properties:"?

Owner

Yes, use the (?i) mode modifier.
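
For reference, a small sketch of the `(?i)` modifier applied to the property-drawer rule from this diff, using Go's standard `regexp` package. The names `drawerStart` and `isDrawerStart` are made up for the example, and it assumes the flag behaves the same way in the engine Chroma uses.

```go
package main

import (
	"fmt"
	"regexp"
)

// drawerStart matches the opening line of an Org property drawer.
// The leading (?i) makes the whole pattern case-insensitive.
var drawerStart = regexp.MustCompile(`(?i)^\s*:PROPERTIES:\n`)

// isDrawerStart reports whether line opens a property drawer.
func isDrawerStart(line string) bool {
	return drawerStart.MatchString(line)
}

func main() {
	lines := []string{":PROPERTIES:\n", ":properties:\n", "  :Properties:\n", ":END:\n"}
	for _, line := range lines {
		fmt.Printf("%q -> %v\n", line, isDrawerStart(line))
	}
}
```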

Contributor Author

Thanks! That helped.

lexers/o/org.go Outdated
// Fri Jul 27 00:36:20 EDT 2018 - kmodi
// Unable to get the below to work.
// {`^(#\+options)(:)(((\s)([^:\n]+?)(:[^:\s\n]+))+)()$`, ByGroups(GenericStrong, Text, None, None, Text, GenericEmph, Text, Text), nil}, // Org keyword #+options
{`^(#\+\w+)(:)(.*)$`, ByGroups(GenericStrong, Text, Comment), nil}, // Other Org keywords like #+title
Contributor Author

Here, I would actually like the whole line to be Comment, with GenericStrong nested only for the first group.

So something like:

#+keywords: abc def

will look like:

<Comment><GenericStrong>#+keywords</GenericStrong>: abc def</Comment>

Is that possible?

Owner

If I'm understanding you correctly, I think you'll need to push a new lexer state here too.

Owner

Though token types can't be "overlaid", if that's what you're trying to achieve. You'd have to pick one of the existing Comment* variants, or add a new one eg. CommentStrong.

@alecthomas
Owner

> (Also, how do I verify that the test I added passes? :) I simply took the input sample Org file, ran chroma -f tokens sample.org and manually converted the output to JSON.)

You should just be able to run `go test ./...` (also, FYI, you can use `chroma --json` to output JSON-formatted tokens).

@kaushalmodi
Contributor Author

> You should just be able to run go test ./...

I am not a Go developer. Can you help me out on how to make `go test ./...` work? Right now, I get this `cannot find package "github.com/alecthomas/assert"` error.

km²~/go.apps/:../github.com/alecthomas/chroma> pwd 
/home/kmodi/go.apps/src/github.com/alecthomas/chroma

km²~/go.apps/:../github.com/alecthomas/chroma> ls
COPYING      coalesce_test.go  doc.go        lexer_test.go        quick/          style.go
README.md    colour.go         formatter.go  lexers/              regexp.go       style_test.go
_tools/      colour_test.go    formatters/   mutators.go          regexp_test.go  styles/
cmd/         delegate.go       iterator.go   mutators_test.go     remap.go        tokentype_string.go
coalesce.go  delegate_test.go  lexer.go      pygments-lexers.txt  remap_test.go   types.go

km²~/go.apps/:../github.com/alecthomas/chroma> go test ./...
# github.com/alecthomas/chroma
coalesce_test.go:6:2: cannot find package "github.com/alecthomas/assert" in any of:
        /home/kmodi/go/src/github.com/alecthomas/assert (from $GOROOT)
        /home/kmodi/go.apps/src/github.com/alecthomas/assert (from $GOPATH)
FAIL    github.com/alecthomas/chroma [setup failed]
# github.com/alecthomas/chroma/formatters
coalesce_test.go:6:2: cannot find package "github.com/alecthomas/assert" in any of:
        /home/kmodi/go/src/github.com/alecthomas/assert (from $GOROOT)
        /home/kmodi/go.apps/src/github.com/alecthomas/assert (from $GOPATH)
FAIL    github.com/alecthomas/chroma/formatters [setup failed]
# github.com/alecthomas/chroma/formatters/html
coalesce_test.go:6:2: cannot find package "github.com/alecthomas/assert" in any of:
        /home/kmodi/go/src/github.com/alecthomas/assert (from $GOROOT)
        /home/kmodi/go.apps/src/github.com/alecthomas/assert (from $GOPATH)
FAIL    github.com/alecthomas/chroma/formatters/html [setup failed]
# github.com/alecthomas/chroma/lexers
coalesce_test.go:6:2: cannot find package "github.com/alecthomas/assert" in any of:
        /home/kmodi/go/src/github.com/alecthomas/assert (from $GOROOT)
        /home/kmodi/go.apps/src/github.com/alecthomas/assert (from $GOPATH)
FAIL    github.com/alecthomas/chroma/lexers [setup failed]
# github.com/alecthomas/chroma/lexers/g
coalesce_test.go:6:2: cannot find package "github.com/alecthomas/assert" in any of:
        /home/kmodi/go/src/github.com/alecthomas/assert (from $GOROOT)
        /home/kmodi/go.apps/src/github.com/alecthomas/assert (from $GOPATH)
FAIL    github.com/alecthomas/chroma/lexers/g [setup failed]
?       github.com/alecthomas/chroma/cmd/chroma [no test files]
?       github.com/alecthomas/chroma/lexers/a   [no test files]
?       github.com/alecthomas/chroma/lexers/b   [no test files]
?       github.com/alecthomas/chroma/lexers/c   [no test files]
?       github.com/alecthomas/chroma/lexers/circular    [no test files]
?       github.com/alecthomas/chroma/lexers/d   [no test files]
?       github.com/alecthomas/chroma/lexers/e   [no test files]
?       github.com/alecthomas/chroma/lexers/f   [no test files]
?       github.com/alecthomas/chroma/lexers/h   [no test files]
?       github.com/alecthomas/chroma/lexers/i   [no test files]
?       github.com/alecthomas/chroma/lexers/internal    [no test files]
?       github.com/alecthomas/chroma/lexers/j   [no test files]
?       github.com/alecthomas/chroma/lexers/k   [no test files]
?       github.com/alecthomas/chroma/lexers/l   [no test files]
?       github.com/alecthomas/chroma/lexers/m   [no test files]
?       github.com/alecthomas/chroma/lexers/n   [no test files]
?       github.com/alecthomas/chroma/lexers/o   [no test files]
?       github.com/alecthomas/chroma/lexers/p   [no test files]
?       github.com/alecthomas/chroma/lexers/q   [no test files]
?       github.com/alecthomas/chroma/lexers/r   [no test files]
?       github.com/alecthomas/chroma/lexers/s   [no test files]
?       github.com/alecthomas/chroma/lexers/t   [no test files]
?       github.com/alecthomas/chroma/lexers/v   [no test files]
?       github.com/alecthomas/chroma/lexers/w   [no test files]
?       github.com/alecthomas/chroma/lexers/x   [no test files]
?       github.com/alecthomas/chroma/lexers/y   [no test files]
ok      github.com/alecthomas/chroma/quick      (cached) [no tests to run]
?       github.com/alecthomas/chroma/styles     [no test files]

@kaushalmodi changed the title from "[WIP] Add Org mode lexer (https://orgmode.org)" to "Add Org mode lexer (https://orgmode.org)" on Jul 31, 2018
@kaushalmodi
Contributor Author

@alecthomas Apart from that "go test" question, this PR is now good to go from my side.

@kaushalmodi force-pushed the add-org-mode branch 2 times, most recently from b2f90e1 to 1f97789, on July 31, 2018 18:44
@alecthomas
Owner

> I am not a Go developer. Can you help me out on how to make go test ./... work? Right now, I get this cannot find package "github.com/alecthomas/assert" error.

Ah! Well, an extra special thank you for contributing in that case, I appreciate it!

Use go get -t ./... to pull down test dependencies.

@alecthomas
Owner

(that said, the CI has already run the tests and they passed)

@alecthomas alecthomas merged commit 3fb10fb into alecthomas:master Jul 31, 2018
@alecthomas
Owner

Thanks @kaushalmodi !

@kaushalmodi
Contributor Author

For my future self, I was successfully able to run the tests locally by doing:

cd ~/go.apps/src/github.com/alecthomas/chroma/
go get -t ./...
go test ./lexers # As the lexer tests are in that dir

@kaushalmodi kaushalmodi deleted the add-org-mode branch August 2, 2018 16:36
mrsdizzie pushed a commit to mrsdizzie/chroma that referenced this pull request Jul 15, 2020