Add Org mode lexer (https://orgmode.org) #156
Can you please help me out with those questions?
@@ -29,7 +29,7 @@ translators for Pygments lexers and styles.

 ## Supported languages

-ABNF, ANTLR, APL, ActionScript, ActionScript 3, Ada, Angular2, ApacheConf, AppleScript, Awk, BNF, Ballerina, Base Makefile, Bash, Batchfile, BlitzBasic, Brainfuck, C, C#, C++, CFEngine3, CMake, COBOL, CSS, Cap'n Proto, Ceylon, ChaiScript, Cheetah, Clojure, CoffeeScript, Common Lisp, Coq, Crystal, Cython, DTD, Dart, Diff, Django/Jinja, Docker, EBNF, Elixir, Elm, EmacsLisp, Erlang, FSharp, Factor, Fish, Forth, Fortran, GAS, GDScript, GLSL, Genshi, Genshi HTML, Genshi Text, Gnuplot, Go, Go HTML Template, Go Text Template, Groovy, HTML, HTTP, Handlebars, Haskell, Haxe, Hexdump, Hy, INI, Idris, Io, JSON, JSX, Java, JavaScript, Julia, Kotlin, LLVM, Lighttpd configuration file, Lua, Mako, Mason, Mathematica, MiniZinc, Modula-2, MorrowindScript, MySQL, Myghty, NASM, Newspeak, Nginx configuration file, Nim, Nix, OCaml, Objective-C, Octave, PHP, PL/pgSQL, POVRay, PacmanConf, Perl, Pig, PkgConfig, PostScript, PostgreSQL SQL dialect, PowerShell, Prolog, Protocol Buffer, Puppet, Python, Python 3, QBasic, R, Racket, Ragel, Rexx, Ruby, Rust, SCSS, SPARQL, SQL, Sass, Scala, Scheme, Scilab, Smalltalk, Smarty, Snobol, Solidity, SquidConf, Swift, TASM, TOML, Tcl, Tcsh, TeX, Termcap, Terminfo, Terraform, Thrift, Transact-SQL, Turtle, Twig, TypeScript, TypoScript, TypoScriptCssData, TypoScriptHtmlData, VHDL, VimL, WDTE, XML, Xorg, YAML, cfstatement, markdown, reStructuredText, reg, systemverilog, verilog
+ABNF, ANTLR, APL, ActionScript, ActionScript 3, Ada, Angular2, ApacheConf, AppleScript, Awk, BNF, Ballerina, Base Makefile, Bash, Batchfile, BlitzBasic, Brainfuck, C, C#, C++, CFEngine3, CMake, COBOL, CSS, Cap'n Proto, Ceylon, ChaiScript, Cheetah, Clojure, CoffeeScript, Common Lisp, Coq, Crystal, Cython, DTD, Dart, Diff, Django/Jinja, Docker, EBNF, Elixir, Elm, EmacsLisp, Erlang, FSharp, Factor, Fish, Forth, Fortran, GAS, GDScript, GLSL, Genshi, Genshi HTML, Genshi Text, Gnuplot, Go, Go HTML Template, Go Text Template, Groovy, HTML, HTTP, Handlebars, Haskell, Haxe, Hexdump, Hy, INI, Idris, Io, JSON, JSX, Java, JavaScript, Julia, Kotlin, LLVM, Lighttpd configuration file, Lua, Mako, Mason, Mathematica, MiniZinc, Modula-2, MorrowindScript, MySQL, Myghty, NASM, Newspeak, Nginx configuration file, Nim, Nix, OCaml, Objective-C, Octave, Org Mode, PHP, PL/pgSQL, POVRay, PacmanConf, Perl, Pig, PkgConfig, PostScript, PostgreSQL SQL dialect, PowerShell, Prolog, Protocol Buffer, Puppet, Python, Python 3, QBasic, R, Racket, Ragel, Rexx, Ruby, Rust, SCSS, SPARQL, SQL, Sass, Scala, Scheme, Scilab, Smalltalk, Smarty, Snobol, Solidity, SquidConf, Swift, TASM, TOML, Tcl, Tcsh, TeX, Termcap, Terminfo, Terraform, Thrift, Transact-SQL, Turtle, Twig, TypeScript, TypoScript, TypoScriptCssData, TypoScriptHtmlData, VHDL, VimL, WDTE, XML, Xorg, YAML, cfstatement, markdown, reStructuredText, reg, systemverilog, verilog
Do you want me to break this up into multiple lines auto-filled at 70 chars (or something like that)? From the diff it's not quickly evident as to what language actually got added.
lexers/o/org.go
"root": {
	{`^#\s.*$`, Comment, nil},
	// Headings
	{`^(\*+)( COMMENT)( .*)$`, ByGroups(GenericHeading, NameBuiltin, Text), nil},
I was unable to get something like `^(\*)( (COMMENT|DONE))( .*)$` working. That regex has 4 groups, so I changed the latter part to `ByGroups(GenericHeading, Text, NameBuiltin, Text)`, but that did not work. Strangely, it duplicated group 3 in the final output, so "* DONE foo" shows up as "* DONEDONE".
So what's the best way to optimize these six lines?
Update: Unrelated to this, but I have a typo here, which I have fixed locally: I meant `^(\*) ..` instead of `^(\*+) ..`
Nested sub-groups need to be non-capturing (i.e. `( (?:COMMENT|DONE))`).
lexers/o/org.go
{`^(\s*)([0-9]+[.)])( .+)$`, ByGroups(Text, Keyword, UsingSelf("inline")), nil},
// Blocks
{`^(\s*#\+begin_src)( [^ \n]+)(.*\n)([\w\W]*?)(^\s*#\+end_src$)`,
	UsingByGroup(
This whole lexer borrows the structure from the Markdown lexer. So I retained this "UsingByGroup" part from there. While it seems to work, I don't understand what it's doing exactly. Why isn't ByGroups used like elsewhere? What is internal.Get doing? And later "2, 4" refer to those regex groups? What do they mean? The last part ("Comment, Comment, ..") makes sense as they look like the ByGroups args.
Refer to the docs on `UsingByGroup()`; they should explain things.
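As a rough illustration of what the "2, 4" arguments point at: as far as I understand it, one group carries the name of the language and another carries the code that should be handed to that language's lexer, while the remaining groups are emitted with the listed token types. The sketch below does not call Chroma. It runs the `begin_src` pattern from the diff through Go's standard `regexp` package (with `(?m)` added, an assumption so that `^` and `$` match at line boundaries), and `splitSrcBlock` is a hypothetical helper that just extracts those two groups.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// srcBlock is the begin_src pattern from the diff, with (?m) added so that
// ^ and $ match at line boundaries under Go's standard regexp package.
var srcBlock = regexp.MustCompile(
	`(?m)^(\s*#\+begin_src)( [^ \n]+)(.*\n)([\w\W]*?)(^\s*#\+end_src$)`)

// splitSrcBlock returns the language name (group 2) and the code body
// (group 4). A real lexer would look up a sublexer by that name and let it
// tokenize the body; this hypothetical helper only extracts the two strings.
func splitSrcBlock(src string) (lang, code string, ok bool) {
	m := srcBlock.FindStringSubmatch(src)
	if m == nil {
		return "", "", false
	}
	return strings.TrimSpace(m[2]), m[4], true
}

func main() {
	lang, code, ok := splitSrcBlock("#+begin_src go\nfmt.Println(1)\n#+end_src")
	fmt.Printf("%v %q %q\n", ok, lang, code)
}
```

Groups 1, 3 and 5 (the `#+begin_src` marker, the rest of the header line, and `#+end_src`) are the parts that still need ordinary token types, which is why the trailing arguments look like `ByGroups` arguments.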
lexers/o/org.go
// Keywords
// Fri Jul 27 00:36:20 EDT 2018 - kmodi
// Unable to get the below to work.
// {`^(#\+options)(:)(((\s)([^:\n]+?)(:[^:\s\n]+))+)()$`, ByGroups(GenericStrong, Text, None, None, Text, GenericEmph, Text, Text), nil}, // Org keyword #+options
I would like to parse a line like:
#+options: foo1:val1 foo2:val2 ..
Here we don't know how many fooN:valN pairs we will have, so I tried to match them with `(((\s)([^:\n]+?)(:[^:\s\n]+))+)`, meaning that we have at least one of those fooN:valN pairs. But I couldn't figure out how to set a different category for each of those subgroups.
You'll need to introduce a new lexer state and use `Push("options")`. You can read the Pygments docs for more information; Chroma is identical in this regard.
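To make the idea of a pushed state concrete, here is a small stand-alone model of a state-stack tokenizer. It is a sketch under several assumptions: it uses Go's standard `regexp` package instead of Chroma, the rule table and the `tokenize` function are hypothetical, and the token type names only mimic Chroma's. The point is that the `options` state matches one small piece at a time (whitespace, key, `:value`), so any number of pairs can be tokenized without a repeated capture group.

```go
package main

import (
	"fmt"
	"regexp"
)

// token is a (type, value) pair; the type names only mimic Chroma's.
type token struct{ Type, Value string }

// rule matches at the current position; push/pop change the state stack.
type rule struct {
	re   *regexp.Regexp
	typ  string
	push string // state to enter after the match ("" means stay)
	pop  bool   // leave the current state after the match
}

// states is a hypothetical rule table, not the one from this PR.
var states = map[string][]rule{
	"root": {
		{regexp.MustCompile(`^#\+options:`), "GenericStrong", "options", false},
		{regexp.MustCompile(`^[^\n]*\n?`), "Text", "", false},
	},
	"options": {
		{regexp.MustCompile(`^\n`), "Text", "", true}, // end of line: back to root
		{regexp.MustCompile(`^[ \t]+`), "Text", "", false},
		{regexp.MustCompile(`^:[^:\s]+`), "String", "", false},
		{regexp.MustCompile(`^[^:\s]+`), "GenericEmph", "", false},
	},
}

// tokenize tries the rules of the state on top of the stack, in order.
func tokenize(src string) []token {
	var out []token
	stack := []string{"root"}
	for pos := 0; pos < len(src); {
		matched := false
		for _, r := range states[stack[len(stack)-1]] {
			m := r.re.FindString(src[pos:])
			if m == "" {
				continue
			}
			out = append(out, token{r.typ, m})
			pos += len(m)
			if r.push != "" {
				stack = append(stack, r.push)
			}
			if r.pop && len(stack) > 1 {
				stack = stack[:len(stack)-1]
			}
			matched = true
			break
		}
		if !matched { // no rule applies: emit one byte as Error and move on
			out = append(out, token{"Error", src[pos : pos+1]})
			pos++
		}
	}
	return out
}

func main() {
	for _, t := range tokenize("#+options: foo1:val1 foo2:val2\n") {
		fmt.Printf("%-14s %q\n", t.Type, t.Value)
	}
}
```

In Chroma the same shape would be expressed as a second entry in the rules map plus a `Push("options")` on the keyword rule; the exact rules are left to the lexer author.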
I might have to postpone this feature for a future PR until I study up on those docs and `Push` examples from other lexers in chroma. Thanks.
lexers/o/org.go
// {`^(#\+options)(:)(((\s)([^:\n]+?)(:[^:\s\n]+))+)()$`, ByGroups(GenericStrong, Text, None, None, Text, GenericEmph, Text, Text), nil}, // Org keyword #+options
{`^(#\+\w+)(:)(.*)$`, ByGroups(GenericStrong, Text, Comment), nil}, // Other Org keywords like #+title
// Properties
{`^\s*:PROPERTIES:\n`, Comment, nil},
Is there a way to do case-insensitive regex matches for cases like these, where I would want to match either ":PROPERTIES:" or ":properties:"?
Yes, use the `(?i)` mode modifier.
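A quick way to check the modifier is to try the rule's pattern on both spellings. This sketch uses Go's standard `regexp` package, which also understands `(?i)`; the variable name `drawerStart` is made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// drawerStart is the :PROPERTIES: pattern from the diff with (?i) prepended.
var drawerStart = regexp.MustCompile(`(?i)^\s*:PROPERTIES:\n`)

func main() {
	for _, line := range []string{":PROPERTIES:\n", "  :properties:\n", ":END:\n"} {
		fmt.Printf("%q -> %v\n", line, drawerStart.MatchString(line))
	}
}
```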
Thanks! That helped.
lexers/o/org.go
// Fri Jul 27 00:36:20 EDT 2018 - kmodi
// Unable to get the below to work.
// {`^(#\+options)(:)(((\s)([^:\n]+?)(:[^:\s\n]+))+)()$`, ByGroups(GenericStrong, Text, None, None, Text, GenericEmph, Text, Text), nil}, // Org keyword #+options
{`^(#\+\w+)(:)(.*)$`, ByGroups(GenericStrong, Text, Comment), nil}, // Other Org keywords like #+title
Here, I would actually like the whole line to be `Comment`, with `GenericStrong` nested only for the first group.
So something like:
#+keywords: abc def
will look like:
<Comment><GenericStrong>#+keywords</GenericStrong>: abc def</Comment>
Is that possible?
If I'm understanding you correctly, I think you'll need to push a new lexer state here too.
Though token types can't be "overlaid", if that's what you're trying to achieve. You'd have to pick one of the existing `Comment*` variants, or add a new one, e.g. `CommentStrong`.
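The reason nesting is not available is that the lexer output is a flat list in which every piece of text carries exactly one token type. The sketch below models that with Go's standard `regexp` package; `byGroups` is a hypothetical stand-in for a per-group emitter, and `CommentPreproc` is picked here only as an example of an existing `Comment*` variant, not as a recommendation from this thread.

```go
package main

import (
	"fmt"
	"regexp"
)

// token carries exactly one type per piece of text; there is no nesting.
type token struct{ Type, Value string }

// keywordLine is the generic Org keyword pattern from the diff.
var keywordLine = regexp.MustCompile(`^(#\+\w+)(:)(.*)$`)

// byGroups is a hypothetical stand-in for a per-group emitter: it pairs
// capture group i with types[i] and returns nil on any mismatch.
func byGroups(line string, types ...string) []token {
	m := keywordLine.FindStringSubmatch(line)
	if m == nil || len(m)-1 != len(types) {
		return nil
	}
	toks := make([]token, 0, len(types))
	for i, t := range types {
		toks = append(toks, token{t, m[i+1]})
	}
	return toks
}

func main() {
	// The first group gets a distinct Comment variant instead of a nested type.
	for _, t := range byGroups("#+keywords: abc def", "CommentPreproc", "Comment", "Comment") {
		fmt.Printf("%-14s %q\n", t.Type, t.Value)
	}
}
```

A style can then render the chosen variant in bold, which gives the visual effect of a "strong" keyword inside a comment without overlapping tokens.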
You should just be able to run
I am not a Go developer. Can you help me out on how to make
da2b09a to 8447bb5
@alecthomas Apart from that "go test" question, this PR is now good to go from my side.
b2f90e1 to 1f97789
1f97789 to e4dff9a
Ah! Well, an extra special thank you for contributing in that case, I appreciate it! Use
(that said, the CI has already run the tests and they passed)
Thanks @kaushalmodi!
For my future self, I was successfully able to run the tests locally by doing:
Add Org mode lexer (https://orgmode.org)
Hello,
I believe I finally have sort of a workable Org mode lexer.
It is quite rough around the edges and I would need your help to refine it. I will post my questions inline in this PR.
Update: This PR now covers most of Org syntax.
(Also, how do I verify that the test I added passes? :) I simply took the input sample Org file, ran `chroma -f tokens sample.org` and manually converted the output to JSON.)
Update: Thanks for your comment; now I do `chroma --json -l org org.actual > org.expected`.