
caddyfile: Expose Tokenize function for modules to use the lexer #3549

Merged · 1 commit · Jul 20, 2020
19 changes: 19 additions & 0 deletions caddyconfig/caddyfile/lexer.go
@@ -16,6 +16,7 @@ package caddyfile

import (
	"bufio"
+	"bytes"
	"io"
	"unicode"
)
@@ -168,3 +169,21 @@ func (l *lexer) next() bool {
val = append(val, ch)
}
}

+// Tokenize takes bytes as input and lexes it into
+// a list of tokens that can be parsed as a Caddyfile.
+// Also takes a filename to fill the token's File as
+// the source of the tokens, which is important to
+// determine relative paths for `import` directives.
Member: Add a scary warning that this function is experimental and is not guaranteed to be stable.

francislavoie (Member, PR author) on Jul 6, 2020: I don't think it's experimental... It's the exact same code that was in parser.go and also in the tests (with, I guess, very minor differences due to types).

Member: Whether it's experimental or not, we need to say that it's not stable ("experimental" is a good, consistent term), because I wasn't ever expecting to export this API when I designed it.

francislavoie (Member, PR author): I'm not sure I understand how it wouldn't be stable if it's actually used by the Caddyfile parser itself.

Member: I just mean that right now we can change it whenever we want, because we know it's unexported and nobody else is using it. I just try to avoid exporting things.

As I mentioned in the other PR, this is not a must-have. We can give it some time and see how the caddy-docker-proxy code evolves, and also whether the lexer changes that often in Caddy.

Member: @lucaslorentz Is Tokenize() the right function to export? Or do you need env var replacements? I can't remember. If you do need env var replacements, maybe allTokens is better: https://github.com/caddyserver/caddy/pull/3549/files#diff-06ecc7b9618d6f991e3d8846d817e0dcR85

francislavoie (Member, PR author) on Jul 17, 2020: I'm pretty certain we don't want env var replacement to happen, because the step that caddy-docker-proxy takes is just rebuilding a valid Caddyfile from the labels the user inputs. The env vars can be replaced later, when Caddy actually loads that rebuilt config. If env var replacement were done at that stage, it would use the env of the controller (the part that reads the labels off the Docker containers), which might not be the correct environment.

lucaslorentz on Jul 17, 2020: I don't want env var replacements.

I have my own in-memory representation of a Caddyfile that doesn't depend on plugin structs and is still easy to manipulate:
https://github.com/lucaslorentz/caddy-docker-proxy/blob/master/plugin/caddyfile/caddyfile.go

I also have a tiny parser that parses all tokens into my in-memory Caddyfile struct:
https://github.com/lucaslorentz/caddy-docker-proxy/blob/master/plugin/caddyfile/marshal.go#L164

Currently, the tokenization is an overlap between the Caddy implementation and my own. Thus, we're considering exposing that feature.

But another option is to expose in Caddy a function similar to the existing caddyfile.Parse that keeps env variables and snippets untouched. Then I could change my parser to read the ServerBlocks returned by parseCaddyfile and the tokens from each ServerBlock.

Member: I see, okay. Thanks!
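(For illustration only: a rough sketch of the kind of tiny token-walking parser described in the thread above, built on the newly exported Tokenize. The Block type and splitBlocks function are hypothetical names invented for this sketch, not code from caddy-docker-proxy.)

```go
package main

import (
	"fmt"

	"github.com/caddyserver/caddy/v2/caddyconfig/caddyfile"
)

// Block is a hypothetical in-memory unit: the keys that precede "{"
// plus the raw tokens of the block body, kept unevaluated so env
// vars and snippets pass through untouched.
type Block struct {
	Keys []string
	Body []caddyfile.Token
}

// splitBlocks groups a flat token stream into top-level blocks by
// tracking brace depth.
func splitBlocks(tokens []caddyfile.Token) []Block {
	var blocks []Block
	var cur Block
	depth := 0
	for _, tok := range tokens {
		switch tok.Text {
		case "{":
			depth++
			if depth == 1 {
				continue // opening brace of a top-level block
			}
		case "}":
			depth--
			if depth == 0 {
				blocks = append(blocks, cur) // block complete
				cur = Block{}
				continue
			}
		}
		if depth == 0 {
			cur.Keys = append(cur.Keys, tok.Text) // e.g. site addresses
		} else {
			cur.Body = append(cur.Body, tok) // body tokens, nested braces kept verbatim
		}
	}
	return blocks
}

func main() {
	tokens, err := caddyfile.Tokenize([]byte("example.com {\n\trespond \"hi\"\n}"), "docker-labels")
	if err != nil {
		panic(err)
	}
	for _, b := range splitBlocks(tokens) {
		fmt.Println(b.Keys, len(b.Body)) // [example.com] 2
	}
}
```

Because Tokenize never performs replacements, env var placeholders survive this step intact; as discussed above, they are substituted later, when Caddy actually loads the rebuilt config.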

+func Tokenize(input []byte, filename string) ([]Token, error) {
+	l := lexer{}
+	if err := l.load(bytes.NewReader(input)); err != nil {
+		return nil, err
+	}
+	var tokens []Token
+	for l.next() {
+		l.token.File = filename
+		tokens = append(tokens, l.token)
+	}
+	return tokens, nil
+}
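For reference, a minimal usage sketch (not part of the PR) of how a module could call the newly exported function; the input bytes and filename here are illustrative:

```go
package main

import (
	"fmt"

	"github.com/caddyserver/caddy/v2/caddyconfig/caddyfile"
)

func main() {
	input := []byte("example.com {\n\trespond \"hello\"\n}")

	// The filename is stamped onto each token's File field so that
	// relative paths in `import` directives can be resolved later.
	tokens, err := caddyfile.Tokenize(input, "Caddyfile")
	if err != nil {
		panic(err)
	}
	for _, tok := range tokens {
		fmt.Printf("%s:%d %q\n", tok.File, tok.Line, tok.Text)
	}
}
```

The updated tests below call it the same way, passing an empty filename.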
84 changes: 37 additions & 47 deletions caddyconfig/caddyfile/lexer_test.go
@@ -15,37 +15,35 @@
package caddyfile

import (
-	"log"
-	"strings"
	"testing"
)

type lexerTestCase struct {
-	input    string
+	input    []byte
expected []Token
}

func TestLexer(t *testing.T) {
testCases := []lexerTestCase{
{
-			input: `host:123`,
+			input: []byte(`host:123`),
expected: []Token{
{Line: 1, Text: "host:123"},
},
},
{
-			input: `host:123
+			input: []byte(`host:123

-					directive`,
+					directive`),
expected: []Token{
{Line: 1, Text: "host:123"},
{Line: 3, Text: "directive"},
},
},
{
-			input: `host:123 {
+			input: []byte(`host:123 {
				directive
-			}`,
+			}`),
expected: []Token{
{Line: 1, Text: "host:123"},
{Line: 1, Text: "{"},
@@ -54,7 +52,7 @@ func TestLexer(t *testing.T) {
},
},
{
-			input: `host:123 { directive }`,
+			input: []byte(`host:123 { directive }`),
expected: []Token{
{Line: 1, Text: "host:123"},
{Line: 1, Text: "{"},
@@ -63,12 +61,12 @@ func TestLexer(t *testing.T) {
},
},
{
-			input: `host:123 {
+			input: []byte(`host:123 {
				#comment
				directive
				# comment
				foobar # another comment
-			}`,
+			}`),
expected: []Token{
{Line: 1, Text: "host:123"},
{Line: 1, Text: "{"},
@@ -78,10 +76,10 @@ func TestLexer(t *testing.T) {
},
},
{
-			input: `host:123 {
+			input: []byte(`host:123 {
				# hash inside string is not a comment
				redir / /some/#/path
-			}`,
+			}`),
expected: []Token{
{Line: 1, Text: "host:123"},
{Line: 1, Text: "{"},
@@ -92,14 +90,14 @@ func TestLexer(t *testing.T) {
},
},
{
-			input: "# comment at beginning of file\n# comment at beginning of line\nhost:123",
+			input: []byte("# comment at beginning of file\n# comment at beginning of line\nhost:123"),
expected: []Token{
{Line: 3, Text: "host:123"},
},
},
{
-			input: `a "quoted value" b
-					foobar`,
+			input: []byte(`a "quoted value" b
+					foobar`),
expected: []Token{
{Line: 1, Text: "a"},
{Line: 1, Text: "quoted value"},
@@ -108,15 +106,15 @@ func TestLexer(t *testing.T) {
},
},
{
-			input: `A "quoted \"value\" inside" B`,
+			input: []byte(`A "quoted \"value\" inside" B`),
expected: []Token{
{Line: 1, Text: "A"},
{Line: 1, Text: `quoted "value" inside`},
{Line: 1, Text: "B"},
},
},
{
-			input: "An escaped \"newline\\\ninside\" quotes",
+			input: []byte("An escaped \"newline\\\ninside\" quotes"),
expected: []Token{
{Line: 1, Text: "An"},
{Line: 1, Text: "escaped"},
@@ -125,7 +123,7 @@ func TestLexer(t *testing.T) {
},
},
{
-			input: "An escaped newline\\\noutside quotes",
+			input: []byte("An escaped newline\\\noutside quotes"),
expected: []Token{
{Line: 1, Text: "An"},
{Line: 1, Text: "escaped"},
@@ -135,7 +133,7 @@ func TestLexer(t *testing.T) {
},
},
{
-			input: "line1\\\nescaped\nline2\nline3",
+			input: []byte("line1\\\nescaped\nline2\nline3"),
expected: []Token{
{Line: 1, Text: "line1"},
{Line: 1, Text: "escaped"},
@@ -144,7 +142,7 @@ func TestLexer(t *testing.T) {
},
},
{
-			input: "line1\\\nescaped1\\\nescaped2\nline4\nline5",
+			input: []byte("line1\\\nescaped1\\\nescaped2\nline4\nline5"),
expected: []Token{
{Line: 1, Text: "line1"},
{Line: 1, Text: "escaped1"},
@@ -154,34 +152,34 @@ func TestLexer(t *testing.T) {
},
},
{
-			input: `"unescapable\ in quotes"`,
+			input: []byte(`"unescapable\ in quotes"`),
expected: []Token{
{Line: 1, Text: `unescapable\ in quotes`},
},
},
{
-			input: `"don't\escape"`,
+			input: []byte(`"don't\escape"`),
expected: []Token{
{Line: 1, Text: `don't\escape`},
},
},
{
-			input: `"don't\\escape"`,
+			input: []byte(`"don't\\escape"`),
expected: []Token{
{Line: 1, Text: `don't\\escape`},
},
},
{
-			input: `un\escapable`,
+			input: []byte(`un\escapable`),
expected: []Token{
{Line: 1, Text: `un\escapable`},
},
},
{
-			input: `A "quoted value with line
+			input: []byte(`A "quoted value with line
					break inside" {
				foobar
-			}`,
+			}`),
expected: []Token{
{Line: 1, Text: "A"},
{Line: 1, Text: "quoted value with line\n\t\t\t\t\tbreak inside"},
@@ -191,21 +189,21 @@ func TestLexer(t *testing.T) {
},
},
{
-			input: `"C:\php\php-cgi.exe"`,
+			input: []byte(`"C:\php\php-cgi.exe"`),
expected: []Token{
{Line: 1, Text: `C:\php\php-cgi.exe`},
},
},
{
-			input: `empty "" string`,
+			input: []byte(`empty "" string`),
expected: []Token{
{Line: 1, Text: `empty`},
{Line: 1, Text: ``},
{Line: 1, Text: `string`},
},
},
{
-			input: "skip those\r\nCR characters",
+			input: []byte("skip those\r\nCR characters"),
expected: []Token{
{Line: 1, Text: "skip"},
{Line: 1, Text: "those"},
@@ -214,37 +212,37 @@ func TestLexer(t *testing.T) {
},
},
{
-			input: "\xEF\xBB\xBF:8080", // test with leading byte order mark
+			input: []byte("\xEF\xBB\xBF:8080"), // test with leading byte order mark
expected: []Token{
{Line: 1, Text: ":8080"},
},
},
{
-			input: "simple `backtick quoted` string",
+			input: []byte("simple `backtick quoted` string"),
expected: []Token{
{Line: 1, Text: `simple`},
{Line: 1, Text: `backtick quoted`},
{Line: 1, Text: `string`},
},
},
{
-			input: "multiline `backtick\nquoted\n` string",
+			input: []byte("multiline `backtick\nquoted\n` string"),
expected: []Token{
{Line: 1, Text: `multiline`},
{Line: 1, Text: "backtick\nquoted\n"},
{Line: 3, Text: `string`},
},
},
{
-			input: "nested `\"quotes inside\" backticks` string",
+			input: []byte("nested `\"quotes inside\" backticks` string"),
expected: []Token{
{Line: 1, Text: `nested`},
{Line: 1, Text: `"quotes inside" backticks`},
{Line: 1, Text: `string`},
},
},
{
-			input: "reverse-nested \"`backticks` inside\" quotes",
+			input: []byte("reverse-nested \"`backticks` inside\" quotes"),
expected: []Token{
{Line: 1, Text: `reverse-nested`},
{Line: 1, Text: "`backticks` inside"},
@@ -254,22 +252,14 @@ func TestLexer(t *testing.T) {
}

for i, testCase := range testCases {
-		actual := tokenize(testCase.input)
+		actual, err := Tokenize(testCase.input, "")
+		if err != nil {
+			t.Errorf("%v", err)
+		}
lexerCompare(t, i, testCase.expected, actual)
}
}

-func tokenize(input string) (tokens []Token) {
-	l := lexer{}
-	if err := l.load(strings.NewReader(input)); err != nil {
-		log.Printf("[ERROR] load failed: %v", err)
-	}
-	for l.next() {
-		tokens = append(tokens, l.token)
-	}
-	return
-}

func lexerCompare(t *testing.T, n int, expected, actual []Token) {
if len(expected) != len(actual) {
t.Errorf("Test case %d: expected %d token(s) but got %d", n, len(expected), len(actual))
8 changes: 1 addition & 7 deletions caddyconfig/caddyfile/parse.go
@@ -87,16 +87,10 @@ func allTokens(filename string, input []byte) ([]Token, error) {
if err != nil {
return nil, err
}
-	l := new(lexer)
-	err = l.load(bytes.NewReader(input))
+	tokens, err := Tokenize(input, filename)
if err != nil {
return nil, err
}
-	var tokens []Token
-	for l.next() {
-		l.token.File = filename
-		tokens = append(tokens, l.token)
-	}
return tokens, nil
}
