Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Markdown: Sanitizier Configuration #9075

Merged
merged 7 commits into from
Dec 7, 2019
Merged
Show file tree
Hide file tree
Changes from 4 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions custom/conf/app.ini.sample
Original file line number Diff line number Diff line change
Expand Up @@ -877,6 +877,12 @@ SHOW_FOOTER_VERSION = true
; Show template execution time in the footer
SHOW_FOOTER_TEMPLATE_LOAD_TIME = true

[markup.sanitizer]
; The following keys can be used multiple times to define sanitation policy rules.
;ELEMENT = span
;ALLOW_ATTR = class
;REGEXP = ^(info|warning|error)$

[markup.asciidoc]
ENABLED = false
; List of file extensions that should be rendered by an external command
Expand Down
18 changes: 18 additions & 0 deletions docs/content/doc/advanced/config-cheat-sheet.en-us.md
Original file line number Diff line number Diff line change
Expand Up @@ -578,6 +578,24 @@ Two special environment variables are passed to the render command:
- `GITEA_PREFIX_SRC`, which contains the current URL prefix in the `src` path tree. To be used as prefix for links.
- `GITEA_PREFIX_RAW`, which contains the current URL prefix in the `raw` path tree. To be used as prefix for image paths.


Gitea supports customizing the sanitization policy for rendered HTML. The example below will support KaTeX output from pandoc.

```ini
[markup.sanitizer]
; Pandoc renders TeX segments as <span>s with the "math" class, optionally
; with "inline" or "display" classes depending on context.
ELEMENT = span
ALLOW_ATTR = class
REGEXP = ^\s*((math(\s+|$)|inline(\s+|$)|display(\s+|$)))+
cipherboy marked this conversation as resolved.
Show resolved Hide resolved
```

- `ELEMENT`: The element this policy applies to. Must be non-empty.
- `ALLOW_ATTR`: The attribute this policy allows. Must be non-empty.
- `REGEXP`: A regex to match the contents of the attribute against. Must be present but may be empty for unconditional whitelisting of this attribute.

You may redefine `ELEMENT`, `ALLOW_ATTR`, and `REGEXP` multiple times; each time all three are defined is a single policy entry.

## Time (`time`)

- `FORMAT`: Time format to diplay on UI. i.e. RFC1123 or 2006-01-02 15:04:05
Expand Down
18 changes: 18 additions & 0 deletions docs/content/doc/advanced/external-renderers.en-us.md
Original file line number Diff line number Diff line change
Expand Up @@ -68,4 +68,22 @@ RENDER_COMMAND = rst2html.py
IS_INPUT_FILE = false
```

If your external markup relies on additional classes and attributes on the generated HTML elements, you might need to enable custom sanitizer policies. Gitea uses the [`bluemonday`](https://godoc.org/github.com/microcosm-cc/bluemonday) package as our HTML sanitizier. The example below will support [KaTeX](https://katex.org/) output from [`pandoc`](https://pandoc.org/).

```ini
[markup.sanitizer]
; Pandoc renders TeX segments as <span>s with the "math" class, optionally
; with "inline" or "display" classes depending on context.
ELEMENT = span
ALLOW_ATTR = class
REGEXP = ^\s*((math(\s+|$)|inline(\s+|$)|display(\s+|$)))+

[markup.markdown]
ENABLED = true
FILE_EXTENSIONS = .md,.markdown
RENDER_COMMAND = pandoc -f markdown -t html --katex
```

You may redefine `ELEMENT`, `ALLOW_ATTR`, and `REGEXP` multiple times; each time all three are defined is a single policy entry. All three must be defined, but `REGEXP` may be blank to allow unconditional whitelisting of that attribute.

Once your configuration changes have been made, restart Gitea to have changes take effect.
9 changes: 9 additions & 0 deletions modules/markup/sanitizer.go
Original file line number Diff line number Diff line change
Expand Up @@ -50,6 +50,15 @@ func ReplaceSanitizer() {

// Allow <kbd> tags for keyboard shortcut styling
sanitizer.policy.AllowElements("kbd")

// Custom keyword markup
for _, rule := range setting.ExternalSanitizerRules {
if rule.Regexp != nil {
sanitizer.policy.AllowAttrs(rule.AllowAttr).Matching(rule.Regexp).OnElements(rule.Element)
} else {
sanitizer.policy.AllowAttrs(rule.AllowAttr).OnElements(rule.Element)
}
}
}

// Sanitize takes a string that contains a HTML fragment or document and applies policy whitelist.
Expand Down
119 changes: 97 additions & 22 deletions modules/setting/markup.go
Original file line number Diff line number Diff line change
Expand Up @@ -9,11 +9,14 @@ import (
"strings"

"code.gitea.io/gitea/modules/log"

"gopkg.in/ini.v1"
)

// ExternalMarkupParsers represents the external markup parsers
var (
ExternalMarkupParsers []MarkupParser
ExternalMarkupParsers []MarkupParser
ExternalSanitizerRules []MarkupSanitizerRule
)

// MarkupParser defines the external parser configured in ini
Expand All @@ -25,42 +28,114 @@ type MarkupParser struct {
IsInputFile bool
}

// MarkupSanitizerRule defines the policy for whitelisting attributes on
// certain elements.
type MarkupSanitizerRule struct {
Element string
AllowAttr string
Regexp *regexp.Regexp
}

func newMarkup() {
extensionReg := regexp.MustCompile(`\.\w`)
for _, sec := range Cfg.Section("markup").ChildSections() {
name := strings.TrimPrefix(sec.Name(), "markup.")
if name == "" {
log.Warn("name is empty, markup " + sec.Name() + "ignored")
continue
}

extensions := sec.Key("FILE_EXTENSIONS").Strings(",")
var exts = make([]string, 0, len(extensions))
for _, extension := range extensions {
if !extensionReg.MatchString(extension) {
log.Warn(sec.Name() + " file extension " + extension + " is invalid. Extension ignored")
} else {
exts = append(exts, extension)
}
if name == "sanitizer" {
newMarkupSanitizer(name, sec)
} else {
newMarkupRenderer(name, sec)
}
}
}

func newMarkupSanitizer(name string, sec *ini.Section) {
haveElement := sec.HasKey("ELEMENT")
haveAttr := sec.HasKey("ALLOW_ATTR")
haveRegexp := sec.HasKey("REGEXP")

if !haveElement && !haveAttr && !haveRegexp {
log.Warn("Skipping empty section: markup.%s.", name)
return
}

if !haveElement || !haveAttr || !haveRegexp {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't haveRegexp be ignored as your above code?

Copy link
Contributor Author

@cipherboy cipherboy Nov 21, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you mean by "as my above code"? The first check is for a completely empty section; there's nothing to do so exit with only a warning.

Per above config-cheat-sheet description for REGEXP: "Must be present but may be empty for unconditional whitelisting of this attribute."

So if it is present but empty, haveRegexp = ini.Section.HasKey("REGEXP") will return true, but pattern = ini.Section.Key("REGEXP").ValueWithShadows()[i] == "" for some i.

This check is thus for if ELEMENT= and HAVEATTR= have been specified, but REGEXP hasn't been. (Well -- strictly it checks for if any haven't been specified having earlier checked that all haven't been specified).

This is because we aren't parsing the ini ourselves. We only have an external library that parses it for us, and gives us parsed results. Particularly, parsed results without any ordering between keys and without knowledge of where a missing key might be.

If we allowed:

[markup.sanitizer]
ELEMENT=span
HAVEATTR=class

The user might later add:

[markup.sanitizer]
ELEMENT=span
HAVEATTR=class
ELEMENT=div
HAVEATTR=class
REGEXP=^(error|warning|info)$

We wouldn't be able to tell which element/attr pair the regex belongs to, because the ini module doesn't expose neighbor key information. That'd surprise the user (a working config + another working config should equal a working config, since we say that we support adding configs to get more rules).

So I maintain this is desired behavior unless there's something I'm missing?

log.Error("Missing required keys from markup.%s. Must have all three of ELEMENT, ALLOW_ATTR, and REGEXP defined!", name)
return
}

elements := sec.Key("ELEMENT").ValueWithShadows()
allowAttrs := sec.Key("ALLOW_ATTR").ValueWithShadows()
regexps := sec.Key("REGEXP").ValueWithShadows()

if len(elements) != len(allowAttrs) ||
len(elements) != len(regexps) {
log.Error("All three keys in markup.%s (ELEMENT, ALLOW_ATTR, REGEXP) must be defined the same number of times! Got %d, %d, and %d respectively.", name, len(elements), len(allowAttrs), len(regexps))
return
}

if len(exts) == 0 {
log.Warn(sec.Name() + " file extension is empty, markup " + name + " ignored")
ExternalSanitizerRules = make([]MarkupSanitizerRule, 0, len(elements))

for index, pattern := range regexps {
if pattern == "" {
rule := MarkupSanitizerRule{
Element: elements[index],
AllowAttr: allowAttrs[index],
Regexp: nil,
}
ExternalSanitizerRules = append(ExternalSanitizerRules, rule)
continue
}

command := sec.Key("RENDER_COMMAND").MustString("")
if command == "" {
log.Warn(" RENDER_COMMAND is empty, markup " + name + " ignored")
// Validate when parsing the config that this is a valid regular
// expression. Then we can use regexp.MustCompile(...) later.
compiled, err := regexp.Compile(pattern)
if err != nil {
log.Error("In module.%s: REGEXP at definition %d failed to compile: %v", name, index+1, err)
continue
}

ExternalMarkupParsers = append(ExternalMarkupParsers, MarkupParser{
Enabled: sec.Key("ENABLED").MustBool(false),
MarkupName: name,
FileExtensions: exts,
Command: command,
IsInputFile: sec.Key("IS_INPUT_FILE").MustBool(false),
})
rule := MarkupSanitizerRule{
Element: elements[index],
AllowAttr: allowAttrs[index],
Regexp: compiled,
}
ExternalSanitizerRules = append(ExternalSanitizerRules, rule)
}
}

func newMarkupRenderer(name string, sec *ini.Section) {
extensionReg := regexp.MustCompile(`\.\w`)

extensions := sec.Key("FILE_EXTENSIONS").Strings(",")
var exts = make([]string, 0, len(extensions))
for _, extension := range extensions {
if !extensionReg.MatchString(extension) {
log.Warn(sec.Name() + " file extension " + extension + " is invalid. Extension ignored")
} else {
exts = append(exts, extension)
}
}

if len(exts) == 0 {
log.Warn(sec.Name() + " file extension is empty, markup " + name + " ignored")
return
}

command := sec.Key("RENDER_COMMAND").MustString("")
if command == "" {
log.Warn(" RENDER_COMMAND is empty, markup " + name + " ignored")
return
}

ExternalMarkupParsers = append(ExternalMarkupParsers, MarkupParser{
Enabled: sec.Key("ENABLED").MustBool(false),
MarkupName: name,
FileExtensions: exts,
Command: command,
IsInputFile: sec.Key("IS_INPUT_FILE").MustBool(false),
})
}