Migrate to comrak/Rust #185

Merged 15 commits on Aug 5, 2022
9 changes: 7 additions & 2 deletions .github/workflows/test.yml
@@ -5,7 +5,7 @@ on:
types: [opened, synchronize, reopened]
push:
branches:
- production
- main

jobs:
test:
@@ -16,7 +16,7 @@ jobs:
- ubuntu-latest
- macos-latest
- windows-latest
ruby-version: [3.1, 3.0.0, 2.7.2, 2.6.6]
ruby-version: [3.1, 3.0]

runs-on: ${{ matrix.os }}

@@ -31,6 +31,11 @@ jobs:
ruby-version: ${{ matrix.ruby-version }}
bundler-cache: true # 'bundle install' and cache

- name: Set Cargo triple
if: runner.os == 'Windows'
shell: bash
run: rustup default stable-x86_64-pc-windows-gnu

- name: Run ${{ matrix.os }} tests
shell: bash
run: script/cibuild
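
As a rough illustration of what the narrower `ruby-version` list means for CI load, the sketch below expands an os × ruby-version matrix into one job per combination. It assumes the three `os` entries visible in this hunk are the complete list, which the diff itself does not confirm; the constant and method names are made up for the example.

```ruby
# Operating systems visible in the hunk (assumed to be the complete list).
OSES = ["ubuntu-latest", "macos-latest", "windows-latest"].freeze

# Ruby versions listed before and after this change.
OLD_RUBIES = ["3.1", "3.0.0", "2.7.2", "2.6.6"].freeze
NEW_RUBIES = ["3.1", "3.0"].freeze

# Builds one job description per (os, ruby-version) pair.
def matrix_jobs(oses, rubies)
  oses.product(rubies).map { |os, ruby| { os: os, ruby: ruby } }
end

old_jobs = matrix_jobs(OSES, OLD_RUBIES)
new_jobs = matrix_jobs(OSES, NEW_RUBIES)

puts "before: #{old_jobs.size} jobs, after: #{new_jobs.size} jobs"
# prints "before: 12 jobs, after: 6 jobs"
```

One thing that may be worth checking in the workflow itself: an unquoted `3.0` in YAML is a number rather than a string, so it may not reach the setup step as the literal text `3.0`. Quoting the version is the usual precaution.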
5 changes: 3 additions & 2 deletions .gitignore
@@ -8,12 +8,13 @@
/InstalledFiles
/pkg/
/spec/reports/
/test/tmp/
/test/version_tmp/
/tmp/
tmp/
/vendor/gems
/vendor/cache
Gemfile.lock
*.log
ports/

## Specific to RubyMotion:
.dat*
4 changes: 0 additions & 4 deletions .gitmodules
@@ -1,4 +0,0 @@
[submodule "ext/commonmarker/cmark-upstream"]
path = ext/commonmarker/cmark-upstream
url = https://github.com/github/cmark-gfm.git
ignore = dirty
1 change: 0 additions & 1 deletion .rubocop.yml
@@ -4,7 +4,6 @@ inherit_gem:
- config/minitest.yml

AllCops:
TargetRubyVersion: 2.6
Exclude:
- "ext/**/*"
- "vendor/**/*"
1 change: 1 addition & 0 deletions Gemfile
@@ -7,5 +7,6 @@ gemspec
group :benchmark do
gem "benchmark-ips"
gem "kramdown"
gem "kramdown-parser-gfm"
gem "redcarpet"
end
26 changes: 0 additions & 26 deletions Makefile

This file was deleted.

204 changes: 25 additions & 179 deletions README.md
@@ -29,98 +29,12 @@ Call `render_html` on a string to convert it to HTML:

``` ruby
require 'commonmarker'
CommonMarker.render_html('Hi *there*', :DEFAULT)
CommonMarker.to_html('Hi *there*', :DEFAULT)
# <p>Hi <em>there</em></p>\n
```

The second argument is optional--[see below](#options) for more information.

### Generating a document

You can also parse a string to receive a `Document` node. You can then print that node to HTML, iterate over the children, and other fun node stuff. For example:

``` ruby
require 'commonmarker'

doc = CommonMarker.render_doc('*Hello* world', :DEFAULT)
puts(doc.to_html) # <p>Hi <em>there</em></p>\n

doc.walk do |node|
puts node.type # [:document, :paragraph, :text, :emph, :text]
end
```

The second argument is optional--[see below](#options) for more information.

#### Example: walking the AST

You can use `walk` or `each` to iterate over nodes:

- `walk` will iterate on a node and recursively iterate on a node's children.
- `each` will iterate on a node and its children, but no further.

``` ruby
require 'commonmarker'

# parse the files specified on the command line
doc = CommonMarker.render_doc("# The site\n\n [GitHub](https://www.github.com)")

# Walk tree and print out URLs for links
doc.walk do |node|
if node.type == :link
printf("URL = %s\n", node.url)
end
end

# Capitalize all regular text in headers
doc.walk do |node|
if node.type == :header
node.each do |subnode|
if subnode.type == :text
subnode.string_content = subnode.string_content.upcase
end
end
end
end

# Transform links to regular text
doc.walk do |node|
if node.type == :link
node.insert_before(node.first_child)
node.delete
end
end
```

### Creating a custom renderer

You can also derive a class from CommonMarker's `HtmlRenderer` class. This produces slower output, but is far more customizable. For example:

``` ruby
class MyHtmlRenderer < CommonMarker::HtmlRenderer
def initialize
super
@headerid = 1
end

def header(node)
block do
out("<h", node.header_level, " id=\"", @headerid, "\">",
:children, "</h", node.header_level, ">")
@headerid += 1
end
end
end

myrenderer = MyHtmlRenderer.new
puts myrenderer.render(doc)

# Print any warnings to STDERR
renderer.warnings.each do |w|
STDERR.write("#{w}\n")
end
```

## Options

CommonMarker accepts the same options that CMark does, as symbols. Note that there is a distinction in CMark for "parse" options and "render" options, which are represented in the tables below.
@@ -158,101 +72,28 @@ CommonMarker accepts the same options that CMark does, as symbols. Note that the

### Passing options

To apply a single option, pass it in as a symbol argument:
To apply an option, pass it as part of the hash:

``` ruby
CommonMarker.render_doc("\"Hello,\" said the spider.", :SMART)
CommonMarker.to_html("\"Hello,\" said the spider.", :SMART)
# <p>“Hello,” said the spider.</p>\n
```

To have multiple options applied, pass in an array of symbols:

``` ruby
CommonMarker.render_html("\"'Shelob' is my name.\"", [:HARDBREAKS, :SOURCEPOS])
CommonMarker.to_html("\"'Shelob' is my name.\"", [:HARDBREAKS, :SOURCEPOS])
```

For more information on these options, see [the CMark documentation](https://git.io/v7nh1).

## Extensions

Both `render_html` and `render_doc` take an optional third argument defining the extensions you want enabled as your CommonMark document is being processed. The documentation for these extensions are [defined in this spec](https://github.github.com/gfm/), and the rationale is provided [in this blog post](https://githubengineering.com/a-formal-spec-for-github-markdown/).

The available extensions are:

* `:table` - This provides support for tables.
* `:tasklist` - This provides support for task list items.
* `:strikethrough` - This provides support for strikethroughs.
* `:autolink` - This provides support for automatically converting URLs to anchor tags.
* `:tagfilter` - This escapes [several "unsafe" HTML tags](https://github.github.com/gfm/#disallowed-raw-html-extension-), causing them to not have any effect.
For more information on these options, see [the comrak documentation](https://github.com/kivikakk/comrak#usage).

## Output formats

Like CMark, CommonMarker can generate output in several formats: HTML, XML, plaintext, and commonmark are currently supported.
Commonmarker can only generate output in one format: HTML.

### HTML

The default output format, HTML, will be generated when calling `to_html` or using `--to=html` on the command line.

```ruby
doc = CommonMarker.render_doc('*Hello* world!', :DEFAULT)
puts(doc.to_html)

<p><em>Hello</em> world!</p>
```

### XML

XML will be generated when calling `to_xml` or using `--to=xml` on the command line.

```ruby
doc = CommonMarker.render_doc('*Hello* world!', :DEFAULT)
puts(doc.to_xml)

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE document SYSTEM "CommonMark.dtd">
<document xmlns="http://commonmark.org/xml/1.0">
<paragraph>
<emph>
<text xml:space="preserve">Hello</text>
</emph>
<text xml:space="preserve"> world!</text>
</paragraph>
</document>
```

### Plaintext

Plaintext will be generated when calling `to_plaintext` or using `--to=plaintext` on the command line.

```ruby
doc = CommonMarker.render_doc('*Hello* world!', :DEFAULT)
puts(doc.to_plaintext)
html = CommonMarker.to_html('*Hello* world!', :DEFAULT)
puts(html)

Hello world!
```

### Commonmark

Commonmark will be generated when calling `to_commonmark` or using `--to=commonmark` on the command line.

``` ruby
text = <<-TEXT
1. I am a numeric list.
2. I continue the list.
* Suddenly, an unordered list!
* What fun!
TEXT

doc = CommonMarker.render_doc(text, :DEFAULT)
puts(doc.to_commonmark)

1. I am a numeric list.
2. I continue the list.

<!-- end list -->

- Suddenly, an unordered list\!
- What fun\!
# <p><em>Hello</em> world!</p>
```

## Developing locally
@@ -264,7 +105,7 @@ script/bootstrap
bundle exec rake compile
```

If there were no errors, you're done! Otherwise, make sure to follow the CMark dependency instructions.
If there were no errors, you're done! Otherwise, make sure to follow the comrak dependency instructions.

## Benchmarks

@@ -273,16 +114,21 @@ Some rough benchmarks:
```
$ bundle exec rake benchmark

input size = 11063727 bytes
input size = 11064832 bytes

redcarpet
0.070000 0.020000 0.090000 ( 0.079641)
github-markdown
0.070000 0.010000 0.080000 ( 0.083535)
Warming up --------------------------------------
redcarpet 2.000 i/100ms
commonmarker with to_html
1.000 i/100ms
kramdown 1.000 i/100ms
Calculating -------------------------------------
redcarpet 22.634 (± 4.4%) i/s - 114.000 in 5.054490s
commonmarker with to_html
0.100000 0.010000 0.110000 ( 0.111947)
commonmarker with ruby HtmlRenderer
1.830000 0.030000 1.860000 ( 1.866203)
kramdown
4.610000 0.070000 4.680000 ( 4.678398)
7.340 (± 0.0%) i/s - 37.000 in 5.058352s
kramdown 0.343 (± 0.0%) i/s - 2.000 in 5.834208s

Comparison:
redcarpet: 22.6 i/s
commonmarker with to_html: 7.3 i/s - 3.08x (± 0.00) slower
kramdown: 0.3 i/s - 66.02x (± 0.00) slower
```
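
The "x slower" figures in the comparison can be reproduced approximately from the iterations-per-second column. The sketch below is a plain-Ruby recomputation, not part of the benchmark script; `RESULTS` and `slowdowns` are names invented for the example, and the inputs are the rounded values printed above.

```ruby
# Iterations per second as printed in the benchmark output above.
RESULTS = {
  "redcarpet" => 22.634,
  "commonmarker with to_html" => 7.340,
  "kramdown" => 0.343,
}.freeze

# Returns, for each entry, how many times slower it is than the fastest one.
def slowdowns(results)
  fastest = results.values.max
  results.transform_values { |ips| fastest / ips }
end

slowdowns(RESULTS).each do |name, factor|
  puts format("%-28s %.2fx", name, factor)
end
```

Because the printed rates are already rounded, the recomputed factor for the slowest entry can differ in the second decimal place from the reported 66.02x; the 3.08x figure for `commonmarker with to_html` comes out the same.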