Fix performance issue caused by using repeated > characters inside comments (#171)

A `>` is treated as a string delimiter.
In certain cases, if `>` is used in succession, read and match are
repeated for each occurrence, which slows down parsing. Therefore, the
comment terminator `-->` is passed as `term:` so that the source reads
ahead to the end of the comment in advance.
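
The effect of the delimiter choice can be illustrated with a minimal sketch. This is not REXML's source code: it assumes `StringIO#gets` with a separator as a stand-in for delimiter-based reads, and the helper name `reads_until_comment_end` is hypothetical. It counts how many reads are needed before the buffer contains the end of the comment.

```ruby
require "stringio"

# Hypothetical helper (not part of REXML): read chunks delimited by `term`
# until the accumulated buffer contains the comment terminator, and
# return the number of reads that were needed.
def reads_until_comment_end(xml, term)
  io = StringIO.new(xml)
  buffer = +""
  reads = 0
  until buffer.include?("-->")
    chunk = io.gets(term) # reads up to and including the next `term`
    break if chunk.nil?   # input exhausted without finding "-->"
    buffer << chunk
    reads += 1
  end
  reads
end

xml = "<!-- " + ">" * 1000 + " -->"
puts reads_until_comment_end(xml, ">")   # => 1001, one read per `>`
puts reads_until_comment_end(xml, "-->") # => 1, a single read-ahead
```

If every one of those reads is followed by a match attempt over the growing buffer, the total work grows much faster than the input, whereas reading ahead to `-->` needs one read and one match.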
Watson1978 authored Jul 16, 2024
1 parent 0af55fa commit c1b64c1
Showing 2 changed files with 13 additions and 1 deletion.
3 changes: 2 additions & 1 deletion lib/rexml/parsers/baseparser.rb
@@ -126,6 +126,7 @@ class BaseParser
       module Private
         INSTRUCTION_END = /#{NAME}(\s+.*?)?\?>/um
         INSTRUCTION_TERM = "?>"
+        COMMENT_TERM = "-->"
         TAG_PATTERN = /((?>#{QNAME_STR}))\s*/um
         CLOSE_PATTERN = /(#{QNAME_STR})\s*>/um
         ATTLISTDECL_END = /\s+#{NAME}(?:#{ATTDEF})*\s*>/um
@@ -243,7 +244,7 @@ def pull_event
             return process_instruction(start_position)
           elsif @source.match("<!", true)
             if @source.match("--", true)
-              md = @source.match(/(.*?)-->/um, true)
+              md = @source.match(/(.*?)-->/um, true, term: Private::COMMENT_TERM)
               if md.nil?
                 raise REXML::ParseException.new("Unclosed comment", @source)
               end
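
The change affects only how far the source reads ahead. The observable parsing behavior should stay the same. As a small usage sketch (assuming the `rexml` gem is installed; the sample XML strings are made up for illustration), a comment containing `>` characters is still parsed as a single comment, and a comment with no `-->` is rejected:

```ruby
require "rexml/document"

# A comment whose body contains `>` characters is still one comment node.
doc = REXML::Document.new("<!-- a > b >> c --><root/>")
comment = doc.children.find { |node| node.is_a?(REXML::Comment) }
puts comment.string

# A comment that is never terminated by "-->" raises a parse error.
begin
  REXML::Document.new("<!-- never closed")
  unclosed_rejected = false
rescue REXML::ParseException
  unclosed_rejected = true
end
puts unclosed_rejected
```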
11 changes: 11 additions & 0 deletions test/parse/test_comment.rb
@@ -1,8 +1,12 @@
 require "test/unit"
+require "core_assertions"
+
 require "rexml/document"
 
 module REXMLTests
   class TestParseComment < Test::Unit::TestCase
+    include Test::Unit::CoreAssertions
+
     def parse(xml)
       REXML::Document.new(xml)
     end
@@ -117,5 +121,12 @@ def test_after_root
 
       assert_equal(" ok comment ", events[:comment])
     end
+
+    def test_gt_linear_performance
+      seq = [10000, 50000, 100000, 150000, 200000]
+      assert_linear_performance(seq, rehearsal: 10) do |n|
+        REXML::Document.new('<!-- ' + ">" * n + ' -->')
+      end
+    end
   end
 end
