Commit 6a20ee4

Merge pull request #2510 from sparklemotion/flavorjones-encoding-reader-performance-v1.13.x
improve encoding reader performance (backport to v1.13.x)
2 parents: b848031 + e444525

2 files changed, 13 additions, 1 deletion

lib/nokogiri/html4/document.rb

Lines changed: 1 addition & 1 deletion

```diff
@@ -268,7 +268,7 @@ def start_element(name, attrs = [])
 end
 
 def self.detect_encoding(chunk)
-  (m = chunk.match(/\A(<\?xml[ \t\r\n]+[^>]*>)/)) &&
+  (m = chunk.match(/\A(<\?xml[ \t\r\n][^>]*>)/)) &&
     (return Nokogiri.XML(m[1]).encoding)
 
   if Nokogiri.jruby?
```
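The only functional change in `detect_encoding` is dropping the `+` after `[ \t\r\n]`. That should not change which chunks are recognised: any extra whitespace after `<?xml` is still consumed by `[^>]*`, and both patterns end at the first `>`. A minimal pure-Ruby check of that claim follows, assuming the two regexes are copied verbatim from the diff into the hypothetical constants `OLD_XML_DECL` and `NEW_XML_DECL`; the `xml_decl` helper is also hypothetical, and Nokogiri itself is not loaded.

```ruby
# The two patterns from the diff; the constant names are hypothetical.
OLD_XML_DECL = /\A(<\?xml[ \t\r\n]+[^>]*>)/ # before: one or more whitespace characters
NEW_XML_DECL = /\A(<\?xml[ \t\r\n][^>]*>)/  # after: exactly one; [^>]* absorbs the rest

# Hypothetical helper mirroring detect_encoding's first step:
# return the leading XML declaration, or nil when there is none.
def xml_decl(pattern, chunk)
  m = chunk.match(pattern)
  m && m[1]
end

samples = [
  %(<?xml version="1.0" encoding="UTF-8"?><html/>),
  %(<?xml   \n\t version="1.0" encoding="Shift_JIS"?>), # several whitespace chars
  %(<?xmlversion="1.0"?>),                              # no whitespace: neither matches
  %(<html><head></head></html>)                         # no declaration at all
]

samples.each do |chunk|
  before = xml_decl(OLD_XML_DECL, chunk)
  after = xml_decl(NEW_XML_DECL, chunk)
  puts "#{before.inspect} | #{after.inspect} | same: #{before == after}"
end
```

`detect_encoding` passes the captured declaration to `Nokogiri.XML(m[1]).encoding`, so if the capture is identical under both patterns, the reported encoding should be unchanged as well.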

test/html4/test_document_encoding.rb

Lines changed: 12 additions & 0 deletions

```diff
@@ -155,6 +155,18 @@ def binopen(file)
           end
         end
       end
+
+      it "does not start backtracking during detection of XHTML encoding" do
+        # this test is a quick and dirty version
+        # of the more complete perf test that is on main.
+        n = 40_000
+        redos_string = "<?xml " + (" " * n)
+        redos_string.encode!("ASCII-8BIT")
+        start_time = Time.now
+        Nokogiri::HTML4(redos_string)
+        elapsed_time = Time.now - start_time
+        assert_operator(elapsed_time, :<, 1)
+      end
     end
   end
 end
```
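Why the old pattern was slow: on input such as `"<?xml "` followed by n spaces and no `>`, both `[ \t\r\n]+` and `[^>]*` can match spaces. When the final `>` fails, the engine retries every way of splitting the run between the two quantifiers, on the order of n²/2 attempts. The new pattern consumes exactly one whitespace character, leaving a single linear scan by `[^>]*`. The sketch below times both regexes directly in plain Ruby; the sizes, the `time_match` helper, and the constant names are illustrative assumptions, not part of Nokogiri. Timings depend on the machine and the interpreter: Ruby 3.2 and later include a cache-based Regexp optimization aimed at ReDoS, which may hide most of the gap.

```ruby
# Patterns copied from the diff; the constant names are hypothetical.
OLD_XML_DECL = /\A(<\?xml[ \t\r\n]+[^>]*>)/
NEW_XML_DECL = /\A(<\?xml[ \t\r\n][^>]*>)/

# Hypothetical helper: run one match and return [MatchData or nil, seconds].
def time_match(pattern, input)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  result = pattern.match(input)
  [result, Process.clock_gettime(Process::CLOCK_MONOTONIC) - start]
end

# Same shape as the committed test's input, but with smaller sizes so the
# quadratic case finishes quickly even on interpreters without a match cache.
[1_000, 2_000, 4_000].each do |n|
  input = ("<?xml " + (" " * n)).b # ASCII-8BIT copy, never closed by ">"
  { "old" => OLD_XML_DECL, "new" => NEW_XML_DECL }.each do |label, pattern|
    result, seconds = time_match(pattern, input)
    printf("n=%-5d %s matched=%-5s %.4fs\n", n, label, !result.nil?, seconds)
  end
end
```

The sketch measures with `Process.clock_gettime(Process::CLOCK_MONOTONIC)`, which is generally preferred over `Time.now` for durations because it is unaffected by wall-clock adjustments. On an interpreter without the match cache, the old pattern's time should grow roughly fourfold each time n doubles, while the new pattern's time should stay close to zero.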
