Commit e6a2015

Speed up ProhibitSurrogateCharactersValidator
I've noticed that this validator is using a per-character loop. Replacing it with a regex results in a pretty significant speedup. Here are results from my benchmark:

| String length | Old implementation time (sec) | New implementation time (sec) |
|---:|---:|---:|
| 1 | 2.833e-07 | 1.765e-07 |
| 10 | 5.885e-07 | 2.030e-07 |
| 100 | 3.598e-06 | 4.144e-07 |
| 1000 | 3.329e-05 | 2.463e-06 |
| 10000 | 0.0003338 | 2.449e-05 |
| 100000 | 0.003338 | 0.0002284 |
| 1000000 | 0.03333 | 0.002278 |
| 10000000 | 0.3389 | 0.02377 |
| 100000000 | 3.250 | 0.2365 |

For large strings, the speedups are more than an order of magnitude.
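The benchmark script itself is not part of the commit. The following is a minimal sketch of how a comparable measurement could be taken with the standard-library `timeit` module. It assumes the two validator bodies are reduced to plain functions; `find_surrogate_loop`, `find_surrogate_regex` and `benchmark` are hypothetical names, not DRF API. It also assumes surrogate-free input, which is the worst case because every character must be examined. Absolute timings will differ from the table above.

```python
import re
import timeit
from functools import partial

# Same character class as the commit: the surrogate range U+D800..U+DFFF.
_SURROGATE_RE = re.compile(r'[\ud800-\udfff]')


def find_surrogate_loop(value):
    """Old approach (hypothetical helper): inspect each character in Python."""
    for ch in str(value):
        if 0xD800 <= ord(ch) <= 0xDFFF:
            return ord(ch)
    return None


def find_surrogate_regex(value):
    """New approach (hypothetical helper): let the C regex engine scan."""
    match = _SURROGATE_RE.search(str(value))
    return ord(match.group()) if match else None


def benchmark(lengths=(1, 10, 100, 1000, 10000), number=100):
    """Time both helpers on surrogate-free ASCII strings.

    Returns {length: (old_seconds_per_call, new_seconds_per_call)}.
    """
    results = {}
    for n in lengths:
        s = 'a' * n
        old = timeit.timeit(partial(find_surrogate_loop, s), number=number) / number
        new = timeit.timeit(partial(find_surrogate_regex, s), number=number) / number
        results[n] = (old, new)
    return results


if __name__ == '__main__':
    for n, (old, new) in benchmark().items():
        print(f'{n:>8} {old:.3e} {new:.3e}')
```

Both helpers return the first surrogate's code point (or `None`), so the comparison measures only the scanning strategy, not differences in behavior.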
1 parent 985dd73 commit e6a2015


1 file changed (+6 -3 lines)

rest_framework/validators.py

Lines changed: 6 additions & 3 deletions
@@ -6,6 +6,8 @@
 object creation, and makes it possible to switch between using the implicit
 `ModelSerializer` class and an equivalent explicit `Serializer` class.
 """
+import re
+
 from django.core.exceptions import FieldError
 from django.db import DataError
 from django.db.models import Exists
@@ -216,13 +218,14 @@ def __eq__(self, other):
 
 
 class ProhibitSurrogateCharactersValidator:
+    _regex = re.compile(r'[\ud800-\udfff]')
+
     message = _('Surrogate characters are not allowed: U+{code_point:X}.')
     code = 'surrogate_characters_not_allowed'
 
     def __call__(self, value):
-        for surrogate_character in (ch for ch in str(value)
-                                    if 0xD800 <= ord(ch) <= 0xDFFF):
-            message = self.message.format(code_point=ord(surrogate_character))
+        if match := self._regex.search(str(value)):
+            message = self.message.format(code_point=ord(match.group()))
             raise ValidationError(message, code=self.code)
 
     def __eq__(self, other):
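To see the new `__call__` logic in isolation, here is a Django-free sketch. It assumes a hypothetical `SurrogateCharactersError` in place of DRF's `ValidationError` and drops the `_()` translation wrapper; `ProhibitSurrogateCharactersSketch` is likewise a hypothetical name. The pattern, message template and error code are copied from the diff.

```python
import re


class SurrogateCharactersError(ValueError):
    """Hypothetical stand-in for rest_framework's ValidationError."""

    def __init__(self, message, code):
        super().__init__(message)
        self.code = code


class ProhibitSurrogateCharactersSketch:
    # Same pattern as the commit: any code point in U+D800..U+DFFF.
    _regex = re.compile(r'[\ud800-\udfff]')
    message = 'Surrogate characters are not allowed: U+{code_point:X}.'
    code = 'surrogate_characters_not_allowed'

    def __call__(self, value):
        # search() stops at the first match, so only the first surrogate is
        # reported, as with the old loop, which raised on its first iteration.
        # str(value) lets non-string values be validated as before.
        if match := self._regex.search(str(value)):
            message = self.message.format(code_point=ord(match.group()))
            raise SurrogateCharactersError(message, code=self.code)


validator = ProhibitSurrogateCharactersSketch()
validator('plain text \u00e9')  # accepted: no surrogates
try:
    validator('bad \ud83d text')  # lone high surrogate
except SurrogateCharactersError as exc:
    print(exc, exc.code)
```

In Python 3 a non-BMP character such as U+1F600 is a single code point rather than a surrogate pair, so `validator('\U0001F600')` passes; only lone surrogate code points (for example, from `surrogateescape` decoding or `\ud83d`-style escapes) are rejected. The walrus operator used here and in the commit requires Python 3.8 or later.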
