Fix fallback for non-mapped Unicode char #1609

Closed
19 changes: 13 additions & 6 deletions src/actions/transformations/js_decode.cc
100644 → 100755
@@ -71,14 +71,21 @@ int JsDecode::inplace(unsigned char *input, uint64_t input_len) {
             && (VALID_HEX(input[i + 4])) && (VALID_HEX(input[i + 5]))) {
         /* \uHHHH */

-        /* Use only the lower byte. */
-        *d = utils::string::x2c(&input[i + 4]);
+        unsigned char lowestByte = utils::string::x2c(&input[i + 4]);

-        /* Full width ASCII (ff01 - ff5e) needs 0x20 added */
-        if ((*d > 0x00) && (*d < 0x5f)
+        if ((lowestByte > 0x00) && (lowestByte < 0x5f)
             && ((input[i + 2] == 'f') || (input[i + 2] == 'F'))
-            && ((input[i + 3] == 'f') || (input[i + 3] == 'F'))) {
-            (*d) += 0x20;
+            && ((input[i + 3] == 'f') || (input[i + 3] == 'F')))
+        {
+            /* Full width ASCII (ff01 - ff5e) needs 0x20 added. */
+            /* This is because the first printable char in ASCII is 0x20, and corresponds to 0xFF00. */
+            *d = lowestByte + 0x20;
         }
+        else
+        {
+            /* There was no good ASCII character to map this unicode character to. */
+            /* Put a placeholder that is hopefully as innocent as the unicode character. */
+            *d = 'x';
+        }

         d++;
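
To make the effect of the patched branch easier to check, here is a minimal standalone sketch of the same decision, not the project's code. The names `hexPairToByte` (standing in for `utils::string::x2c`) and `mapEscape` are hypothetical and exist only for this illustration; the function takes the four hex digits that follow `\u`.

```cpp
#include <cctype>

// Hypothetical stand-in for utils::string::x2c: turns two hex digits into one byte.
static unsigned char hexPairToByte(const char *p) {
    auto nibble = [](char c) -> int {
        if (c >= '0' && c <= '9') return c - '0';
        return std::tolower(static_cast<unsigned char>(c)) - 'a' + 10;
    };
    return static_cast<unsigned char>((nibble(p[0]) << 4) | nibble(p[1]));
}

// Hypothetical helper: maps the HHHH of a \uHHHH escape to one output byte,
// following the branch structure of the patched code in the diff.
// Assumes hex4 points at four valid hex digits.
static unsigned char mapEscape(const char *hex4) {
    unsigned char lowestByte = hexPairToByte(hex4 + 2);
    bool highByteIsFF = (hex4[0] == 'f' || hex4[0] == 'F')
                     && (hex4[1] == 'f' || hex4[1] == 'F');

    if (lowestByte > 0x00 && lowestByte < 0x5f && highByteIsFF) {
        // Full width ASCII (ff01 - ff5e): add 0x20 to reach the ASCII byte.
        return static_cast<unsigned char>(lowestByte + 0x20);
    }
    // Everything else becomes the placeholder.
    return 'x';
}
```

Under this logic `ff41` (full width "a") maps to `a` and `ff1c` maps to `<`. Note that an escape outside the full width range, including one such as `0041` whose lower byte is already plain ASCII, also ends up as `x`, whereas the removed code would have emitted the lower byte unchanged.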
29 changes: 16 additions & 13 deletions src/actions/transformations/url_decode_uni.cc
100644 → 100755
@@ -75,7 +75,7 @@ int UrlDecodeUni::inplace(unsigned char *input, uint64_t input_len,
     if (input[i] == '%') {
         if ((i + 1 < input_len) &&
             ((input[i + 1] == 'u') || (input[i + 1] == 'U'))) {
-            /* Character is a percent sign. */
+            /* Character is a percent sign. */
             /* IIS-specific %u encoding. */
             if (i + 5 < input_len) {
                 /* We have at least 4 data bytes. */
@@ -113,18 +113,21 @@ int UrlDecodeUni::inplace(unsigned char *input, uint64_t input_len,
         if (hmap != -1) {
             *d = hmap;
         } else {
-            /* We first make use of the lower byte here,
-             * ignoring the higher byte. */
-            *d = utils::string::x2c(&input[i + 4]);
-
-            /* Full width ASCII (ff01 - ff5e)
-             * needs 0x20 added */
-            if ((*d > 0x00) && (*d < 0x5f)
-                && ((input[i + 2] == 'f')
-                || (input[i + 2] == 'F'))
-                && ((input[i + 3] == 'f')
-                || (input[i + 3] == 'F'))) {
-                (*d) += 0x20;
+            unsigned char lowestByte = utils::string::x2c(&input[i + 4]);
+
+            if ((lowestByte > 0x00) && (lowestByte < 0x5f)
+                && ((input[i + 2] == 'f') || (input[i + 2] == 'F'))
+                && ((input[i + 3] == 'f') || (input[i + 3] == 'F')))
+            {
+                /* Full width ASCII (ff01 - ff5e) needs 0x20 added. */
+                /* This is because the first printable char in ASCII is 0x20, and corresponds to 0xFF00. */
+                *d = lowestByte + 0x20;
             }
+            else
+            {
+                /* There was no good ASCII character to map this unicode character to. */
+                /* Put a placeholder that is hopefully as innocent as the unicode character. */
+                *d = 'x';
+            }
         }
         d++;
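
The order of fallbacks in this hunk (mapped value first, then the full width adjustment, then the placeholder) can be sketched on its own as follows. This is an illustration under assumptions, not the project's implementation: `UnicodeMap` and `decodePercentU` are hypothetical names, the lookup table stands in for whatever produces `hmap`, and the sketch works on an already parsed 16-bit value instead of comparing the hex characters against `f`/`F`.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for the configured Unicode mapping: code point -> byte.
using UnicodeMap = std::unordered_map<std::uint16_t, unsigned char>;

// Hypothetical helper: picks the output byte for a %uHHHH value in the
// same order as the patched code: map hit, full width ASCII, placeholder.
static unsigned char decodePercentU(const UnicodeMap &table, std::uint16_t codePoint) {
    auto hit = table.find(codePoint);
    if (hit != table.end()) {
        return hit->second;  // corresponds to the hmap != -1 branch
    }

    unsigned char highByte = static_cast<unsigned char>(codePoint >> 8);
    unsigned char lowestByte = static_cast<unsigned char>(codePoint & 0xff);

    if (highByte == 0xff && lowestByte > 0x00 && lowestByte < 0x5f) {
        // Full width ASCII (ff01 - ff5e) needs 0x20 added.
        return static_cast<unsigned char>(lowestByte + 0x20);
    }

    // No mapping and not full width ASCII: emit the placeholder.
    return 'x';
}
```

With a table that maps `0x2215` to `/`, both `%u2215` (via the table) and `%uff0f` (via the full width rule) would decode to `/`, while an unmapped value such as `%u0141` would decode to `x` instead of the raw lower byte `0x41`.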