Fix fallback for non-mapped Unicode char #1609

allanbomsft · 2017-11-03T08:52:35Z

For example see this request:

POST / HTTP/1.1
Host: somehost:8080
Accept: */*
User-Agent: someagent
Content-Length: 30
Content-Type: application/json;charset=utf-8

{
    "a": "娧    "
}

t:utf8toUnicode turns this Unicode char into %u5a27, so its lowest byte is 0x27, which is a single quote in ASCII. This triggers false positives.

This happens with any Unicode character that has no mapping in the SecUnicodeMapFile and whose code point's last byte happens to be 0x27. Likewise, characters whose code point ends in 0x22 would be treated as a double quote, etc.

The same problem exists with t:jsDecode given a request containing the Unicode character fullwidth G (code point FF27).

Said differently: the last byte in the Unicode code point does not have any meaningful relation to whatever ASCII char happens to be represented by the same byte, and so we shouldn't treat it so.
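To make the collision concrete, here is a minimal Python sketch. It does not call ModSecurity; `low_byte_fallback` is a hypothetical helper that only models the behaviour described above (keeping the lowest byte of an unmapped code point).

```python
def low_byte_fallback(ch: str) -> str:
    """Model of the reported fallback: keep only the lowest byte of the code point.

    Hypothetical helper for illustration, not ModSecurity code.
    """
    return chr(ord(ch) & 0xFF)


# U+5A27 ends in 0x27 (single quote); U+5A22 ends in 0x22 (double quote).
for ch in ("娧", chr(0x5A22)):
    cp = ord(ch)
    print(f"U+{cp:04X} -> low byte 0x{cp & 0xFF:02X} -> {low_byte_fallback(ch)!r}")
```

Neither input character has anything to do with quoting, yet both come out of the modelled fallback as quote characters.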

I suggest replacing unmapped characters with an x. I also considered a question mark or a space, but those could also trigger false positives (too many non-alphanumeric characters in a row). I also considered just omitting the char, but that could trigger a false positive too: for example, "-娧-" would have been OK but "--" is not.
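The trade-off between these fallback choices can be sketched as follows. This is only a model of the net effect on the inspected string, not the patch itself: `ILLUSTRATIVE_MAP` stands in for SecUnicodeMapFile with a made-up entry, and `to_ascii` is a hypothetical helper.

```python
# Stand-in for SecUnicodeMapFile: code point -> ASCII byte.
# The single entry is illustrative and not taken from a real map file.
ILLUSTRATIVE_MAP = {0x00E9: ord("e")}


def to_ascii(text: str, fallback: str) -> str:
    """Map each character to ASCII, using `fallback` when no mapping exists."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:
            out.append(ch)                         # already ASCII
        elif cp in ILLUSTRATIVE_MAP:
            out.append(chr(ILLUSTRATIVE_MAP[cp]))  # mapped character
        else:
            out.append(fallback)                   # unmapped character
    return "".join(out)


print(to_ascii("-娧-", "x"))  # alphanumeric placeholder keeps the dashes apart
print(to_ascii("-娧-", "?"))  # adds one more non-alphanumeric character
print(to_ascii("-娧-", ""))   # omission collapses the input to "--"
```

With an alphanumeric placeholder the two dashes stay separated, which is the property the suggestion above relies on.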


zimmerle · 2018-11-26T19:35:37Z

Hi @allanbomsft,

Thank you for the patch. The transformations in ModSecurity are basically used as a way to prevent evasion. That is the case for t:utf8toUnicode. The conversion takes SecUnicodeMapFile into consideration. The conversion here may not need a fallback, as it is working in exactly the manner it was designed to: matching whatever happens in the backend app.

Python

>>> hex(ord("娧"))
'0x5a27'

PHP

$ /tmp  cat a.php
<?php
echo json_encode("娧");
?>

$ /tmp  php a.php
"\u5a27"

JavaScript

> encodeURIComponent(escape("娧"))
< "%25u5A27"

Rules that make use of t:utf8toUnicode need to be aware that the result will be a Unicode escape, and it is highly recommended to have the SecUnicodeMapFile configured correctly. Therefore I am closing this without a merge. If you point us to the specific rule that is leading to the false positive, we may be able to assist you better. Thank you.

allanbomsft · 2018-11-27T21:17:11Z

It's been more than a year since I sent this, so my memory on this issue is a bit hazy :-) I've dug through my notes and reproed the scenario again on the SpiderLabs branch (we are running with my patch in production on the Microsoft branch, so no repro there).

I understand that the conversion takes SecUnicodeMapFile into consideration, but this fix relates only to characters for which no mapping exists in the SecUnicodeMapFile.

For example, if there is no mapping for 娧 in the file, then the following request triggers a false positive on CRS rule 942110.

POST / HTTP/1.1
Host: somehost:8080
Accept: */*
User-Agent: someagent
Content-Length: 37
Content-Type: application/json;charset=utf-8

{
    "a": "娧",
    "b": "娧"
}

This is because ModSecurity misunderstands this request as if it was

{
    "a": "'",
    "b": "'"
}

because, as mentioned in the original post, the last octet of codepoint 5A27 is 27. It is this fallback mapping from codepoint 5A27 to 27 that is incorrect. It is not what the backend receives. The UTF-8 encoded representation of 娧 is E5A8A7.
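The difference between the code point and the bytes actually sent on the wire can be checked with a few lines of Python, using only the standard `str.encode` and `ord` built-ins:

```python
ch = "娧"
code_point = ord(ch)       # the Unicode code point
utf8 = ch.encode("utf-8")  # the bytes a backend receives in a UTF-8 body

print(f"code point:  U+{code_point:04X}")
print(f"UTF-8 bytes: {utf8.hex().upper()}")
print(f"0x27 among the UTF-8 bytes: {0x27 in utf8}")
```

The byte 0x27 appears only as the low byte of the code point, never in the UTF-8 encoding, so a backend that decodes the body as UTF-8 does not see a single quote.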

This is true for any char whose codepoint ends in 27, such as

5727  圧
5527  唧
5427  吧
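To get a sense of how many characters are affected, the following sketch counts the code points in the CJK Unified Ideographs block range (U+4E00 to U+9FFF) whose low byte is 0x27. It only looks at the numeric range and assumes nothing about which of them are present in a given SecUnicodeMapFile.

```python
CJK_START, CJK_END = 0x4E00, 0x9FFF  # CJK Unified Ideographs block range

# One code point per high-byte value has 0x27 as its low byte.
colliding = [cp for cp in range(CJK_START, CJK_END + 1) if cp & 0xFF == 0x27]

print(f"{len(colliding)} code points in the range end in 0x27")
print(" ".join(f"{cp:04X} {chr(cp)}" for cp in colliding[:5]))
```

The same filter with `0x22` in place of `0x27` gives the code points that would be mistaken for a double quote.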

allanrbo added 2 commits November 2, 2017 01:14

Unicode chars without an ASCII mapping should not default to whatever…

103abf5

… random byte is the lowest in the unicode code point

Fix unicode fallback in jsdecode and add back full width handling

0e5129c

allanbomsft mentioned this pull request Nov 6, 2017

Fix fallback for non-mapped Unicode char on v2 #1611

Closed

zimmerle added the 3.x Related to ModSecurity version 3.x label Feb 28, 2018

zimmerle force-pushed the v3/master branch from 1ab62e1 to 15b38fb Compare March 23, 2018 02:01

zimmerle self-assigned this Apr 24, 2018

zimmerle self-requested a review April 24, 2018 01:58

victorhora self-assigned this Sep 14, 2018

victorhora self-requested a review September 14, 2018 20:46

victorhora added this to the v3.0.4 milestone Nov 13, 2018

victorhora added enhancement RIP - libmodsecurity labels Nov 13, 2018

zimmerle closed this Nov 26, 2018
