-
Notifications
You must be signed in to change notification settings - Fork 497
Remove some false positives from basic auth #123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove some false positives from basic auth #123
Conversation
detect_secrets/plugins/basic_auth.py
Outdated
@@ -5,7 +5,7 @@ | |||
from .base import RegexBasedDetector | |||
|
|||
|
|||
SPECIAL_URL_CHARACTERS = ':/?#[]@' | |||
SPECIAL_URL_CHARACTERS = ':/?#[]@"\'' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These SPECIAL_URL_CHARACTERS
are actually more than a blacklist of characters: they are based off RFC 3986 (Section 2.2 Reserved Characters) and inspired by "The Tangled Web".
Cross-referencing other sources of information, this nice overview, seems to suggest that the semi-colon has been left out from this set of reserved characters, perhaps due to compatibility reasons.
Perhaps a better way is to reclassify this is renaming this RESERVED_CHARACTERS
, and adding a new set of SUB_DELIMITER_CHARACTERS
, where we can add a variety of other known delimiters (including quote marks)? After all, Basic Auth credentials should not exist in the query string, and limiting this would both help with performance and lowering false positives.
And for bonus points, please add a comment to why these URL characters were chosen, for posterity and better maintenance down the road!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the very informative background. Did a quick glance of The Tangled Web, looks like a really good book.
The false positive we found in some code and want to avoid is "https://url:8000";@something else'
. We've further expanded the format to replace double quote with the single quote. As you pointed out, if we add the semicolon to the special URL character list, it would also serve the purpose. My original implementation of adding quotes is not ideal.
I did read the compatibility reason link. If I understand correctly, it says for some CGI implementor, instead of writing http://host/?x=1&y=2
, you can write http://host/?x=1;y=2
. It's mainly for separating the URL parameters. For our case, we are trying to capture the use of base authentication. Is there any legit use case that a semicolon would appear in the username and password field before @
sign? If not, I think we can add the semicolon back to the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that I don't think there's a legitimate use case that a semi-colon would appear in the username/password field, as it's uncertain how it will be parsed as a URL.
What I'm suggesting is something like:
RESERVED_CHARACTERS = ':/?#[]@'
SUB_DELIMITER_CHARACTERS = '!$&\';' # and anything else we might need
Then, we can modify the blacklist to be:
blacklist = [
re.compile(
r'://[^{}\s]+:([^{}\s]+)@'.format(
re.escape(RESERVED_CHARACTERS + SUB_DELIMITER_CHARACTERS),
re.escape(RESERVED_CHARACTERS + SUB_DELIMITER_CHARACTERS),
)
]
If this fits your purpose, I think it could both reduce false positives and improve regex performance seeing that there's a tighter set of constraints, allowing the regex engine to fail quicker and require less backtracking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That works for our case well. I've rebased master and made the change you suggested. @domanchi
Exclude single and double quotes in matching
e3441a7
to
3e179cd
Compare
Exclude single and double quotes in matching. See added test case for details.