-
Notifications
You must be signed in to change notification settings - Fork 1.7k
add Ruby YAML unsafe_* methods as sink #12301
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
QHelp previews: ruby/ql/src/experimental/CWE-502/YAMLUnsafeYamlDeserialization.qhelpDeserialization of user-controlled data by YAMLProcessing an unvalidated user input can allow an attacker to execute arbitrary code in your application. Unsafe deserializing the malicious serialized yaml document through the Psych (YAML) library, making it possible to execute some code or execute arbitrary code with the help of a complete gadget chain. RecommendationAfter Psych(YAML) 4.0.0, the load method is same as safe_load method. This vulnerability can be prevented by using YAML.load (same as ExampleIn the example below, you can see safe and unsafe methods get called by a remote user input. You can give correct authorization to users, or you can use safe methods for loading yaml documents. require 'yaml'
class UsersController < ActionController::Base
def example
# safe
Psych.load(params[:yaml_string])
Psych.load_file(params[:yaml_file])
Psych.parse_stream(params[:yaml_string])
Psych.parse(params[:yaml_string])
Psych.parse_file(params[:yaml_file])
# unsafe
Psych.unsafe_load(params[:yaml_string])
Psych.unsafe_load_file(params[:yaml_file])
Psych.load_stream(params[:yaml_string])
parse_output = Psych.parse_stream(params[:yaml_string])
parse_output.to_ruby
Psych.parse(params[:yaml_string]).to_ruby
Psych.parse_file(params[:yaml_file]).to_ruby
end
end
|
// for detecting The CVE we should uncomment following line | ||
// sink.getLocation().getFile().toString().matches("%yaml_column%") and | ||
sink instanceof YamlSink or | ||
sink = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you mean to place the sink at the receiver of the call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For detecting the mentioned CVE there are a lot of results after that you uncomment source instanceof DataFlow::LocalSourceNode
in isSource.
so I just want to filter the results to see the CVE taint path easier than before
that is my mistake I wrote a bad comment for that.
.getAMethodCall("to_ruby") | ||
} | ||
|
||
override predicate isAdditionalTaintStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it make sense to add these taint steps as global taint steps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understood this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The taintsteps defined in the isAdditionalTaintStep
of a TaintTracking configuration only apply to that particular configuration. I can imagine untrusted YAML data being parsed and then one of the properties flowing into a different injection sink (eg: SQL injection). In those cases we want to apply the taintsteps to all Tracking configurations. To do that, create a class that extends from AdditionalTaintStep
:
private import codeql.ruby.dataflow.FlowSteps
private class YamlParseStep extends AdditionalTaintStep {
override predicate step(DataFlow::Node pred, DataFlow::Node succ) {
...
}
}
You can see some examples here. You would need to create a Yaml.qll file in ruby/ql/lib/codeql/ruby/frameworks
and then import it from https://github.com/github/codeql/blob/main/ruby/ql/lib/codeql/ruby/Frameworks.qll
// for detecting The CVE we should uncomment following line | ||
// sink.getLocation().getFile().toString().matches("%yaml_column%") and | ||
sink instanceof YamlSink or | ||
sink = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make its own YamlSink
for this. Would it make sense to add the sink at the reciever of the to_ruby
call instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't put the parse_*.to_ruby
sinks to YamlSink as I couldn't find a way to do taint step to do following:
for a code like
require 'yaml'
class UsersController < ActionController::Base
def example
Psych.parse_stream(params[:yaml_string]).to_ruby
end
end
I wanted to make to_ruby
a Mandatory condition, and also I should have a taint sink like getArgument
to be able to connect the chain of parse_*.to_ruby
to the taint steps.
so I couldn't use YamlSink
and I had to separate parse_*.to_ruby
from YamlSink
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and I thought it doesn't have any problem, as we can see the taint steps in vscode, so we can see that the user input first go to parse_* and then the .to_ruby will cause the problem.
@pwntester While the bug bounty policy says that new queries should go into the "experimental" folder, I think in this case the additional sinks could be added to the existing unsafe deserialization query. We could validate the additional sinks by running the existing and the updated query against a batch of say 1000 repos and see whether the new results are good. What do you think? |
I added another dangerous sink in this pull request, as I saw this is not a good idea to opening another one. |
As @aibaars says, I think these new sinks are best added to the existing class YamlLoadArgument extends Sink {
YamlLoadArgument() {
exists(API::Node yaml | yaml = API::getTopLevelMember(["YAML", "Psych"]) |
this = yaml
.getAMethodCall(["load", "unsafe_load_file", "unsafe_load", "load_stream"])
.getArgument(0)
or
this = yaml
.getAMethodCall(["unsafe_load", "load_stream"])
.getKeywordArgument("yaml")
// or ...
)
}
} |
Hi, I want to know about the |
Hi @amammad, as per @aibaars and @hmac suggestion, can you please add the new YAML/PLIST sinks into the existing
In most of the cases we remove a sink when it is no longer vulnerable in some version since Repos that use the old version should listen to dependabot an upgrade (or turn on dependabot in the first place). |
I removed the |
I have an important question, I want to submit bug slayer request for this query too, if i merged it in a non-experimental directory so all GitHub repositories that are using codeql will recognize the vulnerability and I can't really submit my bug slayer request as maintainers will fix the vulnerabilities according to codeql action alerts. |
It will take some time to merge this PR and make it available for Code Scanning. In the meantime you can mass scan for this issue using MRVA and report any findings to the maintainers. |
@@ -23,6 +24,27 @@ class Configuration extends TaintTracking::Configuration { | |||
|
|||
override predicate isSink(DataFlow::Node sink) { sink instanceof UnsafeDeserialization::Sink } | |||
|
|||
override predicate isAdditionalTaintStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please define these as global taint steps
I didn't change example files because i think it need to add test results too, so I will add them after final changes if you agree with this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general this looks like a great addition, but I think there's some cleaning up we could do to make it more mergeable.
@@ -0,0 +1,30 @@ | |||
/** | |||
* add additional steps for to_ruby method of YAML/Psych library |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The file comment here is typically a bit more general, for example
/**
* Provides modeling for the `YAML` and `Psych` libraries. */
*/
|
||
private class YamlParseStep extends AdditionalTaintStep { | ||
override predicate step(DataFlow::Node pred, DataFlow::Node succ) { | ||
exists(DataFlow::CallNode yaml_parser_methods | |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The preferred style is camelCase
for variable names. We also tend to use singular nouns. So this should be yamlParserMethod
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you prefer following format
private class YamlParseStep extends AdditionalTaintStep {
override predicate step(DataFlow::Node pred, DataFlow::Node succ) {
exists(DataFlow::CallNode yamlParserMethod |
yamlParserMethod =
API::getTopLevelMember(["YAML", "Psych"]).getAMethodCall(["parse", "parse_stream"]) and
(
pred = yamlParserMethod.getArgument(0) or
pred = yamlParserMethod.getKeywordArgument("yaml")
) and
succ = yamlParserMethod.getAMethodCall("to_ruby")
or
yamlParserMethod = API::getTopLevelMember(["YAML", "Psych"]).getAMethodCall("parse_file") and
(
pred = yamlParserMethod.getArgument(0) or
pred = yamlParserMethod.getKeywordArgument("filename")
) and
succ = yamlParserMethod.getAMethodCall("to_ruby")
)
}
}
Or two separate exists
quantifiers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a single exists
is preferable. It may provide a performance improvement as well, but I don't know enough to be sure of that.
private import codeql.ruby.DataFlow | ||
private import codeql.ruby.ApiGraphs | ||
|
||
private class YamlParseStep extends AdditionalTaintStep { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could add a doc comment to this class explaining the step we are adding here. Perhaps something like
A taint step related to the result of `YAML.parse` calls, or similar.
In the following example, this step will propagate taint from
`source` to `sink`:
```rb
x = source
result = YAML.parse(x)
sink result.to_ruby
@@ -75,13 +75,41 @@ module UnsafeDeserialization { | |||
} | |||
|
|||
/** | |||
* An argument in a call to `YAML.load`, considered a sink | |||
* An argument in a call to `YAML.unsafe_*` and `YAML.load_stream` , considered sinks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to leave the YAML.load
sink in here for now. We can consider removing it separately, and that will make it easier to analyse the impact of such a change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it look good?
/**
* An argument in a call to `YAML.unsafe_*` and `YAML.load_stream` , considered sinks
* for unsafe deserialization. The `YAML` module is an alias of `Psych` in
* recent versions of Ruby.
* the `this = yamlNode().getAMethodCall("load").getArgument(0)` is safe
* in recent versions of YAML library, so it will be removed in future.
*/
class YamlLoadArgument extends Sink {
YamlLoadArgument() {
this = yamlNode().getAMethodCall("load").getArgument(0)
or
this =
yamlNode().getAMethodCall(["unsafe_load_file", "unsafe_load", "load_stream"]).getArgument(0)
or
this = yamlNode().getAMethodCall(["unsafe_load", "load_stream"]).getKeywordArgument("yaml")
or
this = yamlNode().getAMethodCall("unsafe_load_file").getKeywordArgument("filename")
}
}
/**
* YAML/Psych Top level Class member
*/
private API::Node yamlNode() { result = API::getTopLevelMember(["YAML", "Psych"]) }
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks great - if you know it, can you mention the minimum version of the YAML library for which the load
method is safe? That will help others to decide whether to make this change in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, now I mention to the exact version too.
.getAMethodCall("unsafe_load_file") | ||
.getKeywordArgument("filename") | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You could extract the API::getTopLevelMember(["YAML", "Psych"])
expression into a separate predicate, to reduce duplication a bit:
private API::Node yamlNode() {
result = API::getTopLevelMember(["YAML", "Psych"])
}
Then in the charpred above:
this = yamlNode().getAMethodCall(...)
@@ -9,6 +9,7 @@ | |||
private import codeql.ruby.AST | |||
private import codeql.ruby.DataFlow | |||
private import codeql.ruby.TaintTracking | |||
private import codeql.ruby.ApiGraphs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this needed?
* @problem.severity warning | ||
* @security-severity 9.8 | ||
* @precision high | ||
* @id rb/Plist-unsafe-deserialization |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* @id rb/Plist-unsafe-deserialization | |
* @id rb/plist-unsafe-deserialization |
* check whether an input argument has desired "key: value" input or not. | ||
* borrowed from UnsafeDeserialization module with some changes | ||
*/ | ||
predicate checkkeyBalue(CfgNodes::ExprNodes::PairCfgNode p, string key, string value) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
predicate checkkeyBalue(CfgNodes::ExprNodes::PairCfgNode p, string key, string value) { | |
predicate checkkeyValue(CfgNodes::ExprNodes::PairCfgNode p, string key, string value) { |
plistParsexml.getNumberOfArguments() = 1 | ||
) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not great that we have this class duplicated. It would be better to find a way to keep the single class. For example, can we remove this query and update the existing Unsafe Deserialization qhelp to describe the plist vulnerability?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you right, I just kept my query files and added them to non-experimental libraries for better tracking. if you want to there is no new query file in experimental directory, it can be removed.
nodeTo = yaml_parser_methods.getAMethodCall("to_ruby") | ||
) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto with this file. The existing Unsafe Deserialization query should catch these cases now that you've added these sinks and the extra taint step, so do we need a separate query?
QHelp previews: ruby/ql/src/experimental/cwe-502/PlistUnsafeDeserialization.qhelpUnsafe Deserialization of user-controlled data by PlistProcessing an unvalidated user input can allow an attacker to execute arbitrary code in your application. Unsafe deserializing the malicious serialized xml document through the Plist library, making it possible to execute some code or execute arbitrary code with the help of a complete gadget chain. RecommendationThis vulnerability in Plist can be prevented by calling ExampleIn the example below, you can see safe and unsafe Plist dangerous method calls that can be abused by a remote user input. You can use "marshal: false" as an arugument for require 'plist'
class UsersController < ActionController::Base
def example
# not safe
config = true
result = Plist.parse_xml(params[:yaml_string])
result = Plist.parse_xml(params[:yaml_string], marshal: config)
result = Plist.parse_xml(params[:yaml_string], marshal: true)
# safe
config = false
result = Plist.parse_xml(params[:yaml_string], marshal: false)
result = Plist.parse_xml(params[:yaml_string], marshal: config)
end
end
References
ruby/ql/src/experimental/cwe-502/YAMLUnsafeDeserialization.qhelpUnsafe Deserialization of user-controlled data by YAMLProcessing an unvalidated user input can allow an attacker to execute arbitrary code in your application. Unsafe deserializing the malicious serialized yaml document through the Psych (YAML) library, making it possible to execute some code or execute arbitrary code with the help of a complete gadget chain. RecommendationAfter Psych(YAML) 4.0.0, the load method is same as safe_load method. This vulnerability can be prevented by using YAML.load (same as ExampleIn the example below, you can see safe and unsafe methods get called by a remote user input. You can give correct authorization to users, or you can use safe methods for loading yaml documents. require 'yaml'
class UsersController < ActionController::Base
def example
# safe
Psych.load(params[:yaml_string])
Psych.load_file(params[:yaml_file])
Psych.parse_stream(params[:yaml_string])
Psych.parse(params[:yaml_string])
Psych.parse_file(params[:yaml_file])
# unsafe
Psych.unsafe_load(params[:yaml_string])
Psych.unsafe_load_file(params[:yaml_file])
Psych.load_stream(params[:yaml_string])
parse_output = Psych.parse_stream(params[:yaml_string])
parse_output.to_ruby
Psych.parse(params[:yaml_string]).to_ruby
Psych.parse_file(params[:yaml_file]).to_ruby
end
end
References
ruby/ql/src/queries/security/cwe-502/UnsafeDeserialization.qhelpDeserialization of user-controlled dataDeserializing untrusted data using any method that allows the construction of arbitrary objects is easily exploitable and, in many cases, allows an attacker to execute arbitrary code. RecommendationAvoid deserialization of untrusted data if possible. If the architecture permits it, use serialization formats that cannot represent arbitrary objects. For libraries that support it, such as the Ruby standard library's YAML/Psych recommendation: After Psych(YAML) 4.0.0, the load method is same as safe_load method. This vulnerability can be prevented by using YAML.load (same as This vulnerability in Plist can be prevented by calling ExampleThe following example calls the require 'json'
require 'yaml'
require 'oj'
class UserController < ActionController::Base
def marshal_example
data = Base64.decode64 params[:data]
object = Marshal.load data
# ...
end
def json_example
object = JSON.load params[:json]
# ...
end
def yaml_example
object = YAML.load params[:yaml]
# ...
end
def oj_example
object = Oj.load params[:json]
# ...
end
end In the example below, you can see safe and unsafe methods get called by a remote user input. You can give correct authorization to users, or you can use safe methods for loading yaml documents. require 'yaml'
class UsersController < ActionController::Base
def example
# safe
Psych.load(params[:yaml_string])
Psych.load_file(params[:yaml_file])
Psych.parse_stream(params[:yaml_string])
Psych.parse(params[:yaml_string])
Psych.parse_file(params[:yaml_file])
# unsafe
Psych.unsafe_load(params[:yaml_string])
Psych.unsafe_load_file(params[:yaml_file])
Psych.load_stream(params[:yaml_string])
parse_output = Psych.parse_stream(params[:yaml_string])
parse_output.to_ruby
Psych.parse(params[:yaml_string]).to_ruby
Psych.parse_file(params[:yaml_file]).to_ruby
end
end
In the example below, you can see safe and unsafe Plist dangerous method calls that can be abused by a remote user input. You can use "marshal: false" as an arugument for require 'plist'
class UsersController < ActionController::Base
def example
# not safe
config = true
result = Plist.parse_xml(params[:yaml_string])
result = Plist.parse_xml(params[:yaml_string], marshal: config)
result = Plist.parse_xml(params[:yaml_string], marshal: true)
# safe
config = false
result = Plist.parse_xml(params[:yaml_string], marshal: false)
result = Plist.parse_xml(params[:yaml_string], marshal: config)
end
end
Using require 'json'
class UserController < ActionController::Base
def safe_json_example
object = JSON.parse params[:json]
# ...
end
def safe_yaml_example
object = YAML.safe_load params[:yaml]
# ...
end
def safe_oj_example
object = Oj.load params[:yaml], { mode: :strict }
# or
object = Oj.safe_load params[:yaml]
# ...
end
end References
|
@amammad sorry for the delay with this PR. I think we can merge it in after a few more changes. Do you mind if I make the changes myself? The credit will remain all yours, of course :) |
Closing in favour of #13307 |
No description provided.