Skip to content

add Ruby YAML unsafe_* methods as sink #12301

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 12 commits into from
Closed
Show file tree
Hide file tree
Changes from 9 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions ruby/ql/lib/codeql/ruby/Frameworks.qll
Original file line number Diff line number Diff line change
Expand Up @@ -28,3 +28,4 @@ private import codeql.ruby.frameworks.PosixSpawn
private import codeql.ruby.frameworks.StringFormatters
private import codeql.ruby.frameworks.Json
private import codeql.ruby.frameworks.Twirp
private import codeql.ruby.frameworks.Yaml
30 changes: 30 additions & 0 deletions ruby/ql/lib/codeql/ruby/frameworks/Yaml.qll
Original file line number Diff line number Diff line change
@@ -0,0 +1,30 @@
/**
* add additional steps for to_ruby method of YAML/Psych library
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file comment here is typically a bit more general, for example

/**
 * Provides modeling for the `YAML` and `Psych` libraries. */
 */

*/

private import codeql.ruby.dataflow.FlowSteps
private import codeql.ruby.DataFlow
private import codeql.ruby.ApiGraphs

private class YamlParseStep extends AdditionalTaintStep {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add a doc comment to this class explaining the step we are adding here. Perhaps something like

A taint step related to the result of `YAML.parse` calls, or similar.
In the following example, this step will propagate taint from 
`source` to `sink`:

```rb
x = source
result = YAML.parse(x)
sink result.to_ruby

override predicate step(DataFlow::Node pred, DataFlow::Node succ) {
exists(DataFlow::CallNode yaml_parser_methods |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The preferred style is camelCase for variable names. We also tend to use singular nouns. So this should be yamlParserMethod.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you prefer following format

private class YamlParseStep extends AdditionalTaintStep {
  override predicate step(DataFlow::Node pred, DataFlow::Node succ) {
    exists(DataFlow::CallNode yamlParserMethod |
      yamlParserMethod =
        API::getTopLevelMember(["YAML", "Psych"]).getAMethodCall(["parse", "parse_stream"]) and
      (
        pred = yamlParserMethod.getArgument(0) or
        pred = yamlParserMethod.getKeywordArgument("yaml")
      ) and
      succ = yamlParserMethod.getAMethodCall("to_ruby")
      or
      yamlParserMethod = API::getTopLevelMember(["YAML", "Psych"]).getAMethodCall("parse_file") and
      (
        pred = yamlParserMethod.getArgument(0) or
        pred = yamlParserMethod.getKeywordArgument("filename")
      ) and
      succ = yamlParserMethod.getAMethodCall("to_ruby")
    )
  }
}

Or two separate exists quantifiers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a single exists is preferable. It may provide a performance improvement as well, but I don't know enough to be sure of that.

yaml_parser_methods =
API::getTopLevelMember(["YAML", "Psych"]).getAMethodCall(["parse", "parse_stream"]) and
(
pred = yaml_parser_methods.getArgument(0) or
pred = yaml_parser_methods.getKeywordArgument("yaml")
) and
succ = yaml_parser_methods.getAMethodCall("to_ruby")
)
or
exists(DataFlow::CallNode yaml_parser_methods |
yaml_parser_methods = API::getTopLevelMember(["YAML", "Psych"]).getAMethodCall("parse_file") and
(
pred = yaml_parser_methods.getArgument(0) or
pred = yaml_parser_methods.getKeywordArgument("filename")
) and
succ = yaml_parser_methods.getAMethodCall("to_ruby")
)
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -75,13 +75,41 @@ module UnsafeDeserialization {
}

/**
* An argument in a call to `YAML.load`, considered a sink
* An argument in a call to `YAML.unsafe_*` and `YAML.load_stream` , considered sinks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to leave the YAML.load sink in here for now. We can consider removing it separately, and that will make it easier to analyse the impact of such a change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it look good?

  /**
   * An argument in a call to `YAML.unsafe_*` and `YAML.load_stream` , considered sinks
   * for unsafe deserialization. The `YAML` module is an alias of `Psych` in
   * recent versions of Ruby.
   * the `this = yamlNode().getAMethodCall("load").getArgument(0)` is safe
   *  in recent versions of YAML library, so it will be removed in future.
   */
  class YamlLoadArgument extends Sink {
    YamlLoadArgument() {
      this = yamlNode().getAMethodCall("load").getArgument(0)
      or
      this =
        yamlNode().getAMethodCall(["unsafe_load_file", "unsafe_load", "load_stream"]).getArgument(0)
      or
      this = yamlNode().getAMethodCall(["unsafe_load", "load_stream"]).getKeywordArgument("yaml")
      or
      this = yamlNode().getAMethodCall("unsafe_load_file").getKeywordArgument("filename")
    }
  }

  /**
   * YAML/Psych Top level Class member
   */
  private API::Node yamlNode() { result = API::getTopLevelMember(["YAML", "Psych"]) }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks great - if you know it, can you mention the minimum version of the YAML library for which the load method is safe? That will help others to decide whether to make this change in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, now I mention to the exact version too.

* for unsafe deserialization. The `YAML` module is an alias of `Psych` in
* recent versions of Ruby.
*/
class YamlLoadArgument extends Sink {
YamlLoadArgument() {
this = API::getTopLevelMember(["YAML", "Psych"]).getAMethodCall("load").getArgument(0)
this =
API::getTopLevelMember(["YAML", "Psych"])
.getAMethodCall(["unsafe_load_file", "unsafe_load", "load_stream"])
.getArgument(0)
or
this =
API::getTopLevelMember(["YAML", "Psych"])
.getAMethodCall(["unsafe_load", "load_stream"])
.getKeywordArgument("yaml")
or
this =
API::getTopLevelMember(["YAML", "Psych"])
.getAMethodCall("unsafe_load_file")
.getKeywordArgument("filename")
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could extract the API::getTopLevelMember(["YAML", "Psych"]) expression into a separate predicate, to reduce duplication a bit:

private API::Node yamlNode() {
  result = API::getTopLevelMember(["YAML", "Psych"])
}

Then in the charpred above:

this = yamlNode().getAMethodCall(...)


/**
* An argument in a call to `YAML.parse*`, considered sinks
* for unsafe deserialization if there is a call to `to_ruby` on returned value of them,
* so this need some additional taint steps. The `YAML` module is an alias of `Psych` in
* recent versions of Ruby.
*/
class YamlParseArgument extends Sink {
YamlParseArgument() {
this =
API::getTopLevelMember(["YAML", "Psych"])
.getAMethodCall(["parse", "parse_stream", "parse_file"])
.getAMethodCall("to_ruby")
}
}

Expand Down Expand Up @@ -208,4 +236,31 @@ module UnsafeDeserialization {
)
}
}

/**
* check whether an input argument has desired "key: value" input or not.
*/
predicate checkkeyValue(CfgNodes::ExprNodes::PairCfgNode p, string key, string value) {
p.getKey().getConstantValue().isStringlikeValue(key) and
DataFlow::exprNode(p.getValue()).getALocalSource().getConstantValue().toString() = value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it not work if you just write p.getValue().getConstantValue().isStringlikeValue(value)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you right, it seems no need for this predicate.

}

/**
* An argument in a call to `Plist.parse_xml` where the marshal is `true` (which is
* the default), considered a sink for unsafe deserialization.
*/
class UnsafePlistParsexmlArgument extends Sink {
UnsafePlistParsexmlArgument() {
exists(DataFlow::CallNode plistParsexml |
plistParsexml = API::getTopLevelMember("Plist").getAMethodCall("parse_xml")
|
this = [plistParsexml.getArgument(0), plistParsexml.getKeywordArgument("filename_or_xml")] and
// Exclude calls that explicitly pass a safe mode option.
checkkeyValue(plistParsexml.getArgument(1).asExpr(), "marshal", "true")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does plistParsexml.getKeywordArgument("marshal").getConstantValue().isBoolean(true) not work here?

or
this = [plistParsexml.getArgument(0), plistParsexml.getKeywordArgument("filename_or_xml")] and
plistParsexml.getNumberOfArguments() = 1
)
}
}
}
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@
private import codeql.ruby.AST
private import codeql.ruby.DataFlow
private import codeql.ruby.TaintTracking
private import codeql.ruby.ApiGraphs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

import UnsafeDeserializationCustomizations

/**
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,25 @@
<!DOCTYPE qhelp SYSTEM "qhelp.dtd">
<qhelp>
<overview>
<p>
Processing an unvalidated user input can allow an attacker to execute arbitrary code in your application.
Unsafe deserializing the malicious serialized xml document through the Plist library, making it possible to execute some code or execute arbitrary code with the help of a complete gadget chain.
</p>
</overview>
<recommendation>
<p>
This vulnerability can be prevented by using <code>Plist.parse_xml</code>.
</p>
</recommendation>
<example>
<p>In the example below, you can see safe and unsafe call of this dangerous method that can be abused by a remote user input. You can use "marshal: false" as an arugument for <code>Plist.parse_xml</code> to use it safe.
</p>
<sample src="PlistUnsafeYamlDeserialization.rb" />
</example>
<references>
<li>
Security considerations from library documentation
<a href="https://github.com/patsplat/plist#label-Security+considerations">patsplat/plist</a>.
</li>
</references>
</qhelp>
68 changes: 68 additions & 0 deletions ruby/ql/src/experimental/cwe-502/PlistUnsafeDeserialization.ql
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
/**
* @name Unsafe Deserialization of user-controlled data by Plist
* @description Deserializing user-controlled data may allow attackers to
* execute arbitrary code.
* @kind path-problem
* @problem.severity warning
* @security-severity 9.8
* @precision high
* @id rb/Plist-unsafe-deserialization
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @id rb/Plist-unsafe-deserialization
* @id rb/plist-unsafe-deserialization

* @tags security
* experimental
* external/cwe/cwe-502
*/

import codeql.ruby.ApiGraphs
import codeql.ruby.DataFlow
import codeql.ruby.TaintTracking
import codeql.ruby.CFG
import DataFlow::PathGraph
import codeql.ruby.security.UnsafeDeserializationCustomizations

abstract class PlistUnsafeSinks extends DataFlow::Node { }

/**
* check whether an input argument has desired "key: value" input or not.
* borrowed from UnsafeDeserialization module with some changes
*/
predicate checkkeyBalue(CfgNodes::ExprNodes::PairCfgNode p, string key, string value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
predicate checkkeyBalue(CfgNodes::ExprNodes::PairCfgNode p, string key, string value) {
predicate checkkeyValue(CfgNodes::ExprNodes::PairCfgNode p, string key, string value) {

p.getKey().getConstantValue().isStringlikeValue(key) and
DataFlow::exprNode(p.getValue()).getALocalSource().getConstantValue().toString() = value
}

/**
* An argument in a call to `Plist.parse_xml` where the marshal is `true` (which is
* the default), considered a sink for unsafe deserialization.
* borrowed from UnsafeDeserialization module with some changes
*/
class UnsafePlistParsexmlArgument extends PlistUnsafeSinks {
UnsafePlistParsexmlArgument() {
exists(DataFlow::CallNode plistParsexml |
plistParsexml = API::getTopLevelMember("Plist").getAMethodCall("parse_xml")
|
this = [plistParsexml.getArgument(0), plistParsexml.getKeywordArgument("filename_or_xml")] and
// Exclude calls that explicitly pass a safe mode option.
checkkeyBalue(plistParsexml.getArgument(1).asExpr(), "marshal", "true")
or
this = [plistParsexml.getArgument(0), plistParsexml.getKeywordArgument("filename_or_xml")] and
plistParsexml.getNumberOfArguments() = 1
)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not great that we have this class duplicated. It would be better to find a way to keep the single class. For example, can we remove this query and update the existing Unsafe Deserialization qhelp to describe the plist vulnerability?

Copy link
Contributor Author

@am0o0 am0o0 Apr 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you right, I just kept my query files and added them to non-experimental libraries for better tracking. if you want to there is no new query file in experimental directory, it can be removed.


class Configuration extends TaintTracking::Configuration {
Configuration() { this = "PlistUnsafeDeserialization" }

override predicate isSource(DataFlow::Node source) {
// to detect CVE-2021-33575, we should uncomment following line instead of current UnsafeDeserialization::Source
// source instanceof DataFlow::LocalSourceNode
source instanceof UnsafeDeserialization::Source
}

override predicate isSink(DataFlow::Node sink) { sink instanceof PlistUnsafeSinks }
}

from Configuration config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "Unsafe deserialization depends on a $@.", source.getNode(),
"potentially untrusted source"
13 changes: 13 additions & 0 deletions ruby/ql/src/experimental/cwe-502/PlistUnsafeDeserialization.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
require 'yaml'
class UsersController < ActionController::Base
def example
# not safe
result = Plist.parse_xml(params[:yaml_string])
result = Plist.parse_xml(params[:yaml_string], marshal: true)

# safe
result = Plist.parse_xml(params[:yaml_string], marshal: false)
end
end


Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
<!DOCTYPE qhelp SYSTEM "qhelp.dtd">
<qhelp>
<overview>
<p>
Processing an unvalidated user input can allow an attacker to execute arbitrary code in your application.
Unsafe deserializing the malicious serialized yaml document through the Psych (YAML) library, making it possible to execute some code or execute arbitrary code with the help of a complete gadget chain.
</p>
</overview>
<recommendation>
<p>
After Psych(YAML) 4.0.0, the load method is same as safe_load method.
This vulnerability can be prevented by using YAML.load (same as <code>YAML.safe_load</code>), <code>YAML.load_file</code> (same as <code>YAML.safe_load_file</code>) instead of <code>YAML.unsafe_*</code> methods.
Be careful that <code>YAML.load_stream</code> don't use safe_load method, Also Be careful the <code>to_ruby</code> method of Psych get called on a trusted parsed (<code>YAML.parse*</code>) yaml document.
</p>
</recommendation>
<example>
<p>In the example below, you can see safe and unsafe methods get called by a remote user input. You can give correct authorization to users, or you can use safe methods for loading yaml documents.</p>
<sample src="YAMLUnsafeYamlDeserialization.rb" />
</example>
<references>
<li>
You can read that how unsafe yaml load methods can lead to code executions.
<a href="https://devcraft.io/2021/01/07/universal-deserialisation-gadget-for-ruby-2-x-3-x.html">Universal Deserialisation Gadget for Ruby 2.x-3.x </a>.
</li>
</references>
</qhelp>
88 changes: 88 additions & 0 deletions ruby/ql/src/experimental/cwe-502/YAMLUnsafeDeserialization.ql
Original file line number Diff line number Diff line change
@@ -0,0 +1,88 @@
/**
* @name Unsafe Deserialization of user-controlled data by YAML
* @description Deserializing user-controlled data may allow attackers to
* execute arbitrary code.
* @kind path-problem
* @problem.severity warning
* @security-severity 9.8
* @precision high
* @id rb/YAML-unsafe-deserialization
* @tags security
* experimental
* external/cwe/cwe-502
*/

import codeql.ruby.ApiGraphs
import codeql.ruby.DataFlow
import codeql.ruby.TaintTracking
import DataFlow::PathGraph
import codeql.ruby.security.UnsafeDeserializationCustomizations

abstract class YamlUnsafeSinks extends DataFlow::Node { }

class YamlUnsafeArgument extends YamlUnsafeSinks {
YamlUnsafeArgument() {
this =
API::getTopLevelMember(["YAML", "Psych"])
.getAMethodCall(["unsafe_load_file", "unsafe_load", "load_stream"])
.getArgument(0)
or
this =
API::getTopLevelMember(["YAML", "Psych"])
.getAMethodCall(["unsafe_load", "load_stream"])
.getKeywordArgument("yaml")
or
this =
API::getTopLevelMember(["YAML", "Psych"])
.getAMethodCall("unsafe_load_file")
.getKeywordArgument("filename")
or
this =
API::getTopLevelMember(["YAML", "Psych"])
.getAMethodCall(["parse", "parse_stream", "parse_file"])
.getAMethodCall("to_ruby")
}
}

class Configuration extends TaintTracking::Configuration {
Configuration() { this = "YamlUnsafeDeserialization" }

override predicate isSource(DataFlow::Node source) {
// to detect CVE-2022-32224, we should uncomment following line instead of current UnsafeDeserialization::Source
// source instanceof DataFlow::LocalSourceNode
source instanceof UnsafeDeserialization::Source
}

override predicate isSink(DataFlow::Node sink) {
// after changing the isSource for detecting CVE-2022-32224
// uncomment the following line only see the CVE sink not other files similar sinks
// sink.getLocation().getFile().toString().matches("%yaml_column%") and
sink instanceof YamlUnsafeSinks
}

override predicate isAdditionalTaintStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
exists(DataFlow::CallNode yaml_parser_methods |
yaml_parser_methods =
API::getTopLevelMember(["YAML", "Psych"]).getAMethodCall(["parse", "parse_stream"]) and
(
nodeFrom = yaml_parser_methods.getArgument(0) or
nodeFrom = yaml_parser_methods.getKeywordArgument("yaml")
) and
nodeTo = yaml_parser_methods.getAMethodCall("to_ruby")
)
or
exists(DataFlow::CallNode yaml_parser_methods |
yaml_parser_methods = API::getTopLevelMember(["YAML", "Psych"]).getAMethodCall("parse_file") and
(
nodeFrom = yaml_parser_methods.getArgument(0) or
nodeFrom = yaml_parser_methods.getKeywordArgument("filename")
) and
nodeTo = yaml_parser_methods.getAMethodCall("to_ruby")
)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto with this file. The existing Unsafe Deserialization query should catch these cases now that you've added these sinks and the extra taint step, so do we need a separate query?


from Configuration config, DataFlow::PathNode source, DataFlow::PathNode sink
where config.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "Unsafe deserialization depends on a $@.", source.getNode(),
"potentially untrusted source"
22 changes: 22 additions & 0 deletions ruby/ql/src/experimental/cwe-502/YAMLUnsafeDeserialization.rb
Original file line number Diff line number Diff line change
@@ -0,0 +1,22 @@
require 'yaml'
class UsersController < ActionController::Base
def example
# safe
Psych.load(params[:yaml_string])
Psych.load_file(params[:yaml_file])
Psych.parse_stream(params[:yaml_string])
Psych.parse(params[:yaml_string])
Psych.parse_file(params[:yaml_file])
# unsafe
Psych.unsafe_load(params[:yaml_string])
Psych.unsafe_load_file(params[:yaml_file])
Psych.load_stream(params[:yaml_string])
parse_output = Psych.parse_stream(params[:yaml_string])
parse_output.to_ruby
Psych.parse(params[:yaml_string]).to_ruby
Psych.parse_file(params[:yaml_file]).to_ruby

end
end


Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
edges
| PlistUnsafeDeserialization.rb:5:30:5:35 | call to params : | PlistUnsafeDeserialization.rb:5:30:5:49 | ...[...] |
| PlistUnsafeDeserialization.rb:6:30:6:35 | call to params : | PlistUnsafeDeserialization.rb:6:30:6:49 | ...[...] |
nodes
| PlistUnsafeDeserialization.rb:5:30:5:35 | call to params : | semmle.label | call to params : |
| PlistUnsafeDeserialization.rb:5:30:5:49 | ...[...] | semmle.label | ...[...] |
| PlistUnsafeDeserialization.rb:6:30:6:35 | call to params : | semmle.label | call to params : |
| PlistUnsafeDeserialization.rb:6:30:6:49 | ...[...] | semmle.label | ...[...] |
subpaths
#select
| PlistUnsafeDeserialization.rb:5:30:5:49 | ...[...] | PlistUnsafeDeserialization.rb:5:30:5:35 | call to params : | PlistUnsafeDeserialization.rb:5:30:5:49 | ...[...] | Unsafe deserialization depends on a $@. | PlistUnsafeDeserialization.rb:5:30:5:35 | call to params | potentially untrusted source |
| PlistUnsafeDeserialization.rb:6:30:6:49 | ...[...] | PlistUnsafeDeserialization.rb:6:30:6:35 | call to params : | PlistUnsafeDeserialization.rb:6:30:6:49 | ...[...] | Unsafe deserialization depends on a $@. | PlistUnsafeDeserialization.rb:6:30:6:35 | call to params | potentially untrusted source |
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
experimental/cwe-502/PlistUnsafeDeserialization.ql
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
require 'yaml'
class UsersController < ActionController::Base
def example
# not safe
result = Plist.parse_xml(params[:yaml_string])
result = Plist.parse_xml(params[:yaml_string], marshal: true)

# safe
result = Plist.parse_xml(params[:yaml_string], marshal: false)
end
end


Loading