Skip to content

add Ruby YAML unsafe_* methods as sink #12301

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 12 commits into from

Conversation

am0o0
Copy link
Contributor

@am0o0 am0o0 commented Feb 23, 2023

No description provided.

@am0o0 am0o0 requested a review from a team as a code owner February 23, 2023 20:23
@am0o0 am0o0 requested review from hmac and removed request for a team February 23, 2023 20:23
@github-actions
Copy link
Contributor

github-actions bot commented Feb 24, 2023

QHelp previews:

ruby/ql/src/experimental/CWE-502/YAMLUnsafeYamlDeserialization.qhelp

Deserialization of user-controlled data by YAML

Processing an unvalidated user input can allow an attacker to execute arbitrary code in your application. Unsafe deserializing the malicious serialized yaml document through the Psych (YAML) library, making it possible to execute some code or execute arbitrary code with the help of a complete gadget chain.

Recommendation

After Psych(YAML) 4.0.0, the load method is same as safe_load method. This vulnerability can be prevented by using YAML.load (same as YAML.safe_load), YAML.load_file (same as YAML.safe_load_file) instead of YAML.unsafe_* methods. Be careful that YAML.load_stream don't use safe_load method, Also Be careful the to_ruby method of Psych get called on a trusted parsed (YAML.parse*) yaml document.

Example

In the example below, you can see safe and unsafe methods get called by a remote user input. You can give correct authorization to users, or you can use safe methods for loading yaml documents.

require 'yaml'
class UsersController < ActionController::Base
  def example
    # safe
    Psych.load(params[:yaml_string])
    Psych.load_file(params[:yaml_file])
    Psych.parse_stream(params[:yaml_string])
    Psych.parse(params[:yaml_string])
    Psych.parse_file(params[:yaml_file])
    # unsafe
    Psych.unsafe_load(params[:yaml_string])
    Psych.unsafe_load_file(params[:yaml_file])
    Psych.load_stream(params[:yaml_string])
    parse_output = Psych.parse_stream(params[:yaml_string])
    parse_output.to_ruby
    Psych.parse(params[:yaml_string]).to_ruby
    Psych.parse_file(params[:yaml_file]).to_ruby

  end
end

@am0o0 am0o0 changed the title v1 add Ruby YAML unsafe_* methods as sink Feb 24, 2023
// for detecting The CVE we should uncomment following line
// sink.getLocation().getFile().toString().matches("%yaml_column%") and
sink instanceof YamlSink or
sink =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to place the sink at the receiver of the call?

Copy link
Contributor Author

@am0o0 am0o0 Feb 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For detecting the mentioned CVE there are a lot of results after that you uncomment source instanceof DataFlow::LocalSourceNode in isSource.
so I just want to filter the results to see the CVE taint path easier than before
that is my mistake I wrote a bad comment for that.

.getAMethodCall("to_ruby")
}

override predicate isAdditionalTaintStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add these taint steps as global taint steps?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I understood this comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The taintsteps defined in the isAdditionalTaintStep of a TaintTracking configuration only apply to that particular configuration. I can imagine untrusted YAML data being parsed and then one of the properties flowing into a different injection sink (eg: SQL injection). In those cases we want to apply the taintsteps to all Tracking configurations. To do that, create a class that extends from AdditionalTaintStep:

private import codeql.ruby.dataflow.FlowSteps

private class YamlParseStep extends AdditionalTaintStep {
  override predicate step(DataFlow::Node pred, DataFlow::Node succ) {
    ...
  }
}

You can see some examples here. You would need to create a Yaml.qll file in ruby/ql/lib/codeql/ruby/frameworks and then import it from https://github.com/github/codeql/blob/main/ruby/ql/lib/codeql/ruby/Frameworks.qll

@hmac @aibaars can you verify we want to do that?

// for detecting The CVE we should uncomment following line
// sink.getLocation().getFile().toString().matches("%yaml_column%") and
sink instanceof YamlSink or
sink =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make its own YamlSink for this. Would it make sense to add the sink at the reciever of the to_ruby call instead?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I couldn't put the parse_*.to_ruby sinks to YamlSink as I couldn't find a way to do taint step to do following:
for a code like

require 'yaml'
class UsersController < ActionController::Base
  def example
     Psych.parse_stream(params[:yaml_string]).to_ruby
  end
end

I wanted to make to_ruby a Mandatory condition, and also I should have a taint sink like getArgument to be able to connect the chain of parse_*.to_ruby to the taint steps.

so I couldn't use YamlSink and I had to separate parse_*.to_ruby from YamlSink

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and I thought it doesn't have any problem, as we can see the taint steps in vscode, so we can see that the user input first go to parse_* and then the .to_ruby will cause the problem.

@aibaars
Copy link
Contributor

aibaars commented Feb 24, 2023

@pwntester While the bug bounty policy says that new queries should go into the "experimental" folder, I think in this case the additional sinks could be added to the existing unsafe deserialization query. We could validate the additional sinks by running the existing and the updated query against a batch of say 1000 repos and see whether the new results are good. What do you think?

@am0o0
Copy link
Contributor Author

am0o0 commented Feb 25, 2023

I added another dangerous sink in this pull request, as I saw this is not a good idea to opening another one.
I updated related issue in github/securitylab too.

@hmac
Copy link
Contributor

hmac commented Feb 27, 2023

As @aibaars says, I think these new sinks are best added to the existing rb/unsafe-deserialization query. We would add them as new classes in UnsafeDeserializationCustomizations.qll, extending UnsafeDeserialization::Sink. For the YAML sinks, we already have a YamlLoadArgument class that we could modify:

class YamlLoadArgument extends Sink {
  YamlLoadArgument() {
    exists(API::Node yaml | yaml = API::getTopLevelMember(["YAML", "Psych"]) |
      this = yaml
        .getAMethodCall(["load", "unsafe_load_file", "unsafe_load", "load_stream"])
        .getArgument(0)
      or
      this = yaml
        .getAMethodCall(["unsafe_load", "load_stream"])
        .getKeywordArgument("yaml")
      // or ...
    )
  }
}

@am0o0
Copy link
Contributor Author

am0o0 commented Feb 27, 2023

Hi, I want to know about the load method that is safe now, but still some repositories are using the lower versions of Psych that load is unsafe. should codeql contains both of them as sink?

@pwntester
Copy link
Contributor

Hi @amammad, as per @aibaars and @hmac suggestion, can you please add the new YAML/PLIST sinks into the existing rb/unsafe-deserialization query?

Hi, I want to know about the load method that is safe now, but still some repositories are using the lower versions of Psych that load is unsafe. should codeql contains both of them as sink?

In most of the cases we remove a sink when it is no longer vulnerable in some version since Repos that use the old version should listen to dependabot an upgrade (or turn on dependabot in the first place).

@am0o0
Copy link
Contributor Author

am0o0 commented Mar 1, 2023

Hi @amammad, as per @aibaars and @hmac suggestion, can you please add the new YAML/PLIST sinks into the existing rb/unsafe-deserialization query?

OK, I'm working on it now.

@am0o0
Copy link
Contributor Author

am0o0 commented Mar 1, 2023

I removed the YAML.load too, if I should not do that, please tell me to add this sink again.

@am0o0
Copy link
Contributor Author

am0o0 commented Mar 1, 2023

I have an important question, I want to submit bug slayer request for this query too, if i merged it in a non-experimental directory so all GitHub repositories that are using codeql will recognize the vulnerability and I can't really submit my bug slayer request as maintainers will fix the vulnerabilities according to codeql action alerts.

@pwntester
Copy link
Contributor

I have an important question, I want to submit bug slayer request for this query too, if i merged it in a non-experimental directory so all GitHub repositories that are using codeql will recognize the vulnerability and I can't really submit my bug slayer request as maintainers will fix the vulnerabilities according to codeql action alerts.

It will take some time to merge this PR and make it available for Code Scanning. In the meantime you can mass scan for this issue using MRVA and report any findings to the maintainers.

@@ -23,6 +24,27 @@ class Configuration extends TaintTracking::Configuration {

override predicate isSink(DataFlow::Node sink) { sink instanceof UnsafeDeserialization::Sink }

override predicate isAdditionalTaintStep(DataFlow::Node nodeFrom, DataFlow::Node nodeTo) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please define these as global taint steps

@am0o0
Copy link
Contributor Author

am0o0 commented Mar 1, 2023

I didn't change example files because i think it need to add test results too, so I will add them after final changes if you agree with this.

Copy link
Contributor

@hmac hmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general this looks like a great addition, but I think there's some cleaning up we could do to make it more mergeable.

@@ -0,0 +1,30 @@
/**
* add additional steps for to_ruby method of YAML/Psych library
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The file comment here is typically a bit more general, for example

/**
 * Provides modeling for the `YAML` and `Psych` libraries. */
 */


private class YamlParseStep extends AdditionalTaintStep {
override predicate step(DataFlow::Node pred, DataFlow::Node succ) {
exists(DataFlow::CallNode yaml_parser_methods |
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The preferred style is camelCase for variable names. We also tend to use singular nouns. So this should be yamlParserMethod.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you prefer following format

private class YamlParseStep extends AdditionalTaintStep {
  override predicate step(DataFlow::Node pred, DataFlow::Node succ) {
    exists(DataFlow::CallNode yamlParserMethod |
      yamlParserMethod =
        API::getTopLevelMember(["YAML", "Psych"]).getAMethodCall(["parse", "parse_stream"]) and
      (
        pred = yamlParserMethod.getArgument(0) or
        pred = yamlParserMethod.getKeywordArgument("yaml")
      ) and
      succ = yamlParserMethod.getAMethodCall("to_ruby")
      or
      yamlParserMethod = API::getTopLevelMember(["YAML", "Psych"]).getAMethodCall("parse_file") and
      (
        pred = yamlParserMethod.getArgument(0) or
        pred = yamlParserMethod.getKeywordArgument("filename")
      ) and
      succ = yamlParserMethod.getAMethodCall("to_ruby")
    )
  }
}

Or two separate exists quantifiers?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a single exists is preferable. It may provide a performance improvement as well, but I don't know enough to be sure of that.

private import codeql.ruby.DataFlow
private import codeql.ruby.ApiGraphs

private class YamlParseStep extends AdditionalTaintStep {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could add a doc comment to this class explaining the step we are adding here. Perhaps something like

A taint step related to the result of `YAML.parse` calls, or similar.
In the following example, this step will propagate taint from 
`source` to `sink`:

```rb
x = source
result = YAML.parse(x)
sink result.to_ruby

@@ -75,13 +75,41 @@ module UnsafeDeserialization {
}

/**
* An argument in a call to `YAML.load`, considered a sink
* An argument in a call to `YAML.unsafe_*` and `YAML.load_stream` , considered sinks
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to leave the YAML.load sink in here for now. We can consider removing it separately, and that will make it easier to analyse the impact of such a change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it look good?

  /**
   * An argument in a call to `YAML.unsafe_*` and `YAML.load_stream` , considered sinks
   * for unsafe deserialization. The `YAML` module is an alias of `Psych` in
   * recent versions of Ruby.
   * the `this = yamlNode().getAMethodCall("load").getArgument(0)` is safe
   *  in recent versions of YAML library, so it will be removed in future.
   */
  class YamlLoadArgument extends Sink {
    YamlLoadArgument() {
      this = yamlNode().getAMethodCall("load").getArgument(0)
      or
      this =
        yamlNode().getAMethodCall(["unsafe_load_file", "unsafe_load", "load_stream"]).getArgument(0)
      or
      this = yamlNode().getAMethodCall(["unsafe_load", "load_stream"]).getKeywordArgument("yaml")
      or
      this = yamlNode().getAMethodCall("unsafe_load_file").getKeywordArgument("filename")
    }
  }

  /**
   * YAML/Psych Top level Class member
   */
  private API::Node yamlNode() { result = API::getTopLevelMember(["YAML", "Psych"]) }

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That looks great - if you know it, can you mention the minimum version of the YAML library for which the load method is safe? That will help others to decide whether to make this change in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, now I mention to the exact version too.

.getAMethodCall("unsafe_load_file")
.getKeywordArgument("filename")
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could extract the API::getTopLevelMember(["YAML", "Psych"]) expression into a separate predicate, to reduce duplication a bit:

private API::Node yamlNode() {
  result = API::getTopLevelMember(["YAML", "Psych"])
}

Then in the charpred above:

this = yamlNode().getAMethodCall(...)

@@ -9,6 +9,7 @@
private import codeql.ruby.AST
private import codeql.ruby.DataFlow
private import codeql.ruby.TaintTracking
private import codeql.ruby.ApiGraphs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed?

* @problem.severity warning
* @security-severity 9.8
* @precision high
* @id rb/Plist-unsafe-deserialization
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @id rb/Plist-unsafe-deserialization
* @id rb/plist-unsafe-deserialization

* check whether an input argument has desired "key: value" input or not.
* borrowed from UnsafeDeserialization module with some changes
*/
predicate checkkeyBalue(CfgNodes::ExprNodes::PairCfgNode p, string key, string value) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
predicate checkkeyBalue(CfgNodes::ExprNodes::PairCfgNode p, string key, string value) {
predicate checkkeyValue(CfgNodes::ExprNodes::PairCfgNode p, string key, string value) {

plistParsexml.getNumberOfArguments() = 1
)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not great that we have this class duplicated. It would be better to find a way to keep the single class. For example, can we remove this query and update the existing Unsafe Deserialization qhelp to describe the plist vulnerability?

Copy link
Contributor Author

@am0o0 am0o0 Apr 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you right, I just kept my query files and added them to non-experimental libraries for better tracking. if you want to there is no new query file in experimental directory, it can be removed.

nodeTo = yaml_parser_methods.getAMethodCall("to_ruby")
)
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto with this file. The existing Unsafe Deserialization query should catch these cases now that you've added these sinks and the extra taint step, so do we need a separate query?

@am0o0 am0o0 requested a review from a team as a code owner April 24, 2023 08:49
@github-actions
Copy link
Contributor

github-actions bot commented Apr 27, 2023

QHelp previews:

ruby/ql/src/experimental/cwe-502/PlistUnsafeDeserialization.qhelp

Unsafe Deserialization of user-controlled data by Plist

Processing an unvalidated user input can allow an attacker to execute arbitrary code in your application. Unsafe deserializing the malicious serialized xml document through the Plist library, making it possible to execute some code or execute arbitrary code with the help of a complete gadget chain.

Recommendation

This vulnerability in Plist can be prevented by calling Plist.parse_xml FileOrXmlString, marshal: false.

Example

In the example below, you can see safe and unsafe Plist dangerous method calls that can be abused by a remote user input. You can use "marshal: false" as an arugument for Plist.parse_xml to use it safe.

require 'plist'
class UsersController < ActionController::Base
  def example
    # not safe
    config = true
    result = Plist.parse_xml(params[:yaml_string])
    result = Plist.parse_xml(params[:yaml_string], marshal: config)
    result = Plist.parse_xml(params[:yaml_string], marshal: true)

    # safe
    config = false
    result = Plist.parse_xml(params[:yaml_string], marshal: false)
    result = Plist.parse_xml(params[:yaml_string], marshal: config)
  end
end

References

ruby/ql/src/experimental/cwe-502/YAMLUnsafeDeserialization.qhelp

Unsafe Deserialization of user-controlled data by YAML

Processing an unvalidated user input can allow an attacker to execute arbitrary code in your application. Unsafe deserializing the malicious serialized yaml document through the Psych (YAML) library, making it possible to execute some code or execute arbitrary code with the help of a complete gadget chain.

Recommendation

After Psych(YAML) 4.0.0, the load method is same as safe_load method. This vulnerability can be prevented by using YAML.load (same as YAML.safe_load), YAML.load_file (same as YAML.safe_load_file) instead of YAML.unsafe_* methods. Be careful that YAML.load_stream don't use safe_load method, Also Be careful the to_ruby method of Psych get called on a trusted parsed (YAML.parse*) yaml document.

Example

In the example below, you can see safe and unsafe methods get called by a remote user input. You can give correct authorization to users, or you can use safe methods for loading yaml documents.

require 'yaml'
class UsersController < ActionController::Base
  def example
    # safe
    Psych.load(params[:yaml_string])
    Psych.load_file(params[:yaml_file])
    Psych.parse_stream(params[:yaml_string])
    Psych.parse(params[:yaml_string])
    Psych.parse_file(params[:yaml_file])
    # unsafe
    Psych.unsafe_load(params[:yaml_string])
    Psych.unsafe_load_file(params[:yaml_file])
    Psych.load_stream(params[:yaml_string])
    parse_output = Psych.parse_stream(params[:yaml_string])
    parse_output.to_ruby
    Psych.parse(params[:yaml_string]).to_ruby
    Psych.parse_file(params[:yaml_file]).to_ruby

  end
end

References

ruby/ql/src/queries/security/cwe-502/UnsafeDeserialization.qhelp

Deserialization of user-controlled data

Deserializing untrusted data using any method that allows the construction of arbitrary objects is easily exploitable and, in many cases, allows an attacker to execute arbitrary code.

Recommendation

Avoid deserialization of untrusted data if possible. If the architecture permits it, use serialization formats that cannot represent arbitrary objects. For libraries that support it, such as the Ruby standard library's JSON module, ensure that the parser is configured to disable deserialization of arbitrary objects.

YAML/Psych recommendation: After Psych(YAML) 4.0.0, the load method is same as safe_load method. This vulnerability can be prevented by using YAML.load (same as YAML.safe_load), YAML.load_file (same as YAML.safe_load_file) instead of YAML.unsafe_* methods. Be careful that YAML.load_stream don't use safe_load method, Also Be careful the to_ruby method of Psych get called on a trusted parsed (YAML.parse*) yaml document.

This vulnerability in Plist can be prevented by calling Plist.parse_xml FileOrXmlString, marshal: false.

Example

The following example calls the Marshal.load, JSON.load, YAML.load, and Oj.load methods on data from an HTTP request. Since these methods are capable of deserializing to arbitrary objects, this is inherently unsafe.

require 'json'
require 'yaml'
require 'oj'

class UserController < ActionController::Base
  def marshal_example
    data = Base64.decode64 params[:data]
    object = Marshal.load data
    # ...
  end

  def json_example
    object = JSON.load params[:json]
    # ...
  end

  def yaml_example
    object = YAML.load params[:yaml]
    # ...
  end

  def oj_example
    object = Oj.load params[:json]
    # ...
  end
end

In the example below, you can see safe and unsafe methods get called by a remote user input. You can give correct authorization to users, or you can use safe methods for loading yaml documents.

require 'yaml'
class UsersController < ActionController::Base
  def example
    # safe
    Psych.load(params[:yaml_string])
    Psych.load_file(params[:yaml_file])
    Psych.parse_stream(params[:yaml_string])
    Psych.parse(params[:yaml_string])
    Psych.parse_file(params[:yaml_file])
    # unsafe
    Psych.unsafe_load(params[:yaml_string])
    Psych.unsafe_load_file(params[:yaml_file])
    Psych.load_stream(params[:yaml_string])
    parse_output = Psych.parse_stream(params[:yaml_string])
    parse_output.to_ruby
    Psych.parse(params[:yaml_string]).to_ruby
    Psych.parse_file(params[:yaml_file]).to_ruby

  end
end

In the example below, you can see safe and unsafe Plist dangerous method calls that can be abused by a remote user input. You can use "marshal: false" as an arugument for Plist.parse_xml to use it safe.

require 'plist'
class UsersController < ActionController::Base
  def example
    # not safe
    config = true
    result = Plist.parse_xml(params[:yaml_string])
    result = Plist.parse_xml(params[:yaml_string], marshal: config)
    result = Plist.parse_xml(params[:yaml_string], marshal: true)

    # safe
    config = false
    result = Plist.parse_xml(params[:yaml_string], marshal: false)
    result = Plist.parse_xml(params[:yaml_string], marshal: config)
  end
end

Using JSON.parse and YAML.safe_load instead, as in the following example, removes the vulnerability. Similarly, calling Oj.load with any mode other than :object is safe, as is calling Oj.safe_load. Note that there is no safe way to deserialize untrusted data using Marshal.

require 'json'

class UserController < ActionController::Base
  def safe_json_example
    object = JSON.parse params[:json]
    # ...
  end

  def safe_yaml_example
    object = YAML.safe_load params[:yaml]
    # ...
  end

  def safe_oj_example
    object = Oj.load params[:yaml], { mode: :strict }
    # or
    object = Oj.safe_load params[:yaml]
    # ...
  end
end

References

@hmac
Copy link
Contributor

hmac commented May 26, 2023

@amammad sorry for the delay with this PR. I think we can merge it in after a few more changes. Do you mind if I make the changes myself? The credit will remain all yours, of course :)

@am0o0
Copy link
Contributor Author

am0o0 commented May 26, 2023

@amammad sorry for the delay with this PR. I think we can merge it in after a few more changes. Do you mind if I make the changes myself? The credit will remain all yours, of course :)

@hmac NP at all :)

@calumgrant
Copy link
Contributor

Closing in favour of #13307

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants