Skip to content

Ruby: Add YAML unsafe deserialization sinks #13307

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
Jun 12, 2023

Conversation

hmac
Copy link
Contributor

@hmac hmac commented May 27, 2023

This is a duplicate of #12301 which I've opened to make some changes before merging (the original is from a contributor's fork).

@github-actions
Copy link
Contributor

github-actions bot commented May 27, 2023

QHelp previews:

ruby/ql/src/queries/security/cwe-502/UnsafeDeserialization.qhelp

Deserialization of user-controlled data

Deserializing untrusted data using any method that allows the construction of arbitrary objects is easily exploitable and, in many cases, allows an attacker to execute arbitrary code.

Recommendation

Avoid deserialization of untrusted data if possible. If the architecture permits it, use serialization formats that cannot represent arbitrary objects. For libraries that support it, such as the Ruby standard library's JSON module, ensure that the parser is configured to disable deserialization of arbitrary objects.

If deserializing an untrusted YAML document using the psych gem, prefer the safe_load and safe_load_file methods over load and load_file, as the former will safely handle untrusted data. Avoid passing untrusted data to the load_stream method. In psych version 4.0.0 and above, the load method can safely be used.

To safely deserialize Property List files using the plist gem, ensure that you pass marshal: false when calling Plist.parse_xml.

Example

The following example calls the Marshal.load, JSON.load, YAML.load, and Oj.load methods on data from an HTTP request. Since these methods are capable of deserializing to arbitrary objects, this is inherently unsafe.

require 'json'
require 'yaml'
require 'oj'

class UserController < ActionController::Base
  def marshal_example
    data = Base64.decode64 params[:data]
    object = Marshal.load data
    # ...
  end

  def json_example
    object = JSON.load params[:json]
    # ...
  end

  def yaml_example
    object = YAML.load params[:yaml]
    # ...
  end

  def oj_example
    object = Oj.load params[:json]
    # ...
  end
end

Using JSON.parse and YAML.safe_load instead, as in the following example, removes the vulnerability. Similarly, calling Oj.load with any mode other than :object is safe, as is calling Oj.safe_load. Note that there is no safe way to deserialize untrusted data using Marshal.

require 'json'

class UserController < ActionController::Base
  def safe_json_example
    object = JSON.parse params[:json]
    # ...
  end

  def safe_yaml_example
    object = YAML.safe_load params[:yaml]
    # ...
  end

  def safe_oj_example
    object = Oj.load params[:yaml], { mode: :strict }
    # or
    object = Oj.safe_load params[:yaml]
    # ...
  end
end

References

@hmac hmac marked this pull request as ready for review May 29, 2023 04:32
@hmac hmac requested a review from a team as a code owner May 29, 2023 04:32
* sink result.to_ruby # Unsafe call
* ```
*/
private class YamlParseStep extends AdditionalTaintStep {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to #12301 (comment)
I didn't write following predicate correctly, I think it should be two separate steps:
1.

 x = source
sink = YAML.parse(x)
sink = source.to_ruby 

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about that - the second step should only apply when the return value of YAML.parse(x) flows to source, right? We could add to_ruby as a global taint-preserving step but I think that could result in false positives where some other class defines a to_ruby method. I've requested a review from others in the team; we'll see what they say.

Copy link
Contributor

@am0o0 am0o0 Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes second one can have FP,
please consider the following example too, if you like I can try to write an additional step for such a pattern too:
https://github.com/Shopify/krane/blob/0e1a3a0651ca4a0b234722ea98dae74ca8c65351/lib/krane/renderer.rb#L61-L67

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding a separate taint step for the to_rubymethod makes sense. However, let's leave that for a followup pull request in which we can investigate the impact a bit. I would expect to_ruby methods from other libraries to be mostly taint-preserving too, but we might be mistaken.

@calumgrant calumgrant requested a review from aibaars June 6, 2023 08:19
Copy link
Contributor

@aibaars aibaars left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Thanks @amammad and @hmac .

@am0o0
Copy link
Contributor

am0o0 commented Jun 8, 2023

Hi, I want to do some changes, can I commit here or I should wait for merge and open another pull request after merge?

  • It is really important to me ( more than my query submission to codeql BB program) to get feedback about using getASuccessor*() here in case of performance and accuracy, additionally I'm in the process to submit multiple queries in this week so it would be great to submit another version of my queries with this approach too.

the changes summary:
I changed these additional steps because:

  1. I wanted to do all the things totally with API graphs (not DataFlow::CallNode).
  2. I need to use getASuccessor*() before finding the "to_ruby" method, there are a lot of possibilities between parse/parse_stream/... methods and to_ruby like block parameters, elements of returned arrays.

new examples that pushed me to improve the query:

    parsed_yaml  = Psych.parse(params[:yaml_string])
    parsed_yaml.children.each do |child|
      puts child.to_ruby
    end
    Psych.parse_stream(params[:yaml_string])  do |document|
      puts document.to_ruby
    end

   # https://github.com/Shopify/krane/blob/0e1a3a0651ca4a0b234722ea98dae74ca8c65351/lib/krane/renderer.rb#L61-L67
    parsed_yaml.children.first.to_ruby

    # https://github.com/work-design/rails_extend/blob/c013aef0b7e769e37010464b0d6c2db8e83601d8/lib/rails_extend/yaml.rb#L10-L13
    parsed_yaml = Psych.parse_stream(params[:yaml_string])
    content = parsed_yaml.children[0].children[0].children
    parsed = parsed_yaml.to_ruby[0]
    parsed = content.to_ruby[0]
    Psych.parse(params[:yaml_string]).children[0].to_ruby

changes of https://github.com/github/codeql/blob/e70e3e52dcf7456465fdc6f59e687f08293c40d2/ruby/ql/lib/codeql/ruby/frameworks/Yaml.qll

private class YamlParseStep extends AdditionalTaintStep {
  override predicate step(DataFlow::Node pred, DataFlow::Node succ) {
    exists(API::Node yamlParserMethod |
      succ = yamlParserMethod.getASuccessor*().getMethod("to_ruby").asSource() and
      (
        yamlParserMethod = yamlNode().getMethod(["parse", "parse_stream"]) and
        pred =
          [yamlParserMethod.getParameter(0), yamlParserMethod.getKeywordParameter("yaml")].asSink()
        or
        yamlParserMethod = yamlNode().getMethod("parse_file") and
        pred =
          [yamlParserMethod.getParameter(0), yamlParserMethod.getKeywordParameter("filename")]
              .asSink()
      )
    )
  }
}

the related sinks:

class YamlParseArgument extends Sink {
    YamlParseArgument() {
      this =
        yamlNode()
            .getMethod(["parse", "parse_stream", "parse_file"])
            .getASuccessor*()
            .getAMethodCall("to_ruby")
    }
  }

@am0o0
Copy link
Contributor

am0o0 commented Jun 8, 2023

also we can solve the FP of this by following which I think we can find any to_ruby correctly without conflict with other to_ruby:

private class YamlParseStep extends AdditionalTaintStep {
  override predicate step(DataFlow::Node pred, DataFlow::Node succ) {
    exists(API::Node yamlParserMethod |
      succ = yamlParserMethod.getReturn().asSource() and
      (
        yamlParserMethod = yamlNode().getMethod(["parse", "parse_stream"]) and
        pred =
          [yamlParserMethod.getParameter(0), yamlParserMethod.getKeywordParameter("yaml")].asSink()
        or
        yamlParserMethod = yamlNode().getMethod("parse_file") and
        pred =
          [yamlParserMethod.getParameter(0), yamlParserMethod.getKeywordParameter("filename")]
              .asSink()
      )
    )
    or
    exists(API::Node yamlParserMethod |
      succ =
        [
          yamlParserMethod.getASuccessor*().getMethod("to_ruby").getReturn().asSource(),
          yamlParserMethod
              .getASuccessor*()
              .getMethod("to_ruby")
              .getReturn()
              .getAnElement()
              .asSource()
        ] and
      yamlParserMethod = yamlNode().getMethod(["parse", "parse_stream", "parse_file"]) and
      pred = yamlParserMethod.getReturn().asSource()
    )
  }
}

@aibaars
Copy link
Contributor

aibaars commented Jun 12, 2023

Hi, I want to do some changes, can I commit here or I should wait for merge and open another pull request after merge?

The changes you propose sound like a good idea. However, let's merge this first and make a new pull request for further improvements. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants