Skip to content

Ruby: add seperate additional steps between YAML.parse* methods and to_ruby #13431

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Feb 26, 2024

Conversation

am0o0
Copy link
Contributor

@am0o0 am0o0 commented Jun 12, 2023

related to this.
the changes summary:

  • do all the things totally with API graphs (not DataFlow::CallNode).
  • use getASuccessor*() before finding the to_ruby method, there are a lot of possibilities between parse/parse_stream/... methods and to_ruby like block parameters, elements of returned arrays.

new examples that pushed me to improve the query:

    parsed_yaml  = Psych.parse(params[:yaml_string])
    parsed_yaml.children.each do |child|
      puts child.to_ruby
    end
    Psych.parse_stream(params[:yaml_string])  do |document|
      puts document.to_ruby
    end

   # https://github.com/Shopify/krane/blob/0e1a3a0651ca4a0b234722ea98dae74ca8c65351/lib/krane/renderer.rb#L61-L67
    parsed_yaml.children.first.to_ruby

    # https://github.com/work-design/rails_extend/blob/c013aef0b7e769e37010464b0d6c2db8e83601d8/lib/rails_extend/yaml.rb#L10-L13
    parsed_yaml = Psych.parse_stream(params[:yaml_string])
    content = parsed_yaml.children[0].children[0].children
    parsed = parsed_yaml.to_ruby[0]
    parsed = content.to_ruby[0]

Also, It is really important to me ( more than my query submission to codeql BB program) to get feedback about using getASuccessor*() in case of performance and accuracy if there is only an unique end for Successors like here that is to_ruby, additionally I'm in the process to submit multiple queries in this week so it would be great to give me a hint about this and I'll submit another version of my queries with this approach too.

@asgerf
Copy link
Contributor

asgerf commented Jun 12, 2023

API::Node.getASuccessor is about to be deprecated, so I would appreciate it if new uses were not introduced, as it would mean I have to rewrite it later.

You mentioned in the PR description that there are many ways of accessing children of a YAML node. If you don't want to precisely model which methods are doing what, you could do something like:

API::Node yamlChildNodeAccess(API::Node source) {
  source = <return value from initial parse call>
  or
  result = yamlChildNodeAccess(source).getMethod(_).getReturn()
  or
  result = yamlChildNodeAccess(source).getMethod(_).getBlock().getParameter(_)
  or
  result = yamlChildNodeAccess(source).getAnElement()
}

@am0o0
Copy link
Contributor Author

am0o0 commented Jun 12, 2023

Thanks @asgerf

  1. I tested following, it seems that there is no bind between each source and the result and I got near 1000 results. because I want to source and result as additional step too so I must figure out how can i do that too.
API::Node yamlChildNodeAccess(API::Node source) {
  source = yamlNode().getMethod(["parse", "parse_stream"]).getReturn()
  or
  result = yamlChildNodeAccess(source).getMethod(_).getReturn()
  or
  result = yamlChildNodeAccess(source).getMethod(_).getBlock().getParameter(_)
  or
  result = yamlChildNodeAccess(source).getAnElement()
}
  1. I've used getASuccessor() in one of my javascript queries and I want to submit that soon, could you tell me which languages gonna remove getASuccessor().

@am0o0
Copy link
Contributor Author

am0o0 commented Jun 12, 2023

well,I think I was wrong, now it is working for me,
what about performance issues? I really worry about this.

exists(API::Node parseSuccessors | parseSuccessors = yamlChildNodeAccess(_) |
      succ =
        [
          parseSuccessors.getMethod("to_ruby").getReturn().asSource(),
          parseSuccessors.getMethod("to_ruby").getReturn().getAnElement().asSource()
        ] and
      pred = parseSuccessors.asSource()
    )

@asgerf
Copy link
Contributor

asgerf commented Jun 12, 2023

tested following, it seems that there is no bind between each source and the result and I got near 1000 results.

Sorry, the base case needs and source = result as well.

I've used getASuccessor() in one of my javascript queries and I want to submit that soon, could you tell me which languages gonna remove getASuccessor().

Eventually it will be deprecated in JS as well but it will be a while and we won't reject submissions that occurred before deprecation.

@am0o0
Copy link
Contributor Author

am0o0 commented Jun 12, 2023

  1. Well, I really don't want make this conversation long. I think the suggested predicate must check result that result can has prefix of a parse method, like following:
 API::Node yamlParseChildNodeAccess(API::Node source) {
  source = yamlNode().getMethod(["parse", "parse_stream"]).getReturn() and source = result
  or
  (
    result = yamlParseChildNodeAccess(source).getMethod(_).getReturn()
    or
    result = yamlParseChildNodeAccess(source).getMethod(_).getBlock().getParameter(_)
    or
    result = yamlParseChildNodeAccess(source).getAnElement()
  ) and
  result.getPath().matches(yamlNode().getMethod(["parse", "parse_stream"]).getPath() + "%")
}
  1. what about performance issues? I really worry about this. :)

@am0o0
Copy link
Contributor Author

am0o0 commented Jun 12, 2023

About 1 I was wrong, sorry to took your time!

@owen-mc owen-mc changed the title make seperate steps for YAML.parse* and use getAsuccessor*() to rea… Ruby: make seperate steps for YAML.parse* and use getAsuccessor*() to rea… Jun 13, 2023
@am0o0 am0o0 changed the title Ruby: make seperate steps for YAML.parse* and use getAsuccessor*() to rea… Ruby: add seperate additional steps between YAML.parse* methods and to_ruby Jun 13, 2023
@pwntester
Copy link
Contributor

@amammad would it make sense to add transform everywhere you are using to_ruby since there is an alias for it?

@am0o0
Copy link
Contributor Author

am0o0 commented Jun 15, 2023

Hi @pwntester
I found some instances of transform with GitHub search too. I can add this alias here if you are agree.

@asgerf asgerf added the no-change-note-required This PR does not need a change note label Jun 29, 2023
@calumgrant
Copy link
Contributor

Looks great, please fix the tests so we can merge this.

@calumgrant calumgrant added the awaiting-response The CodeQL team is awaiting further input or clarification from the original reporter of this issue. label Sep 11, 2023
}
}

predicate isAdditionalFlowStepTest(DataFlow::Node pred, DataFlow::Node succ) {
Copy link
Contributor Author

@am0o0 am0o0 Oct 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@hmac This predicate is the same as above just for testing, so the predicate works fine and I can see additional steps from the first argument of a yaml parse method like YAML.parse(arg) to a to_ruby method return value. but I don't know why it doesn't work in my StateConfigSig implementation.

@am0o0
Copy link
Contributor Author

am0o0 commented Oct 24, 2023

Hi, @calumgrant the query is ready for review.

@erik-krogh erik-krogh removed the awaiting-response The CodeQL team is awaiting further input or clarification from the original reporter of this issue. label Nov 20, 2023
…ch final to_ruby method call, All parts have Rewritten with API graphs exclusively
… and thank him that providing the solution
@hmac hmac force-pushed the amammad-ruby-YAMLunsafeLoad branch from 251fb99 to 32f5667 Compare February 26, 2024 12:16
Copy link
Contributor

QHelp previews:

ruby/ql/src/experimental/cwe-502/UnsafeYamlDeserialization.qhelp

Deserialization of user-controlled yaml data

Deserializing untrusted data using any method that allows the construction of arbitrary objects is easily exploitable and, in many cases, allows an attacker to execute arbitrary code.

Recommendation

If deserializing an untrusted YAML document using the psych gem, prefer the safe_load and safe_load_file methods over load and load_file, as the former will safely handle untrusted data. Avoid passing untrusted data to the load_stream method. In psych version 4.0.0 and above, the load method can safely be used.

Example

The following example calls the Marshal.load, JSON.load, YAML.load, and Oj.load methods on data from an HTTP request. Since these methods are capable of deserializing to arbitrary objects, this is inherently unsafe.

require 'yaml'

class UserController < ActionController::Base
  def yaml_example
    object = YAML.unsafe_load params[:yaml]
    object = YAML.load_stream params[:yaml]
    parsed_yaml = Psych.parse_stream(params[:yaml])
    
    # to_ruby is unsafe
    parsed_yaml.children.each do |child|
      object = child.to_ruby
    end
    object = Psych.parse(params[:yaml]).to_ruby
    # ...
  end
end

Using JSON.parse and YAML.safe_load instead, as in the following example, removes the vulnerability. Similarly, calling Oj.load with any mode other than :object is safe, as is calling Oj.safe_load. Note that there is no safe way to deserialize untrusted data using Marshal.

require 'yaml'

class UserController < ActionController::Base
  def safe_yaml_example
    object = YAML.load params[:yaml]
    object = Psych.load_file params[:yaml]
    object = YAML.safe_load params[:yaml]
    # ...
  end
end

References

Comment on lines 31 to 46
/**
* A taint step related to the result of `YAML.parse` calls, or similar.
* In the following example, this step will propagate taint from
* `source` to `sink`:
* this contains two seperate steps:
* ```rb
* x = source
* sink = YAML.parse(x)
* ```
* By second step
* source is a Successor of `YAML.parse(x)`
* which ends with `to_ruby` or an Element of `to_ruby`
* ```ruby
* sink source.to_ruby # Unsafe call
* ```
*/

Check warning

Code scanning / CodeQL

Misspelling

This comment contains the common misspelling 'seperate', which should instead be 'separate'.
Comment on lines 31 to 46
/**
* A taint step related to the result of `YAML.parse` calls, or similar.
* In the following example, this step will propagate taint from
* `source` to `sink`:
* this contains two seperate steps:
* ```rb
* x = source
* sink = YAML.parse(x)
* ```
* By second step
* source is a Successor of `YAML.parse(x)`
* which ends with `to_ruby` or an Element of `to_ruby`
* ```ruby
* sink source.to_ruby # Unsafe call
* ```
*/

Check warning

Code scanning / CodeQL

Predicate QLDoc style.

The QLDoc for a predicate without a result should start with 'Holds'.
@hmac hmac merged commit 6ff0054 into github:main Feb 26, 2024
@am0o0 am0o0 deleted the amammad-ruby-YAMLunsafeLoad branch September 14, 2024 11:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
documentation no-change-note-required This PR does not need a change note Ruby
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants