
Commit 846e2a2

Deployed b7bd3d6 with MkDocs version: 1.6.0
1 parent 038c127 commit 846e2a2

File tree

3 files changed: +32 -2 lines


index.html

Lines changed: 31 additions & 1 deletion
@@ -703,8 +703,38 @@ <h1 id="scicode-a-research-coding-benchmark-curated-by-scientists">SciCode: A Re
 </p>
 
 <h2 id="introduction">Introduction</h2>
-<p>SciCode is a newly developed benchmark designed to evaluate the capabilities of language models (LMs) in generating code for solving realistic scientific research problems. It has a diverse coverage of <strong>6</strong> domains: Physics, Math, Material Science, Biology, and Chemistry. They span 16 diverse natural science sub-fields. Unlike previous benchmarks that consist of question-answer pairs, SciCode problems naturally factorize into multiple subproblems, each involving knowledge recall, reasoning, and code synthesis. In total, SciCode contains <strong>338</strong> subproblems decomposed from <strong>80</strong> challenging main problems, and it offers optional descriptions specifying useful scientific background information and scientist-annotated gold-standard solutions and test cases for evaluation.</p>
+<p>SciCode is a newly developed benchmark designed to evaluate the capabilities of language models (LMs) in generating code for solving realistic scientific research problems. It has a diverse coverage of <strong>6</strong> domains: Physics, Math, Material Science, Biology, and Chemistry. They span 16 diverse natural science sub-fields. Unlike previous benchmarks that consist of question-answer pairs, SciCode problems naturally factorize into multiple subproblems, each involving knowledge recall, reasoning, and code synthesis. In total, SciCode contains <strong>338</strong> subproblems decomposed from <strong>80</strong> challenging main problems, and it offers optional descriptions specifying useful scientific background information and scientist-annotated gold-standard solutions and test cases for evaluation. Claude3.5-Sonnet, the best-performing model among those tested, can solve only <strong>4.6%</strong> of the problems in the most realistic setting. </p>
 <h2 id="overview">Overview</h2>
+<table>
+<thead>
+<tr>
+<th><strong>Fields</strong></th>
+<th><strong>Subfields</strong></th>
+</tr>
+</thead>
+<tbody>
+<tr>
+<td><strong>Mathematics</strong></td>
+<td>Numerical Linear Algebra (7), Computational Mechanics (6), Computational Finance (1)</td>
+</tr>
+<tr>
+<td><strong>Physics</strong></td>
+<td>Condensed Matter Physics (13), Optics (10), Quantum Information/Computing (6), Computational Physics (5), Astrophysics (2), Particle Physics (1)</td>
+</tr>
+<tr>
+<td><strong>Chemistry</strong></td>
+<td>Quantum Chemistry (5), Computational Chemistry (3)</td>
+</tr>
+<tr>
+<td><strong>Biology</strong></td>
+<td>Ecology (6), Biochemistry (1), Genetics (1)</td>
+</tr>
+<tr>
+<td><strong>Material Science</strong></td>
+<td>Semiconductor Materials (7), Molecular Modeling (6)</td>
+</tr>
+</tbody>
+</table>
 <div class="grid cards">
 <ul>
 <li>

search/search_index.json

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"SciCode: A Research Coding Benchmark Curated by Scientists","text":"<p> Minyang Tian<sup>1,2*\u2021</sup>, Luyu Gao<sup>3*</sup>, Shizhuo Dylan Zhang<sup>1</sup>, Xinan Chen<sup>1\u2020</sup>, Cunwei Fan<sup>1\u2020</sup>, Xuefei Guo<sup>1\u2020</sup>, Roland Haas<sup>1\u2020</sup>, Pan Ji<sup>4\u2020</sup>, Kittithat Krongchon<sup>1\u2020</sup>, Yao Li<sup>1\u2020</sup>, Shengyan Liu<sup>1\u2020</sup>, Di Luo<sup>5,6,11\u2020</sup>, Yutao Ma<sup>7\u2020</sup>, Hao Tong<sup>1\u2020</sup>, Kha Trinh<sup>7\u2020</sup>, Chenyu Tian<sup>8\u2020</sup>, Zihan Wang<sup>1\u2020</sup>, Bohao Wu<sup>1\u2020</sup>, Yanyu Xiong<sup>9\u2020</sup>, Shengzhu Yin<sup>1\u2020</sup>, Minhui Zhu<sup>1\u2020</sup>, Kilian Lieret<sup>10</sup>, Yanxin Lu<sup>1</sup>, Genglin Liu<sup>1</sup>, Yufeng Du<sup>1</sup>, Tianhua Tao<sup>1</sup>, Ofir Press<sup>10</sup>, Jamie Callan<sup>3</sup>, Eliu Huerta<sup>1,2,7\u2021</sup>, Hao Peng<sup>1\u2021</sup> </p> <p> <sup>1</sup>University of Illinois Urbana-Champaign <sup>2</sup>Argonne National Laboratory <sup>3</sup>Carnegie Mellon University <sup>4</sup>University of North Carolina at Chapel Hill <sup>5</sup>Massachusetts Institute of Technology <sup>6</sup>Harvard University <sup>7</sup>University of Chicago <sup>8</sup>University of Texas at Austin <sup>9</sup>Stanford University <sup>10</sup>Princeton University <sup>11</sup>The NSF AI Institute for Artificial Intelligence and Fundamental Interactions </p> <p> * Equal contribution lead authors. \u2020 Data curation, alphabetical order. \u2021 Corresponding to: {mtian8, haopeng}@illinois.edu, [email protected] </p>"},{"location":"#introduction","title":"Introduction","text":"<p>SciCode is a newly developed benchmark designed to evaluate the capabilities of language models (LMs) in generating code for solving realistic scientific research problems. It has a diverse coverage of 6 domains: Physics, Math, Material Science, Biology, and Chemistry. They span 16 diverse natural science sub-fields. Unlike previous benchmarks that consist of question-answer pairs, SciCode problems naturally factorize into multiple subproblems, each involving knowledge recall, reasoning, and code synthesis. In total, SciCode contains 338 subproblems decomposed from 80 challenging main problems, and it offers optional descriptions specifying useful scientific background information and scientist-annotated gold-standard solutions and test cases for evaluation.</p>"},{"location":"#overview","title":"Overview","text":"<ul> <li> <p> Leaderboard</p> <p>How good are LMs at science, really?</p> <p> Browse the results</p> </li> <li> <p> Paper</p> <p>Learn all the details</p> <p> Read the paper</p> </li> </ul> <ul> <li> <p> Installation &amp; usage</p> <p>Learn how to evaluate your model</p> <p> Read the docs</p> </li> </ul>"},{"location":"_footer/","title":"footer","text":"<ul> <li> <p> Something broken? Report bug</p> </li> <li> <p> Something unclear? Ask question</p> </li> </ul>"},{"location":"leaderboard/","title":"Leaderboard","text":"<p> date author model score 240712 scicode gpt4 0.8 240712 scicode gpt4o 0.8 <p></p> <p>How to submit</p> <p>Want to submit your own model? Head over to the documentation.</p>"},{"location":"leaderboard_table/","title":"Leaderboard table","text":"date author model score 240712 scicode gpt4 0.8 240712 scicode gpt4o 0.8"}]}
+{"config":{"lang":["en"],"separator":"[\\s\\-]+","pipeline":["stopWordFilter"]},"docs":[{"location":"","title":"SciCode: A Research Coding Benchmark Curated by Scientists","text":"<p> Minyang Tian<sup>1,2*\u2021</sup>, Luyu Gao<sup>3*</sup>, Shizhuo Dylan Zhang<sup>1</sup>, Xinan Chen<sup>1\u2020</sup>, Cunwei Fan<sup>1\u2020</sup>, Xuefei Guo<sup>1\u2020</sup>, Roland Haas<sup>1\u2020</sup>, Pan Ji<sup>4\u2020</sup>, Kittithat Krongchon<sup>1\u2020</sup>, Yao Li<sup>1\u2020</sup>, Shengyan Liu<sup>1\u2020</sup>, Di Luo<sup>5,6,11\u2020</sup>, Yutao Ma<sup>7\u2020</sup>, Hao Tong<sup>1\u2020</sup>, Kha Trinh<sup>7\u2020</sup>, Chenyu Tian<sup>8\u2020</sup>, Zihan Wang<sup>1\u2020</sup>, Bohao Wu<sup>1\u2020</sup>, Yanyu Xiong<sup>9\u2020</sup>, Shengzhu Yin<sup>1\u2020</sup>, Minhui Zhu<sup>1\u2020</sup>, Kilian Lieret<sup>10</sup>, Yanxin Lu<sup>1</sup>, Genglin Liu<sup>1</sup>, Yufeng Du<sup>1</sup>, Tianhua Tao<sup>1</sup>, Ofir Press<sup>10</sup>, Jamie Callan<sup>3</sup>, Eliu Huerta<sup>1,2,7\u2021</sup>, Hao Peng<sup>1\u2021</sup> </p> <p> <sup>1</sup>University of Illinois Urbana-Champaign <sup>2</sup>Argonne National Laboratory <sup>3</sup>Carnegie Mellon University <sup>4</sup>University of North Carolina at Chapel Hill <sup>5</sup>Massachusetts Institute of Technology <sup>6</sup>Harvard University <sup>7</sup>University of Chicago <sup>8</sup>University of Texas at Austin <sup>9</sup>Stanford University <sup>10</sup>Princeton University <sup>11</sup>The NSF AI Institute for Artificial Intelligence and Fundamental Interactions </p> <p> * Equal contribution lead authors. \u2020 Data curation, alphabetical order. \u2021 Corresponding to: {mtian8, haopeng}@illinois.edu, [email protected] </p>"},{"location":"#introduction","title":"Introduction","text":"<p>SciCode is a newly developed benchmark designed to evaluate the capabilities of language models (LMs) in generating code for solving realistic scientific research problems. It has a diverse coverage of 6 domains: Physics, Math, Material Science, Biology, and Chemistry. They span 16 diverse natural science sub-fields. Unlike previous benchmarks that consist of question-answer pairs, SciCode problems naturally factorize into multiple subproblems, each involving knowledge recall, reasoning, and code synthesis. In total, SciCode contains 338 subproblems decomposed from 80 challenging main problems, and it offers optional descriptions specifying useful scientific background information and scientist-annotated gold-standard solutions and test cases for evaluation. Claude3.5-Sonnet, the best-performing model among those tested, can solve only 4.6% of the problems in the most realistic setting. </p>"},{"location":"#overview","title":"Overview","text":"Fields Subfields Mathematics Numerical Linear Algebra (7), Computational Mechanics (6), Computational Finance (1) Physics Condensed Matter Physics (13), Optics (10), Quantum Information/Computing (6), Computational Physics (5), Astrophysics (2), Particle Physics (1) Chemistry Quantum Chemistry (5), Computational Chemistry (3) Biology Ecology (6), Biochemistry (1), Genetics (1) Material Science Semiconductor Materials (7), Molecular Modeling (6) <ul> <li> <p> Leaderboard</p> <p>How good are LMs at science, really?</p> <p> Browse the results</p> </li> <li> <p> Paper</p> <p>Learn all the details</p> <p> Read the paper</p> </li> </ul> <ul> <li> <p> Installation &amp; usage</p> <p>Learn how to evaluate your model</p> <p> Read the docs</p> </li> </ul>"},{"location":"_footer/","title":"footer","text":"<ul> <li> <p> Something broken? Report bug</p> </li> <li> <p> Something unclear? Ask question</p> </li> </ul>"},{"location":"leaderboard/","title":"Leaderboard","text":"<p> date author model score 240712 scicode gpt4 0.8 240712 scicode gpt4o 0.8 <p></p> <p>How to submit</p> <p>Want to submit your own model? Head over to the documentation.</p>"},{"location":"leaderboard_table/","title":"Leaderboard table","text":"date author model score 240712 scicode gpt4 0.8 240712 scicode gpt4o 0.8"}]}

sitemap.xml.gz

0 Bytes changed. Binary file not shown.
