Pandas IO XML Issue Tracker

### Issue tracking for the new `pandas.io.xml` module (following the merge of #39516)

**To-Do**

- [x] BUG: Fix the XML declaration not being written when `xml_declaration=True` and `pretty_print=False` with the `etree` parser.
- [x] CLN: Centralize docstrings to avoid repetition in `formats.py` and `frame.py`.
- [ ] CLN: Once `etree` supports pretty printing, remove the reliance on `xml.dom.minidom`.
- [x] TYP: Refactor the `parse_doc` methods so type hints work with the optional `lxml` dependency.
- [ ] TST: Add tests for edge cases (`ParserError`, `OSError`, `URLError`, etc.). See the checklist in the test code.
- [ ] TST: Add more tests for `storage_options` (i.e., reading from and writing to the `pandas-test` S3 bucket).
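
The etree-only write path touched by the declaration fix and the `minidom` cleanup can be sketched with the standard library alone. This is a minimal sketch, not the pandas implementation: `rows_to_xml` and `XML_DECLARATION` are hypothetical names, and it assumes Python 3.9+, where `xml.etree.ElementTree.indent` provides pretty printing without an `xml.dom.minidom` round trip. The declaration is prepended in a step separate from pretty printing, so `pretty_print=False` cannot drop it.

```python
import xml.etree.ElementTree as ET

# Hypothetical constant: the declaration written ahead of the document body.
XML_DECLARATION = "<?xml version='1.0' encoding='utf-8'?>"


def rows_to_xml(rows, root_name="data", row_name="row",
                xml_declaration=True, pretty_print=True):
    """Hypothetical helper: serialize a list of dicts to XML using stdlib etree only."""
    root = ET.Element(root_name)
    for record in rows:
        row = ET.SubElement(root, row_name)
        for key, value in record.items():
            child = ET.SubElement(row, key)
            # Missing values are written as empty elements, e.g. <sides />.
            child.text = None if value is None else str(value)
    if pretty_print:
        # ET.indent (Python 3.9+) indents the tree in place; no minidom needed.
        ET.indent(root, space="  ")
    # encoding="unicode" returns a str and, by default, no declaration.
    body = ET.tostring(root, encoding="unicode")
    # Add the declaration independently of pretty_print so neither path loses it.
    if xml_declaration:
        return XML_DECLARATION + "\n" + body
    return body


print(rows_to_xml([{"shape": "square", "sides": 4},
                   {"shape": "circle", "sides": None}],
                  pretty_print=False))
```

With `pretty_print=False`, the call above prints the declaration on its own line followed by a single unindented `<data>` element.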

**Enhancements**

- [x] ENH: Add `parse_dates` and `dtype` converters, as in other IO methods.
- [x] ENH: Add support for nullable dtypes when reading and exporting XML.
- [x] ENH: Add `iterparse` for memory-efficient parsing of large XML. See [etree iterparse](https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.iterparse) and [lxml iterparse](https://lxml.de/3.2/parsing.html#iterparse-and-iterwalk).
- [ ] ENH: Add `xpath_vars` to pass `$` variables into the `xpath` expression. See [lxml xpath() method](https://lxml.de/xpathxslt.html#the-xpath-method).
- [ ] ENH: Add `xsl_params` to pass values into the XSLT stylesheet. See [lxml stylesheet parameters](https://lxml.de/xpathxslt.html#stylesheet-parameters).
- [ ] ENH: Add `prefix_cols` to specify which columns should have namespace prefixes.
- [ ] ENH: Add `nested` (bool) to write out nested node-sets for DataFrames with hierarchical columns or a MultiIndex.
- [ ] ENH: Add an `engine` option for external processors supporting XPath and XSLT 2.0/3.0, XQuery, streaming, and more.
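
The memory-saving idea behind the `iterparse` item can be illustrated with `xml.etree.ElementTree.iterparse` from the standard library. This is a hedged sketch, not the `read_xml` implementation or its signature: `iter_rows` and its `converters` argument are hypothetical, the converters only loosely mirror the `dtype` conversion step, and the sketch assumes a document without namespaces. Each row element is cleared after its values are copied out, and processed rows are detached from the root, so the in-memory tree stays small.

```python
import io
import xml.etree.ElementTree as ET


def iter_rows(source, row_tag, fields, converters=None):
    """Hypothetical helper: stream one dict per `row_tag` element.

    `source` is a filename or binary file object. Only the requested `fields`
    (matched anywhere below the row) are kept; namespaces are not handled.
    """
    converters = converters or {}
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the root element; keep a reference so
    # processed rows can be detached from it.
    _, root = next(context)
    for event, elem in context:
        if event != "end" or elem.tag != row_tag:
            continue
        row = {}
        for name in fields:
            child = elem.find(f".//{name}")
            value = child.text if child is not None else None
            # Apply an optional per-field converter (e.g. int), skipping missing values.
            if value is not None and name in converters:
                value = converters[name](value)
            row[name] = value
        yield row
        # Free the processed row and drop finished rows from the root.
        elem.clear()
        root.clear()


sample = io.BytesIO(b"<data><row><shape>square</shape><sides>4</sides></row></data>")
for record in iter_rows(sample, "row", ["shape", "sides"], converters={"sides": int}):
    print(record)  # {'shape': 'square', 'sides': 4}
```

Clearing the root after each row is a common `iterparse` idiom: elements the parser has already built keep their own children, so later rows still yield complete values.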

GitHub issue: #40131