Closed
Description
Issue tracking for new pandas.io.xml module (after merge: #39516):
To-Do
- BUG: Fix declaration not showing for `xml_declaration=True` when `pretty_print=False` for etree parser.
- CLN: Centralize docstrings to avoid repetition in `formats.py` and `frame.py`.
- CLN: When `etree` supports `pretty_print`, remove `xml.dom.minidom` reliance.
- TYP: Refactor code for type hints on `parse_doc` methods for optional dependency, `lxml`.
- TST: Add tests for edge cases (`ParserError`, `OSError`, `URLError`, etc.). See checklist in tests code.
- TST: Add more tests for `storage_options` (i.e., read/write to `pandas-test` S3 bucket).
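
The first To-Do item is about the XML declaration being dropped when pretty-printing is off. Below is a minimal sketch of the intended behavior using only the standard library. The helper name `serialize_xml` and its signature are hypothetical and are not the pandas implementation. The idea is to add the declaration in one place, independent of the pretty-printing branch.

```python
import xml.etree.ElementTree as ET
from xml.dom.minidom import parseString


def serialize_xml(root, xml_declaration=True, pretty_print=False, encoding="utf-8"):
    """Hypothetical helper: serialize an etree element to a string."""
    # Serialize without any declaration so it can be added in one place.
    body = ET.tostring(root, encoding="unicode")

    if pretty_print:
        # Pretty-print only the root element; calling toprettyxml() on the
        # element (not the document) avoids minidom's own declaration.
        body = parseString(body).documentElement.toprettyxml(indent="  ")

    if xml_declaration:
        # The declaration does not depend on the pretty_print branch.
        body = f"<?xml version='1.0' encoding='{encoding}'?>\n{body}"

    return body


root = ET.Element("data")
row = ET.SubElement(root, "row")
ET.SubElement(row, "shape").text = "square"

compact = serialize_xml(root, xml_declaration=True, pretty_print=False)
pretty = serialize_xml(root, xml_declaration=True, pretty_print=True)
```

Both `compact` and `pretty` start with the same declaration line. Only the body layout differs.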
Enhancements
- ENH: Add `parse_dates` and `dtype` converters similar to other IO methods.
- ENH: Add support for nullable dtypes in reading and exporting XML.
- ENH: Add `iterparse` for memory-efficient parsing of large XML. See etree iterparse and lxml iterparse.
- ENH: Add `xpath_vars` to pass `$` variables in `xpath` expression. See lxml xpath() method.
- ENH: Add `xsl_params` to pass values into XSLT script. See lxml stylesheet parameters.
- ENH: Add `prefix_cols` to specify which columns should have namespace prefixes.
- ENH: Add `nested` (bool) to write out nested node-sets for data frames with hierarchical columns or MultiIndex.
- ENH: Add `engine` for external processors for XPath and XSLT 2.0 and 3.0, XQuery, streaming, and others.
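
To illustrate the `iterparse` enhancement, here is a rough sketch of memory-efficient row extraction with the standard library's `xml.etree.ElementTree.iterparse`. The helper `iter_rows` and the sample document are assumptions made for illustration. This is not how pandas would necessarily expose the feature.

```python
import io
import xml.etree.ElementTree as ET


def iter_rows(source, row_tag):
    """Hypothetical helper: yield one dict per <row_tag> element."""
    # "end" events fire once an element and all its children are parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == row_tag:
            yield {child.tag: child.text for child in elem}
            # Drop the row's children and text so they can be freed
            # before the rest of the document is read.
            elem.clear()


xml_data = b"""<?xml version='1.0' encoding='utf-8'?>
<data>
  <row><shape>square</shape><sides>4</sides></row>
  <row><shape>triangle</shape><sides>3</sides></row>
</data>"""

rows = list(iter_rows(io.BytesIO(xml_data), "row"))
```

The resulting list of dicts could be passed to `pandas.DataFrame(rows)`. All values arrive as strings, so any `dtype` or `parse_dates` conversion would be a separate step.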