
Pandas IO XML Issue Tracker #40131

Closed
@ParfaitG

Description


Issue tracking for new pandas.io.xml module (after merge: #39516):

To-Do

  • BUG: Fix declaration not showing for xml_declaration=True when pretty_print=False for etree parser.
  • CLN: Centralize docstrings to avoid repetition in formats.py and frame.py.
  • CLN: When etree supports pretty_print, remove xml.dom.minidom reliance.
  • TYP: Refactor code for type hints on parse_doc methods for optional dependency, lxml.
  • TST: Add tests for edge cases (ParserError, OSError, URLError, etc.). See checklist in tests code.
  • TST: Add more tests for storage_options (i.e., read/write to pandas-test S3 bucket).
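
To make the declaration and pretty_print items above concrete, here is a minimal sketch using only the standard library, not the pandas.io.xml code itself. The element names (data, row, shape, sides) are illustrative assumptions. It shows compact output with an explicit declaration, pretty printing through xml.dom.minidom, and pretty printing through ET.indent, which is available from Python 3.9 and is one possible way to drop the minidom step.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Build a tiny document; the element names are illustrative assumptions.
root = ET.Element("data")
row = ET.SubElement(root, "row")
ET.SubElement(row, "shape").text = "square"
ET.SubElement(row, "sides").text = "4"

# Compact output: request the XML declaration explicitly (Python 3.8+).
compact = ET.tostring(root, encoding="utf-8", xml_declaration=True)

# Pretty output through minidom: serialize with etree, reparse, re-serialize.
pretty_minidom = minidom.parseString(ET.tostring(root)).toprettyxml(
    indent="  ", encoding="utf-8"
)

# Pretty output with etree alone: ET.indent (Python 3.9+) rewrites the
# whitespace of the tree in place, so it is called after the steps above.
ET.indent(root, space="  ")
pretty_etree = ET.tostring(root, encoding="utf-8", xml_declaration=True)

print(compact.decode("utf-8"))
print(pretty_minidom.decode("utf-8"))
print(pretty_etree.decode("utf-8"))
```

All three byte strings start with an XML declaration here; whether to_xml behaves the same way for parser="etree" depends on the pandas version in use.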

Enhancements

  • ENH: Add parse_dates and dtype converters similar to other IO methods.
  • ENH: Add support for nullable dtypes in reading and exporting XML.
  • ENH: Add iterparse for memory efficient parsing of large XML. See etree iterparse and lxml iterparse.
  • ENH: Add xpath_vars to pass $ variables in xpath expression. See lxml xpath() method.
  • ENH: Add xsl_params to pass values into XSLT script. See lxml stylesheet parameters.
  • ENH: Add prefix_cols to specify which columns should have namespace prefixes.
  • ENH: Add nested (bool) to write out nested node-sets for data frames with hierarchical columns or MultiIndex.
  • ENH: Add engine for external processors for XPath and XSLT 2.0 and 3.0, XQuery, streaming, and others.
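
As a rough illustration of the iterparse idea in the list above, the following sketch streams row elements with the standard library's xml.etree.ElementTree.iterparse and yields one dict per row. The helper name iter_rows, the row_tag argument, and the sample document are hypothetical and are not part of pandas; a real implementation would also need to handle attributes, namespaces, and nested elements.

```python
import io
import xml.etree.ElementTree as ET


def iter_rows(source, row_tag):
    """Yield one dict per <row_tag> element (hypothetical helper)."""
    # "end" events fire once an element and all of its children are parsed.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == row_tag:
            yield {child.tag: child.text for child in elem}
            # Release the row's children and text; the emptied element
            # itself stays attached to the root.
            elem.clear()


# Sample document; the structure is an assumption for illustration.
xml_bytes = b"""<?xml version='1.0' encoding='utf-8'?>
<data>
  <row><shape>square</shape><sides>4</sides></row>
  <row><shape>triangle</shape><sides>3</sides></row>
</data>"""

rows = list(iter_rows(io.BytesIO(xml_bytes), "row"))
print(rows)
```

The resulting list of dicts could then be passed to pandas.DataFrame(rows). All values arrive as strings, which is where the parse_dates and dtype items in the same list would come in.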
