ENH: read_xml handling of bad lines #59384

Open
@davetapley

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I would like read_xml to be able to skip rows (elements) whose values cannot be parsed into the requested dtype, instead of raising.

E.g.

With:

<gage_rain id="" last_rpt="-999 -999" min_10="-999" min_30="-999" hour_1="-999" hour_3="-999" hour_6="-999" day_1="-999" day_3="-999" day_7="-999" day_30="-999" ytd="-999" null="-999" name="" lat=" -999" long="--999 " updated="2024-07-31 19:40:00" m1="-999" m2="-999" m3="-999" m4="-999" m5="-999" m6="-999" m7="-999" m8="-999" m9="-999" m10="-999" m11="-999" m12="-999"/>
<gage_rain id="470" last_rpt="2024-07-31 11:58:03" min_10="0.00" min_30="0.00" hour_1="0.00" hour_3="0.00" hour_6="0.00" day_1="0.00" day_3="0.00" day_7="0.67" day_30="1.93" ytd="12.25" null="-999" name="Lee Butte Precipitation" lat="34.83403" long="-111.53714" updated="2024-07-31 19:40:00" m1="1.93" m2="0.00" m3="1.45" m4="2.95" m5="1.54" m6="1.97" m7="0.86" m8="0.87" m9="0.00" m10="0.71" m11="2.87" m12="2.44"/>

If I:

import pandas as pd

# Pass dtype instances (Float32Dtype()), not the bare classes
dtype = {'id': str, 'lat': pd.Float32Dtype(), 'long': pd.Float32Dtype()}
df = pd.read_xml('fcdyc_alert_rain.xml', dtype=dtype)

I get:

  File "lib.pyx", line 2391, in pandas._libs.lib.maybe_convert_numeric
ValueError: Unable to parse string "--999 

Feature Description

The equivalent of #15122, but for read_xml.

Alternative Solutions

Call read_xml without the numeric dtype kwarg, then convert the affected columns manually on the resulting DataFrame.
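A minimal sketch of that workaround, not a definitive recipe. It assumes a trimmed-down copy of the two elements above wrapped in a hypothetical `<rain>` root (the real file's root element is not shown in this issue), reads every attribute as a string via `dtype=str` so that nothing is parsed up front, and uses `parser="etree"` only so that lxml is not required. The numeric columns are then coerced with `pd.to_numeric(errors="coerce")`, and rows that fail to parse are dropped.

```python
import io

import pandas as pd

# Stand-in for fcdyc_alert_rain.xml; the <rain> root is an assumption for illustration.
XML = """<rain>
  <gage_rain id="" name="" lat=" -999" long="--999 "/>
  <gage_rain id="470" name="Lee Butte Precipitation" lat="34.83403" long="-111.53714"/>
</rain>"""

# Read everything as strings so the unparseable value cannot raise during the read.
raw = pd.read_xml(io.StringIO(XML), parser="etree", dtype=str)

# Coerce the numeric columns; values that cannot be parsed become missing.
for col in ("lat", "long"):
    raw[col] = pd.to_numeric(raw[col], errors="coerce").astype("Float32")

# Drop the rows that failed to parse, approximating a "skip bad lines" mode.
df = raw.dropna(subset=["lat", "long"]).reset_index(drop=True)
print(df)
```

This keeps the bad rows visible in `raw` for inspection before they are dropped, which is roughly the behaviour the requested feature would make available at read time.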

Additional Context

No response
