Commit 713b3d3

Updated doc/source/user_guide/io.rst with new usage.
Signed-off-by: Ronald Barnes <[email protected]>
1 parent 997e2e9 commit 713b3d3

1 file changed: +45 -28 lines changed

doc/source/user_guide/io.rst

Lines changed: 45 additions & 28 deletions
@@ -1373,8 +1373,7 @@ Files with fixed width columns
 
 While :func:`read_csv` reads delimited data, the :func:`read_fwf` function works
 with data files that have known and fixed column widths. The function parameters
-to ``read_fwf`` are largely the same as ``read_csv`` with two extra parameters, and
-a different usage of the ``delimiter`` parameter:
+to ``read_fwf`` are largely the same as ``read_csv`` with four extra parameters:
 
 * ``colspecs``: A list of pairs (tuples) giving the extents of the
   fixed-width fields of each line as half-open intervals (i.e., [from, to[ ).
@@ -1383,47 +1382,65 @@ a different usage of the ``delimiter`` parameter:
   behavior, if not specified, is to infer.
 * ``widths``: A list of field widths which can be used instead of 'colspecs'
   if the intervals are contiguous.
-* ``delimiter``: Characters to consider as filler characters in the fixed-width file.
-  Can be used to specify the filler character of the fields
-  if it is not spaces (e.g., '~').
+* ``keep_whitespace``: A boolean (default True) for explicit handling of whitespace
+  from fields / columns.
+* ``whitespace_chars``: A string of characters to treat as whitespace when
+  ``keep_whitespace`` is False. Defaults to [space] and [tab] characters.
 
 Consider a typical fixed-width data file:
 
 .. ipython:: python
 
    data1 = (
-       "id8141 360.242940 149.910199 11950.7\n"
-       "id1594 444.953632 166.985655 11788.4\n"
-       "id1849 364.136849 183.628767 11806.2\n"
-       "id1230 413.836124 184.375703 11916.8\n"
-       "id1948 502.953953 173.237159 12468.3"
+       "Amy BBYBC 38BC1052AF____test_1_____\n"
+       "Bob VANBC 7290603ED _ _test_2__ _ \n"
+       "ChrisVICBC 0005473D1B N/A \n"
+       "Dave KAMBC 315395AC $150.00\n"
    )
-   with open("bar.csv", "w") as f:
+   with open("bar.dat", "w") as f:
        f.write(data1)
 
 In order to parse this file into a ``DataFrame``, we simply need to supply the
-column specifications to the ``read_fwf`` function along with the file name:
+column specifications (or widths) to the ``read_fwf`` function along with the file name:
 
 .. ipython:: python
 
-   # Column specifications are a list of half-intervals
-   colspecs = [(0, 6), (8, 20), (21, 33), (34, 43)]
-   df = pd.read_fwf("bar.csv", colspecs=colspecs, header=None, index_col=0)
+   df = pd.read_fwf("bar.dat",
+       # Column specifications are a list of half-intervals
+       # colspecs=[(0,5), (5, 8), (8,10), (11,22), (22,37)],
+       widths=[5,3,2,12,15],
+       names=["fname", "city", "prov", "month_$_flags","test_whitespace"],
+       header=None,
+       index_col=None,
+       ## Do not convert "N/A" to NaN:
+       keep_default_na=False,
+   )
    df
+   df.values
 
-Note how the parser automatically picks column names X.<column number> when
-``header=None`` argument is specified. Alternatively, you can supply just the
-column widths for contiguous columns:
+Note the ``names`` are used as column names, however the column names can be
+retrieved from the first row of data with the ``header=0`` option.
+Otherwise, the parser automatically assigns column numbers as column names.
 
-.. ipython:: python
+Also note the whitespace has been preserved inside the fields. To remove whitespace
+from the beginning and ending of fields, use ``keep_whitespace=False`` and, optionally
+specify ``whitespace_chars`` if other than default ([space] and [tab] characters):
 
-   # Widths are a list of integers
-   widths = [6, 14, 13, 10]
-   df = pd.read_fwf("bar.csv", widths=widths, header=None)
-   df
+.. ipython:: python
 
-The parser will take care of extra white spaces around the columns
-so it's ok to have extra separation between the columns in the file.
+   df = pd.read_fwf("bar.dat",
+       # Column specifications are a list of half-intervals
+       # colspecs=[(0,5), (5, 8), (8,10), (11,22), (22,37)],
+       widths=[5,3,2,12,15],
+       names=["fname", "city", "prov", "month_$_flags","test_whitespace"],
+       header=None,
+       index_col=None,
+       ## Do not convert "N/A" to NaN:
+       keep_default_na=False,
+       keep_whitespace=False,
+       whitespace_chars=" _",
+   )
+   df.values
 
 By default, ``read_fwf`` will try to infer the file's ``colspecs`` by using the
 first 100 rows of the file. It can do it only in cases when the columns are
@@ -1432,16 +1449,16 @@ is whitespace).
 
 .. ipython:: python
 
-   df = pd.read_fwf("bar.csv", header=None, index_col=0)
+   df = pd.read_fwf("bar.dat", header=None, index_col=0)
    df
 
 ``read_fwf`` supports the ``dtype`` parameter for specifying the types of
 parsed columns to be different from the inferred type.
 
 .. ipython:: python
 
-   pd.read_fwf("bar.csv", header=None, index_col=0).dtypes
-   pd.read_fwf("bar.csv", header=None, dtype={2: "object"}).dtypes
+   pd.read_fwf("bar.dat", header=None, index_col=0).dtypes
+   pd.read_fwf("bar.dat", header=None, dtype={2: "object"}).dtypes
 
 .. ipython:: python
    :suppress:
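
The relationship between ``widths`` and half-open ``colspecs``, and the effect of stripping filler characters from both ends of each field, can be sketched in plain Python. This is only an illustration under stated assumptions, not pandas' implementation: the helper names ``widths_to_colspecs`` and ``split_fixed_width`` and the sample line are hypothetical, and ``str.strip`` merely mimics what the diff describes for ``keep_whitespace=False`` with ``whitespace_chars=" _"``.

```python
from itertools import accumulate


def widths_to_colspecs(widths):
    """Turn contiguous field widths into half-open (start, end) pairs."""
    ends = list(accumulate(widths))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


def split_fixed_width(line, colspecs, strip_chars=None):
    """Slice one line by colspecs; optionally strip filler from both ends of each field."""
    fields = [line[start:end] for start, end in colspecs]
    if strip_chars is not None:
        # Assumption: stripping is applied per field, at both ends only.
        fields = [field.strip(strip_chars) for field in fields]
    return fields


widths = [5, 3, 2, 12, 15]
colspecs = widths_to_colspecs(widths)
print(colspecs)  # [(0, 5), (5, 8), (8, 10), (10, 22), (22, 37)]

# Hypothetical sample line, built field by field to match the widths above
# (it is not copied from bar.dat).
line = "Amy  " + "BBY" + "BC" + " 38BC1052AF " + "____test_1_____"

print(split_fixed_width(line, colspecs))
# ['Amy  ', 'BBY', 'BC', ' 38BC1052AF ', '____test_1_____']
print(split_fixed_width(line, colspecs, strip_chars=" _"))
# ['Amy', 'BBY', 'BC', '38BC1052AF', 'test_1']
```

Note that ``widths=[5, 3, 2, 12, 15]`` corresponds to a fourth field of ``(10, 22)``, whereas the commented-out ``colspecs`` in the diff starts that field at 11, so the two specifications are not exactly equivalent. Underscores inside a field (``test_1``) survive because only leading and trailing characters are stripped.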
