Updated doc/source/user_guide/io.rst with new usage.

RonaldBarnes · RonaldBarnes · commit 713b3d38527a · 2023-01-27T00:09:24.000-08:00
Signed-off-by: Ronald Barnes <ron@ronaldbarnes.ca>
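The field-slicing rules that the updated ``read_fwf`` docs describe can be sketched in plain Python. This is a minimal sketch, not the pandas implementation. ``widths_to_colspecs`` and ``split_fixed_width`` are hypothetical helpers. The stripping rule is an assumption based on the new parameter descriptions: when ``keep_whitespace`` is False, any of ``whitespace_chars`` is removed from both ends of each field.

```python
from itertools import accumulate


def widths_to_colspecs(widths):
    """Convert contiguous field widths into half-open (start, end) intervals."""
    ends = list(accumulate(widths))
    starts = [0] + ends[:-1]
    return list(zip(starts, ends))


def split_fixed_width(line, colspecs, keep_whitespace=True, whitespace_chars=" \t"):
    """Slice one line into fields and optionally strip filler characters.

    Hypothetical helper illustrating the documented behaviour of
    keep_whitespace / whitespace_chars. It is not the pandas code path.
    """
    fields = [line[start:end] for start, end in colspecs]
    if not keep_whitespace:
        # str.strip removes any run of the given characters from both ends,
        # so interior characters (e.g. the "_" in "test_1") are kept.
        fields = [field.strip(whitespace_chars) for field in fields]
    return fields


# Three rows from the example file written to "bar.dat" in the docs.
rows = [
    "Amy  BBYBC  38BC1052AF____test_1_____",
    "Bob  VANBC   7290603ED _ _test_2__ _ ",
    "ChrisVICBC  0005473D1B      N/A      ",
]
colspecs = widths_to_colspecs([5, 3, 2, 12, 15])
print(colspecs)  # [(0, 5), (5, 8), (8, 10), (10, 22), (22, 37)]
for row in rows:
    print(split_fixed_width(row, colspecs))
    print(split_fixed_width(row, colspecs, keep_whitespace=False, whitespace_chars=" _"))
```

With ``whitespace_chars=" _"``, the last field of the first row comes back as ``test_1``: leading and trailing underscores are removed, but the interior one is kept. The computed colspecs also show that the width ``12`` field starts at offset 10, not 11.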
diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst
@@ -1373,8 +1373,7 @@ Files with fixed width columns
 
 While :func:`read_csv` reads delimited data, the :func:`read_fwf` function works
 with data files that have known and fixed column widths. The function parameters
-to ``read_fwf`` are largely the same as ``read_csv`` with two extra parameters, and
-a different usage of the ``delimiter`` parameter:
+to ``read_fwf`` are largely the same as those of ``read_csv``, with four extra parameters:
 
 * ``colspecs``: A list of pairs (tuples) giving the extents of the
   fixed-width fields of each line as half-open intervals (i.e.,  [from, to[ ).
@@ -1383,47 +1382,65 @@ a different usage of the ``delimiter`` parameter:
   behavior, if not specified, is to infer.
 * ``widths``: A list of field widths which can be used instead of 'colspecs'
   if the intervals are contiguous.
-* ``delimiter``: Characters to consider as filler characters in the fixed-width file.
-  Can be used to specify the filler character of the fields
-  if it is not spaces (e.g., '~').
+* ``keep_whitespace``: A boolean (default True) controlling whether leading and
+  trailing whitespace is kept in each field; set it to False to strip it.
+* ``whitespace_chars``: A string of characters to treat as whitespace when
+  ``keep_whitespace`` is False. Defaults to the space and tab characters.
 
 Consider a typical fixed-width data file:
 
 .. ipython:: python
 
    data1 = (
-       "id8141    360.242940   149.910199   11950.7\n"
-       "id1594    444.953632   166.985655   11788.4\n"
-       "id1849    364.136849   183.628767   11806.2\n"
-       "id1230    413.836124   184.375703   11916.8\n"
-       "id1948    502.953953   173.237159   12468.3"
+       "Amy  BBYBC  38BC1052AF____test_1_____\n"
+       "Bob  VANBC   7290603ED _ _test_2__ _ \n"
+       "ChrisVICBC  0005473D1B      N/A      \n"
+       "Dave KAMBC    315395AC        $150.00\n"
    )
-   with open("bar.csv", "w") as f:
+   with open("bar.dat", "w") as f:
        f.write(data1)
 
 In order to parse this file into a ``DataFrame``, we simply need to supply the
-column specifications to the ``read_fwf`` function along with the file name:
+column specifications (or widths) to the ``read_fwf`` function along with the file name:
 
 .. ipython:: python
 
-   # Column specifications are a list of half-intervals
-   colspecs = [(0, 6), (8, 20), (21, 33), (34, 43)]
-   df = pd.read_fwf("bar.csv", colspecs=colspecs, header=None, index_col=0)
+   df = pd.read_fwf(
+       "bar.dat",
+       # Half-open colspecs equivalent: [(0, 5), (5, 8), (8, 10), (10, 22), (22, 37)]
+       widths=[5, 3, 2, 12, 15],
+       names=["fname", "city", "prov", "month_$_flags", "test_whitespace"],
+       header=None,
+       index_col=None,
+       # Do not convert "N/A" to NaN:
+       keep_default_na=False,
+   )
    df
+   df.values
 
-Note how the parser automatically picks column names X.<column number> when
-``header=None`` argument is specified. Alternatively, you can supply just the
-column widths for contiguous columns:
+Note that the ``names`` are used as column names; alternatively, column names
+can be read from the first row of the file with ``header=0``. If neither is
+given (``header=None`` without ``names``), the parser assigns column numbers.
 
-.. ipython:: python
+Also note that whitespace has been preserved inside the fields. To remove leading
+and trailing whitespace from fields, use ``keep_whitespace=False`` and, optionally,
+specify ``whitespace_chars`` if the filler differs from the default (space and tab):
 
-   # Widths are a list of integers
-   widths = [6, 14, 13, 10]
-   df = pd.read_fwf("bar.csv", widths=widths, header=None)
-   df
+.. ipython:: python
 
-The parser will take care of extra white spaces around the columns
-so it's ok to have extra separation between the columns in the file.
+   df = pd.read_fwf(
+       "bar.dat",
+       # Half-open colspecs equivalent: [(0, 5), (5, 8), (8, 10), (10, 22), (22, 37)]
+       widths=[5, 3, 2, 12, 15],
+       names=["fname", "city", "prov", "month_$_flags", "test_whitespace"],
+       header=None,
+       index_col=None,
+       # Do not convert "N/A" to NaN:
+       keep_default_na=False,
+       keep_whitespace=False,
+       whitespace_chars=" _",
+   )
+   df.values
 
 By default, ``read_fwf`` will try to infer the file's ``colspecs`` by using the
 first 100 rows of the file. It can do it only in cases when the columns are
@@ -1432,16 +1449,16 @@ is whitespace).
 
 .. ipython:: python
 
-   df = pd.read_fwf("bar.csv", header=None, index_col=0)
+   df = pd.read_fwf("bar.dat", header=None, index_col=0)
    df
 
 ``read_fwf`` supports the ``dtype`` parameter for specifying the types of
 parsed columns to be different from the inferred type.
 
 .. ipython:: python
 
-   pd.read_fwf("bar.csv", header=None, index_col=0).dtypes
-   pd.read_fwf("bar.csv", header=None, dtype={2: "object"}).dtypes
+   pd.read_fwf("bar.dat", header=None, index_col=0).dtypes
+   pd.read_fwf("bar.dat", header=None, dtype={2: "object"}).dtypes
 
 .. ipython:: python
    :suppress: