BUG: read_stata ignores columns parameter and dtypes of empty dta files #46240

Closed
@sterlinm

Description

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

# create an empty DataFrame with int64 and float64 dtypes
df = pd.DataFrame(data={"a": range(3), "b": [1.0, 2.0, 3.0]}).head(0)

# write to Stata .dta file
df.to_stata('empty.dta', write_index=False, version=117)

# read one column of empty .dta file
df2 = pd.read_stata('empty.dta', columns=["a"])

# show dtypes of df2
df2.dtypes

Issue Description

A Stata .dta file with zero rows still carries type information, but pd.read_stata returns every column of an empty .dta file with object dtype. It also ignores the columns parameter and reads all of the columns.

Expected Behavior

In the above example, df2.dtypes should list only column a, with the numeric dtype stored in the file. Instead, it currently returns:

In [2]: df2.dtypes
Out[2]:
a    object
b    object
dtype: object
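
Until this is fixed, one possible workaround is to enforce the column selection and dtypes after reading. The sketch below is only an illustration: `read_stata_subset` and its `dtypes` argument are hypothetical names, not part of pandas, and the caller has to supply the expected dtypes because the affected versions do not recover them from an empty file.

```python
import pandas as pd


def read_stata_subset(path, columns, dtypes=None):
    """Hypothetical helper (not a pandas API): read a .dta file, then
    enforce the requested columns and, optionally, caller-supplied dtypes."""
    columns = list(columns)
    # Still pass columns= so non-empty files avoid loading unused columns.
    df = pd.read_stata(path, columns=columns)
    # Select again: for empty files, affected versions return every column.
    df = df[columns]
    if dtypes:
        # Cast only the requested columns that have a dtype supplied.
        df = df.astype({col: dtypes[col] for col in columns if col in dtypes})
    return df
```

For the file written above, `read_stata_subset('empty.dta', columns=['a'], dtypes={'a': 'int64'})` should give a frame with the single column `a`. The dtype mapping here is an assumption provided by the caller, not something read from the file.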

Installed Versions

Apologies, pd.show_versions() fails for some reason. I've included the traceback below; the pandas version is 1.4.1.

In [5]: pd.__version__
Out[5]: '1.4.1'
In [3]: pd.show_versions()
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Input In [3], in <module>
----> 1 pd.show_versions()

File ~/mambaforge/envs/py310/lib/python3.10/site-packages/pandas/util/_print_versions.py:109, in show_versions(as_json)
     94 """
     95 Provide useful information, important for bug reports.
     96
   (...)
    106     * If True, outputs info in JSON format to the console.
    107 """
    108 sys_info = _get_sys_info()
--> 109 deps = _get_dependency_info()
    111 if as_json:
    112     j = {"system": sys_info, "dependencies": deps}

File ~/mambaforge/envs/py310/lib/python3.10/site-packages/pandas/util/_print_versions.py:88, in _get_dependency_info()
     86 result: dict[str, JSONSerializable] = {}
     87 for modname in deps:
---> 88     mod = import_optional_dependency(modname, errors="ignore")
     89     result[modname] = get_version(mod) if mod else None
     90 return result

File ~/mambaforge/envs/py310/lib/python3.10/site-packages/pandas/compat/_optional.py:126, in import_optional_dependency(name, extra, errors, min_version)
    121 msg = (
    122     f"Missing optional dependency '{install_name}'. {extra} "
    123     f"Use pip or conda to install {install_name}."
    124 )
    125 try:
--> 126     module = importlib.import_module(name)
    127 except ImportError:
    128     if errors == "raise":

File ~/mambaforge/envs/py310/lib/python3.10/importlib/__init__.py:126, in import_module(name, package)
    124             break
    125         level += 1
--> 126 return _bootstrap._gcd_import(name[level:], package, level)

File <frozen importlib._bootstrap>:1050, in _gcd_import(name, package, level)

File <frozen importlib._bootstrap>:1027, in _find_and_load(name, import_)

File <frozen importlib._bootstrap>:1006, in _find_and_load_unlocked(name, import_)

File <frozen importlib._bootstrap>:688, in _load_unlocked(spec)

File <frozen importlib._bootstrap_external>:883, in exec_module(self, module)

File <frozen importlib._bootstrap>:241, in _call_with_frames_removed(f, *args, **kwds)

File ~/mambaforge/envs/py310/lib/python3.10/site-packages/setuptools/__init__.py:8, in <module>
      5 import os
      6 import re
----> 8 import _distutils_hack.override  # noqa: F401
     10 import distutils.core
     11 from distutils.errors import DistutilsOptionError

File ~/mambaforge/envs/py310/lib/python3.10/site-packages/_distutils_hack/override.py:1, in <module>
----> 1 __import__('_distutils_hack').do_override()

File ~/mambaforge/envs/py310/lib/python3.10/site-packages/_distutils_hack/__init__.py:72, in do_override()
     70 if enabled():
     71     warn_distutils_present()
---> 72     ensure_local_distutils()

File ~/mambaforge/envs/py310/lib/python3.10/site-packages/_distutils_hack/__init__.py:59, in ensure_local_distutils()
     57 # check that submodules load as expected
     58 core = importlib.import_module('distutils.core')
---> 59 assert '_distutils' in core.__file__, core.__file__
     60 assert 'setuptools._distutils.log' not in sys.modules
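
When pd.show_versions() raises like this, version information can still be gathered by hand. The following is a minimal sketch, assuming only the standard library: `collect_versions` is a hypothetical helper, and the default package list is an arbitrary choice. It reads installed distribution metadata rather than importing each package, so it should not trigger the setuptools import seen in the traceback.

```python
import platform
from importlib import metadata


def collect_versions(packages=("pandas", "numpy")):
    """Hypothetical fallback for when pd.show_versions() raises."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            # Reads the installed distribution's metadata; no import happens.
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            versions[name] = None
    return versions


print(collect_versions())
```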
