Skip to content

pd.stats.api.ols inconsistent estimates #6874

Closed
@edwinhu

Description

@edwinhu

I am running into an issue trying to run OLS using pandas 0.13.1.

Here is a simple example: I want to regress a variable on itself, in this case excess returns. The intercept should be 0, and the coefficient should be 1. pandas provides the wrong estimates, while statsmodels gives the correct estimates.

This is not due to the silly regression specification, as I have noticed the pandas.ols estimates are inconsistent for other specifications as well.

Has anyone else encountered this problem?

import pandas as pd
import statsmodels.formula.api

In [1]: pd.ols(y=test.exret,x=test.exret).beta
Out[1]: 
x            0.003107
intercept    0.006438
dtype: float64

In [2]: sm.ols(formula="exret ~ exret", data=test).fit().params
Out[2]: 
Intercept   -3.469447e-18
exret        1.000000e+00
dtype: float64

In [3]: pd.show_versions()

INSTALLED VERSIONS
------------------
commit: None
python: 2.7.6.final.0
python-bits: 64
OS: Linux
OS-release: 3.11.0-19-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8

pandas: 0.13.1
Cython: 0.20.1
numpy: 1.8.0
scipy: 0.13.3
statsmodels: 0.5.0
IPython: 2.0.0-dev
sphinx: 1.2.1
patsy: 0.2.1
scikits.timeseries: None
dateutil: 1.5
pytz: 2013b
bottleneck: None
tables: 3.1.0
numexpr: 2.3.1
matplotlib: 1.3.1
openpyxl: 1.8.2
xlrd: 0.9.2
xlwt: 0.7.5
xlsxwriter: 0.5.2
sqlalchemy: 0.9.2
lxml: 3.3.1
bs4: 4.3.1
html5lib: None
bq: None
apiclient: None

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions