AWS_S3_HOST environment variable no longer available #26195

Closed
@jakobkogler

Description

In the What's New - Other enhancements section for the 0.18.0 release, it says that you can define the environment variable AWS_S3_HOST.

Therefore, when setting this environment variable together with AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, I expected the following code to work:

import pandas as pd
df = pd.DataFrame(dict(a=[1, 2, 3], b=[4, 5, 6]))
df.to_csv('s3://bucket/key')

The script crashes with:

Exception ignored in: <bound method S3File.__del__ of <S3File bucket/key>>
Traceback (most recent call last):
  File "/home/xxx/.local/share/virtualenvs/xxx/lib/python3.6/site-packages/s3fs/core.py", line 1518, in __del__
    self.close()
  File "/home/xxx/.local/share/virtualenvs/xxx/lib/python3.6/site-packages/s3fs/core.py", line 1496, in close
    raise_from(IOError('Write failed: %s' % self.path), e)
  File "<string>", line 3, in raise_from
OSError: Write failed: bucket/key

This is the exact same error I get when I don't specify AWS_S3_HOST at all.

I managed to get a working solution by interacting directly with s3fs:

import os
import pandas as pd
import s3fs

# Point s3fs directly at the custom endpoint instead of the AWS default.
fs = s3fs.S3FileSystem(client_kwargs={'endpoint_url': os.environ["AWS_S3_HOST"]})
df = pd.DataFrame(dict(a=[1, 2, 3], b=[4, 5, 6]))
df.to_csv(fs.open('s3://bucket/key', mode='w'))
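For anyone finding this later: newer pandas versions (1.2 and later) accept a storage_options argument on to_csv that is forwarded to s3fs, so the endpoint can be passed without constructing the filesystem by hand. A minimal sketch, assuming AWS_S3_HOST holds the full endpoint URL (e.g. http://minio.internal:9000):

import os
import pandas as pd

# storage_options is handed to s3fs; client_kwargs ends up on the
# underlying botocore client, so endpoint_url overrides the AWS host.
df = pd.DataFrame(dict(a=[1, 2, 3], b=[4, 5, 6]))
df.to_csv(
    's3://bucket/key',
    storage_options={'client_kwargs': {'endpoint_url': os.environ['AWS_S3_HOST']}},
)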

Looks like this feature was (accidentally?) removed in https://github.com/pandas-dev/pandas/commit/dc4b0708f36b971f71890bfdf830d9a5dc019c7b#diff-a37b395bed03f0404dec864a4529c97dL94, when pandas switched from boto to s3fs.
Is it planned to support this environment variable again?
I'm sure that many companies, like the one I work for, host their own S3 server (e.g. with MinIO) and don't use Amazon.
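For reference, a small wrapper along these lines (a hypothetical helper, not pandas API) would restore the old behaviour on top of s3fs, honouring AWS_S3_HOST when it is set and falling back to the default AWS endpoint otherwise:

import os
import s3fs

def make_s3_filesystem():
    # Hypothetical helper: mimic the removed boto-based behaviour by
    # honouring AWS_S3_HOST (assumed to be a full endpoint URL) when set.
    host = os.environ.get('AWS_S3_HOST')
    client_kwargs = {'endpoint_url': host} if host else {}
    return s3fs.S3FileSystem(client_kwargs=client_kwargs)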
