ENH: Add if_sheet_exists parameter to ExcelWriter #40231

mirober · 2021-03-04T19:18:52Z

closes ENH: Allow overwriting existing sheets when appending to excel files #40230
tests added / passed
Ensure all linting tests pass, see here for how to run them
whatsnew entry

pep8speaks · 2021-03-04T19:19:01Z

Hello @mrob95! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-04-21 19:55:25 UTC


rhshadrach

Thanks for the PR! Looking good, some comments below. Also, can you add a test in test_writers for attempting to use if_sheet_exists with engines other than openpyxl?

pandas/io/excel/_openpyxl.py

pandas/tests/io/excel/test_openpyxl.py

pandas/tests/io/excel/test_writers.py

WillAyd · 2021-03-12T22:32:25Z

pandas/io/excel/_base.py

+
+        * new: Create a new sheet, with a name determined by the engine.
+        * replace: Delete the contents of the sheet before writing to it.
+        * overwrite: Write directly to the named sheet


What is the use case for overwrite?

Published statistics, at least in the UK, often have sheets which combine headings, date, description, data quality notes, etc with data tables. To automate something like this you would probably have a pre-written template and then write your data from pandas into specific sheets at specific locations.

An example I happened to be looking at recently: England's daily, weekly and monthly vaccination figures.

@mrob95 - does the current implementation in this PR overwrite the template formatting? E.g. if a column in the template is formatted as a percentage and I have a DataFrame with 0.5, will it be displayed in Excel as 50%?

@rhshadrach It doesn't overwrite cell formatting unless there is an alternative style set. E.g. df.style.set_properties(**{"number-format": "0.00%"}) will overwrite number formatting but otherwise the written cells inherit the previous formatting, including conditional formatting.

The only exception to this I can see is headers and indexes, which have a hardcoded style which I think will always overwrite certain formats (see def header_style in io/formats/excel.py). This may not be ideal for certain use cases but seems like a separate issue, about which there is already discussion (#25185).

Between this and pandas' own Excel formatting options, I think the options are pretty good for styling tables written using if_sheet_exists="overwrite".
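The formatting behaviour described in this thread can be illustrated with a toy model. This is only a sketch: `Cell` and `write_value` are hypothetical names invented for illustration and are not part of pandas or openpyxl. It shows a written value inheriting the template's number format unless a replacement format is supplied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Cell:
    """Toy stand-in for a spreadsheet cell (hypothetical, not an openpyxl class)."""

    value: object = None
    number_format: str = "General"


def write_value(cell: Cell, value: object, number_format: Optional[str] = None) -> None:
    """Write a value; only touch the format when an explicit one is given."""
    cell.value = value
    if number_format is not None:
        cell.number_format = number_format


# A template cell that was pre-formatted as a percentage.
template_cell = Cell(number_format="0%")

# Writing an unstyled value keeps the template's format.
write_value(template_cell, 0.5)
plain = (template_cell.value, template_cell.number_format)

# Writing with an explicit format replaces the template's format.
write_value(template_cell, 0.5, number_format="0.00%")
styled = (template_cell.value, template_cell.number_format)
```

In this model the first write leaves the `0%` format in place, which corresponds to the 0.5 being shown as 50% in the question above.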


jreback · 2021-03-15T13:58:27Z

pandas/tests/io/excel/test_openpyxl.py

+    # GH 40230
+    df = DataFrame({"fruit": ["pear"]})
+    with tm.ensure_clean(ext) as f:
+        with pytest.raises(ValueError, match=re.escape(msg)):


What about other engines?

Openpyxl is currently the only engine which supports append mode, and this option only affects append mode.

OK, so completely ignoring it on other engines is fine. Is there a reason to raise / warn in that case if it's not None?

As it's written at the moment, passing if_sheet_exists in write mode will raise an error with all engines (mostly as a feedback mechanism for the user).

jreback · 2021-03-15T13:59:23Z

pandas/io/excel/_base.py

+        * replace: Delete the contents of the sheet before writing to it.
+        * overwrite: Write directly to the named sheet
+          without deleting the previous contents.
+        * fail: raise a ValueError.


rename to 'error'

jreback · 2021-03-15T13:59:35Z

pandas/io/excel/_base.py

@@ -667,6 +667,17 @@ class ExcelWriter(metaclass=abc.ABCMeta):
        be parsed by ``fsspec``, e.g., starting "s3://", "gcs://".

        .. versionadded:: 1.2.0
+    if_sheet_exists : {'new', 'replace', 'overwrite', 'fail'}, default 'new'


why is the default not 'error'?

Mainly to preserve the current behaviour as the default. Happy to change
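For readers following the option names, here is a minimal sketch of the four behaviours listed in the docstring under review, modelled on a plain dict rather than a real workbook. `write_sheet` is a hypothetical helper, the numeric-suffix naming for 'new' is an assumption (the docstring only says the name is determined by the engine), and 'fail' is the name used at this point in the review.

```python
from typing import Dict, List

Sheet = List[list]


def write_sheet(
    workbook: Dict[str, Sheet],
    name: str,
    rows: Sheet,
    if_sheet_exists: str = "new",
) -> str:
    """Write rows into a toy workbook and return the sheet name actually used."""
    if name not in workbook:
        workbook[name] = [list(row) for row in rows]
        return name
    if if_sheet_exists == "new":
        # Assumption: pick the first unused numeric suffix. A real engine
        # chooses its own name.
        suffix = 1
        while f"{name}{suffix}" in workbook:
            suffix += 1
        new_name = f"{name}{suffix}"
        workbook[new_name] = [list(row) for row in rows]
        return new_name
    if if_sheet_exists == "replace":
        # Drop the previous contents entirely before writing.
        workbook[name] = [list(row) for row in rows]
        return name
    if if_sheet_exists == "overwrite":
        # Write over matching cells, leaving all other cells untouched.
        sheet = workbook[name]
        for i, row in enumerate(rows):
            if i < len(sheet):
                sheet[i][: len(row)] = row
            else:
                sheet.append(list(row))
        return name
    if if_sheet_exists == "fail":
        raise ValueError(f"Sheet '{name}' already exists.")
    raise ValueError(f"'{if_sheet_exists}' is not a valid option.")


def fresh() -> Dict[str, Sheet]:
    """Build an identical starting workbook for each demonstration."""
    return {"Sheet1": [["a", "b"], ["c", "d"]]}


wb_new, wb_replace, wb_overwrite = fresh(), fresh(), fresh()
name_new = write_sheet(wb_new, "Sheet1", [["x"]], "new")
write_sheet(wb_replace, "Sheet1", [["x"]], "replace")
write_sheet(wb_overwrite, "Sheet1", [["x"]], "overwrite")
```

With 'new' the original sheet is left alone, with 'replace' only the new rows remain, and with 'overwrite' the new cell lands on top of the old contents.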

pandas/io/excel/_base.py

WillAyd · 2021-03-18T13:45:24Z

Not sure - it might be better as its own library

On Mar 18, 2021, at 3:02 AM, Richard Shadrach wrote:

@rhshadrach commented on this pull request, in pandas/io/excel/_base.py:

@@ -667,6 +667,17 @@ class ExcelWriter(metaclass=abc.ABCMeta):
        be parsed by ``fsspec``, e.g., starting "s3://", "gcs://".

        .. versionadded:: 1.2.0
+    if_sheet_exists : {'new', 'replace', 'overwrite', 'fail'}, default 'new'
+        How to behave when trying to write to a sheet that already
+        exists (append mode only).
+
+        * new: Create a new sheet, with a name determined by the engine.
+        * replace: Delete the contents of the sheet before writing to it.

> We also don't support overwrite with any other type of IO methods

I don't believe other IO methods have the use case of writing to an existing template for formatting purposes, which is what I see as being the primary case here.

> and to your point above there is a lot of ambiguity as to what "overwrite" should actually overwrite

That was not my point. I merely wanted to make sure existing formatting was preserved if the DataFrame was not formatted, and indeed that is the case. Not certain what you find unclear.

> or if it is something we discuss more and want to include in pandas we I think would just want to spend a little bit more time going over the API

Other than what name to call this, what API choices are there?

mirober · 2021-04-03T11:22:06Z

Hi @WillAyd @jreback @rhshadrach, apologies for the delay on this. How would you like me to proceed with this PR? I propose three options:

1. Remove the overwrite option as discussed above.
2. Keep all of the current options.
3. Rethink the PR. I can see that this parameter might seem too specific, wordy or confusing. A smaller change would be to resurrect BUG: update dict of sheets before check #27730, essentially changing the default behaviour of the openpyxl append mode from "new" to "overwrite". I think this is closer to what most people want to do with append mode, and would make it easier to implement the other behaviours in user code.

Happy to hear any thoughts

rhshadrach · 2021-04-08T03:45:00Z

@mrob95 - I wonder if this should be a part of the engine_kwargs argument. We'd need to intercept and react to the argument there, and the docstring would need to be updated with a Notes section detailing this. On the one hand, it saves us from adding a new argument. But I also wonder if it might create confusion. Do users expect engine_kwargs to always be passed through to the engine?

cc @jreback and @WillAyd for any thoughts on this.

jreback · 2021-04-12T14:56:36Z

> @mrob95 - I wonder if this should be a part of the engine_kwargs argument. We'd need to intercept and react to the argument there, and the docstring would need to be updated with a Notes section detailing this. On the one hand, it saves us from adding a new argument. But I also wonder if it might create confusion. Do users expect engine_kwargs to always be passed through to the engine?
>
> cc @jreback and @WillAyd for any thoughts on this.

The point of engine_kwargs is to always pass them through, so I'm not sure what else one would expect here.

On the subject of this PR: the problem I have with overwrite is that it is a mutating method, whereas we don't have this anywhere else. Why is creating a new sheet a burden?

mirober · 2021-04-13T18:36:03Z

I'll remove overwrite. Writing data in raw form to a back sheet and then referencing it is probably the best way to handle the kind of "data presentation" issues discussed above, and replace allows that.

rhshadrach · 2021-04-13T21:11:23Z

@jreback

> Why is creating a new sheet a burden?

For me, a use case is to write data to an existing excel sheet (a template) that already has a format. In this way, the result is automatically formatted. I've used this several times, and see people desiring to do so on SO.

Since there is opposition to this feature, it seems like @mrob95's solution is a good compromise, at least when one has control of the template sheet (works for my use case).

rhshadrach

lgtm

jreback · 2021-04-20T23:25:38Z

pandas/io/excel/_base.py

+            )
+        if if_sheet_exists and "r+" not in mode:
+            raise ValueError("if_sheet_exists is only valid in append mode (mode='a')")
+        if if_sheet_exists is None and "r+" in mode:


Why is the r+ condition here? I.e., if_sheet_exists should always be a str at this point, right? (If the file is being written, it will be ignored anyhow.)

i guess not a big deal

You're right, fixed
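A minimal sketch of the check once the redundant mode condition is dropped, written as a standalone function rather than the actual `ExcelWriter` code. `resolve_if_sheet_exists` is a hypothetical name, the `mode.replace("a", "r+")` normalisation and the fallback to 'error' are assumptions not shown in the quoted diff, and the option set reflects the later "Rename fail to error" and "Remove overwrite option" commits.

```python
from typing import Optional

# Option names after 'fail' was renamed to 'error' and 'overwrite' was removed.
VALID_OPTIONS = (None, "error", "new", "replace")


def resolve_if_sheet_exists(if_sheet_exists: Optional[str], mode: str) -> str:
    """Validate the option against the file mode and fill in a default."""
    # Assumption: append mode "a" is represented internally as "r+".
    mode = mode.replace("a", "r+")
    if if_sheet_exists not in VALID_OPTIONS:
        raise ValueError(f"'{if_sheet_exists}' is not valid for if_sheet_exists.")
    # Error message taken from the diff quoted above.
    if if_sheet_exists and "r+" not in mode:
        raise ValueError("if_sheet_exists is only valid in append mode (mode='a')")
    # No mode test is needed here: in write mode the value is never consulted.
    if if_sheet_exists is None:
        if_sheet_exists = "error"  # assumed default, see lead-in
    return if_sheet_exists


append_default = resolve_if_sheet_exists(None, "a")
append_replace = resolve_if_sheet_exists("replace", "a")
write_default = resolve_if_sheet_exists(None, "w")
```

After this step the value is always a string, so later code can compare it directly without handling None.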

jreback · 2021-04-20T23:27:23Z

pandas/tests/io/excel/test_openpyxl.py

+    # GH 40230
+    df = DataFrame({"fruit": ["pear"]})
+    with tm.ensure_clean(ext) as f:
+        with pytest.raises(ValueError, match=re.escape(msg)):


OK, so completely ignoring it on other engines is fine. Is there a reason to raise / warn in that case if it's not None?

jreback · 2021-04-20T23:27:55Z

Looks good. Can you merge master? Also a couple of questions (really not a big deal, but an edge case).

jreback

lgtm. @rhshadrach over to you

Thanks @mrob95!

mirober force-pushed the overwrite_xl branch from 8e688d4 to fe60c2a Compare March 4, 2021 19:21

mirober changed the title ~~FEAT: Add if_exists parameter to ExcelWriter~~ ENH: Add if_exists parameter to ExcelWriter Mar 4, 2021

ENH: Add if_exists parameter to ExcelWriter

4b73d6a

mirober force-pushed the overwrite_xl branch from 395b2d9 to 4b73d6a Compare March 4, 2021 22:17

jreback added the IO Excel read_excel, to_excel label Mar 5, 2021

mirober added 6 commits March 5, 2021 19:17

Merge branch 'master' of https://github.com/pandas-dev/pandas into overwrite_xl

5bce9ac

Rename parameter to if_sheet_exists

e8716c7

Merge branch 'master' of https://github.com/pandas-dev/pandas into overwrite_xl

7d4e93d

Update avoid -> new in ExcelWriter if_sheet_exists

71104f6

use assert_frame_equal instead of result.equals

ae5b385

fix mypy error

6baec67

mirober changed the title ~~ENH: Add if_exists parameter to ExcelWriter~~ ENH: Add if_sheet_exists parameter to ExcelWriter Mar 7, 2021

rhshadrach requested changes Mar 7, 2021


mirober mentioned this pull request Mar 7, 2021

BUG: Inconsistent behaviour between excel engines when writing multiple times using the same ExcelWriter #40289

Open

3 tasks

mirober added 5 commits March 8, 2021 19:27

Update docstring to better describe 'new' behaviour

3a2a54f

Parameterize test

fcbab05

Add raises test for other engines

5bcd6e3

More detailed error message

47527c6

Remove redundant code, delete sheet when replacing

060d754

lithomas1 reviewed Mar 9, 2021


pandas/tests/io/excel/test_writers.py (outdated)

Tidy up raise tests

d8f0be1

WillAyd reviewed Mar 12, 2021


mirober added 2 commits March 13, 2021 16:57

Merge branch 'master' of https://github.com/pandas-dev/pandas into overwrite_xl

26511c7

Fix inconsistent namespacing

e31a59e

jreback requested changes Mar 15, 2021


mirober added 2 commits March 20, 2021 11:49

Fixed conflicts and test failures

5ed0fb8

Rename fail to error

338b6d5

Merge remote-tracking branch 'upstream/master' into overwrite_xl

5b13015

Remove overwrite option

b2ad90b

rhshadrach approved these changes Apr 20, 2021


jreback requested changes Apr 20, 2021


mirober added 2 commits April 21, 2021 18:03

Merge branch 'master' into overwrite_xl

320728a

Remove redundant mode check

49c4d3b

jreback added this to the 1.3 milestone Apr 21, 2021

jreback approved these changes Apr 21, 2021


rhshadrach added the Enhancement label Apr 22, 2021

rhshadrach merged commit ac8977f into pandas-dev:master Apr 22, 2021

yeshsurya pushed a commit to yeshsurya/pandas that referenced this pull request May 6, 2021

ENH: Add if_sheet_exists parameter to ExcelWriter (pandas-dev#40231)

d12cf6c


feefladder mentioned this pull request Jun 25, 2021

ENH: add if_sheet_exists='overlay' to ExcelWriter #42222

Merged

4 tasks

JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021

ENH: Add if_sheet_exists parameter to ExcelWriter (pandas-dev#40231)

d9b49fd


rhshadrach mentioned this pull request Sep 1, 2021

ENH: Rename ExcelWriter to make clear attributes are not public #43088

Closed
