Commit 83ef702

Add details on implementation options
1 parent c7575c1 commit 83ef702

1 file changed, +31 -0 lines changed

protocol/dataframe_protocol_summary.md

Lines changed: 31 additions & 0 deletions
@@ -249,8 +249,39 @@ computational graph approach like Dask uses, etc.)._

## Possible direction for implementation

### Rough prototypes

The `cuDFDataFrame`, `cuDFColumn` and `cuDFBuffer` sketched out by @kkraus14
[here](https://github.com/data-apis/dataframe-api/issues/29#issuecomment-685123386)
seem to be heading in the right direction.
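
To make the dataframe/column/buffer layering concrete, here is a minimal sketch of
such a three-level hierarchy. It is not the code from the linked comment: the
class names (`SketchDataFrame`, `SketchColumn`, `SketchBuffer`) and every method
on them are hypothetical, and the standard-library `array` module stands in for
real device or host memory.

```python
from array import array


class SketchBuffer:
    """Hypothetical: one contiguous block of memory, exposed via the buffer protocol."""

    def __init__(self, data):
        # memoryview() works on any object implementing the buffer protocol.
        self._view = memoryview(data)

    @property
    def nbytes(self):
        return self._view.nbytes


class SketchColumn:
    """Hypothetical: a single-dtype column backed by one data buffer."""

    def __init__(self, data):
        self.dtype = data.typecode  # assumption: array typecodes stand in for dtypes
        self.size = len(data)
        self._buffer = SketchBuffer(data)

    def get_buffer(self):
        return self._buffer


class SketchDataFrame:
    """Hypothetical: an ordered mapping from column names to columns."""

    def __init__(self, columns):
        self._columns = dict(columns)

    def column_names(self):
        return list(self._columns)

    def get_column(self, name):
        return self._columns[name]


df = SketchDataFrame({
    "x": SketchColumn(array("d", [1.0, 2.0, 3.0])),  # float64 values
    "n": SketchColumn(array("q", [10, 20, 30])),     # int64 values
})
```

A consumer would walk from the dataframe to a column to its buffer, for example
`df.get_column("x").get_buffer().nbytes`, without needing to know which library
produced the data.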

[This prototype](https://github.com/wesm/dataframe-protocol/pull/1) by Wes
McKinney was the first attempt, and has some useful features.

TODO: work this out after making sure we're all on the same page regarding requirements.

### Relevant existing protocols

Here are the four most relevant existing protocols and the requirements each one supports:

| *supports*          | buffer protocol | `__array_interface__` | DLPack | Arrow C Data Interface |
|---------------------|:---------------:|:---------------------:|:------:|:----------------------:|
| Python API          |                 |           Y           |   Y    |                        |
| C API               |        Y        |           Y           |   Y    |           Y            |
| arrays              |        Y        |           Y           |   Y    |           Y            |
| dataframes          |                 |                       |        |                        |
| chunking            |                 |                       |        |                        |
| devices             |                 |                       |   Y    |                        |
| bool/int/uint/float |        Y        |           Y           |   Y    |           Y            |
| missing data        |       (1)       |          (2)          |  (3)   |           Y            |
| string dtype        |       (3)       |          (3)          |        |           Y            |
| datetime dtypes     |                 |          (4)          |        |           Y            |
| categoricals        |       (5)       |          (5)          |  (6)   |          (5)           |

(1) Possible only via a separate boolean mask array.
(2) `__array_interface__` has a `mask` attribute, which is a separate boolean array that also implements the `__array_interface__` protocol.
(3) Only fixed-length strings, as a sequence of char or unicode.
(4) Only NumPy datetime and timedelta, which are limited compared to what the Arrow format offers.
(5) No explicit support; however, categoricals can be mapped to either integers or strings.
(6) No explicit support; categoricals can only be mapped to integers.
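
Two of the workarounds in the notes above can be illustrated with the standard
library alone. The sketch below shows missing data carried as a separate boolean
mask next to a buffer-protocol object (note 1), and a categorical mapped to
integer codes plus a list of categories (notes 5 and 6). The helper names
`apply_mask` and `encode_categorical` are hypothetical and are not part of any of
the four protocols.

```python
from array import array

# Note (1): the buffer protocol has no notion of missing values, so the
# producer ships a second buffer that holds a boolean mask.
values = array("d", [1.5, 0.0, 3.0])  # 0.0 is only a placeholder value
is_missing = bytearray([0, 1, 0])     # 1 marks a missing entry

data_view = memoryview(values)        # format 'd', 8 bytes per item
mask_view = memoryview(is_missing)    # format 'B', 1 byte per item


def apply_mask(data, mask):
    """Hypothetical helper: combine a data buffer with its mask buffer."""
    return [None if m else v for v, m in zip(data, mask)]


# Notes (5)/(6): a categorical becomes integer codes plus the categories
# they index into; only the integer codes travel through the protocol.
def encode_categorical(labels):
    """Hypothetical helper: map labels to int64 codes and sorted categories."""
    categories = sorted(set(labels))
    index = {cat: i for i, cat in enumerate(categories)}
    codes = array("q", (index[label] for label in labels))
    return codes, categories


codes, categories = encode_categorical(["low", "high", "low", "mid"])
```

In both cases the consumer has to know by convention that a second object (the
mask, or the category list) accompanies the data buffer, since the protocol
itself does not describe that relationship.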
