Commit b465e39

Add details on implementation options
1 parent 5f278b3 commit b465e39

1 file changed (+31, -0 lines changed)

protocol/dataframe_protocol_summary.md

Lines changed: 31 additions & 0 deletions
@@ -255,8 +255,39 @@ computational graph approach like Dask uses, etc.)._

## Possible direction for implementation

### Rough prototypes

The `cuDFDataFrame`, `cuDFColumn` and `cuDFBuffer` sketched out by @kkraus14
[here](https://github.com/data-apis/dataframe-api/issues/29#issuecomment-685123386)
seem to be heading in the right direction.

[This prototype](https://github.com/wesm/dataframe-protocol/pull/1) by Wes
McKinney was the first attempt, and has some useful features.

TODO: work this out after making sure we're all on the same page regarding requirements.

### Relevant existing protocols

Here are the four most relevant existing protocols, and what requirements they support:

| *supports*          | buffer protocol | `__array_interface__` | DLPack | Arrow C Data Interface |
|---------------------|:---------------:|:---------------------:|:------:|:----------------------:|
| Python API          |                 |           Y           |   Y    |                        |
| C API               |        Y        |           Y           |   Y    |           Y            |
| arrays              |        Y        |           Y           |   Y    |           Y            |
| dataframes          |                 |                       |        |                        |
| chunking            |                 |                       |        |                        |
| devices             |                 |                       |   Y    |                        |
| bool/int/uint/float |        Y        |           Y           |   Y    |           Y            |
| missing data        |       (1)       |          (2)          |  (3)   |           Y            |
| string dtype        |       (3)       |          (3)          |        |           Y            |
| datetime dtypes     |                 |          (4)          |        |           Y            |
| categoricals        |       (5)       |          (5)          |  (6)   |          (5)           |

1. Can be done only via separate masks of boolean arrays.
2. `__array_interface__` has a `mask` attribute, which is a separate boolean array also implementing the `__array_interface__` protocol.
3. Only fixed-length strings, as a sequence of char or unicode.
4. Only NumPy datetime and timedelta, which are limited compared to what the Arrow format offers.
5. No explicit support; however, categoricals can be mapped to either integers or strings.
6. No explicit support; categoricals can only be mapped to integers.
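
To make footnotes 1 and 5 more concrete, here is a minimal sketch of the two workarounds over the plain Python buffer protocol. It is not part of any of the prototypes linked above. The helper names `to_optional_list` and `encode_categorical` are hypothetical, and so is the mask convention (one byte per element, 1 meaning valid); the buffer protocol itself prescribes neither.

```python
from array import array

# Values travel through the buffer protocol as a plain float64 buffer.
values = array("d", [1.5, 0.0, 3.0, 0.0])

# The buffer protocol has no notion of "missing", so validity is carried in a
# second, separate buffer. Assumed convention: 1 = valid, 0 = missing.
valid = bytearray([1, 0, 1, 0])


def to_optional_list(values_buf, valid_buf):
    """Consumer side: merge a values buffer and a mask buffer (hypothetical helper)."""
    vals = memoryview(values_buf)
    mask = memoryview(valid_buf)
    if len(vals) != len(mask):
        raise ValueError("values and mask must have the same length")
    # Masked-out slots become None; their stored value is ignored.
    return [v if ok else None for v, ok in zip(vals, mask)]


def encode_categorical(labels):
    """Map labels to integer codes plus a category list (hypothetical helper)."""
    categories = sorted(set(labels))
    index = {cat: i for i, cat in enumerate(categories)}
    # The codes are ordinary C ints, so they can cross the buffer protocol;
    # the category list has to be communicated out of band.
    codes = array("i", (index[label] for label in labels))
    return codes, categories


decoded = to_optional_list(values, valid)
codes, categories = encode_categorical(["low", "high", "low", "mid"])
```

In both cases the consumer has to know the convention in advance: nothing in the exported buffers says that `valid` belongs to `values`, or that `codes` indexes into `categories`. That gap is what the "no explicit support" entries in the table refer to.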
