@@ -249,8 +249,39 @@ computational graph approach like Dask uses, etc.)._
## Possible direction for implementation
+ ### Rough prototypes
+
The `cuDFDataFrame`, `cuDFColumn` and `cuDFBuffer` sketched out by @kkraus14
[here](https://github.com/data-apis/dataframe-api/issues/29#issuecomment-685123386)
seem to be heading in the right direction.
+ [This prototype](https://github.com/wesm/dataframe-protocol/pull/1) by Wes
+ McKinney was the first attempt, and has some useful features.
+
TODO: work this out after making sure we're all on the same page regarding requirements.
+
+
+ ### Relevant existing protocols
+
+ Here are the four most relevant existing protocols, and what requirements they support:
+
+ | *supports*          | buffer protocol | `__array_interface__` | DLPack | Arrow C Data Interface |
+ |---------------------|:---------------:|:---------------------:|:------:|:----------------------:|
+ | Python API          |                 |           Y           |   Y    |                        |
+ | C API               |        Y        |           Y           |   Y    |           Y            |
+ | arrays              |        Y        |           Y           |   Y    |           Y            |
+ | dataframes          |                 |                       |        |                        |
+ | chunking            |                 |                       |        |                        |
+ | devices             |                 |                       |   Y    |                        |
+ | bool/int/uint/float |        Y        |           Y           |   Y    |           Y            |
+ | missing data        |       (1)       |          (2)          |  (3)   |           Y            |
+ | string dtype        |       (3)       |          (3)          |        |           Y            |
+ | datetime dtypes     |                 |          (4)          |        |           Y            |
+ | categoricals        |       (5)       |          (5)          |  (6)   |          (5)           |
+
+ (1) Can be done only via separate masks of boolean arrays.
+ (2) `__array_interface__` has a `mask` attribute, which is a separate boolean array also implementing the `__array_interface__` protocol.
+ (3) Only fixed-length strings, as a sequence of char or unicode.
+ (4) Only NumPy datetime and timedelta, which are limited compared to what the Arrow format offers.
+ (5) No explicit support; however, categoricals can be mapped to either integers or strings.
+ (6) No explicit support; categoricals can only be mapped to integers.
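
To make footnotes (1), (3), (5) and (6) more concrete, here is a minimal sketch using NumPy. NumPy is an assumption for illustration only, as are all variable names; nothing here is part of a proposed protocol. It shows that the buffer protocol and `__array_interface__` expose only the raw values, so missing data has to travel as a separate boolean mask, and that categoricals can be carried as integer codes plus a fixed-length string array of categories.

```python
import numpy as np

# Hypothetical column with one missing entry. The buffer itself cannot say
# which entries are missing, so a separate boolean mask is carried alongside.
values = np.array([10, 20, 0, 40], dtype=np.int64)  # 0 is only a placeholder
valid = np.array([True, True, False, True])         # False marks a missing value

# Buffer protocol: exposes shape and item size, but nothing about missing data.
view = memoryview(values)

# __array_interface__: a plain dict describing the same memory.
iface = values.__array_interface__

# A consumer has to combine values and mask itself.
present = values[valid]

# Categoricals: no explicit support, so map labels to integer codes.
# The categories end up as a fixed-length unicode array (footnote (3)).
labels = np.array(["b", "a", "b", "c"])
categories, codes = np.unique(labels, return_inverse=True)

print(view.shape, view.itemsize)   # shape and bytes per element
print(iface["typestr"], iface["shape"])
print(present)
print(categories, codes)
```

The consumer receives two or three unrelated arrays and has to know by convention how they fit together, which is the gap a dataframe-level protocol would need to fill.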