## Possible direction for implementation

### Rough prototypes

The `cuDFDataFrame`, `cuDFColumn` and `cuDFBuffer` sketched out by @kkraus14
[here](https://github.com/data-apis/dataframe-api/issues/29#issuecomment-685123386)
seem to be in the right direction.

[This prototype](https://github.com/wesm/dataframe-protocol/pull/1) by Wes
McKinney was the first attempt and has some useful features.

TODO: work this out after making sure we're all on the same page regarding requirements.

### Relevant existing protocols

Here are the four most relevant existing protocols and the requirements each of them supports:

| *supports*          | buffer protocol | `__array_interface__` | DLPack | Arrow C Data Interface |
|---------------------|:---------------:|:---------------------:|:------:|:----------------------:|
| Python API          |                 |           Y           |   Y    |                        |
| C API               |        Y        |           Y           |   Y    |            Y           |
| arrays              |        Y        |           Y           |   Y    |            Y           |
| dataframes          |                 |                       |        |                        |
| chunking            |                 |                       |        |                        |
| devices             |                 |                       |   Y    |                        |
| bool/int/uint/float |        Y        |           Y           |   Y    |            Y           |
| missing data        |       (1)       |          (2)          |  (3)   |            Y           |
| string dtype        |       (3)       |          (3)          |        |            Y           |
| datetime dtypes     |                 |          (4)          |        |            Y           |
| categoricals        |       (5)       |          (5)          |  (6)   |           (5)          |

1. Only possible via a separate boolean array used as a mask.
2. `__array_interface__` has an optional `mask` key: a separate boolean array that itself implements the `__array_interface__` protocol.
3. Only fixed-length strings, stored as sequences of char or unicode.
4. Only NumPy datetime and timedelta, which are limited compared to what the Arrow format offers (e.g., no time zones).
5. No explicit support; however, categoricals can be mapped to either integers or strings.
6. No explicit support; categoricals can only be mapped to integers.
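
To make the *missing data* row more concrete, here is a minimal sketch, in plain Python using only the standard library, of the two layouts it contrasts: a separate one-byte-per-element boolean mask (the approach footnote 1 describes for the buffer protocol) and an Arrow-style validity bitmap. The function names are hypothetical, null slots are filled with an arbitrary placeholder `0.0`, and Arrow's alignment and padding rules are ignored. This is not an implementation of the Arrow C Data Interface, only an illustration of its validity-bitmap convention (one bit per element, least-significant bit first, set bit = valid).

```python
from array import array
from typing import List, Optional, Tuple


def encode_with_byte_mask(values: List[Optional[float]]) -> Tuple[memoryview, memoryview]:
    """Buffer-protocol style: a float64 data buffer plus a separate
    one-byte-per-element boolean mask (1 = valid, 0 = missing)."""
    # Placeholder 0.0 fills the slots of missing values (an arbitrary choice).
    data = array("d", (0.0 if v is None else v for v in values))
    mask = array("B", (0 if v is None else 1 for v in values))
    # memoryview exposes both arrays through the Python buffer protocol.
    return memoryview(data), memoryview(mask)


def encode_with_validity_bitmap(values: List[Optional[float]]) -> Tuple[memoryview, bytes]:
    """Arrow-style: a float64 data buffer plus a validity bitmap holding
    one bit per element, least-significant bit first (1 = valid)."""
    data = array("d", (0.0 if v is None else v for v in values))
    bitmap = bytearray((len(values) + 7) // 8)  # round up to whole bytes
    for i, v in enumerate(values):
        if v is not None:
            # Element i lives in byte i // 8, at bit position i % 8.
            bitmap[i // 8] |= 1 << (i % 8)
    return memoryview(data), bytes(bitmap)


def decode_with_validity_bitmap(data: memoryview, bitmap: bytes) -> List[Optional[float]]:
    """Read values back, treating a cleared bit as a missing value."""
    return [
        data[i] if bitmap[i // 8] & (1 << (i % 8)) else None
        for i in range(len(data))
    ]


if __name__ == "__main__":
    column = [1.5, None, 3.0, None, 5.25]

    data, mask = encode_with_byte_mask(column)
    print(list(mask), mask.nbytes)  # [1, 0, 1, 0, 1] 5

    data, bitmap = encode_with_validity_bitmap(column)
    print(bitmap, len(bitmap))  # b'\x15' 1
    print(decode_with_validity_bitmap(data, bitmap))  # [1.5, None, 3.0, None, 5.25]
```

The bitmap needs one bit per element instead of one byte, so for this five-element column the validity information fits in a single byte rather than five.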