Infer schemas and coerce data for table cells #346

libbey-observable · 2023-01-25T23:50:11Z

Resolves https://github.com/observablehq/observablehq/issues/9862 and https://github.com/observablehq/observablehq/issues/9673

In this inference approach, we take a sample (currently the first 100 rows) and, for each column, count how many times we encounter each possible data type. We then take the most frequently encountered type; if more than 90% of the sampled values conform to it, it becomes the column's type. If not, we use the type "other".
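For readers skimming the thread, the sampling and threshold logic described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in src/table.js: the names `typeOfValue` and `inferColumnTypes` are hypothetical, the per-value classifier is deliberately simplistic, and skipping null values when counting is an assumption the description does not confirm.

```javascript
// Sketch only: hypothetical names, not the implementation merged in src/table.js.
const SAMPLE_SIZE = 100; // "currently the first 100 rows"
const THRESHOLD = 0.9; // more than 90% of sampled values must conform

// Assumption: a very small classifier for a single cell value.
function typeOfValue(value) {
  if (value == null) return null; // assumption: missing values are not counted
  if (value instanceof Date) return "date";
  switch (typeof value) {
    case "number":
    case "bigint":
    case "boolean":
    case "string":
      return typeof value;
    default:
      return "other";
  }
}

function inferColumnTypes(rows, columns) {
  const sample = rows.slice(0, SAMPLE_SIZE);
  return columns.map((name) => {
    // Count how often each type occurs in this column of the sample.
    const counts = new Map();
    let total = 0;
    for (const row of sample) {
      const type = typeOfValue(row[name]);
      if (type === null) continue;
      counts.set(type, (counts.get(type) || 0) + 1);
      total++;
    }
    // Pick the most frequently encountered type.
    let best = "other";
    let bestCount = 0;
    for (const [type, count] of counts) {
      if (count > bestCount) {
        best = type;
        bestCount = count;
      }
    }
    // Accept it only if it covers more than 90% of the counted values.
    const conforms = total > 0 && bestCount / total > THRESHOLD;
    return {name, type: conforms ? best : "other"};
  });
}
```

Called as `inferColumnTypes(rows, ["a", "b"])`, the sketch returns one `{name, type}` entry per column; a column whose sample is evenly split between numbers and strings falls back to "other".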

These changes include some support for upcoming column type assertions, but for the purposes of this PR, I'm testing against main.

inference_and_coercion.mov

src/table.js

mbostock

Almost there! Just a few more things, and we’re done, I swear. 😅

src/table.js

mkfreeman · 2023-02-02T16:47:43Z

src/table.js

+    case "boolean":
+      if (typeof value === "string") {
+        const trimValue = value.trim();
+        return trimValue === "true" ? true : trimValue === "false" ? false : null;


Just want to note that I was surprised that our boolean coercion returned false for a column of string values, but true for other types, such as a column of empty objects. Maybe the default return value for strings should be Boolean(value), but then that ruins detecting the strings "true" and "false".

It returned null for string values (rather than false). I'm not sure what the right response is when a Date, an object, or a value of another type is coerced to boolean. Returning true seems surprising to me, but if that's what we want to do, then it does seem strange to have type string alone not conform to that. Any thoughts, @mbostock?

I think we have to do this because string is special: it’s the “untyped” format we use to serialize other types (namely in CSV).
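To make the distinction in this thread concrete, here is a minimal sketch of the behavior being discussed. The string branch mirrors the quoted hunk (so the comparison is case-sensitive); the function name `coerceToBoolean` is hypothetical, and falling back to `Boolean(value)` for non-strings is an assumption inferred from the comment that a column of empty objects coerces to true.

```javascript
// Sketch only: coerceToBoolean is a hypothetical name, not the src/table.js API.
function coerceToBoolean(value) {
  if (typeof value === "string") {
    // Strings are parsed, not tested for truthiness: serialized formats such
    // as CSV deliver every cell as a string, so "false" must not become true.
    const trimValue = value.trim();
    return trimValue === "true" ? true : trimValue === "false" ? false : null;
  }
  // Assumption: every non-string value uses ordinary JavaScript truthiness.
  return Boolean(value);
}
```

Under this sketch `coerceToBoolean("false")` is false and `coerceToBoolean("maybe")` is null, whereas `coerceToBoolean({})` is true, which is the asymmetry the comments above point out.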

src/table.js

Co-authored-by: Mike Bostock <[email protected]>

src/table.js

mbostock

I posted a few trivial suggestions which you can take or leave, but here is what I think (hope!) is the last logical issue! 🙏

src/table.js

Co-authored-by: Mike Bostock <[email protected]>

libbey-observable added 24 commits January 19, 2023 15:12

Stop using {typed: true} for csv and tsv

ea741fe

Infer schema if none exists

558a6a6

Add schema validity check to address #9673

ba09d45

Update tests

3a3f5a1

Handle sources that are arrays of primitives

e151efd

Formatting

ec38311

Quick updates based on feedback

6e9d64e

With Mike F's coercion

799398f

Remove new validity check fn and use existing

5f30887

Don't mutate source

eb7008a

Don't mutate row

3786837

Add exported fn to index.js

81018d4

Apply user-selected types and update schema

3ec692b

Combine into one regex

2aaaad0

Update handling of "other" and use d3.greatest

4bd58bf

Fix tests

bf97713

Small fixes

afe5241

More coercion

f5c648b

Fix test

740d860

Try supporting number coercion into dates

9ddd352

Try with value.toString

bfd138c

Fixes and allowing for soft coercion

1e820ec

Fix test

28a6aaf

Formatting

0c2a3ca

libbey-observable mentioned this pull request Jan 25, 2023

Infer schema for relevant data sources #344

Closed

Remove export

8884734

mbostock reviewed Jan 26, 2023

src/table.js (outdated, resolved)

libbey-observable added 3 commits January 26, 2023 11:20

Update number and date coercion

a95a1bb

Infer integers even if type is number

c7583f7

Coercion improvements

b9aceae

Fix date regex

4600d6e

mbostock reviewed Feb 2, 2023

src/table.js (outdated, resolved)

src/table.js (outdated, resolved)

src/table.js (outdated, resolved)

src/table.js (outdated, resolved)

libbey-observable added 2 commits February 2, 2023 08:40

Move date regex to constant

8eccbd8

Move trim to inferType function

154e20e

mkfreeman reviewed Feb 2, 2023

src/table.js (outdated, resolved)

libbey-observable added 5 commits February 2, 2023 09:29

Update coercion of dates

d82b2f8

Don't have inferType fall back to "other"

2f2ee5e

Coerce empty strings to null when type is "date"

6eae9d3

Case-insensitive boolean inference/coercion

30ba6e5

Allow multiple types to be counted during inference

7d0f114

mbostock reviewed Feb 2, 2023

src/table.js (outdated, resolved)

mbostock reviewed Feb 2, 2023

src/table.js (outdated, resolved)

mbostock reviewed Feb 2, 2023

src/table.js (outdated, resolved)

Update src/table.js

9bff937

Co-authored-by: Mike Bostock <[email protected]>

mbostock reviewed Feb 2, 2023

src/table.js (outdated, resolved)

mbostock requested changes Feb 2, 2023

src/table.js (outdated, resolved)

libbey-observable and others added 9 commits February 2, 2023 13:01

Update src/table.js

5c4bc45

Co-authored-by: Mike Bostock <[email protected]>

Clean up trim and lower casing

8344ef6

Use trimmed string in filter

0b21fac

checkpoint

9f9a3f2

tweaks to inferSchema

543d55b

combine loops!

63ea079

whitespace, bigint fixes

89e62c6

prEtTieR

f3a4ad8

stricter string coercion

9a56e67

mbostock approved these changes Feb 2, 2023

Handle column of nulls

326f542

libbey-observable merged commit fa1b356 into main Feb 3, 2023

libbey-observable deleted the libbey/infer-and-coerce branch February 3, 2023 00:01
