-
Notifications
You must be signed in to change notification settings - Fork 83
Infer schemas and coerce data for table cells #346
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there! Just a few more things, and we’re done, I swear. 😅
src/table.js
Outdated
case "boolean": | ||
if (typeof value === "string") { | ||
const trimValue = value.trim(); | ||
return trimValue === "true" ? true : trimValue === "false" ? false : null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just want to note that I was surprised that our boolean coercion returned false
for a column of string values, but true
for other types, such as a column of empty objects. Maybe the default return value for strings should be Boolean(value)
, but then that ruins detecting the strings "true" and "false".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It returned null
for string values (rather than false
). I'm not sure what the right response is when a Date, object, or other types are coerced to boolean. TRUE
seems surprising to me, but if that's what we want to do, then it does seem strange to have type string
alone not conform to that. Any thoughts, @mbostock?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have to do this because string is special: it’s the “untyped” format we use to serialize other types (namely in CSV).
Co-authored-by: Mike Bostock <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I posted a few trivial suggestions which you can take or leave, but here is what I think (hope!) is the last logical issue! 🙏
Co-authored-by: Mike Bostock <[email protected]>
Resolves https://github.com/observablehq/observablehq/issues/9862 and https://github.com/observablehq/observablehq/issues/9673
In this inference approach, we take a sample (currently the first 100 rows), and for each column, count how many times we encounter each possible data type. Then, if > 90% of the sampled values conform to that type, we take the most frequently encountered type as the column's type. If not, we use type "other."
These changes include some support for upcoming column type assertions, but for the purposes of this PR, I'm testing against main.
inference_and_coercion.mov