Optimize field shorthand for columnar data? #191
Comments
I was going to post an issue to this effect after a brief skim of the source code, as it appears to involve a fair bit of data copying. Glad to see an issue is already here! Arquero includes a table.array method that extracts an array (optionally into a specified typed array type). Currently this method performs a copy (necessary if coercing types or if the table has filter or orderby criteria) but we could optimize this to return an underlying array directly (zero copy) when applicable. Even with an array copy here, it might be nice to avoid object serialization of table rows. Along those lines, Arquero can also provide a column iterator using table.values. Also, I would advise against If Observable/Plot eventually settles on a more general convention for accessing columnar data sources, we're happy to look into adapting Arquero along those lines.
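As a rough illustration of the copy versus zero-copy distinction described in this comment, here is a minimal sketch. It does not use Arquero; the table shape (a columns object plus an optional rowIndices array standing in for filter or orderby state) and the columnArray helper are assumptions invented for this example.

```javascript
// Hypothetical stand-in for a columnar table. This is NOT Arquero's internal
// layout; rowIndices models "the table has filter or orderby criteria".
const plainTable = {
  columns: {x: Float64Array.of(1, 2, 3)},
  rowIndices: null
};

// Hypothetical helper: return the backing array directly when that is safe,
// and copy only when the rows are filtered/reordered or a type coercion is requested.
function columnArray(table, name, ArrayType) {
  const source = table.columns[name];
  const sameType = ArrayType == null || source instanceof ArrayType;
  if (table.rowIndices == null && sameType) return source; // zero copy
  const Ctor = ArrayType ?? source.constructor;
  const indices = table.rowIndices ?? Array.from(source.keys());
  return Ctor.from(indices, (i) => source[i]); // copy in the requested order and type
}

// Same column data, but viewed through a filter/orderby that keeps rows 2 and 0.
const reorderedTable = {columns: plainTable.columns, rowIndices: [2, 0]};

const direct = columnArray(plainTable, "x"); // same object as the stored column
const reordered = columnArray(reorderedTable, "x"); // new Float64Array
const coerced = columnArray(plainTable, "x", Float32Array); // new Float32Array
```

The design choice in this sketch is that the caller cannot tell from the return type whether it received the backing array, so a zero-copy result should be treated as read-only.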
I think we should focus on doing this specifically for Apache Arrow table instances first (since these are the most widely-used columnar data representation, including by the DuckDBClient/SQL code blocks in Observable Framework).
Here’s a trivial example of how to pass in columnar data. Given an Arrow table:

  const data = Arrow.tableFromArrays({
    id: [1, 2, 3],
    name: ["Alice", "Bob", "Charlie"],
    age: [35, 25, 45]
  });

You can pass the columns in like so:

  Plot.barY({length: data.numRows}, {x: data.getChild("name"), y: data.getChild("age")}).plot()

The goal of this issue is to make the above equivalent to the following shorthand syntax:

  Plot.barY(data, {x: "name", y: "age"}).plot()

To do this, we will need to detect when
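One way the equivalence between the two calls could be sketched is below. This is not Plot's implementation: isArrowTableLike and expandFieldShorthand are hypothetical helpers, detection is done by duck typing on the getChild and numRows members used in the example above, and the mock table stands in for a real Arrow table so the snippet runs without dependencies.

```javascript
// Hypothetical duck-typed check; a real implementation might test differently.
function isArrowTableLike(data) {
  return data != null
    && typeof data.getChild === "function"
    && typeof data.numRows === "number";
}

// Hypothetical helper: rewrite (table, {x: "name"}) into
// ({length: numRows}, {x: table.getChild("name")}).
function expandFieldShorthand(data, options) {
  if (!isArrowTableLike(data)) return [data, options];
  const expanded = {};
  for (const [key, value] of Object.entries(options)) {
    // Only strings that name an existing column are replaced; other
    // strings (e.g. a color such as "red") pass through unchanged.
    const column = typeof value === "string" ? data.getChild(value) : null;
    expanded[key] = column ?? value;
  }
  return [{length: data.numRows}, expanded];
}

// Mock with the same two members as the Arrow table in the example (assumption).
const mockColumns = new Map([
  ["name", ["Alice", "Bob", "Charlie"]],
  ["age", [35, 25, 45]]
]);
const mockTable = {
  numRows: 3,
  getChild(name) { return mockColumns.get(name) ?? null; }
};

const [mockData, mockOptions] = expandFieldShorthand(mockTable, {x: "name", y: "age", fill: "red"});
```

With this sketch, mockData and mockOptions have the shape of the arguments in the explicit getChild call, while non-columnar inputs are returned untouched.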
We currently support field names (named properties specified as strings) as channel value shorthand; e.g., y: "foo" is upgraded to y: d => d["foo"] or, equivalently, y: data.map(d => d["foo"]). For Arquero (and perhaps other column-oriented data structures), this still works because Arquero provides an object iterator; however, it’d be more efficient if Plot supported the same shorthand with Arquero’s column accessors, upgrading y: "foo" to y: data.column("foo") or y: data.array("foo"), which returns an iterable over the values. Currently, though, this would still involve copying the column into a new array, so we actually want y: data.column("foo").data if possible (e.g., Arquero’s Column class could support a toArray method that returns the data if possible, and otherwise constructs a new array using the iterator).
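The shorthand upgrade described in this issue could be sketched roughly as follows. valuesOf is a hypothetical helper, not part of Plot, and columnarTable is a minimal stand-in that only mimics an array(name) accessor; neither reflects the actual internals of Plot or Arquero.

```javascript
// Hypothetical helper: resolve a field-name shorthand such as y: "foo"
// into an array-like of channel values.
function valuesOf(data, field) {
  // Assumed columnar convention: the table exposes array(name).
  if (data != null && typeof data.array === "function") return data.array(field);
  // Row-oriented fallback, equivalent to y: data.map(d => d["foo"]).
  return Array.from(data, (d) => d[field]);
}

// Row-oriented input: one object per row.
const rowData = [{foo: 1}, {foo: 2}, {foo: 3}];

// Column-oriented stand-in (an assumption for illustration only).
const fooColumn = Float64Array.of(1, 2, 3);
const columnarTable = {
  array(name) { return name === "foo" ? fooColumn : undefined; }
};

const fromRows = valuesOf(rowData, "foo"); // builds a new array, one property lookup per row
const fromColumns = valuesOf(columnarTable, "foo"); // hands back the stored column
```

The row path allocates a new array and touches every row object, whereas the columnar path in this sketch returns the existing column without iterating.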