duckdb 1.29.0; self-host extensions #1734
Merged
47 commits
c70e1bc explicit duckdb 1.29.0; self-host core extensions; document (Fil)
0029c8c configure which extensions are self-hosted (Fil)
feeaad8 Merge branch 'main' into fil/duckdb-wasm-1.29 (Fil)
33aa5cb hash extensions (Fil)
543f823 better docs (Fil)
7475589 cleaner duckdb manifest — now works in scripts and embeds (Fil)
47b6bd0 restructure code, extensible manifest (Fil)
abd0380 test, documentation (Fil)
7ac5d1d much nicer config (Fil)
0adcb36 document config (Fil)
5365371 add support for mvp, clean config & documentation (Fil)
1fdf717 parametrized the initial LOAD in DuckDBClient (Fil)
bc712c3 tests (Fil)
2fb2878 bake-in the extensions manifest (Fil)
bc49674 fix test (Fil)
9a13f2a don't activate spatial on the documentation (Fil)
e2c8b6c Merge branch 'main' into fil/duckdb-wasm-1.29 (Fil)
4a5128d refactor: hash individual extensions, include the list of platforms i… (Fil)
13f892c don't copy extensions twice (Fil)
8bb2866 Merge branch 'main' into fil/duckdb-wasm-1.29 (Fil)
43ef6eb Merge branch 'main' into fil/duckdb-wasm-1.29 (Fil)
6764969 Merge branch 'main' into fil/duckdb-wasm-1.29 (mbostock)
d72f0c3 Update src/duckdb.ts (Fil)
d6fc020 remove DuckDBClientReport utility (Fil)
69f25a2 renames (Fil)
30788e3 p for platform (Fil)
710f36a centralize DUCKDBWASMVERSION and DUCKDBVERSION (Fil)
4f58100 clearer (Fil)
a8cfdcd better config; manifest.extensions now lists individual extensions on… (Fil)
490d969 validate extension names; centralize DUCKDBBUNDLES (Fil)
aaff8f8 fix tests (Fil)
bc39bbe Merge branch 'main' into fil/duckdb-wasm-1.29 (Fil)
8bd0972 copy edit (Fil)
b90c22a support loading non-self-hosted extensions (Fil)
b37be07 test duckdb config normalization & defaults (Fil)
9abaf57 documentation (Fil)
ccc0073 typography (Fil)
26c7a6f doc (Fil)
4416dd3 Merge branch 'main' into fil/duckdb-wasm-1.29 (mbostock)
7704416 use view for <50MB (mbostock)
1dde616 docs, shorthand, etc. (mbostock)
0491966 annotate fixes (mbostock)
be26385 disable telemetry on annotate tests, too (mbostock)
a23d3e4 tidier duckdb manifest (mbostock)
c753728 Merge branch 'main' into fil/duckdb-wasm-1.29 (mbostock)
6e828c9 remove todo (mbostock)
365dbe3 more robust duckdb: scheme (mbostock)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -65,7 +65,7 @@ const db2 = await DuckDBClient.of({base: FileAttachment("quakes.db")}); | |
| db2.queryRow(`SELECT COUNT() FROM base.events`) | ||
| ``` | ||
|
|
||
| For externally-hosted data, you can create an empty `DuckDBClient` and load a table from a SQL query, say using [`read_parquet`](https://duckdb.org/docs/guides/import/parquet_import) or [`read_csv`](https://duckdb.org/docs/guides/import/csv_import). DuckDB offers many affordances to make this easier (in many cases it detects the file format and uses the correct loader automatically). | ||
| For externally-hosted data, you can create an empty `DuckDBClient` and load a table from a SQL query, say using [`read_parquet`](https://duckdb.org/docs/guides/import/parquet_import) or [`read_csv`](https://duckdb.org/docs/guides/import/csv_import). DuckDB offers many affordances to make this easier. (In many cases it detects the file format and uses the correct loader automatically.) | ||
|
|
||
| ```js run=false | ||
| const db = await DuckDBClient.of(); | ||
|
|
@@ -105,3 +105,96 @@ const sql = DuckDBClient.sql({quakes: `https://earthquake.usgs.gov/earthquakes/f | |
| ```sql echo | ||
| SELECT * FROM quakes ORDER BY updated DESC; | ||
| ``` | ||
|
|
||
| ## Extensions <a href="https://github.com/observablehq/framework/pull/1734" class="observablehq-version-badge" data-version="prerelease" title="Added in #1734"></a> | ||
|
|
||
| [DuckDB extensions](https://duckdb.org/docs/extensions/overview.html) extend DuckDB’s functionality, adding support for additional file formats, new types, and domain-specific functions. For example, the [`json` extension](https://duckdb.org/docs/data/json/overview.html) provides a `read_json` method for reading JSON files: | ||
|
|
||
| ```sql echo | ||
| SELECT bbox FROM read_json('https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_day.geojson'); | ||
| ``` | ||
|
|
||
| To read a local file (or data loader), use `FileAttachment` and interpolation `${…}`: | ||
|
|
||
| ```sql echo | ||
| SELECT bbox FROM read_json(${FileAttachment("../quakes.json").href}); | ||
| ``` | ||
|
|
||
| For convenience, Framework configures the `json` and `parquet` extensions by default. Some other [core extensions](https://duckdb.org/docs/extensions/core_extensions.html) also autoload, meaning that you don’t need to explicitly enable them; however, Framework will only [self-host extensions](#self-hosting-of-extensions) if you explicitly configure them, and therefore we recommend that you always use the [**duckdb** config option](../config#duckdb) to configure DuckDB extensions. Any configured extensions will be automatically [installed and loaded](https://duckdb.org/docs/extensions/overview#explicit-install-and-load), making them available in SQL code blocks as well as the `sql` and `DuckDBClient` built-ins. | ||
|
|
||
| For example, to configure the [`spatial` extension](https://duckdb.org/docs/extensions/spatial/overview.html): | ||
|
|
||
| ```js run=false | ||
| export default { | ||
| duckdb: { | ||
| extensions: ["spatial"] | ||
| } | ||
| }; | ||
| ``` | ||
|
|
||
| You can then use the `ST_Area` function to compute the area of a polygon: | ||
|
|
||
| ```sql echo run=false | ||
| SELECT ST_Area('POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))'::GEOMETRY) as area; | ||
| ``` | ||
|
|
||
| To tell which extensions have been loaded, you can run the following query: | ||
|
|
||
| ```sql echo | ||
| FROM duckdb_extensions() WHERE loaded; | ||
| ``` | ||
|
|
||
| <div class="warning"> | ||
|
|
||
| If the `duckdb_extensions()` function runs before DuckDB autoloads a core extension (such as `json`), it might not be included in the returned set. | ||
|
|
||
| </div> | ||
|
|
||
| ### Self-hosting of extensions | ||
|
|
||
| As with [npm imports](../imports#self-hosting-of-npm-imports), configured DuckDB extensions are self-hosted, improving performance, stability, & security, and allowing you to develop offline. Extensions are downloaded to the DuckDB cache folder, which lives in <code>.observablehq/<wbr>cache/<wbr>\_duckdb</code> within the source root (typically `src`). You can clear the cache and restart the preview server to re-fetch the latest versions of any DuckDB extensions. If you use an [autoloading core extension](https://duckdb.org/docs/extensions/core_extensions.html#list-of-core-extensions) that is not configured, DuckDB-Wasm [will load it](https://duckdb.org/docs/api/wasm/extensions.html#fetching-duckdb-wasm-extensions) from the default extension repository, `extensions.duckdb.org`, at runtime. | ||
|
|
||
| ## Configuring | ||
|
|
||
| The second argument to `DuckDBClient.of` and `DuckDBClient.sql` is a [`DuckDBConfig`](https://shell.duckdb.org/docs/interfaces/index.DuckDBConfig.html) object which configures the behavior of DuckDB-Wasm. By default, Framework sets the `castBigIntToDouble` and `castTimestampToDate` query options to true. To instead use [`BigInt`](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/BigInt): | ||
|
|
||
| ```js run=false | ||
| const bigdb = DuckDBClient.of({}, {query: {castBigIntToDouble: false}}); | ||
| ``` | ||
|
|
||
| By default, `DuckDBClient.of` and `DuckDBClient.sql` automatically load all [configured extensions](#extensions). To change the loaded extensions for a particular `DuckDBClient`, use the **extensions** config option. For example, pass an empty array to instantiate a DuckDBClient with no loaded extensions (even if your configuration lists several): | ||
|
|
||
| ```js echo run=false | ||
| const simpledb = DuckDBClient.of({}, {extensions: []}); | ||
| ``` | ||
|
|
||
| Alternatively, you can configure extensions to be self-hosted but not load by default using the **duckdb** config option and the `load: false` shorthand: | ||
|
|
||
| ```js run=false | ||
| export default { | ||
| duckdb: { | ||
| extensions: { | ||
| spatial: false, | ||
| h3: false | ||
| } | ||
| } | ||
| }; | ||
| ``` | ||
|
|
||
| You can then selectively load extensions as needed like so: | ||
|
|
||
| ```js echo run=false | ||
| const geosql = DuckDBClient.sql({}, {extensions: ["spatial", "h3"]}); | ||
| ``` | ||
|
|
||
| In the future, we’d like to allow DuckDB to be configured globally (beyond just [extensions](#extensions)) via the [**duckdb** config option](../config#duckdb); please upvote [#1791](https://github.com/observablehq/framework/issues/1791) if you are interested in this feature. | ||
|
|
||
| ## Versioning | ||
|
|
||
| Framework currently uses [DuckDB-Wasm 1.29.0](https://github.com/duckdb/duckdb-wasm/releases/tag/v1.29.0), which aligns with [DuckDB 1.1.1](https://github.com/duckdb/duckdb/releases/tag/v1.1.1). You can load a different version of DuckDB-Wasm by importing `npm:@duckdb/duckdb-wasm` directly, for example: | ||
|
|
||
| ```js run=false | ||
| import * as duckdb from "npm:@duckdb/duckdb-wasm@<version>"; | ||
| ``` | ||
|
|
||
| However, you will not be able to change the version of DuckDB-Wasm used by SQL code blocks or the `sql` or `DuckDBClient` built-ins, nor can you use Framework’s support for self-hosting extensions with a different version of DuckDB-Wasm. | ||
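The docs above show two shapes for the **duckdb** `extensions` option: an array of names, and an object where `name: false` means "self-host but don't load by default". As a rough illustration of how the two shapes could reduce to one list, here is a minimal sketch. The `normalizeExtensions` helper and the `{name, load}` entry shape are hypothetical, and treating `false` as equivalent to `{load: false}` is an assumption drawn from the "`load: false` shorthand" wording; this is not Framework's actual normalization code.

```js
// Hypothetical helper, not Framework's implementation: expand the two config
// shapes from the docs into a uniform list of {name, load} entries.
function normalizeExtensions(extensions = []) {
  // Array form: every listed extension is loaded by default.
  if (Array.isArray(extensions)) {
    return extensions.map((name) => ({name, load: true}));
  }
  // Object form: `name: false` is assumed to be shorthand for `{load: false}`.
  return Object.entries(extensions).map(([name, value]) => ({
    name,
    load: typeof value === "object" && value !== null ? value.load !== false : value !== false
  }));
}

console.log(normalizeExtensions(["spatial"]));
console.log(normalizeExtensions({spatial: false, h3: false}));
```

With a uniform list like this, the client only needs one code path to decide what to install and what to load.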
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -29,17 +29,25 @@ import * as duckdb from "npm:@duckdb/duckdb-wasm"; | |
| // ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE | ||
| // POSSIBILITY OF SUCH DAMAGE. | ||
|
|
||
| const bundle = await duckdb.selectBundle({ | ||
| mvp: { | ||
| mainModule: import.meta.resolve("npm:@duckdb/duckdb-wasm/dist/duckdb-mvp.wasm"), | ||
| mainWorker: import.meta.resolve("npm:@duckdb/duckdb-wasm/dist/duckdb-browser-mvp.worker.js") | ||
| }, | ||
| eh: { | ||
| mainModule: import.meta.resolve("npm:@duckdb/duckdb-wasm/dist/duckdb-eh.wasm"), | ||
| mainWorker: import.meta.resolve("npm:@duckdb/duckdb-wasm/dist/duckdb-browser-eh.worker.js") | ||
| } | ||
| }); | ||
|
|
||
| // Baked-in manifest. | ||
| // eslint-disable-next-line no-undef | ||
| const manifest = DUCKDB_MANIFEST; | ||
| const candidates = { | ||
| ...(manifest.bundles.includes("mvp") && { | ||
| mvp: { | ||
| mainModule: import.meta.resolve("npm:@duckdb/duckdb-wasm/dist/duckdb-mvp.wasm"), | ||
| mainWorker: import.meta.resolve("npm:@duckdb/duckdb-wasm/dist/duckdb-browser-mvp.worker.js") | ||
| } | ||
| }), | ||
| ...(manifest.bundles.includes("eh") && { | ||
| eh: { | ||
| mainModule: import.meta.resolve("npm:@duckdb/duckdb-wasm/dist/duckdb-eh.wasm"), | ||
| mainWorker: import.meta.resolve("npm:@duckdb/duckdb-wasm/dist/duckdb-browser-eh.worker.js") | ||
| } | ||
| }) | ||
|
||
| }; | ||
| const bundle = await duckdb.selectBundle(candidates); | ||
| const activePlatform = manifest.bundles.find((key) => bundle.mainModule === candidates[key].mainModule); | ||
| const logger = new duckdb.ConsoleLogger(duckdb.LogLevel.WARNING); | ||
|
|
||
| let db; | ||
|
|
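The hunk above builds `candidates` with conditional spreads, so a platform missing from `manifest.bundles` is never offered to `duckdb.selectBundle`, and then recovers the chosen platform key by comparing `mainModule` URLs. A minimal sketch of that pattern in isolation, assuming plain objects in place of the `import.meta.resolve(...)` results; the helper names and file names are made up for illustration:

```js
// Sketch: include a platform only if the manifest lists it.
// Spreading `false` into an object literal adds no properties.
function buildCandidates(bundles, urls) {
  return {
    ...(bundles.includes("mvp") && {mvp: urls.mvp}),
    ...(bundles.includes("eh") && {eh: urls.eh})
  };
}

// Sketch: map the selected bundle back to its platform key,
// mirroring the `activePlatform` lookup in the diff.
function findActivePlatform(bundles, candidates, selected) {
  return bundles.find((key) => selected.mainModule === candidates[key].mainModule);
}

// Hypothetical URLs standing in for the resolved npm: specifiers.
const exampleUrls = {
  mvp: {mainModule: "duckdb-mvp.wasm", mainWorker: "duckdb-browser-mvp.worker.js"},
  eh: {mainModule: "duckdb-eh.wasm", mainWorker: "duckdb-browser-eh.worker.js"}
};
const exampleCandidates = buildCandidates(["eh"], exampleUrls);
console.log(Object.keys(exampleCandidates));
console.log(findActivePlatform(["eh"], exampleCandidates, exampleUrls.eh));
```

Note that the lookup assumes every key in `bundles` has an entry in `candidates`; a key outside `mvp` and `eh` would make `candidates[key]` undefined.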
@@ -169,6 +177,7 @@ export class DuckDBClient { | |
| config = {...config, query: {...config.query, castBigIntToDouble: true}}; | ||
| } | ||
| await db.open(config); | ||
| await registerExtensions(db, config.extensions); | ||
| await Promise.all(Object.entries(sources).map(([name, source]) => insertSource(db, name, source))); | ||
| return new DuckDBClient(db); | ||
| } | ||
|
|
@@ -178,9 +187,22 @@ export class DuckDBClient { | |
| } | ||
| } | ||
|
|
||
| Object.defineProperty(DuckDBClient.prototype, "dialect", { | ||
| value: "duckdb" | ||
| }); | ||
| Object.defineProperty(DuckDBClient.prototype, "dialect", {value: "duckdb"}); | ||
|
The `DatabaseClient.dialect` property isn’t used for anything in Framework. I think it’s used in notebooks for ejecting from a SQL cell, but there’s no analogous concept in Framework, so maybe we should consider dropping it. Not directly related to this PR though!
||
|
|
||
| async function registerExtensions(db, extensions) { | ||
| const con = await db.connect(); | ||
| try { | ||
| await Promise.all( | ||
| manifest.extensions.map(([name, {[activePlatform]: ref, load}]) => | ||
| con | ||
| .query(`INSTALL "${name}" FROM '${import.meta.resolve(ref)}'`) | ||
| .then(() => (extensions === undefined ? load : extensions.includes(name)) && con.query(`LOAD "${name}"`)) | ||
| ) | ||
| ); | ||
| } finally { | ||
| await con.close(); | ||
| } | ||
| } | ||
|
|
||
| async function insertSource(database, name, source) { | ||
| source = await source; | ||
|
|
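`registerExtensions` always issues an `INSTALL` for each manifest entry, then issues a `LOAD` only if the per-client `extensions` option names the extension or, when that option is undefined, if the manifest's `load` flag is set. To make that decision easier to follow, here is a sketch that returns the SQL statements as strings instead of running them. The `extensionStatements` helper, the manifest entries, and the URLs are hypothetical; the real function resolves each `ref` with `import.meta.resolve` and runs the queries on a connection.

```js
// Sketch: list the statements registerExtensions would issue, in order,
// for one platform and one per-client `extensions` option.
function extensionStatements(manifestExtensions, platform, extensions) {
  const statements = [];
  for (const [name, {[platform]: ref, load}] of manifestExtensions) {
    // Every configured extension is installed from its (self-hosted) location.
    statements.push(`INSTALL "${name}" FROM '${ref}'`);
    // An explicit `extensions` array overrides the manifest's default `load`.
    const shouldLoad = extensions === undefined ? load : extensions.includes(name);
    if (shouldLoad) statements.push(`LOAD "${name}"`);
  }
  return statements;
}

// Hypothetical manifest entries for illustration only.
const exampleManifest = [
  ["json", {eh: "https://example.com/json", load: true}],
  ["spatial", {eh: "https://example.com/spatial", load: false}]
];
console.log(extensionStatements(exampleManifest, "eh", undefined));
console.log(extensionStatements(exampleManifest, "eh", ["spatial"]));
```

Passing an empty array yields only `INSTALL` statements, which matches the documented `{extensions: []}` case of a client with no loaded extensions.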
@@ -258,7 +280,7 @@ async function insertFile(database, name, file, options) { | |
| }); | ||
| } | ||
| if (/\.parquet$/i.test(file.name)) { | ||
| const table = file.size < 10e6 ? "TABLE" : "VIEW"; // for small files, materialize the table | ||
| const table = file.size < 50e6 ? "TABLE" : "VIEW"; // for small files, materialize the table | ||
| return await connection.query(`CREATE ${table} '${name}' AS SELECT * FROM parquet_scan('${file.name}')`); | ||
| } | ||
| if (/\.(db|ddb|duckdb)$/i.test(file.name)) { | ||
|
|
||
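The last hunk raises the Parquet materialization cutoff from `10e6` to `50e6` bytes: files below the cutoff become a `TABLE`, and anything at or above it stays a `VIEW` over the file. A small sketch of that check as a pure function, where the constant and function names are hypothetical:

```js
// Sketch: the size check from insertFile, pulled out for illustration.
// 50e6 bytes is the value introduced in this PR (previously 10e6).
const MATERIALIZE_THRESHOLD = 50e6; // hypothetical name

function parquetRelationKind(size) {
  // Small files are materialized as a TABLE; larger ones stay a VIEW.
  return size < MATERIALIZE_THRESHOLD ? "TABLE" : "VIEW";
}

console.log(parquetRelationKind(20e6));
console.log(parquetRelationKind(50e6));
```

The comparison is strict, so a file of exactly `50e6` bytes is exposed as a `VIEW`.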