
perf: Avoid recursive union all by name call for duckdb diagonal concat#3399

Closed
FBruzzesi wants to merge 2 commits into `main` from `perf/duckdb-diagonal-concat`

Conversation

@FBruzzesi
Member

FBruzzesi commented Jan 11, 2026

Description

I ran a benchmark locally with different configurations (times are averaged over 3 runs per configuration), with a 50% column-overlap fraction:

| n_frames | n_rows | n_cols | time[s] (main) | time[s] (branch) |
|---:|---:|---:|---:|---:|
| 5 | 10000 | 20 | 0.0591 | 0.0504 |
| 5 | 10000 | 50 | 0.0627 | 0.0603 |
| 5 | 100000 | 20 | 0.1759 | 0.1709 |
| 5 | 100000 | 50 | 0.4753 | 0.4462 |
| 10 | 10000 | 20 | 0.0914 | 0.0710 |
| 10 | 10000 | 50 | 0.2312 | 0.1790 |
| 10 | 100000 | 20 | 0.5308 | 0.5432 |
| 10 | 100000 | 50 | 1.3510 | 1.4438 |
| 20 | 10000 | 20 | 0.3652 | 0.2255 |
| 20 | 10000 | 50 | 0.9011 | 0.5725 |
| 20 | 100000 | 20 | 1.9238 | 1.7864 |
| 20 | 100000 | 50 | 4.9235 | 4.4886 |
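For context, here is a minimal sketch of what "overlap fraction of 50% of the columns" means for the input schemas. The helper name and column-naming scheme are hypothetical, not the actual benchmark code:

```python
def make_schemas(n_frames: int, n_cols: int, overlap_frac: float = 0.5) -> list[list[str]]:
    """Column names per frame: a shared block (the overlap) plus frame-unique columns."""
    n_shared = int(n_cols * overlap_frac)
    shared = [f"shared_{i}" for i in range(n_shared)]
    return [
        shared + [f"f{k}_col_{i}" for i in range(n_cols - n_shared)]
        for k in range(n_frames)
    ]

schemas = make_schemas(n_frames=5, n_cols=20)
# Each frame has 20 columns; any two frames share the 10 "shared_*" columns.
# A diagonal concat of these frames yields the union of all column names
# (here 10 shared + 5 * 10 unique = 60), filling missing columns with NULLs.
union_of_all = sorted({c for cols in schemas for c in cols})
```

The benchmark then times the diagonal concat itself for each `(n_frames, n_rows, n_cols)` configuration.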

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

FBruzzesi added the `performance` and `duckdb` labels Jan 11, 2026
FBruzzesi changed the title from "perf: Avoid recursive union all by name call" to "perf: Avoid recursive union all by name call for duckdb diagonal concat" Jan 11, 2026
@dangotbanned
Member

dangotbanned commented Jan 12, 2026

Thanks @FBruzzesi

Note

take all of this with a grain of salt, as my extent of DuckDB use has been within narwhals 😅

Benchmark

Double thanks for including a benchmark!

I ran a benchmark locally with different configurations (times are averaged over 3 runs per configuration), with a 50% column-overlap fraction:

| n_frames | n_rows | n_cols | time[s] (main) | time[s] (branch) |
|---:|---:|---:|---:|---:|
| 5 | 10000 | 20 | 0.0591 | 0.0504 |
| ... | ... | ... | ... | ... |
| 20 | 100000 | 50 | 4.9235 | 4.4886 |

I am left with some more questions though, which should be easy enough to get to the bottom of 🤞

Configurations

The most general conclusion I can make is that there looks to be more of a benefit as the size and number of datasets increase.

At the lower end, the speedup is narrow, and weirdly the n_frames=10, n_rows=100_000 rows actually regress.

Would you be able to extend the config to give a broader view?

Current

n_frames = 5, 10, 20
n_rows = 10_000, 100_000
n_cols = 20, 50

Proposed

I would hope when we go below the current n_*, we at-worst get the same performance and not regress.

This also extends the upper bounds for 2/3 parameters, where I'm expecting to continue seeing a perf boost 😎

n_frames = 2, 5, 10, 20, 50
n_rows = 100, 1_000, 10_000, 100_000
n_cols = 5, 10, 20, 50

Presentation

time[s] (main) and time[s] (branch) are useful, but the first thing my brain wanted to do was compare the runtimes.

Could you replace these with diff[s] and speedup?

I'm only suggesting to replace, since the table will be pretty big after the other request 😂

@FBruzzesi
Member Author

FBruzzesi commented Jan 12, 2026

Thanks @dangotbanned

take all of this with a grain of salt, as my extent of DuckDB use has been within narwhals 😅

That makes two of us!

Proposed

I would hope when we go below the current n_*, we at-worst get the same performance and not regress.

This also extends the upper bounds for 2/3 parameters, where I'm expecting to continue seeing a perf boost 😎

n_frames = 2, 5, 10, 20, 50
n_rows = 100, 1_000, 10_000, 100_000
n_cols = 5, 10, 20, 50

I have a machine with 16 GB of RAM and a 500 GB drive; I can barely run the (20, 100_000, 50) config, and 50 dataframes would go OOM.

Presentation

time[s] (main) and time[s] (branch) are useful, but the first thing my brain wanted to do was compare the runtimes.

Could you replace these with diff[s] and speedup?

I'm only suggesting to replace, since the table will be pretty big after the other request 😂

This is quite easy to achieve

@dangotbanned
Member

#3399 (comment)

I have a machine with 16 GB of RAM and a 500 GB drive; I can barely run the (20, 100_000, 50) config, and 50 dataframes would go OOM.

Ah yes, that'll teach me for plucking numbers out of thin air 😂

@FBruzzesi
Member Author

FBruzzesi commented Jan 12, 2026

Caution

Removed the conclusions comment, as it was inaccurate.

@FBruzzesi FBruzzesi marked this pull request as draft January 12, 2026 19:02
@FBruzzesi
Member Author

I am going to close this, as it does not seem to yield a meaningful improvement. We can come back to it either once DuckDB supports union by name in the Python API, or via AlignDiagonal (from #3404) and .union.
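For posterity, a rough sketch of the two query shapes this PR was comparing: a recursive pairwise `UNION ALL BY NAME` (each step wrapping the previous result in another subquery) versus a single flat n-way chain. This is plain string-building for illustration; the actual SQL narwhals generates may differ:

```python
def pairwise_union_by_name(tables: list[str]) -> str:
    # Recursive/pairwise: each step wraps the previous result,
    # producing a deeply nested query for DuckDB to plan.
    query = f"SELECT * FROM {tables[0]}"
    for t in tables[1:]:
        query = f"({query}) UNION ALL BY NAME (SELECT * FROM {t})"
    return query

def flat_union_by_name(tables: list[str]) -> str:
    # Flat: one n-way UNION ALL BY NAME chain in a single statement,
    # with no recursive wrapping.
    return " UNION ALL BY NAME ".join(f"SELECT * FROM {t}" for t in tables)

print(flat_union_by_name(["t0", "t1", "t2"]))
# SELECT * FROM t0 UNION ALL BY NAME SELECT * FROM t1 UNION ALL BY NAME SELECT * FROM t2
```

DuckDB's SQL dialect supports `UNION ALL BY NAME` for aligning relations by column name, but (as noted above) the Python relational API did not expose a union-by-name method at the time, which is why the follow-up options involve either that API landing or the AlignDiagonal + `.union` route.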

@FBruzzesi FBruzzesi closed this Jan 25, 2026
@FBruzzesi FBruzzesi deleted the perf/duckdb-diagonal-concat branch January 25, 2026 11:01