Massively optimize canvas 2D rendering by using vertex buffers by stuartcarnie · Pull Request #112481 · godotengine/godot

stuartcarnie · 2025-11-06T18:50:36Z

Summary

This PR is a performance optimisation for the Canvas RD renderer after introducing batching. @clayjohn has observed a regression in performance under certain combinations and older hardware that this PR is intended to address. It should not regress existing performance gains. The core change is to switch instance data from a uniform buffer (shader storage buffer object) to a vertex buffer object (VBO).

Caution

D3D12 still needs vertex_format_create updated to use the new VertexAttribute::binding member, and command_render_bind_vertex_buffers updated to handle UMA buffers.

TODOs

Remove all the USE_VAO stuff, as that was just me trying to work with both
Fix the PushConstant data so that its size is reduced from 144 bytes to 84 (+ padding)
Add the dynamic_offset index to the vertex buffer binding. Update the drivers to use the offset (already similar for uniforms):
- Metal
- Vulkan
- D3D12
Clean up the new dynamic vertex binding API and streamline it (switch to Span in the RenderDeviceGraph, so we don't have to allocate)
Extend VertexAttribute change, so a set of attributes can bind to the same buffer
- This will make the API more efficient as drivers will only need to create a single buffer binding and consume a single slot.

Testing

We must verify all rendered command types:

Verify:

INSTANCE_FLAGS_USE_MSDF
INSTANCE_FLAGS_USE_LCD

Benchmarks

See below for more detail

	Adreno 530	Adreno 640	Mali G715	Intel Xe	RX 6900XT	Mali G68 MP5
improvement	4-5x	1.5-2x	1-1.25x	2-3x	1x	1.1x

stuartcarnie

@clayjohn et al., some notes for your information

drivers/d3d12/rendering_device_driver_d3d12.cpp

stuartcarnie · 2025-11-11T01:21:55Z

core/templates/hash_map.h

 		}
 	}

+	HashMap &operator=(HashMap &&p_other) noexcept {


With a move assignment operator we can std::move one HashMap into another instead of copying it. I'm using it in vertex_format_create for the bindings member:

VertexDescriptionCache &ce = vertex_formats.insert(id, VertexDescriptionCache())->value;
ce.vertex_formats = vertex_descriptions;
ce.bindings = std::move(bindings);
ce.driver_id = driver_id;
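For readers less familiar with the pattern, here is a minimal sketch of what a move constructor and move assignment operator do for a container that owns heap storage. `IntBag`, `CacheEntry`, and `make_entry` are hypothetical names invented for this illustration; this is not Godot's HashMap implementation, only the general technique. In line with the review feedback, the operations are written without noexcept.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical stand-in for a container that owns heap storage.
// This is NOT Godot's HashMap; it only illustrates the move operations.
class IntBag {
	int *data = nullptr;
	std::size_t count = 0;

public:
	IntBag() = default;
	explicit IntBag(std::size_t p_count) :
			data(new int[p_count]()), count(p_count) {}
	~IntBag() { delete[] data; }

	// Copying is disabled here so an accidental deep copy is a compile error.
	IntBag(const IntBag &) = delete;
	IntBag &operator=(const IntBag &) = delete;

	// Move constructor: take over the pointer and leave the source empty.
	IntBag(IntBag &&p_other) :
			data(p_other.data), count(p_other.count) {
		p_other.data = nullptr;
		p_other.count = 0;
	}

	// Move assignment: release our own storage, then take over the source's.
	IntBag &operator=(IntBag &&p_other) {
		if (this != &p_other) {
			delete[] data;
			data = p_other.data;
			count = p_other.count;
			p_other.data = nullptr;
			p_other.count = 0;
		}
		return *this;
	}

	std::size_t size() const { return count; }
	const int *ptr() const { return data; }
};

// Hypothetical cache entry, loosely mirroring `ce.bindings = std::move(bindings)`.
struct CacheEntry {
	IntBag bindings;
};

// Transfers `p_bindings` into a cache entry without copying its elements.
inline CacheEntry make_entry(IntBag &&p_bindings) {
	CacheEntry ce;
	ce.bindings = std::move(p_bindings); // Uses the move assignment operator.
	return ce;
}
```

After `make_entry(std::move(bag))`, the entry holds the original allocation and `bag` is left empty, so no per-element copy takes place.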

This is worth a separate core PR. Move semantics are desirable for all our containers, and that would make this PR completely non-core.

@Ivorforce Are you okay with merging this change in this PR? I don't really want to block this PR while we wait for this optimization to be applied to other containers.

My preference in general is for core optimizations to be in the same PR where they are used as well, so the git history shows why certain changes were needed.

Generally agree that optimizations should be introduced only when needed, but move semantics are needed for all our containers (and most of them do have it already). Looks like I just forgot to add them for hash maps.
Core changes particularly have a habit of being 'snuck in' in bigger PRs, which can make them very hard to spot and their repercussions hard to estimate. Generally I would expect core changes to be beneficial to not only the use-case of a PR, but the codebase in general. For example, I would normally expect benchmarks for common cases, if it's optimization related. That's why I prefer them in separate PRs when possible.

Anyway, they look fine to me (granted noexcept is removed), so I'm OK with merging them in here.

noexcept has been removed - thanks for the feedback!

servers/rendering/renderer_rd/forward_clustered/render_forward_clustered.cpp

servers/rendering/renderer_rd/shaders/canvas.glsl

servers/rendering/rendering_device.cpp

stuartcarnie · 2025-11-11T01:38:12Z

servers/rendering/rendering_device_driver.h

-	virtual VertexFormatID vertex_format_create(VectorView<VertexAttribute> p_vertex_attribs) = 0;
+	virtual VertexFormatID vertex_format_create(Span<VertexAttribute> p_vertex_attribs, const VertexAttributeBindingsMap &p_vertex_bindings) = 0;


Switching to Span means we can avoid allocations at call sites. We should evaluate all calls and use FixedVector, as the sizes in the renderer_rd are all known at compile time.
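As a rough illustration of why a view parameter avoids allocations, the sketch below passes a stack array to a function through a non-owning pointer-plus-length view. `ViewSketch`, `AttribSketch`, `total_stride`, and `example_total_stride` are hypothetical names for this example only; they are not Godot's Span, FixedVector, or VertexAttribute, whose exact interfaces are not shown in this thread.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical non-owning view (pointer + length); a stand-in for Span, not Godot's class.
template <typename T>
struct ViewSketch {
	const T *ptr = nullptr;
	std::size_t len = 0;

	ViewSketch() = default;
	ViewSketch(const T *p_ptr, std::size_t p_len) :
			ptr(p_ptr), len(p_len) {}

	// Lets a fixed-size C array be passed directly, with its length deduced.
	template <std::size_t N>
	ViewSketch(const T (&p_array)[N]) :
			ptr(p_array), len(N) {}

	const T *begin() const { return ptr; }
	const T *end() const { return ptr + len; }
	std::size_t size() const { return len; }
};

// Hypothetical attribute description with only the fields this sketch needs.
struct AttribSketch {
	uint32_t location;
	uint32_t stride;
};

// Takes a view, so the callee never owns or allocates storage.
inline uint32_t total_stride(ViewSketch<AttribSketch> p_attribs) {
	uint32_t total = 0;
	for (const AttribSketch &a : p_attribs) {
		total += a.stride;
	}
	return total;
}

// The attribute count is known at compile time, so the array lives on the stack.
inline uint32_t example_total_stride() {
	const AttribSketch attribs[] = { { 0, 8 }, { 1, 16 }, { 2, 4 } };
	return total_stride(attribs);
}
```

The call site builds no heap-backed container; the view simply points at the caller's array for the duration of the call.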

stuartcarnie · 2025-11-11T21:08:34Z

core/templates/hash_map.h

 	static constexpr uint32_t MIN_CAPACITY_INDEX = 2; // Use a prime.
 	static constexpr float MAX_OCCUPANCY = 0.75;
 	static constexpr uint32_t EMPTY_HASH = 0;
+	using KV = KeyValue<TKey, TValue>; // Type alias for easier access to KeyValue.


Particularly useful when you have a typedef such as:

typedef HashMap<uint32_t, VertexAttributeBinding> VertexAttributeBindingsMap;

as you can then use:

for (const VertexAttributeBindingsMap::KV &kv : p_vertex_bindings) {
	// ...
}

vs

for (const KeyValue<uint32_t, VertexAttributeBinding> &kv : p_vertex_bindings) {
	// ...
}

Also, if you change the Key or Value type, these call sites don't need to be updated.
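A small self-contained sketch of the same idea, for anyone who wants to try it outside the engine. `PairSketch`, `MapSketch`, `BindingSketch`, `BindingsMapSketch`, and `sum_strides` are hypothetical stand-ins (the map is a plain vector with no duplicate-key handling), not Godot's KeyValue or HashMap.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical key/value pair, standing in for KeyValue<TKey, TValue>.
template <typename K, typename V>
struct PairSketch {
	K key;
	V value;
};

// Hypothetical minimal map (linear storage, no duplicate-key handling)
// that exposes a nested KV alias.
template <typename K, typename V>
class MapSketch {
	std::vector<PairSketch<K, V>> items;

public:
	using KV = PairSketch<K, V>; // Nested alias, as in the diff above.

	void insert(const K &p_key, const V &p_value) {
		items.push_back(KV{ p_key, p_value });
	}
	typename std::vector<KV>::const_iterator begin() const { return items.begin(); }
	typename std::vector<KV>::const_iterator end() const { return items.end(); }
};

// Hypothetical binding description with a single field.
struct BindingSketch {
	uint32_t stride;
};

typedef MapSketch<uint32_t, BindingSketch> BindingsMapSketch;

// The loop names the element type through the alias, so changing the
// typedef's key or value type does not require editing this function.
inline uint32_t sum_strides(const BindingsMapSketch &p_bindings) {
	uint32_t total = 0;
	for (const BindingsMapSketch::KV &kv : p_bindings) {
		total += kv.value.stride;
	}
	return total;
}
```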

servers/rendering/multi_uma_buffer.h

clayjohn · 2025-11-13T04:24:07Z

Some preliminary testing using a modified MRP from #104194

android-4.4-perf-clay.zip

The original MRP has a loop that adds SubViewports that have a single Sprite rendered to them, then are rendered to the screen.

That means that in a loop of N, we are rendering N * 2 draw calls using N + 1 render passes (i.e. N render passes with 1 draw call and 1 render pass with N draw calls). None of the draw calls are batched, so this MRP exposes the worst case performance. It can be CPU bottlenecked on some hardware and GPU bottlenecked on others.

The second test case renders N sprites with the same texture in the same location. This is the best case for batching. The number of draw calls depends on the hardware's capabilities, but it is often only 1 or a handful. Drawing that many Sprites in one location is a weird edge case for TBDR GPUs, so Mali GPUs and Apple silicon GPUs may not reflect typical performance scenarios.

# 2000 Sprites Adreno 530
4.3: 55 FPS
4.5.1: 10 FPS
PR: 43 FPS

# 100 viewports Adreno 530
4.3: 36 FPS
4.5.1: 6 FPS
PR: 30 FPS

# 10000 Sprites Adreno 640
4.3: 25 FPS
4.5.1: 19 FPS
PR: 37 FPS

# 500 viewports Adreno 640
4.3: 18 FPS
4.5.1: 12 FPS
PR: 18 FPS

# 10000 Sprites Mali G715
4.3: 52 FPS
4.5.1: 45 FPS
PR: 57 FPS

# 500 viewports Mali G715
4.3: 32 FPS
4.5.1: 24 FPS
PR: 25 FPS (This is likely caused by an unrelated bug and should be investigated)

# 10000 Sprites Intel Xe
4.3: 80 FPS
4.5.1: 23 FPS
PR: 102 FPS

# 500 viewports Intel Xe
4.3: 28 FPS
4.5.1: 16 FPS
PR: 39 FPS

# 10000 Sprites Windows - Ryzen 5 9600X - Radeon RX 6900 XT
4.3: 675 FPS
4.5.1: 752 FPS
PR: 856 FPS
Master: 839 FPS

# 500 viewports Windows - Ryzen 5 9600X - Radeon RX 6900 XT
4.3: 1475 FPS
4.5.1: 1924 FPS
PR: 2307 FPS
Master: 2206 FPS

# 50000 Sprites Windows - Ryzen 5 9600X - Radeon RX 6900 XT
4.3: 102 FPS
4.5.1: 170 FPS
PR: 145 FPS
Master: 145 FPS

# 10000 viewports Windows - Ryzen 5 9600X - Radeon RX 6900 XT
4.3: 97 FPS (GPU Bound)
4.5.1: 13 FPS
PR: 98 FPS
Master: CRASH

blueskythlikesclouds · 2025-11-13T07:36:02Z

I can implement the D3D12 changes. Should I make a PR to your fork?

AThousandShips · 2025-11-13T09:47:49Z

core/templates/hash_map.h

 		}
 	}

+	HashMap(HashMap &&p_other) noexcept {


Do these noexcept have any effect? We don't use exceptions and as far as I know we don't use this directive elsewhere

True – I'll remove them

clayjohn

Amazing work! I have tested extensively on Linux, Windows, and Android in addition to your testing on macOS. So I think we have covered our bases.

From testing it appears we have resolved the performance regression introduced from batching in almost all cases and even improved performance in many cases. At this point I think this PR is ready to go and get wider testing!

While discussing this with Stuart, we identified some further optimizations we could make. But the current state is really good and gives us the majority of possible gains with the least intrusive changes.

AThousandShips · 2025-11-14T09:47:09Z

doc/classes/RDVertexAttribute.xml

 	<members>
+		<member name="binding" type="int" setter="set_binding" getter="get_binding" default="4294967295">
+			The index of the buffer in the vertex buffer array to bind this vertex attribute. When set to -1, it defaults to the index of the attribute.
+			[b]Note:[/b] You cannot mix binding explicitly assigned attributes with implicitly assigned ones (i.e. -1). Either all attributes must have their binding set to -1, or all must have explicit bindings.


Suggested change

[b]Note:[/b] You cannot mix binding explicitly assigned attributes with implicitly assigned ones (i.e. -1). Either all attributes must have their binding set to -1, or all must have explicit bindings.

[b]Note:[/b] You cannot mix binding explicitly assigned attributes with implicitly assigned ones (i.e. [code]-1[/code]). Either all attributes must have their binding set to [code]-1[/code], or all must have explicit bindings.
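The all-or-nothing rule in this note can be sketched as a small validation pass. This is an illustration under stated assumptions, not the engine's actual check: `AttributeSketch`, `UNSET_BINDING`, and `resolve_bindings` are hypothetical names, the unset value is assumed to be stored as UINT32_MAX (matching the documented default of 4294967295), and "the index of the attribute" is assumed to mean its position in the array.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumption: an unset binding (-1) is stored as UINT32_MAX, i.e. 4294967295.
static const uint32_t UNSET_BINDING = UINT32_MAX;

// Hypothetical attribute with only the field this sketch needs.
struct AttributeSketch {
	uint32_t binding;
};

// Returns false when implicit (-1) and explicit bindings are mixed.
// On success, every implicit binding is replaced by the attribute's
// position in the array (an assumption about what "index" means here).
inline bool resolve_bindings(std::vector<AttributeSketch> &r_attribs) {
	std::size_t implicit_count = 0;
	for (const AttributeSketch &a : r_attribs) {
		if (a.binding == UNSET_BINDING) {
			implicit_count++;
		}
	}
	// Valid states: none implicit, or all implicit.
	if (implicit_count != 0 && implicit_count != r_attribs.size()) {
		return false;
	}
	if (implicit_count != 0) {
		for (std::size_t i = 0; i < r_attribs.size(); i++) {
			r_attribs[i].binding = static_cast<uint32_t>(i);
		}
	}
	return true;
}
```

With all bindings unset, the attributes end up bound to buffers 0, 1, 2, and so on; with all bindings explicit, they are left untouched, which is what allows several attributes to share one buffer.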

AThousandShips · 2025-11-14T09:47:21Z

doc/classes/RDVertexAttribute.xml

 	</tutorials>
 	<members>
+		<member name="binding" type="int" setter="set_binding" getter="get_binding" default="4294967295">
+			The index of the buffer in the vertex buffer array to bind this vertex attribute. When set to -1, it defaults to the index of the attribute.


Suggested change

The index of the buffer in the vertex buffer array to bind this vertex attribute. When set to -1, it defaults to the index of the attribute.

The index of the buffer in the vertex buffer array to bind this vertex attribute. When set to [code]-1[/code], it defaults to the index of the attribute.

software-2 · 2025-11-14T12:57:32Z

I ran some tests on my Samsung Galaxy Tab S9 FE+, which seems to be a worst-case device, using my original test program (here). @clayjohn should I run your modified version? (I didn't see it until after I finished)

I couldn't get this PR branch to run (the android app would hang at the splash screen). Master at bd2ca13 ran just fine, so I pulled in this PR's commits on top of that.

4.5-dev3: 68-72fps lows, sometimes hitting 80
master: 72-80 lows, but frequently hitting the device max of 90
master + PR: 86-90 consistently, but I'm seeing dips sometimes to 74 briefly about every 10 seconds. (Those spikes seem to disappear when I have the Visual Profiler running, keeping a consistent 86+ when the profiler is on.)

For comparison, 4.3-stable (prior to the batching changes) gives lower overall frames (~82 on average), but does not have the occasional framerate dip.

This is absolutely a major improvement!

clayjohn · 2025-11-14T18:46:33Z

I ran some tests on my Samsung Galaxy Tab S9 FE+, which seems to be a worst-case device, using my original test program (here). @clayjohn should I run your modified version? (I didn't see it until after I finished)

No need. My modified version just added a couple lines of code to also test rendering a high number of sprites in a single batch.

I couldn't get this PR branch to run (the android app would hang at the splash screen). Master at bd2ca13 ran just fine, so I pulled in this PR's commits on top of that.

Sorry about that, there was an android regression two days ago that was fixed yesterday in #112716. Pulling in this change on top of master was the right thing to do!

For comparison, 4.3-stable (prior to the batching changes) gives lower overall frames (~82 on average), but does not have the occasional framerate dip.

Depending on your build settings, the dip may go away with official builds. We enable swappy by default on official builds, which helps reduce frame dips. By default, swappy is disabled for custom builds.

stuartcarnie · 2025-11-14T18:52:19Z

@AThousandShips I've removed the noexcept, fixed the documentation, and added you as a co-author.

@clayjohn shall I wait for @Ivorforce's response before removing the move semantics from HashMap?

Ivorforce · 2025-11-14T18:54:13Z

I've already replied; in short: The HashMap changes look good to me.

- Add support for vertex bindings and UMA vertex buffers in D3D12.
- Simplify 2D instance params and move more into per-batch data to save bandwidth

Co-authored-by: Skyth <19259897+blueskythlikesclouds@users.noreply.github.com>
Co-authored-by: Clay John <claynjohn@gmail.com>
Co-authored-by: A Thousand Ships <96648715+athousandships@users.noreply.github.com>

Repiteo · 2025-11-14T20:28:52Z

Thanks!

YeldhamDev · 2025-11-19T21:20:56Z

This PR still has regressions, see #112938.

stuartcarnie force-pushed the 2d_canvas_vbos branch 2 times, most recently from 83ceba7 to 0251525 Compare November 11, 2025 01:14

clayjohn requested a review from blueskythlikesclouds November 11, 2025 02:11

stuartcarnie force-pushed the 2d_canvas_vbos branch 2 times, most recently from 18e4eea to f1ba020 Compare November 11, 2025 19:52

stuartcarnie changed the title ~~spike: VBOs for Canvas 2D~~ 2D: Use Vertex Buffer Objects for Canvas 2D instance data Nov 11, 2025

stuartcarnie force-pushed the 2d_canvas_vbos branch from f1ba020 to a40aca9 Compare November 11, 2025 21:06

stuartcarnie force-pushed the 2d_canvas_vbos branch 5 times, most recently from 6cfae71 to 329eec7 Compare November 11, 2025 23:43

stuartcarnie marked this pull request as ready for review November 12, 2025 03:48

stuartcarnie requested review from a team as code owners November 12, 2025 03:48

stuartcarnie mentioned this pull request Nov 12, 2025

Core: Switch RID_Alloc::owns to lock-free (reverted) #112657

Merged

AThousandShips added bug platform:android performance topic:rendering regression labels Nov 12, 2025

AThousandShips added this to the 4.6 milestone Nov 12, 2025

stuartcarnie force-pushed the 2d_canvas_vbos branch from 329eec7 to 4c65fbd Compare November 12, 2025 19:51

Calinou added the topic:2d label Nov 12, 2025

AThousandShips reviewed Nov 13, 2025

stuartcarnie force-pushed the 2d_canvas_vbos branch from b0722cf to 1f88bc0 Compare November 14, 2025 06:30

clayjohn approved these changes Nov 14, 2025

clayjohn changed the title ~~2D: Use Vertex Buffer Objects for Canvas 2D instance data~~ Massively optimize canvas 2D rendering by using vertex buffers Nov 14, 2025

clayjohn mentioned this pull request Nov 14, 2025

Performance regression in 4.4 on Android after introducing batching (GPU bottleneck) #104194

Closed

AThousandShips reviewed Nov 14, 2025

stuartcarnie force-pushed the 2d_canvas_vbos branch from 1f88bc0 to c954518 Compare November 14, 2025 18:45

stuartcarnie force-pushed the 2d_canvas_vbos branch from c954518 to 2b264fe Compare November 14, 2025 18:50

stuartcarnie force-pushed the 2d_canvas_vbos branch from 2b264fe to cfa9bab Compare November 14, 2025 19:24

stuartcarnie force-pushed the 2d_canvas_vbos branch from cfa9bab to 90c0e6a Compare November 14, 2025 19:25

Repiteo merged commit 235d112 into godotengine:master Nov 14, 2025
20 checks passed

stuartcarnie deleted the 2d_canvas_vbos branch November 14, 2025 20:37

This was referenced Nov 15, 2025

GDShader error when using varyings #112799

Closed

Reorganize canvas shader varyings in RD renderer #112800

Merged

AThousandShips mentioned this pull request Nov 18, 2025

Regression: Clip Children broken #112917

Closed

YeldhamDev mentioned this pull request Nov 19, 2025

Output panel's scrollbar invisible for interface/theme/style set to Classic #112938

Closed

YeldhamDev mentioned this pull request Dec 10, 2025

Graph's scrollbars are overbrighten in visual shader editor (classic style) #113638

Closed

matheusmdx mentioned this pull request Jan 3, 2026

Android editor shows lots of rendering bugs with Vulkan and RENDER_GRAPH_REORDER defined to 0 #102635

Closed

matheusmdx mentioned this pull request Jan 13, 2026

Sun flicker on Android when camera moves (Adreno 505 / Redmi 8) using Mobile Renderer #114892

Open

kleonc mentioned this pull request Jan 19, 2026

When using texture_margin and region_rect in StyleBoxTexture, the texture display exhibits an offset (VBO regression). #115117

Closed

clayjohn mentioned this pull request Jan 19, 2026

Increase precision of ninepatch source rect to ensure pixel perfect alignment #115152

Merged

bruvzg mentioned this pull request Feb 23, 2026

Fix LCD batching flag for StyleBoxTexture #116647

Merged

blueskythlikesclouds removed their request for review February 25, 2026 08:49
