
Optimize the bresenham algorithm to avoid an unneeded vector allocation#112731

Open
groud wants to merge 1 commit into godotengine:master from groud:optimize_bresenham

Conversation

@groud
Member

@groud groud commented Nov 13, 2025

When I implemented the Bresenham algorithm four years ago, I didn't put much effort into optimizing it. This is now fixed for internal use, as the new implementation no longer allocates a Vector<Point2i>. This should improve performance a bit in the TileMap editor when drawing lines.

Instead, this PR implements a C++ iterable that can be used like this:

for (Vector2i point : Geometry2D::Bresenham(from, to)) {
    ...
}

Note that we have tests for the algorithm, and they all pass, so I think we can safely assume the new implementation works fine.
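A range-for compatible iterable along these lines can be sketched as follows. Note this is a hypothetical illustration of the begin()/end() iterator pattern, not the PR's actual Geometry2D::Bresenham code; the `Vec2i` and `BresenhamRange` names are made up for the sketch:

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

struct Vec2i {
	int x = 0, y = 0;
	bool operator==(const Vec2i &p_o) const { return x == p_o.x && y == p_o.y; }
};

// Hypothetical sketch: a range-for compatible Bresenham line iterable.
class BresenhamRange {
	Vec2i from, to;

public:
	BresenhamRange(Vec2i p_from, Vec2i p_to) :
			from(p_from), to(p_to) {}

	class Iterator {
		Vec2i current, target;
		int dx = 0, dy = 0, sx = 0, sy = 0, err = 0;
		bool done = false;

	public:
		Iterator() : done(true) {} // End sentinel: carries no state.
		Iterator(Vec2i p_from, Vec2i p_to) :
				current(p_from), target(p_to) {
			dx = std::abs(target.x - current.x);
			dy = -std::abs(target.y - current.y);
			sx = current.x < target.x ? 1 : -1;
			sy = current.y < target.y ? 1 : -1;
			err = dx + dy;
		}
		Vec2i operator*() const { return current; }
		Iterator &operator++() {
			if (current == target) {
				done = true; // The target point was already yielded.
				return *this;
			}
			int e2 = 2 * err;
			if (e2 >= dy) { err += dy; current.x += sx; }
			if (e2 <= dx) { err += dx; current.y += sy; }
			return *this;
		}
		bool operator!=(const Iterator &p_o) const { return done != p_o.done; }
	};

	Iterator begin() const { return Iterator(from, to); }
	Iterator end() const { return Iterator(); }
};
```

All the line-walking state lives in the iterator itself, so `for (Vec2i p : BresenhamRange(from, to))` allocates nothing on the heap.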

@groud groud added this to the 4.x milestone Nov 13, 2025
@groud groud requested review from a team as code owners November 13, 2025 13:00
@groud
Member Author

groud commented Nov 13, 2025

Here are the benchmark results:

	{
		Vector2i a;
		uint64_t old_time = OS::get_singleton()->get_ticks_usec();
		Vector<Vector2i> points = Geometry2D::bresenham_line(Vector2i(), Vector2i(1000000,5000));
		for (const Vector2i &v : points) {
			a = v; // Do something
		}
		uint64_t new_time = OS::get_singleton()->get_ticks_usec();
		print_line("Before", new_time - old_time);
	}

vs

	{
		Vector2i a;
		uint64_t old_time = OS::get_singleton()->get_ticks_usec();
		for (const Vector2i &v : Geometry2D::Bresenham(Vector2i(), Vector2i(1000000,5000))) {
			a = v; // Do something
		}
		uint64_t new_time = OS::get_singleton()->get_ticks_usec();
		print_line("After", new_time - old_time);
	}

gets me averaging several runs:

Before ~ 25ms
After ~ 10ms

So, about 2.5x faster with the new implementation.
This is with a single huge line though. For smaller lines it's harder to quantify, as both functions would be quite fast anyway.

@AThousandShips
Member

It looks like the compiler could just optimize out that loop; I'd use some tool to ensure it actually iterates and does something with the data.

@Ivorforce
Member

Ivorforce commented Nov 13, 2025

You might be able to solve the optimization issue by declaring Vector2i a; as volatile. But I usually declare some function in a header, define it in another cpp file, and call it with the final value (v). Since calls into other compilation units are black boxes, that's often a very cheap way to force the compiler to keep the result. Just make sure LTO is disabled.
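The black-box trick described here can be sketched like this (hypothetical names; benchmark libraries such as Google Benchmark ship the same idea as `benchmark::DoNotOptimize`):

```cpp
#include <cassert>
#include <cstdint>

// In real use, escape() would be *defined* in a separate .cpp file so the
// optimizer cannot see its body (and LTO must be off). It is defined inline
// here only to keep the sketch self-contained, which defeats the trick.
int64_t g_sink = 0;
void escape(int64_t p_value) {
	g_sink = p_value;
}

// Because the loop's result is passed to escape(), the compiler must
// actually compute it and cannot delete the loop as dead code.
int64_t sum_to(int p_n) {
	int64_t acc = 0;
	for (int i = 0; i < p_n; i++) {
		acc += i;
	}
	escape(acc);
	return acc;
}
```

An alternative that needs no second translation unit is an empty inline-asm barrier such as `asm volatile("" : : "g"(&value) : "memory");`, which tells the compiler the value may be observed externally.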

@AThousandShips
Member

AThousandShips commented Nov 13, 2025

It can also be verified by checking the output.

Though this simple benchmark misses some aspects: it only covers the bulk case, while the real uses run longer code inside each iteration, where temporal locality etc. might make it less representative. So it would be useful to benchmark the specific places in the engine where this is used, to get real-world data.

@groud
Member Author

groud commented Nov 13, 2025

You might be able to solve the optimization issue by declaring Vector2i a; as volatile. But I usually declare some function in a header, define it in another cpp file, and call it with the final value (v). Since calls into other compilation units are black boxes, that's often a very cheap way to force the compiler to keep the result. Just make sure LTO is disabled.

I mean, it seems very unlikely to me that it gets optimized out. If it were, it wouldn't spend 10 ms doing nothing; that would not make sense.

I can try another benchmark, but TBH, I don't really wanna spend a ton of time explaining why allocating a huge vector vs not allocating it makes a difference.

@Ivorforce
Member

I mean, it seems very unlikely to me that it gets optimized out. If it were, it wouldn't spend 10 ms doing nothing; that would not make sense.

I noticed that too.
To be honest, the fact that it didn't finish in 0 ms is a bit surprising to me. GCC and Clang are very good at recognizing when a variable or an entire iteration can be omitted from the build.
Did you run the benchmark in a dev build, or in an optimized build?

I can try another benchmark, but TBH, I don't really wanna spend a ton of time explaining why allocating a huge vector vs not allocating it makes a difference.

I agree, it's obvious that avoiding the allocation should help performance.

However, CPU optimization is not always straightforward. Something that should obviously be faster can turn out slower in practice because of compiler or CPU architecture details. And even if something was optimized, there may be a bottleneck elsewhere, so the optimization has no effect.

This complexity, and the need to weigh the potential benefit against potential regressions from the logic change, is why we expect proof of the optimization for performance PRs.

@groud
Member Author

groud commented Nov 13, 2025

Alright, made the benchmarks anyway with this:

	{
		Vector2i a;
		uint64_t old_time = OS::get_singleton()->get_ticks_usec();
		Vector<Vector2i> points = Geometry2D::bresenham_line(Vector2i(), Vector2i(1000000,5000));
		for (const Vector2i &v : points) {
			a += v; // Do something
		}
		uint64_t new_time = OS::get_singleton()->get_ticks_usec();
		print_line("Before", new_time - old_time);
		print_line(a);
	}

vs

	{
		Vector2i a;
		uint64_t old_time = OS::get_singleton()->get_ticks_usec();
		Geometry2D::Bresenham points = Geometry2D::Bresenham(Vector2i(), Vector2i(1000000,5000));
		for (const Vector2i &v : points) {
			a += v; // Do something
		}
		uint64_t new_time = OS::get_singleton()->get_ticks_usec();
		print_line("After", new_time - old_time);
		print_line(a);
	}

Results:

Before: ~26ms
After: ~11ms

So well, similar as before.

Did you run the benchmark in a dev build, or in an optimized build?

I ran them from the editor, with `dev_build=yes`.

Considering this complexity, weighing the potential benefit against potential regressions because of the logical change, is why we expect proofs of optimization for performance PRs.

I mean, I understand that, but this is one of the few things we have actual tests for, in test_geometry_2d.h. I think we're pretty safe on the regression side (unless there are situations not covered by them; though it's a two-argument function, so there aren't that many cases to cover, I guess).

@AThousandShips
Member

The tests might not cover all edge cases, so at the least this would need functional validation that the change doesn't affect behavior in edge cases.

@Ivorforce
Member

Ivorforce commented Nov 13, 2025

Thanks for running the test again! I think using addition (along with printing a) may work to 'trick' the optimizer.

I ran them from the editor, with `dev_build=yes`.

That means that the optimizer is disabled. We should probably add this to the guidelines, since it's a common mistake to make with benchmarks.
I hate to ask this, but please run the benchmark again with a non-dev build (debug or release; release is preferred).

@AThousandShips
Member

AThousandShips commented Nov 13, 2025

I think it'd also be worth looking at whether there are other ways the algorithm could be improved, whether there are edge cases the tests might not handle correctly, and whether other optimizations might be applicable. For example, it might be possible to estimate the number of points mathematically from the endpoints and step, and reserve a reasonable amount of storage ahead of time.
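On the reserve idea: for an integer Bresenham line the point count is actually exact, not an estimate, since the algorithm emits exactly one point per step along the major axis. A small sketch, with a hypothetical helper name:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

// Exact number of points a Bresenham line between two integer endpoints
// produces: one per unit step along the major axis, plus the start point.
int bresenham_point_count(int p_x0, int p_y0, int p_x1, int p_y1) {
	return std::max(std::abs(p_x1 - p_x0), std::abs(p_y1 - p_y0)) + 1;
}
```

A vector-returning bresenham_line() could reserve() or resize() with this value up front, avoiding reallocations as the vector grows.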

@groud
Member Author

groud commented Nov 13, 2025

I think it'd also be worth looking at whether there are other ways the algorithm could be improved, whether there are edge cases the tests might not handle correctly, and whether other optimizations might be applicable. For example, it might be possible to estimate the number of points mathematically and reserve a reasonable amount of storage ahead of time.

I mean, at some point you'll have to trust me on that. The Bresenham algorithm is very simple and straightforward to implement. There aren't many edge cases besides vertical/horizontal lines and single-point lines. I've tested the algorithm locally with many lines drawn (it works flawlessly), and the unit tests pass too. I am positive no improvement will be more effective than avoiding the unnecessary allocation here (the algorithm was in fact designed to do no allocation; it's from 1962, when computers had very limited memory available).

I don't know what more to tell you than that. It's a fix to an implementation I already knew was suboptimal when I wrote it; I've just had the opportunity to fix it today.
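For reference, the core loop of the 1962 algorithm really does need only a handful of integer locals and no heap storage. A textbook sketch (not the PR's code), with the caller supplying a visitor instead of collecting points:

```cpp
#include <cassert>
#include <cstdlib>

// Textbook integer Bresenham: visits every point from (x0, y0) to (x1, y1)
// using only a few local ints -- no allocation at all.
template <typename F>
void bresenham(int x0, int y0, int x1, int y1, F visit) {
	int dx = std::abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
	int dy = -std::abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
	int err = dx + dy;
	while (true) {
		visit(x0, y0);
		if (x0 == x1 && y0 == y1) {
			break; // Target point has been visited.
		}
		int e2 = 2 * err;
		if (e2 >= dy) { // Step along x.
			err += dy;
			x0 += sx;
		}
		if (e2 <= dx) { // Step along y.
			err += dx;
			y0 += sy;
		}
	}
}
```

Vertical, horizontal, and single-point lines all fall out of the same loop: the corresponding step condition simply never fires.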

@AThousandShips
Member

AThousandShips commented Nov 13, 2025

Then the benchmarks should show it, including profiling of the relevant methods affected by this. It's not zero work, but it's also not a lot, to confirm this and to confirm that the other considerations and potential performance aspects are not a problem. I get it, but I've seen plenty of very obvious optimizations evaporate when run in release builds against the real world.

@groud
Copy link
Member Author

groud commented Nov 13, 2025

Anyway, I reran the benchmark in release. In fact, it's more like a 15x improvement:

Before: ~9ms
After: ~0.6ms

@Ivorforce
Member

That sounds pretty realistic! There's of course still a possibility that some of the implementation was optimized away due to the constants used in your benchmark, but for me this is along the lines of what I would expect from this change. So I think this suffices as a proof of optimization.
Thanks again!

@groud
Member Author

groud commented Nov 13, 2025

Note that, for now, the optimization is not exposed to users, as bresenham_line still pushes everything into a vector anyway. It should be doable to expose it, though, as we could expose the Bresenham class as a custom iterator. It's not a common pattern in the API, however.

I can do it in another PR if we feel the performance improvement is worth it.

@groud groud force-pushed the optimize_bresenham branch from 4b244bc to ff55c33 on November 13, 2025 16:54
@groud
Member Author

groud commented Nov 13, 2025

As discussed, I've pushed a change to improve readability by adding a bresenham variable where it made sense.

@akien-mga
Member

See #105292 which was just merged and might benefit from the same change.

@groud
Member Author

groud commented Nov 14, 2025

I'm having second thoughts about the implementation. While I think the for ( : ) syntax is nice, I kind of hate that we have to define two classes for it. I am thinking that maybe we can get a simpler implementation with only one class.

Maybe something that would need to be used this way though:

for (Bresenham b = Bresenham(1000, 500); !b.is_end(); b.next()) {
    Vector2i point = b.value();
}

What do you think?

@Ivorforce
Member

Ivorforce commented Nov 14, 2025

I'm having second thoughts about the implementation. While I think the for ( : ) syntax is nice, I kind of hate that we have to define two classes for it. I am thinking that maybe we can get a simpler implementation with only one class.

Maybe something that would need to be used this way though:

for (Bresenham b = Bresenham(1000, 500); !b.is_end(); b.next()) {
    Vector2i point = b.value();
}

What do you think?

If you want to avoid using two classes, I would prefer the following:

Iterable<BresenhamIterator> bresenham_iter(Vector2i p_from, Vector2i p_to) {
    return Iterable<BresenhamIterator>(BresenhamIterator(p_from, p_to), BresenhamIterator());
}

I introduced Iterable basically for the purpose of "I have two iterator instances, and I want to use C++ syntax to iterate across them", which would be your use case here, I think.
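In spirit, such an Iterable is just a pair of iterators exposed through begin()/end() so that range-for works. A generic sketch (this is not Godot's actual implementation, and `CountIterator`/`count_to` are made-up example names):

```cpp
#include <cassert>

// Minimal generic wrapper: holds a begin and an end iterator so that
// range-for can drive any iterator pair. Not Godot's actual Iterable.
template <typename T>
class Iterable {
	T _begin, _end;

public:
	Iterable(T p_begin, T p_end) :
			_begin(p_begin), _end(p_end) {}
	T begin() const { return _begin; }
	T end() const { return _end; }
};

// Toy iterator counting upward, just to show the wrapper in use.
class CountIterator {
	int value = 0;

public:
	CountIterator() = default;
	explicit CountIterator(int p_value) : value(p_value) {}
	int operator*() const { return value; }
	CountIterator &operator++() {
		++value;
		return *this;
	}
	bool operator!=(const CountIterator &p_o) const { return value != p_o.value; }
};

// Yields 0, 1, ..., p_n - 1 when used in a range-for loop.
Iterable<CountIterator> count_to(int p_n) {
	return Iterable<CountIterator>(CountIterator(0), CountIterator(p_n));
}
```

A Bresenham version would return `Iterable<BresenhamIterator>(BresenhamIterator(from, to), BresenhamIterator())`, with the default-constructed iterator acting as the end sentinel.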

@groud
Member Author

groud commented Nov 14, 2025

I introduced Iterable basically for the purpose of "I have two iterator instances, and I want to use C++ syntax to iterate across them". Which would be your use-case here, I think.

Oooh, I had no clue we had that. Yeah, that seems like a good plan, I'll have a look.

@Ivorforce
Member

Ivorforce commented Nov 14, 2025

Actually, I said that with the expectation of continuing to use the C++ iterator syntax.
C++ iteration syntax is actually really weird in that it requires both a begin and an end. This doesn't make much sense for Bresenham, so maybe your proposed solution would be better (fewer wasted variables).
I've wanted to look into ways to avoid this problem for some time, but haven't gotten around to it yet.

@groud
Member Author

groud commented Nov 14, 2025

Actually, I said that with the expectation of continuing to use the C++ iterator syntax. C++ iteration syntax is actually really weird in that it requires both a begin and an end. This doesn't make much sense for Bresenham, so maybe your proposed solution would be better (fewer wasted variables).

Yeah, I do agree. I think an additional variable is fine; the main problem IMO is that the two nested classes make the code quite hard to read, and it's a bit too much additional code just to avoid an allocation. So if I can at least shrink it a bit, that would be nice.

@groud groud force-pushed the optimize_bresenham branch from ff55c33 to 480398f on November 14, 2025 11:28
@groud
Member Author

groud commented Nov 14, 2025

Alright, I updated the code: we went from 59 added LoC to 35, and we have a single class now. I think it looks better.

@groud groud force-pushed the optimize_bresenham branch from 480398f to 0da7c5f on November 14, 2025 12:44