New inflation layer with optional OpenMP acceleration by tonynajjar · Pull Request #51 · botsandus/navigation2

tonynajjar · 2026-02-04T16:33:23Z

Benchmark Comparison Summary

Test Environments

Dev Machine (ubuntu@dexory)

CPU: 16 cores × 5400 MHz
OS: Ubuntu 24.04.2 LTS
L1 Data Cache: 48 KiB × 16 = 768 KiB
L1 Instruction Cache: 64 KiB × 16 = 1024 KiB
L2 Cache: 3072 KiB × 16 = 48 MB (unified)
L3 Cache: 24576 KiB = 24 MB (shared)
Total Cache: ~74 MB

Robot (arri-74)

CPU: 16 cores × 5000 MHz
L1 Data Cache: 48 KiB × 8 = 384 KiB
L1 Instruction Cache: 32 KiB × 8 = 256 KiB
L2 Cache: 1280 KiB × 8 = 10 MB (unified)
L3 Cache: 18432 KiB = 18 MB (shared)
Total Cache: ~28.6 MB

Performance Comparison (Key Benchmarks)

1000×1000 Grid (1M cells, 50% occupancy, 2m inflation radius)

Configuration	Dev Time	Dev Throughput	Robot Time	Robot Throughput	vs Old Dev	vs Old Robot
Old Implementation	24.1 ms	41.4 M cells/s	28.7 ms	34.9 M cells/s	baseline	baseline
New (OpenMP disabled)	6.89 ms	145.2 M cells/s	11.0 ms	91.1 M cells/s	3.5× faster	2.6× faster
New (OpenMP enabled)	2.50 ms	707.3 M cells/s	2.35 ms	482.5 M cells/s	9.6× faster	12.2× faster

2000×2000 Grid (4M cells, 50% occupancy, 2m inflation radius)

Configuration	Dev Time	Dev Throughput	Robot Time	Robot Throughput	vs Old Dev	vs Old Robot
Old Implementation	91.5 ms	43.7 M cells/s	105 ms	38.1 M cells/s	baseline	baseline
New (OpenMP disabled)	30.6 ms	130.6 M cells/s	48.9 ms	81.8 M cells/s	3.0× faster	2.1× faster
New (OpenMP enabled)	6.64 ms	893.5 M cells/s	9.11 ms	468.9 M cells/s	13.8× faster	11.5× faster

3333×3333 Grid (11.1M cells, 50% occupancy, 2m inflation radius)

Configuration	Dev Time	Dev Throughput	Robot Time	Robot Throughput	vs Old Dev	vs Old Robot
Old Implementation	311 ms	35.7 M cells/s	357 ms	31.2 M cells/s	baseline	baseline
New (OpenMP disabled)	115 ms	96.8 M cells/s	182 ms	61.2 M cells/s	2.7× faster	2.0× faster
New (OpenMP enabled)	20.3 ms	697.6 M cells/s	29.9 ms	395.1 M cells/s	15.3× faster	11.9× faster

4000×4000 Grid (16M cells, 30% occupancy, 1m inflation radius)

Configuration	Dev Time	Dev Throughput	Robot Time	Robot Throughput	vs Old Dev	vs Old Robot
Old Implementation	261 ms	61.2 M cells/s	302 ms	53.1 M cells/s	baseline	baseline
New (OpenMP disabled)	176 ms	90.9 M cells/s	268 ms	59.7 M cells/s	1.5× faster	1.1× faster
New (OpenMP enabled)	28.4 ms	660.7 M cells/s	46.3 ms	364.1 M cells/s	9.2× faster	6.5× faster
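The "vs Old" columns are ratios of run times, and the throughput columns are cells divided by run time. Below is a minimal C++ sketch of both calculations for rechecking or extending the tables. The function names are hypothetical and do not come from the PR.

```cpp
#include <cassert>

// Throughput in millions of cells per second for a run over `cells` cells
// that took `time_ms` milliseconds.
double throughputMCellsPerSec(double cells, double time_ms) {
  assert(time_ms > 0.0);
  return cells / (time_ms * 1e-3) / 1e6;
}

// Speedup of a candidate over a baseline: the ratio of their run times.
double speedup(double baseline_ms, double candidate_ms) {
  assert(candidate_ms > 0.0);
  return baseline_ms / candidate_ms;
}
```

Take the 1000×1000 dev-machine rows as an example. `speedup(24.1, 2.50)` is about 9.6, and `speedup(6.89, 2.50)` is about 2.8, which is the OpenMP gain over the single-threaded new build. `throughputMCellsPerSec(1e6, 24.1)` is about 41.5 M cells/s. The same formula gives 400 M cells/s for the OpenMP-enabled time of 2.50 ms, which is below the 707.3 M cells/s reported. The OpenMP throughput figures therefore do not follow from the Time column alone.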

Key Findings

1. New Implementation Impact (OpenMP disabled)

Dev machine: 1.5-3.5× faster than old implementation
Robot: 1.1-2.6× faster than old implementation
Speedups are larger on the dev machine than on the robot

2. OpenMP Parallelization Impact

Dev machine: 2.8-6.2× additional speedup over the single-threaded new implementation
Robot: 4.7-6.1× additional speedup over the single-threaded new implementation
Combined with new implementation: 6.5-15.3× faster than old code

3. Grid Size Scaling

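Across the 1000×1000, 2000×2000 and 3333×3333 grids (50% occupancy, 2m radius):
Old implementation: 35.7-43.7 M cells/s on dev, 31.2-38.1 M cells/s on robot
New implementation (OpenMP disabled): 96.8-145.2 M cells/s on dev, 61.2-91.1 M cells/s on robot, falling as the grid grows
New implementation (OpenMP enabled): 697.6-893.5 M cells/s on dev, 395.1-482.5 M cells/s on robot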

4. Occupancy Impact (1500×1500 tests)

Old implementation slows down sharply as occupancy rises from 10% to 80%
New implementation stays nearly flat across 10%, 30%, 50% and 80% occupancy

5. Inflation Radius Impact

Old implementation: throughput drops as the radius grows (89.5→56.7 M cells/s on dev, 67.6→39.3 M cells/s on robot, from 0.5m to 10m)
New implementation: minimal impact from radius variation
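Finding 2 attributes most of the gain to OpenMP. The PR text does not show the code, so the following is only a hypothetical sketch of why inflation can parallelize well. Suppose each output cell computes its own value from read-only input (a "gather"). Then threads never write to the same cell, and a single `#pragma omp parallel for` is enough. A queue-based wavefront that pushes costs outward (a "scatter") would instead need synchronization. The function names, the 254 lethal value and the brute-force window search are all assumptions for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical lethal cost value (nav2_costmap_2d uses 254).
constexpr std::uint8_t kLethal = 254;

// Hypothetical "gather" formulation: each output cell scans a square window of
// +/- radius_cells for the nearest lethal cell and stores that distance (in cells),
// or +infinity if none lies within radius_cells. Every iteration writes only its
// own output cell, so rows can be split across threads without locks.
// Without -fopenmp the pragma is ignored and the loop runs serially.
std::vector<double> nearestObstacleDistance(const std::vector<std::uint8_t>& grid,
                                            int width, int height, int radius_cells) {
  assert(grid.size() == static_cast<std::size_t>(width) * height);
  const double inf = std::numeric_limits<double>::infinity();
  std::vector<double> dist(grid.size(), inf);
#pragma omp parallel for schedule(static)
  for (int y = 0; y < height; ++y) {
    const int y0 = std::max(0, y - radius_cells);
    const int y1 = std::min(height - 1, y + radius_cells);
    for (int x = 0; x < width; ++x) {
      const int x0 = std::max(0, x - radius_cells);
      const int x1 = std::min(width - 1, x + radius_cells);
      double best = inf;
      for (int ny = y0; ny <= y1; ++ny) {
        for (int nx = x0; nx <= x1; ++nx) {
          if (grid[static_cast<std::size_t>(ny) * width + nx] != kLethal) continue;
          const double d = std::hypot(double(nx - x), double(ny - y));
          if (d <= radius_cells) best = std::min(best, d);
        }
      }
      dist[static_cast<std::size_t>(y) * width + x] = best;
    }
  }
  return dist;
}

// Threads the loop above would use: 1 when compiled without OpenMP.
int inflationThreadCount() {
#ifdef _OPENMP
  return omp_get_max_threads();
#else
  return 1;
#endif
}
```

This brute-force search costs O(width × height × radius²), so unlike the benchmarked new layer it would slow down as the radius grows. It only illustrates the race-free loop structure. The `#ifdef _OPENMP` guard is one way to keep OpenMP optional at build time.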

Detailed Results by Parameter

Varying Occupancy (1500×1500 grid, 2m inflation)

Occupancy	Old Dev	New OpenMP Off Dev	New OpenMP On Dev	Old Robot	New OpenMP Off Robot	New OpenMP On Robot
10%	8.16 ms (275.8 M/s)	14.7 ms (152.8 M/s)	4.57 ms (816.7 M/s)	11.9 ms (189.7 M/s)	24.0 ms (93.9 M/s)	5.50 ms (445.2 M/s)
30%	28.5 ms (79.1 M/s)	15.3 ms (147.3 M/s)	4.65 ms (763.5 M/s)	36.1 ms (62.3 M/s)	24.2 ms (93.0 M/s)	5.05 ms (483.0 M/s)
50%	67.5 ms (33.4 M/s)	16.1 ms (139.5 M/s)	5.07 ms (763.0 M/s)	80.7 ms (27.9 M/s)	24.8 ms (90.9 M/s)	5.76 ms (466.2 M/s)
80%	75.6 ms (29.8 M/s)	15.6 ms (144.7 M/s)	4.96 ms (784.4 M/s)	89.1 ms (25.3 M/s)	23.4 ms (96.1 M/s)	5.48 ms (458.2 M/s)

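Key Observation: Old implementation degrades significantly with higher occupancy (8.16→75.6 ms on dev), while the new implementation remains stable (14.7-16.1 ms without OpenMP, 4.57-5.07 ms with OpenMP). At 10% occupancy, however, the old implementation beats the new one without OpenMP (8.16 vs 14.7 ms on dev, 11.9 vs 24.0 ms on robot).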

Varying Inflation Radius (1000×1000 grid, 50% occupancy)

Radius	Old Dev	New OpenMP Off Dev	New OpenMP On Dev	Old Robot	New OpenMP Off Robot	New OpenMP On Robot
0.5m	11.2 ms (89.5 M/s)	6.78 ms (147.5 M/s)	2.45 ms (716.4 M/s)	14.8 ms (67.6 M/s)	11.1 ms (90.5 M/s)	2.55 ms (484.8 M/s)
1.0m	12.7 ms (78.9 M/s)	6.88 ms (145.5 M/s)	2.44 ms (717.3 M/s)	17.0 ms (58.9 M/s)	10.9 ms (91.6 M/s)	2.23 ms (518.3 M/s)
2.0m	14.6 ms (68.7 M/s)	6.84 ms (146.2 M/s)	2.56 ms (677.9 M/s)	20.1 ms (49.8 M/s)	11.1 ms (90.4 M/s)	2.21 ms (508.3 M/s)
3.0m	15.5 ms (64.6 M/s)	6.92 ms (144.5 M/s)	2.46 ms (710.3 M/s)	21.6 ms (46.4 M/s)	11.6 ms (85.9 M/s)	2.47 ms (474.5 M/s)
5.0m	16.9 ms (60.0 M/s)	6.96 ms (143.7 M/s)	2.67 ms (666.2 M/s)	23.3 ms (43.0 M/s)	11.2 ms (89.2 M/s)	2.49 ms (458.6 M/s)
10.0m	17.6 ms (56.7 M/s)	6.95 ms (143.8 M/s)	2.65 ms (657.8 M/s)	25.5 ms (39.3 M/s)	11.4 ms (88.1 M/s)	2.37 ms (472.6 M/s)

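Key Observation: Old implementation's throughput drops 36% from the smallest to the largest radius (11.2→17.6 ms on dev, i.e. 57% longer run time). New implementation without OpenMP varies by under 3% on dev (6.78-6.96 ms). With OpenMP, times stay within 2.44-2.67 ms on dev and 2.21-2.55 ms on robot.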
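The radius table shows the new layer's run time barely moving between 0.5m and 10m. The PR text does not describe the new algorithm, so what follows is a swapped-in illustration only. One standard technique with this property is the exact Euclidean distance transform of Felzenszwalb and Huttenlocher. It makes one pass down the columns and one along the rows, and costs O(width × height) whatever the radius or occupancy. Below is a minimal C++ sketch. The function names and the 254 lethal value are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Squared distance given to free cells before the transform: large but finite,
// so the parabola-intersection arithmetic stays well defined.
constexpr double kFar = 1e20;

// 1-D pass (Felzenszwalb & Huttenlocher): d[q] = min over p of (q - p)^2 + f[p],
// computed via the lower envelope of parabolas in O(n).
static void distanceTransform1D(const std::vector<double>& f, std::vector<double>& d) {
  const int n = static_cast<int>(f.size());
  if (n == 0) return;
  const double inf = std::numeric_limits<double>::infinity();
  std::vector<int> v(n);         // parabola vertices in the lower envelope
  std::vector<double> z(n + 1);  // boundaries between consecutive parabolas
  int k = 0;
  v[0] = 0;
  z[0] = -inf;
  z[1] = inf;
  for (int q = 1; q < n; ++q) {
    // Horizontal position where the parabolas rooted at q and p intersect.
    auto intersect = [&](int p) {
      return ((f[q] + double(q) * q) - (f[p] + double(p) * p)) / (2.0 * (q - p));
    };
    double s = intersect(v[k]);
    while (s <= z[k]) {  // parabola v[k] is no longer part of the envelope; drop it
      --k;
      s = intersect(v[k]);
    }
    ++k;
    v[k] = q;
    z[k] = s;
    z[k + 1] = inf;
  }
  k = 0;
  for (int q = 0; q < n; ++q) {
    while (z[k + 1] < q) ++k;
    const double dq = q - v[k];
    d[q] = dq * dq + f[v[k]];
  }
}

// Squared Euclidean distance, in cells, from every cell to the nearest lethal
// cell: one pass down each column, then one pass along each row.
std::vector<double> squaredDistanceToObstacles(const std::vector<std::uint8_t>& grid,
                                               int width, int height,
                                               std::uint8_t lethal = 254) {
  assert(grid.size() == static_cast<std::size_t>(width) * height);
  std::vector<double> out(grid.size());
  for (std::size_t i = 0; i < grid.size(); ++i) out[i] = grid[i] == lethal ? 0.0 : kFar;

  std::vector<double> f(height), d(height);
  for (int x = 0; x < width; ++x) {  // column pass
    for (int y = 0; y < height; ++y) f[y] = out[static_cast<std::size_t>(y) * width + x];
    distanceTransform1D(f, d);
    for (int y = 0; y < height; ++y) out[static_cast<std::size_t>(y) * width + x] = d[y];
  }
  f.resize(width);
  d.resize(width);
  for (int y = 0; y < height; ++y) {  // row pass
    for (int x = 0; x < width; ++x) f[x] = out[static_cast<std::size_t>(y) * width + x];
    distanceTransform1D(f, d);
    for (int x = 0; x < width; ++x) out[static_cast<std::size_t>(y) * width + x] = d[x];
  }
  return out;
}
```

For a 5×5 grid with a single lethal cell in the centre, each corner cell gets a squared distance of 8 (about 2.83 cells). Multiplying the square root by the map resolution gives metres for the cost function. Every column, and then every row, is processed independently, so each pass could be split across OpenMP threads, provided each thread has its own `f` and `d` scratch vectors.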

Varying Cost Scale (1000×1000 grid, 50% occupancy, 2m radius)

Cost Scale	Old Dev	New OpenMP Off Dev	New OpenMP On Dev	Old Robot	New OpenMP Off Robot	New OpenMP On Robot
1.0	14.3 ms (70.1 M/s)	6.89 ms (145.1 M/s)	2.77 ms (651.6 M/s)	20.1 ms (49.8 M/s)	11.1 ms (89.9 M/s)	2.19 ms (503.2 M/s)
3.0	14.4 ms (69.5 M/s)	6.90 ms (144.9 M/s)	2.63 ms (686.5 M/s)	20.0 ms (50.1 M/s)	11.1 ms (90.1 M/s)	2.30 ms (492.1 M/s)
5.0	14.3 ms (70.2 M/s)	6.89 ms (145.2 M/s)	2.82 ms (655.3 M/s)	20.0 ms (50.1 M/s)	11.0 ms (90.8 M/s)	2.24 ms (496.9 M/s)
10.0	14.4 ms (69.6 M/s)	6.91 ms (144.7 M/s)	2.65 ms (660.0 M/s)	19.9 ms (50.3 M/s)	11.0 ms (90.6 M/s)	2.23 ms (523.8 M/s)

Key Observation: Cost scale factor has negligible impact on performance across all implementations.
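A flat cost-scale result is consistent with the scale factor entering only the per-cell cost formula. Assume "Cost Scale" here is nav2's `cost_scaling_factor` parameter. The sketch below then reproduces the exponential decay documented for the nav2 inflation layer. Cells at the obstacle are lethal (254) and cells within the inscribed radius get 253. Beyond that, the cost decays as 252·exp(-scale·(d - r_inscribed)). The PR does not state whether the new layer uses exactly this mapping, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Cost values as defined in nav2_costmap_2d.
constexpr std::uint8_t kLethalObstacle = 254;
constexpr std::uint8_t kInscribedInflatedObstacle = 253;

// Hypothetical helper following the decay documented for nav2's inflation layer:
// cost = 252 * exp(-cost_scaling_factor * (distance - inscribed_radius)).
// Distances are in metres; callers only evaluate cells within the inflation radius.
std::uint8_t inflationCost(double distance_m, double inscribed_radius_m,
                           double cost_scaling_factor) {
  assert(cost_scaling_factor >= 0.0);
  if (distance_m <= 0.0) return kLethalObstacle;  // the obstacle cell itself
  if (distance_m <= inscribed_radius_m) return kInscribedInflatedObstacle;  // within inscribed radius
  const double factor = std::exp(-cost_scaling_factor * (distance_m - inscribed_radius_m));
  // The scale factor only changes this exponent, not how many cells are evaluated.
  return static_cast<std::uint8_t>((kInscribedInflatedObstacle - 1) * factor);
}
```

For example, with an inscribed radius of 0.3m, a cell 1.3m from an obstacle gets cost 92 at scale 1.0 but 12 at scale 3.0. The values change, but the amount of work does not.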

Recommendations

✅ Use new implementation with OpenMP enabled - Provides 6.5-15.3× speedup

✅ Even without OpenMP, the new implementation is 1.1-3.5× faster in the key benchmarks (though slower than the old one at 10% occupancy)

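✅ Performance is more predictable across occupancy and inflation radius, and with OpenMP the speedup grows with grid size on the dev machine (9.6× at 1000×1000 to 15.3× at 3333×3333)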

✅ Robot still reaches 6.5-12.2× speedup with OpenMP despite its lower CPU frequency and smaller caches

✅ New implementation handles varying occupancy and inflation radii efficiently

Performance Summary Chart

Speedup Factor (vs Old Implementation)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Dev Machine (1000×1000):
Old:     ████ 1.0×
New-Off: █████████████ 3.5×
New-On:  ██████████████████████████████████████ 9.6×

Dev Machine (3333×3333):
Old:     ████ 1.0×
New-Off: ███████████ 2.7×
New-On:  ███████████████████████████████████████████████████████████ 15.3×

Robot (1000×1000):
Old:     ████ 1.0×
New-Off: ██████████ 2.6×
New-On:  ████████████████████████████████████████████████ 12.2×

Robot (3333×3333):
Old:     ████ 1.0×
New-Off: ████████ 2.0×
New-On:  ███████████████████████████████████████████████ 11.9×

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

tonynajjar · 2026-02-06T11:08:47Z

ros-navigation#5933

tonynajjar changed the title from "Openmp inflation" to "OpenMP inflation layer" Feb 4, 2026

implementation

c3e0f67

Signed-off-by: Tony Najjar <tony.najjar@dexory.com>

tonynajjar force-pushed the openmp-inflation branch from 4ce992e to c3e0f67 on February 5, 2026 13:47

tonynajjar changed the title from "OpenMP inflation layer" to "New inflation layer with optional OpenMP acceleration" Feb 5, 2026

tonynajjar closed this Feb 6, 2026

tonynajjar deleted the openmp-inflation branch March 10, 2026 14:33

tonynajjar wants to merge 1 commit into main_dexory from openmp-inflation
