render: Improve hairline strokes and scaling strokes on WebGL and WGPU #23011
Pull request overview
This pull request significantly improves rendering quality of hairline and scaled strokes in Ruffle's WebGL and WGPU backends by implementing scale-aware tessellation. The implementation adds an LRU tessellation cache that stores up to 4 different tessellations per graphic at different scales, retessellating only when shapes grow or shrink by more than 2x. This approach addresses numerous long-standing rendering issues where strokes appeared too thick or too thin when graphics were scaled.
Changes:
- Introduces `TessellationCache` with LRU eviction to cache tessellated shapes at different scales
- Adds `register_shape_with_scale()` method to render backends to support scale-aware tessellation
- Modifies tessellator to adjust hairline stroke width and tessellation tolerance based on scale
- Updates `Graphic` display objects to calculate current scale and retrieve or create appropriately scaled tessellations
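The scale-dependent adjustments listed above can be illustrated with a minimal sketch. It is not the code from this PR: the constants `BASE_TOLERANCE` and `HAIRLINE_PIXELS` and both function names are hypothetical, and the inverse relationship to scale is an assumption based on the description that larger scales get more precise tessellation.

```rust
/// Assumed flattening error budget in screen pixels (hypothetical value).
const BASE_TOLERANCE: f32 = 0.25;
/// A hairline is assumed to cover roughly one screen pixel (hypothetical value).
const HAIRLINE_PIXELS: f32 = 1.0;

/// Stroke width in shape units so that a hairline stays about one pixel wide
/// after the shape is drawn at `scale`. Guards against division by zero.
pub fn hairline_width(scale: f32) -> f32 {
    HAIRLINE_PIXELS / scale.max(f32::EPSILON)
}

/// Curve-flattening tolerance in shape units: smaller (more precise) when the
/// shape is drawn large, larger (coarser) when it is drawn small.
pub fn tolerance_for_scale(scale: f32) -> f32 {
    BASE_TOLERANCE / scale.max(f32::EPSILON)
}
```

With these assumptions, a shape drawn at 2x gets a hairline width of 0.5 shape units, and a shape drawn at 4x gets a tolerance of 0.0625 shape units.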
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| core/src/tessellation_cache.rs | New LRU cache for storing up to 4 tessellated shapes per graphic at different scales |
| core/src/lib.rs | Adds tessellation_cache module to the core library |
| core/src/display_object/graphic.rs | Integrates tessellation cache; calculates scale from transform matrix and retrieves/creates scaled tessellations |
| render/src/backend.rs | Adds register_shape_with_scale() trait method with default implementation |
| render/wgpu/src/backend.rs | Implements scale-aware shape registration for WGPU backend |
| render/webgl/src/lib.rs | Implements scale-aware shape registration for WebGL backend |
| render/src/tessellator.rs | Adjusts hairline stroke width and tessellation tolerance based on scale to prevent artifacts |
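The table mentions that `graphic.rs` calculates a scale from the transform matrix, but the conversation does not say how. One common way to reduce a 2D affine matrix to a single scale factor is the square root of the absolute determinant. The sketch below assumes that approach; the `Matrix` type and `combined_scale` function are hypothetical stand-ins, not Ruffle's API.

```rust
/// Minimal 2D affine matrix (hypothetical stand-in). Translation is omitted
/// because it does not affect scale. Layout: x' = a*x + c*y, y' = b*x + d*y.
#[derive(Clone, Copy, Debug)]
pub struct Matrix {
    pub a: f32,
    pub b: f32,
    pub c: f32,
    pub d: f32,
}

/// Combined scale as the square root of the absolute determinant, i.e. the
/// factor by which the matrix scales lengths if it scaled both axes equally.
/// This is an assumed formula, not necessarily the one used in the PR.
pub fn combined_scale(m: Matrix) -> f32 {
    (m.a * m.d - m.b * m.c).abs().sqrt()
}
```

A uniform 2x scale and a pure rotation give 2.0 and 1.0 respectively, so rotating a graphic would not trigger retessellation under this definition.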
|
Tests that have improved:
Slightly different but visually indistinguishable (mostly due to precision increasing):
Broken:
|
|
@darktohka Just a general remark about visual tests: the tolerance/max_outliers are set so that tests pass on CI and on devs' machines. You should either:
If you make changes and edit tests to pass locally in the same PR, it will result in an unmergeable mess. If there are any changes to tests, we want them to be well documented and well thought out. I'd recommend sticking to the 2nd option for now. It should be relatively easy: set 0 tolerance, push, download image diffs, then set appropriate tolerance and outliers based on the image diffs. At the end of the day, if your PR brings us closer to Flash Player, you shouldn't need to increase tolerance/outliers. If you do, it could mean a bad test that didn't have output from Flash Player. I can then take care of those tests and fix them before merging this PR. TL;DR: my recommendation is to revert changes to tolerance/outliers in tests and see what happens.
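To make the tolerance/outliers discussion concrete, here is a minimal sketch of how such a check can work. It assumes that an outlier is a channel value whose difference from the expected image exceeds `tolerance`, and that a test passes when the outlier count does not exceed `max_outliers`. The function names are hypothetical and this is not Ruffle's test framework code.

```rust
/// Largest absolute per-channel difference between two equally sized images.
pub fn max_difference(expected: &[u8], actual: &[u8]) -> u8 {
    assert_eq!(expected.len(), actual.len(), "images must have the same size");
    expected
        .iter()
        .zip(actual)
        .map(|(&e, &a)| e.abs_diff(a))
        .max()
        .unwrap_or(0)
}

/// Number of channel values that differ by more than `tolerance`.
pub fn count_outliers(expected: &[u8], actual: &[u8], tolerance: u8) -> usize {
    assert_eq!(expected.len(), actual.len(), "images must have the same size");
    expected
        .iter()
        .zip(actual)
        .filter(|&(&e, &a)| e.abs_diff(a) > tolerance)
        .count()
}

/// Assumed pass condition: few enough values exceed the tolerance.
pub fn images_match(expected: &[u8], actual: &[u8], tolerance: u8, max_outliers: usize) -> bool {
    count_outliers(expected, actual, tolerance) <= max_outliers
}
```

Running with a tolerance of 0 first, as suggested above, shows the raw maximum difference and outlier count, from which the smallest passing values can be chosen.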
|
Good point. I will remove the change to the tests from this PR and keep only the functional changes, let's see what happens. |
Force-pushed from 6d8a2af to dd53f0b
|
I think those failures are caused by the fact that we are using Ruffle's and not FP's output. I'll take a look in my free time and I'll try fixing them up. |
|
Worth noting that something similar seems to be true for #22961. That PR caused a bunch of changes to images because lyon changed a little bit about its rendering methods and most of those tests were from Ruffle, not FP. I fixed that up by downloading the images from CI, but ideally we'd replace those tests with FP images and then see whether the lyon update improves their consistency with Flash. |
|
Made 2 PRs which fix tests failing here:
Hopefully after merging them and rebasing this PR, they should stop failing, and they should even improve a bit (but don't worry about that, we can lower tolerance later). After those, there are a few known failures failing. I'll try looking into them, but I think we're just closer to Flash Player and that's why they are failing.
|
@darktohka Can you rebase the PR on top of main? The majority of tests should stop failing. |
Force-pushed from dd53f0b to fee6a8e
|
Only one non-known-failure test is failing: |
|
This should fix the missing ninja body and barely visible outlines of coins and mines in the N, right? swf: N.zip |
It fixes the barely visible outlines and improves on the mines, but does not improve the player character:
Strangely enough, JPEXS doesn't like the player character either: |
|
Does this PR supersede and close #9981? Could you test the case from #9981 (comment)?
|
@darktohka Sorry for the delay, I should have taken care of it earlier... Code looks great! I would have implemented it roughly the same way. The only remaining thing is tests, which I will take care of and push changes here. |
Force-pushed from c763caa to 1f57a7d
- Add tessellation cache for storing previous tessellation results
- Base tessellation tolerance and width on scale
- Retessellate objects if their scale changes by 2x

This pull request improves both hairline strokes and scaling strokes on the Web (WGPU, WebGL renderers) and Desktop (WGPU renderer) targets.

The main idea is to keep track of the scale of the graphics that are being tessellated on the rendering backends. The tessellated shapes are then stored in a tessellation cache, which is a simple LRU cache that keeps the most recently used tessellations (4 max per shared graphic). This means that the last 4 uniquely used tessellated scale buckets stay cached. Shapes are only retessellated if they grow or shrink by 2x relative to a cached variant (controlled by `RETESSELLATION_SCALE_THRESHOLD`).

The retessellation precision (tolerance threshold) is determined by the scale. The larger the scale, the more precise the tessellation will be: small objects are expected to have less detail either way.

The tessellation cache is shared between graphic instances that use the same graphic, as an optimization. Hairline stroke rendering is also improved.
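The caching behaviour described above can be sketched as follows. Only the names `TessellationCache` and `RETESSELLATION_SCALE_THRESHOLD`, the limit of 4 entries, and the 2x rule come from the description. The `ShapeHandle` type, the method names, and the exact comparison (a cached entry is reused while the scale ratio is strictly below the threshold) are assumptions made for illustration.

```rust
/// Named in the PR description; the value follows its "2x" wording.
const RETESSELLATION_SCALE_THRESHOLD: f32 = 2.0;
/// At most this many tessellations are kept per shared graphic.
const MAX_ENTRIES: usize = 4;

/// Hypothetical stand-in for a backend shape handle.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ShapeHandle(pub u32);

struct Entry {
    scale: f32,
    handle: ShapeHandle,
}

/// Per-graphic cache, ordered from least to most recently used.
#[derive(Default)]
pub struct TessellationCache {
    entries: Vec<Entry>,
}

impl TessellationCache {
    /// Returns a cached tessellation close enough to `scale`, or calls
    /// `tessellate` to create one, evicting the least recently used entry
    /// when the cache is full.
    pub fn get_or_insert_with(
        &mut self,
        scale: f32,
        tessellate: impl FnOnce(f32) -> ShapeHandle,
    ) -> ShapeHandle {
        // An entry is reusable while the scale ratio stays below the threshold.
        let hit = self.entries.iter().position(|e| {
            let ratio = if scale > e.scale { scale / e.scale } else { e.scale / scale };
            ratio < RETESSELLATION_SCALE_THRESHOLD
        });
        if let Some(index) = hit {
            // Move the entry to the back to mark it as most recently used.
            let entry = self.entries.remove(index);
            let handle = entry.handle;
            self.entries.push(entry);
            return handle;
        }
        if self.entries.len() == MAX_ENTRIES {
            // The front of the list is the least recently used entry.
            self.entries.remove(0);
        }
        let handle = tessellate(scale);
        self.entries.push(Entry { scale, handle });
        handle
    }

    /// Number of cached tessellations.
    pub fn len(&self) -> usize {
        self.entries.len()
    }
}
```

A linear scan over a `Vec` is enough here because the cache never holds more than 4 entries. A graphic first drawn at scale 1.0 and then at 1.5 reuses the same tessellation, while drawing it at 2.0 creates a second one.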
Force-pushed from 1f57a7d to c71e7a8
|
So I tested out the patch on a lot of content. Mostly it doesn't seem to negatively impact performance, sometimes it even affects performance positively—I guess it's because we're not only increasing detail for large scales, but also decreasing detail for small scales. It still doesn't fix the scaling 100%, it looks like Flash uses both x and y scales, and not a combined scale. Hairline strokes also seem off in some cases. However, this PR improves strokes in the majority of cases, and architecturally we're going in the right direction: retessellation cache is the right solution IMO. There are small things that could be improved with the code, but it doesn't make sense to block on them, they can be fixed as a follow-up by somebody. |
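The remark about Flash using both x and y scales rather than a combined scale can be illustrated by extracting per-axis scales from a 2D affine matrix. This is only a sketch of the idea for a possible follow-up: the function name is hypothetical, and the matrix layout (x' = a*x + c*y + tx, y' = b*x + d*y + ty) is assumed.

```rust
/// Returns (x_scale, y_scale): the lengths of the transformed unit x and y
/// axes. The unit x axis maps to (a, b) and the unit y axis maps to (c, d).
pub fn axis_scales(a: f32, b: f32, c: f32, d: f32) -> (f32, f32) {
    ((a * a + b * b).sqrt(), (c * c + d * d).sqrt())
}
```

For a graphic stretched 4x horizontally and left at 1x vertically, this reports (4.0, 1.0), whereas a single combined value such as the square root of the determinant would report 2.0 and lose the difference between the axes.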
kjarosh left a comment
LGTM, thank you! That improves rendering by a lot and fixes one of the most annoying issues in Ruffle.
Mentioned in a package version-bump commit message, under ruffle.mk nightly-2026-05-05 (Commits on May 05, 2026): "render: Improve hairline strokes and scaling strokes on WebGL and WGPU by @darktohka in ruffle-rs/ruffle#23011". New contributors: @darktohka made their first contribution in ruffle-rs/ruffle#23011. Full changelog: ruffle-rs/ruffle@nightly-2026-05-04...nightly-2026-05-05
Same trick for the cubic disc q²/4 + p³/27 at the disc≈0 (near-triple- root) boundary between Cardano and trig branches. 2. Newton polish on every analytical root. Cardano + acos/cos/pow(_, 1/3) come back at ~5 ULP; one Newton step on D'(t) drives the root to ~1 ULP. polish_root_c skips when D''(t) is small or |step| ≥ 0.5 to avoid divergence at near-double/triple-root cases. 3. Faster trig branch. Replaces pow(sqrt(-p³/27), 1/3) (3 multiplies + sqrt + pow(_, 1/6)) with the equal 2·sqrt(-p/3). Reduces work and avoids precision loss of pow(_, 1/6). * vectorscale: split cell rasterizer into single-AA + multi-AA passes Replaces the monolithic cell-rasterizer.slang with two passes that share the same algorithm but separate the AA work for occupancy on register- constrained GPUs. 1. cell-rasterizer-single-aa.slang — tracks one best hit + the second- best hit's distance² (no full 2nd hit data). Resolves color, applies single-curve AA on the resolved hit. Writes RGB = AA color and A = sentinel (1.0 if 2nd hit is within aa_threshold so multi-curve AA could fire, 0.0 otherwise). Hit struct is slim (5 scalars: d2, t, cp_idx, prev_ci, next_ci) — geometry refetched via read_packed_cp at consumer sites (texture cache hits ~100% since test_one_cp just read the same texels). 2. cell-rasterizer-multi-aa.slang — reads SingleAA. If A < 0.5, passes RGB through unchanged (most pixels). Otherwise redoes find_hits (top-3) and runs wedge AA + dual-curve AA gates as in the original monolithic rasterizer, falling back to SingleAA's RGB if neither fires (single-curve AA already applied). pos/neg colors are scoped to each AA branch via out-params on resolve_hit instead of struct fields, keeping them out of the cross-branch live set. Two presets: - vectorscale.slangp: chains single-aa → multi-aa. Output is equivalent to the original monolithic rasterizer; most pixels take the cheap early-exit path on pass 2. - vectorscale-single-aa.slangp: single-aa pass alone. 
Faster on register- constrained GPUs but jaggy at junctions and dual-curve crossings. The sentinel is purely an inter-pass signal — the standalone single-aa preset writes it to viewport alpha where display ignores it. Measured on Apple Silicon: monolithic was 254 VGPRs (1/8 occupancy with 240 bytes spill); single-aa pass alone is ~120 VGPRs (clears the 128 threshold for ~30% occupancy, ~3x faster end-to-end). The chained two-pass setup matches monolithic output with the early-exit speedup.,







This pull request improves both hairline strokes and scaling strokes on the Web (WGPU, WebGL renderers) and Desktop (WGPU renderer) targets.
The main idea is to keep track of the scale at which each graphic is tessellated on the rendering backends. The tessellated shapes are then stored in a tessellation cache, a simple LRU cache that keeps the most recently used tessellations (4 max per shared graphic). In other words, the last 4 distinct scale buckets used for a graphic stay cached. A shape is only retessellated if it grows or shrinks by 2x relative to a cached variant (controlled by RETESSELLATION_SCALE_THRESHOLD).
When a shape grows disproportionately, it is retessellated. The retessellation precision (tolerance) is determined by the scale: the larger the scale, the more precise the tessellation, since small objects are expected to show less detail either way.
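The caching rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from this pull request: the `CachedVariant` struct, the `get_or_insert_with` method, the `MAX_CACHED_VARIANTS` constant, and the use of a plain `u32` as a stand-in for a backend shape handle are all hypothetical. Only the 4-entry limit, the LRU behaviour, and the 2x threshold come from the description.

```rust
/// Hypothetical constants mirroring the description: at most 4 cached
/// variants per graphic, and a new tessellation once the scale differs
/// from every cached variant by a factor of 2 or more.
const MAX_CACHED_VARIANTS: usize = 4;
const RETESSELLATION_SCALE_THRESHOLD: f32 = 2.0;

/// One cached tessellation. `handle` is a stand-in for a backend shape handle.
#[derive(Debug, Clone, PartialEq)]
struct CachedVariant {
    scale: f32,
    handle: u32,
}

/// Per-graphic cache; the most recently used variant sits at the front.
#[derive(Default)]
struct TessellationCache {
    entries: Vec<CachedVariant>,
}

impl TessellationCache {
    /// True if a variant tessellated at `cached` is close enough to `wanted`.
    /// Both scales are assumed to be positive.
    fn is_reusable(cached: f32, wanted: f32) -> bool {
        let ratio = if cached > wanted { cached / wanted } else { wanted / cached };
        ratio < RETESSELLATION_SCALE_THRESHOLD
    }

    /// Returns a cached handle for `scale`, or calls `tessellate` to make one.
    fn get_or_insert_with(&mut self, scale: f32, tessellate: impl FnOnce(f32) -> u32) -> u32 {
        if let Some(pos) = self
            .entries
            .iter()
            .position(|e| Self::is_reusable(e.scale, scale))
        {
            // Cache hit: move the variant to the front (most recently used).
            let hit = self.entries.remove(pos);
            let handle = hit.handle;
            self.entries.insert(0, hit);
            return handle;
        }
        // Cache miss: tessellate at the new scale and evict the least
        // recently used variant if the cache is over capacity.
        let handle = tessellate(scale);
        self.entries.insert(0, CachedVariant { scale, handle });
        self.entries.truncate(MAX_CACHED_VARIANTS);
        handle
    }
}
```

With this sketch, a request at scale 1.5 reuses a variant tessellated at 1.0, while a request at scale 2.0 creates a new variant. A `Vec` is enough here because the cache never holds more than four entries, so a linear scan is cheap.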
As an optimization, the tessellation cache is shared between all graphic instances that use the same graphic.
Hairline stroke rendering is also improved.
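One way the scale could feed into tessellation is sketched below. It is a guess at the general approach, not the tessellator code from this pull request: the function names, the `BASE_TOLERANCE` value, and the choice of the longer transformed axis as the scale are all assumptions. It also assumes shape-space units are pixels and relies on the Flash convention that a hairline stroke is drawn one screen pixel wide regardless of scaling.

```rust
/// Hypothetical tolerance (in shape-space units) used at scale 1.0.
const BASE_TOLERANCE: f32 = 0.25;

/// Approximates the scale of a 2D affine transform with linear part
/// [a c; b d] by taking the longer of the two transformed axes.
fn approximate_scale(a: f32, b: f32, c: f32, d: f32) -> f32 {
    let x_axis_len = (a * a + b * b).sqrt();
    let y_axis_len = (c * c + d * d).sqrt();
    x_axis_len.max(y_axis_len)
}

/// Width, in shape-space units, that renders as one screen pixel at `scale`.
fn hairline_width(scale: f32) -> f32 {
    1.0 / scale.max(f32::EPSILON)
}

/// Tessellation tolerance in shape-space units: a larger scale gives a
/// smaller tolerance, so curves are approximated more precisely.
fn tessellation_tolerance(scale: f32) -> f32 {
    BASE_TOLERANCE / scale.max(f32::EPSILON)
}
```

Dividing by the scale keeps both the stroke width and the approximation error roughly constant in screen pixels, which matches the stated goal that larger shapes get a more precise tessellation.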
This fixes the following issues (all tested): #18852 #21803 #751 #7369 #14268 #13984 #1955 #3216 #9044 #2023 #11704 #12360 #14551 #20211 #1412
Partially fixed (composite issues; only the stroke-related parts are fixed): #10524 #12057
Could not test (site locks, missing SWF, etc): #20345 #3216 #18855 #1625 #9309
Relevant technical discussions: #7042 #7369 #751
Before/After comparison screenshots (four pairs).