Apple Silicon + Windows CUDA perf: 4-5x FPS, wider capture, platform routing #1775
Merged
Conversation
…uting

Bundles CoreML graph rewrites, GPU-accelerated pipeline work, Windows CUDA fixes, and Mac/Windows runtime routing into a single drop.

CoreML (Apple Silicon):
- Decompose Pad(reflect) → Slice+Concat in inswapper_128 so the model runs in one CoreML partition instead of 14 (TEMPORARY: fixed upstream in microsoft/onnxruntime#28073, drop when ORT >= 1.26.0).
- Fold Shape/Gather chains to constants in det_10g (21ms → 4ms).
- Decompose Split(axis=1) → Slice pairs in GFPGAN (155ms → 89ms).
- Route detection model to GPU so the ANE is free for the swap model.
- Centralize provider/config selection in create_onnx_session.

Pipeline (all platforms):
- Parallelize face landmark + recognition post-detection; skip landmark_2d_106 when only face_swapper is active.
- Pipeline face detection with swap for ANE overlap.
- GPU-accelerated paste_back, MJPEG capture, zero-copy display path.
- Standalone pipeline benchmark script.

Windows / CUDA:
- CUDA graphs + FP16 model + all-GPU pipeline for 1080p 60 FPS.
- Auto-detect GPU provider and fix DLL discovery for Windows CUDA execution.

Cross-platform:
- platform_info helper for Mac/Windows runtime routing.
- GFPGAN 30 fps + MSMF camera 60 fps with adaptive pipeline tuning.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
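The Pad(reflect) → Slice+Concat rewrite works because reflect padding is expressible with only slices and a concat. A minimal numpy sketch of that identity (the actual rewrite operates on ONNX Slice/Concat nodes in the graph, not numpy arrays; `reflect_pad_1d` is an illustrative name):

```python
import numpy as np

def reflect_pad_1d(x: np.ndarray, p: int) -> np.ndarray:
    """Reflect-pad using only slicing and concatenation, the same
    arithmetic the Slice+Concat graph rewrite performs in place of
    a single Pad(mode=reflect) op. Assumes p <= len(x) - 1."""
    left = x[1:p + 1][::-1]      # mirror of the p elements just inside the left edge
    right = x[-p - 1:-1][::-1]   # mirror of the p elements just inside the right edge
    return np.concatenate([left, x, right])

x = np.array([1, 2, 3, 4, 5])
assert np.array_equal(reflect_pad_1d(x, 2), np.pad(x, 2, mode="reflect"))
```

The 2-D case used by inswapper_128 applies the same slice-flip-concat step once per spatial axis.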
Two issues surfaced in post-squash review of f65aeae:

1. CUDA-graph replay buffers were shared across threads with no lock. `_cuda_graph_swap_inference` mutates module-level ort_input/ort_latent and runs run_with_iobinding — concurrent swap calls on Windows/CUDA could overwrite each other's bound input buffers before replay, producing wrong-face output. Added `_cuda_graph_lock` around the full update/run/read sequence.

2. The face enhancer loop unconditionally broke after the first face, so `many_faces=True` silently enhanced only one face. The single-slot temporal cache would also paste the same enhancement onto every target if reused in many-faces mode. Gated the break on `not many_faces_mode` and disabled the cache path in that mode.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
PERFORMANCE.md documents measured gains on MacBook Pro M3 Max vs hacksider/Deep-Live-Cam main@64d3f06:

- Face swap only: <5 FPS -> >20 FPS
- Face swap + GFPGAN: <2 FPS -> >10 FPS
- Camera: 640x480 -> 960x540 MJPEG @ 60fps

Breaks down the contributors (camera negotiation, CoreML graph rewrites with before/after op latencies, pipeline overlap, GFPGAN temporal cache, paste-back optimization, platform routing, Windows CUDA path) and how to reproduce.

REVIEW_TODOS.md captures 12 findings from two independent reviews (Claude in-tree + Codex second opinion) grouped as Blockers / Should-fix / Consider, each with file:line and suggested fix. The two Blocker/Should-fix items are addressed in the preceding commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Contributor
Reviewer's Guide

This PR substantially reworks the live video pipeline for Apple Silicon and Windows/CUDA: it adds CoreML-oriented ONNX graph optimization and platform-detection plumbing, optimizes face detection/swap/enhancement to overlap work and minimize copies, negotiates better camera formats and empirically measures FPS, introduces CUDA graph replay and FP16 model selection on NVIDIA, and wires UI/capture logic to use the new paths while documenting performance and open review items.

Sequence diagram for live webcam processing with pipelined detection and cached faces:

sequenceDiagram
actor User
participant UI as modules.ui
participant VC as VideoCapturer
participant CapTh as _capture_thread_func
participant ProcTh as _processing_thread_func
participant FA as face_analyser
participant FS as face_swapper
participant FE as face_enhancer
User->>UI: create_webcam_preview(camera_index)
UI->>VC: start(width=1920,height=1080,fps=60)
VC->>VC: open camera (MSMF/DShow), set MJPG
VC->>VC: measure actual_fps
VC-->>UI: success, actual_fps
UI->>CapTh: start capture thread
UI->>ProcTh: start processing thread(camera_fps)
loop Capture loop
CapTh->>VC: cap.read()
VC-->>CapTh: frame
CapTh->>CapTh: queue.put_nowait(frame) (drop oldest if full)
end
loop Processing loop per frame
ProcTh->>CapTh: capture_queue.get()
CapTh-->>ProcTh: temp_frame (BGR)
ProcTh->>ProcTh: update det_count
alt det_count % det_interval == 0
alt many_faces
ProcTh->>FA: detect_many_faces_fast(frame)
FA-->>ProcTh: cached_many_faces
else single_face
ProcTh->>FA: detect_one_face_fast(frame)
FA-->>ProcTh: cached_target_face
end
end
ProcTh->>ProcTh: build _cached_faces from cache
alt FE enabled
ProcTh->>FE: process_frame(None,temp_frame,detected_faces=_cached_faces)
FE->>FE: enhance_face(temp_frame,detected_faces)
FE-->>ProcTh: enhanced frame
end
alt FS enabled
ProcTh->>FS: process_frame(source_face,temp_frame,target_face=cached_target_face)
FS->>FS: swap_face(source_face,target_face,temp_frame)
FS-->>ProcTh: swapped frame
end
ProcTh->>ProcTh: cv2.cvtColor(BGR->RGB)
ProcTh->>ProcTh: processed_queue.put_nowait(rgb_frame)
end
loop Display loop via ROOT.after
UI->>UI: processed_queue.get_nowait()
UI->>UI: fit_image_to_size(rgb_frame)
UI->>UI: create CTkImage and update preview_label
end
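The capture loop above enqueues with `queue.put_nowait(frame)` and drops the oldest entry when the queue is full, so the processing thread always works on a recent frame instead of falling behind. A sketch of that pattern, assuming a single capture (producer) thread:

```python
import queue

def put_latest(q, frame):
    """Drop-oldest enqueue: keep the queue bounded so consumers
    always see a recent frame rather than a growing backlog."""
    try:
        q.put_nowait(frame)
    except queue.Full:
        try:
            q.get_nowait()        # discard the stale frame
        except queue.Empty:
            pass                  # consumer drained it first
        q.put_nowait(frame)       # safe with a single producer

q = queue.Queue(maxsize=2)
for i in range(5):
    put_latest(q, i)
# Only the two most recent frames survive.
assert [q.get_nowait(), q.get_nowait()] == [3, 4]
```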
Class diagram for new and modified core modules:

classDiagram
class platform_info {
<<module>>
+bool IS_WINDOWS
+bool IS_MACOS
+bool IS_LINUX
+bool IS_APPLE_SILICON
+bool HAS_TORCH_CUDA
+List~str~ ONNX_PROVIDERS
+bool HAS_CUDA_PROVIDER
+bool HAS_COREML_PROVIDER
+bool HAS_DML_PROVIDER
+List~(int,int)~ camera_backends()
+str accelerator_label()
+void print_banner()
}
class onnx_optimize {
<<module>>
+bool IS_APPLE_SILICON
+str optimize_for_coreml(model_path,str input_shape)
+bool _fold_shape_gather(model, input_shape)
+bool _decompose_reflect_pad(model)
+bool _decompose_split(model)
+void _preserve_emap_position(model, numpy_helper)
}
class VideoCapturer {
-int device_index
-threading.Thread capture_thread
-threading.Event _frame_ready
-bool is_running
-cv2.VideoCapture cap
+int actual_width
+int actual_height
+float actual_fps
+__init__(device_index:int)
+bool start(width:int, height:int, fps:int)
+void release()
+float _measure_fps(warmup:int, sample:int, fallback:float)
+void set_frame_callback(callback)
}
class FaceAnalyserModule {
<<module>>
+Any FACE_ANALYSER
+threading.Lock FACE_ANALYSER_LOCK
+tuple DET_SIZE
+Any get_face_analyser()
+void _optimize_det_model(fa:Any, providers)
+bool _needs_landmark()
+bool _is_dml()
+list _analyse_faces(frame)
+Any get_one_face(frame)
+Any get_many_faces(frame)
+Any detect_one_face_fast(frame)
+Any detect_many_faces_fast(frame)
}
class FaceSwapperModule {
<<module>>
+Any FACE_SWAPPER
+threading.Lock THREAD_LOCK
+bool _HAS_TORCH_CUDA
+dict _paste_cache
+dict _cuda_graph_session
+threading.Lock _cuda_graph_lock
+Any get_face_swapper()
+void _init_cuda_graph_session(model_path:str, swapper)
+np.ndarray _cuda_graph_swap_inference(blob:np.ndarray, latent:np.ndarray)
+Frame _fast_paste_back(target_img:Frame, bgr_fake:np.ndarray, aimg:np.ndarray, M:np.ndarray)
+Frame swap_face(source_face:Face, target_face:Face, temp_frame:Frame)
+Frame apply_post_processing(current_frame:Frame, swapped_face_bboxes:List)
}
class FaceEnhancerModule {
<<module>>
+onnxruntime.InferenceSession FACE_ENHANCER
+threading.Semaphore THREAD_SEMAPHORE
+bool _HAS_TORCH_CUDA
+dict _enhancer_cache
+dict _enh_live_cache
+int _ENH_INTERVAL
+onnxruntime.InferenceSession get_face_enhancer()
+tuple _align_face(frame:Frame, landmarks:np.ndarray, output_size:int)
+Frame _paste_back(frame:Frame, enhanced_face:np.ndarray, affine_matrix:np.ndarray, output_size:int)
+np.ndarray _preprocess_face(aligned_face:np.ndarray)
+np.ndarray _postprocess_face(output:np.ndarray)
+Frame enhance_face(temp_frame:Frame, detected_faces)
+Frame process_frame(source_face:Face, temp_frame:Frame, detected_faces)
+Frame process_frame_v2(temp_frame:Frame, detected_faces)
}
class OnnxEnhHelperModule {
<<module>>
+list build_provider_config(providers)
+np.ndarray run_inference(session, input_name:str, input_tensor:np.ndarray)
+onnxruntime.InferenceSession create_onnx_session(model_path:str)
}
platform_info --> VideoCapturer : camera_backends()
platform_info --> FaceSwapperModule : IS_APPLE_SILICON, HAS_TORCH_CUDA
platform_info --> FaceEnhancerModule : IS_APPLE_SILICON
onnx_optimize --> FaceSwapperModule : optimize_for_coreml()
onnx_optimize --> FaceAnalyserModule : optimize_for_coreml()
onnx_optimize --> OnnxEnhHelperModule : optimize_for_coreml()
FaceAnalyserModule --> FaceSwapperModule : detect_one_face_fast()
FaceAnalyserModule --> FaceEnhancerModule : get_many_faces()
OnnxEnhHelperModule --> FaceEnhancerModule : create_onnx_session()
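`create_onnx_session` centralizes provider/config selection so every model goes through one code path. A minimal sketch of the selection idea, assuming a fixed preference order; the provider names are real ONNX Runtime identifiers, but the ordering and helper name are illustrative (the real module would derive availability from `onnxruntime.get_available_providers()`):

```python
# Preference order: GPU/accelerator providers first, CPU as fallback.
_PREFERENCE = [
    "CUDAExecutionProvider",
    "CoreMLExecutionProvider",
    "DmlExecutionProvider",
    "CPUExecutionProvider",
]

def pick_providers(available):
    """Return the available providers in preference order,
    guaranteeing at least the CPU fallback."""
    chosen = [p for p in _PREFERENCE if p in available]
    return chosen or ["CPUExecutionProvider"]

assert pick_providers(["CPUExecutionProvider", "CoreMLExecutionProvider"]) == [
    "CoreMLExecutionProvider",
    "CPUExecutionProvider",
]
```

The returned list would then be passed as the `providers` argument when constructing the `InferenceSession`.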
Contributor
Hey - I've left some high level feedback:
- In the pipelined `_run_pipe_pipeline` path you pass the same `frame` object to `get_one_face` via a background `ThreadPoolExecutor` while also mutating `frame` through the frame processors in the main loop; consider either copying the array for the detection task or doing detection on a separate frame buffer to avoid subtle data races and inconsistent face boxes.
- The CUDA-graph integration monkey-patches `swapper.session.run` in `_init_cuda_graph_session`, which is brittle if insightface ever recreates or swaps out the session; it would be safer to wrap the call site (e.g. in `swap_face`) or add a thin adapter method on the swapper rather than replacing `session.run` in-place.
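A minimal sketch of the first suggestion: snapshot the frame before handing it to the background detector, so the result no longer depends on thread scheduling. `detect` is a hypothetical stand-in for `get_one_face`:

```python
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def detect(frame):
    # Stand-in for face detection; just summarizes the buffer it saw.
    return float(frame.sum())

executor = ThreadPoolExecutor(max_workers=1)
frame = np.ones((4, 4))

# Submit a copy, not the live buffer the frame processors mutate.
future = executor.submit(detect, frame.copy())
frame *= 0                      # main loop mutates the frame in place
assert future.result() == 16.0  # detector saw the pre-mutation snapshot
executor.shutdown()
```

Because the copy is taken before `submit` returns, the assertion holds regardless of when the worker thread actually runs; passing `frame` directly would make the result depend on the race.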
…raph monkey-patch

- core._run_pipe_pipeline: hand the background detector its own copy of the frame. The frame processors mutate in place via paste-back, which was racing with concurrent face detection on the same buffer.
- face_swapper._init_cuda_graph_session: replace the `swapper.session.run` monkey-patch with a `_CudaGraphSessionAdapter` that proxies every attribute to the underlying session and only overrides `.run()`. Guarded so repeat init does not double-wrap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Owner
Awesome! You're the man @maxwbuckley!
Summary
- CoreML graph rewrites: `Pad(reflect)` → `Slice+Concat` (inswapper_128), `Shape`/`Gather` chains folded to constants (det_10g), `Split(axis=1)` → `Slice` pairs (GFPGAN). All cached to disk with a `_coreml` suffix; one-time cost per model per machine.
- Camera negotiation: requests the `MJPG` fourcc + 960×540 @ 60 fps and measures actual FPS empirically (`CAP_PROP_FPS` lies on DirectShow).
- cuDNN DLL discovery from pip-installed `nvidia-*` wheels on Windows.
- `modules/platform_info.py` centralizes OS/accelerator detection with a startup banner confirming which code path the app took.

Measured gains (MacBook Pro M3 Max vs upstream `main@64d3f06`)

Full per-contributor breakdown with before/after op latencies in PERFORMANCE.md.

Known issues (from post-review)

Two independent code reviews (Claude in-tree + Codex second opinion) produced 12 findings, cataloged in REVIEW_TODOS.md and grouped as Blockers / Should-fix / Consider. The two highest-severity items are fixed in this PR (CUDA-graph replay race, `many_faces` enhancer loop breaking after the first face). The remaining 10 items are correctness hardening that should be addressed in follow-ups; none are merge-blockers for this PR's claimed wins.

Future cleanup

`_decompose_reflect_pad` in `modules/onnx_optimize.py` is marked `TODO(ort>=1.26)` and is deletable once the ORT floor hits 1.26.0 (fixed upstream by microsoft/onnxruntime#28073). Code-only deletion, no perf change: native MIL `pad(mode="reflect")` matches the Slice+Concat rewrite to within noise (27.2 vs 27.4 ms on this machine).

Important

All numbers in this PR were measured on Apple Silicon (M3 Max). The Windows/CUDA code paths (CUDA graphs, FP16 model selection, MSMF→DSHOW camera fallback, DLL discovery for torch/lib + nvidia-*) need end-to-end reverification on a Windows + NVIDIA machine before this is ready for upstream merge.
Specifically check:
- `_cuda_graph_lock` addition (commit 4d04e83).
- `CAP_MSMF` → `CAP_DSHOW` fallback opens the camera.
- run.py finds cuDNN from both torch/lib and pip-installed `nvidia-*` wheels.
- Races under `multi_process_frame` (the CUDA-graph lock should prevent them; confirm empirically).
- `VideoCapturer`: 960x540 @ ~60fps, inswapper runs as one CoreML partition, face swap >20 FPS, face swap + GFPGAN >10 FPS.
- `many_faces` mode: GFPGAN enhances every detected face (the fix in 4d04e83 restores this).

🤖 Generated with Claude Code
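The test plan's FPS checks rely on the empirical measurement mentioned in the summary, since `CAP_PROP_FPS` is unreliable on some backends. A sketch of that probe, with the signature assumed from the class diagram's `_measure_fps(warmup, sample, fallback)` (the fake camera below is purely illustrative):

```python
import time

def measure_fps(read_frame, warmup=5, sample=20, fallback=30.0):
    """Time `sample` frame reads after discarding `warmup` reads,
    rather than trusting the driver-reported CAP_PROP_FPS value."""
    for _ in range(warmup):
        read_frame()                      # let exposure/buffering settle
    start = time.perf_counter()
    for _ in range(sample):
        read_frame()
    elapsed = time.perf_counter() - start
    return sample / elapsed if elapsed > 0 else fallback

# Fake camera delivering roughly 100 fps for the sketch.
fps = measure_fps(lambda: time.sleep(0.01))
assert 50 < fps < 150
```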
Summary by Sourcery
Improve real-time face swap/enhancement performance and platform handling across Apple Silicon and Windows/CUDA, including CoreML/CUDA optimizations, camera negotiation, and pipeline overlap.