@penelopeysm penelopeysm commented Nov 19, 2025

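Proof of principle. All of the benchmarks below are run on a linked varinfo, because that is what the optimisation affects. In the row labels, FD, RD, MC and EN stand for ForwardDiff, ReverseDiff, Mooncake and Enzyme.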

# trivial        before this PR                        after this PR                        v0.38.9
eval      ----   14.664 ns (1 allocs: 32 bytes)        5.585 ns                             163.035 ns (7 allocs: 224 bytes)    
grad (FD) ----   43.595 ns (4 allocs: 144 bytes)       50.415 ns (3 allocs: 96 bytes)       324.728 ns (14 allocs: 544 bytes)   
grad (RD) ----   2.718 μs (52 allocs: 1.781 KiB)       3.107 μs (46 allocs: 1.578 KiB)      4.042 μs (82 allocs: 3.125 KiB)     
grad (MC) ----   319.596 ns (6 allocs: 256 bytes)      111.164 ns (2 allocs: 64 bytes)      1.194 μs (27 allocs: 1.281 KiB)     
grad (EN) ----   172.721 ns (6 allocs: 208 bytes)      73.662 ns (2 allocs: 64 bytes)       483.607 ns (20 allocs: 688 bytes)   
                                                                                                                                   
# eight-schools  before this PR                        after this PR                        v0.38.9
eval      ----   241.115 ns (7 allocs: 352 bytes)      159.574 ns (4 allocs: 256 bytes)     760.417 ns (22 allocs: 1.094 KiB)   
grad (FD) ----   886.719 ns (13 allocs: 2.812 KiB)     777.027 ns (11 allocs: 2.594 KiB)    1.476 μs (28 allocs: 4.828 KiB)     
grad (RD) ----   38.250 μs (593 allocs: 21.641 KiB)    43.750 μs (574 allocs: 20.938 KiB)   43.000 μs (639 allocs: 26.359 KiB)  
grad (MC) ----   1.511 μs (18 allocs: 976 bytes)       1.037 μs (10 allocs: 656 bytes)      4.910 μs (68 allocs: 3.859 KiB)     
grad (EN) ----   998.600 ns (33 allocs: 1.469 KiB)     605.429 ns (13 allocs: 832 bytes)    1.942 μs (59 allocs: 2.797 KiB)     
                                                                                                                                   
# badvarnames    before this PR                        after this PR                        v0.38.9
eval      ----   611.104 ns (22 allocs: 864 bytes)     315.217 ns (2 allocs: 224 bytes)     1.635 μs (66 allocs: 2.531 KiB)     
grad (FD) ----   3.530 μs (51 allocs: 8.656 KiB)       1.949 μs (11 allocs: 4.281 KiB)      5.475 μs (143 allocs: 18.641 KiB)   
grad (RD) ----   47.541 μs (893 allocs: 31.812 KiB)    50.792 μs (754 allocs: 26.859 KiB)   57.167 μs (1076 allocs: 40.078 KiB) 
grad (MC) ----   2.842 μs (68 allocs: 2.344 KiB)       1.221 μs (6 allocs: 672 bytes)       8.319 μs (200 allocs: 8.250 KiB)    
grad (EN) ----   2.633 μs (85 allocs: 4.469 KiB)       477.800 ns (10 allocs: 1.656 KiB)    3.589 μs (144 allocs: 8.641 KiB)    
                                                                                                                                   
# submodel       before this PR                        after this PR                        v0.38.9
eval      ----   124.658 ns (3 allocs: 96 bytes)       57.435 ns                            613.000 ns (19 allocs: 848 bytes)   
grad (FD) ----   229.321 ns (6 allocs: 304 bytes)      187.931 ns (3 allocs: 112 bytes)     920.966 ns (26 allocs: 1.594 KiB)   
grad (RD) ----   10.667 μs (165 allocs: 5.781 KiB)     11.729 μs (147 allocs: 5.156 KiB)    16.375 μs (235 allocs: 9.547 KiB)   
grad (MC) ----   729.158 ns (12 allocs: 432 bytes)     302.182 ns (2 allocs: 80 bytes)      5.667 μs (74 allocs: 3.000 KiB)     
grad (EN) ----   661.111 ns (25 allocs: 928 bytes)     278.971 ns (3 allocs: 128 bytes)     2.420 μs (70 allocs: 2.750 KiB)     
                                                                                                                                   
# demo3          before this PR                        after this PR                        v0.38.9
eval      ----   306.122 ns (12 allocs: 528 bytes)     298.242 ns (12 allocs: 528 bytes)    835.794 ns (23 allocs: 1.031 KiB)  
grad (FD) ----   555.288 ns (17 allocs: 1.094 KiB)     562.098 ns (17 allocs: 1.094 KiB)    1.141 μs (32 allocs: 2.219 KiB)    
grad (RD) ----   15.917 μs (253 allocs: 9.297 KiB)     17.375 μs (253 allocs: 9.297 KiB)    20.292 μs (315 allocs: 12.641 KiB) 
grad (MC) ----   1.839 μs (28 allocs: 1.250 KiB)       1.613 μs (26 allocs: 1.125 KiB)      3.810 μs (64 allocs: 3.266 KiB)    
grad (EN) ----   1.378 μs (32 allocs: 1.406 KiB)       1.377 μs (32 allocs: 1.406 KiB)      1.922 μs (52 allocs: 2.281 KiB)   

demo3 shows no real difference, I believe because it only uses multivariate distributions.
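
To put the eval rows above in relative terms, the short sketch below divides the "before this PR" time by the "after this PR" time for each model. The timings are copied from the tables above; the variable names are illustrative, and Python is used here purely as a calculator.

```python
# Eval timings in nanoseconds, copied from the "eval" rows above:
# model -> (before this PR, after this PR)
eval_ns = {
    "trivial": (14.664, 5.585),
    "eight-schools": (241.115, 159.574),
    "badvarnames": (611.104, 315.217),
    "submodel": (124.658, 57.435),
    "demo3": (306.122, 298.242),
}

# Speedup = before / after, so values above 1 mean this PR is faster.
speedups = {model: before / after for model, (before, after) in eval_ns.items()}

for model, s in speedups.items():
    print(f"{model:<14} {s:.2f}x")
```

On these numbers demo3 comes out at roughly 1.03x, consistent with the remark that it shows no real difference, while the other four models range from about 1.5x to 2.6x.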

@penelopeysm penelopeysm changed the base branch from main to py/type-stability November 19, 2025 10:17
@penelopeysm penelopeysm force-pushed the py/fastldf-transforms branch from 20d0459 to 72a123a Compare November 19, 2025 10:17

github-actions bot commented Nov 19, 2025

Benchmark Report

  • this PR's head: acff2745eca12b8925731f91998f8675b89122ee
  • base branch: accb515201d8391dea61ccbf87969a09a5db7978

Computer Information

Julia Version 1.11.7
Commit f2b3dbda30a (2025-09-08 12:10 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 4 × AMD EPYC 7763 64-Core Processor
  WORD_SIZE: 64
  LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 4 virtual cores)

Benchmark Results

┌───────────────────────┬───────┬─────────────┬────────────────┬────────┬───────────────────────────────┬────────────────────────────┬─────────────────────────────────┐
│                       │       │             │                │        │       t(eval) / t(ref)        │     t(grad) / t(eval)      │        t(grad) / t(ref)         │
│                       │       │             │                │        │ ─────────┬──────────┬──────── │ ───────┬─────────┬──────── │ ──────────┬───────────┬──────── │
│                 Model │   Dim │  AD Backend │        VarInfo │ Linked │     base │  this PR │ speedup │   base │ this PR │ speedup │      base │   this PR │ speedup │
├───────────────────────┼───────┼─────────────┼────────────────┼────────┼──────────┼──────────┼─────────┼────────┼─────────┼─────────┼───────────┼───────────┼─────────┤
│               Dynamic │    10 │    mooncake │          typed │   true │   321.48 │   335.93 │    0.96 │   9.88 │    9.40 │    1.05 │   3177.49 │   3158.89 │    1.01 │
│                   LDA │    12 │ reversediff │          typed │   true │  2265.54 │  2451.08 │    0.92 │   5.44 │    5.31 │    1.03 │  12333.29 │  13004.41 │    0.95 │
│   Loop univariate 10k │ 10000 │    mooncake │          typed │   true │ 90006.79 │ 67684.66 │    1.33 │   3.94 │    3.94 │    1.00 │ 354505.36 │ 266947.27 │    1.33 │
├───────────────────────┼───────┼─────────────┼────────────────┼────────┼──────────┼──────────┼─────────┼────────┼─────────┼─────────┼───────────┼───────────┼─────────┤
│    Loop univariate 1k │  1000 │    mooncake │          typed │   true │  6875.12 │  4406.34 │    1.56 │   4.88 │    5.76 │    0.85 │  33565.90 │  25384.00 │    1.32 │
│      Multivariate 10k │ 10000 │    mooncake │          typed │   true │ 27948.32 │ 30323.59 │    0.92 │  10.29 │   10.09 │    1.02 │ 287575.18 │ 305983.56 │    0.94 │
│       Multivariate 1k │  1000 │    mooncake │          typed │   true │  3132.04 │  3272.50 │    0.96 │   9.43 │    9.44 │    1.00 │  29534.24 │  30892.10 │    0.96 │
├───────────────────────┼───────┼─────────────┼────────────────┼────────┼──────────┼──────────┼─────────┼────────┼─────────┼─────────┼───────────┼───────────┼─────────┤
│ Simple assume observe │     1 │ forwarddiff │          typed │  false │     3.33 │     1.34 │    2.50 │   2.91 │    7.09 │    0.41 │      9.70 │      9.48 │    1.02 │
│           Smorgasbord │   201 │ forwarddiff │          typed │  false │  1061.92 │   997.98 │    1.06 │ 125.31 │   67.36 │    1.86 │ 133063.76 │  67223.75 │    1.98 │
│           Smorgasbord │   201 │      enzyme │          typed │   true │  1482.56 │  1196.90 │    1.24 │   5.01 │    4.86 │    1.03 │   7425.70 │   5822.25 │    1.28 │
├───────────────────────┼───────┼─────────────┼────────────────┼────────┼──────────┼──────────┼─────────┼────────┼─────────┼─────────┼───────────┼───────────┼─────────┤
│           Smorgasbord │   201 │    mooncake │          typed │   true │  1947.42 │  1226.46 │    1.59 │   4.06 │    5.32 │    0.76 │   7899.03 │   6525.85 │    1.21 │
│           Smorgasbord │   201 │ reversediff │          typed │   true │  1492.89 │  1208.51 │    1.24 │  88.38 │  109.83 │    0.80 │ 131935.68 │ 132734.26 │    0.99 │
│           Smorgasbord │   201 │ forwarddiff │   typed_vector │   true │  1464.59 │  1195.94 │    1.22 │  57.01 │  151.18 │    0.38 │  83497.92 │ 180803.35 │    0.46 │
├───────────────────────┼───────┼─────────────┼────────────────┼────────┼──────────┼──────────┼─────────┼────────┼─────────┼─────────┼───────────┼───────────┼─────────┤
│           Smorgasbord │   201 │ forwarddiff │        untyped │   true │  1461.15 │  1199.86 │    1.22 │  57.96 │   81.02 │    0.72 │  84694.73 │  97209.90 │    0.87 │
│           Smorgasbord │   201 │ forwarddiff │ untyped_vector │   true │  1460.39 │  1204.64 │    1.21 │  55.97 │   81.47 │    0.69 │  81731.87 │  98143.20 │    0.83 │
│              Submodel │     1 │    mooncake │          typed │   true │     7.37 │     5.29 │    1.39 │   4.51 │    4.54 │    0.99 │     33.24 │     24.03 │    1.38 │
└───────────────────────┴───────┴─────────────┴────────────────┴────────┴──────────┴──────────┴─────────┴────────┴─────────┴─────────┴───────────┴───────────┴─────────┘
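
The column headers are terse, so here is a small sketch of how the columns appear to relate to one another. This is inferred from the numbers themselves, not from the benchmark script: each speedup looks like base divided by this PR, and t(grad) / t(ref) looks like the product of the other two ratios. The rows are copied from the table; the field names are made up for illustration, and small mismatches are expected because the displayed values are rounded.

```python
# A few rows copied from the table above (base-branch values unless noted).
rows = {
    "Loop univariate 10k": {
        "eval_base": 90006.79, "eval_pr": 67684.66, "eval_speedup": 1.33,
        "grad_per_eval_base": 3.94, "grad_base": 354505.36,
    },
    "Smorgasbord (mooncake)": {
        "eval_base": 1947.42, "eval_pr": 1226.46, "eval_speedup": 1.59,
        "grad_per_eval_base": 4.06, "grad_base": 7899.03,
    },
    "Submodel": {
        "eval_base": 7.37, "eval_pr": 5.29, "eval_speedup": 1.39,
        "grad_per_eval_base": 4.51, "grad_base": 33.24,
    },
}

checks = {}
for name, r in rows.items():
    # Assumed relation 1: speedup = base / this PR.
    speedup = r["eval_base"] / r["eval_pr"]
    # Assumed relation 2: t(grad)/t(ref) = t(eval)/t(ref) * t(grad)/t(eval).
    grad_from_product = r["eval_base"] * r["grad_per_eval_base"]
    rel_err = abs(grad_from_product - r["grad_base"]) / r["grad_base"]
    checks[name] = (speedup, rel_err)
    print(f"{name}: speedup {speedup:.2f} (table {r['eval_speedup']}), "
          f"product off by {100 * rel_err:.2f}%")
```

Read this way, a speedup above 1 means this PR is faster than the base branch for that column, and a speedup below 1 means it is slower.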

codecov bot commented Nov 19, 2025

Codecov Report

❌ Patch coverage is 64.04494% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 76.74%. Comparing base (accb515) to head (acff274).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/utils.jl | 64.04% | 32 Missing ⚠️ |
Additional details and impacted files
@@                  Coverage Diff                  @@
##           py/type-stability    #1147      +/-   ##
=====================================================
- Coverage              77.17%   76.74%   -0.44%     
=====================================================
  Files                     40       40              
  Lines                   3733     3801      +68     
=====================================================
+ Hits                    2881     2917      +36     
- Misses                   852      884      +32     

@penelopeysm penelopeysm force-pushed the py/fastldf-transforms branch from 7bf80bf to acff274 Compare November 19, 2025 10:54
@github-actions

DynamicPPL.jl documentation for PR #1147 is available at:
https://TuringLang.github.io/DynamicPPL.jl/previews/PR1147/

@penelopeysm penelopeysm changed the title from "transforms on fast ldf" to "improved univariate transforms on fast ldf" Nov 19, 2025