@Valian
Contributor

@Valian Valian commented Jul 18, 2025

I've spent way too much time on this, but I now have a fast, working implementation of a greedy algorithm that computes list patches using the object_hash option, as requested in #26.

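My benchmarks show it's roughly 5.5-6.4 times faster on the plain list inputs (20 to 1000 items) and 1.1-1.4 times faster on the three structured inputs (Complex Objects, Mixed Operations, Nested Lists), thanks to a reduced number of deep comparisons. If a hash can't be computed, we fall back to pairwise comparison.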
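To illustrate why keying items by a hash can cut down deep comparisons, here is a minimal Python sketch (not the library's code). The names `try_hashes`, `pairwise_changed`, `hashed_changed`, and `by_id` are hypothetical, and the sketch assumes "pairwise" means comparing items position by position and that a failed hash is signalled by a `KeyError` or `TypeError`.

```python
from typing import Any, Callable, Hashable, List, Optional, Tuple

Item = Any
HashFn = Callable[[Item], Hashable]


def try_hashes(items: List[Item], object_hash: HashFn) -> Optional[List[Hashable]]:
    """Return one key per item, or None as soon as a key cannot be computed."""
    keys = []
    for item in items:
        try:
            keys.append(object_hash(item))
        except (KeyError, TypeError):
            return None  # assumption: the caller then uses the pairwise fallback
    return keys


def pairwise_changed(old: List[Item], new: List[Item]) -> List[int]:
    """Fallback: deep-compare items position by position."""
    return [i for i in range(min(len(old), len(new))) if old[i] != new[i]]


def hashed_changed(old, new, old_keys, new_keys) -> List[Tuple[int, int]]:
    """Deep-compare only the (old, new) pairs whose keys match."""
    index_by_key = {key: i for i, key in enumerate(old_keys)}
    changed = []
    for j, key in enumerate(new_keys):
        i = index_by_key.get(key)
        if i is not None and old[i] != new[j]:
            changed.append((i, j))
    return changed


def by_id(item: Item) -> Hashable:
    """Hypothetical object_hash: identify an item by its "id" field."""
    return item["id"]


old = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}, {"id": 3, "name": "c"}]
new = [{"id": 0, "name": "z"}] + old  # one item inserted at the head

# Position by position, every overlapping slot looks changed.
print(pairwise_changed(old, new))  # [0, 1, 2]
# Matched by key, no existing item changed at all.
print(hashed_changed(old, new, try_hashes(old, by_id), try_hashes(new, by_id)))  # []
# An item without an "id" makes the hash unavailable.
print(try_hashes([{"name": "no id"}], by_id))  # None
```

With a single insertion at the head, the positional comparison flags every overlapping slot for a deep diff, while the keyed comparison finds nothing to diff among the existing items.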

The generated patches are optimal when a list only has additions and removals, wherever they occur in the list. I don't detect moves: I tried, but it's a can of worms. For now, a move is emitted as a remove + add pair.
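A greedy two-pointer walk over keyed lists can be sketched as follows. This is an illustrative Python version under stated assumptions, not the code in this branch: `greedy_list_diff`, `apply_ops`, and `by_id` are hypothetical names, keys are assumed unique within a list, and paths are relative to the list itself in the JSON Patch style.

```python
from typing import Any, Callable, Dict, Hashable, List

Op = Dict[str, Any]


def greedy_list_diff(old: List[Any], new: List[Any],
                     key: Callable[[Any], Hashable]) -> List[Op]:
    """Emit add/remove/replace ops that turn `old` into `new`."""
    old_keys = [key(x) for x in old]
    new_keys = [key(x) for x in new]
    remaining = set(old_keys)  # keys of old items not yet consumed
    ops: List[Op] = []
    i = j = 0
    # Invariant: the list being patched always equals new[:j] + old[i:].
    while i < len(old) and j < len(new):
        if old_keys[i] == new_keys[j]:
            if old[i] != new[j]:  # same identity, different content
                ops.append({"op": "replace", "path": f"/{j}", "value": new[j]})
            remaining.discard(old_keys[i])
            i += 1
            j += 1
        elif new_keys[j] not in remaining:
            # new[j] has no counterpart left in old: insert it here.
            ops.append({"op": "add", "path": f"/{j}", "value": new[j]})
            j += 1
        else:
            # new[j] appears later in old, so old[i] is in the way: drop it.
            ops.append({"op": "remove", "path": f"/{j}"})
            remaining.discard(old_keys[i])
            i += 1
    for _ in range(i, len(old)):  # leftover old items
        ops.append({"op": "remove", "path": f"/{j}"})
    for k in range(j, len(new)):  # leftover new items
        ops.append({"op": "add", "path": f"/{k}", "value": new[k]})
    return ops


def apply_ops(items: List[Any], ops: List[Op]) -> List[Any]:
    """Apply the ops in order to a copy of `items` (used here for checking)."""
    result = list(items)
    for op in ops:
        index = int(op["path"].lstrip("/"))
        if op["op"] == "remove":
            del result[index]
        elif op["op"] == "add":
            result.insert(index, op["value"])
        else:
            result[index] = op["value"]
    return result


def by_id(item: Dict[str, Any]) -> Hashable:
    return item["id"]


a, b, c, d = ({"id": name} for name in "abcd")

# Only a removal and an addition: one op each.
print(greedy_list_diff([a, b, c], [a, c, d], by_id))
# [{'op': 'remove', 'path': '/1'}, {'op': 'add', 'path': '/2', 'value': {'id': 'd'}}]

# Moving `a` to the end is not detected as a move; it becomes remove + add.
print(greedy_list_diff([a, b, c], [b, c, a], by_id))
# [{'op': 'remove', 'path': '/0'}, {'op': 'add', 'path': '/2', 'value': {'id': 'a'}}]
```

When the lists differ only by additions and removals, each step of this walk removes exactly an item that is gone or adds exactly an item that is new, which is why no extra operations appear in the first example.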

I also implemented an LCS-based solution, but it was slow, and detecting conflicting moves was very complex, so I decided to rewrite it with a simpler solution.
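For context, a textbook longest-common-subsequence (LCS) pass over item keys looks roughly like the Python sketch below. It is the generic dynamic-programming formulation, not the discarded implementation, and `lcs_keys` is a hypothetical name. Items whose keys fall outside the LCS would have to be expressed as removes, adds, or moves.

```python
from typing import Hashable, List


def lcs_keys(old_keys: List[Hashable], new_keys: List[Hashable]) -> List[Hashable]:
    """Return one longest common subsequence of the two key lists."""
    n, m = len(old_keys), len(new_keys)
    # table[i][j] = LCS length of old_keys[i:] and new_keys[j:]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old_keys[i] == new_keys[j]:
                table[i][j] = table[i + 1][j + 1] + 1
            else:
                table[i][j] = max(table[i + 1][j], table[i][j + 1])

    # Walk the table from the top-left corner to recover the kept keys.
    kept: List[Hashable] = []
    i = j = 0
    while i < n and j < m:
        if old_keys[i] == new_keys[j]:
            kept.append(old_keys[i])
            i += 1
            j += 1
        elif table[i + 1][j] >= table[i][j + 1]:
            i += 1  # skipping old_keys[i] loses nothing
        else:
            j += 1  # skipping new_keys[j] loses nothing
    return kept


# `d` moved from the end to the front; the LCS keeps a, b, c in place.
print(lcs_keys(["a", "b", "c", "d"], ["d", "a", "b", "c"]))  # ['a', 'b', 'c']
```

The table has (n + 1) x (m + 1) cells, so time and memory grow with the product of the two list lengths. That may be one reason an LCS approach is slow on long lists, although the comment above does not say where the time went.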

Feel free to check the tests. This branch is based on diff-performance-optimizations, so the diff should be more readable once the performance fixes are merged.

TODO:

  • Documentation

benchmark:

Benchmarking With object_hash (greedy) with input Complex Objects - Product Catalog ...
Benchmarking With object_hash (greedy) with input Large List (500 items) ...
Benchmarking With object_hash (greedy) with input Medium List (100 items) ...
Benchmarking With object_hash (greedy) with input Mixed Operations - Social Feed ...
Benchmarking With object_hash (greedy) with input Nested Lists - User Management ...
Benchmarking With object_hash (greedy) with input Small List (20 items) ...
Benchmarking With object_hash (greedy) with input Very Large List (1000 items) ...
Benchmarking Without object_hash (pairwise) with input Complex Objects - Product Catalog ...
Benchmarking Without object_hash (pairwise) with input Large List (500 items) ...
Benchmarking Without object_hash (pairwise) with input Medium List (100 items) ...
Benchmarking Without object_hash (pairwise) with input Mixed Operations - Social Feed ...
Benchmarking Without object_hash (pairwise) with input Nested Lists - User Management ...
Benchmarking Without object_hash (pairwise) with input Small List (20 items) ...
Benchmarking Without object_hash (pairwise) with input Very Large List (1000 items) ...
Calculating statistics...
Formatting results...

##### With input Complex Objects - Product Catalog #####
Name                                     ips        average  deviation         median         99th %
With object_hash (greedy)            88.31 K       11.32 μs   ±214.54%        9.67 μs       35.43 μs
Without object_hash (pairwise)       74.66 K       13.39 μs    ±87.12%       12.04 μs       31.21 μs

Comparison: 
With object_hash (greedy)            88.31 K
Without object_hash (pairwise)       74.66 K - 1.18x slower +2.07 μs

Memory usage statistics:

Name                              Memory usage
With object_hash (greedy)             10.20 KB
Without object_hash (pairwise)         9.72 KB - 0.95x memory usage -0.48438 KB

**All measurements for memory usage were the same**

Reduction count statistics:

Name                           Reduction count
With object_hash (greedy)               1.22 K
Without object_hash (pairwise)          1.04 K - 0.85x reduction count -0.18300 K

**All measurements for reduction count were the same**

##### With input Large List (500 items) #####
Name                                     ips        average  deviation         median         99th %
With object_hash (greedy)             3.42 K        0.29 ms    ±22.15%        0.27 ms        0.48 ms
Without object_hash (pairwise)        0.62 K        1.62 ms    ±12.16%        1.59 ms        2.26 ms

Comparison: 
With object_hash (greedy)             3.42 K
Without object_hash (pairwise)        0.62 K - 5.54x slower +1.33 ms

Memory usage statistics:

Name                              Memory usage
With object_hash (greedy)            445.83 KB
Without object_hash (pairwise)       906.92 KB - 2.03x memory usage +461.09 KB

**All measurements for memory usage were the same**

Reduction count statistics:

Name                           Reduction count
With object_hash (greedy)              25.14 K
Without object_hash (pairwise)         89.90 K - 3.58x reduction count +64.77 K

**All measurements for reduction count were the same**

##### With input Medium List (100 items) #####
Name                                     ips        average  deviation         median         99th %
With object_hash (greedy)            19.45 K       51.42 μs    ±18.82%       49.63 μs       75.85 μs
Without object_hash (pairwise)        3.03 K      330.16 μs    ±35.01%      320.92 μs      519.85 μs

Comparison: 
With object_hash (greedy)            19.45 K
Without object_hash (pairwise)        3.03 K - 6.42x slower +278.75 μs

Memory usage statistics:

Name                              Memory usage
With object_hash (greedy)             87.38 KB
Without object_hash (pairwise)       177.93 KB - 2.04x memory usage +90.55 KB

**All measurements for memory usage were the same**

Reduction count statistics:

Name                           Reduction count
With object_hash (greedy)               5.41 K
Without object_hash (pairwise)         16.30 K - 3.01x reduction count +10.89 K

**All measurements for reduction count were the same**

##### With input Mixed Operations - Social Feed #####
Name                                     ips        average  deviation         median         99th %
With object_hash (greedy)           126.17 K        7.93 μs    ±79.47%        7.29 μs       20.21 μs
Without object_hash (pairwise)       90.90 K       11.00 μs    ±40.83%       10.08 μs       27.04 μs

Comparison: 
With object_hash (greedy)           126.17 K
Without object_hash (pairwise)       90.90 K - 1.39x slower +3.08 μs

Memory usage statistics:

Name                              Memory usage
With object_hash (greedy)              7.95 KB
Without object_hash (pairwise)         7.47 KB - 0.94x memory usage -0.48438 KB

**All measurements for memory usage were the same**

Reduction count statistics:

Name                           Reduction count
With object_hash (greedy)                  941
Without object_hash (pairwise)             826 - 0.88x reduction count -115

**All measurements for reduction count were the same**

##### With input Nested Lists - User Management #####
Name                                     ips        average  deviation         median         99th %
With object_hash (greedy)           109.85 K        9.10 μs   ±345.27%        7.42 μs       24.91 μs
Without object_hash (pairwise)       98.71 K       10.13 μs    ±32.27%        9.38 μs       27.42 μs

Comparison: 
With object_hash (greedy)           109.85 K
Without object_hash (pairwise)       98.71 K - 1.11x slower +1.03 μs

Memory usage statistics:

Name                              Memory usage
With object_hash (greedy)              8.40 KB
Without object_hash (pairwise)         6.83 KB - 0.81x memory usage -1.57031 KB

**All measurements for memory usage were the same**

Reduction count statistics:

Name                           Reduction count
With object_hash (greedy)                  998
Without object_hash (pairwise)             776 - 0.78x reduction count -222

**All measurements for reduction count were the same**

##### With input Small List (20 items) #####
Name                                     ips        average  deviation         median         99th %
With object_hash (greedy)           112.15 K        8.92 μs   ±222.21%        7.17 μs       43.81 μs
Without object_hash (pairwise)       19.54 K       51.18 μs    ±28.79%          49 μs       93.25 μs

Comparison: 
With object_hash (greedy)           112.15 K
Without object_hash (pairwise)       19.54 K - 5.74x slower +42.27 μs

Memory usage statistics:

Name                              Memory usage
With object_hash (greedy)             16.63 KB
Without object_hash (pairwise)        35.44 KB - 2.13x memory usage +18.80 KB

**All measurements for memory usage were the same**

Reduction count statistics:

Name                           Reduction count
With object_hash (greedy)               1.19 K
Without object_hash (pairwise)          3.36 K - 2.83x reduction count +2.17 K

**All measurements for reduction count were the same**

##### With input Very Large List (1000 items) #####
Name                                     ips        average  deviation         median         99th %
With object_hash (greedy)             1.77 K        0.56 ms    ±18.26%        0.54 ms        1.07 ms
Without object_hash (pairwise)        0.32 K        3.13 ms    ±15.77%        3.06 ms        5.09 ms

Comparison: 
With object_hash (greedy)             1.77 K
Without object_hash (pairwise)        0.32 K - 5.55x slower +2.56 ms

Memory usage statistics:

Name                              Memory usage
With object_hash (greedy)              0.93 MB
Without object_hash (pairwise)         1.76 MB - 1.90x memory usage +0.83 MB

**All measurements for memory usage were the same**

Reduction count statistics:

Name                           Reduction count
With object_hash (greedy)              52.62 K
Without object_hash (pairwise)        165.64 K - 3.15x reduction count +113.01 K

**All measurements for reduction count were the same**

@corka149
Owner

corka149 commented Jul 20, 2025

Sounds very interesting, but I am away from a computer for two weeks. 😬 So don't be surprised if feedback takes some time.

@Valian Valian mentioned this pull request Jul 22, 2025
@Valian Valian force-pushed the greedy-object-hash branch 2 times, most recently from 05f5ba2 to 3d81783 Compare July 28, 2025 19:31
@Valian
Contributor Author

Valian commented Jul 28, 2025

@corka149 I've rebased successfully and added more tests for ancestor path and object_hash. Everything seems fine, except that the credo checks are failing. For some reason I'm unable to run them locally, so I'll need a few iterations to sort them out.

It would be great if you could check it. The implementation should be quite straightforward, even if it looks a bit intimidating ;)

@Valian Valian force-pushed the greedy-object-hash branch from 3d81783 to e08a60a Compare July 28, 2025 19:51
…hash for efficient patches on collections with unique identifiers.
@Valian
Contributor Author

Valian commented Aug 10, 2025

Hi @corka149, anything else I could improve here?

@corka149
Owner

Hi @Valian, you are not forgotten. I'll be back tomorrow with access to my hardware, and then I want to take a deeper look.

@corka149 corka149 merged commit a02c130 into corka149:master Aug 11, 2025
10 checks passed
@Valian
Contributor Author

Valian commented Aug 11, 2025

💜💜💜🙏
