Improve canonicalization performance by ddeschepper · Pull Request #3119 · RDFLib/rdflib

ddeschepper · 2025-04-25T10:57:11Z

We're noticing big performance issues when using longturtle serialization on some graphs. I've been able to narrow this down to the performance of canonicalization, which is also tracked in issue #2528.

Looking into it, I found that the current implementation of the _traces method of _TripleCanonicalizer accounts for much of the performance cost.

This PR reduces the complexity of _traces, which leads to a performance gain of at least an order of magnitude in our worst cases (100s -> 4s). All rdflib tests still pass. Additionally, I've tested these changes against our set of a few hundred longturtle-serialized examples, and the serialization output is unchanged.

The author of the linked issue has created a performance test that, with the current code, gives the following results on my machine:

file: test1.ttl
isomorphic: 0.07787537574768066
canonical: 0.03909921646118164

file: test2.ttl
isomorphic: 3.528538942337036
canonical: 1.8337273597717285

file: test3.ttl
isomorphic: 20.140648365020752
canonical: 9.535402774810791

With the changes in this PR, the same test gives:

file: test1.ttl
isomorphic: 0.012566566467285156
canonical: 0.006159543991088867

file: test2.ttl
isomorphic: 0.15960264205932617
canonical: 0.09874987602233887

file: test3.ttl
isomorphic: 0.531768798828125
canonical: 0.2606654167175293
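
To make the comparison easier to read, the two sets of timings above can be turned into speedup ratios. The snippet below is only a small arithmetic sketch: the numbers are copied from the output quoted in this comment, while the dictionary layout and the speedups helper are made up for illustration and are not part of the linked performance test.

```python
# Timings copied verbatim from the two benchmark runs quoted above.
before = {
    "test1.ttl": {"isomorphic": 0.07787537574768066, "canonical": 0.03909921646118164},
    "test2.ttl": {"isomorphic": 3.528538942337036, "canonical": 1.8337273597717285},
    "test3.ttl": {"isomorphic": 20.140648365020752, "canonical": 9.535402774810791},
}
after = {
    "test1.ttl": {"isomorphic": 0.012566566467285156, "canonical": 0.006159543991088867},
    "test2.ttl": {"isomorphic": 0.15960264205932617, "canonical": 0.09874987602233887},
    "test3.ttl": {"isomorphic": 0.531768798828125, "canonical": 0.2606654167175293},
}


def speedups(old, new):
    """Hypothetical helper: return {file: {measurement: old / new}}."""
    return {
        name: {metric: old[name][metric] / new[name][metric] for metric in timings}
        for name, timings in old.items()
    }


ratios = speedups(before, after)
for name, metrics in ratios.items():
    for metric, ratio in metrics.items():
        print(f"{name} {metric}: {ratio:.1f}x faster")
```

On these numbers the gain grows with the size of the problem: roughly 6x on test1.ttl, around 20x on test2.ttl, and above 35x on test3.ttl.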

edmondchuc · 2025-05-07T05:12:48Z

rdflib/compare.py

-                best = [refined_coloring]
+            color_score = tuple(c.key() for c in refined_coloring)
+
+            if best_score is None or best_score < color_score:


The right-hand side of the or is never evaluated, because best_score is None is always true at this point. Could you please review this?
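
To illustrate the concern, here is a toy sketch of how Python's short-circuiting or behaves when best_score is None every time the check runs. It is not the code from rdflib/compare.py: the Score class, the pick_best function, and the reset_each_round switch are all hypothetical, and only the shape of the if condition mirrors the diff above.

```python
class Score:
    """Hypothetical stand-in for a colour score that counts `<` comparisons."""

    comparisons = 0

    def __init__(self, key):
        self.key = key  # a tuple, compared lexicographically like color_score

    def __lt__(self, other):
        Score.comparisons += 1
        return self.key < other.key


def pick_best(scores, reset_each_round):
    """Walk the candidates using the same condition as in the diff."""
    best_score = None
    for color_score in scores:
        if reset_each_round:
            # Assumption for illustration: best_score is always None at the check.
            best_score = None
        if best_score is None or best_score < color_score:
            best_score = color_score
    return best_score


candidates = [Score((2, 1)), Score((3, 0)), Score((1, 5))]

Score.comparisons = 0
always_none = pick_best(candidates, reset_each_round=True)
comparisons_when_none = Score.comparisons  # `<` is never reached

Score.comparisons = 0
tracked = pick_best(candidates, reset_each_round=False)
comparisons_when_tracked = Score.comparisons  # `<` runs after the first candidate

print(always_none.key, comparisons_when_none)
print(tracked.key, comparisons_when_tracked)
```

If best_score really is None on every pass, the comparison is dead code and the last candidate simply wins. If it is carried across iterations, the comparison runs and the lexicographically largest score is kept.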

nicholascar · 2025-06-02T00:45:45Z

Agreed that there are large performance issues using longturtle instead of turtle and it would be great to fix them!

A few changes to compare.py have come through other PRs, so there's a conflict to be resolved here. I'll try to resolve this now and then see how the PR looks.

edmondchuc · 2025-08-29T04:26:04Z

Hi @ddeschepper, I will put the use of the canon algorithm behind a flag when calling the longturtle serializer and set it to false by default. This should address the performance issues for those who don't need deterministic outputs.

edmondchuc · 2025-09-03T06:42:47Z

Hi @ddeschepper, could you please review PR #3197 and see if it addresses your performance concerns? By default, canonicalization is no longer applied when using the longturtle serializer unless the canon flag is set to True.

reduce _traces complexity

312138d

ddeschepper mentioned this pull request Apr 25, 2025

Performance issues with rdflib.compare #2528

Open

edmondchuc mentioned this pull request May 7, 2025

7.x canonicalization perf #3135

Closed

8 tasks

edmondchuc reviewed May 7, 2025

nicholascar added 2 commits June 2, 2025 10:48

Merge branch 'main' into canonicalization-perf

5c93b12

Merge branch 'main' into canonicalization-perf

53a84b8

ddeschepper wants to merge 3 commits into RDFLib:main from ddeschepper:canonicalization-perf