Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make HTML ID generation fast #2383

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

jwueller
Copy link

Replaces HTML ID generation with a simple monotonic sequence to avoid several expensive hash lookups for every generated ID.

This entirely eliminates the chance for collisions under regular circumstances, and the default seed even performs better in some adverse circumstances, like multiple instances of the same library sharing an ID namespace on the same page.

The only potential downside is that IDs might not be particularly semantically helpful, but that can be mitigated by just appending that information instead.

This might be a breaking change if this module is considered a public API.

Fixes issues like this: https://jsonforms.discourse.group/t/expensive-id-generation-mechanism-with-createid-and-isuniqueid/1737

Replaces HTML ID generation with a simple monotonic sequence to avoid
several expensive hash lookups for every generated ID.

This entirely eliminates the chance for collisions under regular
circumstances, and the default seed even performs better in some
adverse circumstances, like multiple instances of the same library
sharing an ID namespace on the same page.

The only potential downside is that IDs might not be particularly
semantically helpful, but that can be mitigated by just appending that
information instead.

This might be a breaking change if this module is considered a public
API.
@CLAassistant
Copy link

CLAassistant commented Oct 24, 2024

CLA assistant check
All committers have signed the CLA.

Copy link

netlify bot commented Oct 24, 2024

Deploy Preview for jsonforms-examples ready!

Name Link
🔨 Latest commit e12bd5f
🔍 Latest deploy log https://app.netlify.com/sites/jsonforms-examples/deploys/671a774fae70450008542df6
😎 Deploy Preview https://deploy-preview-2383--jsonforms-examples.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@sdirix sdirix requested a review from lucas-koehler December 4, 2024 10:52
@lucas-koehler
Copy link
Contributor

lucas-koehler commented Dec 4, 2024

Hello @jwueller, thanks for the PR ❤️ and sorry for the late reply!
Unfortunately, we cannot accept this behavior change because it is an intentional feature that generated IDs are predictable. Predictable IDs are required because some downstream adopters depend on them for their code or CSS.

Clean solution

To cleanly solve this, the id generator would be handed into JSON Forms via configuration. However, this requires a proper config approach being implemented. This is planned in issue:

As it will take quite a while until this is implemented, we suggest the following alternative solution below.

Alternative solution

We discussed this with the team and came up with a potential solution that would allow you and others to override the ID generation when necessary:

Currently, the id-generation related methods createId, removeId, clearAllIds are directly imported from packages/core/src/util/ids.ts. This makes it currently impossible to override the implementations for adopters.

To make them overridable, we propose exporting a constant singleton object Id from packages/core/src/util/ids.ts and then calling the methods from the object. This allows to import the object and override its methods.

This requires adapting all current calls of the id-methods to call them from the new Id object

For this, introduce the object in packages/core/src/util/ids.ts:

// [...]
export const Id = {
  createId: createId,
  removeId: removeId,
  clearAllIds: clearAllIds
}

Callers of the Id methods then import the Id object and call the methods from there, like so:

// OLD
import { createId } from '@jsonforms/core';
// [...]
const newId = createId();

// NEW
import { Id } from '@jsonforms/core';
// [...]
const newId = Id.createId();

To override and implementation, i.e. of createId:

import { Id } from '@jsonforms/core';

function myCreateId() {
  // [...]
}

Id.createId = myCreateId;

If you want to contribute this, we would gladly accept this refactoring. It should then allow you to provide your own random-based ID generation in your project :)

@jwueller
Copy link
Author

jwueller commented Dec 6, 2024

Unfortunately, we cannot accept this behavior change because it is an intentional feature that generated IDs are predictable. Predictable IDs are required because some downstream adopters depend on them for their code or CSS.

@lucas-koehler Maybe you can help me understand, because I'm slightly confused. The IDs are still very much predictable. Or are we talking about the randomized namespace prefix? That can be overwritten, like I do in the tests. The randomized default namespace just reduces the risk of naming collisions in edge cases, but is not strictly required.

Or are you saying that IDs need to retain their exact structure? Because that structure inherently can't scale well, because it requires coordination between IDs, which will always be slow. Also, relying on IDs to have a certain structure seems like a significant anti-pattern, considering they have to be unique on the entire page, and can't be predicted reliably across multiple forms already anyway.

Clean solution

Yes, passing the generator function into the config would certainly be the best solution, but is also the most disruptive. I just wanted fast generation that we can actually use ASAP, because the current ID generator scales so poorly, that it takes up the majority of the entire application compute time once you have enough forms on the page.

To make them overridable, we propose exporting a constant singleton object Id from packages/core/src/util/ids.ts and then calling the methods from the object. This allows to import the object and override its methods.

This is an okay compromise, but will still be very slow by default. I can make a separate PR for that if it's the only acceptable solution, since the main goal of this one is to make it fast.

@sdirix
Copy link
Member

sdirix commented Dec 6, 2024

Hi @jwueller,

You're right, the suggested id mechanism is deterministic.

Or are you saying that IDs need to retain their exact structure? Because that structure inherently can't scale well, because it requires coordination between IDs, which will always be slow. Also, relying on IDs to have a certain structure seems like a significant anti-pattern, considering they have to be unique on the entire page, and can't be predicted reliably across multiple forms already anyway.

Yes, we can't change the ID generation in a minor version as this will break downstream in an unexpected manner. We know adopters who rely on the current unique ids to programmatically identify and/or style special inputs.

In many use cases this is not an anti-pattern as adopters who rely on this mechanism are using JSON Forms in a predictable manner.


In practice, the use case of rendering thousands of inputs at the same time is already a nice use case. In the more complex renderer sets (e.g. Material UI), rendering is then already so expensive that the use case is out of question already anyway.

The significant slowdown will only appear in cases in which there are not only thousands of ids, but they actually have to be exactly identical ids. This can only happen if you render the exactly same form thousands of times.

I did a quick performance test:

  • Invoking createId with 1000 very long identical ids completes in ~120 ms
  • Invoking createId with 10000 very long identical ids completes in ~12s
  • Invoking createId with 10000 very long ids of which 1000 are spread out duplicates completes in ~125 ms
  • Invoking createId with 200 very long unique ids completes in ~0.5ms
  • Invoking createId with 200 very long unique ids, each time handed over 10 times, completes in ~5ms

So even with 1000 identical duplicates, the penalty is barely noticeable compared to the rendering time needed to render 1000 inputs. The only very bad case is once you have over 1000+ identical duplicates. This is certainly an niche use case. Therefore it's not a good enough reason to completely change the id behavior within a minor version.

I'm fine with changing the id generation with the next major version though, especially when making it configurable.


Yes, passing the generator function into the config would certainly be the best solution, but is also the most disruptive. I just wanted fast generation that we can actually use ASAP, because the current ID generator scales so poorly, that it takes up the majority of the entire application compute time once you have enough forms on the page.

This is really only true in use cases where you have >1000 identical ids. What kind of use case do you have?

This is an okay compromise, but will still be very slow by default. I can make a separate PR for that if it's the only acceptable solution, since the main goal of this one is to make it fast.

We would happily accept that PR as this just makes JSON Forms better for everyone by adding more options.


There is also the other alternative to speed up the current implementation without changing its behavior.

For example using the following improved implementation we avoid long string hashes and instead use a more performant number set

const idMap = new Map<string, Set<number>>();

function createId(proposedId) {
  if (!idMap.has(proposedId)) {
    idMap.set(proposedId, new Set());
  }

  const usedIndices = idMap.get(proposedId);

  let index = 0;
  while (usedIndices.has(index)) {
    index++;
  }

  const newId = index === 0 ? proposedId : `${proposedId}${index}`;

  usedIndices.add(index);

  return newId;
}

Using this approach, I get the following performance numbers:

  • 1000 very long identical ids completes in ~5ms
  • 10000 very long identical ids completes in ~450ms
  • 10000 very long ids of which 1000 are spread out duplicates completes in ~15ms
  • 200 unique ids completes in ~0.1ms
  • 200 unique ids, each time handed over 10 times, completes in ~3.5ms

The worst cases is significantly improved by a factor of 2400% from 12 seconds to 450ms. Of course the delete use case becomes a bit more complicated.

So if this is sufficient performance for you, feel free to contribute this improvement with testcases, as it's more complicated than before.


Edit: I just realized that the suggested performance improvement slightly breaks the uniqueness guarantees. In case where we have properties which are named name and name2 then we could end up with duplicated ids.

These cases need to be additionally handled by recognizing such a number as an index and prefixing that number to each tried number.

@jwueller
Copy link
Author

jwueller commented Dec 6, 2024

Hi @sdirix,

Yes, we can't change the ID generation in a minor version as this will break downstream in an unexpected manner. We know adopters who rely on the current unique ids to programmatically identify and/or style special inputs.

Yes, this is definitely a breaking change if you consider IDs part of the public API. Although if data about the property name is required by CSS or other post-processing, then it might be better to move those into data-* attributes, or similar in the future, since those don't have to deal with collisions.

In many use cases this is not an anti-pattern as adopters who rely on this mechanism are using JSON Forms in a predictable manner.

Are they really that predictable, though? The fact that IDs depend on the order that they were generated in makes them pretty hard to predict if you have more than one form on a page.

For example, since the IDs basically always boil down to just a property name and a numeric suffix, something as simple as #/properties/name would be very likely to collide between unrelated schemas, leading to unpredictable IDs.

In practice, the use case of rendering thousands of inputs at the same time is already a nice use case.

I agree that this might not be the primary use-case, but in our case we have a lot of list items, each containing rather simple forms, but they tend do be similar in structure. This means that there is also a lot of ID collisions, because most of them happen to use common names for their properties.

I did a quick performance test:

It seems that you're getting much better numbers than this example did, which is surprising to me:
https://jsonforms.discourse.group/t/expensive-id-generation-mechanism-with-createid-and-isuniqueid/1737

This is really only true in use cases where you have >1000 identical ids. What kind of use case do you have?

Lots of small but similar forms in a larger list structure, as described above.

There is also the other alternative to speed up the current implementation without changing its behavior.

Yes, it's definitely possible to speed up the current externally visible behavior. Would you prefer a separate PR for a backwards-compatible solution?

@sdirix
Copy link
Member

sdirix commented Dec 9, 2024

I agree that this might not be the primary use-case, but in our case we have a lot of list items, each containing rather simple forms, but they tend do be similar in structure. This means that there is also a lot of ID collisions, because most of them happen to use common names for their properties.

Interesting, you are rendering all these forms at the same time? I would have expected more of a tree-master with detail approach in use cases like that.

It seems that you're getting much better numbers than this example did, which is surprising to me:
https://jsonforms.discourse.group/t/expensive-id-generation-mechanism-with-createid-and-isuniqueid/1737

The discussion post is not very detailed. "1000 forms" could mean anything, from 1000 identical ids up to 10000, 100000 etc. Besides that it obviously depends on the executing machine too.

Yes, it's definitely possible to speed up the current externally visible behavior. Would you prefer a separate PR for a backwards-compatible solution?

I'm fine either way, i.e. a new PR or force-pushing here.

First I would recommend opening a PR just changing the id structure, i.e. with the suggested object approach. That would already allow overwriting for your case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants