-
Notifications
You must be signed in to change notification settings - Fork 379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make HTML ID generation fast #2383
base: master
Are you sure you want to change the base?
Conversation
Replaces HTML ID generation with a simple monotonic sequence to avoid several expensive hash lookups for every generated ID. This entirely eliminates the chance for collisions under regular circumstances, and the default seed even performs better in some adverse circumstances, like multiple instances of the same library sharing an ID namespace on the same page. The only potential downside is that IDs might not be particularly semantically helpful, but that can be mitigated by just appending that information instead. This might be a breaking change if this module is considered a public API.
✅ Deploy Preview for jsonforms-examples ready!
To edit notification comments on pull requests, go to your Netlify site configuration. |
Hello @jwueller, thanks for the PR ❤️ and sorry for the late reply! Clean solutionTo cleanly solve this, the id generator would be handed into JSON Forms via configuration. However, this requires a proper config approach being implemented. This is planned in issue: As it will take quite a while until this is implemented, we suggest the following alternative solution below. Alternative solutionWe discussed this with the team and came up with a potential solution that would allow you and others to override the ID generation when necessary: Currently, the id-generation related methods To make them overridable, we propose exporting a constant singleton object This requires adapting all current calls of the id-methods to call them from the new For this, introduce the object in // [...]
export const Id = {
createId: createId,
removeId: removeId,
clearAllIds: clearAllIds
} Callers of the Id methods then import the // OLD
import { createId } from '@jsonforms/core';
// [...]
const newId = createId();
// NEW
import { Id } from '@jsonforms/core';
// [...]
const newId = Id.createId(); To override and implementation, i.e. of import { Id } from '@jsonforms/core';
function myCreateId() {
// [...]
}
Id.createId = myCreateId; If you want to contribute this, we would gladly accept this refactoring. It should then allow you to provide your own random-based ID generation in your project :) |
@lucas-koehler Maybe you can help me understand, because I'm slightly confused. The IDs are still very much predictable. Or are we talking about the randomized namespace prefix? That can be overwritten, like I do in the tests. The randomized default namespace just reduces the risk of naming collisions in edge cases, but is not strictly required. Or are you saying that IDs need to retain their exact structure? Because that structure inherently can't scale well, because it requires coordination between IDs, which will always be slow. Also, relying on IDs to have a certain structure seems like a significant anti-pattern, considering they have to be unique on the entire page, and can't be predicted reliably across multiple forms already anyway.
Yes, passing the generator function into the config would certainly be the best solution, but is also the most disruptive. I just wanted fast generation that we can actually use ASAP, because the current ID generator scales so poorly, that it takes up the majority of the entire application compute time once you have enough forms on the page.
This is an okay compromise, but will still be very slow by default. I can make a separate PR for that if it's the only acceptable solution, since the main goal of this one is to make it fast. |
Hi @jwueller, You're right, the suggested
Yes, we can't change the ID generation in a minor version as this will break downstream in an unexpected manner. We know adopters who rely on the current unique ids to programmatically identify and/or style special inputs. In many use cases this is not an anti-pattern as adopters who rely on this mechanism are using JSON Forms in a predictable manner. In practice, the use case of rendering thousands of inputs at the same time is already a nice use case. In the more complex renderer sets (e.g. Material UI), rendering is then already so expensive that the use case is out of question already anyway. The significant slowdown will only appear in cases in which there are not only thousands of ids, but they actually have to be exactly identical ids. This can only happen if you render the exactly same form thousands of times. I did a quick performance test:
So even with 1000 identical duplicates, the penalty is barely noticeable compared to the rendering time needed to render 1000 inputs. The only very bad case is once you have over 1000+ identical duplicates. This is certainly an niche use case. Therefore it's not a good enough reason to completely change the id behavior within a minor version. I'm fine with changing the id generation with the next major version though, especially when making it configurable.
This is really only true in use cases where you have >1000 identical ids. What kind of use case do you have?
We would happily accept that PR as this just makes JSON Forms better for everyone by adding more options. There is also the other alternative to speed up the current implementation without changing its behavior. For example using the following improved implementation we avoid long string hashes and instead use a more performant number set const idMap = new Map<string, Set<number>>();
function createId(proposedId) {
if (!idMap.has(proposedId)) {
idMap.set(proposedId, new Set());
}
const usedIndices = idMap.get(proposedId);
let index = 0;
while (usedIndices.has(index)) {
index++;
}
const newId = index === 0 ? proposedId : `${proposedId}${index}`;
usedIndices.add(index);
return newId;
} Using this approach, I get the following performance numbers:
The worst cases is significantly improved by a factor of 2400% from 12 seconds to 450ms. Of course the delete use case becomes a bit more complicated. So if this is sufficient performance for you, feel free to contribute this improvement with testcases, as it's more complicated than before. Edit: I just realized that the suggested performance improvement slightly breaks the uniqueness guarantees. In case where we have properties which are named These cases need to be additionally handled by recognizing such a number as an |
Hi @sdirix,
Yes, this is definitely a breaking change if you consider IDs part of the public API. Although if data about the property name is required by CSS or other post-processing, then it might be better to move those into
Are they really that predictable, though? The fact that IDs depend on the order that they were generated in makes them pretty hard to predict if you have more than one form on a page. For example, since the IDs basically always boil down to just a property name and a numeric suffix, something as simple as
I agree that this might not be the primary use-case, but in our case we have a lot of list items, each containing rather simple forms, but they tend do be similar in structure. This means that there is also a lot of ID collisions, because most of them happen to use common names for their properties.
It seems that you're getting much better numbers than this example did, which is surprising to me:
Lots of small but similar forms in a larger list structure, as described above.
Yes, it's definitely possible to speed up the current externally visible behavior. Would you prefer a separate PR for a backwards-compatible solution? |
Interesting, you are rendering all these forms at the same time? I would have expected more of a tree-master with detail approach in use cases like that.
The discussion post is not very detailed. "1000 forms" could mean anything, from 1000 identical ids up to 10000, 100000 etc. Besides that it obviously depends on the executing machine too.
I'm fine either way, i.e. a new PR or force-pushing here. First I would recommend opening a PR just changing the id structure, i.e. with the suggested object approach. That would already allow overwriting for your case. |
Replaces HTML ID generation with a simple monotonic sequence to avoid several expensive hash lookups for every generated ID.
This entirely eliminates the chance for collisions under regular circumstances, and the default seed even performs better in some adverse circumstances, like multiple instances of the same library sharing an ID namespace on the same page.
The only potential downside is that IDs might not be particularly semantically helpful, but that can be mitigated by just appending that information instead.
This might be a breaking change if this module is considered a public API.
Fixes issues like this: https://jsonforms.discourse.group/t/expensive-id-generation-mechanism-with-createid-and-isuniqueid/1737