Account ID structure updates #140
Points 2 and 3 listed above are actually done now (for point 2, we did add an extra bit but for now left it unused). So, the only thing remaining here is reducing the ID size to 60 bits and adjusting the PoW threshold accordingly.
One of the negative aspects of the current account ID generation scheme is that we somewhat "waste" the work that goes into generating the IDs (it is not technically wasted, as we do get more security against operators trying to highjack an account ID, but that's just a one-time effect). There are some ways in which we could make this work useful - for example:
Below is a proposal on how we could alter the account ID encoding scheme to make the PoW more useful. First, we still maintain the basics of the current scheme in that to derive an account ID we start with
Then, we'd impose the following rules:
The additional rules for the lower 32 bits of a faucet ID are as follows:
Some benefits of the above scheme:
Potential downsides:
As a side note, I always get confused by this. The account ID adds to the security of the hash. That is to say, the attacker needs to both find a result with enough

With that said, we probably should still make this configurable, to maintain the cost in the infeasible area as hash power increases (80 bits should be plenty, but it is on the low end). Ideally this should be done in such a way that a user can change the setting, and the protocol is agnostic to it. This has the benefit of giving control to the user, and removing the need for the protocol to estimate or agree on hash power.

We can pack the PoW requirements in the account ID. Since we use one of the Felts in the Digest to encode the AccountId, we have a max of 3 Felts to do PoW for a total of 192 bits, meaning we need a max of 8 bits to encode the PoW requirements. That leaves about 53 bits for the account itself, that is about

Edit: This could also be a nice way of getting rid of the patching for the tests. Meaning we can just configure the test to have a very low PoW requirement, and assume users will actually use a high value. The downside of this is that we don't guarantee security, only allow for it, and an improperly configured client could result in problems (not sure how bad this is, it is basically always the case)
We could also pack a protocol version in these bits. Instead of using all zeros, we could use something like
Maybe let's make this a first-class concept and add a
A few notes about the ticker:
I like it! I don't think we need 8 bits though - 6 bits should be enough. With 6 bits we would get up to ~128 bits of PoW (64 bits from the account ID itself and up to 64 bits from the extra PoW specified via these 6 bits).
This would be a bit more tricky as we probably can't "allocate" too many bits for this (e.g., probably not much more than ~8). I'm also not sure this is needed: if I can create an account ID for a given combination of storage and code on one chain, it may actually be convenient to create the same account ID for the same combination on a different chain.
Yep - agreed. I'd probably call it something like
I thought about this, but think it would be too risky specifically because of your last point: we don't want to mediate who gets
We have some of this already. For the sample faucet contract, some of this info is stored in slot 1 of the account storage (this currently also includes the ticker symbol).
This could work for regular accounts, but if we do want to encode extra data into faucet IDs, we would still need to have special treatment for them.
Also, I just realized that the scheme I proposed for regular account IDs actually doesn't give us any extra security beyond 64 bits. It does create extra work for the original account creator to find an ID with 16 trailing zeros, but for the attacker, they just need to find a 64-bit pre-image, and if they use random search, it doesn't matter whether the last 16 bits must be 0 or not. Same thing actually applies to encoding ticker symbols. Basically, encoding extra info into the ID itself does require extra work for the user, but I don't think it provides any additional security against a potential attacker. Not sure if there is a way around this yet. @Al-Kindi-0 - curious what you think.
To expand on my comment: a hash is a one-to-one mapping. Imagine the worst hash function possible, something like the
But wouldn't the pre-image found by the attacker go through the "the last 16 bits must be 0" check at some point for this to be useful for the attacker? Or does any pre-image do?
I think this boils down to whether we are concerned about collisions or only pre-image resistance. I think in the current context collisions are not a problem and thus 128 bits for the seed should be fine.
Usually, a hash function compresses the input in some way and the simplest such case would be a 2-to-1 compression. This is clearly not one-to-one and hence I am probably missing something in your comment.
The way I think about it is this:

For the user, the task is to find

For the attacker, the task is to find
Exactly, the hardness of finding a preimage is dependent only on the size of the image space (i.e., where
So, this seems to suggest that the amount of data we can encode in the ID is pretty limited because otherwise we'd impose too big of a computational burden on the user. Basically, we have a trade-off:
The PoW requirement for the user would be the sum of PoW for both items. For the attacker, the security in bits would be 64 + PoW from item 1 above. So, for example, if we want to have 80 bits of security against ID highjacking and 16 trailing zeros in the ID, we'd need 32 bits of PoW for the user - which I think is too much. I think we do want to have at least 80 bits of security against ID highjacking - which means that we are probably limited to:
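The arithmetic in the example above can be sketched as a small helper (the names and the split of the budget are illustrative, not project code):

```python
ID_RANDOM_BITS = 64  # bits of the ID an attacker must match by random search

def user_pow_bits(target_security_bits: int, encoded_bits: int) -> int:
    """Work for the honest creator: PoW for security beyond the 64-bit ID,
    plus grinding the encoded bits (e.g. trailing zeros), which cost the
    user work but add nothing against an attacker doing random search."""
    extra_security = max(target_security_bits - ID_RANDOM_BITS, 0)
    return extra_security + encoded_bits

# 80 bits of security against highjacking + 16 trailing zeros => 32 bits
assert user_pow_bits(80, 16) == 32
```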
My bad, I was trying to paint a picture using the
Agreed - though I think the layout to accommodate 120-bit IDs would need to be different from what is in the diagram.
Indeed, this is the most unclear case to handle. I guess a couple more questions on this:
This is what I tried to illustrate with the collision resistance part. Increasing the Account ID decreases the room for the Non Fungible Asset (NFA) in the word, so some examples for that trade-off:
* Cost is for 1/1000th of the total required storage / compute cost to build the rainbow table (as done here). You discussed some cost estimations above and I tried to extrapolate from those. The compute and storage estimates were then calculated this way:

```python
# Compute cost: extrapolated from the ~$300k estimate for covering a 2^60
# space, scaled down to 1/1000th of the table.
compute_estimate = lambda x: 300_000 * (2**x) / (2**60) / 1000

# Storage cost: (8 + x//8) bytes per entry;
# 10 dollars for 1 TB;
# divide by 1T to get the number of terabytes;
# divide by 1000 to get 1/1000th of the cost.
storage_estimate = lambda x: ((8 + x // 8) * 2**x) / 1_000_000_000_000 * 10 / 1000
```

It's worth mentioning that in your comment (linked above) you mentioned that

while my estimations here seem lower. If the calculations and assumptions are correct, then building a rainbow table for 1/1000th of the possible Account IDs at 80 bits would cost $217B (storage) + $314M (compute) USD. I guess we also want a safety margin here to account for some major improvement in the future, say 1000x. Then an account worth more than $217M USD would be worth highjacking by an attacker with this table, with a 1/1000th chance, and only if the account was not already registered on chain, iiuc. That seems acceptable to me given the mitigating circumstances under which this could happen. Do you see flaws in that logic?

On the other end of the spectrum, with 120 bits for the Account ID we definitely couldn't represent NFAs in one word, as the collision resistance for them would be too low.
As briefly discussed online, seems like we have roughly 2 options here:
I am leaning somewhat towards the conservative option - just so that we don't have to face this problem again in the future. Assuming this, I think we have a couple of outstanding questions:
I agree with the conservative option simply for safety and future-proof reasons.
I agree. Stages sound like a good approach.
Note Metadata
The layout I sketched was for 96 bits, but I wasn't very clear there - sorry. Actually accommodating 120 bits comes with quite a few constraints when it comes to the note metadata. This is a proposal for a layout which should be possible.
Proof of Work
Is there some documentation on how you came up with these numbers (trailing zeroes and min ones)?

miden-base/objects/src/accounts/account_id.rs, lines 108 to 125 in 6f54100
That would be helpful for looking into this.
Do we want to make this potentially user-configurable or protocol-configurable? Also, can you remind me: is PoW used for anything other than spam protection?

Asset Layout

If what I wrote is roughly what you also had in mind, then I think one of the last questions is the asset layout and if we should orient the design with the AggLayer? I tried to find some docs on this but wasn't able to so far. There seems to be little information.
I would increase account size in a follow-up PR.
I am thinking two felts. For example, when we store account header in kernel memory, currently the ID is stored in the same word as nonce and looks something like
Yep, that should be pretty small.
Yes, agreed - this seems like the simplest solution.
I think requiring that the most significant bit is 0 is the best option here (and even if we switch to 1-second block times, we still have almost 70 years to address any potential issues). It is a bit annoying - but I don't see a better alternative.
There are probably discussions in issues somewhere - though, may be difficult to track down now. But from memory:
Overall, the main purpose of PoW was to make sure people can't grind too much on account IDs - but if we explicitly prohibit collisions on the first 64 bits, then we could try to get rid of the PoW entirely. Basically, as soon as one account with a given 64-bit prefix is recorded on-chain, no other account with the same prefix can be recorded. The only potential downside of this is that people may create colliding accounts by accident - but that should be pretty rare (1 in 4 billion chance?) and also we could add some ways to mitigate this.
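The "pretty rare" claim can be sanity-checked with the standard birthday approximation. This is just a rough sketch, not project code; the 64-bit prefix comes from the proposal above:

```python
import math

# Birthday-bound estimate of the chance that at least two honestly
# generated accounts accidentally share the same 64-bit ID prefix.
def collision_probability(num_accounts: int, prefix_bits: int = 64) -> float:
    n = num_accounts
    space = 2.0 ** prefix_bits
    return 1.0 - math.exp(-n * (n - 1) / (2.0 * space))

# Even at 100 million accounts, an accidental prefix collision is very
# unlikely (well below 1 in 1000 across the whole chain).
assert collision_probability(100_000_000) < 1e-3
```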
Is this because of the account ID being the key in the account DB sparse merkle tree?
Getting rid of PoW entirely would be nice, because in my mind it has a bad public reputation due to being a waste of CPU cycles. On a practical level it would also speed up the account creation process and as a consequence we would no longer have to differentiate between testing and non-testing scenarios, which is also nice.
This seems ideal to me, but I'm wondering about an attack similar to account ID highjacking here. Before, we said that highjacking is possible for 64-bit account IDs due to rainbow tables being feasible for this security level. If we allow a 64-bit prefix to be registered only once, then an attacker could not actually highjack an ID, but they could make it inaccessible (DoS) by registering one of their IDs with the same 64-bit prefix as the to-be-attacked ID, before that ID appears on chain. That takes the same amount of work as highjacking a 64-bit account ID, iiuc.

Regarding implementing this existence check:
Yes, it is somewhat related to this. If account IDs are 64 bits, then we can use a
I'm not too concerned about the reputation of PoW here because it would be done by end-user devices and should be nearly imperceptible. But being able to use the same code for testing and production is definitely a big benefit.
Good point! I didn't think about this.
It would work slightly differently. The user won't need to do anything extra (in fact, trying to prove non-existence of an account would be very difficult for the user because the root of the account tree may change by the time the transaction makes it to the block producer), but the block producers would perform this check before inserting a new account into the account tree. This check would actually not be too difficult and would use the advice provider in a way similar to what you described.
The root of the account tree would change after every transaction - and so if one transaction modifies the tree, the proof of any other transaction would become invalid. This is the reason why verifying consistency of the account states is left to the block producer. Overall, it seems like we have two options here:
I think one of the last outstanding questions is the account ID generation process and the two options you mentioned (the other outstanding question being whether we want to significantly change the asset layout or just expand it to allow 120-bit account IDs to be contained). I think from a UX and DevX point of view the option without Proof of Work would be preferable due to better ID generation performance and not having to differentiate between testing and non-testing code, as well as for efficiency, as we can use

Block Producer Account ID Existence Check
I wanted to understand this a little better and looked deeper into the node. If I understand correctly, we're currently using this

Assuming we add a way to determine if the account we're iterating over is new, would it be possible to assert here that the

Account ID Front-Running
There is, but given that an adversary can now no longer highjack an account but only DoS it, the attack is less attractive. The adversary doesn't gain any assets of value but only prevents the actual owner from accessing it and the potential assets that were sent to that account. If the sender of the note used the P2IDR script, then a DoS'ed account's note could just be recalled and the process tried again, so not much harm done.

Account ID Generation

The simplest account ID generation process would be similar to the one we have now but without Proof of Work. We would generate a random

We might still need a couple of rounds of generating random seeds to satisfy the criteria of an account ID like storage mode and type. As mentioned before, an addition to the current validation of IDs would be the restriction:
The "entire value" here being any other arbitrary value, like the note type and note execution hint tag. Still, this scheme would be super fast to generate an ID and it's conceptually very similar to what we have now, so the changes should not be massive. With this, finding a collision takes about 2^60 work, assuming 120-bit IDs. Accidental collisions should practically never happen, thus protecting users and the account SMT. As you mentioned earlier:
Regular accounts currently require 55 bits and faucets 63 bits of work for a collision. Requiring 60 bits of work with this proposed scheme to produce a collision makes me think we can unify the ID generation process for both account types and make handling the different account types easier on that front. So, put simply: isn't the account ID generation with PoW that we have now roughly equivalent, in terms of collision security, to the same process without PoW but with the ID space increased to 120 bits?
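The two notions of attack cost in this thread can be kept apart with a trivial pair of helpers (plain arithmetic for illustration, not project code): hitting one chosen prefix is a pre-image search, while finding any colliding pair follows the birthday bound.

```python
def preimage_work_bits(prefix_bits: int) -> int:
    # expected work to front-run / highjack one specific prefix
    return prefix_bits

def collision_work_bits(random_id_bits: int) -> int:
    # expected work to find any two colliding IDs: ~2^(n/2)
    return random_id_bits // 2

assert preimage_work_bits(64) == 64    # targeting a specific 64-bit prefix
assert collision_work_bits(120) == 60  # any collision among 120-bit IDs
```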
Agreed. This is an expensive attack which doesn't yield any direct benefit to the attacker (though we shouldn't underestimate the potential of indirect benefits). There may also be some additional ways for us to make this attack difficult. For example, we could integrate block epochs into the account ID somehow (e.g., set bits 48 - 64 of the account ID based on the block the account was created at), and then compute the account ID base as

Not sure if the complications associated with this are worth it though.
Yes, this check shouldn't be too difficult to add. This does still make it possible to generate a bunch of account IDs with shorter common prefixes. For example, someone could try to generate 100M accounts all sharing the same 32-bit prefix. Without additional PoW, this would take roughly 45 bits of work. Inserting this many accounts under a single subtree may make storage less efficient. But maybe that's OK. In general, one of the reasons for PoW was to make the account creation process relatively expensive (e.g., cost a few cents) so as to discourage spam. But maybe I was overthinking this.

Assuming we go with 120-bit IDs, let's sketch out what the ID would look like. The most straightforward way is to take the current approach and just add extra bits - e.g.:
But we could also have other arrangements: each element is 60 bits (with top 4 bits set to zeros). Also, maybe we should allocate some number of bits (e.g., 8) to an account/ID version - in case we want to make changes in the future. For faucets, we could also encode things like ticker symbol + number of decimal places into the ID - but not sure if that's a good idea.

One other thing that I was thinking about: maybe even with 120-bit IDs we can keep assets to 1 word. For fungible assets this is straightforward. For non-fungible assets, we'd lose the ability to tell from the asset itself which faucet it was issued by. But maybe even here it is not such a big issue. Specifically, maybe we can still set one of the asset elements to the first 64 bits of the originating faucet ID. This should be enough to uniquely identify the faucet as we won't allow accounts with the same 64-bit prefixes. The collision resistance for the remaining bits would be 96 - which is probably fine.
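The two-element arrangement mentioned above (each element 60 bits, top 4 bits zeroed) could be sketched as follows; the splitting convention is an assumption for illustration:

```python
FELT_BITS = 60  # assumed: each limb uses 60 bits, top 4 bits of the felt are 0

def split_id(account_id_120: int) -> tuple[int, int]:
    """Split a 120-bit ID into two 60-bit limbs, each fitting a felt."""
    assert 0 <= account_id_120 < 2**120
    hi = account_id_120 >> FELT_BITS               # most significant 60 bits
    lo = account_id_120 & ((1 << FELT_BITS) - 1)   # least significant 60 bits
    return hi, lo

def join_id(hi: int, lo: int) -> int:
    return (hi << FELT_BITS) | lo
```

A round trip keeps the ID intact, and each limb stays below 2^60, so both elements are trivially valid field elements.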
Block Hash Dependence
I guess this refers to an earlier comment:
The creator could pick any block hash from that epoch, right? It is generally possible for a transaction to reference any earlier block (as long as the expiration delta is 0), correct? This means there is no time window in which the creator must act before their account ID would become invalid, which would be a big UX annoyance.

That approach would be very effective in ruling out the kinds of targeted DoS attack that we've discussed. It would not be a solution for the "chain-wide attack", where many accounts with the same prefix are created. That could be solved with fees. As we've discussed online, with the introduction of fees it would be fairly straightforward to put a higher price on account creation. But it would not be high enough to deter these targeted DoS attacks. It seems to me we need a combination of these two approaches to rule out both kinds of attacks.

The block epoch approach together with a note specifying an expiration delta could prohibit someone from creating their account in some circumstances. If an account-creating transaction references block 1000, which is the last block in the allowed epoch, the note this transaction consumes specifies a delta of 5, and the latest block is 1010, then this transaction could not be included in the chain. I assume that this would not be a frequent problem, but when it happens it could likely lead to loss of a note's assets that were sent to that account ID. That is a caveat that the sender would have to be aware of to mitigate effectively, like always sending recallable/reclaimable notes to accounts that are not yet on-chain. Overall it seems like a footgun we should avoid if possible.

Block Hash Dependence With Proof

To fix this, maybe we could reference blocks through a proof instead, so we no longer depend on the block hash input to the transaction. Thinking through the example above, with this approach the note could be consumed in block 1009 but the proof could reference block 1000.
The other properties of the approach should be the same. For an attacker to highjack/DoS a specific ID which is bound to epoch X, they can pick one block hash from epoch X and then start hashing many seeds. For any other epoch Y, they must repeat this process, so it is prohibitively expensive. They also cannot know any block hash from a future epoch Z before that epoch has begun, so it's not possible to precompute IDs.

64-bit IDs

One question this also raises: if we were to go with one of the approaches of using the block hash, could we keep account IDs at 64 bits? In this context you first discussed this idea.

First Felt Layout
Assuming 120 bits again, but in the context of the above approach, one interesting question is whether we should actually control parts (like the 16-bit epoch) of the first 64 bits so we can force account IDs of certain epochs into certain subtrees in storage, or if we want it to be completely random. I'd say random is better, since the number of IDs per epoch would depend on chain usage and this will not be uniformly distributed.

Account ID Version
I like the idea of having a version, but is 8 bits necessary? I'm wondering because versions are usually at the very front of a layout, so adding 8 bits for the version would again reduce the random part of the ID further. Maybe 8/16 versions (3-4 bits) is also fine? If we actually get to 8/16 versions, then it should be possible to designate the last version as an extension of the version, so that

It also depends on whether we require the storage mode, type and version to be generated directly from the seed, or if we simply overwrite the first x bits with those parts. If the size of the version is significant, then we'd effectively introduce non-trivial proof of work again. I guess we can overwrite parts of the ID, we just need to make sure that it is still a valid felt. If that was the only reason to generate those fields directly from the seed so far, then we wouldn't have to worry about it initially as long as the version is small (like 0), since the top x bits would be 0 and so the entire element would always be valid.

Second Felt Layout

Regarding the second felt being valid, I previously suggested:
Which was to accommodate the layout of the note metadata. But I now realize that is only a useful description if we still want to grind on the ID, as it gives flexibility. If we don't, which I would prefer, then just picking one of the higher bits to clear is best to ensure felt validity. I'll just pick 0 arbitrarily.

Layout Suggestion

So building on your suggestion, I would suggest:
One-Word Non-Fungible Assets
As discussed online, I agree this should be possible and would be really neat.
That's roughly how I imagined it would work. Though, I was thinking the user would "anchor" only to "epoch" blocks (i.e., blocks with numbers in multiples of

So, basically, when creating a new account, I would be free to choose any epoch block from the past (even the genesis block) and then would prove locally (via authenticating against the MMR) that I've used that block's hash as input into the account seed computation. This should work pretty well. The only potential complication is that the user needs to pick and prove some block from the chain, but if we allow using the genesis block, that shouldn't be an issue.
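Anchoring only to epoch blocks could be computed as below. The 2^16 epoch length is an assumed value for illustration (the exact multiple is elided above):

```python
EPOCH_LEN_LOG2 = 16  # assumed: epoch blocks are multiples of 2^16

def epoch_of(block_num: int) -> int:
    return block_num >> EPOCH_LEN_LOG2

def epoch_block_num(block_num: int) -> int:
    # the most recent epoch block at or before the given block
    return epoch_of(block_num) << EPOCH_LEN_LOG2

assert epoch_block_num(70_000) == 65_536
assert epoch_block_num(0) == 0  # the genesis block is always a valid anchor
```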
Yeah, 48-bit space is just too small to be comfortable. With 80 bits, this could be workable (i.e., set the 16 least significant bits to the epoch number) - but since this requires 2 field elements, the benefits vs. 120 bit IDs are not too significant (I think).
Yes, I'd say having a more uniform distribution is better.
I don't think it matters if we put the version in front or not - as long as it is in the first element. I would probably put it into the least significant bits to make ID distribution more uniform (same can actually be argued for the other "control bits").
I haven't spent too much time thinking about it, but I think overwriting reduces the amount of work one needs to generate colliding IDs. Not sure if that's an issue here though.
As mentioned above, we could move the version and other "control bits" to the end. This would make the distribution of IDs more uniform. The only thing that would get affected by this is

An alternative suggestion of the layout:
Assuming we generate valid field elements during hashing, we don't need to set the top bit of the 2nd felt to 0, as resetting the lower 8 bits will always map a valid field element to a valid field element.
DOS Attack

I don't think I fully understand how this scheme would prevent the targeted DOS attack.
Unless I misunderstand something, this wouldn't force an attacker to recompute the rainbow table every
But for the latter, allowing any block from some epoch Y to be used where Y <= X would also allow an attacker to use the genesis block's hash, afaict.
Why is it complicated for a user to prove a block, and why is proving genesis easier? The only thing I can think of is that recent blocks' MMR proofs will typically change more frequently, because newer MMR peaks are smaller and are combined more frequently than older ones. But I guess you would just ask the chain for an up-to-date proof anyway when constructing the account-creating transaction, and the chain should have blocks from recent epochs available, I assume.

Proof of Work
Yeah I think so too. The only reason to keep IDs short is to be able to embed them fully in other layouts. Since we can uniquely identify accounts with 64 bits this may not be a real concern though. That means whether IDs are 80, 96 or 120 bits in total doesn't matter for embedding an issuer in other layouts, like non fungible assets.
I think you're right. Assuming we put the epoch in the second felt and a 4-bit version, 2-bit storage mode and 2-bit type in the first felt, we'd have only 56 random bits in the 64-bit prefix. And those would only require 28 bits of work to find a collision. Doesn't this also mean that we need some mechanism like proof of work to make finding a collision on the 64-bit prefix harder than it is without PoW? Without PoW it should just be 32 bits of work to find a collision. On an RTX 4090 with 140 megahashes/s it takes 30s to compute such a collision, if my math is correct. This is so short that the block hash dependence doesn't matter either. If this is true, then:
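The 30-second figure above checks out as plain arithmetic (the 140 megahashes/s rate for an RTX 4090 is the assumption from the comment):

```python
hashes_needed = 2**32  # ~32 bits of work for a 64-bit-prefix collision
hashrate = 140e6       # assumed hashes per second on an RTX 4090
seconds = hashes_needed / hashrate
assert 30 < seconds < 31  # roughly 30 seconds, matching the estimate
```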
I hope I've missed something and this is not the case.

Version and NoteTag
Good point, that's true.
Sounds good.
Good point, I hadn't thought of this.
I think I had something like your second point in mind. Basically, the user can use any block as a reference block in the transaction (e.g., I can always use the genesis block as a reference), but once the reference block is chosen, it would make sense to use the epoch of that reference block as the input into the account ID computation. The assumption here is also that the epoch gets copied into the ID (the first element), and so the attacker can't pre-compute seeds until they know the block hash for a given epoch. To illustrate this: let's say we set the lower 16 bits of the first element to the epoch value. Say this value is
In the transaction kernel, the user would have to verify that
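A rough sketch of what that kernel-side check could look like; the bit positions and the epoch derivation are assumptions based on the example above, not the actual kernel code:

```python
EPOCH_LEN_LOG2 = 16  # assumed epoch length of 2^16 blocks

def verify_epoch_binding(first_felt: int, anchor_block_num: int) -> bool:
    """The low 16 bits of the ID's first element must equal the epoch of
    the anchor block whose hash was an input to the seed computation."""
    epoch = anchor_block_num >> EPOCH_LEN_LOG2
    return (first_felt & 0xFFFF) == (epoch & 0xFFFF)

# A first felt stamped with epoch 3 verifies against a block in epoch 3,
# but not against a block from a different epoch:
assert verify_epoch_binding((0xABCD << 16) | 3, 3 << 16)
assert not verify_epoch_binding((0xABCD << 16) | 3, 4 << 16)
```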
Yeah, the main complication is that you'd need to get some info from the chain before creating an account (which may complicate testing too). But if you always have an option to create an account against the genesis block, this is somewhat mitigated as you don't need to sync (assuming genesis block is hard-coded into the client at some point).
Collisions are a problem only for the DoS attack (e.g., someone tries to create many IDs with the same prefix). They are not that much of a problem for the "highjacking" or "frontrunning" attacks (e.g., someone wants to generate some specific 64-bit prefix) because this requires 64 bits of work. So, the only option the attacker has here is to pre-compute a rainbow table. But I agree, from a DoS standpoint, it is not great that generating a bunch of collisions is not that expensive. So, maybe adding a small proof of work (e.g., 16 bits) is a good idea after all (this should be negligible for the user, but would still be annoying for tests). If we are willing to accept 16-bit PoW and want to add epoch anchoring, the layout could look like so:
The ID would be derived from:
Here, the user would specify code and storage commitments as well as an epoch block. Then, they would perform PoW on the seed until the
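The grinding step could look roughly like this sketch. The hash function (blake2b standing in for the actual hash), the input ordering, and the 16-bit trailing-zero target are all illustrative assumptions:

```python
import hashlib
import itertools

POW_BITS = 16  # assumed PoW difficulty from the proposal above

def grind_seed(code_commitment: bytes, storage_commitment: bytes,
               epoch_block_hash: bytes) -> tuple[int, bytes]:
    """Search for a seed whose digest has POW_BITS trailing zero bits;
    expected to succeed after roughly 2^16 hash evaluations."""
    for seed in itertools.count():
        digest = hashlib.blake2b(
            seed.to_bytes(8, "little")
            + code_commitment + storage_commitment + epoch_block_hash
        ).digest()
        candidate = int.from_bytes(digest[:8], "little")
        if candidate & ((1 << POW_BITS) - 1) == 0:
            return seed, digest
```

Since the epoch block hash is an input, seeds ground against one epoch cannot be reused for another, which is the anchoring property discussed above.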
I'm not sure it needs to. I think we made use of the ID structure to "pack" one extra bit into the tag - but not sure if that's all that important.
Block Hash Dependence
I think linking the transaction's reference block and having to use the epoch block from the reference block in the account ID computation has this UX problem I described earlier:
One possible account ID generation for an end user could look like the following, assuming the approach you described:
So linking the reference block and the epoch block introduces this caveat for account IDs that are generated offline and are only (much) later registered on chain. If we just require that the user provides a proof to some arbitrary epoch block of their choosing, then, afaict, we would avoid this problem. This would still allow the user to pick the genesis epoch block if they want to (which we could do for testing scenarios), but we should discourage that for real-world usage, because an attacker could build a rainbow table based on

Proof of Work

Fees
Ah, thank you! I had a hunch that there was an error in my logic somewhere, but couldn't figure out where. That's good news to me, because then I still have hope we can avoid significant PoW. If we only need it to make it hard for attackers to produce many IDs with common prefixes so we can protect storage efficiency, then isn't that better solved by making account creation reasonably expensive with fees? I think we should be able to find a sweet spot between making it very expensive to attack the storage, which can only be done with a ton of accounts, and still making legitimate use cases, where a few accounts are created, cheap enough.

Account Storage

And if that is not a solution, I'm trying to understand the issue with storage better.
Layout without PoW

If we can avoid dedicated PoW, regarding the layout, I'm wondering if we can move the epoch to the second felt. This is to avoid having to grind a seed that produces a hash with the desired epoch. We only need the epoch for verification during account ID creation; beyond that it is basically useless. It doesn't really tell us anything about the ID other than that an ID with epoch X was created before epoch X+1, because any available epoch block could've been chosen. So it seems optional to have it in the first felt. In that case we would allow overwriting the 16 epoch bits of the second felt so those bits don't incur PoW. We'd have some non-zero but still insignificant PoW left. For the first felt we'd have the 4

So would the following layout be viable?
I'm still not quite sure if we can't just overwrite all parts of the ID with control bits and epoch so we don't need to grind at all. It does lower collision resistance, but if we can protect against various attacks with cryptographic (block hash + proof) or economic (fees) means before collision resistance matters, maybe it's fine?
Ah yes - agreed that your proposal is better.
Generally agreed that handling this via fees is probably better, but 2 comments:
Yes, the main concern is for the future when the tree gets too big to keep it all in memory. Probably not a concern until we have more than 10M accounts or so. But once the tree is big, having leaves more evenly distributed allows for various storage optimizations.
I'd probably go with the second option here (i.e., second felt

But also, I'm still thinking that putting the epoch into the first element may be better, as it makes it very difficult to build a rainbow table for all 64-bit prefixes. Maybe we do this w/o PoW for now and then introduce PoW later, if needed.
I don't understand this point. If we put the epoch in the first element but don't do PoW, meaning we overwrite those parts, then it makes it easier to create a rainbow table, not harder, or am I misunderstanding? Aren't the options we have to:
(Note that in practice I'm seeing an average of 20 bits of work for the first option and 7 bits of work for the second one, not sure yet where the difference comes from. But even so, the former would be quite significant which is why I don't think it's a viable option to put the epoch in the first felt with PoW.) I've now started to implement what we're discussing with this layout:
During grinding, I'm only validating that the first felt meets the requirements here, while the requirements of the second felt are ensured by setting the bits to the expected values (i.e. "overwriting"). We can still adjust things of course, like if we actually can put the epoch in the first felt, but I wanted to get started with something. |
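The grind-and-overwrite approach described above might look roughly like the sketch below. Everything here is an assumption for illustration: the hash is a non-cryptographic stand-in for the real one, `first_felt_is_valid` stands in for the actual first-felt requirements, and the epoch is assumed to occupy the top 16 bits of the second felt.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Placeholder for the real hash function; DefaultHasher is NOT cryptographic
// and is used here only so the sketch is self-contained and runnable.
fn placeholder_hash(seed: u64) -> (u64, u64) {
    let mut h = DefaultHasher::new();
    seed.hash(&mut h);
    let first = h.finish();
    first.hash(&mut h);
    (first, h.finish())
}

// Assumed requirement on the first felt for this sketch: most significant bit zero.
fn first_felt_is_valid(felt: u64) -> bool {
    felt >> 63 == 0
}

// Grind until the first felt is valid, then overwrite the epoch bits of the
// second felt directly (assumed to be the top 16 bits) so they cost no PoW.
fn grind_id(epoch: u16) -> (u64, u64) {
    for seed in 0u64.. {
        let (first, second) = placeholder_hash(seed);
        if first_felt_is_valid(first) {
            let second = (second & 0x0000_FFFF_FFFF_FFFF) | ((epoch as u64) << 48);
            return (first, second);
        }
    }
    unreachable!()
}

fn main() {
    let (first, second) = grind_id(42);
    assert_eq!(first >> 63, 0);
    assert_eq!(second >> 48, 42);
    println!("first felt: {first:#018x}, second felt: {second:#018x}");
}
```

The key point the sketch captures is that only the first felt's requirements are enforced by grinding; the second felt's epoch bits are simply set to the expected values.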
Yes, that's correct. I was just thinking we could introduce PoW later (e.g., closer to mainnet) and for now skip it to simplify testing.
Sounds good. Let's go with this approach for now and we can adjust later if needed.

One thought about the specific layout: we can probably put the epoch into the top 16 bits of the second element (epoch with all ones is not going to happen until very close to the exhaustion of the block number address space).
Other things we could add to the ID:
Though, maybe these are better handled at ID format/encoding level. |
Great point! Used that approach now.
I think this would be better handled by some additional encoding layer like bech32. The RPC API could receive such an address, verify network and checksum, and then pass only

Not sure yet if bech32 is the best format here, but this could look something like:
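As a rough, self-contained illustration of such an encoding layer, here is a minimal bech32 encoder. The `mm` human-readable prefix and the 15-byte (120-bit) ID serialization are made-up assumptions; a real implementation would use a maintained crate and whatever address format is ultimately specified.

```rust
// Bech32 character set and checksum per BIP-173.
const CHARSET: &[u8] = b"qpzry9x8gf2tvdw0s3jn54khce6mua7l";

fn polymod(values: &[u8]) -> u32 {
    const GEN: [u32; 5] = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3];
    let mut chk: u32 = 1;
    for &v in values {
        let b = chk >> 25;
        chk = ((chk & 0x1ff_ffff) << 5) ^ (v as u32);
        for (i, g) in GEN.iter().enumerate() {
            if (b >> i) & 1 == 1 {
                chk ^= g;
            }
        }
    }
    chk
}

fn hrp_expand(hrp: &str) -> Vec<u8> {
    let mut out: Vec<u8> = hrp.bytes().map(|b| b >> 5).collect();
    out.push(0);
    out.extend(hrp.bytes().map(|b| b & 31));
    out
}

fn create_checksum(hrp: &str, data: &[u8]) -> Vec<u8> {
    let mut values = hrp_expand(hrp);
    values.extend_from_slice(data);
    values.extend_from_slice(&[0; 6]);
    let pm = polymod(&values) ^ 1;
    (0..6).map(|i| ((pm >> (5 * (5 - i))) & 31) as u8).collect()
}

fn bech32_encode(hrp: &str, data: &[u8]) -> String {
    let mut s = String::from(hrp);
    s.push('1');
    for &d in data.iter().chain(create_checksum(hrp, data).iter()) {
        s.push(CHARSET[d as usize] as char);
    }
    s
}

// Convert 8-bit bytes into 5-bit groups; a 120-bit ID splits evenly into 24 groups.
fn to_5bit(bytes: &[u8]) -> Vec<u8> {
    let (mut acc, mut bits) = (0u32, 0u32);
    let mut out = Vec::new();
    for &b in bytes {
        acc = (acc << 8) | b as u32;
        bits += 8;
        while bits >= 5 {
            bits -= 5;
            out.push(((acc >> bits) & 31) as u8);
        }
    }
    if bits > 0 {
        out.push(((acc << (5 - bits)) & 31) as u8);
    }
    out
}

fn main() {
    // BIP-173 test vector: hrp "a" with no data encodes to "a12uel5l".
    assert_eq!(bech32_encode("a", &[]), "a12uel5l");
    // A hypothetical 120-bit account ID with an assumed "mm" network prefix.
    let id_bytes = [0xABu8; 15];
    println!("{}", bech32_encode("mm", &to_5bit(&id_bytes)));
}
```

A nice property of this scheme is that the network prefix and the checksum are both verified before the raw ID ever reaches the protocol layer.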
|
One thing that's constantly on the back of my mind during implementation is whether we really need 120 bit IDs, so I want to recap our rationale again. Please let me know if this is an accurate summary.

**120 bit ID rationale**
**64 bit prefix uniqueness**

All of the above is important for the creation process of the ID. But since we ensure the 64 bit prefix is unique, we said we can use just the prefix as the faucet ID of non-fungible assets. Where else is that the case? More generally, when do we actually need to use the full 120 bit ID and when is the 64 bit prefix sufficient? Once the ID is registered on chain, could we not essentially use just the 64 bit prefix from that point on, since it is unique? Post-creation, the ID is no longer used for "authentication" in the sense that it is only used as an identifier but we no longer prove anything about it (e.g. that we have a seed and other values that hash to that ID). Instead, the authentication is done via actual signatures. The only important property of an identifier is uniqueness, and that is the case with just the 64 bit prefix. If we could use the full ID just for the creation and the prefix beyond that for everything else, that would keep many things similar to how they are now. Any thoughts on this? If this is not possible for some reason, then my specific question would be whether we need the full ID or just the prefix in fungible assets and |
I think there are two related but separate attacks that we are trying to prevent:
The first attack doesn't worry me too much because it is applicable only in a very narrow set of cases and there are possible workarounds. For example, it works only if the notes directed to account

The frontrunning attack is a bit more troublesome as it can be used to disrupt the network. E.g., people try to create accounts, but the attacker is faster and can create a "blocking" account before them. The attacker doesn't benefit from this attack directly (in fact, they'd need to pay a lot of money to execute such an attack), but maybe disrupting the network leads to financial gains elsewhere. It is hard to say where the realistic threshold lies and it would also depend on how much is riding on Miden, but ideally we'd want to make sure that an attacker willing to "burn" $100B USD cannot cause significant disruptions to the network (though, with such resources, there are probably many other ways to cause network disruptions).

The main way to execute both of the above attacks is to pre-compute a rainbow table with either all or some meaningful fraction of 64-bit prefixes. Currently, and even for the next few years, this is not very practical: assuming we enforce 9-bit PoW, building a table with all 64-bit prefixes would require 73 bits of work (though, it would be good to double-check my math), which would cost over $1T USD. But as technology improves, the costs will come down (e.g., the bitcoin network does 95 bits of PoW in a year). So, even if we keep IDs to 64 bits: if we could impose 9 bits of PoW now, that's pretty safe. In the future, we could increase PoW to 25 bits, and that should be good enough for the medium-term future (e.g., 5 - 10 years). But 64-bit IDs are not future-proof. So, the way I think about it, we have 2 alternatives:
|
And to answer your questions more concretely:
I think this would basically be equivalent to keeping 64-bit IDs (we can impose a bunch of extra conditions on the part of the digest that does not become the ID - but for all practical purposes, w/e is used in the rest of the protocol is the ID). As I mentioned in the previous comment, this is probably fine for 5 - 10 year timeframe, but not future-proof beyond that.
As I mentioned in the previous comment, 64-bit prefix uniqueness property makes the account frontrunning attack still possible. So, I think of this as a potentially temporary restriction that we may need to relax in the future.
I would use the full 120-bit ID everywhere except the non-fungible asset definition. Unfortunately, it does affect a lot of places - but curious if you see any specifically problematic ones. |
**64 bit uniqueness**
Thanks for clarifying that. What I missed before is that the frontrun attack is possible with a (seed, genesis_block_hash) combination if the epoch is encoded in the second felt, as we've landed on in the latest layout. This is because the attacker only needs a match on the 64 bit prefix, and the block hash dependence doesn't matter because the epoch is not part of the prefix.

Regarding the math: assuming our 120 bit scheme, our prefixes are 64 bits long and have no trailing zeroes requirement, so no extra PoW from that. We require specific bits to be set a certain way, i.e. one zero bit, 4 version bits, 2 bits for storage mode and 2 bits for type.
So from what I can tell, this doesn't impose additional work for someone searching for a specific 64 bit prefix. It only imposes additional work (2^9) for someone wanting to generate any 64 bit prefix with those restrictions.

Based on the earlier estimation in this thread of 2^60 hashes costing $300K, 2^64 would be just $4.8M, but the storage cost would be $2.95B (both costs assuming all prefixes, not a fraction of them). Doing this for just 1/1000th of prefixes would feasibly allow a significant disruption of the network.

If this is correct, then I'm thinking we might want to make the frontrun attack harder, though I'm not sure how. Semi-great options I can think of:
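Going back to the cost arithmetic above, here is a small sanity-check sketch. The $300K-per-2^60-hashes rate is the thread's earlier figure; the storage model (16 bytes per table entry at $0.01/GB) is a guessed assumption that happens to reproduce the ~$2.95B number, not something stated in the thread.

```rust
// Extrapolate compute cost from the assumed $300K for 2^60 hashes.
fn hash_cost_usd(bits: i32) -> f64 {
    300_000.0 * 2f64.powi(bits - 60)
}

// Hypothetical storage model: 16 bytes per table entry at $0.01 per GB.
fn storage_cost_usd(bits: i32) -> f64 {
    2f64.powi(bits) * 16.0 / 1e9 * 0.01
}

fn main() {
    // 2^64 hashes cost 16x the 2^60 figure: $4.8M.
    assert_eq!(hash_cost_usd(64), 4_800_000.0);
    // Storing all 2^64 prefixes: roughly $2.95B under the assumed model.
    assert!((storage_cost_usd(64) / 1e9 - 2.95).abs() < 0.01);
    println!(
        "compute: ${:.1}M, storage: ${:.2}B",
        hash_cost_usd(64) / 1e6,
        storage_cost_usd(64) / 1e9
    );
}
```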
**The 64-bit IDs with configurable PoW**

Following from a brief discussion with @bobbinth, we also wanted to explore the idea of user-configurable PoW again. This is a worthwhile approach because the ID would still be just a single field element of ~64 bits. Some number of bits in the ID encode the PoW requirement (e.g. trailing zeroes). Users can choose their security level by picking an appropriate number of trailing zeroes. There is a baseline PoW, which is the Proof of Work that must be done even if the user configures zero additional PoW. This comes from the trailing zeroes, storage mode, type and potentially version being encoded into the layout.

For this approach the ID layout could look something like:
The storage mode and type bits are unchanged. To keep baseline PoW very low but still add some way to differentiate potentially future versions of account IDs, we could add two bits for the version (see below for why it is at the end).
Those would allow us to set

**Highjack and Frontrun attacks**

An attacker wanting to highjack or frontrun another ID must have a pre-computed rainbow table which matches the trailing zeroes requirement of the to-be-attacked ID. The more bits we allocate for those trailing zeroes, the less likely that is. Here it is actually harder than in the 64 bit prefix discussion before, because the trailing zeroes are part of the larger 256 bit hash, and so computing a 64 bit prefix requires

Another way to put this is that an attacker can predict what likely future account IDs look like. For example, they might estimate that account IDs with 30 trailing zeroes will become highly used in the future and so they precompute a table for exactly that number. Then if users happen to increase their security requirement to that number, they will be susceptible to an attack. It might not feasibly allow an attacker to target a specific account ID, but it would allow them to disrupt the network for all account IDs with their chosen number, which is still bad.

Comparing this with the block hash dependence of 120 bit IDs, on the other hand: every epoch that appears increases that "factor of difficulty" by one, in the sense that each new epoch block requires another rainbow table. Another differentiator is that even though an ID of a future epoch X is known, the underlying block hash from which it is computed isn't, which is a really good property to have here.

**Future PoW requirements**

As @bobbinth pointed out in the initial discussion, in the medium to long-term future an ASIC might become available that speeds up hashes, and in those cases users could pick higher PoW requirements to combat that. And if users can no longer compute such IDs on their own machines due to the high requirements, they can delegate ID creation to services.
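The configurable-PoW check described above could be sketched as follows. The bit positions are assumptions for illustration only: the 2 lowest bits are taken to hold the version and the next 5 bits to encode the required number of trailing zeroes.

```rust
// Read the PoW requirement encoded in the ID itself (assumed bits 2..7).
fn pow_requirement(id: u64) -> u32 {
    ((id >> 2) & 0b1_1111) as u32
}

// Check that the low 64 bits of the digest end in at least the required
// number of zero bits.
fn digest_meets_pow(digest_low: u64, required_zeroes: u32) -> bool {
    required_zeroes == 0 || digest_low.trailing_zeros() >= required_zeroes
}

fn main() {
    let id = 0b0011_00u64; // hypothetical ID requiring 3 trailing zeroes
    assert_eq!(pow_requirement(id), 3);
    assert!(digest_meets_pow(0b1000, 3));
    assert!(!digest_meets_pow(0b0100, 3));
}
```

Because the requirement travels inside the ID, validators stay agnostic to the user's chosen security level, which is exactly the property discussed earlier in the thread.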
**Version Placement**

I've moved the version to the very end because I think (and this applies to our earlier layouts as well) it's essential for parsing that the version field be at a static offset. For example, assume the tail end of a layout we had earlier with:
Assume a second version adds a new type and must increase to 3 bits:
In our ID parser, the first thing we need to parse is the version, so we can parse the remainder of the ID correctly. Overall, I would go with 120 bit IDs. They are a lot more future-proof, but we have to consider the frontrun attack. |
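To illustrate the static-offset point about the version field, here is a hypothetical parser sketch. The offsets are assumptions: the 2 least significant bits hold the version, and only a version-0 layout is known to this parser.

```rust
// With the version at a fixed offset, a parser can always read it first,
// regardless of how later versions re-arrange the rest of the layout.
fn parse_version(id: u64) -> u8 {
    (id & 0b11) as u8 // assumed: 2 least significant bits hold the version
}

fn parse_id(id: u64) -> Result<(u8, u64), String> {
    match parse_version(id) {
        // Version 0: interpret the remaining bits with the v0 layout.
        0 => Ok((0, id >> 2)),
        v => Err(format!("unknown account ID version: {v}")),
    }
}

fn main() {
    assert_eq!(parse_id(0b100), Ok((0, 1)));
    assert!(parse_id(0b101).is_err());
}
```

If the version sat behind a variable-width field instead, a parser could not even locate it without already knowing the version, which is the circularity the placement at a static offset avoids.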
Pretty much agree with all of the analysis above. A few comments:
Makes sense. Let's do it.
We can reduce this to

I think in the long run, the secure ID structure should look something like this, but for now, even
The above would require about 8 bits of PoW. But at some point after mainnet (or maybe even before if we have some fast WebGpu implementations which can grind through
The above would require about 24 bits of PoW.
We will be using the

If we are thinking of using some relatively high level of PoW after all, maybe we could consider slightly shorter IDs as well. Seems like maybe 96 bits would be "good enough" even in the long term? |
This issue combines several proposed changes to the account ID structure, as they are all likely to be implemented together. These changes are:
```
seed = hash(code_root, storage_root, nonce)
```

and then `seed[3]` becomes the account ID.

Given the above changes, we should probably update the grinding conditions to work as follows:

- The 16 most significant bits of `seed[0]` must be zeros (for regular accounts).
- The 24 most significant bits of `seed[0]` must be zeros (for faucet accounts).

This is because we now require grinding on 8 bits within the account ID itself (4 most significant bits for type/storage mode, and 4 least significant bits for reducing the ID length to 60 bits). Thus, the totals remain 24 and 32 bits of grinding for regular and faucet accounts respectively.