Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dandelion++ Algorithm Changes #2176

Closed
antiochp opened this issue Dec 18, 2018 · 20 comments
Closed

Dandelion++ Algorithm Changes #2176

antiochp opened this issue Dec 18, 2018 · 20 comments

Comments

@antiochp
Copy link
Member

antiochp commented Dec 18, 2018

The Dandelion++ paper talks about nodes choosing stem vs. fluff per epoch and not per tx.
i.e. every 600s (in Grin) we choose a new outbound Dandelion relay and set our own relay to either stem or fluff for the duration of this "epoch".

This might allow us to remove the "patience timer" on all relay nodes set in stem mode.
i.e. We would immediately forward a stem tx to the next relay node.
Only on nodes currently set as "fluff" would the patience timer be applicable.

These would then act as "sinks", txs would flow into them from various stem paths and would enable aggregation until the patience timer fired.
This would provide (temporary) places in the network for unaggregated txs to collect - so we could have a longer patience timer, and intuitively, a better chance of achieving tx aggregation as other txs would not be delayed in prior stem phase nodes on the way to the fluff node.

Right now in Grin we delay txs for 10-15s at _every_relay node. And I think this is actually detrimental to tx aggregation as it makes it less likely for 2 txs traveling along the same stem path to actually meet at the same node at the same point in time. One is always delayed behind the other and will never catch up to the earlier one.

At the start of every "epoch" (every 600s) -

  • select an outbound peer as the next stem relay
  • decide if we are acting as "stem" or "fluff" for duration of the epoch

Then for every stem tx received (from a peer) -

  • if we are in stem mode - immediately relay it to the next stem relay
  • if we are in fluff mode - aggregate txs up until our next patience time expires, then broadcast it

This seems to be simpler than our current approach and reduces unnecessary latency while allowing a longer patience timer, so potentially allowing more aggregation.

Related - #1982.

@lehnberg
Copy link
Collaborator

This would make a lot of sense to me.

The D++ paper also talks about using approximate 4-regular graphs, i.e. every node at the start of the epoch would get ~two inbound connections and ~two outbound connections. With a rule on how to forward transactions from specific nodes (a transaction coming from one inbound node always is forwarded to the same node, as is the node's own transaction).

The end of §4.2 notes:

Lessons. If running Dandelion spreading over a 4-regular graph, use pseudorandom, one-to-one forwarding. If linkability of transactions is a first-order concern, then line graphs may be a more suitable choice.

Are we deliberately avoiding the 4-regular graph approach here due to the concern of the linkability of transactions?

@antiochp
Copy link
Member Author

I believe we went with a simplified "one outbound relay" originally after discussions with Giulia Fanti.
This was to increase the chances of aggregation I believe with some trade-offs in terms of the anonymity guarantees in Dandelion, @quentinlesceller probably knows more about this.
But I do wonder if the reference to "linkability of transactions" in the Dandelion++ paper is a reference to these earlier conversations with us.

@lehnberg
Copy link
Collaborator

How would having one of two possible routes have an impact on aggregation?

Based on the approach proposed in this issue, the likelihood of aggregation will depend on the number of 'sink nodes' in each epoch, i.e. the number of nodes that are set to fluff, and the time it will take for a transaction to end up there.

Saying this, I wonder if the current approach would allow for the creation of "stem-loops" where a transaction is forwarded around in circles between stem-only nodes that are linked in a circle? Is that a real issue?

@lehnberg
Copy link
Collaborator

But I do wonder if the reference to "linkability of transactions" in the Dandelion++ paper is a reference to these earlier conversations with us.

I believe it has to do with the forwarding rule of forwarding multiple transaction received from a node in on the same path.

From the paper (§4.2):

Another issue to consider is that line graphs do not help the adversary link transactions from the same source; meanwhile, on a 4-regular graph with one-to-one routing, two transactions from the same source in the same epoch will traverse the same path—a fact that may help the adversary link transactions.
Hence we recommend making the design decision between 4-regular graphs and line graphs based on the priorities of the system builders. If linkability of transactions is a first-order concern, then line graphs may be a better choice. Otherwise, we find that 4-regular graphs can give constant- order privacy benefits against adversaries with knowledge of the graph. Overall, both choices provide significantly better privacy guarantees than the current diffusion mechanism.

@0xmichalis
Copy link
Contributor

Saying this, I wonder if the current approach would allow for the creation of "stem-loops" where a transaction is forwarded around in circles between stem-only nodes that are linked in a circle? Is that a real issue?

As a circuit breaker, if you are a stem node and receive an already seen stem transaction, you can fluff it.

@quentinlesceller
Copy link
Member

@Kargakis This would allow an adversary to know which transactions are in your stempool.

@0xmichalis
Copy link
Contributor

0xmichalis commented Dec 18, 2018

@quentinlesceller @antiochp in the new epoch scheme are you able to easily tell if your peers are stem or fluff nodes at a certain point in time without having to closely monitor everyone? If so, you could ensure that these "stem-loop" transactions are sent to a fluff node, if you happen to have such a peer, which most nodes could have, assuming the # of connected peers a node has on average (eg. 12) exceeds the percentage of fluff nodes in the network (eg. one out of ten).

@0xmichalis
Copy link
Contributor

in the new epoch scheme are you able to easily tell if your peers are stem or fluff nodes at a certain point in time without having to closely monitor everyone?

A passive observer should be able to identify this easily so we might as well make it part of the peer api.

@sesam
Copy link
Contributor

sesam commented Dec 19, 2018 via email

@antiochp
Copy link
Member Author

antiochp commented Dec 19, 2018

It's not immediately obvious in the Dandelion++ paper but there is a footnote on page 21 -

Our implementation enters fluff mode if there is a loop in the stem.

Other nodes would not know if the node doing the fluffing was doing so because they had seen a loop/cycle (via seeing a tx twice) or if that node had entered a new epoch and was now fluffing.

So I guess this is one valid approach. The other being wait for the embargo timer to expire (but this gets tricky, if not impossible, with fast propagation in the proposed Dandelion++ impl).

Edit: Ooh. I think this maybe explains why we see a lot of embargo timer expirations in our current Dandelion implementation. If we see the same tx a second time in stem phase then I think we just silently drop it and do not forward it on. In which case the stem comes to a halt.
And if our network is not particularly large, we probably have a reasonable chance of encountering a cycle in the stem with 10 hops.

@antiochp
Copy link
Member Author

antiochp commented Dec 19, 2018

How would having one of two possible routes have an impact on aggregation?

In the current impl in Grin? It means we can potentially aggregate txs in different ways and then forward them down different multiple paths. So we run the very real risk of having conflicting txs in the network at fluff time.
We can only safely aggregate them in one single way so we needed to limit the relay to a single outbound peer.

Tx1 comes in and we forward it to peer B.
Tx2 arrives and we aggregate and send (Tx1, Tx2) to peer C.
Tx3 arrives and we aggregate and send (Tx1, Tx2, Tx3) to peer B.
Tx4 now arrives at node C and they aggregate and send (Tx1, Tx2, Tx4) to peer D.
We now have conflicting (Tx1, Tx2, Tx3) and (Tx1, Tx2, Tx4) on the network when they both eventually fluff.

This would not be an issue for Dandelion++ in something like Bitcoin where its being used purely to anonymize the entry point of the tx.
But the tx aggregation means we can only have single outbound stem paths in Grin.

@lehnberg
Copy link
Collaborator

How would having one of two possible routes have an impact on aggregation?

In the current impl in Grin?

No, in your current proposal, with D++.

It means we can potentially aggregate txs in different ways and then forward them down different multiple paths. So we run the very real risk of having conflicting txs in the network at fluff time.

Hang on a sec. The paper proposes a 4-regular graph with one-to-one routing.

A 4 regular graph:

in1        on1
  \       /
   \     /
    NodeX
   /     \
  /       \
in2        on2

Where NodeX has four connections to other nodes, input nodes in1 and in2, and output nodes on1 and on2. In the one-to-one routing approach, at the start of each epoch, NodeX:

  1. Picks one of on1 and on2 to use as a route to broadcast its own transactions on;
  2. Picks one of on1 and on2 to forward any txs from in1, and similarly a route to forward txs from in2.

These routes are kept consistent and do not change throughout the duration of the epoch.

So now going back to what you describe:

It means we can potentially aggregate txs in different ways and then forward them down different multiple paths. So we run the very real risk of having conflicting txs in the network at fluff time.

Based on the above behaviour, wouldn't multiple transactions coming in to NodeX from the same source in the same epoch, traverse the network along the same path? And so wouldn't then multiple conflicting txs (coming from the same source) end up at the same "fluff sink node"? And therefore simply cut through?

@sesam
Copy link
Contributor

sesam commented Dec 19, 2018

Based on

We must also account for an artifact of our experiment setup, namely that short cycles in the stems are more likely among our 30 well connected Dandelion++ nodes (6). We therefore parse debug logs from our nodes for each trial in order to determine the effective stem length, which we then use as the basis of our evaluation.
(6) Our implementation enters fluff mode if there is a loop in the stem.

which I read as (6) being about an "artifact of our experimental setup" - bug/feature?

And what @antiochp said:

Edit: Ooh. I think this maybe explains why we see a lot of embargo timer expirations in our current Dandelion implementation. If we see the same tx a second time in stem phase then I think we just silently drop it and do not forward it on. In which case the stem comes to a halt. And if our network is not particularly large, we probably have a reasonable chance of encountering a cycle in the stem with 10 hops.

It does seem we might have a privacy leak currently.

Scenario:

  1. Mallory intercepts a single "partially" stemmed transaction tx_123.
  2. Mallory sends tx_123 to some node A
  3. through another connection to A, does now Mallory get to learn whether A laready before knew about tx_123? I think with current grin this is a privacy leak. I had a look into pool/src/transaction_pool.rs and servers/src/grin/dandelion_monitor.rs, maybe I didn't look carefully enough.

Possible defences?

I think we can't differentiate between "new, should be stemmed" and "partially stemmed, in the process of stemming"? That might be a bad idea anyway.

Maybe make sure a node knows if a node is our upstream stemming peer, and if a tx is seen twice (logically from different upstreams, or else one of them is faylty) from upstreams we fluff, otherwise we don't reveal that we've seen that tx before and just send it to our downstream stemming peer.

@antiochp
Copy link
Member Author

And so wouldn't then multiple conflicting txs (coming from the same source) end up at the same "fluff sink node"? And therefore simply cut through?

We cannot cut-through conflicting txs.

(tx1, tx2, tx3) and (tx1, tx2, tx4) cannot be cut-through and cannot be de-aggregated (unless we have seen tx1 and tx2 prior). There is no way of recovering either tx3 or tx4 in this scenario.

So we have to drop one of them. Which results in tx3 or tx4 never getting confirmed.

@lehnberg
Copy link
Collaborator

To help my understanding, based on the setup outlined above (4-regular graphs with one-to-one routing), could you give me an example of how two conflicting txs ((tx1, tx2, tx3) and (tx1, tx2, tx4)) would arise?

@antiochp
Copy link
Member Author

Simplest scenario would be Mallory sends tx1 (and I guess tx2 in this example) to multiple nodes with the intention of causing mischief.

But I guess this would apply whether stem routing was done with single paths or with 4-regular graphs like this.

So yeah maybe with Dandelion++ we can consider alternative stem routing without it impacting tx aggregation.

@antiochp
Copy link
Member Author

We can approach this in two discrete stages I think -

  • Single path routing with modified "per epoch" behavior.
  • Then explore the additional routing changes (4-regular graph thingy)

If I understand the routing correctly: The additional pseudorandom routing over two different outbound connections allows a single node to effectively have two different but deterministic routes passing through and crossing over it. Which intuitively adds significant complexity to the possible layouts of the stem paths for a given set of connected nodes.

But we would also get a lot of the benefits by simply reworking the "per epoch" changes first as a discrete piece of work.
And from some early exploratory work this week it looks like it would actually simplify our dandelion code quite a bit (which is always a 👍 from my perspective).

@lehnberg
Copy link
Collaborator

Agreed - makes sense to get a quick win in with the changes you suggest above first and then look at 4-regular graphs second.

@antiochp
Copy link
Member Author

[WIP] PR is here - #2344.

@antiochp antiochp added this to the 1.1.0 milestone Feb 21, 2019
@quentinlesceller
Copy link
Member

Fixed by #2628.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants