[bitcoin/bitcoin] Dandelion transaction relay (BIP 156) (#13947)

stevenroose requested changes on this pull request.

> @@ -399,6 +399,7 @@ void SetupServerArgs()
gArgs.AddArg(«-bantime=«, strprintf(«Number of seconds to keep misbehaving peers from reconnecting (default: %u)», DEFAULT_MISBEHAVING_BANTIME), false, OptionsCategory::CONNECTION);
gArgs.AddArg(«-bind=«, «Bind to given address and always listen on it. Use [host]:port notation for IPv6», false, OptionsCategory::CONNECTION);
gArgs.AddArg(«-connect=«, «Connect only to the specified node; -connect=0 disables automatic connections (the rules for this peer are the same as for -addnode). This option can be specified multiple times to connect to multiple nodes.», false, OptionsCategory::CONNECTION);
+ gArgs.AddArg(«-dandelion», strprintf(«Probability to extend the stem in Dandelion transaction relay by one hop (default: %d, 0 to disable dandelion)», DEFAULT_DANDELION_STEM_PERCENTAGE), false, OptionsCategory::CONNECTION);

A simple «probability» should a number between 0 and 1. The variable used as default and the code below suggest a percentage is used instead. I think it would make sense to document it as such as well.

> if (gArgs.SoftSetBoolArg(«-whitelistrelay», false))
LogPrintf(«%s: parameter interaction: -blocksonly=1 -> setting -whitelistrelay=0\n», __func__);
+ if (gArgs.SoftSetBoolArg(«-dandelion», false)) {

Just to not mix up values, this should set 0 instead of false, even if false would be converted automatically to 0.

> @@ -1741,6 +1754,80 @@ int CConnman::GetExtraOutboundCount()
return std::max(nOutbound — nMaxOutbound, 0);
}

+void CConnman::AssignNewDandelionDestination(CNode* from)
+{
+ from->m_dandelion_destination = nullptr;
+
+ if (!CNode::IsDandelionEnabled()) return;

This check should not be necessary, failing it means there’s a bug. I’d suggest either using an assertion or something more severe than this or just removing the check.

> @@ -317,6 +327,16 @@ class CConnman
*/
int64_t PoissonNextSendInbound(int64_t now, int average_interval_seconds);

+ using DandelionPeer = std::pair;

Please elaborate a bit more on the documentation of the integer.

> /** Time between pings automatically sent out for latency probing and keepalive (in seconds). */
static const int PING_INTERVAL = 2 * 60;
/** Time after which to disconnect, after waiting for a ping response (or inactivity). */
static const int TIMEOUT_INTERVAL = 20 * 60;
/** Run the feeler connection loop once every 2 minutes or 120 seconds. **/
static const int FEELER_INTERVAL = 120;
+/** Pick new dandelion peers once every 10 minutes or 600 seconds. */
+static constexpr int INTERVAL_DANDELION = 600;

The convention seems to be `XXX_INTERVAL` instead of `INTERVAL_XXX`.

> + LogPrint(BCLog::DANDELION, «Dandelion: Shuffle Dandelion Destinations.\n»);
+ std::vector all_nodes;
+ ForEachNode([&](CNode* node) { all_nodes.push_back(node); });
+
+ LOCK(cs_dandelion_peers);
+
+ // Collect all potential dandelion nodes
+ m_nodes_dandelion.clear();
+ for (CNode* node : all_nodes) {
+ bool full_dandelion = //
+ // Ignore node->fRelayTxes here, to allow for dandelion-only nodes
+ node->m_accept_dandelion;
+ bool weak_dandelion = //
+ node->fRelayTxes /* ignore peers that reject normal txs */ &&
+ !node->fClient /* ignore peers that don’t serve blocks */ &&
+ !node->nLastTXTime /* ignore peers that never sent us any tx */;

First, I think this test should be performed as well for Dandelion advertising peers, because they could otherwise deny you service. Furthermore, I think it might be more useful to have a threshold time of last relay instead of just one. Like «relayed tx in last minute».

With this, a peer could advertise dandelion and never relay anything.

> @@ -1741,6 +1754,80 @@ int CConnman::GetExtraOutboundCount()
return std::max(nOutbound — nMaxOutbound, 0);
}

+void CConnman::AssignNewDandelionDestination(CNode* from)
+{
+ from->m_dandelion_destination = nullptr;
+
+ if (!CNode::IsDandelionEnabled()) return;
+
+ unsigned min_use{std::numeric_limits::max()};

Please explain how this variable is used and why. I fail to see its function.

Also since peers are checked with `min_use > peer.second`, this is an upper limit, not a lower limit. Making the name `min_use` not very telling.

> @@ -1741,6 +1754,80 @@ int CConnman::GetExtraOutboundCount()
return std::max(nOutbound — nMaxOutbound, 0);
}

+void CConnman::AssignNewDandelionDestination(CNode* from)

I think this method could use some documentation on the strategy/algorithm used to assign destinations.

>
+/** Default for -dandelion stem percentage */
+static constexpr int64_t DEFAULT_DANDELION_STEM_PERCENTAGE = 90;

Why 90? This is like the most important parameter of the dandelion algorithm, I think it deserves some motivation..

> /** Time between pings automatically sent out for latency probing and keepalive (in seconds). */
static const int PING_INTERVAL = 2 * 60;
/** Time after which to disconnect, after waiting for a ping response (or inactivity). */
static const int TIMEOUT_INTERVAL = 20 * 60;
/** Run the feeler connection loop once every 2 minutes or 120 seconds. **/
static const int FEELER_INTERVAL = 120;
+/** Pick new dandelion peers once every 10 minutes or 600 seconds. */
+static constexpr int INTERVAL_DANDELION = 600;

Also, why 10 minutes?

> });
}

+static void FluffTransaction(CTxMemPool& tx_pool, const CNode* from, const CTransactionRef& ptx, CConnman* connman)
+{
+ LogPrint(BCLog::DANDELION, «fluff\n»);

Perhaps mention txid?

> + // Should never happen, since we checked for validity
+ LogPrint(BCLog::DANDELION, «Dandelion transaction fluff failed due to %s\n», FormatStateMessage(state));
+ return;
+ }
+ tx_pool.check(pcoinsTip.get());
+ RelayTransaction(*ptx, connman);
+ LogPrint(BCLog::MEMPOOL, «AcceptToMemoryPool: peer=%d: accepted %s (poolsz %u txn, %u kB)\n»,
+ from->GetId(),
+ ptx->GetHash().ToString(),
+ tx_pool.size(), tx_pool.DynamicMemoryUsage() / 1000);
+}
+
+static void RelayDandelionTransaction(CTxMemPool& tx_pool, CNode* node_from, const CTransactionRef& ptx, const CNetMsgMaker& msg_maker, CConnman* connman)
+{
+ bool fluff = GetRand(100 * 100) / 100. >= CNode::m_dandelion_stem_pct_threshold;
+ if (fluff) return FluffTransaction(tx_pool, node_from, ptx, connman);

Should this check also be made for the wallet’s own transactions? Or should a forced non-fluff be done in that case?

> + node_from->m_cache_expiry.push(expiry);
+
+ static constexpr int DANDELION_SEND_FLAGS{0}; // Always send as witness tx
+
+ // Check if the next peer would fluff (assume same probability)
+ bool force_fluff = GetRand(100 * 100) / 100. >= CNode::m_dandelion_stem_pct_threshold;
+
+ if (force_fluff || !dest->m_accept_dandelion) {
+ // The next peer is going to transition into fluff-phase
+ LogPrint(BCLog::DANDELION, «one-hop relay to peer=%d\n», dest->GetId());
+ connman->PushMessage(dest, msg_maker.Make(DANDELION_SEND_FLAGS, NetMsgType::TX, *ptx));
+ return;
+ }
+
+ LogPrint(BCLog::DANDELION, «dandelion relay to peer=%d\n», dest->GetId());
+ connman->PushMessage(dest, msg_maker.Make(DANDELION_SEND_FLAGS, NetMsgType::TX_DANDELION, *ptx));

Wrong indentation since after return.

> @@ -120,6 +120,11 @@ extern const char *GETHEADERS;
* @see https://bitcoin.org/en/developer-reference#tx
*/
extern const char *TX;
+/**
+ * The dandelion tx message transmits a single transaction.
+ * @see https://github.com/bitcoin/bips/blob/master/bip-0156.mediawiki
+ */
+extern const char* TX_DANDELION;

Why not a service bit?