network: merge two loops in iteratePeersWithSendMsg, send to 2/3

Refactor the code and accept sending to just 2/3 of proper peers. Previously
this was an edge case, but now it can be a normal thing to do as well, since
broadcasting to everyone is obviously too expensive and excessive (hi, #608).
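
The cut-off used throughout is a simple integer inequality taken from the code
below; a minimal sketch of just that predicate (enoughSent is a hypothetical
name used purely for illustration):

// enoughSent reports whether at least 2/3 of the peers still considered
// alive have already been sent the message.
func enoughSent(sentN, deadN, peerN int) bool {
	return 3*sentN >= 2*(peerN-deadN)
}

For example, with peerN = 4 and deadN = 0 it first becomes true at sentN = 3,
i.e. a node relays the message to three of its four peers.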

Baseline (four nodes, 10 workers):

RPS    8180.760 8137.822 7858.358 7820.011 8051.076 ≈ 8010   ± 2.04%
TPS    7819.831 7521.172 7519.023 7242.965 7426.000 ≈ 7506   ± 2.78%
CPU %    41.983   38.775   40.606   39.375   35.537 ≈   39.3 ± 6.15%
Mem MB 2947.189 2743.658 2896.688 2813.276 2863.108 ≈ 2853   ± 2.74%

Patched:

RPS    9714.567 9676.102 9358.609 9371.408 9301.372 ≈ 9484   ±  2.05% ↑ 18.40%
TPS    8809.796 8796.854 8534.754 8661.158 8426.162 ≈ 8646   ±  1.92% ↑ 15.19%
CPU %    44.980   45.018   33.640   29.645   43.830 ≈   39.4 ± 18.41% ↑  0.25%
Mem MB 2989.078 2976.577 2306.185 2351.929 2910.479 ≈ 2707   ± 12.80% ↓  5.12%

There is a nuance with this patch, however. While it typically works the way
outlined above, sometimes it works like this:

RPS ≈ 6734.368
TPS ≈ 6299.332
CPU ≈ 25.552%
Mem ≈ 2706.046MB

And that's because the log looks like this:

DeltaTime, TransactionsCount, TPS
5014, 44212, 8817.710
5163, 49690, 9624.249
5166, 49523, 9586.334
5189, 49693, 9576.604
5198, 49339, 9491.920
5147, 49559, 9628.716
5192, 49680, 9568.567
5163, 49750, 9635.871
5183, 49189, 9490.450
5159, 49653, 9624.540
5167, 47945, 9279.079
5179, 2051, 396.022
5015, 4, 0.798
5004, 0, 0.000
5003, 0, 0.000
5003, 0, 0.000
5003, 0, 0.000
5003, 0, 0.000
5004, 0, 0.000
5003, 2925, 584.649
5040, 49099, 9741.865
5161, 49718, 9633.404
5170, 49228, 9521.857
5179, 49773, 9610.543
5167, 47253, 9145.152
5202, 49788, 9570.934
5177, 47704, 9214.603
5209, 46610, 8947.975
5249, 49156, 9364.831
5163, 18284, 3541.352
5072, 174, 34.306

On a network with 4 CNs and 1 RPC node there is a 1/256 probability that a block
won't be broadcast to the RPC node, so it won't see it until the ping timeout
kicks in. While it doesn't see the block it can't accept new incoming
transactions, so the bench basically gets stuck. To me that's an acceptable
trade-off, because normal networks are much larger than that and the effect of
this patch matters much more there, but still that's what we have and we need
to take it into account.
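
Where 1/256 comes from, as a back-of-the-envelope check: assuming a fully
connected network (each CN then has 4 peers) and that the 2/3 cut-off makes
every CN relay the block to 3 of its 4 peers, skipping one effectively at
random, the RPC node misses the block only if all 4 CNs skip it, which is
(1/4)^4 = 1/256. Full connectivity and the uniformly random skip are
assumptions made here for the sake of the estimate, but they reproduce the
number; a tiny Go check:

package main

import "fmt"

func main() {
	const cns = 4        // consensus nodes relaying the block
	const peersPerCN = 4 // peers of each CN in a fully connected 5-node network

	pSkip := 1.0 / float64(peersPerCN) // one CN skips the RPC node
	pMiss := 1.0
	for i := 0; i < cns; i++ {
		pMiss *= pSkip // every CN has to skip it independently
	}
	fmt.Printf("P(RPC node misses the block) = %v (1/%d)\n", pMiss, int(1/pMiss))
	// Prints: P(RPC node misses the block) = 0.00390625 (1/256)
}
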
Roman Khimov 2021-08-06 15:36:46 +03:00
parent 966a16e80e
commit 7bb82f1f99

@@ -1164,45 +1164,30 @@ func (s *Server) iteratePeersWithSendMsg(msg *Message, send func(Peer, bool, []b
 	// have already sent an Inv to it.
 	finished := make([]bool, peerN)
-	for i, peer := range peers {
-		err := send(peer, false, pkt)
-		switch err {
-		case nil:
-			if msg.Command == CMDGetAddr {
-				peer.AddGetAddrSent()
-			}
-			sentN++
-		case errBusy:
-			continue
-		default:
-			deadN++
-		}
-		finished[i] = true
-	}
-	// Send to at least 2/3 of good peers.
-	if 3*sentN >= 2*(peerN-deadN) {
-		return
-	}
-	// Perform blocking send now.
-	for i, peer := range peers {
-		if finished[i] {
-			continue
-		}
-		if err := send(peer, true, pkt); err != nil {
-			if err != errBusy {
-				deadN++
-			}
-			continue
-		}
-		if msg.Command == CMDGetAddr {
-			peer.AddGetAddrSent()
-		}
-		sentN++
-		if 3*sentN >= 2*(peerN-deadN) {
-			return
-		}
+	// Try non-blocking sends first and only block if have to.
+	for _, blocking := range []bool{false, true} {
+		for i, peer := range peers {
+			// Send to 2/3 of good peers.
+			if 3*sentN >= 2*(peerN-deadN) {
+				return
+			}
+			if finished[i] {
+				continue
+			}
+			err := send(peer, blocking, pkt)
+			switch err {
+			case nil:
+				if msg.Command == CMDGetAddr {
+					peer.AddGetAddrSent()
+				}
+				sentN++
+			case errBusy: // Can be retried.
+				continue
+			default:
+				deadN++
+			}
+			finished[i] = true
+		}
 	}
 }
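
For readers outside the neo-go tree, here is a minimal, self-contained sketch
of the pattern introduced above: one loop that first tries non-blocking sends,
then blocking ones, and returns as soon as 2/3 of the still-alive peers have
been served. Only the loop structure mirrors the patched code; the peer type,
the send callback and everything in main are simplified stand-ins.

package main

import (
	"errors"
	"fmt"
)

var errBusy = errors.New("peer is busy")

type peer struct{ id int }

// broadcast mirrors the patched loop: a non-blocking pass, then a blocking
// pass, stopping once at least 2/3 of the peers still considered alive have
// been sent the message.
func broadcast(peers []peer, send func(p peer, blocking bool) error) {
	var sentN, deadN int
	peerN := len(peers)
	finished := make([]bool, peerN)

	// Try non-blocking sends first and only block if have to.
	for _, blocking := range []bool{false, true} {
		for i, p := range peers {
			// Send to 2/3 of good peers.
			if 3*sentN >= 2*(peerN-deadN) {
				return
			}
			if finished[i] {
				continue
			}
			err := send(p, blocking)
			switch err {
			case nil:
				sentN++
			case errBusy: // Busy, may be retried on the blocking pass.
				continue
			default:
				deadN++
			}
			finished[i] = true
		}
	}
}

func main() {
	peers := []peer{{1}, {2}, {3}, {4}}
	broadcast(peers, func(p peer, blocking bool) error {
		// Pretend peer 2 is busy on the non-blocking pass.
		if p.id == 2 && !blocking {
			return errBusy
		}
		fmt.Printf("sent to peer %d (blocking=%v)\n", p.id, blocking)
		return nil
	})
}

With four peers the loop returns after three successful sends (3*3 >= 2*4), so
the peer that was busy during the non-blocking pass never receives the message
in this run, which is exactly the trade-off described for the RPC node above.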