One of the things that surprised me following last week's Jepsen report on Radix DLT (https://jepsen.io/analyses/radix-dlt-1.0-beta.35.1) was seeing both blockchain/DLT people *and* the database community go "Hang on, 16 transactions per second can't be right"--and expecting wildly different figures.
Is it that DLTs are doing *byzantine* consensus? Etcd uses Raft (https://raft.github.io/), which is not Byzantine fault-tolerant. Takes 2 network hops plus a disk sync on a majority of nodes to commit. ~2n messages/txn. Throughput bounded by the single, totally-ordered Raft log.
Radix is based on Hotstuff (https://arxiv.org/abs/1803.05069), which is Byzantine fault-tolerant, three-phase consensus. ~6n (I think?) messages/txn.
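A quick back-of-envelope on those message counts, as a sketch only: the ~2n and ~6n factors are the rough estimates from this thread (the 6n one is a guess), and the cluster size of 5 is a hypothetical number picked for illustration.

```python
def messages_per_txn(n_nodes, per_node_factor):
    """Approximate messages needed to commit one txn in an n-node cluster."""
    return per_node_factor * n_nodes

N_NODES = 5          # hypothetical cluster size, not from any benchmark
RAFT_FACTOR = 2      # ~2n messages/txn (rough estimate for Raft)
HOTSTUFF_FACTOR = 6  # ~6n messages/txn (uncertain estimate for Hotstuff)

raft_msgs = messages_per_txn(N_NODES, RAFT_FACTOR)
hotstuff_msgs = messages_per_txn(N_NODES, HOTSTUFF_FACTOR)
ratio = hotstuff_msgs / raft_msgs

print(f"Raft: ~{raft_msgs} msgs/txn, Hotstuff: ~{hotstuff_msgs} msgs/txn, ratio ~{ratio:.0f}x")
```

If those estimates are anywhere near right, the gap is a constant factor of about 3 no matter what n is.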
And like, Hotstuff *itself* can go fast. The paper reports c5.4xlarge clusters pushing ~120K ops/sec (1KB/op, batches of 400 ops per round).
As the crypto maxim goes: DYOR!
Here's a YourKit snapshot from one of those Radix nodes pushing ~12 txns/sec. Some of it's crypto (BouncyCastle), but it looks like it's burning a ton of time in BerkeleyDB IO. Roughly a third of that is waiting for fsync.
Rather a *lot* of fsyncs, as it turns out. Roughly 11 calls per txn on each node, at least in this particular run.
Etcd does way more per second (!?) but, like most DBs, batches. At ~2700 txns/sec, etcd gets away with only ~0.27 syncs/txn in this run.
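To put the two fsync figures side by side, here's a rough calculation using only the approximate numbers quoted in this thread. It assumes the ~11 fsyncs/txn and ~12 txns/sec figures for Radix describe the same run, which may not hold exactly.

```python
def fsyncs_per_sec(txns_per_sec, fsyncs_per_txn):
    """Per-node fsync rate implied by a txn rate and a per-txn sync count."""
    return txns_per_sec * fsyncs_per_txn

# Approximate figures from the thread.
RADIX_TXNS_PER_SEC = 12
RADIX_FSYNCS_PER_TXN = 11
ETCD_TXNS_PER_SEC = 2700
ETCD_FSYNCS_PER_TXN = 0.27

radix_rate = fsyncs_per_sec(RADIX_TXNS_PER_SEC, RADIX_FSYNCS_PER_TXN)
etcd_rate = fsyncs_per_sec(ETCD_TXNS_PER_SEC, ETCD_FSYNCS_PER_TXN)

# With batching, one sync covers several txns.
etcd_txns_per_fsync = 1 / ETCD_FSYNCS_PER_TXN

print(f"Radix: ~{radix_rate:.0f} fsyncs/sec")
print(f"etcd:  ~{etcd_rate:.0f} fsyncs/sec, ~{etcd_txns_per_fsync:.1f} txns per fsync")
```

On these numbers etcd issues more syncs per second overall, but each one covers several transactions, while Radix pays for about 11 syncs on every single transaction.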
Zooming out: Some of these costs can probably be optimized away in time. I suspect permissionless DLTs are always going to be at a latency and throughput disadvantage though. For starters, Lamport 2002 puts a two msg-delay lower bound on async consensus: https://lamport.azurewebsites.net/pubs/lower-bound.pdf
"Hang on, wasn't Radix slow with COMMIT_NO_SYNC too?"
Yup! That tells us fsync can't be the only factor. All that CPU has to be going somewhere. High but variable system time. I'd also look at 540kBps of inbound network traffic vs 1.9 MBps of disk writes: write amplification?
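The write-amplification hunch comes down to one ratio. A minimal sketch, assuming decimal units (1 MB = 1000 kB) and that most of the inbound traffic is data that ends up being persisted, neither of which is confirmed here:

```python
KB_PER_MB = 1000  # assumption: decimal units

INBOUND_NETWORK_KBPS = 540               # ~540 kBps inbound network traffic
DISK_WRITE_KBPS = 1.9 * KB_PER_MB        # ~1.9 MBps of disk writes

# Bytes written to disk per byte received over the network.
write_amplification = DISK_WRITE_KBPS / INBOUND_NETWORK_KBPS

print(f"~{write_amplification:.1f}x disk bytes written per inbound network byte")
```

That works out to roughly 3.5x. It's suggestive rather than conclusive, since inbound traffic also includes protocol messages that are never written to disk.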