One of the things that surprised me following last week's Jepsen report on Radix DLT (https://jepsen.io/analyses/radix-dlt-1.0-beta.35.1) was seeing both blockchain/DLT people *and* the database community go "Hang on, 16 transactions per second can't be right"--and expecting wildly different figures.
Is it that DLTs are doing *byzantine* consensus? Etcd uses Raft (https://raft.github.io/), which is not Byzantine fault-tolerant. Takes 2 network hops plus a disk sync on a majority of nodes to commit. ~2n messages/txn. Throughput bounded by the single, totally-ordered Raft log.
Radix is based on Hotstuff (https://arxiv.org/abs/1803.05069), which is Byzantine fault-tolerant, three-phase consensus. ~6n (I think?) messages/txn.
And like, Hotstuff *itself* can go fast. The paper reports c5.4xlarge clusters pushing ~120K ops/sec (1KB/op, batches of 400 ops per round).
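Quick back-of-the-envelope on what those paper figures imply. This is just a sketch: it assumes each round commits exactly one batch and treats 1KB as 1000 bytes, and the function names are mine, not anything from the paper.

```python
# Figures quoted above from the Hotstuff paper's evaluation.
OPS_PER_SEC = 120_000
OP_SIZE_BYTES = 1_000   # "1KB/op", taken as 1000 bytes (assumption)
OPS_PER_BATCH = 400     # one batch per round (assumption)


def rounds_per_sec(ops_per_sec, ops_per_batch):
    """Consensus rounds needed per second if every round carries a full batch."""
    return ops_per_sec / ops_per_batch


def payload_mb_per_sec(ops_per_sec, op_size_bytes):
    """Raw operation payload per second, in megabytes (10^6 bytes)."""
    return ops_per_sec * op_size_bytes / 1_000_000


rounds = rounds_per_sec(OPS_PER_SEC, OPS_PER_BATCH)
payload = payload_mb_per_sec(OPS_PER_SEC, OP_SIZE_BYTES)
print(f"~{rounds:.0f} rounds/sec, ~{payload:.0f} MB/sec of payload")
```

So under those assumptions the headline number comes from only a few hundred rounds per second, with batching doing the heavy lifting.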
As the crypto maxim goes: DYOR!
Here's a YourKit snapshot from one of those Radix nodes pushing ~12 txns/sec. Some of it's crypto (BouncyCastle), but it looks like it's burning a ton of time in BerkeleyDB IO. Roughly 1/3rd waiting for fsync.
Rather a *lot* of fsyncs, as it turns out. Roughly 11 calls per txn on each node, at least in this particular run.
Etcd does way more fsyncs per second (!?) but, like most DBs, batches. At ~2700 txns/sec, etcd gets away with only ~0.27 syncs/txn in this run.
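Rough arithmetic on those sync rates, using only the approximate per-node figures from these runs (~12 txns/sec at ~11 fsyncs/txn for Radix, ~2700 txns/sec at ~0.27 syncs/txn for etcd). The helper name is made up for illustration.

```python
def fsyncs_per_sec(txns_per_sec, fsyncs_per_txn):
    """Sync calls per second on one node, given throughput and syncs per txn."""
    return txns_per_sec * fsyncs_per_txn


# Approximate figures from the runs described in this thread.
radix_syncs = fsyncs_per_sec(12, 11)
etcd_syncs = fsyncs_per_sec(2700, 0.27)

print(f"Radix: ~{radix_syncs:.0f} fsyncs/sec")
print(f"etcd:  ~{etcd_syncs:.0f} fsyncs/sec")
print(f"etcd syncs ~{etcd_syncs / radix_syncs:.1f}x as often, "
      f"but commits ~{2700 / 12:.0f}x the txns")
```

Which is the batching point in numbers: several times as many syncs per second, each one covering multiple transactions instead of each transaction costing multiple syncs.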
Zooming out: Some of these costs can probably be optimized away in time. I suspect permissionless DLTs are always going to be at a latency and throughput disadvantage though. For starters, Lamport 2002 puts a two msg-delay lower bound on async consensus: https://lamport.azurewebsites.net/pubs/lower-bound.pdf
Thing is that none of this is even remotely close to saturating disk or network bandwidth. It's a fresh, empty cluster and request volumes are *tiny*, so like... page cache should be able to hold most if not all of this data.
I dunno. Software is a ~rich tapestry~