RT @redpandadata
Hear from @jepsen_io about the safety of our streaming data engine – what we fixed and what we shouldn’t. Live webinar on May 25 at 10am PST.


"Hang on, wasn't Radix slow with COMMIT_NO_SYNC too?"

Yup! That tells us fsync can't be the only factor. All that CPU has to be going somewhere. High but variable system time. I'd also look at 540 kBps of inbound network traffic vs 1.9 MBps of disk writes: write amplification?
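
A rough back-of-the-envelope for that ratio. It assumes decimal units (1 kBps = 1,000 bytes/s) and that essentially all disk writes originate from inbound traffic; both are assumptions, not measurements, and inbound traffic also includes inter-node chatter:

```python
# Figures quoted in the post; the unit interpretation is an assumption.
inbound_bytes_per_sec = 540 * 1_000            # 540 kBps of inbound network traffic
disk_write_bytes_per_sec = 1.9 * 1_000_000     # 1.9 MBps of disk writes

# Bytes written to disk per byte received over the network.
write_amplification = disk_write_bytes_per_sec / inbound_bytes_per_sec
print(f"~{write_amplification:.1f}x")  # prints "~3.5x"
```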


As the crypto maxim goes: DYOR!

Here's a YourKit snapshot from one of those Radix nodes pushing ~12 txns/sec. Some of it's crypto (BouncyCastle), but it looks like it's burning a ton of time in BerkeleyDB IO. Roughly 1/3rd waiting for fsync.



To wit: both of these are 5-node consensus systems. Both use that consensus system to build a totally ordered log of state transitions, and run a replicated state machine on top of that log. But etcd here is pushing 2600 instead of 12 TPS, and with median latencies ~20x lower.


Here's etcd on that same cluster doing writes of transaction-sized (~250B) JSON blobs to random keys. No special tuning, just an out of the box install. No connection pipelining/pooling or batching. Just plain old connection-per-request over HTTP.

This is ~typical for dist DBs.


With that said: here are throughput and latency graphs from Jepsen talking to Radix DLT running on a cluster of 5 m5.xlarge nodes (all validators) backed by EBS. No reads, just write transactions between an exponentially-distributed pool of 30 accounts. Whole dataset fits in RAM.


New release! Maelstrom 0.2.0 is a workbench for learning distributed systems by writing your own, in any language. Comes with a six-chapter tutorial in writing your own toy echo, gossip, CRDT, Datomic, and Raft systems. Powered by Jepsen and Elle! github.com/jepsen-io/maelstrom

RT @yow_conf
Can you believe it?! We're adding ONE more speaker, and it's @aphyr!

We trust databases to store our data, but should we? Learn the basics of distributed systems testing & advice for testing your own systems in his keynote - Jepsen 13.

Love this thing where Google Cloud decides that jepsen.io has been stable for a while and it really ought to do something about that, so it kills the VM and spins up a new one to replace it only *after* it's dead, resulting in ~10 minutes of spurious downtime.

It's been doing this for ~two years, COME ON Google, y'all are supposed to be experts at rollouts. Start new nodes *before* you kill existing ones!

RT @andy_pavlo
@jepsen_io .@halberenson sent me an amazing email last year about why the ANSI SQL isolation levels got muddied up. I'm sure he can share more details about what happened.

So here's a neat thing postgres 12.3 might do? Maybe I'm doing it wrong, not sure yet.

All these transactions are executed with SERIALIZABLE isolation over lists implemented as comma-separated TEXT fields. `r x [1, 2]` means we read the current value of row x and found it to be [1,2]. `a x 3` means "append 3 to x", like so:

insert into txn1 as t (id, val) values ($1, $2) on conflict (id) do update set val = concat(t.val, ',', $3) where t.id = $4
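
To make the semantics of that upsert concrete, here is a minimal in-memory model in Python. It is only a sketch: `append_to_list` and `read_list` are hypothetical names, a plain dict stands in for the table, there is no concurrency or isolation, and returning an empty list for a missing row is an assumption.

```python
def append_to_list(table: dict, key: str, element: int) -> None:
    """Model of the upsert: insert the row if absent, else concat ',' + element."""
    if key not in table:
        table[key] = str(element)                       # the INSERT branch
    else:
        table[key] = table[key] + "," + str(element)    # the ON CONFLICT branch


def read_list(table: dict, key: str) -> list:
    """Parse the comma-separated TEXT value back into a list of ints."""
    val = table.get(key)
    return [] if val is None else [int(e) for e in val.split(",")]


table = {}
append_to_list(table, "x", 1)
append_to_list(table, "x", 2)
print(read_list(table, "x"))   # r x [1, 2]
append_to_list(table, "x", 3)  # a x 3
print(read_list(table, "x"))   # r x [1, 2, 3]
```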

rw is an anti-dep, ww and wr are deps.
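
A simplified sketch of how those three edge types fall out of an append-only list on a single key. This is not Elle's implementation; `infer_edges` and its arguments are hypothetical, and it assumes every element is appended exactly once and every read is a prefix of the longest observed list.

```python
def infer_edges(version_order, appenders, reads):
    """
    version_order: longest observed list for one key, e.g. [1, 2, 3]
    appenders:     element -> name of the txn that appended it
    reads:         txn name -> list that txn observed (a prefix of version_order)
    Returns a set of (from_txn, to_txn, kind) edges.
    """
    edges = set()
    # ww: the txn that appended one element precedes the txn that appended the next.
    for prev, nxt in zip(version_order, version_order[1:]):
        if appenders[prev] != appenders[nxt]:
            edges.add((appenders[prev], appenders[nxt], "ww"))
    for reader, observed in reads.items():
        # wr: the reader depends on whoever appended the last element it saw.
        if observed and appenders[observed[-1]] != reader:
            edges.add((appenders[observed[-1]], reader, "wr"))
        # rw (anti-dependency): the reader precedes whoever appended the
        # first element it did NOT see.
        if len(observed) < len(version_order):
            next_writer = appenders[version_order[len(observed)]]
            if next_writer != reader:
                edges.add((reader, next_writer, "rw"))
    return edges


edges = infer_edges(
    version_order=[1, 2, 3],
    appenders={1: "T1", 2: "T2", 3: "T3"},
    reads={"T4": [1, 2]},  # T4 performed `r x [1, 2]`
)
for edge in sorted(edges):
    print(edge)
```

Here T4 read [1, 2], so it depends on T2 (wr) and must precede T3, which appended 3 (rw). A cycle made of such edges is what would contradict serializability.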

I keep thinking about their VLDB paper which says ~80% of writes to MongoDB's hosted service don't set a write concern, and 99.6% of reads don't set a read concern. vldb.org/pvldb/vol12/p2071-sch


A single-user Mastodon instance for Jepsen announcements & discussion.