RT @redpandadata
Hear from @jepsen_io about the safety of our streaming data engine – what we fixed and what we didn't. Live webinar on May 25 at 10am PST.

go.redpanda.com/jepsen-webinar

Hey y'all! Doing an hour-long free webinar on May 25th with @redpandadata to talk about what we found in the last Jepsen analysis (jepsen.io/analyses/redpanda-21). Come learn about streaming systems safety!

go.redpanda.com/jepsen-webinar

Cheers to @redpandadata on a delightful collaboration, and congratulations on their new release. :-)

Redpanda has addressed most of these issues in the just-released 21.11.15, and the upcoming 22.1.1 fixes aborted reads and lost writes with transactions--lost/stale messages are still under investigation. A few more issues require only documentation to address.

A new report! We analyzed @redpandadata (a Kafka-compatible distributed queue) and discuss crashes, aborted reads, inconsistent offsets, and lost/stale messages, along with some potentially surprising aspects of the Kafka transaction protocol. jepsen.io/analyses/redpanda-21
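
(Not Jepsen's actual test harness, which is written in Clojure and far more involved -- but as a rough sketch of the kind of end-to-end check involved: produce some messages to a Kafka-compatible broker, consume them back, and see whether anything went missing. The broker address and topic name below are placeholders.)

```python
# Rough end-to-end check against a Kafka-compatible broker (e.g. Redpanda):
# produce some messages, consume them back, and see what's missing.
# Broker address and topic name are placeholders; requires kafka-python.
from kafka import KafkaProducer, KafkaConsumer

BROKER = "localhost:9092"    # placeholder broker address
TOPIC = "smoke-test"         # placeholder topic name

producer = KafkaProducer(bootstrap_servers=BROKER, acks="all")
sent = [str(i).encode() for i in range(100)]
for msg in sent:
    producer.send(TOPIC, value=msg)
producer.flush()

consumer = KafkaConsumer(
    TOPIC,
    bootstrap_servers=BROKER,
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,   # give up after 5s of silence
)
received = {record.value for record in consumer}

missing = [m for m in sent if m not in received]
print(f"sent {len(sent)}, received {len(received)}, missing {len(missing)}")
```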

I am begging the cryptocurrency community to consider alternative ways of knowing, such as "emailing someone to ask them questions instead of speculating in chat" and "submitting a handful of transactions and seeing if they show up"

<sigh> No, Radix folks, Jepsen will not be accepting a follow-up engagement with any Radix-related entities. Y'all can stop suggesting there's going to be a follow-up analysis on Xi'an now.

I take my ethical commitments seriously.

They also stressed the importance of end-to-end verification of safety properties, because APIs are how exchanges and users actually interact with DLTs. This is a challenge in traditional databases as well: composition of (e.g.) serializable transactional DBs is nontrivial!

I'm not sure how widespread this understanding is in the DLT space (still looking for a citation for RDX Works's definition) but the researchers I've talked to were unanimous: losing committed transactions *is* a safety error, even if every validator agrees to throw away data.

Since the release I've had the chance to chat with a handful of analysts working specifically on verification of blockchain/cryptocurrency/DLT systems, and can confirm that they also use the usual distsys sense of "safety property"--namely: "something bad does not happen".

Some helpful and much-better-informed comments from @trianglesphere on tendermint/hotstuff latency, including a nicely drawn Lamport diagram.
---
RT @trianglesphere
@jepsen_io I’m pretty sure it’s 7 delays. 1 to validator, 7 to finalize, 1 from any validator back to the client. By this metric, PBFT/Tendermint is 3. Ignore the new view, but each set of arrows is a hop
twitter.com/trianglesphere/sta

RT @trianglesphere
@jepsen_io I'm not at all familiar with Radix DLT, but I've got a bunch of thoughts on consensus algorithms and improving their performance in blockchain/DLTs.
Byzantine consensus slows down DLTs, but the environment is adversarial, which slows everything down as well.

Thing is that none of this is even remotely close to saturating disk or network bandwidth. It's a fresh, empty cluster and request volumes are *tiny*, so like... page cache should be able to hold most if not all of this data.

I dunno. Software is a ~rich tapestry~

"Hang on, wasn't Radix slow with COMMIT_NO_SYNC too?"

Yup! That tells us fsync can't be the only factor. All that CPU has to be going somewhere. High but variable system time. I'd also look at 540kBps of inbound network traffic vs 1.9 MBps of disk writes: write amplification?
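
(Quick arithmetic on those figures -- treating inbound network bytes as a rough proxy for logical write volume, which is an assumption:)

```python
# Ratio of disk write throughput to inbound network traffic, using the
# figures above. (Treats inbound bytes as a rough proxy for logical writes.)
inbound_mbps = 0.54   # 540 kBps of inbound network traffic
disk_mbps = 1.9       # 1.9 MBps of disk writes
print(f"write amplification ~{disk_mbps / inbound_mbps:.1f}x")   # ~3.5x
```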

If I can leave you with one idea, it's:

DLTs, like any database, are empirically investigable artifacts. You can build, install, and ask one to store some data. See if it comes back like you'd expect. Even simple tests can lead to interesting & exciting results.

Try it out! ❤️
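
(A minimal sketch of what "see if it comes back" can look like. Everything here -- node address, endpoint paths, JSON fields -- is a placeholder, not any particular DLT's real API; swap in whatever client the system actually ships.)

```python
# Minimal "does my data come back?" smoke test. Node address, endpoint
# paths, and JSON fields are placeholders, not any real DLT's API; a real
# test would also wait for the write to finalize and retry the read.
import uuid
import requests

BASE = "http://localhost:8080"            # placeholder node address
key, value = str(uuid.uuid4()), "hello"

w = requests.post(f"{BASE}/submit", json={"key": key, "value": value})
w.raise_for_status()

r = requests.get(f"{BASE}/lookup", params={"key": key})
r.raise_for_status()
observed = r.json().get("value")
print("ok" if observed == value else f"lost or mangled: {observed!r}")
```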

This is something I kind of expected DLT & DeFi whitepapers to discuss as a matter of course: What kinds of apps would be insensitive to these costs? Which ones might find it more efficient to keep running on permissioned, centralized networks?

Curious to hear y'all's thoughts!

Does this *matter*? I honestly don't know.

ACH latencies are multiple days. OTOH, some trading systems start issuing order requests while the packet with offers is still coming over the wire. Transaction value can just barely beat--or far outweigh--processing & storage costs.

Redundancy: Sybil resistance pushes DLTs to run the same computation on lots of nodes. Ethereum uses at least 5770 nodes to do a single (very slow) computer's computation. Radix uses ~100. DBs using regular old (i.e. permissioned) consensus usually run 3, 5, or 7 replicas.

Constant factors: BFT and permissionless networks tend to rely on cryptographic signatures, and those take compute, bandwidth, and storage.

I have a loose suspicion that the UTXO state machine representation for the ledger might also impose costs. Would love to hear about this.

That tells us that any globe-spanning consensus system using light through fiber will have a hard latency floor of ~200 ms. Consensus latencies in local datacenters can get down around 5-10. Not every application is latency-sensitive, but some require or profit from low latency!
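
(The rough arithmetic behind that floor: light in fiber moves at about two thirds of c, and antipodal points are separated by about half of Earth's ~40,000 km circumference, so a single antipodal round trip costs on the order of 200 ms.)

```python
# Rough latency floor for a globe-spanning round trip over fiber.
c = 299_792                  # speed of light in vacuum, km/s
fiber = c / 1.5              # light in fiber: roughly 2/3 c
antipodal = 40_000 / 2       # half of Earth's ~40,000 km circumference, km
one_way_ms = antipodal / fiber * 1000
print(f"one way ~{one_way_ms:.0f} ms, round trip ~{2 * one_way_ms:.0f} ms")
# ~100 ms one way, ~200 ms per round trip; consensus needs at least one.
```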
