Also delighted to share that there's now a (limited) Jepsen test for local filesystems, which we've been using to find bugs in lazyfs:

Jepsen 0.2.7 is now available! Includes a (known-buggy) preview of lazyfs: a filesystem which can intentionally lose un-fsynced writes!

Ayyyyyy, congratulations! 🎉
RT @kjnilsson
When I joined I had a very clear view of what I wanted to fix: that @jepsen_io data loss failure from 2012. :)

With the release of the Raft based Quorum Queues we now have a queue type that provides the kind of data safety users expect from a messaging system.

RT @redpandadata
Hear from @jepsen_io about the safety and of our streaming data engine – what we fixed and what we shouldn’t. Live webinar on May 25 at 10am PST.

Hey y'all! Doing an hour-long free webinar on May 25th with @redpandadata to talk about what we found in the last Jepsen analysis. Come learn about streaming systems safety!

Cheers to @redpandadata on a delightful collaboration, and congratulations on their new release. :-)

Redpanda has addressed most of these issues in the just-released 21.11.15, and the upcoming 22.1.1 fixes aborted reads and lost writes with transactions--lost/stale messages are still under investigation. A few more issues require only documentation to address.

A new report! We analyzed @redpandadata (a Kafka-compatible distributed queue) and discuss crashes, aborted reads, inconsistent offsets, and lost/stale messages, along with some potentially surprising aspects of the Kafka transaction protocol.

I am begging the cryptocurrency community to consider alternative ways of knowing, such as "emailing someone to ask them questions instead of speculating in chat" and "submitting a handful of transactions and seeing if they show up"

<sigh> No, Radix folks, Jepsen will not be accepting a follow-up engagement with any Radix-related entities. Y'all can stop suggesting there's going to be a follow-up analysis on Xi'an now.

I take my ethical commitments seriously.

They also stressed the importance of end-to-end verification of safety properties, because APIs are how exchanges and users actually interact with DLTs. This is a challenge in traditional databases as well: composition of (e.g.) serializable transactional DBs is nontrivial!

I'm not sure how widespread this understanding is in the DLT space (still looking for a citation for RDX Works's definition) but the researchers I've talked to were unanimous: losing committed transactions *is* a safety error, even if every validator agrees to throw away data.

Since the release I've had the chance to chat with a handful of analysts working specifically on verification of blockchain/cryptocurrency/DLT systems, and can confirm that they also use the usual distsys sense of "safety property"--namely: "something bad does not happen".

Some helpful and much-better-informed comments from @trianglesphere on tendermint/hotstuff latency, including a nicely drawn Lamport diagram.
RT @trianglesphere
@jepsen_io I’m pretty sure it’s 7 delays. 1 to validator, 7 to finalize, 1 from any validator back to the client. By this metric, PBFT/Tendermint is 3. Ignore the new view, but each set of arrows is a hop

RT @trianglesphere
@jepsen_io I'm not at all familiar with Radix DLT, but I've got a bunch of thoughts on consensus algorithms and improving their performance in blockchain/DLTs.
Byzantine consensus slows down DLTs, but the environment is adversarial which slows everything down as well.

Thing is that none of this is even remotely close to saturating disk or network bandwidth. It's a fresh, empty cluster and request volumes are *tiny*, so like... page cache should be able to hold most if not all of this data.

I dunno. Software is a ~rich tapestry~

"Hang on, wasn't Radix slow with COMMIT_NO_SYNC too?"

Yup! That tells us fsync can't be the only factor. All that CPU has to be going somewhere. High but variable system time. I'd also look at 540kBps of inbound network traffic vs 1.9 MBps of disk writes: write amplification?
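For a rough sense of scale, the two throughput figures in that post can be turned into a ratio. This is only a back-of-the-envelope sketch: it assumes both numbers are in bytes per second and that inbound network traffic is a fair proxy for the data the node is asked to persist, neither of which the post confirms.

```python
# Figures quoted in the post above; units assumed to be bytes per second.
inbound_network_bps = 540e3  # 540 kBps of inbound network traffic
disk_write_bps = 1.9e6       # 1.9 MBps of disk writes

# Bytes written to disk for each byte received over the network.
amplification = disk_write_bps / inbound_network_bps
print(f"approx. write amplification: {amplification:.1f}x")  # prints 3.5x
```

A ratio around 3.5x is a hint worth chasing, not a diagnosis: some inbound traffic may never be persisted, and some disk writes may be unrelated to requests.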

If I can leave you with one idea, it's:

DLTs, like any database, are empirically investigable artifacts. You can build, install, and ask one to store some data. See if it comes back like you'd expect. Even simple tests can lead to interesting & exciting results.

Try it out! ❤️
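As a minimal sketch of that "store some data, see if it comes back" idea: the `LossyStore` class and `find_lost_writes` helper below are hypothetical stand-ins invented for illustration, not Jepsen's test harness or any real DLT client. A real test would swap `LossyStore` for a thin wrapper around the system's own API.

```python
from typing import Dict, List, Optional


class LossyStore:
    """Toy stand-in (hypothetical): acknowledges every write, silently drops some."""

    def __init__(self, drop_every: int) -> None:
        self.data: Dict[str, str] = {}
        self.drop_every = drop_every
        self.count = 0

    def write(self, key: str, value: str) -> bool:
        self.count += 1
        if self.count % self.drop_every != 0:
            self.data[key] = value
        return True  # acknowledged whether or not the value was kept

    def read(self, key: str) -> Optional[str]:
        return self.data.get(key)


def find_lost_writes(store: LossyStore, n: int) -> List[str]:
    """Write n key/value pairs, then return keys whose acknowledged value is missing."""
    acked: Dict[str, str] = {}
    for i in range(n):
        key, value = f"k{i}", f"v{i}"
        if store.write(key, value):
            acked[key] = value
    return [k for k, v in acked.items() if store.read(k) != v]


lost = find_lost_writes(LossyStore(drop_every=4), 10)
print(lost)  # ['k3', 'k7']
```

Even a check this small separates "the system said yes" from "the data is actually there", which is the distinction that matters.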

@bonzoesc ahahahaha no, I got *so* many messages about this

This is something I kind of expected DLT & DeFi whitepapers to discuss as a matter of course: What kinds of apps would be insensitive to these costs? Which ones might find it more efficient to keep running on permissioned, centralized networks?

Curious to hear y'all's thoughts!

A single-user Mastodon instance for Jepsen announcements & discussion.