Split brains and viral epidemics




Split brains and viral epidemics... Sounds nasty, like the product of a mad scientist's crazier experiments!

However, as I had hoped, this presentation is a gold mine.

Under the Covers of AWS: Core Distributed Systems Primitives That Power Our Platform | AWS re:Invent 2014

November 13, 2014 | Las Vegas, NV | Al Vermeulen and Swami Sivasubramanian


Even better, the video: https://www.youtube.com/watch?v=QVvFVwyElLY&feature=youtu.be 


A few highlights:

Unlikely things happen (eventually). I suspect this is what "everything fails, all the time" actually means. This is particularly tricky with distributed systems, as you can't know for certain whether something has failed (for good), whether the failure is just intermittent, whether something in between has failed (e.g. the network), whether it will come back, or whether it has gone mad or rogue.
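
To make the "you can't know for certain" point concrete, here's a minimal sketch (my own illustration, not something from the talk) of a heartbeat-based failure detector in Python. It can only suspect a peer after a timeout; the class name HeartbeatFailureDetector and its timeout parameter are assumptions for illustration. Note that a suspected peer is un-suspected as soon as it is heard from again, because it may only have been slow.

```python
import time
from typing import Callable, Dict, Optional


class HeartbeatFailureDetector:
    """Suspects a peer if no heartbeat arrived within `timeout` seconds.

    A suspicion is only a guess: the peer may be dead, slow, or cut off by
    the network, so a later heartbeat clears the suspicion again.
    """

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout = timeout
        self.clock = clock  # injectable so behaviour can be tested with a fake clock
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, peer: str) -> None:
        # Record when we last heard from this peer.
        self.last_seen[peer] = self.clock()

    def suspected(self, peer: str) -> bool:
        # Peers we have never heard from are suspected as well.
        seen: Optional[float] = self.last_seen.get(peer)
        return seen is None or self.clock() - seen > self.timeout


if __name__ == "__main__":
    now = [0.0]
    fd = HeartbeatFailureDetector(timeout=2.0, clock=lambda: now[0])
    fd.heartbeat("node-b")
    now[0] = 3.0
    print(fd.suspected("node-b"))  # True: silent for 3 seconds
    fd.heartbeat("node-b")         # it was only slow after all
    print(fd.suspected("node-b"))  # False
```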

One of the fundamental problems of distributed systems that Amazon had to solve was group membership: who's in my group, who should be in my group, and what happens when something fails only some of the time (partial failure) and you get a network partition. That last case is the split brain problem, where each side of the partition carries on as if it were the whole group.
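
A common way to stop a split brain is to require a strict majority (a quorum) before a group is allowed to act. Here's a hypothetical Python illustration, not Amazon's implementation: since two disjoint partitions can't both contain more than half the nodes, at most one side keeps going, and with an even split neither side does.

```python
from typing import List, Set


def has_quorum(partition: Set[str], cluster: Set[str]) -> bool:
    """True if `partition` holds a strict majority of the full cluster."""
    return len(partition & cluster) > len(cluster) // 2


def partitions_allowed_to_act(partitions: List[Set[str]], cluster: Set[str]) -> List[Set[str]]:
    # At most one side of a partition can hold a strict majority, so at most
    # one side keeps accepting writes; the others must stop (or go read-only).
    return [p for p in partitions if has_quorum(p, cluster)]


if __name__ == "__main__":
    cluster = {"a", "b", "c", "d", "e"}
    split = [{"a", "b", "c"}, {"d", "e"}]
    print(partitions_allowed_to_act(split, cluster))  # only the a, b, c side
```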

An early medical example of a real "split brain" came from the 1800s, in the case of Phineas Gage, when an "unlikely event" occurred:

On 13th September, 1848, 25-year-old Gage and his crew were working on the Rutland and Burlington Railroad near Cavendish in Vermont. Gage was preparing for an explosion by compacting a bore with explosive powder using a tamping iron. While he was doing this, a spark from the tamping iron ignited the powder, causing the iron to be propelled at high speed straight through Gage’s skull. It entered under the left cheek bone and exited through the top of the head, and was later recovered some 30 yards from the site of the accident.

Astonishingly, he survived for almost 12 years afterwards, but reportedly suffered marked personality changes.

https://neurophilosophy.wordpress.com/2006/12/04/the-incredible-case-of-phineas-gage/ 

The tamping iron and skull, in the Warren Anatomical Museum at Harvard University School of Medicine.


Amazon used well-known protocols to address these issues (flying tamping irons?), including gossip protocols, which are a type of epidemic protocol: information spreads from node to node the way a virus spreads through a population. It turns out to be complicated.
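
To get a feel for why it's called "epidemic", here's a toy push-gossip simulation in Python (an illustrative assumption, not how any AWS service actually gossips; the function name and fanout parameter are hypothetical). Each round, every informed node pushes the rumour to fanout random peers. With a fanout of 1 the informed set can at most double per round, so spreading to n nodes takes at least log2(n) rounds, and typically only a handful more.

```python
import random
from typing import Optional


def push_gossip_rounds(n: int, fanout: int = 1, seed: Optional[int] = None) -> int:
    """Simulate push gossip among n nodes; return rounds until all are informed.

    Each round, every informed node picks `fanout` random peers (possibly
    including itself, which just wastes the push) and tells them the rumour.
    """
    if n < 1 or fanout < 1:
        raise ValueError("need at least one node and a fanout of at least one")
    rng = random.Random(seed)
    informed = {0}  # node 0 starts with the rumour
    rounds = 0
    while len(informed) < n:
        rounds += 1
        newly_informed = set()
        for _node in informed:
            for peer in rng.sample(range(n), k=min(fanout, n)):
                newly_informed.add(peer)
        informed |= newly_informed  # infections become visible next round
    return rounds


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, push_gossip_rounds(n, seed=1))
```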

The Paxos protocol was invented around 1990 (and was initially considered a joke, possibly because the author was trying to prove that the problem didn't have a solution, but then claimed to have found one?). Here's a brief history:

The Paxos algorithm for consensus in a message-passing system was first described by Lamport in 1990 in a tech report that was widely considered to be a joke (see http://research.microsoft.com/users/lamport/pubs/pubs.html#lamport-paxos for Lamport's description of the history). The algorithm was finally published in 1998 in TOCS (Lamport, "The part-time parliament", ACM Transactions on Computer Systems 16(2):133-169, 1998), and after the algorithm continued to be ignored, Lamport finally gave up and translated the results into readable English (Lamport, "Paxos made simple", SIGACT News 32(4):18-25, 2001). It is now understood to be one of the most efficient practical algorithms for achieving consensus in a message-passing system with failure detectors, mechanisms that allow processes to give up on other stalled processes after some amount of time (which can't be done in a normal asynchronous system because giving up can be made to happen immediately by the adversary).
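
For the curious, here's a minimal single-decree Paxos sketch in Python, following the two phases described in "Paxos made simple": prepare/promise, then accept. It's a simplification under loud assumptions: everything is in-memory, a lost network link is modelled by the hypothetical reachable parameter (which acceptors a proposer can talk to), and there are no retries or learners. The step that keeps it safe is that a proposer must adopt the highest-ballot value any acceptor has already accepted, instead of its own.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Sequence, Tuple

Accepted = Optional[Tuple[int, Any]]  # (ballot, value) an acceptor has accepted, if any


@dataclass
class Acceptor:
    promised: int = -1        # highest ballot this acceptor has promised to honour
    accepted: Accepted = None

    def prepare(self, ballot: int) -> Tuple[bool, Accepted]:
        # Phase 1b: promise to ignore lower ballots and report any accepted value.
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def accept(self, ballot: int, value: Any) -> bool:
        # Phase 2b: accept unless a higher ballot has been promised meanwhile.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors: List[Acceptor], ballot: int, value: Any,
            reachable: Optional[Sequence[int]] = None) -> Any:
    """Run one proposer round; return the chosen value, or None if it failed.

    `reachable` (hypothetical) lists the acceptor indices this proposer can
    contact; None means all of them. The majority is over ALL acceptors.
    """
    majority = len(acceptors) // 2 + 1
    peers = acceptors if reachable is None else [acceptors[i] for i in reachable]

    # Phase 1a: ask the reachable acceptors for promises.
    promises = [acc for ok, acc in (a.prepare(ballot) for a in peers) if ok]
    if len(promises) < majority:
        return None

    # Safety rule: if any acceptor already accepted a value, re-propose the one
    # with the highest ballot instead of our own.
    previous = [acc for acc in promises if acc is not None]
    if previous:
        value = max(previous, key=lambda bv: bv[0])[1]

    # Phase 2a: ask the same peers to accept (ballot, value).
    acks = sum(a.accept(ballot, value) for a in peers)
    return value if acks >= majority else None


if __name__ == "__main__":
    acceptors = [Acceptor() for _ in range(3)]
    print(propose(acceptors, ballot=1, value="A"))  # A
    print(propose(acceptors, ballot=2, value="B"))  # A: the earlier choice sticks
```

So a second proposer with a higher ballot ends up re-proposing "A" rather than its own "B", which is why two proposers can't get different values chosen.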

And:

http://lamport.azurewebsites.net/pubs/pubs.html#lamport-paxos

So AWS uses Paxos underneath most of its services. One of the slides in the presentation asserts that a distributed system either is broken, uses a single source of information (i.e. an oracle), or uses Paxos as the underlying protocol.

It would therefore appear that some (all?) of the Amazon services exhibit a fractal pattern. They are either Paxos-based themselves (e.g. DynamoDB), or they use a Paxos-based service as an oracle (e.g. services that use DynamoDB as an oracle; DynamoDB isn't a single point of failure, as it is itself based on distributed Paxos, ha ha). This is cool.
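
Here's roughly what "using a Paxos-based service as an oracle" might look like for leader election, as a hedged Python sketch. ConsistentStore and its compare_and_set method are hypothetical stand-ins for a strongly consistent store (the kind of guarantee a conditional write gives you), not a real AWS API. The lease logic also assumes the nodes' clocks agree closely enough for lease expiry to mean something.

```python
import threading
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ConsistentStore:
    """Hypothetical stand-in for a strongly consistent (e.g. Paxos-backed) store.

    The only assumed primitive is an atomic compare-and-set on one key. Here a
    local lock provides atomicity; a real service would replicate each write.
    """

    def __init__(self) -> None:
        self._data: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def get(self, key: str) -> Any:
        with self._lock:
            return self._data.get(key)

    def compare_and_set(self, key: str, expected: Any, new: Any) -> bool:
        # Write `new` only if the current value is still `expected`.
        with self._lock:
            if self._data.get(key) != expected:
                return False
            self._data[key] = new
            return True


def try_become_leader(store: ConsistentStore, node: str, lease: float,
                      clock: Callable[[], float] = time.time) -> bool:
    """Claim or renew leadership if the lease is ours or has expired.

    Assumes roughly synchronised clocks; the store, not the nodes, decides
    who actually wins when several nodes try at once.
    """
    now = clock()
    current: Optional[Tuple[str, float]] = store.get("leader")
    if current is not None and current[0] != node and current[1] > now:
        return False  # someone else holds a live lease
    # If two nodes race for an expired lease, only one compare_and_set succeeds.
    return store.compare_and_set("leader", current, (node, now + lease))
```

The nodes never have to agree among themselves: each one just asks the oracle, which is exactly the "single source of information" option from the slide.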

There is also a strong design principle that Amazon services are built as higher-level abstractions for creating cloud applications, hiding the underlying (Paxos) complexity. Also cool.

What else was interesting? They are also using transactional journals (oh, based on Paxos of course) as underlying primitives for some services (e.g. Kinesis and DynamoDB Streams).
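
A journal like this can be sketched as an append-only log where an entry only counts once a majority of replicas has stored it. This Python toy is my own illustration, not how Kinesis or DynamoDB Streams work internally; ReplicatedJournal is a hypothetical name, and it assumes a single writer that hands out sequence numbers (in a real system that writer would itself be chosen using consensus).

```python
from typing import Dict, Iterable, List, Optional


class ReplicatedJournal:
    """Toy append-only journal replicated across n in-memory replicas.

    Assumes a single writer assigning sequence numbers in order. A write is
    refused outright unless a majority of replicas is reachable, so every
    stored entry is committed and lives on a majority.
    """

    def __init__(self, n: int) -> None:
        self.replicas: List[Dict[int, str]] = [{} for _ in range(n)]
        self.next_seq = 0

    def append(self, record: str, reachable: Iterable[int]) -> Optional[int]:
        """Store `record` on the reachable replicas; return its sequence number,
        or None if no majority was reachable (nothing is written)."""
        up = set(reachable)
        if len(up) <= len(self.replicas) // 2:
            return None
        seq = self.next_seq
        for i in up:
            self.replicas[i][seq] = record
        self.next_seq += 1
        return seq

    def read_from(self, seq: int) -> List[str]:
        """Return committed records from `seq` onwards, like a stream consumer
        resuming from its last checkpoint."""
        out = []
        for s in range(seq, self.next_seq):
            # Each committed entry is on a majority, so some replica has it.
            out.append(next(r[s] for r in self.replicas if s in r))
        return out
```

read_from plays the role of a consumer that remembers the last sequence number it processed and picks up from there.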

Also EC2

Another explanation of Paxos.

And the original simple explanation.


So I'm still wondering: how does blockchain differ from Paxos? Is blockchain simply a better Paxos?

http://www.finyear.com/Blockchain-distributed-ledgers-and-the-Paxos-protocol_a35554.html
https://medium.com/the-intrepid-review/what-blockchain-should-we-use-6ba9cca8df22

Anyone know? I don't (yet).

PS
And what do other NoSQL and cloud platforms use? Paxos? Something else?

https://www.cse.buffalo.edu/tech-reports/2016-02.pdf

One answer from this paper (there are lots):


https://highlyscalable.wordpress.com/2012/09/18/distributed-algorithms-in-nosql-databases/
Cassandra: https://academy.datastax.com/resources/brief-introduction-apache-cassandra
