Chapter 7: Databases and AWS (RDS)

HUGE WHEEL OF DATABASES?!

This is the chapter where I started to feel that I was stuck in some nightmare game show with a spinning wheel and too many databases to choose from.

Chapter 7 covers Relational Databases (2nd cousins? I never did get the hang of distant relations), Data Warehouses (why would you put data in a house?), and NoSQL Databases (this sounds better, as I never did get the hang of SQL). Specifically it covers Amazon RDS (guess what that stands for? Did you guess Radio Data System? Wrong, try again. It's further down the list, although not exactly: it's a Service, not a System):

Acronym	Definition
RDS	Radio Data System
RDS	Royal Dutch Shell
RDS	RMS (Root-Mean-Square) Delay Spread
RDS	Respiratory Distress Syndrome
RDS	Reference Data Set
RDS	Royal Dublin Society (Ireland)
RDS	Red Dot Sight
RDS	Remote Data Service
RDS	Radio Data Service (on FM 57 kHz subcarrier)
RDS	Research Development and Statistics (UK)
RDS	Réseau des Sports (French language all-sports network - Canada)
RDS	ReizdarmSyndrom (German)
RDS	Retiree Drug Subsidy (US DHHS)
RDS	Records Disposition Schedule
RDS	Romania Data Systems
RDS	Retained Duty System (UK Fire & Rescue Service Duty System)
RDS	Rural Development Service (UK)
RDS	Random Dot Stereogram
RDS	Red Dragons (skateboarding)
RDS	Rate Determining Step
RDS	Research Defence Society (UK)
RDS	Rural Development Strategy (World Bank)
RDS	Relational Database Systems

It also covers Amazon Redshift (a data warehouse; it sounds like Warp drive from Star Trek but isn't, it's actually to do with the expansion of the universe), and Amazon DynamoDB (NoSQL; how come all the cool names go to non-relational DBs?).

Redshift (and blueshift; is there a blueshift DB?): some parts of this rotating galaxy are moving away from us (red) and some towards us (blue): http://images.nrao.edu/51

And a Dynamo. Why did Amazon pick Dynamo for NoSQL? Not sure, but maybe because a Dynamo is a DC power generator rather than AC, so it produces "simpler" power. Early electrical engineers didn't know what to do with AC, so they had to find a way of producing DC, and they used a commutator (see http://www.edisontechcenter.org/generators.html). Here's a Dynamo from the Munich science museum (if you have a week spare it's worth a visit, or if you run (literally) you can see it in a day; don't miss the underground history of mining tunnels!):

Amazon Relational Database Service (RDS)

(Some of "Rabbit's Friends-and-Relations")

Relational databases have been around for a while and were the "default" database style for many years (I wrote one in 1982 for a university course). They use a standard Structured Query Language (SQL), and have pre-defined tables (with columns and rows) for storing the data. They are all about persistence on some sort of drive (increasingly SSD, not magnetic), being able to find stuff again, and asking tricky questions. Now SQL is great (if you ignore the actual SQL part), as long as you don't have too much data and don't need high availability etc. It's tricky, but not impossible, to scale relational databases up/out to allow for failures, very high loads, etc. Relational databases were invented and commercialised to provide a high level abstraction over the computer h/w, and they made it easier to find and query data compared to previous attempts (including tree/hierarchical and network/graph based). One of the key differences was how applications could find records. In the earlier navigational databases an application had to do one of the following:

Use of a primary key (typically implemented by hashing)
Navigating relationships (called sets) from one record to another
Scanning all the records in a sequential order
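For comparison, here is a minimal sketch of what those three access paths look like once SQL hides the navigation. It uses Python's built-in sqlite3 module with an in-memory database; the tables and names (customer, orders, Rabbit, Pooh) are made up for illustration and are not from the chapter.

```python
import sqlite3

# In-memory database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer(id),
        item TEXT NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Rabbit'), (2, 'Pooh');
    INSERT INTO orders VALUES (10, 1, 'carrots'), (11, 2, 'honey'), (12, 2, 'more honey');
""")

# 1. Find a record by primary key.
by_key = conn.execute("SELECT name FROM customer WHERE id = ?", (2,)).fetchone()

# 2. Follow a relationship: a join replaces hand-written pointer chasing.
items_for_pooh = [row[0] for row in conn.execute(
    "SELECT o.item FROM orders o JOIN customer c ON c.id = o.customer_id "
    "WHERE c.name = ? ORDER BY o.id", ("Pooh",))]

# 3. Scan every record in a sequential order.
all_names = [row[0] for row in conn.execute("SELECT name FROM customer ORDER BY id")]

print(by_key, items_for_pooh, all_names)
```

In each case the query says what is wanted and the engine decides how to get it (index, hash, or scan).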

Relational databases made this easier: you describe the data you want in SQL and the engine works out how to find it.

Relational databases are typically "ACID":
Relational database operations (e.g. create, read, update, delete, queries) are done in the context of a group called a "transaction". Multiple transactions can be occurring concurrently. The following properties apply to transactions.

Atomicity

Each transaction is "all or nothing": either every operation in it takes effect or none does. You can't get any "half" processed states.
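A small sketch of "all or nothing", again using Python's built-in sqlite3 rather than an actual RDS engine. The account table and the transfer function are invented for the example, and the "crash" is simulated with an exception between the two updates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.execute("INSERT INTO account VALUES ('alice', 100), ('bob', 0)")
conn.commit()

def transfer(conn, src, dst, amount, fail_midway=False):
    """Move money between two accounts as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            if fail_midway:
                raise RuntimeError("simulated crash between the two updates")
            conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
        return True
    except RuntimeError:
        return False

def balances(conn):
    """Return {name: balance} for every account."""
    return dict(conn.execute("SELECT name, balance FROM account"))

failed = transfer(conn, "alice", "bob", 30, fail_midway=True)
after_failure = balances(conn)   # the half-done debit has been rolled back
ok = transfer(conn, "alice", "bob", 30)
after_success = balances(conn)
print(failed, after_failure, ok, after_success)
```

The failed transfer leaves both balances untouched; only the complete transfer changes anything.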

Consistency

The consistency property ensures that any transaction will bring the database from one valid state to another, where "valid" means that all the rules defined on the data (e.g. constraints) still hold. E.g. an update that would break a constraint on a table/column/row is rejected rather than leaving the data in an invalid state.
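In practice, "valid state" usually means that declared rules such as constraints still hold. Here is a minimal sketch with sqlite3, assuming a made-up account table whose balance may never go negative; other engines report constraint violations with their own error types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # the rule to protect
)
conn.execute("INSERT INTO account VALUES ('alice', 100)")
conn.commit()

def withdraw(conn, name, amount):
    """Try to withdraw; return False if the database rejects the change."""
    try:
        with conn:
            conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                         (amount, name))
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed, so the update was not applied.
        return False

accepted = withdraw(conn, "alice", 150)
balance = conn.execute("SELECT balance FROM account WHERE name = 'alice'").fetchone()[0]
print(accepted, balance)
```

The overdraft is refused and the balance stays where it was.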

Isolation

The isolation property ensures that the concurrent execution of transactions results in a system state that would be obtained if transactions were executed sequentially (i.e. one after the other). (This one turned out to be an after-thought).
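Full serial-equivalence is hard to show in a few lines, so this sketch only demonstrates one visible piece of isolation: a second connection does not see another transaction's uncommitted writes. It assumes sqlite3 with a temporary database file (the file name and the note table are made up); real RDS engines offer several isolation levels that behave differently in detail.

```python
import os
import sqlite3
import tempfile

# Two connections to the same database file act as two concurrent clients.
path = os.path.join(tempfile.mkdtemp(), "isolation_demo.db")
writer = sqlite3.connect(path)
reader = sqlite3.connect(path)

writer.execute("CREATE TABLE note (body TEXT)")
writer.commit()

def count(conn):
    """Number of rows this connection can currently see."""
    return conn.execute("SELECT COUNT(*) FROM note").fetchone()[0]

# The INSERT opens a transaction on the writer but does not commit it yet.
writer.execute("INSERT INTO note VALUES ('draft')")
seen_before_commit = count(reader)   # the reader still sees the old state

writer.commit()
seen_after_commit = count(reader)    # now the row is visible

writer.close()
reader.close()
print(seen_before_commit, seen_after_commit)
```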

Durability

The data is persistent, it doesn't go away accidentally. Once a transaction has been "committed" the data-state stays around, even in the event of power loss, crashes, or errors.

The reason Amazon RDS is a "Service" is that it's really just a management interface which allows you to create and manage an actual relational database "instance" running a database engine (vroom vroom). Currently the engines are Amazon Aurora, MySQL, MariaDB, Oracle, SQL Server, and PostgreSQL.

This is really cool as you don't need to worry about which DB engine you are using (except for programming?), and AWS handles all the nasty stuff that a DB admin would have to worry about normally (e.g. scaling, availability, backups, patches, etc).
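To make the "management interface" idea concrete, here is a hedged sketch of asking RDS for an instance from Python. It assumes the boto3 library and working AWS credentials, which this blog has not set up; `create_db_instance` and the parameter names are boto3's, but every value (identifiers, instance class, storage size, user name, password) is a placeholder I made up. The request is built as a plain dict so you can inspect it without touching AWS.

```python
def build_db_instance_request(identifier, engine, multi_az=False):
    """Build keyword arguments for the RDS create_db_instance call.

    All values are illustrative placeholders, not recommendations.
    """
    return {
        "DBInstanceIdentifier": identifier,
        "Engine": engine,                  # e.g. "mysql" or "postgres"
        "DBInstanceClass": "db.t3.micro",  # placeholder instance size
        "AllocatedStorage": 20,            # GiB
        "StorageType": "gp2",              # General Purpose SSD
        "MasterUsername": "dbadmin",
        "MasterUserPassword": "change-me-please",
        "MultiAZ": multi_az,
    }

def create_instance(rds_client, request):
    """Send the request; rds_client is expected to be boto3.client("rds")."""
    return rds_client.create_db_instance(**request)

mysql_request = build_db_instance_request("demo-mysql", "mysql")
postgres_request = build_db_instance_request("demo-postgres", "postgres", multi_az=True)

# Which settings differ between the two requests?
differences = {k for k in mysql_request if mysql_request[k] != postgres_request[k]}
print(sorted(differences))
```

Swapping engines is a one-word change in the request; everything that differs after that is in the SQL dialect and the client driver, which is the "except for programming?" part.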

Some of the engines (e.g. Oracle, Microsoft SQL Server) come in multiple editions that differ in performance, availability (Multi-AZ) options, and encryption.

Amazon even has its own engine: Aurora is architected using SOA and cloud principles: https://aws.amazon.com/rds/aurora/, https://d0.awsstatic.com/whitepapers/getting-started-with-amazon-aurora.pdf

Why would you use Aurora? It's faster for a start (AWS claims up to five times the throughput of standard MySQL), maybe other reasons...

High availability

RDS supports Multi-AZ deployments for disaster recovery. This gives master/slave (primary/standby) instances in different AZs. Data from the master is replicated synchronously to the slave, and the switchover from master to slave upon failure is automatic and takes 1-2 minutes. However, note that this is for disaster recovery only: the slave cannot serve any traffic while the master is in use.
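A 1-2 minute switchover means the application has to cope with connections failing for a while. Because the instance endpoint name stays the same after a failover, the simplest approach is to retry with a growing delay. This is a generic sketch, not an AWS API: `connect` stands in for whatever your database driver's connect call is, and the attempt count and delays are my own guesses chosen to cover roughly 90 seconds.

```python
import time

def connect_with_retry(connect, attempts=8, first_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Call connect() until it succeeds, backing off exponentially.

    With the defaults the waits are 1, 2, 4, 8, 16, 30, 30 seconds
    (about 91 seconds in total) before giving up.
    """
    delay = first_delay
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError:
            if attempt == attempts:
                raise  # out of attempts: let the caller see the failure
            sleep(delay)
            delay = min(delay * 2, max_delay)

# Demo with a fake driver that fails three times, as if mid-failover.
calls = {"n": 0}

def flaky_connect():
    calls["n"] += 1
    if calls["n"] <= 3:
        raise ConnectionError("standby not promoted yet")
    return "connected"

waits = []  # record the delays instead of really sleeping
result = connect_with_retry(flaky_connect, sleep=waits.append)
print(result, waits)
```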

Scalability

It turns out that Amazon RDS is built using Amazon Elastic Block Store (EBS), so you have to make a choice about which storage type you want (Magnetic, General Purpose SSD, or Provisioned IOPS SSD). The thing that worries me about General Purpose SSD is the burst performance (for spikes): once the burst allowance runs out you get throttled, so you need a good understanding of what your load actually looks like (which implies good monitoring and probably modelling).
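Here is the sort of back-of-the-envelope modelling I mean. The numbers are not from the chapter: they are the figures AWS published for General Purpose SSD (gp2) volumes as I understand them (a baseline of 3 IOPS per GiB with a floor of 100, bursts up to 3,000 IOPS, and a full bucket of 5.4 million I/O credits), so treat them as assumptions and check the current documentation. The sketch ignores the upper cap on baseline IOPS for very large volumes.

```python
# Assumed gp2 figures (see the note above); not taken from the chapter.
GP2_IOPS_PER_GIB = 3
GP2_MIN_BASELINE_IOPS = 100
GP2_BURST_IOPS = 3000
GP2_CREDIT_BUCKET = 5_400_000  # I/O credits in a full bucket

def baseline_iops(size_gib):
    """Sustained IOPS a volume of this size gets without bursting."""
    return max(GP2_MIN_BASELINE_IOPS, GP2_IOPS_PER_GIB * size_gib)

def burst_seconds(size_gib, demand_iops=GP2_BURST_IOPS):
    """Seconds a full credit bucket lasts at demand_iops.

    Returns None when demand is at or below baseline, because the
    bucket then never drains. Credits refill at the baseline rate
    while they are being spent at the demand rate.
    """
    demand = min(demand_iops, GP2_BURST_IOPS)
    base = baseline_iops(size_gib)
    if demand <= base:
        return None
    return GP2_CREDIT_BUCKET / (demand - base)

for size in (20, 100, 1000):
    print(size, baseline_iops(size), burst_seconds(size))
```

Under these assumptions a 100 GiB volume can hold a 3,000 IOPS spike for 2,000 seconds (about 33 minutes) before dropping back to its 300 IOPS baseline, while a 1,000 GiB volume never needs to burst at all.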

You can change the storage type of an RDS instance.

Amazon RDS supports "read replicas" for horizontal scaling. Read replicas can live in other zones or regions and are updated asynchronously, so a read from a replica may be slightly behind the master.
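Read replicas only help if the application actually sends its reads to them. A minimal sketch of that routing decision follows; the class, the endpoint names, and the "starts with SELECT" test are all simplifications of mine, not anything RDS provides.

```python
import itertools

class ReadWriteRouter:
    """Pick an endpoint for each statement: writes go to the primary,
    reads are spread round-robin over the replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql, needs_fresh_data=False):
        # Crude classification: anything that is not a SELECT is a write.
        is_read = sql.lstrip().upper().startswith("SELECT")
        # Replicas are updated asynchronously, so reads that must see the
        # latest write are sent to the primary instead.
        if is_read and not needs_fresh_data and self._replicas is not None:
            return next(self._replicas)
        return self.primary

# Hypothetical endpoint names.
router = ReadWriteRouter(
    primary="primary.example.internal",
    replicas=["replica-1.example.internal", "replica-2.example.internal"],
)

routes = [
    router.endpoint_for("SELECT * FROM orders"),
    router.endpoint_for("select * from customer"),
    router.endpoint_for("UPDATE orders SET item = 'honey' WHERE id = 10"),
    router.endpoint_for("SELECT balance FROM account", needs_fresh_data=True),
    router.endpoint_for("SELECT 1"),
]
print(routes)
```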

As usual keep in mind pricing and limitations.

PS
See next blogs for Redshift and DynamoDB

A computer scientist learns Amazon Web Services (AWS)