AWS SQS, SWF, SNS - Chapter 8



Chapter 8 of the AWS Certified Solutions Architect Official Study Guide (Sybex, 2017) covers the Simple Queue Service (SQS), Simple Workflow Service (SWF), and Simple Notification Service (SNS), i.e. pub-sub, message-oriented middleware (MOM), messaging, notification, etc.

First, notice that they all start with "Simple". There's a reason for this: being Web services based at "internet scale", they all share one thing in common, which is that they are "simpler" than some comparable industry and enterprise standards and technologies. It is therefore likely that some sophisticated enterprise features of existing similar technologies are not present in the AWS "Simple" versions. On the other hand, the AWS versions are managed services that scale horizontally, so there shouldn't be any issues with upper scalability limits.

SQS allows for multiple writers and readers, but (for standard queues, at least) provides at-least-once delivery semantics (so readers must check for duplicates), and the order of message delivery is not guaranteed.
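"Readers must check for duplicates" usually means writing an idempotent consumer. Here is a minimal sketch, assuming every message carries a unique, producer-assigned ID (the `id` field and the `IdempotentConsumer` class are hypothetical, not part of any SQS SDK). In a real system the set of seen IDs would need to be durable and shared by all consumers, e.g. in a database, rather than held in memory.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentConsumer:
    """Does the real work for each message ID at most once, even if the
    queue delivers the same message more than once (at-least-once delivery).

    Hypothetical sketch: assumes each message has a unique "id" field.
    """
    seen_ids: set = field(default_factory=set)
    processed: list = field(default_factory=list)

    def handle(self, message: dict) -> bool:
        """Return True if the message was processed, False if it was a duplicate."""
        msg_id = message["id"]
        if msg_id in self.seen_ids:
            return False  # duplicate delivery: skip the side effect
        self.processed.append(message["body"])  # stand-in for the real work
        self.seen_ids.add(msg_id)  # record only after the work succeeded
        return True


# At-least-once delivery can hand over the same message twice, in any order.
deliveries = [
    {"id": "m2", "body": "ship order 2"},
    {"id": "m1", "body": "ship order 1"},
    {"id": "m2", "body": "ship order 2"},  # redelivered duplicate
]
consumer = IdempotentConsumer()
results = [consumer.handle(m) for m in deliveries]
print(results)             # [True, True, False]
print(consumer.processed)  # ['ship order 2', 'ship order 1']
```

Note that deduplication does not restore ordering: "ship order 2" is still handled before "ship order 1".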

For comparison here's a list of comparable technologies that I've had some experience with (evaluated, used, performance tested, etc) over the last 2 decades.

JMS, Apache ActiveMQ, Atom and the Atom Publishing Protocol, BPEL, Jabber/XMPP, WS-Eventing, WS-Notification, Open Grid Services Architecture (OGSA), Enterprise Service Buses (e.g. Mule, IBM WebSphere ESB), and Open Geospatial Consortium (OGC) Sensor Web Enablement standards, including the Web Notification Service and Sensor Alert Service (http://www.opengeospatial.org/ogc/markets-technologies/swe).

One approach is to look at what other people have written comparing AWS SQS (for example) with similar technologies. For instance, this report covers the top 10 differences between ActiveMQ and Amazon SQS (but is a few years old): https://thedulinreport.com/2015/09/05/top-ten-differences-between-activemq-and-amazon-sqs/


One of the issues with "internet scale" services is that performance (along with more sophisticated enterprise features) is a likely trade-off for extreme scalability: a different internal middleware architecture is required for internet scalability, resources are shared/managed commodity resources, and latency increases due to use of the public internet and a limited number of service locations.

For any Web service based messaging system, one potential issue is how to implement essentially asynchronous protocols on top of the synchronous HTTP protocol. AWS SQS does this by requiring a complete request-response interaction to put each message onto the queue (with guaranteed delivery). A similar problem occurs when getting messages from queues, as polling is required (SQS has a mechanism called "Long Polling" to make this more efficient, but it's still polling).

Putting a message onto the queue can take 20ms, which in MOM terms is an eternity; many enterprise MOMs have sub-microsecond queueing of messages and can handle millions of messages a second end to end. Because a single client thread must wait for each request to complete before sending the next one, 20ms per message limits it to at most 1000 / 20 = 50 messages a second. This is really slow.
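The 50 messages a second figure is just Little's law (throughput = concurrency / latency) applied to a client that blocks on each request. A back-of-the-envelope sketch, assuming a fixed 20ms round trip and no batching (the function names are mine, for illustration only):

```python
import math


def max_throughput(latency_ms: float, threads: int = 1) -> float:
    """Upper bound on messages/second when each client thread blocks on one
    request-response send at a time (Little's law: concurrency / latency)."""
    if latency_ms <= 0 or threads < 1:
        raise ValueError("latency_ms must be > 0 and threads >= 1")
    return threads * 1000.0 / latency_ms


def threads_needed(target_msgs_per_s: float, latency_ms: float) -> int:
    """Blocking client threads needed to reach a target rate at a given latency."""
    return math.ceil(target_msgs_per_s * latency_ms / 1000.0)


print(max_throughput(20))              # 50.0
print(max_throughput(20, threads=10))  # 500.0
print(threads_needed(1_000_000, 20))   # 20000
```

In other words, matching a MOM doing a million messages a second would take on the order of 20,000 concurrent in-flight sends, which is why the workarounds end up on the client side.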

This is a bit like the difference in convenience and speed between quickly posting letters in a post box outside your house vs. having to get in your car, drive to the nearest post office, pay for each letter, register it, and get a tracking number.

However, because SQS scales horizontally, you can increase the number of client threads and/or do something more complex (I guess, such as an async wrapper around the request-response call, with a callback when it eventually returns).
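An async wrapper of this kind can be sketched with a thread pool: each blocking send runs on a worker thread, and a callback fires when it returns. This is a hypothetical illustration, not the AWS SDK: `blocking_send` just sleeps 20ms to stand in for a synchronous SQS send, and `AsyncSender` is my own name.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


def blocking_send(body: str) -> str:
    """Hypothetical stand-in for a synchronous send: ~20 ms round trip, returns an ID."""
    time.sleep(0.020)
    return f"id-{body}"


class AsyncSender:
    """Wraps a blocking send in a thread pool and calls on_done(body, msg_id, error)."""

    def __init__(self, send_fn, max_workers: int = 16):
        self._send_fn = send_fn
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    def send_async(self, body: str, on_done) -> Future:
        future = self._pool.submit(self._send_fn, body)

        def _done(f: Future) -> None:
            # Runs when the request-response call eventually returns (or fails).
            err = f.exception()
            on_done(body, None if err else f.result(), err)

        future.add_done_callback(_done)
        return future

    def close(self) -> None:
        self._pool.shutdown(wait=True)  # waits for in-flight sends and callbacks


acks = []
acks_lock = threading.Lock()


def record(body, msg_id, err):
    with acks_lock:  # callbacks run on worker threads
        acks.append((body, msg_id, err))


sender = AsyncSender(blocking_send, max_workers=16)
for i in range(100):
    sender.send_async(str(i), record)
sender.close()
print(len(acks))  # 100
```

With 16 workers the 100 sends overlap, so the total time is roughly 100 / 16 round trips rather than 100; the price is that the client now has to cope with acknowledgements arriving out of order.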

I also notice that there is a JMS (1.1) client for AWS SQS available (which may already include multi-threading?): https://aws.amazon.com/blogs/aws/new-sqs-client-library-for-java-messaging-service-jms/

This is a good example of the sort of architectural trade-off I'm interested in with cloud/internet scale systems: scalability vs. performance, with the extra complexity of working around performance limitations pushed onto client-side development and frameworks.

Pictorially, this is a bit like a single high-speed train vs. lots of slower steam trains and a shunting yard.

There may be a more detailed discussion of performance/scalability trade-offs in the developer documentation for SQS (maybe they assume that architects don't understand this level of detail?)

Based on experience with the OGC standards (and with some commercial and open source technologies in the sensor event and complex event processing (CEP) stream areas), some issues that I previously encountered are: (1) event loops in more complex pub-sub architectures, e.g. with brokers that have mutual subscriptions to each other, where the same event can be passed around and treated as a new event by multiple brokers, resulting in an ever-increasing event loop/storm (whoops); (2) scalability problems as the rate of event publication increases, resulting in a significant slowdown in event matching and notification (solutions include separate event write and read stores, Cassandra to optimise event writes, and caching to improve the scalability of complex event processing systems that also need to match new events against persisted historical events); and (3) the need for more sophisticated, demand-based event subscriptions (a combination of event logic, time, and the number of notifications to receive).
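One common fix for the event loop/storm problem is to give every event a globally unique ID and have each broker drop IDs it has already handled. A toy sketch of that idea (the `Broker` class and `publish` function are hypothetical, not from any of the products above), using three brokers that all subscribe to each other:

```python
from collections import deque


class Broker:
    """Toy pub-sub broker that forwards every event it handles to its subscribers.

    Assumption: each event carries a globally unique ID, and each broker
    remembers the IDs it has already handled.
    """

    def __init__(self, name: str):
        self.name = name
        self.subscribers = []  # brokers that receive our events
        self.seen = set()      # event IDs already handled here
        self.delivered = []    # events handled (e.g. matched/notified) here

    def subscribe_to(self, other: "Broker") -> None:
        other.subscribers.append(self)


def publish(origin: Broker, event_id: str) -> int:
    """Propagate one event through the broker network; return the number of forwards."""
    pending = deque([origin])
    forwards = 0
    while pending:
        broker = pending.popleft()
        if event_id in broker.seen:
            continue  # already handled: dropping it here is what breaks the loop
        broker.seen.add(event_id)
        broker.delivered.append(event_id)
        for sub in broker.subscribers:
            pending.append(sub)
            forwards += 1
    return forwards


# Three brokers with mutual subscriptions (the loop-prone topology).
brokers = [Broker("A"), Broker("B"), Broker("C")]
for x in brokers:
    for y in brokers:
        if x is not y:
            x.subscribe_to(y)

forwards = publish(brokers[0], "e1")
print(forwards)                        # 6
print([b.delivered for b in brokers])  # [['e1'], ['e1'], ['e1']]
```

Without the `seen` check the `while` loop never terminates, which is the event storm in miniature. The catch is that the seen-ID sets themselves grow with the event rate, so in practice they need expiry, which ties back into issue (2).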

TODO: Is there a good example of an enterprise MOM application migrated to AWS SQS/SNS somewhere?

PS (15 June)
Some questions that come up in the practice tests are about the difference between SQS and SWF, and which of them supports exactly-once delivery.

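SQS now has FIFO queues, which support in-order, exactly-once processing. Duplicates are detected using a message deduplication ID, which can be generated from a SHA-256 hash of the message body (content-based deduplication) and is honoured within a five-minute deduplication interval.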
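To make the deduplication behaviour concrete, here is a toy model of it, not the SQS implementation or API: `FifoDedupQueue` and its `send` method are hypothetical, message groups are ignored, and time is passed in explicitly so the five-minute window is easy to see.

```python
import hashlib
from typing import Optional

DEDUP_INTERVAL_S = 5 * 60  # FIFO queues deduplicate within a five-minute interval


class FifoDedupQueue:
    """Toy model of FIFO-queue deduplication (hypothetical, ignores message groups).

    The dedup ID is either supplied by the sender or, as with content-based
    deduplication, the SHA-256 hash of the body.
    """

    def __init__(self) -> None:
        self._first_seen = {}  # dedup ID -> time it was first accepted
        self.messages = []     # accepted messages, in send order

    def send(self, body: str, now: float, dedup_id: Optional[str] = None) -> bool:
        """Return True if the message was enqueued, False if dropped as a duplicate."""
        if dedup_id is None:
            dedup_id = hashlib.sha256(body.encode("utf-8")).hexdigest()
        first = self._first_seen.get(dedup_id)
        if first is not None and now - first < DEDUP_INTERVAL_S:
            return False  # same dedup ID inside the interval: not enqueued again
        self._first_seen[dedup_id] = now
        self.messages.append(body)
        return True


q = FifoDedupQueue()
results = [
    q.send("order-1", now=0),
    q.send("order-1", now=60),                           # same body hash: duplicate
    q.send("order-2", now=61, dedup_id="o2"),
    q.send("order-2 (retry)", now=90, dedup_id="o2"),    # explicit ID wins over content
    q.send("order-1", now=301),                          # interval has elapsed
]
print(results)     # [True, False, True, False, True]
print(q.messages)  # ['order-1', 'order-2', 'order-1']
```

The last send shows why "exactly once" is bounded: a retry that arrives after the interval is treated as a new message.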

It's harder to work out whether SWF supports this or not. For example, the SWF documentation says:

To ensure that no conflicting decisions are processed, Amazon SWF assigns each decision task to exactly one decider and allows only one decision task at a time to be active in a workflow execution.

Amazon SWF assigns each activity task to exactly one activity worker. Once the task is assigned, no other activity worker can claim or perform that task.
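The "exactly one worker" property in these quotes is essentially an atomic claim. A toy model of it (hypothetical `TaskBoard` class, not the SWF API), with eight threads racing to claim the same task:

```python
import threading
from typing import Optional


class TaskBoard:
    """Toy model of 'each task is assigned to exactly one worker' (not the SWF API).

    Claiming is a compare-and-set under a lock, so once a task is assigned
    no other worker can claim it.
    """

    def __init__(self, task_ids):
        self._lock = threading.Lock()
        self._owner = {t: None for t in task_ids}

    def try_claim(self, task_id: str, worker: str) -> bool:
        with self._lock:
            if self._owner[task_id] is not None:
                return False  # already assigned to another worker
            self._owner[task_id] = worker
            return True

    def owner(self, task_id: str) -> Optional[str]:
        with self._lock:
            return self._owner[task_id]


board = TaskBoard(["t1"])
wins = []
wins_lock = threading.Lock()


def worker(name: str) -> None:
    if board.try_claim("t1", name):
        with wins_lock:
            wins.append(name)


threads = [threading.Thread(target=worker, args=(f"w{i}",)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(wins))  # 1
```

Assignment to exactly one worker is not by itself the same as the work happening exactly once: if the assigned worker dies part-way through, either the task is retried (possibly repeating side effects) or it is abandoned.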

Does this result in exactly-once delivery? Some of the certification questions imply that SWF is the only service that does (at the time they were written), but this is NO LONGER the case.

And FIFO queues have some limitations (given that they introduce a single queue to ensure some of the delivery properties; see this blog).


And this blog has a good comparison and does claim that SWF provides exactly-once delivery (but without any reference to the AWS docs).


