Chapter 2: Amazon Simple Storage Service (S3) and Amazon Glacier Storage (for storing your beer?)




The Amazon Glacier storage service is designed for cheap long-term archival storage, so you want things to stay put in it for a long time. It is a slightly odd choice of name these days given that global warming is a significant threat to glaciers: they are shrinking, and large, dangerous chunks are falling off them, as in the image above of the warning signs at a famous New Zealand glacier (Franz Josef Glacier). Maybe the Amazon "Glaciers" will be the only ones still around in a few hundred years.

I initially found it "odd" that the AWS Solutions Architect Certification book introduced storage services before compute services. However, in hindsight this sort of makes sense: S3 is one of the original services, is core to other services, introduces many of the AWS concepts, and is a "managed" service, so it is simpler and more typical in some respects.


Some of my initial notes on S3:

AWS S3 is an object store. You can have multiple "buckets", and each bucket contains multiple objects. It's a sort of key-value store: the bucket name plus the object key identifies the stored value (the object data). Objects are stored within a single region only, but across more than two facilities within that region, so the data can still be retrieved even if you lose two locations.

It has a REST API using standard CRUD semantics: PUT (create/replace), GET (retrieve), DELETE (remove), and POST (mainly for browser-based form uploads) operations.

All objects have a URL made up of: bucketname.s3.amazonaws.com/keyname
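
As a concrete illustration of the bucket/key model and the basic CRUD operations, here is a minimal sketch using the Python SDK (boto3). The bucket and key names are made up for illustration:

```python
import boto3

s3 = boto3.client("s3")

# Create a bucket (names are globally unique; outside us-east-1 you also
# need a CreateBucketConfiguration with a LocationConstraint).
s3.create_bucket(Bucket="my-example-bucket")

# PUT: store an object under a key; the value is just bytes.
s3.put_object(Bucket="my-example-bucket", Key="notes/chapter2.txt",
              Body=b"S3 is a key-value object store")

# GET: retrieve the object by bucket + key.
obj = s3.get_object(Bucket="my-example-bucket", Key="notes/chapter2.txt")
print(obj["Body"].read())

# DELETE: remove the object.
s3.delete_object(Bucket="my-example-bucket", Key="notes/chapter2.txt")
```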

Data can be deleted, but there is an optional per-bucket versioning mechanism that keeps earlier versions of an object.
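
Versioning is off by default; a quick sketch of turning it on with boto3 (bucket name is illustrative):

```python
import boto3

s3 = boto3.client("s3")

# Once enabled, overwrites and deletes keep previous versions of an object
# rather than destroying them.
s3.put_bucket_versioning(
    Bucket="my-example-bucket",
    VersioningConfiguration={"Status": "Enabled"},
)
```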

Some of my initial questions were about cost, speed, and if it's transactional.

There is a potential issue with eventual consistency: as a distributed data store, S3 can't offer both high durability across locations and immediate consistency everywhere. That is, overwrite PUTs and DELETEs of existing objects may return stale data on subsequent reads, but reads of newly created objects are fine (read-after-write consistency for new PUTs).

How long does eventual consistency take in the worst case? Is there a client-side mechanism that can be used to get around this issue? Consistency is discussed in more detail here and may be region dependent: http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyMode
Netflix has a potential solution documented here: http://techblog.netflix.com/2014/01/s3mper-consistency-in-cloud.html
And here is a good introduction to internet-scale consistency: http://cloudacademy.com/blog/consistency-models-of-amazon-cloud-services/
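
As a rough answer to my own question about a client-side mechanism, here is a naive sketch of my own (not an official AWS feature, and not as sophisticated as s3mper): after an overwrite, poll until the object's ETag matches the one returned by the PUT. This narrows the stale-read window but gives no hard guarantee:

```python
import time
import boto3

s3 = boto3.client("s3")

def put_and_wait(bucket, key, body, timeout=10.0):
    """Overwrite an object, then poll until a read sees the new version.

    A naive mitigation for eventual consistency on overwrites; it reduces
    the window for stale reads rather than eliminating it.
    """
    resp = s3.put_object(Bucket=bucket, Key=key, Body=body)
    expected_etag = resp["ETag"]
    deadline = time.time() + timeout
    while time.time() < deadline:
        head = s3.head_object(Bucket=bucket, Key=key)
        if head["ETag"] == expected_etag:
            return True
        time.sleep(0.5)
    return False
```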


S3 can be used for static web hosting; this looks like an interesting use case.
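
A sketch of turning a bucket into a static site with boto3 (bucket name is illustrative; you also need a bucket policy allowing public reads, which I've omitted here):

```python
import boto3

s3 = boto3.client("s3")

# Configure the bucket to serve index/error documents over HTTP.
s3.put_bucket_website(
    Bucket="my-example-site",
    WebsiteConfiguration={
        "IndexDocument": {"Suffix": "index.html"},
        "ErrorDocument": {"Key": "error.html"},
    },
)

# Upload a page with the right content type so browsers render it as HTML.
s3.put_object(
    Bucket="my-example-site",
    Key="index.html",
    Body=b"<html><body>Hello from S3</body></html>",
    ContentType="text/html",
)
```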

S3 supports multiple "storage classes". These are tradeoffs between availability, durability, read speed and cost.

S3 supports lifecycle configuration rules to allow automatic transition of objects from one class to another and eventual deletion.
Classes are standard, Infrequent Access (IA), Reduced Redundancy Storage (RRS), and Glacier.
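
A sketch of a lifecycle configuration that moves objects to IA after 30 days, to Glacier after 90, and expires them after a year (boto3; the bucket name, prefix, and day counts are illustrative):

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-example-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-documents",
                "Filter": {"Prefix": "documents/"},
                "Status": "Enabled",
                # Transition from Standard -> IA -> Glacier as objects age.
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                # Eventual deletion after a year.
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```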

S3 supports encryption of objects in flight and at rest.
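
In flight you simply use the HTTPS endpoints; at rest you can ask S3 to encrypt on write. A minimal sketch using S3-managed keys (SSE-S3, AES-256) with boto3:

```python
import boto3

s3 = boto3.client("s3")

# Ask S3 to encrypt this object at rest with S3-managed keys (SSE-S3).
s3.put_object(
    Bucket="my-example-bucket",
    Key="secrets/report.txt",
    Body=b"example sensitive data",
    ServerSideEncryption="AES256",
)
```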

S3 supports a multipart upload API which enables higher-throughput uploads. It should be used for objects > 100MB and must be used for objects > 5GB.
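
The high-level transfer helper in boto3 switches to multipart automatically once a file crosses a size threshold; a sketch (the threshold, part size, and file/bucket names are just examples):

```python
import boto3
from boto3.s3.transfer import TransferConfig

s3 = boto3.client("s3")

# Use multipart upload for anything over 100MB, in 50MB parts,
# with several parts uploaded in parallel for higher throughput.
config = TransferConfig(
    multipart_threshold=100 * 1024 * 1024,
    multipart_chunksize=50 * 1024 * 1024,
    max_concurrency=8,
)

s3.upload_file("big-dataset.tar.gz", "my-example-bucket",
               "archives/big-dataset.tar.gz", Config=config)
```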

Cross-region replication is supported, allowing asynchronous replication of objects from a bucket in one region to a bucket in one other region. Note that enabling it on an existing bucket only replicates NEW objects; existing objects are not copied automatically.
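
A sketch of enabling replication with boto3. Versioning must already be enabled on both buckets, and the IAM role (ARN is made up here) must allow S3 to replicate on your behalf:

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_replication(
    Bucket="my-source-bucket",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::123456789012:role/s3-replication-role",
        "Rules": [
            {
                "ID": "replicate-everything",
                "Prefix": "",  # empty prefix = all (new) objects
                "Status": "Enabled",
                "Destination": {"Bucket": "arn:aws:s3:::my-destination-bucket"},
            }
        ],
    },
)
```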

S3 also supports logging and event notification (to SNS, SQS, and Lambda functions).
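
For example, a sketch of wiring object-created events to a Lambda function (the function ARN is made up, and the function must already grant S3 permission to invoke it, which I've omitted):

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_notification_configuration(
    Bucket="my-example-bucket",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:process-upload",
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)
```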

S3 content can be cached (e.g. with CloudFront), and object metadata can be indexed in DynamoDB to make objects searchable.

Compared to other similar object stores, Cassandra has "tunable" consistency which may give more flexibility.

S3 doesn't appear to support "locking": concurrent updates to the same object are not coordinated, and the latest update wins.

S3 objects have a key, value, version ID, metadata (a set of key/value pairs), and access control information.

Performance issues? For high request rates you need to randomise keys using a hash prefix for better performance (this messes up readable URLs, however), and use CloudFront for read-heavy workloads (how is the cache refreshed from S3?).
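
A sketch of what that key randomisation looks like in practice: prefix the natural key with a short hash of itself so successive uploads spread across the key space (purely illustrative; the prefix length is arbitrary):

```python
import hashlib

def randomised_key(natural_key: str) -> str:
    """Prefix a key with a short hash of itself to spread load across
    S3 key-space partitions. The downside: URLs are no longer guessable."""
    prefix = hashlib.md5(natural_key.encode("utf-8")).hexdigest()[:4]
    return f"{prefix}/{natural_key}"

print(randomised_key("2017/03/invoices/invoice-000123.pdf"))
# e.g. '3f1a/2017/03/invoices/invoice-000123.pdf' (the hash prefix will differ)
```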


More on storage classes:

Infrequent Access (IA) has 99.9% availability, a 128KB minimum billable object size (why?), a 30-day minimum storage duration (what's the minimum duration for Standard? Can you create lots of objects, delete them after a few minutes, and only be charged for the time used?), and a per-GB fee for data retrieval.

Reduced Redundancy Storage (RRS) provides "400 x disk durability" but can only survive the loss of one location (i.e. data is kept in >= 2 locations). NOTE: RRS appears to be "redundant" now, as its cost is greater than Standard. This illustrates one of the architectural choices with AWS: price is a big factor, it's beyond your control, and it therefore becomes a significant architectural "feature". How much would it cost to move from RRS to IA or back to Standard at present?

Remember that data transfer into S3 is free, and moving data between S3 and other services in the same region is also free. This reduces the cost for applications that process the data all in one region.

My standard question again is how do you architect S3 based applications taking into account Limitations and Pricing?

My suspicion is that there will be trade-offs with pricing, and in practice it may be necessary to build price/usage models and explore their sensitivity to ensure that the solution is affordable and doesn't result in "bill shock" if something changes, even by a small amount.

For example, an analysis of trade-off between standard and IA pricing: https://www.concurrencylabs.com/blog/save-money-using-s3-infrequent-access/

And here is a blog with some "issues" to watch out for: http://www.aws-simplified.com/aws/aws-s3/the-aws-s3-and-glacier-pitfalls/


Finally, a quick look at cost. Over the last 10 years I've worked with lots of clients looking at migrating enterprise systems to the cloud. My approach has been to use APM data to build performance and cost models to explore potential issues before they arise. One example from a few years ago was a whole-of-government application for managing document-centric workflows. Typically documents are uploaded to the system and then found by other users, who edit them and create new documents; the workflow keeps track of versions and all related documents, and workflows and documents can last for months.

I've created a simplified cost model of a system based on this application deployed to a single S3 region, assuming the number of users doesn't increase but the number of documents grows linearly over time. I've looked at two alternatives: one uses S3 Standard only, the other uses Standard + IA (documents are moved to IA after one month). Note that the costs are sensitive to a number of factors, including read/write ratios, average document size, transactions per second, and load growth over time. The graphs show cost ($/month) over 12 months for the two options. The main observations are that the price per month naturally increases as the number of objects grows, that Standard + IA is slightly cheaper than Standard only, and that management and operations cost is small relative to storage and data transfer (which is the largest share).
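
To make the shape of such a model concrete, here is a stripped-down sketch of the Standard vs Standard + IA comparison in Python. All the numbers (prices, document size and volume, retrieval fraction) are assumptions for illustration only, not actual AWS pricing and not the figures from my original model:

```python
# Toy monthly cost model: documents accumulate linearly; in the second
# option, documents older than a month are transitioned to Infrequent Access.
STD_PRICE_GB = 0.023      # $/GB-month, Standard (assumed)
IA_PRICE_GB = 0.0125      # $/GB-month, Infrequent Access (assumed)
IA_RETRIEVAL_GB = 0.01    # $/GB read back from IA (assumed)
DOC_SIZE_GB = 0.005       # average document size in GB (assumed)
NEW_DOCS_PER_MONTH = 100_000
IA_READ_FRACTION = 0.05   # fraction of old documents re-read each month (assumed)

def monthly_costs(months=12):
    for m in range(1, months + 1):
        total_gb = m * NEW_DOCS_PER_MONTH * DOC_SIZE_GB
        new_gb = NEW_DOCS_PER_MONTH * DOC_SIZE_GB      # this month's uploads
        old_gb = total_gb - new_gb                     # older than one month
        standard_only = total_gb * STD_PRICE_GB
        standard_plus_ia = (new_gb * STD_PRICE_GB
                            + old_gb * IA_PRICE_GB
                            + old_gb * IA_READ_FRACTION * IA_RETRIEVAL_GB)
        yield m, standard_only, standard_plus_ia

for month, std, std_ia in monthly_costs():
    print(f"month {month:2d}: standard ${std:8.2f}   standard+IA ${std_ia:8.2f}")
```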

NOTE: What other costs may be incurred? E.g. caching, security/encryption? Also note that this is storage only, as I haven't looked at running the complete workflow/document management system in AWS (maybe in the next blog).






Postscript - a thought about AWS S3 API complexity and use...

I had a look at the AWS documentation for the S3 API. For the REST API there is 1 operation for the S3 service itself, 49 operations for buckets, and 21 operations for objects, a total of 71 operations.
At one level the API documentation is really just a list of operations with detailed documentation for each. What would be useful (for me anyway) is something like a couple of UML diagrams (class and state diagrams, etc.) explaining the relationships between entities and operations, and what's valid/invalid to do depending on the state. I would imagine (and looking at some code examples confirms) that it's relatively complicated to program anything useful for S3 on the client side (how do you keep track of state, your objects, their types and contents, etc.?).

Another way of looking at this is to examine the documentation for a single-language client API for S3. I picked JavaScript at random as it's good for client-side code development. Guess how long the documentation is? 100 pages? 10,000 words? 200 pages? 20,000 words? Nope: 238 pages and 40,000 words. About the size of a PhD thesis.

Here it is if you have time to read it! http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html

What's the easiest way of interacting with S3? Possibly by using other AWS services, e.g. Lambda and Step Functions perhaps? https://docs.aws.amazon.com/step-functions/latest/dg/tutorial-creating-lambda-state-machine.html

According to the docs, S3 is "deeply integrated" with 14+ other AWS services:

"Amazon S3 is deeply integrated with other AWS services to make it easier to build solutions that use a range of AWS services. Integrations include AWS Storage Gateway,Amazon CloudFrontAmazon CloudWatchAmazon KinesisAmazon RDSAmazon GlacierAmazon EBSAmazon DynamoDBAmazon RedshiftAmazon Route 53Amazon EMRAmazon VPCAmazon KMS, and AWS Lambda."

PPS
What's the difference between a NoSQL database and AWS S3?  DynamoDB is the AWS NoSQL database.  Here's a blog which explains the differences and how they turned S3 into a NoSQL database: http://www.s3nosql.com.s3.amazonaws.com/infinitedata.html 

So here's a weird thing: AWS S3 also actually has SQL with the addition of Amazon Athena (sort of; reads only): https://aws.amazon.com/blogs/aws/amazon-athena-interactive-sql-queries-for-data-in-amazon-s3/
Here's a bit more about how it works: https://aws.amazon.com/athena/faqs/ 
It appears to use Presto (which I thought only (?) worked on Hadoop? https://aws.amazon.com/emr/details/presto/ )
Presto was developed at Facebook and was also used by Netflix: https://en.wikipedia.org/wiki/Presto_(SQL_query_engine) 
It was designed for large scale queries over relational data but not to support full relational database functionality.
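
A sketch of what querying S3 data through Athena looks like from boto3. The database, table, and result bucket names are made up, and the table would need to be defined over the S3 data first (e.g. with a CREATE EXTERNAL TABLE statement):

```python
import time
import boto3

athena = boto3.client("athena")

# Kick off a SQL query over data already catalogued in Athena;
# query results are themselves written back to an S3 location.
resp = athena.start_query_execution(
    QueryString="SELECT status, COUNT(*) FROM access_logs GROUP BY status",
    QueryExecutionContext={"Database": "weblogs"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results-bucket/"},
)
query_id = resp["QueryExecutionId"]

# Poll until the query finishes, then fetch the result rows.
while True:
    status = athena.get_query_execution(QueryExecutionId=query_id)
    state = status["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    results = athena.get_query_results(QueryExecutionId=query_id)
    for row in results["ResultSet"]["Rows"]:  # first row is the header
        print([col.get("VarCharValue") for col in row["Data"]])
```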

What are some use cases? How does it compare to Redshift? https://www.alooma.com/blog/amazon-athena
A big data example: https://aws.amazon.com/blogs/big-data/analyzing-data-in-s3-using-amazon-athena/
And another analysis: http://blog.octo.com/en/i-have-tested-amazon-athena-and-have-gone-ballistic/ 
And a twitter use case: 
http://northbaysolutions.com/blog/athena-beyond-the-basics-part-1/
http://northbaysolutions.com/blog/athena-beyond-the-basics-part-2/


I also wondered about which Java persistence frameworks are supported for AWS databases and what an example looks like. Here's one nice example from a Java perspective (using EC2, S3 and RDS):
http://briansjavablog.blogspot.com.au/2016/05/spring-boot-angular-amazon-web-services.html

And yet another product (Heroku, a cloud PaaS) which can interoperate with S3:
https://devcenter.heroku.com/articles/using-amazon-s3-for-file-uploads-with-java-and-play-2
https://devcenter.heroku.com/articles/s3  (which has a warning about doing large file uploads to S3 from a single-threaded client programming language).
How is Heroku similar/different to AWS? Here's a recent comparison: https://dzone.com/articles/heroku-or-amazon-web-services-which-is-best-for-your-startup


PPPS - patterns

I realised that you really can't make much sense of individual AWS services or combinations of them without thinking in terms of architectural patterns. It looks like a few people have thought about this before me, including Amazon and the book:


Cloud Computing Patterns
Fundamentals to Design, Build, and Manage Cloud Applications
By: Christoph Fehling, Frank Leymann, Ralph Retter, Walter Schupeck, Peter Arbitter

with associated web site and slides etc:

http://www.cloudcomputingpatterns.org/
https://indico.scc.kit.edu/indico/event/26/session/1/contribution/12/material/slides/0.pdf 

P4S
In Australia we call AWS Glacier an "Esky" (in NZ it's a Chilly bin).

A big one (Bondi beach pool), they appear to have the wrong beverage in it!


A Chilly Bin (at Xmas on the beach in NZ)

