Chapter 2: Amazon Simple Storage Service (S3) and Amazon Glacier Storage (for storing your beer?)
Chapter 2: Amazon Simple Storage Service (Amazon S3) and Amazon Glacier Storage
The Amazon Glacier Storage Service is designed for cheap long-term archival storage, so you want things to stay put in it for a long time. It is a slightly odd choice of name these days given that global warming is a significant threat to glaciers: they are "shrinking" and large, dangerous chunks are falling off them, as in the above image of the warning signs at a famous New Zealand glacier (Franz Josef Glacier). Maybe the Amazon "Glaciers" will be the only ones still around in a few hundred years.
I initially found it "odd" that the AWS Solutions Architect Certification book introduced storage services before compute services. However, in hindsight this sort of makes sense: S3 is one of the original services, is core to other services, introduces many of the AWS concepts, and is a "managed" service, so it is simpler and more typical in some respects.
Some of my initial notes on S3:
AWS S3 is an object store. You can have multiple "buckets", and each bucket contains multiple objects. It's a sort of key-value store: within a bucket each object is identified by its key, and the value is the data stored. Objects are stored within a single region only but in > 2 locations, so the data can be retrieved even if you lose 2 locations.
It has a REST API built on the standard HTTP verbs: PUT (create or overwrite), GET (retrieve), and DELETE (remove). An object can't be modified in place, so an update is a PUT of the whole object; POST is also accepted, for example for browser form uploads.
All objects have a URL made up of: bucketname.s3.amazonaws.com/keyname
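To make the URL format concrete, here is a minimal sketch that builds one from a bucket name and a key. The function name, bucket, and key are made up for illustration, and percent-encoding the key is my assumption about what a client would want to do with spaces and similar characters.

```python
from urllib.parse import quote


def object_url(bucket: str, key: str) -> str:
    """Build a URL of the form https://<bucket>.s3.amazonaws.com/<key>."""
    # Percent-encode the key but keep "/" so pseudo-folders stay readable.
    return f"https://{bucket}.s3.amazonaws.com/{quote(key, safe='/')}"


# Hypothetical bucket and key, for illustration only.
print(object_url("my-example-bucket", "reports/2017/annual report.pdf"))
```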
Data can be deleted but there is a versioning mechanism.
Some of my initial questions were about cost, speed, and if it's transactional.
There is a potential issue with eventual consistency. S3 is a distributed data store that replicates each object to several locations, and it doesn't give you immediate consistency across all of them. I.e. a read that follows a PUT or DELETE to an existing object may return stale data, but reads of new objects are OK.
How long does eventual consistency take in the worst case? Is there a client-side mechanism that can be used to get around this issue? Consistency is discussed in more detail here and may be region dependent? http://docs.aws.amazon.com/AmazonS3/latest/dev/Introduction.html#ConsistencyMode
Netflix has a potential solution documented here: http://techblog.netflix.com/2014/01/s3mper-consistency-in-cloud.html
And here is a good introduction to internet scale consistency: http://cloudacademy.com/blog/consistency-models-of-amazon-cloud-services/
S3 can be used for static web hosting, which looks like an interesting use case.
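On the question of a client-side mechanism, one simple idea (much simpler than the Netflix approach linked above, and not taken from it) is to remember an identifier for what you just wrote and re-read until you see it. The sketch below assumes a hypothetical `fetch` callable that returns the ETag of the object as currently visible; the retry count and delays are arbitrary.

```python
import time
from typing import Callable, Optional


def read_until_consistent(fetch: Callable[[], Optional[str]],
                          expected_etag: str,
                          attempts: int = 5,
                          base_delay: float = 0.1,
                          sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until fetch() reports the ETag we just wrote, backing off between tries."""
    for attempt in range(attempts):
        if fetch() == expected_etag:
            return True
        if attempt < attempts - 1:
            sleep(base_delay * (2 ** attempt))  # exponential backoff
    return False


# Simulated store that serves a stale ETag twice before catching up.
responses = iter(["old", "old", "new"])
print(read_until_consistent(lambda: next(responses), "new", sleep=lambda s: None))
```

This only bounds how long you wait; it can't tell you what the worst case actually is, so the caller still has to decide what to do when it returns False.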
S3 supports multiple "storage classes". These are tradeoffs between availability, durability, read speed and cost.
S3 supports lifecycle configuration rules to allow automatic transition of objects from one class to another and eventual deletion.
Classes are standard, Infrequent Access (IA), Reduced Redundancy Storage (RRS), and Glacier.
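To get a feel for lifecycle rules, here is a sketch of one rule written as a Python dictionary, in the shape I believe boto3's `put_bucket_lifecycle_configuration` expects, plus a small helper that works out which class an object of a given age would be in. The rule ID, prefix, and day counts are hypothetical, and the helper is only my approximation of the behaviour, not how S3 itself evaluates rules.

```python
# Hypothetical rule: the ID, prefix and day counts are illustrative only.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-old-documents",
            "Filter": {"Prefix": "documents/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 365, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 3650},
        }
    ]
}


def storage_class_at(age_days: int, rule: dict) -> str:
    """Approximate the storage class of an object of the given age under one rule."""
    expiration = rule.get("Expiration", {}).get("Days")
    if expiration is not None and age_days >= expiration:
        return "EXPIRED"
    current = "STANDARD"
    # Apply transitions in order of age; the last one reached wins.
    for transition in sorted(rule.get("Transitions", []), key=lambda t: t["Days"]):
        if age_days >= transition["Days"]:
            current = transition["StorageClass"]
    return current


rule = lifecycle_configuration["Rules"][0]
for age in (10, 30, 400, 4000):
    print(age, storage_class_at(age, rule))
```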
S3 supports encryption of objects in flight and at rest.
S3 supports a multipart upload API, which allows higher-throughput data uploads. This should be used for data > 100MB and must be used for data > 5GB.
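A multipart upload means splitting the object into numbered parts that can be sent in parallel. Here is a minimal sketch of just the splitting step, with no network calls. The function name and the 100 MiB default part size are my own choices; the 5 MiB minimum part size and 10,000 part maximum are the S3 limits as I understand them.

```python
MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB   # S3 minimum for every part except the last
MAX_PARTS = 10_000        # S3 maximum number of parts per upload


def plan_parts(total_size: int, part_size: int = 100 * MIB) -> list:
    """Return (part_number, start_byte, end_byte_exclusive) tuples covering the object."""
    if total_size <= 0:
        raise ValueError("total_size must be positive")
    if part_size < MIN_PART_SIZE:
        raise ValueError("part_size is below the 5 MiB minimum")
    # Grow the part size if the object would otherwise need too many parts.
    while (total_size + part_size - 1) // part_size > MAX_PARTS:
        part_size *= 2
    return [
        (number, start, min(start + part_size, total_size))
        for number, start in enumerate(range(0, total_size, part_size), start=1)
    ]


parts = plan_parts(250 * MIB)
print(len(parts), parts[-1])
```

Each tuple is a byte range that a worker thread could read and upload independently, which is where the higher throughput comes from.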
Cross-region replication is supported allowing async replication of objects from a bucket in one region to one other region. But this only replicates NEW objects for an existing bucket.
S3 also supports logging and event notification (to SNS, SQS and Lambda functions).
S3 supports caching, and indexing in DynamoDB.
Compared to other similar object stores, Cassandra has "tunable" consistency which may give more flexibility.
S3 doesn't appear to support "locking" (i.e. concurrent updates to the same object - latest update wins).
S3 objects have a key, value, version id, metadata (set of key/value pairs), and access control information.
Performance issues? Need to randomise the key using a hash for better performance (this messes up the URL however), and use CloudFront for heavy reads (how is the cache refreshed from S3?)
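Randomising the key could be as simple as prepending a few characters of a hash of the original key, so that keys which would otherwise sort next to each other get spread out. This is a sketch of that idea only; the function name, the choice of MD5, and the 4-character prefix are all assumptions of mine.

```python
import hashlib


def hashed_key(key: str, prefix_len: int = 4) -> str:
    """Prepend a short hash-derived prefix so similar keys no longer sort together."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}/{key}"


# Hypothetical keys that would otherwise share the same "2017/01/" prefix.
for name in ("2017/01/report-001.pdf", "2017/01/report-002.pdf"):
    print(hashed_key(name))
```

The prefix is deterministic, so a client that knows the original key can always recompute the stored key, but the URL is no longer human-guessable.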
More on storage classes:
Infrequent Access (IA) has 99.9% availability (lower than standard's 99.99%), 128KB minimum data object size (Why?), 30 days minimum duration (what's the minimum duration for standard? Can you create lots of objects and delete them after a few minutes and only be charged for the time used?) and a fee per GB data read.
Reduced Redundancy Storage (RRS) provides "400 x disk durability" but you can only lose 1 location (i.e. >= 2 locations). NOTE: RRS appears to be "Redundant" now as the cost is > standard. This illustrates one of the architectural choices with AWS, which is that price is a big factor and it's beyond your control, so it becomes a significant architectural "feature". How much would it cost to move from RRS to IA or back to standard at present?
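If I read the IA minimums above correctly, small or short-lived objects are billed as if they were 128KB and stored for 30 days. A tiny sketch of that rule (the function name is mine, and this is my interpretation rather than AWS's billing logic):

```python
KB = 1024
IA_MIN_OBJECT_BYTES = 128 * KB  # minimum billable object size, from the notes above
IA_MIN_DAYS = 30                # minimum billable duration, from the notes above


def ia_billable(size_bytes: int, days_stored: int) -> tuple:
    """Return (billable_bytes, billable_days) after applying the IA minimums."""
    return max(size_bytes, IA_MIN_OBJECT_BYTES), max(days_stored, IA_MIN_DAYS)


# A 10KB object deleted after one day is still billed at the minimums.
print(ia_billable(10 * KB, 1))
```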
Remember that it's free to move data in and out of the same region. This reduces the cost for applications that are just processing the data all in one region.
My standard question again is how do you architect S3 based applications taking into account Limitations and Pricing?
My suspicion is that there will be trade-offs with pricing, and in practice it may be necessary to build price/usage models and explore sensitivity to ensure that the solution is affordable and doesn't result in "bill shock" if something changes even by a small amount.
For example, an analysis of trade-off between standard and IA pricing: https://www.concurrencylabs.com/blog/save-money-using-s3-infrequent-access/
And here is a blog with some "issues" to watch out for: http://www.aws-simplified.com/aws/aws-s3/the-aws-s3-and-glacier-pitfalls/
Finally a quick look at cost. Over the last 10 years I've worked with lots of clients looking at migrating enterprise systems to the cloud. My approach has been to use APM data to build performance and cost models to explore potential issues before they arise.
One example from a few years ago was a government application for managing workflows across whole of government, focussed on document management. Typically documents are uploaded to the system and then found by other users who edit them and create new documents, and the workflow keeps track of versions and all related documents. Workflows and documents can last for months.
I've created a simplified cost model of a system based on this application deployed to a single S3 region, assuming the number of users doesn't increase but that the number of documents does increase linearly over time. I've looked at 2 alternatives: one uses S3 standard only, the other uses standard + IA (documents are moved to IA after 1 month). Note that the costs are sensitive to a number of factors including read/write ratios, average document size, number of transactions per second, and load growth over time.
Graphs show cost ($/month) for 12 months for the 2 options. Main observations are that the price per month naturally increases as the number of objects grows, that standard + IA is slightly cheaper than standard only, and that management and operations cost is small relative to storage and data transfer (which is the highest percentage).
NOTE: What other costs may be incurred? E.g. Caching, security/encryption? Also note that this is storage only as I haven't looked at running the complete workflow/document management system in AWS (maybe next blog).
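To show the general shape of this kind of model, here is a toy version in Python. It is not the model behind the graphs: every price is a made-up placeholder rather than a real AWS price, request and management costs are ignored, and I've assumed reads are spread evenly across all stored data.

```python
from dataclasses import dataclass


@dataclass
class Prices:
    # Placeholder prices in $ per GB; NOT real AWS prices.
    standard_gb_month: float = 0.025
    ia_gb_month: float = 0.0125
    ia_retrieval_gb: float = 0.01
    transfer_out_gb: float = 0.09


def monthly_costs(months: int, new_gb_per_month: float, read_gb_per_month: float,
                  prices: Prices, use_ia: bool) -> list:
    """Cost per month when stored data grows linearly and the read volume is constant."""
    costs = []
    for month in range(1, months + 1):
        total_gb = new_gb_per_month * month
        recent_gb = new_gb_per_month   # written this month, stays in standard
        old_gb = total_gb - recent_gb  # older than one month
        if use_ia:
            storage = recent_gb * prices.standard_gb_month + old_gb * prices.ia_gb_month
            # Reads are assumed to hit old data in proportion to its share of the total.
            retrieval = read_gb_per_month * (old_gb / total_gb) * prices.ia_retrieval_gb
        else:
            storage = total_gb * prices.standard_gb_month
            retrieval = 0.0
        transfer = read_gb_per_month * prices.transfer_out_gb
        costs.append(round(storage + retrieval + transfer, 2))
    return costs


prices = Prices()
standard_only = monthly_costs(12, 100, 50, prices, use_ia=False)
standard_plus_ia = monthly_costs(12, 100, 50, prices, use_ia=True)
for month, (s, i) in enumerate(zip(standard_only, standard_plus_ia), start=1):
    print(month, s, i)
```

Changing the read volume or the IA retrieval price is a quick way to explore the sensitivity mentioned above, since a read-heavy workload erodes the IA storage saving.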
Postscript - a thought about AWS S3 API complexity and use...
I had a look at the AWS documentation for the S3 API. For the REST API there is 1 operation for the S3 service itself, 49 operations for buckets, and 21 operations for objects, a total of 71 operations.
At one level the API documentation is really just a list of operations and detailed documentation for each. What would be useful (for me anyway) is something like a couple of UML diagrams (class and state diagrams etc) explaining the relationship between entities, operations, and what's valid/invalid to do depending on the state. I would imagine (and looking at some code examples) that it's relatively complicated to program anything useful for S3 on the client side (and how do you keep track of state and your objects and type and contents etc?).
Another way of looking at this is to examine the documentation for a single language client API for S3. I picked JavaScript at random as it's good for client side code development. Guess how long the documentation is? 100 pages? 10,000 words? 200 pages? 20,000 words? Nope, 238 pages and 40,000 words. About PhD thesis size.
Here it is if you have time to read it! http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/S3.html
What's the easiest way of interacting with S3? Possibly using other AWS services. E.g. Lambda and Step Functions perhaps? https://docs.aws.amazon.com/step-functions/latest/dg/tutorial-creating-lambda-state-machine.html
According to the docs S3 is "deeply integrated" with 14+ other AWS services:
"Amazon S3 is deeply integrated with other AWS services to make it easier to build solutions that use a range of AWS services. Integrations include AWS Storage Gateway, Amazon CloudFront, Amazon CloudWatch, Amazon Kinesis, Amazon RDS, Amazon Glacier, Amazon EBS, Amazon DynamoDB, Amazon Redshift, Amazon Route 53, Amazon EMR, Amazon VPC, Amazon KMS, and AWS Lambda."
PPS
What's the difference between a NoSQL database and AWS S3? DynamoDB is the AWS NoSQL database. Here's a blog which explains the differences and how they turned S3 into a NoSQL database: http://www.s3nosql.com.s3.amazonaws.com/infinitedata.html
So here's a weird thing, AWS S3 also actually has SQL with the addition of AWS Athena (sort of, reads only): https://aws.amazon.com/blogs/aws/amazon-athena-interactive-sql-queries-for-data-in-amazon-s3/
Here's a bit more about how it works: https://aws.amazon.com/athena/faqs/
It appears to use Presto, which I thought only (?) worked on Hadoop: https://aws.amazon.com/emr/details/presto/
Presto was developed at Facebook and was also used by Netflix: https://en.wikipedia.org/wiki/Presto_(SQL_query_engine)
It was designed for large scale queries over relational data but not to support full relational database functionality.
What are some use cases? How does it compare to Redshift? https://www.alooma.com/blog/amazon-athena
A big data example: https://aws.amazon.com/blogs/big-data/analyzing-data-in-s3-using-amazon-athena/
And another analysis: http://blog.octo.com/en/i-have-tested-amazon-athena-and-have-gone-ballistic/
And a twitter use case:
http://northbaysolutions.com/blog/athena-beyond-the-basics-part-1/
http://northbaysolutions.com/blog/athena-beyond-the-basics-part-2/
I also wondered which Java persistence frameworks are supported for AWS databases and what an example looks like. Here's one nice example from a Java perspective (using EC2, S3 and RDS):
http://briansjavablog.blogspot.com.au/2016/05/spring-boot-angular-amazon-web-services.html
And yet another product (Heroku, a cloud PaaS) which can interoperate with S3:
https://devcenter.heroku.com/articles/using-amazon-s3-for-file-uploads-with-java-and-play-2
https://devcenter.heroku.com/articles/s3 (which has a warning about doing large file uploads to S3 from a single threaded client programming language).
How is Heroku similar/different to AWS? Here's a recent comparison: https://dzone.com/articles/heroku-or-amazon-web-services-which-is-best-for-your-startup
PPPS - patterns
I realised that you really can't make much sense of single AWS services or combinations without thinking in terms of architectural patterns. Looks like a few people have thought about this before me, including Amazon and the book:
Cloud Computing Patterns: Fundamentals to Design, Build, and Manage Cloud Applications
By: Christoph Fehling, Frank Leymann, Ralph Retter, Walter Schupeck, Peter Arbitter
with associated web site and slides etc:
http://www.cloudcomputingpatterns.org/
https://indico.scc.kit.edu/indico/event/26/session/1/contribution/12/material/slides/0.pdf
P4S
In Australia we call AWS Glacier an "Esky" (in NZ it's a Chilly bin).
A big one (Bondi beach pool), they appear to have the wrong beverage in it!
A Chilly Bin (at Xmas on the beach in NZ)