AWS Solution Architect Practice Test - Where did I go wrong? Things to look at more closely


Here are some of the things I was a bit uncertain about in the real practice tests and should follow up on (i.e. why is this the correct answer, how would I work it out next time, etc).

QN 1: AWS Signature V4? What's this? I don't seem to have come across this in the book. (And I checked by searching the Google Books version of the book; it's not there.)

Signing AWS API Requests

When you send HTTP requests to AWS, you sign the requests so that AWS can identify who sent them. You sign requests with your AWS access key, which consists of an access key ID and secret access key. Some requests do not need to be signed, such as anonymous requests to Amazon Simple Storage Service (Amazon S3) and some API operations in AWS Security Token Service (AWS STS) such as AssumeRoleWithWebIdentity.

Signature V4

Signature Version 4 is the process to add authentication information to AWS requests. For security, most requests to AWS must be signed with an access key, which consists of an access key ID and secret access key.
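To make this concrete, here's a minimal sketch of the Signature V4 signing-key derivation (the chain of HMACs the docs describe). The date, region, service, and key values are made-up examples, and the real string to sign is built from a canonical form of the HTTP request:

    import hashlib
    import hmac

    def sign(key: bytes, msg: str) -> bytes:
        # One HMAC-SHA256 step in the SigV4 key-derivation chain.
        return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()

    def derive_signing_key(secret_key: str, date: str, region: str, service: str) -> bytes:
        # Chain HMACs over date, region, service, then the fixed string "aws4_request".
        k_date = sign(("AWS4" + secret_key).encode("utf-8"), date)
        k_region = sign(k_date, region)
        k_service = sign(k_region, service)
        return sign(k_service, "aws4_request")

    # Example values (made up). The real string to sign is derived from a
    # canonical form of the request (see "Signing AWS API Requests").
    signing_key = derive_signing_key("wJalrXUtnFEMIEXAMPLEKEY", "20170915", "us-east-1", "s3")
    string_to_sign = "AWS4-HMAC-SHA256\n20170915T000000Z\n..."
    signature = hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
    print(signature)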

And what about V2??? TODO

QN 2: CIDR: what's the minimum number of IP addresses for a subnet? 8 was one option; obviously too small, as 8 - 5 = 3, which isn't many (5 is the number of addresses AWS reserves in each subnet; see below). In any case, the smallest allowed block is a /28, which gives 16 addresses.

VPC and Subnet Sizing for IPv4

You can assign a single CIDR block to a VPC. The allowed block size is between a /16 netmask and /28 netmask. In other words, the VPC can contain from 16 (i.e. /28) to 65,536 (i.e. /16) IP addresses.
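A quick sanity check of those numbers with Python's ipaddress module (a throwaway sketch):

    import ipaddress

    # Allowed VPC CIDR sizes: /28 (smallest) to /16 (largest).
    print(ipaddress.ip_network("10.0.0.0/28").num_addresses)  # 16
    print(ipaddress.ip_network("10.0.0.0/16").num_addresses)  # 65536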

VPC and Subnet Sizing for IPv6

You can associate a single IPv6 CIDR block with an existing VPC in your account, or when you create a new VPC. The CIDR block uses a fixed prefix length of /56. You cannot choose the range of addresses or the IPv6 CIDR block size; we assign the block to your VPC from Amazon's pool of IPv6 addresses.

I also find this comment amusing in the docs:
There are many tools available to help you calculate subnet CIDR blocks; for example, see http://www.subnet-calculator.com/cidr.php. Also, your network engineering group can help you determine the CIDR blocks to specify for your subnets.
So why is this in the solution arch certification????

QN 3: How many IP addresses does AWS reserve in a subnet? 5 (there's no way to work it out from first principles; just remember it):

The first four IP addresses and the last IP address in each subnet CIDR block are not available for you to use, and cannot be assigned to an instance. For example, in a subnet with CIDR block 10.0.0.0/24, the following five IP addresses are reserved:
  • 10.0.0.0: Network address.
  • 10.0.0.1: Reserved by AWS for the VPC router.
  • 10.0.0.2: Reserved by AWS. The IP address of the DNS server is always the base of the VPC network range plus two; however, we also reserve the base of each subnet range plus two. For more information, see Amazon DNS Server.
  • 10.0.0.3: Reserved by AWS for future use.
  • 10.0.0.255: Network broadcast address. We do not support broadcast in a VPC, therefore we reserve this address.
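This is also easy to sanity-check with the ipaddress module; a sketch that picks out the five reserved addresses and the usable count for a /24:

    import ipaddress

    subnet = ipaddress.ip_network("10.0.0.0/24")
    addresses = list(subnet)  # includes the network and broadcast addresses

    # The first four plus the last address are the five AWS reserves.
    reserved = addresses[:4] + [addresses[-1]]
    print(reserved)                               # 10.0.0.0-10.0.0.3 and 10.0.0.255
    print(subnet.num_addresses - len(reserved))   # 251 usable addresses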

QN 4: AWS Trusted Advisor has how many checks? For all customers there are 4:

Access to the four core Trusted Advisor checks to help increase the security and performance of your environment. Checks include:
  • Security: Security Groups - Specific Ports Unrestricted; IAM Use; MFA on Root Account
  • Performance: Service Limits
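If you want to list the checks programmatically, there's the AWS Support API; note my assumption here that calling it requires a Business or Enterprise support plan, even though the four core checks themselves are free. A boto3 sketch:

    import boto3

    # The Support API lives in us-east-1 and needs a paid support plan.
    support = boto3.client("support", region_name="us-east-1")

    checks = support.describe_trusted_advisor_checks(language="en")["checks"]
    for check in checks:
        print(check["category"], "-", check["name"])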

QN 5: For ELB, what parts of the OSI stack does it cover? 
On the Level
Per the well-known OSI model, load balancers generally run at Layer 4 (transport) or Layer 7 (application).
A Layer 4 load balancer works at the network protocol level and does not look inside of the actual network packets, remaining unaware of the specifics of HTTP and HTTPS. In other words, it balances the load without necessarily knowing a whole lot about it.
A Layer 7 load balancer is more sophisticated and more powerful. It inspects packets, has access to HTTP and HTTPS headers, and (armed with more information) can do a more intelligent job of spreading the load out to the target.

BUT the answer seems to depend on which ELB you are using:
Today we are launching a new Application Load Balancer option for ELB. This option runs at Layer 7 and supports a number of advanced features. The original option (now called a Classic Load Balancer) is still available to you and continues to offer Layer 4 and Layer 7 functionality.

I.e. Application Load Balancer, layer 7 (only?)
Classic Load Balancer, layer 4 and 7.
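The Layer 7 part is easiest to see in the ALB's content-based routing, e.g. a listener rule that routes on the HTTP request path, which a pure Layer 4 balancer can't even see. A boto3 sketch with placeholder ARNs:

    import boto3

    elbv2 = boto3.client("elbv2")

    # Layer 7 rule: inspect the request path and send /api/* to its own
    # target group. The ARNs below are placeholders.
    elbv2.create_rule(
        ListenerArn="arn:aws:elasticloadbalancing:us-east-1:123456789012:listener/app/my-alb/...",
        Priority=10,
        Conditions=[{"Field": "path-pattern", "Values": ["/api/*"]}],
        Actions=[{"Type": "forward",
                  "TargetGroupArn": "arn:aws:elasticloadbalancing:us-east-1:123456789012:targetgroup/api/..."}],
    )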

QN 6: ELB: how does the Proxy Protocol work? Actually I think I knew this; the question confused me because it only talked about the human-readable header! I don't think that's the important concept at all.

Starting today, Elastic Load Balancing (ELB) supports Proxy Protocol version 1. You can now identify the originating IP address of a client connecting to your servers using TCP load balancing. Client connection information, such as IP address and port, is typically lost when requests are proxied through a load balancer. This is because the load balancer sends requests to the server on behalf of the client, making your load balancer appear as though it is the requesting client. Having the originating client IP address is useful if you need more information about visitors to your applications. For example, you may want to gather connection statistics, analyze traffic logs, or manage whitelists of IP addresses.
Until today, ELB allowed you to obtain the client's IP address only if you used HTTP(S) load balancing, which adds this information in the X-Forwarded-For headers. Since X-Forwarded-For is used in HTTP headers only, you could not obtain the client's IP address if the ELB was configured for TCP load balancing. Many of you told us that you wanted similar functionality for TCP traffic, so we added support for Proxy Protocol. It simply prepends a human-readable header with the client's connection information to the TCP data sent to your server. The advantage of Proxy Protocol is that it can be used with any protocol layer above TCP, since it has no knowledge of the higher-level protocol that is used on top of the connection. Proxy Protocol is useful when you are serving non-HTTP traffic. Alternatively, you can use it if you are sending HTTPS requests and do not want to terminate the SSL connection on the load balancer. For more information, please visit the Elastic Load Balancing Guide.
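The header really is just one readable line prepended to the TCP stream, along the lines of PROXY TCP4 198.51.100.22 172.31.16.5 51234 80 followed by CRLF. A minimal parsing sketch (my own code, not AWS's):

    def parse_proxy_protocol_v1(line: bytes):
        # Parses a Proxy Protocol v1 header like
        # b"PROXY TCP4 198.51.100.22 172.31.16.5 51234 80\r\n"
        # and returns the originating client's (ip, port).
        parts = line.rstrip(b"\r\n").split(b" ")
        if parts[0] != b"PROXY" or parts[1] not in (b"TCP4", b"TCP6"):
            raise ValueError("not a Proxy Protocol v1 header")
        src_ip, dst_ip, src_port, dst_port = parts[2:6]
        return src_ip.decode(), int(src_port)

    print(parse_proxy_protocol_v1(b"PROXY TCP4 198.51.100.22 172.31.16.5 51234 80\r\n"))
    # ('198.51.100.22', 51234)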

QN 7: Which AWS services work with multi-AZ???

This is confusing: not all of the services in question have AWS documentation that mentions multi-AZ support (e.g. DynamoDB). TODO. Also note that for RDS databases, multi-AZ means a master/slave instance pair (only one can be in use at a time by users), whereas for Redis and DynamoDB I think the replication is automatic and transparent (should it be called multi-AZ?).

QN 8: What's the difference between S3 storage classes? One question asked how you could reduce costs for S3 storage without reducing durability and still be able to retrieve data quickly (I thought seconds). The "correct" answer was Standard - IA. However, given the drop from 4 nines availability to 3 nines, I don't think this answer is correct. In the worst case it could take about a minute and a half (per day) to retrieve data, as:

The main difference between Standard and Standard - IA is the availability: 4 nines vs 3 nines.
That's the difference between roughly 9 seconds and 86 seconds of unavailability per day, on average.

There are also differences in minimum object size, minimum storage duration, and retrieval costs.
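The arithmetic, for the record (a quick sketch):

    # Downtime implied by an availability figure, per day and per year.
    DAY_SECONDS = 24 * 60 * 60
    YEAR_HOURS = 365 * 24

    for name, availability in [("Standard (4 nines)", 0.9999),
                               ("Standard - IA (3 nines)", 0.999)]:
        down = 1 - availability
        print(f"{name}: {down * DAY_SECONDS:.1f} s/day, {down * YEAR_HOURS:.1f} h/year")

    # Standard (4 nines): 8.6 s/day, 0.9 h/year
    # Standard - IA (3 nines): 86.4 s/day, 8.8 h/year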

Ah, I misread the question; it actually said:

Your company stores documents in Amazon Simple Storage Service (Amazon S3), but it wants to minimize cost. Most documents are used actively for only about a month, then much less frequently. However, all data needs to be available within minutes when requested. How can you meet these requirements?

However, I'm still not convinced that Standard - IA satisfies this, as over a year the unavailability is about 9 hours. There's no guarantee this is spread over the whole year equally, it may be during the period you want to access the data in which case it won't be available in minutes.

I'm also now curious to know why Standard - IA is reduced availability but not durability? I.e. why might it be offline more often than Standard?



QN 9: Another question asked about how data is replicated in RDS from master to read replicas.

One answer is asynchronously (and the one they expected):

When you create a read replica, you specify an existing DB Instance as the source. Amazon RDS takes a snapshot of the source instance and creates a read-only instance from the snapshot. For MySQL, MariaDB and PostgreSQL, Amazon RDS uses those engines' native asynchronous replication to update the read replica whenever there is a change to the source DB instance.

This is interesting: read replicas can be promoted to master:

Read replicas in Amazon RDS for MySQL, MariaDB, and PostgreSQL provide a complementary availability mechanism to Amazon RDS Multi-AZ Deployments. You can promote a read replica if the source DB instance fails. You can also replicate DB instances across AWS Regions as part of your disaster recovery strategy. This functionality complements the synchronous replication, automatic failure detection, and failover provided with Multi-AZ deployments.

And multi-AZ:

Amazon RDS Multi-AZ deployments provide enhanced availability and durability for Database (DB) Instances, making them a natural fit for production database workloads. When you provision a Multi-AZ DB Instance, Amazon RDS automatically creates a primary DB Instance and synchronously replicates the data to a standby instance in a different Availability Zone (AZ). Each AZ runs on its own physically distinct, independent infrastructure, and is engineered to be highly reliable. In case of an infrastructure failure, Amazon RDS performs an automatic failover to the standby (or to a read replica in the case of Amazon Aurora), so that you can resume database operations as soon as the failover is complete. Since the endpoint for your DB Instance remains the same after a failover, your application can resume database operation without the need for manual administrative intervention.

Synchronous replication is used here as you want to be 100% certain that a transaction involving writes has been "saved" to 2 locations before completing, so if the master dies you can immediately swap to the slave.
And finally?
Amazon RDS allows you to use read replicas with Multi-AZ deployments. In Multi-AZ deployments for MySQL, MariaDB, Oracle, SQL Server, and PostgreSQL, the data in your primary DB Instance is synchronously replicated to a standby instance in a different Availability Zone (AZ). Because of their synchronous replication, Multi-AZ deployments for these engines offer greater data durability benefits than do read replicas. (In all Amazon RDS for Aurora deployments, your data is automatically replicated across 3 Availability Zones.)
You can use Multi-AZ deployments and read replicas in conjunction to enjoy the complementary benefits of each. You can simply specify that a given Multi-AZ deployment is the source DB Instance for your Read replicas. That way you gain both the data durability and availability benefits of Multi-AZ deployments and the read scaling benefits of read replicas.
Note that for Multi-AZ deployments, you have the option to create your read replica in an AZ other than that of the primary and the standby for even more redundancy. You can identify the AZ corresponding to your standby by looking at the "Secondary Zone" field of your DB Instance in the AWS Management Console.

So it looks like you can combine Multi-AZ master/slaves and read replicas, but Multi-AZ is always replicated synchronously, and read replicas are always replicated asynchronously.
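In API terms it's a one-liner each way; a boto3 sketch with placeholder instance identifiers:

    import boto3

    rds = boto3.client("rds")

    # Create an asynchronous read replica from an existing (possibly
    # Multi-AZ) source instance. Identifiers are placeholders.
    rds.create_db_instance_read_replica(
        DBInstanceIdentifier="mydb-replica-1",
        SourceDBInstanceIdentifier="mydb-primary",
    )

    # Later, promote the replica to a standalone instance that accepts
    # writes (e.g. as part of a disaster recovery runbook).
    rds.promote_read_replica(DBInstanceIdentifier="mydb-replica-1")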

Another way of thinking about this is related to load on the target servers. For the master/slave replication there is no load on the slave until the master fails, so synchronous updates to the slave will in theory be fast. And you want these to be 100% reliable.

For read replicas on the other hand, there may be an arbitrary number (so async is "obvious"), and some or all of them may be flat out processing read requests. This could delay synchronous updates (and I have seen this happen in a sensor web scenario that used synchronous writes to the read replicas). So again this makes sense: when you have an arbitrary number of things to send a message to, and they may be too busy to process the request instantly, you have to use async messaging (or a broker :-)

QN 10:  My next problem was SQS Delay Queues (and Visibility Timeouts?)
Delay queues let you postpone the delivery of new messages in a queue for the specified number of seconds. If you create a delay queue, any message that you send to that queue is invisible to consumers for the duration of the delay period. You can use the CreateQueue action to create a delay queue by setting the DelaySeconds attribute to any value between 0 and 900 (15 minutes). You can also change an existing queue into a delay queue using the SetQueueAttributes action to set the queue's DelaySeconds attribute.

Is this the same as this???
Amazon SQS Message Timers
Amazon SQS message timers allow you to specify an initial invisibility period for a message that you add to a queue. For example, if you send a message with the DelaySeconds parameter set to 45, the message isn't visible to consumers for the first 45 seconds during which the message stays in the queue. The default value for DelaySeconds is 0.

What's the difference??? As far as I can tell, a delay queue's DelaySeconds attribute sets a default delay for every message sent to the queue, while a message timer sets DelaySeconds on an individual message and overrides the queue default.

And Visibility Timeouts???
Delay queues are similar to visibility timeouts because both features make messages unavailable to consumers for a specific period of time. The difference between delay queues and visibility timeouts is that for delay queues a message is hidden when it's first added to the queue, whereas for visibility timeouts a message is hidden only after it is consumed from the queue.
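A boto3 sketch showing the three knobs side by side (the queue name and values are made up):

    import boto3

    sqs = boto3.client("sqs")

    # Delay queue: every message sent here is hidden for 60 s by default.
    queue_url = sqs.create_queue(
        QueueName="my-delay-queue",
        Attributes={"DelaySeconds": "60"},
    )["QueueUrl"]

    # Message timer: a per-message delay that overrides the queue default
    # (not supported per-message on FIFO queues).
    sqs.send_message(QueueUrl=queue_url, MessageBody="hello", DelaySeconds=45)

    # Visibility timeout: only starts once a consumer receives the message;
    # the message is hidden from other consumers for 30 s while processing.
    messages = sqs.receive_message(QueueUrl=queue_url, VisibilityTimeout=30)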


What are they used for???

Visibility timeout is a workaround for the fact that SQS doesn't guarantee exactly-once delivery like most enterprise queue systems do:

When a consumer receives and processes a message from a queue, the message remains in the queue. Amazon SQS doesn't automatically delete the message: Because it's a distributed system, there is no guarantee that the component will actually receive the message (the connection can break or a component can fail to receive the message). Thus, the consumer must delete the message from the queue after receiving and processing it.
Immediately after the message is received, it remains in the queue. To prevent other consumers from processing the message again, Amazon SQS sets a visibility timeout, a period of time during which Amazon SQS prevents other consuming components from receiving and processing the message.
Note
For standard queues, the visibility timeout isn't a guarantee against receiving a message twice. For more information, see At-Least-Once Delivery.

On the other hand, the Delay Queue concept is not AWS-specific. It can be used for rate limiting, ensuring delayed events are processed in order, and task timers (from gaming?).

And there's one in Java too: DelayQueue (in java.util.concurrent), an implementation of BlockingQueue. Blocking queues in general block the consumer until an element is available (or, in DelayQueue's case, until the head element's delay has expired). Still not 100% sure why and when they were first invented.
Similar to a command queue perhaps! Possibly to wait until a certain number of events, or combination of events, or a "trigger" event has been received before processing all the events in the queue.

Note that this is a relatively sophisticated computer science concept, and it seems odd to be in this certification (particularly with no further explanation of why/when it's useful).

Delay Queue also appears to be a misnomer as it's still at the message level only (not the whole queue - I think).

QN 11: What do you need to delete an SQS message? This was a trick question!

To delete a message, you must send a separate request which acknowledges that you no longer need the message because you've successfully received and processed it.

It just implies you can select a message and delete it. Sort of: this is just for the console, and there's nothing in the console docs that mentions the receipt handle (as it isn't visible at that level!).



DeleteMessage

Deletes the specified message from the specified queue. You specify the message by using the message's receipt handle and not the MessageId you receive when you send the message. Even if the message is locked by another reader due to the visibility timeout setting, it is still deleted from the queue. If you leave a message in the queue for longer than the queue's configured retention period, Amazon SQS automatically deletes the message.
Note
The receipt handle is associated with a specific instance of receiving the message. If you receive a message more than once, the receipt handle you get each time you receive the message is different. If you don't provide the most recently received receipt handle for the message when you use the DeleteMessage action, the request succeeds, but the message might not be deleted.
For standard queues, it is possible to receive a message even after you delete it. This might happen on rare occasions if one of the servers storing a copy of the message is unavailable when you send the request to delete the message. The copy remains on the server and might be returned to you on a subsequent receive request. You should ensure that your application is idempotent, so that receiving a message more than once does not cause issues.

But this is mentioned below the console delete documentation:

Java
To specify the message to delete, provide the receipt handle that Amazon SQS returned when you received the message. You can delete only one message per action. 
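So in SDK terms the flow is receive, process, then delete with the receipt handle; a boto3 sketch (the queue URL and the process() helper are placeholders):

    import boto3

    sqs = boto3.client("sqs")
    queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/my-queue"  # placeholder

    response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)
    for message in response.get("Messages", []):
        process(message["Body"])  # hypothetical processing function

        # Delete using the ReceiptHandle from *this* receive, not the
        # MessageId returned by send_message.
        sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message["ReceiptHandle"])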


QN 12: Some questions and some reference architectures refer to SimpleDB! What's that? It's NOT IN THE BOOK.
It's another NoSQL DB.

Q: How does Amazon DynamoDB differ from Amazon SimpleDB? Which should I use?
Both services are non-relational databases that remove the work of database administration. 
Amazon DynamoDB focuses on providing seamless scalability and fast, predictable performance. It runs on solid state disks (SSDs) for low-latency response times, and there are no limits on the request capacity or storage size for a given table. This is because Amazon DynamoDB automatically partitions your data and workload over a sufficient number of servers to meet the scale requirements you provide. 
In contrast, a domain in Amazon SimpleDB has a strict storage limitation of 10 GB and is limited in the request capacity it can achieve (typically under 25 writes/second); it is up to you to manage the partitioning and re-partitioning of your data over additional SimpleDB tables if you need additional scale.
Perhaps this is why it's not mentioned? Has it been deprecated? Looks like it: blogs suggest it's more expensive, and it no longer appears on the AWS service overviews, e.g. the April 2017 AWS Services overview.

PS
The other somewhat disturbing observation is that when I did one of the practice tests before I started studying for this certification, I got something like 60% correct with no study, just general common sense and several years' high-level architectural knowledge of AWS. I.e. most of the questions were trivial. I can now get about 75% correct, so not much improvement. The majority of the questions are still trivial, and the ones I get wrong require detailed, pedantic knowledge of numbers, network and security terminology, protocols, etc., which apparently I haven't acquired from studying the material yet.

