AWS: Time to get practical - running a big simulation in a hurry (trying to)

SAM I AM (Who Does like Green eggs and Ham?!)




Help!  Can you guys run a big simulation on the home gaming computer??? My older son is at uni and studying mechanical engineering (materials?).  I actually find it very interesting that mechanical engineers make use of modelling and simulation extensively. After all, who wants to build something and have it break or cost more than it needs to etc? They are actually substantially more mature than software engineers, given that we found it very difficult to introduce our software performance modelling technology into common practice.

Anyway, my son was trying to run a big simulation on his biggish laptop, but it kept failing after running for hours just before spitting out the "answer" (poorly designed if it doesn't keep track of intermediate results for checkpointing?)

It's an open source program from sam.nrel.gov

The System Advisor Model (SAM) is a performance and financial model designed to facilitate decision making for people involved in the renewable energy industry:

SAM makes performance predictions and cost of energy estimates for grid-connected power projects based on installation and operating costs and system design parameters that you specify as inputs to the model. Projects can be either on the customer side of the utility meter, buying and selling electricity at retail rates, or on the utility side of the meter, selling electricity at a price negotiated through a power purchase agreement (PPA).

It looks pretty cool. We decided to try it on AWS (EC2).

First problem was that it's not obvious whether it scales linearly with increasing cores, what the min/max system requirements are, whether it's been run on AWS EC2 before, or what the most demanding resource is (e.g. CPU, memory, disk). So it's difficult to know what sort of EC2 instance is best.

I used the AWS tutorial to start an instance.

https://aws.amazon.com/getting-started/tutorials/launch-windows-vm/

A number of observations. The console isn't very "GUI" based, it would be nice to have a proper graphical model (e.g. using network relationships between components) to visualise and control your resources. I.e. AWS resources have relationships and dependencies but this is not shown visually in the AWS console/dashboard. 

The price is not a 1st order piece of information when you go to select and launch a VM.

The whole create key pair, get password, decrypt, save to a "secure place", etc. process is more complicated than it needs to be, surely? Couldn't a simple application handle all this?

Ok, I picked an EC2 instance at random (m4.4xlarge) running in Sydney.
m4.4xlarge: 16 vCPU, 53.5 ECU, 64 GiB memory, EBS-only storage, $1.736 per hour

Running some version of Windows server.

It takes a while to start an instance and be able to get the password and logon (5 minutes).

I wasn't sure at this stage if the version of windows server supplied would work with SAM or not.

Starting up IE on the new instance, the 1st problem was getting SAM to download. IE comes locked down and it took some effort to allow files from remote sites to be able to be downloaded. Finally got this to work, downloaded and installed SAM.

Next problem is that I had to give an email address to get the SAM key. I used gmail. Gmail doesn't work with this version of IE, wonderful. Tried downloading Chrome, no luck. Firefox, no luck. Not sure how to get any other useful browser. Bother.

Ended up switching to my laptop and copying the key from gmail into this blog (duh).
This worked ok and I could start the SAM program running.

Next problem is that my son had given me a simulation data file to run via facebook. Again the problem was how to get the link to open in IE to download. Gave up using IE and downloaded the file on my laptop. Tried using Google drive to copy the file to AWS EC2 instances. No luck with that. Silly me, looks like you can actually just use copy/paste using the clipboard from/to windows to EC2, so I got it on the EC2 instance ok. Started the SAM program running (with 13,104 simulations to run) 47 minutes ago and it's finished the initial setup and has started to run the simulations.

Task manager reports:

SAM says 11 processes running
10% avg CPU (20% peak) from 16 virtual processors
60% RAM (40GB)
CPUs are Xeon E5-2686-v4 running at 2.3GHz

So there are a few problems here. The CPU utilisation is well under 100%, so it looks like SAM can't use all the cores. It looks like only 2-4 cores are being used (if that; at first I thought Task Manager doesn't report how many cores I've really got, I guess because I'm stuck in a VM, but it does, you just need to go into details with "Open Resource Monitor"). I didn't need this much memory either. The other annoying thing is that 2.3GHz is pretty slow for CPUs. Our gaming machine and my laptop (ok, it's also a gaming machine) can overclock to 4-5 GHz (8 cores).
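As a rough sanity check on the "2-4 cores" guess, the Task Manager percentages can be turned into an equivalent number of fully busy virtual processors. This is only a back-of-envelope sketch: the function name is mine, and it assumes the percentage is averaged across all 16 virtual processors.

```python
def busy_vcpus(utilisation_pct: float, vcpu_count: int) -> float:
    """Convert whole-machine CPU utilisation (%) into an
    equivalent number of fully busy virtual processors."""
    return utilisation_pct / 100.0 * vcpu_count


# Figures reported by Task Manager on the m4.4xlarge (16 virtual processors).
VCPUS = 16
average = busy_vcpus(10, VCPUS)  # 10% average CPU
peak = busy_vcpus(20, VCPUS)     # 20% peak CPU

print(f"average: about {average:.1f} vCPUs busy")
print(f"peak:    about {peak:.1f} vCPUs busy")
```

So on average fewer than 2 of the 16 virtual processors are doing any work, and only about 3 at peak.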

Xeon E5-2686-v4 cpus are actually pretty fast in terms of multi-core

http://www.cpubenchmark.net/high_end_cpus.html

But pretty slow for single-threaded work (which is close to what I'm getting, bother)

https://www.cpubenchmark.net/singleThread.html


Where do you see how many hours your instances have been running for, and what the current total price is?

Doesn't appear to work during the 1st hour?  I think you have to enable reporting, which I've done, but a message popped up saying it takes 24 hours - why so slooooooooooooow??? Maybe they don't have any spare cpu ha ha.  Why isn't billing enabled by default? I can see how much of the free tier I've used (not much, I just spun a free instance up and shut it down), but not how much of the $50 voucher I've used for the non-free tier, odd.
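Until the billing reports turn up, a rough estimate is easy enough to do by hand. Here's a minimal sketch, assuming every started hour is billed in full (my assumption, check the current EC2 billing rules) and using the $1.736 per hour price quoted above for the m4.4xlarge. The function names and the 2.5 hour run time are made up for illustration.

```python
import math


def estimated_cost(hours_running: float, price_per_hour: float) -> float:
    """Rough on-demand cost, ASSUMING each started hour is billed in full."""
    billed_hours = math.ceil(hours_running)
    return round(billed_hours * price_per_hour, 2)


def voucher_remaining(voucher: float, cost: float) -> float:
    """How much of a credit voucher is left after a given cost (never below 0)."""
    return round(max(voucher - cost, 0.0), 2)


# Hypothetical example: the instance has been up for 2.5 hours.
cost = estimated_cost(2.5, 1.736)
print(f"estimated cost so far: ${cost:.2f}")
print(f"left on the $50 voucher: ${voucher_remaining(50.0, cost):.2f}")
```

This ignores storage, data transfer and anything else that might be charged separately, so treat it as a lower bound.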

Another odd thing: I tried using the AWS Windows EC2 documentation (from the desktop icon) and it doesn't work. It comes up with an error saying JavaScript must be enabled!

SAM has settled down to using 3 threads, 6% CPU average and 10 MB/s disk I/O.
RAM has 100 hard faults per second (which seems high?). Actually they're all from svchost.exe (Windows).
After about 2 hours running, the page faults have jumped to 1,200 per second. I wonder what's going on? sam.exe also keeps getting suspended for a few seconds at a time, something to do with virtualisation? Nope, turns out it had finished the simulation and was just trying to bootstrap itself into self-consciousness or whatever programs do when they have spare time.

Maybe I'll try a memory optimised instance type next time, say this one:

r3.2xlarge: 8 vCPU, 26 ECU, 61 GiB memory, 1 x 160 GB SSD, $1.25 per hour


I noticed another odd thing. In the top right of the Windows desktop there's summary information about the instance.  Under Architecture it says AMD64.  I thought (and Windows thinks) that it's running on Xeons!
Not that I have anything against AMD cpus. I bought an Acer Ferrari in 2004 with AMD cpu, fastest laptop in the world in 2004, but had to have external cooling otherwise it would just freeze after half an hour. It cost 1000 pounds (english $), and it still works :-)

The ferrari specs.

And it was beautiful!

There is a logical explanation. Turns out AMD was the first to have a 64-bit "Intel" architecture!

Note
Amazon EC2 instances run on 64-bit virtual Intel processors as specified in the instance type product pages. For more information about the hardware specifications for each Amazon EC2 instance type, see Amazon EC2 Instances. However, confusion may result from industry naming conventions for 64-bit CPUs. Chip manufacturer Advanced Micro Devices (AMD) introduced the first commercially successful 64-bit architecture based on the Intel x86 instruction set. Consequently, the architecture is widely referred to as AMD64 regardless of the chip manufacturer. Windows and several Linux distributions follow this practice. This explains why the internal system information on an Ubuntu or Windows EC2 instance displays the CPU architecture as AMD64 even though the instances are running on Intel hardware.

PS
Day 2
Trying this instance type:
r4.2xlarge: 8 vCPU, 27 ECU, 61 GiB memory, EBS-only storage, $1.006 per hour




Not as many cores and same memory.
It turns out there are 4 different simulations I have to run. I only ran 1 yesterday. It looks as if each simulation has a different mix of resource usage and run time.  The 1st simulation took about an hour (I wasn't sure when it had actually completed) and used 60% CPU and 40GB RAM, with 11 threads, NO page faults and NO disk activity, which is odd. So it's still only using 2 cores (I guess 1 physical core, which looks like 2 hyperthreads?).

Another observation is that using "shared" (multi-tenancy) cloud instances is a painful experience for anything interactive and computationally demanding. Mouse clicking and typing are painful and NOT interactive most of the time. It's like using a heavily used time-sharing system from the 80's with 100's of students compiling programs at the same time, yuck.

Running 2nd simulation now...

I had a closer look at what SAM is doing. It runs a series of models for 1 year (365 days) with 1 hour resolution.   I think I'm trying to run:

4 x 365 x 24 x 13,104 simulations in total = 459,164,160 = a lot
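For anyone who wants to check the sum, here it is spelled out. The constant names are mine. The numbers are the ones above: 4 runs, a year at 1 hour resolution, and 13,104 simulations per run.

```python
RUNS = 4                      # the 4 different simulations to run
DAYS_PER_YEAR = 365
HOURS_PER_DAY = 24            # 1 hour resolution
SIMULATIONS_PER_RUN = 13_104  # count reported by SAM

hours_per_year = DAYS_PER_YEAR * HOURS_PER_DAY
total = RUNS * hours_per_year * SIMULATIONS_PER_RUN

print(f"hourly steps per simulated year: {hours_per_year:,}")
print(f"total: {total:,}")
```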

The simulations are being run for 2 different locations, Sydney and Oban (fishing village in Scotland!) Cool, we went there for a holiday in 2004 on the way to Mull (famous harbour, too many Midges for camping next to a stream in Spring in Scotland, DON'T)

Here's Oban (it has a folly on the hill)


Each simulation took about an hour and I got the result sent off (turns out this is just step 1 of a 3 step project, so more simulations may be required). What was it all about? It was an evaluation of 2 different solar water heater technologies, one designed for low/medium solar areas and another for medium/high solar areas. We were running simulations to determine the total energy produced from each in the 2 locations and also what time of year the most energy was needed vs. produced. You can get twice as much solar energy in Sydney c.f. Oban.   Also wanted to know the optimal angles to point the solar heater (up/down, compass direction).

Here are the results for both locations.  Naturally you have to point them in different directions depending on which hemisphere you are in!



And this link is a map with different solar radiation data for the planet.


P2S
Silly me, I could have been running multiple EC2 instances concurrently (1 per simulation). However, as I was experimenting it was probably sensible to start simply. Although I wonder how scalable this all is? What if I needed to run 4, 8, 16 or more instances? There must be automation for this. E.g. Create an AMI with the SAM software on it ready to go?  Is it the Launch more like this? Silly name, clone is better.
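For what it's worth, here's roughly what that automation might look like with boto3 (the AWS SDK for Python). It's a sketch I haven't run against a real account. The AMI ID is a made-up placeholder for an image with SAM already installed, the helper names are mine, and a real launch would also need a key pair, security group and so on. `ap-southeast-2` is the Sydney region.

```python
def build_launch_request(ami_id: str, instance_type: str, count: int) -> dict:
    """Build the keyword arguments for EC2 run_instances:
    `count` identical instances started from one AMI."""
    if count < 1:
        raise ValueError("count must be at least 1")
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": count,  # fail rather than launch fewer than asked for
        "MaxCount": count,
    }


def launch(request: dict, region: str = "ap-southeast-2") -> list:
    """Send the request to EC2 and return the new instance IDs.
    Needs boto3 installed and AWS credentials configured."""
    import boto3  # imported here so the request can be built without boto3

    ec2 = boto3.client("ec2", region_name=region)
    response = ec2.run_instances(**request)
    return [instance["InstanceId"] for instance in response["Instances"]]


# Hypothetical AMI ID: replace with an image that has SAM installed.
request = build_launch_request("ami-0123456789abcdef0", "r4.2xlarge", 4)
print(request)
# instance_ids = launch(request)  # uncomment to really start (and pay for) 4 instances
```

Going from 4 to 8 or 16 instances is then just a matter of changing the count, although each instance would still need to be told which simulation to run.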

What sort of simulation throughput did I achieve in total?

459,164,160 / 4 hours (14,400 seconds) = about 31,886 simulations/second
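The same throughput sum as a small function, in case the run times change. It assumes the total of 459,164,160 from above and roughly 4 hours of run time in all (about an hour per simulation). The function name is mine.

```python
def throughput_per_second(total_simulations: int, elapsed_hours: float) -> float:
    """Average simulations completed per second over the whole run."""
    return total_simulations / (elapsed_hours * 3600)


TOTAL_SIMULATIONS = 459_164_160
ELAPSED_HOURS = 4  # about an hour for each of the 4 simulations

rate = throughput_per_second(TOTAL_SIMULATIONS, ELAPSED_HOURS)
print(f"{rate:,.0f} simulations/second")
```

That works out to nearly 32 thousand per second, not 32.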

P3S

Interactive performance on EC2 instances? Yes this can and will be a problem.
Datadog has a long report on EC2 performance, worth a read. They say that:

AWS Guarantees Capacity, Not Performance. EC2 instance types and other services offered by AWS offer guarantees for resource capacity such as compute, memory, disk size, etc. Because of multi-tenancy, AWS offers few guarantees of performance. While you may have the raw capacity promised, these resources may not be running at the performance levels you desire.

Also from AppDynamics:
https://blog.appdynamics.com/product/a-guide-to-performance-challenges-with-aws-ec2-part-1/
https://blog.appdynamics.com/product/a-guide-to-performance-challenges-with-aws-ec2-part-2/


During benchmarking recently I noticed performance issues with smaller EC2 instances (unpredictable, and throttling/bursting) but the larger instances (which I've been using for these sims) didn't have these problems.

Dynatrace also has insights into AWS EC2 performance (both from monitoring other apps on AWS and from running Dynatrace on AWS). Based on my experience with a few APM products, Dynatrace would be the best starting point for application monitoring for AWS applications (most detailed transactional data for every transaction, and the best UEM monitoring, which would tell you how much of the problem is network or browser related).

Also NewRelic.

I'm still keen to try out AWS X-Ray as this may help with monitoring inside AWS services that other APM vendors probably can't see into.


A few opinions on X-Ray:

https://logz.io/blog/aws-xray-new-relic-alternative/
https://www.instana.com/blog/amazon-commoditizes-tracing-new-aws-x-ray/ 

P4S
Running more simulations with SAM today. Turns out not all simulations are the same. This one is using 8 cores and 24 threads on the same size instance (r4.2xlarge) as used last time, but only 4GB RAM.  So an instance with less RAM and more cores would have been better.
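Since each simulation seems to want a different mix of cores and RAM, picking the instance could be turned into a small lookup. This is only a sketch using the three instance types and hourly prices mentioned in this post. The helper names are mine, and the "requirements" are guesses from what Task Manager showed.

```python
from typing import List, NamedTuple, Optional


class InstanceType(NamedTuple):
    name: str
    vcpus: int
    memory_gib: float
    price_per_hour: float


# The three instance types tried or considered in this post.
CANDIDATES = [
    InstanceType("m4.4xlarge", 16, 64, 1.736),
    InstanceType("r3.2xlarge", 8, 61, 1.25),
    InstanceType("r4.2xlarge", 8, 61, 1.006),
]


def cheapest_fit(min_vcpus: int, min_memory_gib: float,
                 candidates: List[InstanceType] = CANDIDATES) -> Optional[InstanceType]:
    """Cheapest candidate with at least the requested vCPUs and memory, or None."""
    fits = [c for c in candidates
            if c.vcpus >= min_vcpus and c.memory_gib >= min_memory_gib]
    return min(fits, key=lambda c: c.price_per_hour, default=None)


# A memory-hungry run: few cores busy, about 40 GB of RAM.
print(cheapest_fit(2, 40).name)
# A core-hungry run: say 16 cores wanted, but only about 4 GB of RAM.
print(cheapest_fit(16, 4).name)
```

With only three candidates this is overkill, but a longer list (including compute optimised types) is where it would start to earn its keep.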

