Chaos Sloths? (or Chaos Sheep?) Move over Chaos Monkeys! A new type of chaos engineering "agent" for performance engineering?

I recently came across this post on Chaos Engineering, and this article, which got me thinking again about cloud complexity. The idea behind both is that current software systems are so complex you just can't test them fully before production; you have to test them in production (and try not to annoy too many paying or advert-clicking customers, or just use Kiwis as canaries).
Has anyone thought of, or tried, doing chaos engineering for performance engineering? I.e. deliberately introducing performance problems into production systems to see how quickly they are detected, and whether the system is self-healing enough to prevent any SLA violations? What would you call this? Chaos Sloths?! Surprisingly, no one appears to have thought of this before! You would need very good APM, baseline performance data (e.g. from Blue/Green deployments), and predictive analytics, etc., to detect problems before they cause major issues (e.g. SLA violations).
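To make the idea a little more concrete, here is a minimal Python sketch of a "Chaos Sloth", assuming you can wrap a request handler in-process. `chaos_sloth`, `sla_at_risk`, the 20% injection rate and the 1.5x p95 tolerance are all hypothetical names and numbers chosen for illustration, not from any chaos engineering tool; a real experiment would more likely inject delays at a proxy or network layer and compare against proper baseline data from your APM.

```python
import random
import statistics
import time
from functools import wraps


def chaos_sloth(probability, min_delay_s, max_delay_s, rng=None, sleep=time.sleep):
    """Hypothetical decorator: with the given probability, delay a call by a
    random amount before running it (a deliberately injected slowdown)."""
    if rng is None:
        rng = random.Random()

    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if rng.random() < probability:
                sleep(rng.uniform(min_delay_s, max_delay_s))
            return func(*args, **kwargs)
        return wrapper
    return decorator


def p95(samples):
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    return statistics.quantiles(samples, n=20)[-1]


def sla_at_risk(baseline, current, tolerance=1.5):
    """Flag a regression when the current p95 exceeds the baseline p95 by more
    than `tolerance` times (an arbitrary threshold chosen for illustration)."""
    return p95(current) > tolerance * p95(baseline)


def measure(func, calls):
    """Return the wall-clock latency of each of `calls` invocations, in seconds."""
    samples = []
    for _ in range(calls):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return samples


def handle_request():
    time.sleep(0.001)  # stand-in for real work


# Slow down roughly 20% of calls by 20-50 ms.
sloth_request = chaos_sloth(0.2, 0.02, 0.05, rng=random.Random(42))(handle_request)

baseline = measure(handle_request, 50)
current = measure(sloth_request, 50)
print("SLA at risk:", sla_at_risk(baseline, current))
```

The `sleep` parameter is only there so the delay can be swapped out (e.g. in tests); the interesting part in production would be the detection side, i.e. how quickly your monitoring notices the p95 drift.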


And why did the sloth cross the road? Because he had a few hours to kill, and lots of traffic to annoy.


In New Zealand we don't have sloths, we have sheep, lots of them (often on the roads). Hint: don't honk your horn; it won't make them go any faster. And there's safety in numbers (for the sheep, that is).


So maybe "Chaos Sheep"?

Oddly, not much has been written about performance monitoring and testing for canary testing.
Here's something from Hawkular (an open source APM?):
An implementation of OpenTracing, a vendor-neutral open standard for distributed tracing.
And a bit more; it seems to be API-based.

We have some experience with this sort of APM approach with a client. Unfortunately it has its limitations compared with agent-based APM (e.g. Dynatrace, AppDynamics, etc.), as all you get is timing data (elapsed time only, not detailed time breakdowns including, say, CPU, IO, suspension, wait, sync, throttled, etc.) from one point in the code to another (where developers have made the correct call), and only if they have bothered to use it at all (i.e. gaps are common). It also doesn't work for UEM (e.g. browsers).
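To illustrate the limitation, here's a rough Python sketch of manual, API-style instrumentation. The `span` helper and `recorded_spans` list are hypothetical stand-ins (this is not the OpenTracing or Hawkular API): they record only wall-clock elapsed time, and only for code a developer explicitly wrapped, so the 20 ms spent in `render_template` appears only as unexplained time inside `handle_request`.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory span recorder: it only sees what developers wrap,
# and it only captures elapsed wall-clock time per span.
recorded_spans = []


@contextmanager
def span(name):
    start = time.perf_counter()
    try:
        yield
    finally:
        recorded_spans.append((name, time.perf_counter() - start))


def query_database():
    time.sleep(0.01)  # instrumented by the developer (see handle_request)


def render_template():
    time.sleep(0.02)  # nobody added a span here: a gap in the trace


def handle_request():
    with span("handle_request"):
        with span("query_database"):
            query_database()
        render_template()


handle_request()
for name, elapsed in recorded_spans:
    print(f"{name}: {elapsed * 1000:.1f} ms")
```

Nothing in the recorded spans says whether those milliseconds were CPU, IO or waiting on a lock; that's the kind of breakdown the agent-based tools give you.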

PS
Perhaps Chaos Sloths for introducing performance delays?
And Chaos Sheep for introducing unexpected extra load?
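And a similarly rough sketch of a "Chaos Sheep", assuming the extra load can be generated from a test client. `chaos_sheep`, the flock size and the two-request `capacity` semaphore (standing in for something like a small connection pool) are all hypothetical; the point is just that latency climbs once the flock exceeds what the service can handle concurrently.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def chaos_sheep(target, flock_size, concurrency):
    """Hypothetical load injector: fire `flock_size` extra calls at `target`
    from `concurrency` worker threads and return each call's latency in seconds."""
    def timed_call(_):
        start = time.perf_counter()
        target()
        return time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return list(pool.map(timed_call, range(flock_size)))


# Stand-in for a service that can only serve two requests at once;
# extra callers queue on the semaphore, which shows up as added latency.
capacity = threading.BoundedSemaphore(2)


def service_call():
    with capacity:
        time.sleep(0.01)


quiet = chaos_sheep(service_call, flock_size=4, concurrency=1)
flock = chaos_sheep(service_call, flock_size=20, concurrency=10)
print(f"worst latency, one at a time: {max(quiet) * 1000:.0f} ms")
print(f"worst latency, flock of 20:   {max(flock) * 1000:.0f} ms")
```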
