Big Data/Data Analytics Performance and Scalability: Opportunities for Performance Modelling Data Analytics Platforms and Applications
Paul Brebner, Draft V1 February 2016, Draft V2 April 2017
Data Analytics and Performance Modelling
My last 10 years of R&D with NICTA, followed by experience as a start-up CTO, resulted in a software performance modelling technology, which is itself an example of data analytics. The most recent innovation is the ability to automatically build performance models from software monitoring data (lots of it). Our processing pipeline, and our experience with client technologies, includes commercial and open source data analytics tools (e.g. Splunk, Hive, R, Cassandra and the Amazon cloud).
· I have recently applied automatic performance modelling to a client problem (Department of Immigration, Visa Risk System) that involved predicting real-time analytics performance and scalability issues caused by code changes during their DevOps lifecycle.
· Some recently published research concludes that there are potentially big problems, but also significant opportunities, with the performance and scalability of Big Data/Data Analytics. Some problems run up to 10 times slower with different configurations. Some problems just won't run at all on a given infrastructure. Other problems will cost more to solve.
· Apache Spark was motivated by performance modelling of previous approaches (TODO: track down the paper again). Here is a paper on performance modelling of Spark:
http://ieeexplore.ieee.org/document/7336160/
And this one (not modelling) on performance:
https://www.usenix.org/system/files/conference/nsdi15/nsdi15-paper-ousterhout.pdf
· Some previous colleagues and co-authors have published in this area, including Professor Wolfgang Emmerich on Data Analytics for business innovation and Ian Gorton on Data Analytics Architecture.
An example of Performance Modelling for Data Analytics
· Following is a simple example of a data analytics performance model. It shows the impact on total time and cost for a data analytics problem, using performance data borrowed from a publication. The problem is a MapReduce problem with 1TB of input data and 100 cores available, running on a public cloud such as Amazon EC2 at 10c/core/hour.
· It shows the difference between the same job run on Hadoop and then on the Spark architecture (single pass): a speedup of about 20% and a cost reduction of $10 per run. Assuming a typical business use case of 1 run/day, this amounts to a cost saving of $3,650 over a year. Similar modelling may be used to optimise more complex problems and configuration options, including checking whether workloads and architectures will scale up to significantly larger sizes (e.g. What if there is 1 petabyte of data? What if there are 10,000 cores? What is the optimal price/performance?).
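The cost arithmetic behind the Hadoop versus Spark comparison can be sketched in a few lines of Python. This is only an illustration using the figures quoted above (100 cores, 10c/core/hour, $10 saved per run, 1 run/day). The function and constant names are made up for this sketch and are not part of the modelling tool. Amounts are kept in whole cents to avoid floating point rounding.

```python
# Figures quoted in the example; all names here are illustrative assumptions.
CORES = 100                    # cores available to the job
PRICE_CENTS_PER_CORE_HOUR = 10 # "10c/core/hour" on a public cloud
SAVING_CENTS_PER_RUN = 1000    # quoted $10 saving per run (Spark vs Hadoop)
RUNS_PER_YEAR = 365            # assumed 1 run/day


def cluster_cents_per_hour(cores, price_cents_per_core_hour):
    """Cost of keeping the whole cluster busy for one hour, in cents."""
    return cores * price_cents_per_core_hour


def annual_saving_cents(saving_cents_per_run, runs_per_year):
    """Total yearly saving for a fixed per-run saving, in cents."""
    return saving_cents_per_run * runs_per_year


hourly = cluster_cents_per_hour(CORES, PRICE_CENTS_PER_CORE_HOUR)
# Cluster time that the quoted per-run saving corresponds to.
hours_saved_per_run = SAVING_CENTS_PER_RUN / hourly
yearly = annual_saving_cents(SAVING_CENTS_PER_RUN, RUNS_PER_YEAR)

print(f"Cluster cost: ${hourly / 100:.2f}/hour")
print(f"Cluster time saved per run: {hours_saved_per_run:.1f} hour(s)")
print(f"Annual saving: ${yearly / 100:,.2f}")
```

At these prices the whole cluster costs $10 per hour, so the quoted $10 saving per run is equivalent to one hour less of cluster time. Changing `CORES` or `RUNS_PER_YEAR` gives a quick first pass at the "what if" questions, although a real performance model would also have to predict how run time changes with scale.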
· Given that performance models for problems such as this can now be built routinely and automatically, for systems of arbitrary size and complexity, from APM vendor data (e.g. Dynatrace), it would be relatively straightforward to investigate some typical Data Analytics applications and platforms and determine whether they have performance and scalability problems or offer possible optimisations.
· There is also a substantial opportunity to invent a new, better performing, more scalable, more cost-effective and cloud-capable Data Analytics solution by using performance modelling of existing platforms and alternatives to solve specific novel problems, e.g. in the areas of Big Data, IoT and Application/Cloud Performance Management. One example is real-time autonomic systems that consume AWS X-Ray monitoring data and automatically scale AWS services and infrastructure as load and problems change over time.
· AWS still requires a lot of manual configuration and performance engineering, particularly for databases. This is time consuming, error prone, and insufficiently elastic and automatic for a cloud platform, and could potentially be addressed through a combination of monitoring, performance modelling and prediction.
HADOOP Model
The following performance model is for the standard MapReduce Hadoop solution.
SPARK Model
The following performance model is for the Apache Spark version of the same problem (in-memory, single-pass processing).
And the results: