Who am I? What did a Computer Scientist do during a typical career: Artificial Intelligence, Machine Learning, Data Analytics
Here's the 2nd of my "cover letters" on what I've done during a typical (?) Computer Science career focussing on:
Artificial Intelligence, Machine Learning, Data Analytics
10 years of study in Computer Science, Philosophy, AI and Machine Learning (Waikato University 1980-1985, UNSW 1986-1990). MSc (1st class honours, Waikato).
My thesis was on autonomous paradigm-directed learning in the domain of a "child" learner in a robot block-stacking world (with simulated naïve physics, a robot arm, and blocks of different sizes and shapes which could be stacked up but which would eventually fall down). The software architecture involved a learning algorithm which would attempt to develop, from past experience, a theory consistent with the current "paradigm" (a belief about what was important in the world); a sub-program designed to find the next best experiment to conduct to test that theory (based on the current state of the world, past experiments, the current theory, etc.); a sub-program which would revise the theory based on the results and the current paradigm; and finally another sub-program which was constantly trying to develop better paradigms and compare them until a new "winner" emerged. I recently found my MSc thesis and scanned a few sample pages in:
The program had a high level architecture as follows:
1 Simulation of the robot arm, blocks, and naive block stacking physics. The world had a number of different sized and shaped blocks, the arm could simply pick one up and put it somewhere (e.g. on top of another one), and the naive physics determined if it would stay put or if the whole pile would collapse (for most children this is the fun part). The blocks also had other features such as colour, texture, patterns, etc. None of these features "mattered". What mattered was the size, shape, and location of blocks relative to each other (i.e. a naive centre of gravity). There was no wind or annoying little sisters.
2 Related to (1) there was also a (simplified) graphical representation of the robot arm and the current state of the world. You could watch the arm pick up a block, move it somewhere, drop it, and if the resulting state wasn't stable there would be a "crash" and the blocks in the pile would end up randomly scattered around the "floor".
3 The "brain" had several modules including:
3a Theory formation (inductive concept formation): Based on remembering previous actions and their results, develop a new theory that is consistent with the evidence AND consistent with the current paradigm. If this is not possible then throw away the current paradigm and decide on another one (3b). Note that a theory was a set of causal laws which predicted whether a block would stay put after stacking or not. It involved time (using Allen's temporal logic), actions (what the arm did), and the properties and relationships of the blocks in the current pile (ignoring other blocks from memory).
3b Paradigm-generator: Given (some) memory of what has already happened (paradigms, actions and results, theories, but not complete knowledge as the "paradigm" colours or limits what it thought was interesting and therefore what it remembered), choose another paradigm to try. Note: A paradigm was highly simplified, it was just a set of "properties" and "relationships" that were currently deemed interesting. There were about 10 properties (e.g. colour, size, shape, etc) and relationships for horizontal and vertical block positions.
3c Choose the next action: Based on the current paradigm, the current theory, and memory of the actions and results so far (but only partial information, see 3b), choose an action that maximises the chance of learning something "interesting", either by attempted refutation (best) or confirmation of the theory. The idea that refutation of theories is preferable came from Popper.
3d I think there was another module which kept track of what was happening and provided a stream-of-consciousness explanation (in a text bubble on the screen) so I knew what was going on (this may have been part of the other modules). It could also express "emotions": it was "happy" when an experiment had gone well (i.e. the prediction was confirmed and the blocks either stayed put or crashed depending on the prediction), and "annoyed" when this didn't happen (actually more "puzzled").
Now, if I'd implemented all this as a single monolithic Prolog program it would probably have "worked", but it would have been hard to write and modify. So I wrote it as a series of sub-modules loosely coupled around the above functional divisions. In practice there was communication required between the modules, and data which needed to be shared as well. In the long run I ended up with the largest known Prolog program at the time (10kLOC?), and a working program (that took 3 days of VAX 11/780 time to run through a couple of paradigm shifts and a few dozen actions while everyone else was away on holidays). A rough sketch of the overall control loop follows the diagram below.
And the architecture diagram:
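Looking back at those modules, the overall control loop went roughly as follows. This is a minimal illustrative Java sketch, not the original Prolog: all class, method, and property names are invented, and the real theory-formation, paradigm-generation, and simulation logic is stubbed out.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// A minimal sketch of the paradigm-directed learning loop described above.
// All names are invented for illustration; the original was a (large) Prolog program.
public class ParadigmDirectedLearner {

    // One remembered experiment: what was done, what the theory predicted, what actually happened.
    static class Experiment {
        final String action;
        final boolean predictedStable;
        final boolean actuallyStable;
        Experiment(String action, boolean predictedStable, boolean actuallyStable) {
            this.action = action;
            this.predictedStable = predictedStable;
            this.actuallyStable = actuallyStable;
        }
    }

    // A paradigm is (roughly) the set of properties/relationships currently deemed interesting.
    interface Paradigm { boolean isInteresting(String property); }
    // A theory predicts whether a stacking action will leave the pile stable.
    interface Theory { boolean predictStable(String action); }

    static final Random RANDOM = new Random(42);

    public static void main(String[] args) {
        List<Experiment> memory = new ArrayList<Experiment>();
        Paradigm paradigm = property -> property.equals("colour"); // deliberately "wrong" starting paradigm
        Theory theory = action -> true;                            // naive initial theory: everything stays put

        for (int step = 0; step < 50; step++) {
            // 3c: choose the experiment most likely to refute the current theory (Popper)
            String action = chooseNextAction(paradigm, theory, memory);
            boolean predicted = theory.predictStable(action);
            boolean actual = simulate(action);                     // 1: the naive block-stacking physics
            memory.add(new Experiment(action, predicted, actual));

            if (predicted != actual) {
                // 3a: try to form a theory consistent with the evidence AND the current paradigm
                Theory revised = formTheory(paradigm, memory);
                if (revised == null) {
                    // no consistent theory under this paradigm: paradigm shift (3b)
                    paradigm = generateNewParadigm(memory);
                    theory = formTheory(paradigm, memory);
                } else {
                    theory = revised;
                }
            }
        }
    }

    // Stubs standing in for the real modules.
    static String chooseNextAction(Paradigm p, Theory t, List<Experiment> memory) {
        return "stack(blockA, blockB)";
    }
    static boolean simulate(String action) { return RANDOM.nextBoolean(); }
    static Theory formTheory(Paradigm p, List<Experiment> memory) { return action -> true; }
    static Paradigm generateNewParadigm(List<Experiment> memory) { return property -> property.equals("size"); }
}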
A few more pages including the cartoon robot thinking to itself:
And a cartoon version of the robot doing experiments, thinking aloud, changing theories and paradigms etc. (not actual screen shots). Drawn on the 1st Apple Macintosh computer the department had (it had a mouse! You had to book days in advance for 1/2 hour slots; the IBM PC didn't have any booking sheet ha ha). Notice the change in the robot's "emotions" as it is surprised by things.
Then followed 5 years of PhD study at UNSW (1986-1990) developing new machine learning algorithms for autonomous learning programs in temporal first-order domains. I developed several promising heuristic incremental algorithms for learning first-order Horn clause logic, worked part-time for the Sydney Expert Systems research group, and developed one of the 1st first-order clustering algorithms. Completion of the PhD research was disrupted by the publication in 1988 of the "definitive" algorithm by Muggleton for Inductive Logic Programming.
Several AI ideas from my PhD studies were incorporated into subsequent commercial programs, including: automatic generation of test cases from specifications, automatic generation of protocols from specifications, a temporal logic based multi-media file system (ABC D-Cart), and the use of expert-systems techniques to enhance the usability of a program for soil modelling (e.g. use of constraints to allow user selection of sub-sets of available or desired inputs, models, and outputs for automatic highlighting of possible solutions).
Managed a detailed architectural and technical evaluation of the Open Geospatial Consortium (OGC) Sensor Web Enablement services (CSIRO ICT Centre, 2006/2007), including setting up a testbed and application (in the domain of water resource monitoring and management) for evaluating data analytics services around streaming/sensor/complex event processing pipelines. This also included an evaluation of complementary commercial/open source products including Esper and Coral8 (CEP).
For the last 10 years I've been working on R&D, productization and consulting in the software architecture/performance engineering space of Performance Modelling. Our tool, approach and client problems involve techniques and technologies from model-based systems, data analytics and machine learning. The tool automatically consumes and pre-processes large amounts of client data collected by/from multiple monitoring products, builds models, runs simulations, and graphs the results.
I have solved architectural and performance engineering challenges for clients, including predicting the impact of new analytics "R" models for Immigration visa risk assessments in DevOps pipelines, and providing architectural advice for a financial risk assessment client who was migrating to open source data analytics frameworks and development approaches (e.g. Data Lake, Apache Kafka, Elasticsearch, Greenplum data warehouse, Oracle, blue/green deployments, Kappa architecture, cloud).
The SaaS performance modelling tool consists of a model-driven architecture framework (a meta-model with model-driven transformations: model-to-XML, model-to-model, and model-to-run-time-solver transformations), a custom model-driven run-time discrete event simulation (patented) and analytical solvers, and a GUI including support for model development, validation, network graphing and visualization. More details as follows:
- The technology stack consists of JavaScript (Dojo, D3, React, MobX, NPM, gulp), Grails, Groovy, Java 8, Gradle, MySQL, Cassandra, and Docker.
- For automatic modelling from large quantities of APM data: REST APIs to Dynatrace (XML) and Compass/AppDynamics (JSON), plus Apache Hive and Spark processing of the APM data.
- The Apache Commons Math library for linear and non-linear regression analysis, error margin statistics, and correlation analysis for load-dependent analysis (see the first code sketch after this list).
- Other techniques include generative modelling for exploring long-tail/network-scale effects, Monte Carlo simulation to reduce modelling space sizes, automatic sensitivity analysis, bin-packing algorithms over multiple variables for deployment optimization, and Markov chains for modelling load arrival rate distributions, complex business processes, and user web interactions (see the Markov chain sketch after this list).
- Also conducted prototyping experiments with: spline interpolation and Bayesian inference for extrapolation of data (see the spline sketch after this list), sampling and/or incremental (approximate) approaches to model building, model-free approaches to performance prediction (e.g. data-to-data transformations), a discrete-event-simulation-free solver using Markov models directly (accelerated with GPUs), and sophisticated incremental/dynamic data sampling strategies to build models from the smallest possible sample size given constraints such as enormous data sizes (TB), data from long time periods (e.g. months), or a time limit to build models in (similar in many respects to the autonomous undirected experiment planning above).
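For example, the load-dependent regression and correlation analysis mentioned above comes down to calls like the following. This is a minimal sketch using Apache Commons Math (not our production code), and the data points are invented:

import org.apache.commons.math3.stat.correlation.PearsonsCorrelation;
import org.apache.commons.math3.stat.regression.SimpleRegression;

// Minimal sketch: fit response time as a function of load and report fit quality
// and correlation. The data values here are made up for illustration.
public class LoadDependentFit {
    public static void main(String[] args) {
        double[] load       = { 10, 20, 40, 80, 160, 320 };   // e.g. requests/second from APM data
        double[] responseMs = { 12, 13, 16, 22,  41,  95 };   // measured response times (ms)

        SimpleRegression regression = new SimpleRegression();  // simple linear least-squares fit
        for (int i = 0; i < load.length; i++) {
            regression.addData(load[i], responseMs[i]);
        }

        double correlation = new PearsonsCorrelation().correlation(load, responseMs);

        System.out.printf("responseMs ~= %.3f * load + %.3f%n",
                regression.getSlope(), regression.getIntercept());
        System.out.printf("R^2 = %.3f, slope std err = %.3f, correlation = %.3f%n",
                regression.getRSquare(), regression.getSlopeStdErr(), correlation);
        // In practice a non-linear or piecewise fit is a better match for load-dependent
        // behaviour near saturation; this just shows the kind of library calls involved.
    }
}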
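And here is a minimal sketch of the Markov chain idea applied to user web interactions. The states and transition probabilities are invented for illustration; in the real tool they would be derived from the monitoring data:

import java.util.Random;

// Minimal sketch: a Markov chain of user web interactions, sampled to generate
// one simulated user journey. States and probabilities are illustrative only.
public class UserJourneyMarkovChain {
    static final String[] STATES = { "Home", "Search", "ProductPage", "Checkout", "Exit" };
    // TRANSITION[i][j] = probability of moving from state i to state j (each row sums to 1.0)
    static final double[][] TRANSITION = {
        { 0.10, 0.50, 0.20, 0.00, 0.20 },   // Home
        { 0.05, 0.20, 0.55, 0.00, 0.20 },   // Search
        { 0.05, 0.25, 0.20, 0.30, 0.20 },   // ProductPage
        { 0.02, 0.03, 0.05, 0.00, 0.90 },   // Checkout
        { 0.00, 0.00, 0.00, 0.00, 1.00 }    // Exit (absorbing state)
    };

    public static void main(String[] args) {
        Random random = new Random(42);
        int state = 0;                                  // start at Home
        StringBuilder journey = new StringBuilder(STATES[state]);
        while (!STATES[state].equals("Exit")) {
            state = nextState(state, random);
            journey.append(" -> ").append(STATES[state]);
        }
        System.out.println(journey);                    // one simulated user journey
    }

    // Pick the next state by sampling the transition row for the current state.
    static int nextState(int current, Random random) {
        double r = random.nextDouble(), cumulative = 0.0;
        for (int next = 0; next < TRANSITION[current].length; next++) {
            cumulative += TRANSITION[current][next];
            if (r <= cumulative) return next;
        }
        return TRANSITION[current].length - 1;          // guard against rounding error
    }
}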
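Finally, a minimal sketch of the spline-interpolation prototyping, again with Apache Commons Math and invented sample data (only interpolation within the measured range is shown; extrapolating beyond it was where the Bayesian inference experiments came in):

import org.apache.commons.math3.analysis.interpolation.SplineInterpolator;
import org.apache.commons.math3.analysis.polynomials.PolynomialSplineFunction;

// Minimal sketch: fit a cubic spline through sparse measured points and estimate
// values in between. Sample data is invented for illustration.
public class SparseMetricSpline {
    public static void main(String[] args) {
        double[] load       = { 10, 50, 100, 200, 400 };   // sparse sample of load levels
        double[] cpuPercent = {  5, 18,  33,  61,  92 };   // measured CPU utilisation at each level

        PolynomialSplineFunction spline = new SplineInterpolator().interpolate(load, cpuPercent);

        // Estimate utilisation at load levels we never measured (within the sampled range).
        for (double x : new double[] { 75, 150, 300 }) {
            System.out.printf("estimated CPU at load %.0f = %.1f%%%n", x, spline.value(x));
        }
    }
}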
However, data science is fundamentally multidisciplinary (hence my current revitalised interest in it) and includes elements of AI, machine learning, statistics, data analytics, software engineering, big data, distributed systems, data visualisation, sensor networks, databases, cloud computing, high performance computing (HPC), computer science, and more! E.g.
And this Venn Diagram which includes Business, IT and Data Science skills:
PS
I discovered a cool-looking Java-based data analytics and visualisation tool called DataMelt, but I haven't had time to check it out yet.
Another graphing tool I've used, mainly for prototyping, is Google Sankey diagrams. They support multi-level Sankeys which are great for visualising "flows" over time or over different "components": e.g. for a performance model we have transaction types on the left, software components (composite services and atomic services) in the middle, and "Agents", VMs, and finally physical servers on the right. I think I read about using Sankey diagrams in the context of performance engineering on a Netflix blog, maybe this one?