A Data Bog (something I appear to have invented accidentally)

A B(l)og Monster

In my blog on Amazon Redshift I got sidetracked by Data Lakes and Data Bogs; here's the extract.

Another service is Redshift Spectrum, which supports SQL queries against S3 "Data Lakes". The Data Lake concept worries me slightly, as without proper drainage and flow a Lake can very quickly end up as a putrid Bog, complete with a (dancing) Bog (Blog?) Monster!



It appears that I've accidentally invented the term "Data Bog", or Databog (which I've been using for at least a year or two, after hearing about Data Lakes while working on a client project). If you do a Google search, the word "Databoge" comes up, which is apparently German for "Databook". Now that I've "invented" it, I guess I should try to define it eventually.

First, Data Lake is a now common term for a body of data which isn't as organised or structured as other types of data storage architectures, but which is more organised than a "Sea of Data" (given that according to Revelation there won't be any Seas in heaven, only Rivers and Lakes of Fire). It has metadata and is structured upon consumption by applications (I think).

Also possible (with references) are "Data Puddles" (isolated splashes of data, not connected or well organised?), Data Buckets (think S3), Data Ponds (holding ponds for data as it enters the Data Lake?), Data Droplets (from Semantic Web terminology?), Data Streams (that term has been around for a while), and Data Swamps (what you get if you don't keep track of where your data comes from in a Data Lake).

Some of these terms refer to Wetlands and include Marsh, Swamp, Bog, Fen. The categories of Wetlands in Australia are more complex.  The main feature of a Bog appears to be that it fills with rainwater, and has no outlet (it only drains through the soil).

What sort of things do you find in Bogs? According to a children's poem and book, a dog and a frog and a hog are all on a log on a bog, but then end up in the bog (except the log, I guess).

I had some first-hand experience of swamps and bogs on my honeymoon. We went bushwalking (with tents and 7 days' worth of food) along the East coast track on Hinchinbrook Island. On the first night we encountered a bog which we had to cross to get to the camp site. I crossed it 5 times as my wife "couldn't see the bottom" and was afraid something would eat her. She almost ended up in the bog, as by the final trip across (after first carrying my pack, then her pack, then finally her) I was ready to "accidentally" drop her in! And then the same again the next day (to get back onto the main track, making a total of 10 times). A few days later we had a choice between crossing a narrow-looking creek (but with a warning sign) or spending an extra day going around it via a swamp in order to avoid potential crocs in the creek (we went around).



What did I learn from this? That bogs are unpleasant but won't kill you, whereas swamps may have nasty bitey things (e.g. crocs).

Bogs are also known to preserve things well, for example Bog people (bodies, mummies) have been found in Bogs, often 1000s of years after they fell (or were pushed?) in.


Even odder is the idea (from above link) that:

Bogs were both resources and possibly ominous supernatural portals to/from other worlds.


It appears that gifts (even offerings of people?) may have been made in exchange for the valuable resource of peat obtained from bogs. Maybe this suggests a possible definition for "Data Bog"?!


Definition 1: A Data Bog is a data architecture for both querying and storing results, and the data/results are preserved but potentially transformed over a long period of time (what you get out over time isn't necessarily the same as what you put in?!)

Or:

Definition 2: A Data Bog is a data architecture that requires offerings of data to be put in, in proportion to the quantity and quality of the data that you take out (otherwise you may get some very unpleasant things out!). I.e. if you don't pay sufficient attention to what's going in, the quality of what comes out may deteriorate.

PS
Looks like I fell into the Bog in this Blog post. Why did I pay so much attention to this topic? Maybe because I had the interesting experience in the late 1990s of being a CSIRO cross-divisional Software Process Engineering advisor to multiple natural science related software projects, particularly in the area of plants, soils and water. I ended up in CSIRO Soils and then Land and Water for several years, and managed/architected one of the first trial Java projects in CSIRO, using cutting edge tools and methods for scientific software development, for a Soil Hydraulic properties modelling/prediction application. My Sparc station (an Ultra, I think, with a CPU running at a couple of hundred MHz, wow fast) was even called "Humus", the organic top layer of soil.
