Chapter 11: Is Kinesis really suitable for Real-Time Stream Processing?

Is there anybody out there? Are polling and real-time stream processing compatible?





At the recent AWS Summit in Sydney in May I attended an interesting talk on Kinesis.

The speaker (Johnathon Meichtry) implied that the only way to get data out of Kinesis (Streams?) was by polling. When I talked with him afterwards, he didn't seem to see the potential limitations of this for real-time stream processing, or to be familiar with other event/stream processing platforms that don't have this limitation. My previous experience with CEP (complex event processing) and stream processing was that polling is "evil" for real-time requirements: the event processing rules/applications should be triggered immediately by new events coming into the stream (i.e. the semantics are that each new event fires the processing immediately, and once per event).
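To make the difference concrete, here is a minimal sketch of the two delivery models. It is a toy in-memory simulation, not Kinesis or any real CEP engine; the class names `PushStream` and `PolledStream` and the "poll every third tick" schedule are made up for illustration.

```python
from typing import Callable, Dict, List

Event = Dict[str, int]


class PushStream:
    """Toy stream: every subscriber is called once per event, on arrival."""

    def __init__(self) -> None:
        self._handlers: List[Callable[[Event], None]] = []

    def subscribe(self, handler: Callable[[Event], None]) -> None:
        self._handlers.append(handler)

    def publish(self, event: Event) -> None:
        # Processing is triggered by the event itself, with no waiting.
        for handler in self._handlers:
            handler(event)


class PolledStream:
    """Toy stream: events sit in a buffer until a consumer asks for them."""

    def __init__(self) -> None:
        self._buffer: List[Event] = []

    def publish(self, event: Event) -> None:
        self._buffer.append(event)

    def poll(self) -> List[Event]:
        # Hand over everything that arrived since the previous poll.
        batch, self._buffer = self._buffer, []
        return batch


def demo(ticks: int = 6, poll_every: int = 3):
    pushed: List[Event] = []
    push = PushStream()
    push.subscribe(pushed.append)

    polled = PolledStream()
    batches: List[List[Event]] = []

    for tick in range(ticks):
        event = {"id": tick}
        push.publish(event)    # handled immediately
        polled.publish(event)  # waits for the next poll
        if tick % poll_every == poll_every - 1:
            batches.append(polled.poll())
    return pushed, batches
```

In the push model each event is handled in the tick it arrives; in the polled model events pile up and are seen in batches, so the oldest event in a batch has already waited for most of a poll interval.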

So what are the actual requirements for streaming data processing? The "8 Requirements of Real-Time Stream Processing" paper and this blog make a good case for some basic features. I agree with one of their observations, that polling shouldn't be a feature. But SQL? No thanks. (So maybe you can pick and choose? Maybe polling is OK in some situations?)

And it does look as if polling is a given for Kinesis (given that it's distributed and RESTful, there is perhaps no way around it). But maybe Lambda is a workaround?
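For what it's worth, this is roughly what the Lambda option looks like from the consumer's side: a minimal sketch of a Python handler for a Kinesis event source. The `Records` list and the base64-encoded `kinesis.data` field are the standard shape of the event Lambda passes in; the JSON payload and the `process` function are assumptions of mine for illustration. As far as I understand, Lambda still reads from the stream on your behalf, so this hides the polling rather than removing it.

```python
import base64
import json


def process(payload: dict) -> dict:
    """Hypothetical per-event rule; stands in for real processing logic."""
    return dict(payload, processed=True)


def handler(event, context):
    """Called by Lambda with a batch of Kinesis records."""
    results = []
    for record in event.get("Records", []):
        # Kinesis record data arrives base64-encoded.
        raw = base64.b64decode(record["kinesis"]["data"])
        # Assumption: producers wrote JSON objects to the stream.
        results.append(process(json.loads(raw)))
    # Returned here only to make the sketch easy to test locally.
    return results
```

The code itself has no poll loop, which feels like push, but records still arrive in batches rather than strictly one invocation per event.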

Based on this I would say that Kinesis is NOT A REAL TIME STREAM PROCESSING SYSTEM (pity). I would like to be wrong...?

This blog looks at Kinesis in production (cf. Kafka). They conclude that latency and throughput are OK for a distributed real-time stream processing system (but not perfect, particularly the limit of 5 reads per second per shard).
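A back-of-the-envelope sketch of what the 5 reads per second limit means for latency. The assumptions are mine: the limit applies per shard and is shared evenly by all consumers polling that shard, events arrive at uniformly random times, and network and processing time are ignored. The function names are made up for illustration.

```python
READS_PER_SECOND_PER_SHARD = 5  # the read limit quoted above


def poll_interval_ms(consumers: int) -> float:
    """Shortest sustainable gap between polls for each consumer of one shard."""
    if consumers < 1:
        raise ValueError("need at least one consumer")
    return 1000.0 * consumers / READS_PER_SECOND_PER_SHARD


def added_latency_ms(consumers: int) -> dict:
    """Delay added purely by waiting for the next poll."""
    interval = poll_interval_ms(consumers)
    return {
        "interval_ms": interval,
        "average_ms": interval / 2,  # an event waits half an interval on average
        "worst_ms": interval,        # an event that just missed a poll
    }
```

So a single consumer can poll a shard at best every 200 ms, adding about 100 ms on average. Two consumers sharing the shard double that to 400 ms between polls each, which is why the read limit matters more as applications are added.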

Are other stream processing platforms "real-time" or not? E.g. Spark, Storm, Kafka?

This blog on Spark says:

However, if you look at the architecture for the demonstration, there is no poll-wait at any stage. The closing piece of advice is thus: maintain a push model throughout the architecture.

There are also reports of polling problems with similar systems.

Not sure whether Kafka requires polling or not. It possibly supports long polling, which is also how Kinesis works.
