SPODSKAK Architecture: A real-time analytics architecture that supports AI/ML and can scale.

Pramod Vemulapalli
3 min read · May 13, 2021

Recently, I was designing a real-time analytics stack at work. As I researched the topic, I saw an interesting stack emerge from the architectures published by Airbnb, Uber, Lyft, TaskHuman, redBus and others.

The core advantages of this stack are:

  1. Open source with Apache or similar licenses
  2. Real-time analytics: the data is available in the warehouse almost immediately after the transaction completes.
  3. Massively scalable via containerization: it can support hundreds of terabytes to petabytes of data.
  4. Machine learning capable

I call this the ‘SpodSkak’ stack :-) and it consists of the following:

SpodSkak = Superset + Presto + Druid + Spark + Kafka + Airflow + Kubernetes

  • Use Kafka for streaming and Spark Streaming for transformations before the data lands in Druid. I have also seen Flink used instead of Spark Streaming in these stream processing applications.
  • Use Druid with Presto on top of it to run table joins across Druid tables, and use the whole setup as the data warehouse.
  • Use Spark as the machine learning system and deploy the ML models created in Spark via Flask.
  • Use Apache Superset as the BI layer.
  • Use Airflow for all the scheduling and Kubernetes to run and manage all these applications as containers.
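
To make the first step a bit more concrete, here is a minimal sketch of the kind of per-event transformation that could sit between Kafka and Druid. It is plain Python so it runs on its own; in the actual stack this logic would live inside a Spark Streaming (or Flink) job. The function name `to_druid_row` and the event fields (`order_id`, `user.id`, `amount_cents`, `created_at_ms`) are my assumptions for illustration and do not come from any of the architectures linked below.

```python
import json
from datetime import datetime, timezone


def to_druid_row(raw_message: bytes) -> dict:
    """Flatten one raw Kafka message (JSON bytes) into a flat row.

    Hypothetical event layout, for illustration only:
    {"order_id": ..., "user": {"id": ...}, "amount_cents": ..., "created_at_ms": ...}
    """
    event = json.loads(raw_message)

    # Convert epoch milliseconds to an ISO 8601 timestamp in UTC, so the
    # downstream store has one unambiguous time column to index on.
    ts = datetime.fromtimestamp(event["created_at_ms"] / 1000, tz=timezone.utc)

    return {
        "timestamp": ts.isoformat(),
        "order_id": event["order_id"],
        "user_id": event["user"]["id"],         # nested field flattened into a column
        "amount": event["amount_cents"] / 100,  # cents converted to currency units
    }


# Example message, as it might arrive from a Kafka topic (assumed shape).
sample_message = json.dumps(
    {
        "order_id": "o-1",
        "user": {"id": "u-42"},
        "amount_cents": 1999,
        "created_at_ms": 1620864000000,
    }
).encode("utf-8")

row = to_druid_row(sample_message)
```

Keeping the transformation a pure function like this makes it easy to unit test before wiring it into the streaming job, whichever engine ends up running it.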

I was partly inspired to come up with this stack after looking at Airbnb’s architecture and seeing different versions of it across the web. I would love to hear your thoughts and comments on the architecture. If you are an engineer who is excited to work on this, please reach out to me or Sandipto Banerjee at Saviynt.

Here are some of the data architectures that I found across the web.

https://medium.com/airbnb-engineering/druid-airbnb-data-platform-601c312f2a4c

https://imply.io/videos/lyfts-modern-data-architecture-feat-apache-druid-kafka-flink

https://aws.amazon.com/blogs/startups/redbus-building-a-data-platform-with-aws-apache-software-foundation/

https://medium.com/taskhuman/how-taskhuman-supercharges-its-bi-platform-with-apache-superset-9c65facf7577

Pramod Vemulapalli

A product guy by day, tinkerer by night, and a dreamer by nature