
Cluster Foundry

Cluster Foundry is a Big Data company specializing in building distributed data systems. We build data stores, data pipelines and visualization storyboards. Our data engineers have built both distributed batch and real-time streaming data pipelines.

Contact us

Hadoop & Spark

On the Hadoop stack, our expertise covers MapReduce, Apache Hive, HBase and Drill, and we have helped clients build data pipelines around these technologies. We specialize in the development and testing of Hadoop data systems, and we have migrated data warehouses from Teradata to Hadoop. We have built fast, in-memory data pipelines on Spark, Spark Streaming and HBase; data warehouse ETL pipelines on Spark; and near real-time event processing pipelines on Spark Streaming and Kafka.

MPP Databases

We strongly believe SQL-on-Hadoop is still in its infancy: a lot of work remains before it can match the decades of SQL optimization that have gone into relational databases. For that reason, we do all the ETL work in Hadoop and then offload the “reporting datamart” to a columnar MPP database. We have worked with IBM PureSystems (formerly Netezza), Amazon Redshift and Teradata.


Our engineers and business analysts can look at your data and help create storyboards that surface insights and actions. Our engineers are skilled in QlikView and Tableau.


Our differentiation

Our 3-in-1 model makes our engineers skilled in all three of the following:

  • Distributed Data Pipelines
  • Storyboarding
  • SQL & Databases


years of building data systems


years of Hadoop experience


years of MPP experience


years of Storyboarding
