The Data Engineering Ecosystem: An Interactive Map

David Drummond and John Joo

March 6, 2015

David Drummond, Program Director, Insight Data Engineering
John Joo, Program Director, Insight Data Engineering and Data Science

Companies, non-profit organizations, and governments are all starting to realize the huge value that data can provide to customers, decision makers, and concerned citizens. What is often neglected is the amount of engineering required to make that data accessible. For large, unstructured, or real-time data, SQL alone is often no longer enough. Building a system that makes such data usable can be a monumental challenge for data engineers.

There is no plug-and-play solution that solves every use case. A data pipeline meant for serving ads will look very different from a data pipeline meant for retail analytics. Because open-source technologies can be combined in countless ways, the options can be overwhelming when you first encounter them. What do all these tools do, and how do they fit into the ecosystem?

Insight Data Engineering Fellows face these same questions when they begin working on their data pipelines. Fortunately, after several iterations of the Insight Data Engineering Program, we have developed this framework for visualizing a typical pipeline and the various data engineering tools. Along with the framework, we have included a set of tools for each category in the interactive map.

Of course, there are more tools than we can possibly cover in a single chart, and many of them cannot be strictly categorized. However, based on several metrics (see footnote 1) and our experience with Fellows and industry mentors, we developed a map of the most widely used tools, which together represent the broad ecosystem. We hope that it will also help you make sense of the zoo of tools used in the field of data engineering.

View the interactive data engineering ecosystem map.

Interested in transitioning to a career in data engineering?
Find out more about the Insight Data Engineering Fellows Program in New York and Silicon Valley, apply today, or sign up for program updates.

Already a data scientist or engineer?
Find out more about our Advanced Workshops for Data Professionals. Register for two-day workshops in Apache Spark and Data Visualization, or sign up for workshop updates.

1 The metrics used to help us choose which tools to include were the number of hits in a GitHub search for the tool, the number of stars that the project has on GitHub, and the number of job posts on Indeed in the San Francisco Bay Area mentioning the tool.
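
The footnote names three metrics but does not say how they were weighed against each other. As a rough illustration only, the sketch below ranks tools by their average position across the three metrics. The averaging rule is our assumption, not necessarily the method used for the map, and the tool names (`tool_a`, `tool_b`, `tool_c`) and all numbers are invented placeholders rather than real measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolMetrics:
    """The three metrics from the footnote, for one tool."""
    name: str
    github_search_hits: int
    github_stars: int
    job_posts: int


def rank_positions(tools, key):
    """Map each tool name to its 1-based position on one metric (1 = highest value)."""
    ordered = sorted(tools, key=key, reverse=True)
    return {tool.name: position + 1 for position, tool in enumerate(ordered)}


def rank_tools(tools):
    """Order tool names by average position across the three metrics.

    Averaging ranks is an assumed combination rule; ties fall back to
    alphabetical order so the result is deterministic.
    """
    per_metric = [
        rank_positions(tools, lambda t: t.github_search_hits),
        rank_positions(tools, lambda t: t.github_stars),
        rank_positions(tools, lambda t: t.job_posts),
    ]
    average = {
        tool.name: sum(ranks[tool.name] for ranks in per_metric) / len(per_metric)
        for tool in tools
    }
    return sorted(average, key=lambda name: (average[name], name))


# Invented placeholder values, for illustration only.
tools = [
    ToolMetrics("tool_a", github_search_hits=1200, github_stars=300, job_posts=45),
    ToolMetrics("tool_b", github_search_hits=800, github_stars=950, job_posts=60),
    ToolMetrics("tool_c", github_search_hits=150, github_stars=40, job_posts=5),
]

ranking = rank_tools(tools)
print(ranking)
```

Averaging ranks rather than raw counts keeps a single metric with a much larger numeric scale from dominating the result, although other weightings would be equally defensible.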