How to build a Data Stack?
7min • Mar 14, 2024 • Last updated on Sep 20, 2024
Alexandra Augusti
Strategy & Operations Manager
The big data industry has evolved rapidly in recent years, growing by 62% over the last five years. Yet only 12% of companies actually use their data for business use cases (marketing, sales, support, etc.).
The Modern Data Stack (MDS) represents a major evolution in the way companies manage and utilize their data. The trend is towards highly customizable data architectures, where each company can build its own "à la carte" solution based on its specific needs, while maintaining flexibility and adaptability.
👉🏼 Overview of what MDS is, its components, and its advantages in this blog post.
Definition of the Modern Data Stack
Today, the trend is more towards storing as much data as possible and thinking about the use cases that can be implemented afterwards. To meet this challenge, the MDS consists of specific bricks used to enhance the use of data.
💡 The Modern Data Stack perfectly fits into the shift from an IT-centered vision to a business-centered vision in companies and allows everyone to use data to increase the overall performance of the company.
Each brick in the MDS fulfills a specific function, from data ingestion to transformation and visualization. This modularity ensures better controllability, ease of implementation, and scalability, as each component can be adapted or changed according to needs without impacting the whole system.
The Modern Data Stack is characterized by its modularity, allowing companies to select the tools most suited to their specific needs. The technologies used in the MDS emphasize an excellent user experience, which facilitates their adoption by both technical and non-technical teams.
💡 We talk about the Modern Data Stack today because the solutions used are radically different from those used in the past.
Advantages of the Modern Data Stack
Agile, modular, and scalable, the Modern Data Stack offers more efficient and flexible data management than traditional architectures.
The MDS is often less expensive than traditional data stacks, as cloud-based solutions adopt a pay-as-you-go model.
Moreover, data stacks are no longer limited by data types. They can now easily handle structured, semi-structured, and unstructured (raw) data. This facilitates the use of various sources for analysis and implementation of different use cases.
Finally, the modularity of the Modern Data Stack offers great flexibility. Each company can build its own Data Stack by choosing from a multitude of available tools. Each tool can be easily replaced when it no longer suits. The technologies composing the MDS are generally easy to set up and use, with an intuitive user interface.
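To make the "replaceable brick" idea concrete, here is a minimal Python sketch. It assumes a hypothetical `Warehouse` contract with a single `write` method; the class names and the `ingest` function are illustrative only and do not come from any real tool. As long as the rest of the stack depends only on the contract, one brick can be swapped for another without touching the calling code.

```python
import json
from typing import Iterable, Protocol


class Warehouse(Protocol):
    """Hypothetical contract that any storage brick must satisfy."""

    def write(self, table: str, rows: Iterable[dict]) -> int: ...


class InMemoryWarehouse:
    """First brick: keeps rows in a dict of lists."""

    def __init__(self):
        self.tables = {}

    def write(self, table, rows):
        batch = list(rows)
        self.tables.setdefault(table, []).extend(batch)
        return len(batch)


class JsonLinesWarehouse:
    """Second brick: keeps rows as JSON lines (stands in for another vendor)."""

    def __init__(self):
        self.lines = []

    def write(self, table, rows):
        batch = [json.dumps({"table": table, "row": row}) for row in rows]
        self.lines.extend(batch)
        return len(batch)


def ingest(warehouse: Warehouse, rows):
    """Depends only on the contract, not on a specific brick."""
    return warehouse.write("contacts", rows)


contacts = [{"email": "ana@example.com"}, {"email": "bob@example.com"}]
first = InMemoryWarehouse()
second = JsonLinesWarehouse()
written_first = ingest(first, contacts)
written_second = ingest(second, contacts)  # same call, different brick
```

The design choice here is simply to keep the contract small: the fewer methods a brick must implement, the easier it is to replace.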
The 7 Steps of a Modern Data Stack Project
Audit
The first step in a Modern Data Stack project is to audit your company's current situation and data maturity.
The audit goes through several phases:
Identification of existing tools and teams: This step is crucial to understand the current environment and the skills available in the organization. For example, a company with several data engineers will not necessarily choose the same tools as one with few technical resources.
Analysis of use cases to be implemented: This step helps to determine how the Modern Data Stack can support the company's objectives: what data to collect (and from which sources), what data to transform (and how), how to activate the data, etc.
Planning: To carry out a Modern Data Stack project successfully, account for deadlines and for legal or operational constraints, such as GDPR, so that you choose compatible solutions.
⚠️ However, you do not have to choose every brick of your MDS at once. Assess the impact of each tool on your organization so that each subsequent decision is made with as much information as possible.
Data Centralization
Data warehouses, data lakes, databases, on-premises or cloud-hosted: the list of data storage solutions keeps growing.
However, storing your data in one place and having a "single source of truth" must be a priority. Centralization is essential for enabling comprehensive and integrated analyses.
With the increase in available data volumes, it becomes necessary to choose a high-performance, scalable (in terms of volume and price), and secure data storage solution.
At DinMo, we recommend using a cloud data warehouse as the single source of truth, for scalability and performance reasons. Technologies like BigQuery, Snowflake, or Redshift can be considered.
💡 DinMo is now Google Cloud Ready certified, allowing better data activation from BigQuery to all business platforms.
Data Ingestion
Once the storage solution is chosen, the next step is to bring in all your data automatically from its multiple sources.
ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform) tools retrieve data from sources (emails, CRM, applications, etc.) and store it in the cloud data warehouse.
This type of process allows for:
Combining all data into a unified view
Improving data productivity
Ensuring a data history
When choosing data ingestion tools, it's essential to consider the implementation and operation costs for custom integrations vs. existing ETL/ELT solutions. For example, tools like Fivetran or Airbyte simplify data ingestion, reducing the need for technical expertise.
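As a rough illustration of the "extract" and "load" steps of ELT, here is a minimal Python sketch. It makes several assumptions: an in-memory SQLite database stands in for the cloud data warehouse, `extract_crm_contacts` is a hypothetical placeholder for a real CRM API call, and the `raw_events` table name is invented for the example. Managed tools such as Fivetran or Airbyte handle far more (schemas, retries, incremental syncs); this only shows the shape of the process.

```python
import json
import sqlite3
from datetime import datetime, timezone


def extract_crm_contacts():
    """Hypothetical source: stands in for a call to a CRM API."""
    return [
        {"id": 1, "email": "ana@example.com", "plan": "pro"},
        {"id": 2, "email": "bob@example.com", "plan": "free"},
    ]


def load_raw(conn, source, records):
    """Append records unchanged, with a load timestamp, so history is kept."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_events "
        "(source TEXT, loaded_at TEXT, payload TEXT)"
    )
    loaded_at = datetime.now(timezone.utc).isoformat()
    conn.executemany(
        "INSERT INTO raw_events (source, loaded_at, payload) VALUES (?, ?, ?)",
        [(source, loaded_at, json.dumps(record)) for record in records],
    )
    conn.commit()
    return len(records)


# SQLite in memory is an assumption standing in for the warehouse.
conn = sqlite3.connect(":memory:")
n_loaded = load_raw(conn, "crm", extract_crm_contacts())
row_count = conn.execute("SELECT COUNT(*) FROM raw_events").fetchone()[0]
first_payload = json.loads(
    conn.execute("SELECT payload FROM raw_events ORDER BY rowid LIMIT 1").fetchone()[0]
)
```

Loading the payload untouched and transforming it later is what distinguishes ELT from ETL: the raw history stays available if the transformation logic changes.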
Data Orchestration
Data orchestration is essential for scheduling and managing workflows, replacing manual interventions, and monitoring task execution. Orchestrators make it possible to define, run, and monitor complex data pipelines.
Some well-known tools are Apache Airflow (an open-source tool) and Dagster (also open source, with a managed cloud offering).
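To show what an orchestrator does at its core, here is a minimal Python sketch that runs tasks in dependency order and records their status. It uses the standard library's `graphlib.TopologicalSorter` (Python 3.9+); it is not how Airflow or Dagster are implemented, and the task names and the `run_pipeline` function are assumptions made for the example.

```python
from graphlib import TopologicalSorter


def run_pipeline(tasks, dependencies):
    """Run tasks in dependency order and record each task's status.

    tasks: dict mapping a task name to a zero-argument callable.
    dependencies: dict mapping a task name to the set of tasks it waits for.
    """
    order = list(TopologicalSorter(dependencies).static_order())
    status = {}
    for name in order:
        upstream = dependencies.get(name, set())
        # Skip a task when anything it depends on did not succeed.
        if any(status.get(dep) != "success" for dep in upstream):
            status[name] = "skipped"
            continue
        try:
            tasks[name]()
            status[name] = "success"
        except Exception:
            status[name] = "failed"
    return order, status


def failing_transform():
    raise RuntimeError("simulated transformation error")


log = []
dependencies = {"ingest": set(), "transform": {"ingest"}, "activate": {"transform"}}

healthy_tasks = {name: (lambda n=name: log.append(n)) for name in dependencies}
order, status = run_pipeline(healthy_tasks, dependencies)

broken_tasks = dict(healthy_tasks, transform=failing_transform)
_, broken_status = run_pipeline(broken_tasks, dependencies)
```

In the second run, the failed transformation prevents activation from running, which is the kind of monitoring and control a real orchestrator provides on top of scheduling.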
Data Transformation
Once the raw data is consolidated and hosted, transforming it is essential to ensure it's ready to be used for modeling or analysis.
Transformation helps companies better organize their data and ensure their quality and ease of use.
Transformations can be done through internal processes or with SaaS or open-source tools. Well-known examples include dbt and Talend.
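As an illustration of what a transformation step typically does, the Python sketch below turns a raw table into a clean one with a single SQL statement: it removes duplicates, normalizes emails, casts amounts to numbers, and drops rows without an email. The table names, columns, and sample rows are invented for the example, and SQLite again stands in for the warehouse; this is plain SQL, not the syntax of dbt or Talend.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Raw data as it might arrive from ingestion (hypothetical sample rows).
    CREATE TABLE raw_orders (order_id INTEGER, email TEXT, amount TEXT);
    INSERT INTO raw_orders VALUES
        (1, ' Ana@Example.com ', '19.90'),
        (1, ' Ana@Example.com ', '19.90'),
        (2, 'bob@example.com', '5.00'),
        (3, NULL, '12.00');

    -- Clean model: deduplicated, normalized, typed, and filtered.
    CREATE TABLE orders_clean AS
    SELECT DISTINCT
        order_id,
        LOWER(TRIM(email)) AS email,
        CAST(amount AS REAL) AS amount
    FROM raw_orders
    WHERE email IS NOT NULL;
    """
)

clean_rows = conn.execute(
    "SELECT order_id, email, amount FROM orders_clean ORDER BY order_id"
).fetchall()
```

Keeping the raw table intact and building the clean table from it means the transformation can be rerun or revised at any time.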
Data Activation
Initially, data use was primarily limited to "visualization" capabilities to assist each team in reading results and making decisions. By reducing the technical barriers of visualization tools (no need for SQL code), data exploration and analytics have become accessible to everyone.
Tools such as Looker Studio, Power BI, or Qlik now make it possible to create actionable dashboards in a few hours.
However, data activation does not stop here. Visualizing it to make strategic decisions is great, but being able to use it directly in third-party tools is even better.
Reverse ETL addresses this issue by sending the segmented data stored in the data warehouse to all operational tools. It enables many use cases for marketing, CRM, sales, and support teams. All without needing to code in SQL!
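Under the hood, a Reverse ETL sync boils down to two steps: select a segment from the warehouse, then push it in batches to an operational tool. The Python sketch below illustrates this under clear assumptions: the `customers` table, the `high_value` segment, and the spend threshold are invented, SQLite stands in for the warehouse, and `send` is a hypothetical callable representing the destination's API client. A Reverse ETL product hides this logic behind its interface.

```python
import sqlite3


def build_segment(conn, min_spend):
    """Select customers at or above a spend threshold (hypothetical rule)."""
    rows = conn.execute(
        "SELECT email, total_spend FROM customers "
        "WHERE total_spend >= ? ORDER BY email",
        (min_spend,),
    ).fetchall()
    return [
        {"email": email, "traits": {"total_spend": spend, "segment": "high_value"}}
        for email, spend in rows
    ]


def sync_segment(records, send, batch_size=2):
    """Push records to a destination in batches; `send` is injected."""
    sent = 0
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        send(batch)
        sent += len(batch)
    return sent


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (email TEXT, total_spend REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [
        ("ana@example.com", 250.0),
        ("bob@example.com", 40.0),
        ("cy@example.com", 900.0),
        ("dee@example.com", 120.0),
    ],
)

outbox = []  # stands in for the operational tool receiving the batches
segment = build_segment(conn, min_spend=100)
sent_count = sync_segment(segment, send=outbox.append)
```

Injecting `send` keeps the sync logic independent of any particular destination, so the same segment can feed an ad platform, a CRM, or a support tool.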
Observability
Observability is often forgotten in the Modern Data Stack, yet it is essential for monitoring its health and performance. You can combine data quality monitoring tools (for example, Sifflet), data catalogs (for example, CastorDoc), and data usage tracking to keep the stack efficient:
Some solutions offer robust security protocols, encryption techniques, and access controls to protect data.
The Data Catalog provides users with the list of available data, details on their content, their context, and metadata (such as descriptions, schemas, properties, and tags).
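To give an idea of what data quality monitoring checks in practice, here is a minimal Python sketch covering three common checks: an empty table, missing values in required columns, and stale data. The `check_table` function, the column names, and the 24-hour freshness threshold are assumptions for the example; dedicated tools go much further (lineage, anomaly detection, alerting).

```python
from datetime import datetime, timedelta, timezone


def check_table(rows, required, loaded_at_field, max_age, now):
    """Return a list of human-readable issues; an empty list means healthy."""
    if not rows:
        return ["table is empty"]
    issues = []
    # Completeness: count missing values in each required column.
    for column in required:
        nulls = sum(1 for row in rows if row.get(column) is None)
        if nulls:
            issues.append(f"{column}: {nulls} null value(s)")
    # Freshness: the newest row must be recent enough.
    newest = max(row[loaded_at_field] for row in rows)
    if now - newest > max_age:
        issues.append(f"stale: newest row loaded at {newest.isoformat()}")
    return issues


# Fixed reference time so the example is reproducible.
now = datetime(2024, 9, 20, 12, 0, tzinfo=timezone.utc)
rows = [
    {"email": "ana@example.com", "loaded_at": now - timedelta(days=2)},
    {"email": None, "loaded_at": now - timedelta(days=3)},
]

issues = check_table(
    rows,
    required=["email"],
    loaded_at_field="loaded_at",
    max_age=timedelta(hours=24),
    now=now,
)
empty_issues = check_table([], ["email"], "loaded_at", timedelta(hours=24), now)
```

Checks like these are typically run after each pipeline execution, so that a problem is caught before it reaches dashboards or operational tools.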
Some Tips for Building Your Modern Data Stack
With many "blocks" composing a Modern Data Stack, the project can quickly become complicated. Here are our two main recommendations:
Measure ROI: Focus on specific use cases to measure the direct impact of the Modern Data Stack and proceed iteratively, adding or modifying components based on needs and results.
Iterative Approach: Thoroughly test each new element before proceeding to its large-scale deployment, thus allowing a smooth and controlled evolution of your stack.
The Modern Data Stack website presents many examples of Data Stacks implemented in companies, which can be a source of inspiration if you don't know where to start. The Data landscape map provides a simple but effective starting point to help you get started quickly!
Data landscape 2023. Credit: https://mad.firstmark.com/
Conclusion
The Modern Data Stack represents a major evolution in the way companies manage and use their data, offering more efficient and flexible data management. In a rapidly evolving ecosystem, it's a constant journey of adaptation and evolution, but the potential benefits for a company are huge.
The data world of tomorrow will be modular. We are convinced that you will no longer need to go through traditional Customer Data Platform (CDP) vendors but can rely on your existing architecture to build your own modular CDP.
In the end, the goal is to turn data into a competitive asset for the company, while simplifying data operations and enabling better collaboration between teams. If you have any further questions regarding MDS and especially data activation, do not hesitate to contact us!