Chapter 4. Building a Data-Driven Organization

The previous chapters define what a data-driven organization is, and they discuss the importance of building one to remain competitive. Now, we’ll spend some time talking about the practical steps needed to build a data-driven organization.

In our opinion, a data-driven organization should possess three things:

A culture in which everyone buys into the idea of using data to make business decisions
An organizational structure that supports a data-driven culture
Technology that supports a data-driven culture and makes data self-service

We cover the first two points in this chapter and discuss the roles and responsibilities of the employees who form the vital cogs in the engine that drives the data-driven organization—from data producers, to data scientists, to engineers, to analysts, to business users.

The next chapter devotes itself to the technology needed to support a data-driven culture.

Creating a Self-Service Culture

The most important, and arguably the most difficult, aspect of transitioning to a data-driven organization that practices DataOps is the cultural shift required to move to a data mindset. This shift entails identifying and building a cultural framework that enables all the people involved in a data initiative (from the producers of the data, to the people who build the models, to the people who analyze it, to the employees who use it in their jobs) to collaborate on making data the heart of organizational decision-making. Though the technology that makes this collaboration and data access easy is very important, it is just one of the considerations. The key focus of this transition is the employees and the organization. After you achieve a true self-service, data-driven culture, as discussed in Chapter 1, you should experience a significant competitive boost to your business.

Fostering a Culture of Data-Driven Decision-Making

What drives companies to have a successful data-driven culture? It’s important to understand that it’s not necessarily about the data itself. That’s secondary. The technology itself comes in third. Data-driven decision-making is first and foremost about the organization.

Regardless of whether you have acknowledged it, your business already has a culture of decision-making. That culture might not be geared toward a data-driven approach. All too many companies subscribe to the “HIPPO” (highest-paid person in the office) method of decision-making, whereby the senior person in the meeting gets to make the final choice. Needless to say, this HIPPO can be wrong. But unless you have the data as well as the permission coming from the very top of the organization to argue back, that decision stands.

And herein lies the key: to succeed at becoming a data-driven organization, your employees should always use data to start, continue, or conclude every single business decision, no matter how major or minor. This kind of inquisitive culture should drive everyone on the data team—including IT, data engineers, data scientists, and data analysts—to continually enhance and refine the tools that business users need to inform their decisions. Because data is accessed and used a lot in this type of environment, the organization should encourage and deploy people, processes, and technologies that minimize barriers to this access.

You know that you have successfully shifted to a data-driven culture when data-driven initiatives begin coming from the bottom of the organization rather than the top. It is common in the beginning to find a top-level sponsor in an organization to bless a data project or a change that incorporates the use of data in certain functions. For example, a common function in enterprises that is rapidly shifting to being data driven is marketing. A company's CMO can set that tone by making it mandatory that new creatives and campaigns be experimented with and tested to gather data on their effectiveness, as opposed to just relying on gut feeling and intuition. That message gives primacy to data, and that sentiment then flows to the rest of the marketing organization. The initial data project also serves as a proving ground that demonstrates the importance of data to the executives and helps them to see the immediate benefits of using data. In addition, a successful initial data project becomes a role model for how you can use data in other projects.

Although such top-down measures or projects are necessary to initiate the change, eventually, a company truly becomes data driven when there is a bottom-up demand for self-service data access. For that to happen, it is necessary that the tools and mechanisms are there to support this bottom-up interest among employees. For example, after the self-service tools and processes are in place, employees in the marketing department might actually use the data collected from previous campaigns to come up with a hypothesis of what a new campaign or message should look like. In essence, data begins to become part of the muscle memory of the entire department.

A key driver to enable a data culture is to make it easy for the data team to capture all of the data in the organization. An enterprise has a plethora of data sources, both internal and external. These range from business applications and product applications to public and private customer interaction points, monitoring systems, third-party data providers, and many others. These systems are set up for operational reasons, with collecting data for analytics being an afterthought. As a result, the natural tendency is to not capture any of this data, much less consolidate it in one place. The valuable data from all of these sources, therefore, remains in its silo. In the process, the organization loses many opportunities to derive insights or optimizations by putting data from different sources together.

The first step toward overcoming this challenge is to take an inventory of all your data sources and create a common data-capture infrastructure that is standard across the company and that lays out the correct way to capture and log the data. Everyone should use those standards. The next step is to consolidate all of the data so that all consumers of data in the organization know where to go to find it. This is what we did at Facebook, and, indeed, all of Facebook’s massive data stores are still in a centralized location.
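To make the idea of a common data-capture standard concrete, here is a minimal Python sketch. The field names, the `Event` class, and the JSON-lines output are all assumptions made for illustration; they are not a description of how Facebook or any particular company logs data.

```python
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone

# Hypothetical company-wide standard: every captured event carries these fields.
REQUIRED_FIELDS = ("source", "event_type", "timestamp", "payload")


@dataclass
class Event:
    source: str       # system that produced the event, e.g. "web_checkout"
    event_type: str   # what happened, e.g. "order_placed"
    payload: dict     # event-specific details
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def validate(record: dict) -> list:
    """Return a list of problems; an empty list means the record meets the standard."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in record]
    if "payload" in record and not isinstance(record["payload"], dict):
        problems.append("payload must be a dict")
    return problems


def to_log_line(event: Event) -> str:
    """Serialize an event as one JSON line, the shared format every team would write."""
    record = asdict(event)
    problems = validate(record)
    if problems:
        raise ValueError("; ".join(problems))
    return json.dumps(record, sort_keys=True)


line = to_log_line(
    Event(source="web_checkout", event_type="order_placed", payload={"order_id": 42})
)
print(line)
```

Because every source writes the same fields in the same format, the consolidation step that follows can load events from any system without source-specific parsing.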

Creating a consolidated data repository helps everyone to collaborate around data. You will have data analysts who analyze the data and feed the results back into the business. They ask questions like, “How do I use data to improve my products?” and further, “How do I get data back from my customers to change the features of the product, perhaps personalize them?” With standardization of capture and consolidation of data, it becomes very easy for data engineers to write self-service applications that support that feedback loop. The business users then use those analyses to make strategic decisions.

The executive team also will benefit from standardized and centralized data. Top people in the company typically don’t have enough time, or the skills, to analyze data themselves. Yet they need data to inform their decisions, perhaps more than anyone else in the company. So, it makes sense to build business dashboards so that the executive team can access the data in a timely and self-service manner. For example, for any given company, there will be business metrics that reflect important trends and statistics in the business: sales, prices, customer churn, and so on. Keeping these statistics up-to-the-minute and accessible to senior executives can benefit your business enormously.

It’s important to understand that different stakeholders will buy into using data for different reasons. You must first identify who all the stakeholders are. Then, you must understand what will motivate them to begin using data to make decisions.

Tips on Building a Data-Driven Culture

Following are five tips on how to build a data-driven culture.

1. Hire data visionaries

You need people who see the “big picture” and understand all the ways that employees can use data to improve the business. Although this certainly includes analyzing marketing, sales, and customer data, it doesn’t end there. Data-driven decisions can also help with internal operations, such as making customer service and support more efficient and cutting inventory costs. And it all begins by hiring people who are open minded about what the data will tell them regarding the way forward: people who have a vision.

2. Organize your data into a single data store accessible to everyone

All of the data in the universe won’t help if that data is inaccessible to the people who need it to make business decisions. A data-driven company consolidates its data while keeping it continuously up to date so that employees have access to the most accurate information at any given point in time. This means eliminating data silos and effectively democratizing data access. There are, of course, always data security and compliance issues, which we discuss in Chapter 6. But making data available to everyone is an important feature of a self-service data culture. Always allow employees to see the data that affects their work. They need to see this not only at a granular level, but also in a holistic way that helps them to understand the bigger picture. Doing this will make your employees more informed, skilled, and enthusiastic about using data to improve the business.

3. Empower all employees

All employees should feel comfortable taking initiative when it comes to suggesting ways that data can be used. This kind of mentality goes well beyond just using data, of course. If you build a company where all employees feel free to give opinions—as long as they are backed up by data—even if those opinions contradict senior executives’ assumptions, you are building an organization where the best ideas will naturally gravitate to the top and keep you competitive in even the fastest-moving markets.

4. Invest in the right self-service data tools

Your data, even if readily accessible, won’t help your business much if most of your employees can’t understand it or don’t apply it to business problems. You can solve this problem by investing in the right data tools, which we discuss in more detail in Chapter 5. You should pick tools based on your goals, but as a starting point, your tools should make it easy for your employees to access, share, and analyze data. You might want tools that can be directly embedded into the business tools you already use; for example, Excel and Tableau. And make sure to invest in training for these tools. Having an “intuitive interface” isn’t enough. Do your employees understand basic principles of data analysis, transformation, statistics, and visualization? To achieve return on investment on your tools, your employees must understand exactly what capabilities each tool offers. Training can be live, video-based, or online, and should use a shared data store so that employees can compare their data discoveries and explorations with one another.

5. Hold employees accountable

Technology will take you only so far. You also need to put incentives in place to encourage employees to use the technology and tools. You also should have a way to measure and grade progress toward a self-service data culture. This means holding employees accountable for their actions and progress when they effectively use data to drive business decisions. Only when you reward employees for actions based on data will you achieve true cultural transformation.

The collaborative, social dimension of a self-service, data-driven culture is also not to be underestimated. Without it, you will fail, and your investments in software, data processing tools, and platforms will be wasted. Yet, although many organizations pay lip service to this notion of collaboration and openness, few follow through with the appropriate actions. Keep in mind that data doesn’t belong to IT, data scientists, or analysts. It belongs to everyone in the business. So, your tools need to allow all employees to create their own analyses and visualizations and share their discoveries with their colleagues.

Potential Roadblocks to Becoming a Self-Service Data Culture

The most common roadblock to becoming a self-service data culture is that you’ll face resistance from the team of people who have traditionally been the conduit between the data and the users. They might say, “We can’t let everyone access the data. There are security issues and compliance issues.” Although these are valid issues to raise, you can solve them by means of technology. Today, technologies that tie user identity to access control policies as well as technologies that capture audit logs can easily address such objections. Don’t let these issues become crutches that prevent you from transforming into a true self-service culture.
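As a rough illustration of how identity-based access control and audit logging can answer the security objection, consider the following Python sketch. The policy table, user directory, role names, and dataset names are hypothetical, and a real deployment would rely on the organization's identity provider and an append-only audit store rather than in-memory structures.

```python
from datetime import datetime, timezone

# Hypothetical policy: which roles may read which datasets.
POLICY = {
    "campaign_results": {"marketing_analyst", "data_scientist"},
    "patient_records": {"compliance_officer"},
}

# Hypothetical identity directory mapping users to their roles.
USER_ROLES = {
    "asha": {"marketing_analyst"},
    "ben": {"data_engineer"},
}

AUDIT_LOG = []  # stand-in for an append-only audit store


def can_read(user: str, dataset: str) -> bool:
    """A user may read a dataset if any of their roles is allowed by the policy."""
    return bool(USER_ROLES.get(user, set()) & POLICY.get(dataset, set()))


def read_dataset(user: str, dataset: str) -> bool:
    """Check the policy and record every attempt, whether allowed or denied."""
    allowed = can_read(user, dataset)
    AUDIT_LOG.append(
        {
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "dataset": dataset,
            "allowed": allowed,
        }
    )
    return allowed


read_dataset("asha", "campaign_results")
read_dataset("ben", "patient_records")
for entry in AUDIT_LOG:
    print(entry["user"], entry["dataset"], "allowed" if entry["allowed"] else "denied")
```

The point of the sketch is that broad access and control are not in conflict: the policy decides who sees what, and the audit log gives compliance teams a record of every request.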

Another potential challenge—we saw this at Facebook—is that as you open up your data stores to everyone, you might find that you don’t have the infrastructure to support such broad-based access. There are either limits of scale or it becomes extremely expensive to process all the queries coming in. You need to address this issue by using infrastructure that can scale in a cost-effective manner. (More on this in Chapter 5.)

But keep in mind that most of the roadblocks will be put up by the traditional centralized data team, which is hesitant to give up control to other users. Because this is the most problematic challenge, companies need to focus on this team, perhaps reorganizing or retooling it. Remove any bottlenecks and make it possible for this centralized team to become the heroes in the self-service culture as opposed to the obstructionists. So, it’s really a psychological as well as an organizational challenge.

Creating a data-driven culture is not always easy, but the benefits it provides are real and significant. Big data is transforming the ways that organizations conduct business, so it should come as little surprise that it has a role to play in changing your culture, as well.

Organizational Structure That Supports a Self-Service Culture

Organizationally, how do you support a data-driven company? In most successful data-driven organizations there is a central data team that publishes data and manages the infrastructure used to publish that data. In others, there might be multiple data teams embedded in different departments, each catering to the needs of that department. Ironically, the latter model is typically less successful in creating a data-driven culture, even though data teams are there in each department. The reason is simple: such an organization creates low connectivity between the different departments and ends up creating data silos. A strong, functional, central data team is therefore extremely important in creating connectivity between the different departments of an organization. They usually publish the most important datasets, making sure that there is a single source of truth that underpins the analyses.

Consuming these datasets are the analysts who are typically embedded in the different departments of the organization, helping those departments to ask questions of the datasets. Think of this as a hub-and-spoke structure, as depicted in Figure 4-1. The embedded data analysts have the domain knowledge about the business function and also understand the datasets that can help them answer those questions. They have the ability to convert the language of systems to the language of the business.

Figure 4-1. Organizational hub-and-spoke model (source: Qubole)

This skill is critical because the two languages are very different. The business wants to ask questions such as the following:

Which geographic regions of my business are the best to invest in?
What is the size of the market?
Who is the competition?
What are the best opportunities today?

You then need analysts who can take those business questions and convert them into a series of questions to ask the data. Thus, data analysts would translate these questions into SQL or other commands to pull the relevant data from the data stores.
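As a small illustration of this translation step, the sketch below turns the first business question ("Which geographic regions of my business are the best to invest in?") into one possible SQL query. The `sales` table, its columns, the sample rows, and the choice of profit margin as the ranking metric are assumptions made for this example; a real analyst would pick the metric together with the business.

```python
import sqlite3

# Hypothetical data store: an in-memory table of sales by region.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, revenue REAL, cost REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("EMEA", 120.0, 80.0),
        ("EMEA", 100.0, 70.0),
        ("APAC", 90.0, 40.0),
        ("AMER", 200.0, 190.0),
    ],
)

# The business question, restated as a question the data can answer:
# rank regions by profit margin, highest first.
QUERY = """
SELECT region,
       SUM(revenue) AS total_revenue,
       (SUM(revenue) - SUM(cost)) / SUM(revenue) AS margin
FROM sales
GROUP BY region
ORDER BY margin DESC
"""

rows = conn.execute(QUERY).fetchall()
for region, total_revenue, margin in rows:
    print(f"{region}: revenue={total_revenue:.0f}, margin={margin:.2f}")
```

Note that the largest region by revenue is not necessarily the best by margin, which is exactly the kind of nuance the analyst brings back to the business user.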

At Facebook, we had a centralized data team. Then, we had analysts embedded in every product team. We also took care that all the analysts had a central forum at which they could meet and communicate what they were doing, allowing data intelligence to flow through the entire organization. Essentially, this model transmitted the data-driven DNA of the self-service organizational culture throughout the company.

How the Hub-and-Spoke Model Works

The people embedded in the business units can be either data analysts or data scientists, depending on how sophisticated the business unit’s requirements are. If deep learning or machine learning is required at the business-unit level, you need data scientists in that role; if business users require reporting and answers to business questions, data analysts are more appropriate to embed.

Domain knowledge of individual business functions is essential for the data analysts and data scientists embedded in the functional teams. Analysts need to be outstanding at understanding a domain and converting it to technical questions to be asked of the data. They become the bridge between the data and the line-of-business users. Data analysts, as a result, get a lot of exposure to the functional area on one side, and to the metadata about the datasets on the other. They understand how those are interlinked, a crucial skill to be able to use the data effectively to answer business questions.

For example, at Facebook, we had data analysts embedded in the product teams: within Growth, Ads, and so on. Those data analysts quickly got up to speed on the specific issues facing the teams in which they were embedded.

The coupling between the analysts themselves and the central data team is quite important. There is a natural dependency of the analysts on the data team because the latter is the primary publisher of curated datasets. However, equally important is the interaction between the analysts in the different functions. Such interactions help them in pushing for collaboration and data sharing as opposed to creating data silos. A typical way of achieving this interaction is by creating a forum within the central data team that allows all data professionals, from data scientists to data engineers to data analysts, to discuss their data usage and what they are trying to do.

Finally, these professionals are supported by the central data team, which is ultimately responsible for maintaining the infrastructure and providing the access to datasets that the data users need. If any link in this “value chain” of data is weak, friction arises on the path to a true data-driven culture. Chief data officers usually have the ultimate responsibility for nurturing and growing this value chain of data.

Training Is Essential for Data Analysts

Because they need to have equally deep knowledge of the data, the data tools, and the domain of the function in which they are embedded, data analysts require extensive training. The training can be formal classroom training and can also involve shadowing more experienced analysts. A key aspect of the job is learning the key performance indicators (KPIs) for the particular domain. How does the marketing department, for example, measure success? What are the key goals of the operations teams, and how do they measure success? To translate business questions into data queries effectively, data analysts must understand the goals, the strategies and tactics used to achieve them, and how those achievements are measured.

Roles and Responsibilities

Now that you understand what a self-service culture is and how to organize your data professionals into a hub-and-spoke model, it’s time to examine more closely how the specific roles and responsibilities of data professionals are organized.

Naturally, you want the best people on the job. And every business, of course, has different needs and goals, meaning each data team that is assembled will be composed of different types of people with different capabilities. However, some things are common to all data teams no matter what business they support. Carefully specifying the roles and responsibilities of each member of the team will avoid conflict and inefficiency.

In this section, we outline the best way to assemble a data team that can meet the challenges you face in the big data world.

Figure 4-2 shows the major personas that are a part of the data team in a data-centric business.

Figure 4-2. Core personas of the data team (source: Qubole)

What’s a persona?

The persona is a profile of a job role. It is not a person or a job title. In fact, one employee might represent multiple personas (in a small company). Alternatively, one persona might be split across multiple people in a larger enterprise. Understanding personas is important because it allows you to comprehend their pain points, what they care about, and their ultimate responsibilities.

If you understand personas, you can pinpoint which messages are likely to be the most relevant to the person you are talking to as you try to effect a transformation into a data-driven enterprise. It’s very important to understand that this is not a hierarchy, but a collaborative team that works together in the hub-and-spoke model described in the previous section. Here are the personas you’re likely to come across:

Data analysts: These data professionals are typically embedded in line-of-business or functional groups. Their job is to transform business questions into queries on the available datasets. They are data users, but not programmers (as opposed to data scientists; read on). These are the people who are most familiar with how the business is run, what the strategic objectives are, and how data fits in. They have a deep understanding of the practices that will help achieve executives’ goals and can effectively utilize frontend tools. They might not fully understand the inner workings of a data analytics algorithm, but they know how to apply those functions and algorithms to the questions they are trying to answer. Data analysts are able to use the data and communicate the findings from their research to senior management.
Data engineers: These data professionals are responsible for getting data into the platform so that the data scientists and analysts can get to it. They are responsible for providing mechanisms to capture data from different sources in the organization, including end products. In addition, they are typically responsible for publishing core datasets in the organization after cleaning and transforming any captured data. In short, data engineers make sure that the data is available, curated, and cleansed. They’re generally a part of IT, reporting to the CIO or VP of engineering. Because the data engineers are so intrinsically linked to data ingestion and processing, it’s vital that they keep open lines of communication with data scientists and analysts in order to fully enable analytics further downstream. They might also be dependent on application developers in order to put instrumentation in the products for capturing data.
Data scientists: These are the members of the data team involved in statistics, machine learning, and deep learning. They rely on advanced data mining tools and are the ones responsible for putting together predictive and prescriptive analytics. Tools such as R, Python, and Scala are their favorite environments, and they possess deep understanding of deep learning and machine learning toolkits. In some industries, they might even be referred to as quants because they tend to have a strong mathematics background.
Chief data officer: The person who oversees operations of the entire data team and who typically reports directly to the CEO or the CTO.
Compliance and security teams: These professionals ensure that compliance mandates like HIPAA are met through periodic audits.
Data platform administrators: These data professionals manage the data infrastructure. They are responsible for production DBMSs, data warehouses, and big data infrastructure. Administrators manage the infrastructure so that it is functioning well, has enough capacity, gives adequate quality of service to the different teams using the infrastructure, and so on. They also are the people who control access rights to data. They establish the implementation of access control policies and security policies of an organization. They are also ultimately tasked with providing the infrastructure and tools in the most economical manner.
Line-of-business users: The ultimate users of the data, who rely on it to make decisions. They are provided with reports, ad hoc analysis tools, and so on. These are the people who take analysis and act on it; for example, in marketing, campaign managers might look at the ROI of certain campaigns and decide to move their budgets accordingly.
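The campaign example in the last persona can be sketched in a few lines of Python. The campaign names and figures are invented, and splitting the budget in proportion to positive ROI is only one illustrative rule chosen for this sketch, not a recommendation from the text.

```python
def roi(revenue: float, spend: float) -> float:
    """Return on investment as a fraction: (revenue - spend) / spend."""
    if spend <= 0:
        raise ValueError("spend must be positive")
    return (revenue - spend) / spend


# Hypothetical campaign results: name -> (attributed revenue, spend).
campaigns = {
    "email_spring": (15000.0, 5000.0),
    "search_ads": (12000.0, 8000.0),
    "display_ads": (3000.0, 4000.0),
}

rois = {name: roi(rev, spend) for name, (rev, spend) in campaigns.items()}


def reallocate(budget: float, rois: dict) -> dict:
    """Split the budget across campaigns with positive ROI, in proportion to ROI.

    Campaigns with zero or negative ROI receive nothing under this simple rule.
    """
    positive = {name: r for name, r in rois.items() if r > 0}
    total = sum(positive.values())
    return {
        name: (budget * r / total if name in positive else 0.0)
        for name, r in rois.items()
    }


plan = reallocate(10000.0, rois)
for name in campaigns:
    print(f"{name}: ROI={rois[name]:.2f}, next budget={plan[name]:.0f}")
```

A campaign manager with self-service access to these numbers can make the budget decision directly, without waiting for a report from the central team.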

Of course, companies that sell data as products are different. When a business’s products—what it makes or sells—are primarily made up of data, the personas will be much more product-centric. IT is less important, and product managers are much more important in such environments. Everyone in the organization is much more technical, so data knowledge is typically distributed through the lines of business.

A Central Forum for Coming Together

Despite the distributed nature of the data users in the organization, it’s important to understand that there is no hierarchy. Instead, the organization provides a mechanism for coming together and sharing, collaborating, and learning from one another. All these roles have their place in strategic data initiatives. For example, Facebook announced a growth initiative, and the data scientists developed the models through which data analysts, and, ultimately, business users could complete analyses that would help the company grow, whether by campaigns to acquire new users, new product development, or other means.

This central forum enabled the Facebook data users to speak a common language and provided common definitions of data, such as what constituted an address or phone number. The team also could engage in cross-functional analyses, in which an analysis developed for and embedded in the Ads team could use data analysis results created by the Groups team.

The hub-and-spoke structure facilitates these conversations and disperses the knowledge throughout the organization. This is much more effective than a traditional command-and-control model.

Summary

In this chapter, we discussed ways to transform your business into a data-driven organization with a self-service culture. We discussed the different organization roles and responsibilities, and presented the very important hub-and-spoke organizational structure needed to support such a transformation. By learning and implementing these concepts, your business will be on its way to becoming truly data driven.

Creating a Data-Driven Enterprise with DataOps by Ashish Thusoo, Joydeep Sen Sarma