Chapter 1. The Emerging Data Challenge and Opportunity

In 2006, Clive Humby declared, “Data is the new oil,”1 and the power of that quote caught the attention of many business leaders worldwide. Humby was a co-founder of the leading data analytics firm dunnhumby, which in 1994 delivered a revolutionary customer loyalty scheme for a leading British supermarket, Tesco.2 The traditional retailer flourished by using data analytics to send customers personalized coupons to spend in its stores. dunnhumby had shown the power of data, so when Humby spoke those immortal words, you’d have been wise to listen, and the same is still true today.

The pressure for you to create value from your data didn’t just come from Humby. Conference talks, business publication covers, or management literature may have brought other trends to your attention. Big data, the Internet of Things (IoT), machine learning (ML), and artificial intelligence (AI) are all terms that might keep you up at night. The development of terminology about data solutions seems to be increasing at a pace that is matched only by the speed of the technological development powering it.

If you have reached for this book because you feel behind the curve, you match the majority of stakeholders we talk to every day. Everyone seems to fear that their competitors are ahead of them in terms of collecting and utilizing their data. If you are already utilizing data, you may fear your competitors are using data in a more advanced way that could lure your data gurus away. After all, every tech geek wants to work on the latest challenges using the most advanced technology.

There are many valid reasons why you may be feeling the pressure to up your organization’s data capabilities, but we will let you in on a secret…don’t believe everything you hear. Not every competitor has gained full control of their data and empowered each individual in their organization to use this capability. Some might be further ahead in their ability to refine their data resource, but their journey is likely far from complete.

A study by Accenture showed how far organizations have to go in their quest to make the most of their data.3 Just 21% of the global workforce is confident in their data literacy skills. Research by Forrester found that between 60% and 73% of enterprise data has never been analyzed.4 This is probably not the picture you had in your head when reaching for this book, as you are seeing data used more and more, but it doesn’t mean there aren’t further opportunities to harness data. This book will show why data is an asset that you can use to support each decision in your organization and how validating your decisions with data will prevent choosing directions based on incorrect assumptions. The value of being right more of the time is enormous.

This book will introduce you to the key terminology, concepts, and challenges you will encounter when developing a data culture in your organization. To lead the variety of changes required, you need to be able to ask the right questions of many individuals. This chapter will introduce you to the challenges of working with data, why those challenges exist, and how to navigate data projects successfully.

Rapidly Evolving Challenges

Managing change is a given in a modern organization. You are unlikely to have reached your position without being able to deliver projects in a dynamic environment. Developing your organization’s data capabilities will also require change management, of both people and processes. Since Humby proclaimed the value of data in 2006, the speed of technical development in data processing, analysis, and regulation has meant change isn’t delivered just once but is a flood of constantly evolving capabilities.

When working with data, you will be asked to consider a number of different parts of the end-to-end processes involving data. You will need to think about how to best source, store, clean, analyze, and communicate data. Once you have your solutions in place for any of these challenges within your organization’s process, you will quickly find faster and more user-friendly solutions from the new products that continually emerge in the market. As an example, databases used to be the best technology and conceptual approach to hold your organization’s data, until the volume of data flowing into organizations each day grew to the point where people found it necessary to create data lakes. If you had invested in a database, suddenly you were under pressure to adopt a data lake to address changing needs. Recently, data lakehouses have emerged as an option to handle the next evolution of challenges posed by working with data in modern organizations. We’ll discuss the growth-of-data challenge in more detail in the next section.

Data has become an asset that can create revenue and opportunities to develop your company. As with any possibility for improvement, there are a number of challenges to solve before you can gain the benefits on offer, and opportunities arising from data are no different in this respect. Data-powered products and propositions allow organizations to develop new revenue streams or deeper relationships with their customers. A data product uses data to form the basis of an application or item that wouldn’t be possible without its use. A data proposition doesn’t create a tangible product, but allows for a new or enhanced service.

Let’s go into more detail about the challenges you are likely to face when pursuing the opportunities created by data. You will need to keep in mind that even once developed, the products and services themselves present their own set of challenges regarding keeping up-to-date and maintaining the ability to meet the needs of users.

The Growth in Data Volume and Velocity

The last few decades have seen dramatic changes for those working with data, and it would be foolish to expect any slowdown in this level of change, especially in technology. Trying to keep up with the overall market is virtually impossible, and this is one reason why most data specialists focus on just a handful of tools. The data specialists responsible for working through the end-to-end lifecycle of data in an organization will likely have one tool to support the extraction of data from its source, one tool for storage, and one tool for analysis of the stored data. The need for these advancements is like any technological development; as soon as a solution is created, that solution is used to develop the field you’re operating in even further. Most technical developments create new challenges to solve as yesterday’s solutions become today’s problems, and each new tool rarely delivers a comprehensive result, meaning that ongoing development is required.

Big data is a term that is commonly used when referring to many of the challenges posed by working with data in the 21st century. The definition of big data is not easy to tie down, and it’s unlikely that most people agree on it. Three common concepts cited in most definitions stem from Gartner analyst Doug Laney’s “3 Vs” in his definition of big data:5

  • Volume: amount of data

  • Velocity: speed of data creation/transfer

  • Variety: different data available

In 2013, another Gartner analyst, Svetlana Sicular, highlighted that the 3 Vs wasn’t the only part of Laney’s definition of big data. Sicular cites the importance of the phrase following the 3 Vs that describes the technical solutions required to deal with the challenges. The entire sentence she cites reads, “‘Big data’ is high-volume, -velocity and -variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.”6

Other challenges can be characterized by:7

  • Veracity: level of trust in the data

  • Variability: uses and formats of the data

  • Value: business value of the data

Each of these characteristics has increased dramatically over the last two decades with the growth of propositions leveraging the internet and digital connectivity. The volume of data produced globally has grown from 2 zettabytes in 2010 to 26 zettabytes in 2017, and is estimated to have reached 120 zettabytes by 2023.8 Your organization will not access all of this data, or even close to it, but the growth pattern is indicative of the volume increase in your own organization. Data tools, the software used to form, analyze, and communicate data, have existed for decades, but each has had to develop the ability to process larger data sets or become redundant. Their technical development has enabled organizations to do far more with data than ever before.

There is never as much time to work with the data as you might want. As data sets have grown, the time available in the working day has stayed the same. Being able to handle the greater velocity needed to absorb, process, and analyze the data has required improved technology and infrastructure. Whereas a bank would once simply collect data on each transaction I made as a customer, my bank now captures data each time I log in to their website or mobile application, capturing the time of the login and what services I used during that interaction. Increasing the number of data points captured about each transaction and interaction means there is more data to process than before, but still within the same amount of time. The data tools we work with require greater speed in collecting and storing this data.

As you will learn in the next chapter, data comes in many formats, and the growth of volume and velocity results in an increasing variety of data forms. Being flexible in what data sets work with your software has become a key requirement.

We’ll continue to reference the 3 V challenges posed by big data, as they’ve become a constant part of the environment all organizations work in. These challenges require rapid technological development to facilitate and respond to the evolving environment. Data specialists have had to acquire new skills and learn new techniques to maintain their specialties.

Data science has become a popular term that encompasses many of these new specialties, such as machine learning and artificial intelligence. In the 2010s, just managing and understanding the data assets built up by organizations was the focus of most data specialists’ work. Over the last few years, as this understanding of data assets has developed, specialists have emerged to project forward the information and insights discovered. The development of new tools and technologies has allowed these specialists to perform predictive and optimization tasks.

New Regulations and Government Mandates

Change hasn’t been driven only by technological developments. As data has become an increasing part of more people’s roles and decision-making processes, regulation has increased to protect individuals from potential negative consequences of data analysis.

Working with data now requires understanding the concept of Personally Identifiable Information (PII). Government identification codes, postal and email addresses, and phone numbers are just a few examples of data points that organizations must handle very carefully. Just having access to PII isn’t the issue—the data that is held alongside it is the important factor. Political affiliation, sexuality, or health information are all sensitive data points that should be tied to individuals only when absolutely necessary.
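To make this concrete, here is a minimal Python sketch of one common way to reduce the risk of holding PII: pseudonymization, where a one-way hash replaces the raw identifier so that records can still be linked for analysis without exposing the underlying value. The field names and salt below are hypothetical, invented for illustration; a real system would manage the salt, and the decision of whether to hold the field at all, far more carefully.

```python
import hashlib

# Hypothetical salt for the example; in practice this would be stored
# and rotated separately from the data itself.
SALT = "keep-this-secret"

def pseudonymise(value: str, salt: str = SALT) -> str:
    """Return a one-way hash of a PII value, normalized first so the
    same person always produces the same token."""
    return hashlib.sha256((salt + value.lower().strip()).encode()).hexdigest()

record = {"email": "ada@example.com", "product_viewed": "savings account"}

# The analytical field survives; the raw identifier does not.
safe_record = {**record, "email": pseudonymise(record["email"])}
```

The key design point is that the analysis (counting, joining, segmenting) still works on the hashed token, while the sensitive value itself never needs to sit alongside data points like the product viewed.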

Regulations like the General Data Protection Regulation (GDPR) in Europe have clarified definitions of PII. Since its introduction in 2016, GDPR has caused many organizations to alter how they hold their customers’ and users’ data. GDPR has seven principles9 that set a strong standard for how you should store any data within a data project:

Lawfulness, fairness, and transparency

The data adheres to all legislation, does not cause undue harm to the subject of the data, and the subject understands why it has been collected.

Purpose limitation

Data is collected only for a specific purpose, and that purpose is clear to the subject of the data.

Data minimization

Only the minimum amount of detail needed to meet the purpose of the data collection is kept.

Accuracy

The data collected must remain accurate and be deleted once it no longer is.

Storage limitation

A finite limit should be set on how long the data will be held.

Integrity and confidentiality

Security measures should be in place to protect the data held.

Accountability

You must have clear processes in place to demonstrate your compliance.

The intent of these principles is to ensure that data held about any individual is only for a set purpose, is held for a set time, and remains accurate. An example of why these rules are important is the Cambridge Analytica scandal that came to the public’s awareness in 2018. The scandal centered on how a data analysis company used data acquired from Facebook to create links between people and their political preferences. Eighty-seven million people, who just happened to be friends of 270,000 users of a third-party application on Facebook, had their data collected for this purpose, even though they had never agreed to this analysis or use of their data.10

Regulations like GDPR also focus on how securely the data is held in the integrity and confidentiality principle. Data breaches and hacks are common news items and can cause issues for millions of people at a time. Passwords and other personal details are frequent, valuable targets for hackers looking to impersonate others or commit crimes like fraud. The security of your data sources is critical. As the volume of data collected has grown with global services like music streaming, activity tracking applications, and the continued use of email, the value of such data security services increases.

In many industries, regulation about data is still forming and evolving and will be a constant source of changes that you will need to balance when developing data solutions.

Improving Decision-Making by Democratizing Data

For decades, organizations have based their decisions on the experience and expertise of their employees. With the fast pace of technological innovation and the increasing scale and scope of most roles in organizations, it’s becoming increasingly difficult to rely on your experience alone to make the right decisions. Data is no longer requested just in the form of financial accounts, but by everyone across the organization, covering customers, operations, and employees.

Banks used to use branch managers to decide who qualified for a loan and who didn’t. The bank manager used their own local knowledge, relationship with the individual, and knowledge of the individual’s financial attitudes to decide whether they were a safe person to loan money to. This model clearly has a lot of bias, so it was far from ideal to begin with, but other issues were also present. What happened when the branch manager left their role? What if a customer was new to the area? The evaluation of loan applications provides one example of the ways in which the data available has significantly impacted organizational practices and outcomes.

Companies like Experian use data to form credit ratings that allow financial service providers to assess a person’s creditworthiness. A quick look at someone’s credit score is often enough to approve or deny a loan request. This is great if you have a strong credit score and a stable financial history, but this isn’t the case for everyone. New financial providers have appeared that are focused on those individuals who don’t have a good enough credit score and will look at other factors instead to determine if a loan can still be offered. That’s right—these providers developed a data solution for a problem that was caused by another data solution, using wider sources of information.

Not every organization is evaluating loan applications, but each person in your organization is likely making decisions every day that affect your customers, partners, and bottom line. If you have experienced staff, they are likely to make decisions based on their long tenure, and that can work out fine. However, many decisions are made based on new and developing situations or by someone who is still gaining experience. This is where data can add a more rounded perspective on a situation. If you are a senior manager in your organization, you are likely to have the data you need because you know who to ask, and they will get it for you. However, many junior members and even middle managers in the organizations that I work with struggle to get the information they need to make data-informed decisions.

Democratizing data means getting the right data into the hands of everyone in your organization. This is a key focus of this book and has been a focus throughout our careers. By building data products and propositions that can be used by different groups at different levels of your organization, you will be empowering them to make better decisions.

When analyzing data, it is important to truly understand the core concern being addressed by the analysis. The best people to understand the nature of the key question to be answered are the subject matter experts in the business. Even this is not an easy task, as each individual will likely have different questions they are trying to answer. This means a lot of different data sets, held at different levels of aggregation (a concept we’ll come back to in Chapter 2), and different tools may be needed to answer each question. It might seem daunting to provide clarity, focus, and direction amidst all the data and competing priorities, but this book will help you understand how to approach this task and how to be successful in democratizing data.

Developing New Products and Propositions

Increasingly, data sets aren’t just used for informing decision-making but have become products in their own right. New products and propositions have become possible as the value of the information in the data sets has been found and refined.

Take Beeline, for example. Beeline is a cycling navigation product that simplifies navigation on bikes in urban areas, pointing you towards your destination rather than specifying each left and right turn you need to take. Beeline’s initial product was a physical device that fits onto your bike’s handlebars, with a small screen that acts as a compass directing you to your end destination and any waypoints that you specify in the application before you set off. The linked phone application stores your speed, route, and many other data points. This provides the user with a comprehensive view of their trip.

Beeline has monetized this same data by aggregating it across all users and anonymizing it to provide additional data to a diverse set of organizations. Transport planners, local councils, and even retailers can use the data to understand when and where people cycle and how smooth a journey they have. This can inform where investment in dedicated cycling infrastructure is needed, or where to locate new stores to serve those who are cycling to the area. I (Carl) know I appreciate the choice of cafes at the end of my cycling commute to refuel before I start my working day.
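The mechanics of this kind of aggregation can be sketched in a few lines of Python. The trip records and suppression threshold below are invented for illustration, not Beeline’s actual method: the idea is simply that individual riders disappear once you count trips per area and hour, and that small groups that might still identify a lone rider are suppressed before anything is shared.

```python
from collections import Counter

# Hypothetical trip records of the kind a cycling app might hold.
trips = [
    {"rider_id": 1, "area": "Soho", "hour": 8},
    {"rider_id": 2, "area": "Soho", "hour": 8},
    {"rider_id": 3, "area": "Soho", "hour": 8},
    {"rider_id": 4, "area": "Borough", "hour": 8},
]

# Aggregate away the rider: count trips per (area, hour).
counts = Counter((t["area"], t["hour"]) for t in trips)

# Suppress small groups that could still point to an individual.
K = 3  # minimum group size before a figure is published
published = {key: n for key, n in counts.items() if n >= K}
```

What a transport planner receives is only the `published` counts; the rider identifiers never leave the original data set.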

As discussed earlier in “New Regulations and Government Mandates”, GDPR forces the possessor of the data to ensure it is held only when there is a clear purpose specified to the end user. This makes forming data products and propositions from the data harder, as they almost need to be scoped out before the data set gets collected. But as long as the subjects of the data are clear on what you are doing with it, data propositions do become possible to create. Maybe Humby was right after all when he said that data is as valuable as oil? Just like oil, the data needs to be refined (read, cleaned, and prepared), but many data sets have multiple ways in which they can be valuable to different organizations.

Building a Common Understanding

With so many possible approaches for so many people in your organization, it can be difficult to get the right data, at the right time, and in the right way for them all. In many organizations, data is provided by central teams who maintain data sources. With the increasing demand for data to inform decisions, these teams are often stretched thin, with limited time to understand how to best meet your requirements.

Depending on the size of the organization, the central data teams can be very removed in terms of subject matter knowledge and physical location from the requestor. To collect the requirements and then deliver upon them, most central teams have traditionally worked in the form of waterfall-style projects, especially for larger requests involving creating data sources. Waterfall-style projects involve stage gates, a term that refers to the process of signing off on progress in stages as the project develops. Whilst the waterfall model once had its benefits for many IT projects, it has become less relevant to data projects that require significant iteration. Waterfall projects need the questions to be answered by the data set to be established at the start of a project. However, when working with data, as soon as you find an insight, you will discover that different questions arise that then need to be answered. It is difficult to predict what these questions might pertain to, and it’s therefore often not possible to include them in those initial requirements. This means that waterfall-style projects often deliver outcomes that quickly become redundant as requirements develop. This can lead to new projects being raised, or stakeholders may stop using data to guide their decisions.

Traditionally, special teams were also set up to provide reports and insights to the subject matter experts (SMEs) in the business to save the data experts in IT teams for more architectural projects. This organizational structure was created because data tools were complex to learn to use and required a skill set that many people didn’t have; thus, all business users couldn’t be trained to utilize the tools.

What the requestor needs is the ability to ask and answer questions rapidly and iterate quickly themselves. For example, if you are looking at the causes of customer service complaints in your organization, you would want a report counting the root causes for each complaint. You might want a report that breaks these numbers down by product, over time and by location. But as soon as you find the causes of the complaints, you will probably have many more questions that are hard to predict without the answer to the initial question. As you gain more experience working with data and as your organization’s data teams work more closely with business SMEs, you will begin to ask better questions.
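As a small illustration of this iteration, using hypothetical complaint records, the initial question and its natural follow-up are each a one-line aggregation in Python; the point is that the second query only becomes obvious once the first answer arrives.

```python
from collections import Counter

# Hypothetical complaint records.
complaints = [
    {"root_cause": "late delivery", "product": "Standard", "region": "North"},
    {"root_cause": "late delivery", "product": "Standard", "region": "South"},
    {"root_cause": "billing error", "product": "Premium", "region": "North"},
]

# First question: how many complaints per root cause?
by_cause = Counter(c["root_cause"] for c in complaints)

# Follow-up question, prompted by the first answer:
# how do those causes break down by product?
by_cause_and_product = Counter(
    (c["root_cause"], c["product"]) for c in complaints
)
```

A waterfall project would have had to anticipate the second question at sign-off; an iterative workflow lets the requestor ask it the moment the first count appears.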

In my experience, few of the people requesting data insights are comfortable asking for what they really want, for a number of reasons:

  • They don’t want to look stupid if what they are asking for doesn’t exist.

  • They haven’t seen what the art of the possible is with the data tools, so they don’t know what can be done with data.

  • They may ask for only a small part of their overall needs because they assume it takes too much effort to achieve the whole request.

  • They may not have the resources available to complete the full request, so they ask only for part of their needs.

Challenging Data Issues

As soon as you can navigate the challenges that come with working with data in modern organizations, a larger one looms—people. As with most changes you look to make in your organization, shifting people’s attitudes and behaviors is likely to be the hardest part. Change related to data has an additional barrier, as there is a common perception that anything involving data is difficult, technical, and specialized.

I’ve spent the majority of the last decade battling this perception of data. There is no single reason why data is seen this way, but the absence of data skills in primary school and college education is a big factor. Filling the knowledge gaps left by academia over the previous few decades is one of the largest challenges currently faced by organizations.

Imagine that your team is unable to read and understand the words used in emails and other written messages. How much impact would this have? Emails wouldn’t be understood, reports would be left unread, and learning would be impeded by the inability to pick up a book to learn from. Why would we expect data to be any different? This demonstrates the importance of being able to understand and communicate with data.

Simply working with data isn’t enough in the modern organization. Cultivating a culture that supports challenging common conceptions and norms regarding data is important to making any progress towards empowering people with data. From the executive to the front line, having an expectation that data be present in decision-making is just as important as using any other experience when establishing courses of action. Ensuring there are positive attitudes towards using data when making decisions is vital in creating a strong, progressive culture, as what has always been used in the past—in other words, experience—needs to be tested against empirical evidence.

Even where data is pervasive in decision-making, care should be taken as to how fragmented the data ecosystem is within your organization. The data ecosystem is how all of the data sources, analysis, and products work together. Not all sources of data are easy to use across different tools. Data specialists often prefer certain tools, and this can create some tribalism within your organization, preventing the collaboration you desire.

Data Fluency

Being fluent means you have the ability to express yourself with ease. Data fluency refers to the ability to express yourself with data. In a modern organization, what does this look like?

Reading data involves many different skills, depending on how refined the data is. If the data has come directly from the source that created it, reading it often involves processing it before using a tool to look at the data more closely. Processing data involves cleaning, merging, and restructuring it to prepare it for analysis. If the data comes from a more refined source, it’s likely to be easier to read and to have a clear structure or a graphical output. The second chapter of this book will cover the types of data files and how you might work with them.
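As a minimal sketch of what this processing can involve (the records and field names here are invented for illustration), the steps are often mundane: fixing inconsistent casing and types, then merging a second source on a shared key so the data is ready for analysis.

```python
# Hypothetical raw orders, with messy casing and numbers held as text.
orders = [
    {"customer_id": "c1", "amount": "10.50 "},
    {"customer_id": "C2", "amount": "7.00"},
]
# A second hypothetical source: customer locations keyed by ID.
customers = {"c1": "London", "c2": "Leeds"}

# Cleaning: normalize the key and convert the amount to a number.
cleaned = [
    {
        "customer_id": o["customer_id"].strip().lower(),
        "amount": float(o["amount"]),
    }
    for o in orders
]

# Merging: attach each customer's city from the second source.
merged = [{**o, "city": customers.get(o["customer_id"])} for o in cleaned]
```

None of these steps is glamorous, but without them the two sources could not be read together at all, which is why processing sits at the start of reading data.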

Creating graphical outputs from data sources has been written about widely (including by Carl Allchin in Communicating with Data, published by O’Reilly) but is a critical skill for everyone in the modern world. To be able to create effective charts and graphs, you first need to be able to read data and understand what makes a chart effective and what doesn’t. The study of data visualization and communication effectiveness is an emerging field, but general best practices are already known. Data fluency is often described as understanding these best practices and principles.

Just understanding how to read graphical outputs of data sets is not enough; you need to be able to communicate your findings. This communication is often achieved through creating your own charts and graphs to highlight the insights you’ve uncovered, as visual understanding and pattern recognition are highly efficient interpretive skills for humans.

We’ve used the word understanding a lot in this section of the book and, for us, this is the key part of data literacy. Reading and writing are important, but it is the understanding you take from the data that is most important.

Being able to critically analyze the data outputs used in your organization is vital to getting the value out of the data assets within it. Just as with written literature, going beyond the words on the page is important to see the true meaning, and you will benefit from being skeptical about what data products are shared with you.

Data visualization is a powerful communication tool, but it is difficult to produce without bias. This isn’t to say you shouldn’t trust the data outputs in your organization, but you should try to understand their source, figure out the intent behind their production, and identify anything that could have been left out.11

Should you ignore data because of its potential flaws? In a book called Data Curious, you probably can guess the answer to this question—no! W. Edwards Deming’s quote, “Without data you’re just another person with an opinion,”12 is one that I (Carl) personally love. In all organizations, it’s difficult to separate personal opinions and politics when making decisions. Having evidence to back up your opinions is vital when challenging the status quo and truly trying to find weaknesses in your organization.

Therefore, creating a data-fluent workforce is important to ensure everyone is able to read, question the information that is shared with them, and express their own points with data.

Data Culture

Creating a data-fluent workforce is not enough to gain the benefits of data. You need to create an organizational culture that looks to actively use data in its decision-making processes. If your organization doesn’t have the inclination to support business cases and new proposition ideas with data, it can be quite a challenge to promote this.

Data cultures are often formed when new leaders come into organizations. These leaders frequently come from organizations where strong data cultures are present, and they expect the same capabilities to be present in their new teams. If you are used to having information to support business cases, its absence is keenly felt.

Data-informed decision-making is where data is used to support or challenge business cases and propositions. Informed is the key word, as decisions aren’t led by data, but additional evidence is offered to help make the right decision.

Using data to support decision-making can be pushed further, to where data-driven decision-making occurs. This is where empirical data evidence takes precedence over non-data arguments like someone’s experience. There is a fear frequently found in organizations with weaker data cultures that if the data points to a certain decision, it can’t be overruled. This fear can prevent people from being open to using data at all if they feel like their ideas or experience will be ignored completely. This concept should be dismissed as quickly as possible to prevent resistance to using any data at all.

The creation of a data culture must be fostered by at least a mid-level manager but often needs leadership sponsorship. This is due to the costs involved: software solutions, additional hires, or the expert support you might have to leverage to help establish data solutions or form the required strategy.

An organization’s leadership can provide crucial momentum towards more data-informed decisions by asking for data to support the decisions they are being asked to make. This is ultimately the main factor in creating a strong data culture. Organizations are designed to respond to the requests of the leadership, so if they request data products, then they are likely to get them. Middle managers can show the value of using data to help guide their decisions to their leaders. This doesn’t mean it will be easy to form what they ask for, especially if their teams aren’t data literate.

The more requests for data that are made, the more the organization will see data as part of the decision-making process. Practice might not always make perfect, but it is definitely a step in the right direction.

Conflicting Preferences

As data becomes more frequently used across your organization, you may start to experience other challenges. There are many data tools to choose from when preparing, storing, or analyzing data. To create a strong data culture built on sharing data sets and collaborating on them to improve your organization’s decisions, you need to limit the number of similar data tools in use. This is for a number of reasons:

User interfaces

Unsurprisingly, each piece of software has different user controls and processes. A different screen layout can reduce working efficiency, and any barrier to users getting what they need as quickly as they expect will likely stop a tool from becoming the go-to option it might otherwise be. Each additional interface also requires training, which incurs cost.

Technical differences

Not every data tool in the same category (e.g., acquisition, storage, or analysis) is functionally identical. Tools differ in everything from the calculations you can produce to which data sources can easily be connected. This can lead to duplication of data sources, as each tool may need its own version.

Knowledge sharing

This is a key aspect of working with data but can get lost when different tools create different factions of data workers. Sharing knowledge about how to optimize the use of a tool is very useful, but failing to share the insights found is an even greater concern.

Purchasing

By using multiple tools in your organization, you are less likely to be able to generate economies of scale. Most software purchases are made on the basis that the larger the purchase volume, the lower the cost per license or credit.
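The economies-of-scale point can be made concrete with a minimal sketch of tiered license pricing. The tier boundaries and prices below are hypothetical, invented purely for illustration:

```python
# Hypothetical volume-discount tiers: (minimum licenses, price per license).
# Real vendors publish their own tiers; these numbers are illustrative only.
TIERS = [(100, 40.0), (50, 45.0), (1, 50.0)]

def price_per_license(quantity: int) -> float:
    """Return the unit price for the largest tier the quantity qualifies for."""
    for minimum, price in TIERS:
        if quantity >= minimum:
            return price
    raise ValueError("quantity must be at least 1")

# Two teams separately buying 30 licenses each pay full price per seat,
# while one consolidated purchase of 60 licenses reaches a cheaper tier.
separate = 2 * 30 * price_per_license(30)  # 60 seats at the 1+ tier price
combined = 60 * price_per_license(60)      # 60 seats at the 50+ tier price
```

Under these assumed tiers, the consolidated purchase is cheaper per seat, which is exactly the saving that fragmenting purchases across many similar tools forfeits.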

None of these issues is an absolute showstopper, and having different tools can even make hiring easier, as you have a wider pool of talent to recruit from. However, if you’ve put in the time and effort to raise data literacy levels and build a data culture, you will want to harness as much benefit as possible, so standardizing on a few key pieces of software will help considerably.

What Does Data Empowerment Look Like?

Managing change and improving your organization’s data skills can seem a long, challenging journey, but it is worth the effort. MicroStrategy has reported that organizations using data analytics have “faster, more effective decision making.”13 The same report also noted the following benefits:

  • Improved efficiency and productivity (64%)

  • Better financial performance (51%)

  • Identification and creation of new product and service revenue (46%)

  • Improved customer acquisition and retention (46%)

  • Improved customer experiences (44%)

All of these attributes are clearly attractive to any organization. Data empowerment is not just about making faster, more effective decisions; it’s about giving people access to the data they need to make those decisions, find revenue opportunities, and improve customer retention. Access to data sets alone will not empower your people, but pairing that access with easy-to-use tools and the knowledge of how to work with data can. Ensuring people see the role that data plays in decision-making at all levels will encourage them to challenge the status quo when analysis and insights point to the benefit of doing so.

Let’s get into more detail about what data empowerment looks like for your people and the processes they will work on.

People

The 2020 MicroStrategy report also found that 76% of executives have easy access to data, but only 52% of frontline employees do. Supporting strategic decisions with data is clearly important, but if not everyone is able to do so, missteps are still likely being made and opportunities missed. Couple this with data literacy skills that are inadequate for many roles, and there are a lot of potential improvements to be made.

In Communicating with Data, I (Carl) wrote about the growing importance of being able to influence change through the use of data, in the same way that words and numbers have been used to form arguments supporting decisions in organizations since their inception. Data, in the form of charts, graphs, and summary numbers, has become an increasingly important part of forming influential arguments. Therefore, it’s important for all levels of the organization to be able to access data and have the skills to analyze it; otherwise, some people will be unable to influence as strongly as others.

Data access goes beyond the raw data sets exported from the systems where data is originally captured; it also involves having the skills and tools to form analyses. If tools are intuitive and easy to access, teams at all levels of the organization will be able to use data to supplement their knowledge and expertise. Being data-empowered means not only that issues can be found but that ideas can be supported with data as evidence for why something should be done. Communicating with data should make the articulation of ideas far clearer and more empirically supported, reducing the role of politics and influencing skills in decision-making at more levels of the organization.

If your team feels they can challenge processes, validate decisions, and innovate, they will be more engaged with the organization, rather than feeling frustrated or ignored.

Processes

Organizations have long used business analysts, somewhat effectively, to find and solve problems. Empowering your whole team with data could create an army of business analysts rather than requiring you to bring in people with those skills.

With increased access to data and improved skills, your people can move fluidly through the changes your organization requires, find the causes of known problems, and propose solutions that wouldn’t otherwise have been possible. Common techniques like Six Sigma and many other forms of process improvement rely on putting data in the hands of the people who understand the processes where issues or inefficiencies exist. The data points are used to identify where customers are asked to take unnecessary steps or where clients have to repeat the same steps due to poor organizational processes. Access to the relevant data allows people to measure the impact of the inefficiencies they find in their work or hear about from customers in their interactions.

When I (Carl) worked for Barclays Bank, we saw a dramatic improvement in reducing the number of complaints and resolving the issues behind them by making data more accessible to key decision-makers and leaders.14 We used interactive dashboards to allow everyone from the executive team to the frontline operational teams to see daily trends in the issues that had arisen and caused complaints. Before these were in place, it took weeks to turn around the analysis pinpointing where in the organization the complaints were originating so they could be addressed. In the meantime, the underlying issues kept generating more complaints before finally being fixed weeks later.

For any improvement that is made, measuring the impact is just as important as the change itself, to ensure it is as effective as intended. Although executives make the decision to implement a solution, the actual implementation is conducted by frontline teams. When frontline staff can see the impact that changes have, they are more likely to support them, or to suggest improvements if the changes are not having the desired effect.

Data-Informed Decision-Making

One fear about using data to make decisions concerns where to draw the line between relying on human experience and judgment versus simply looking to the available data for answers. We use the term data-informed decision-making to highlight the role we believe data should play when making decisions in organizations. The fear centers on data pointing in one direction and people being forced to follow that direction unquestioningly. This is commonly called data-driven decision-making. Following anything unquestioningly is never a good thing, as common conventions will never remain a perfect fit forever.

With data solutions becoming more intelligent, there is a temptation to lean toward data-driven decision-making, but the full context and situation are rarely captured by the data available. There is a lot of growth in data science, machine learning, and artificial intelligence, but before these solutions can be fully leveraged, the data sources need to be collated and validated. Most organizations are still struggling to gather all the necessary data and then verify that the data sets used to make decisions are correct.

Even with comprehensive data sources, human thinking is still ahead of computer-based decision-making in many cases. The gap is closing rapidly, but applying human skepticism by questioning the models behind data-based decisions remains important. Data can be an additional factor in making decisions but shouldn’t be the only one. This blending of data with other factors is what we refer to as data-informed decision-making.

With the clear benefits of informing decisions with data for both people and processes established, your next question is, How do I get started? Depending on the state of your data sources and levels of analysis, there are many routes available to creating a culture of data-informed decision-making.

The rest of this book will look at the following key questions:

  • What is data, where is it created, and what can you make with it?

  • How do you acquire, store, curate, and share data products?

  • How do you build analytical products?

  • How do you set up a team to deliver this?

1 Michael Kershner, “Data Isn’t the New Oil—Time Is,” Forbes Magazine Council Post (July 15, 2021), https://oreil.ly/JiJai.

2 “About Us,” dunnhumby, https://oreil.ly/bL8YU.

3 “The Human Impact of Data Literacy,” Accenture, 2020, https://oreil.ly/R7qnk.

4 “Closing the Data-Value Gap: How to Become Data-Driven and Pivot to the New,” Accenture, 2019, https://oreil.ly/VdmN1.

5 Svetlana Sicular, “Gartner’s Big Data Definition Consists of Three Parts, Not to Be Confused with Three ‘V’s,” The Gartner Blog Network, November 11, 2013, https://oreil.ly/If8EA.

6 Sicular, “Gartner’s Big Data Definition Consists of Three Parts.”

7 Bridget Botelho and Stephen J. Bigelow, “Definition: Big Data,” TechTarget: Data Management, January 2022, https://oreil.ly/7rPOf.

8 Petroc Taylor, “Volume of Data/Information Created, Captured, Copied, and Consumed Worldwide from 2010 to 2020, with Forecasts from 2021 to 2025,” Statista, September 2022, https://oreil.ly/NZ3js.

9 “A Guide to the Data Protection Principles,” Information Commissioner’s Office, May 2023, https://oreil.ly/wDvYV.

10 Kurt Wagner, “Here’s How Facebook Allowed Cambridge Analytica to Get Data for 50 Million Users,” Vox, March 17, 2018, https://oreil.ly/3pYwe.

11 Ben Jones, in How to Avoid Data Pitfalls (O’Reilly, 2019), and Alberto Cairo, in How Charts Lie (O’Reilly, 2019), have written excellent books on avoiding these issues when building data visualizations.

12 Milo Jones and Philippe Silberzahn, “Without an Opinion, You’re Just Another Person with Data,” Forbes, March 15, 2016, https://oreil.ly/bTKXG.

13 “2020 Global State of Enterprise Analytics,” MicroStrategy, https://oreil.ly/rmwQY.

14 “Putting the Data in the Hands of Stakeholders Using Parameters at Barclays,” Tableau, June 2, 2014, https://oreil.ly/0Nx7p.
