Chapter 4. The Impact of Artificial Intelligence on Databases

Artificial intelligence is moving fast, with continuing growth in areas ranging from embedded intelligence to the democratization of AI through platforms such as ChatGPT. This is changing the roles and functionality of databases from both an operational and a service-delivery standpoint, resulting in a mutually beneficial relationship. First, AI can be employed to boost database performance, enabling autonomous and near-autonomous operations and delivery of data services. Second, databases serve as the lifeblood of AI and ML, which elevates their role: they must manage and provide the right data at the right time, data that is trustworthy and of the highest quality.

AI for Better Database Performance

From a performance perspective, AI and ML promise to deliver significant gains for databases of all types. AI can play a role in discovering, processing, and searching data sets, delivering rapid results. According to Thomas Davenport and Thomas Redman, writing in MIT Sloan Management Review, “Artificial intelligence is quietly improving the management of data, including its quality, accessibility, and security.”1 They continue: “Managing data…is a labor-intensive activity: it involves cleaning, extracting, integrating, cataloging, labeling, and organizing data, and defining and performing the many data-related tasks that often lead to frustration among both data scientists and employees without ‘data’ in their titles.”

The challenge for today’s data managers is to deliver enhanced data capabilities with strained or relatively stagnant budgets. Organizations are sourcing and ingesting more data than ever before, now within the gigabyte to multiterabyte range, and that data needs to be available on demand to business users, data scientists, and mission-critical applications. AI changes the equation for today’s databases, helping to autonomously enhance query development and performance, as well as to manage the day-to-day operation, provisioning, and security of databases.

Emerging methodologies that promote the use of AI in database management include AIOps, in which AI is applied to streamline and automate data operations; DataOps, the application of intelligent collaboration and automation to data pipelines; and DataSecOps, which involves data security operations on cloud-native databases.

Applying AI to database functions will free up data engineers, architects, administrators, and scientists to concentrate on bigger and more meaningful tasks beyond day-to-day maintenance, such as digital transformation and innovation, which are essential to operating in today’s hypercompetitive environment.

Databases as the Lifeblood of AI

Without well-managed databases, there can be no AI; it’s as simple as that. To succeed, AI depends on meaningful and relevant data. Put another way, a quality data set is the foundation of AI, and AI models and algorithms are only as good as the data they receive. Furthermore, organizations depend on databases operating at peak performance to deliver the timely and relevant data needed for training data sets and large language models.

Going forward, enterprises and data managers need to identify the data essential for training models, as well as address a potential lack of data for sustaining these models. Data feeding AI systems must be fresh and relevant—often, in real time—for the business problems at hand. In addition, the data must be of the highest possible quality and trustworthiness.

Data used by ML models is often “raw” or unstructured, and this is where content delivery networks may be required as part of a high-performance data architecture. Simple time-series data is well suited to being accumulated and stored in a database. Training on audio or image data, however, often falls outside the capabilities of databases, and here a content delivery network, consisting of interconnected servers that cache such assets close to applications or end users, may be more suitable.

Databases employed to support AI initiatives must also be capable of managing a wide range of data types, from structured to unstructured data. Distributed SQL databases containing HTAP capabilities fit this need, delivering real-time analytical data of all types when and where it is needed.

What Generative AI Brings to the Table

Generative AI—delivered with OpenAI’s ChatGPT, Google’s Bard, or Microsoft’s Bing Chat, among others—promises to upend many aspects of the database world. From an operational point of view, generative AI can be used to create code for applications or scripts that enhance database performance and integration. This enables database developers, architects, engineers, and administrators to conduct higher-level tasks and respond more quickly to business demands.

Generative AI also has the potential to assist in database configuration, as well as play an assistant role in designing a high-performance data architecture, drawing on patterns and experiences either stored locally or from across the network.

From a service-delivery standpoint, today’s databases will be tasked with maintaining the data employed within large language models for enterprise-specific instances of generative AI. This data provides recommendations not only to database teams but also across the wider business.

The New Landscape of SQL Development with AI Innovation

The advent of AI means widely expanded capabilities for databases and those working closely with databases. With AI, simple SQL queries can be built automatically through natural language processing prompts, with little or no coding required. Through this process, an AI-driven SQL interface can also provide recommendations for queries based on analysis of the backend database.
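One way to picture how a natural language prompt can be grounded in the backend database is the sketch below. It is a minimal illustration, not any particular product's interface: it reads the table definitions from an in-memory SQLite database and assembles them, together with a user's question, into a text prompt that could be handed to a language model. The `describe_schema` and `build_prompt` helpers, the `orders` table, and the prompt wording are all assumptions made for this example, and no model is actually called.

```python
import sqlite3


def describe_schema(conn):
    """Return the CREATE TABLE statements recorded in SQLite's catalog."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master "
        "WHERE type = 'table' AND sql IS NOT NULL ORDER BY name"
    ).fetchall()
    return "\n".join(row[0] for row in rows)


def build_prompt(conn, question):
    """Combine the live schema with a plain-language question (hypothetical format)."""
    return (
        "You are a SQL assistant. Using only the tables below, "
        "write one SQL query that answers the question.\n\n"
        f"Schema:\n{describe_schema(conn)}\n\n"
        f"Question: {question}\nSQL:"
    )


# Hypothetical example table; a real system would connect to its own database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

prompt = build_prompt(conn, "What is the total order amount per region?")
print(prompt)
```

Supplying the schema in the prompt is what lets the generated query refer to tables and columns that actually exist, rather than to names the model guesses.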

Generative AI, for instance, has a lot to offer for ad hoc or natural language queries created by nontechnical users. Even for programmers, AI is proving to be very good at generating syntactically correct windowing functions, which are tedious for programmers to write and beyond the capabilities of most businesspeople. ML approaches can be used to generate simple queries for nonexperts: queries that can easily be verified to produce correct results. AI has already proven able to understand natural language queries that assist programming on MySQL, which makes the MySQL protocol a preferable choice because more training data is available for it. AI can understand a schema and apply SQL best practices. At the same time, AI cannot effectively discriminate between transactional and analytical queries or remain sensitive to cross-shard consistency. This is why AI-assisted programming requires a more versatile database that is easy to use and flexible.
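To make the windowing-function and verification points concrete, here is a small sketch using Python's built-in `sqlite3` module. The `generated_sql` string stands in for a query that an AI assistant might produce for a request such as "show a running total of sales per region"; the `sales` table, its rows, and the `check_query` helper are hypothetical. The check simply runs the query against a tiny data set whose correct answer is known in advance. Window functions require SQLite 3.25 or later, which is assumed here.

```python
import sqlite3

# Tiny, hand-checkable data set (hypothetical).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, day TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("east", "2024-01-01", 10.0),
        ("east", "2024-01-02", 5.0),
        ("west", "2024-01-01", 7.0),
        ("west", "2024-01-02", 3.0),
    ],
)

# Stand-in for an AI-generated query: a running total per region.
generated_sql = """
SELECT region, day,
       SUM(amount) OVER (PARTITION BY region ORDER BY day) AS running_total
FROM sales
ORDER BY region, day
"""


def check_query(conn, sql, expected_rows):
    """Run the query and compare its rows with a known-correct answer."""
    return conn.execute(sql).fetchall() == expected_rows


# Expected result worked out by hand from the four rows above.
expected = [
    ("east", "2024-01-01", 10.0),
    ("east", "2024-01-02", 15.0),
    ("west", "2024-01-01", 7.0),
    ("west", "2024-01-02", 10.0),
]

is_correct = check_query(conn, generated_sql, expected)
print("generated query verified:", is_correct)
```

A check like this only covers the cases in the sample data, so it suits the simple queries described above better than complex ones.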

The emerging architectural approach supports the delivery of real-time insights and capabilities by leveraging AI. Databases, employed in conjunction with streaming technologies, are forming the foundation of real-time AI.

AI means new approaches to building and managing databases, as well as an escalation of the roles of databases themselves. Enterprises need to prepare for, and embrace the power of, AI with high-performance data architectures that are scalable, able to process mixed workloads, highly available, and capable of delivering intelligence on demand.

1 Thomas H. Davenport and Thomas C. Redman, “How AI Is Improving Data Management,” MIT Sloan Management Review, December 20, 2022.
