A DevOps approach to data management
A multi-model approach to turning data from a liability into an asset.
Deriving knowledge from data has become a key competency for many, if not most, businesses. With the right data, and the right tools to handle it, businesses can gain keen insights into areas such as operations, customer activity, and employee productivity.
In the free O’Reilly report, Defining Data-Driven Software Development, author Eric Laquer explores advances in DevOps and applies those lessons to data management. He addresses the challenges of handling business data and examines ways to meet the differing needs of various stakeholders. Laquer also illustrates how a multi-model approach allows a variety of data types and schemas to operate side by side.
Utilizing data in its natural form
Multi-model technology accepts that our data comes in many types. It supports a variety of representational forms and indexing techniques. This makes it possible to take more creative and effective approaches to problems that would otherwise be major challenges in the confining, single-schema world of tables, rows, and columns.
As many of us know, today’s data is rarely homogeneous enough to fit easily into a relational schema of tables, rows, and columns. Modern applications must process data that includes records, documents, videos, text, and semantic triples in a variety of formats, including XML, JSON, text, and binary. A majority of the data that analysts work with today is document-based. Schemas and metadata vary from document to document, even within the same collection. Businesses typically have documents that serve similar purposes but are structured differently for historical or other reasons. This is especially true of large organizations that have merged with others over time.
However, documents are not the only type of data that businesses must deal with, so a purely document-oriented database system is not the best solution for large-scale data storage. Other options include a semantic triple store or a graph database, which can surface useful facts about how entities are connected. In some cases, it may also be necessary to handle certain types of data (e.g., transactional data) using more conventional relational technologies.
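The following minimal Python sketch shows what keeping several data models side by side can look like. It assumes a toy in-memory store. `MultiModelStore`, `order_total_for_org`, and the `worksFor` predicate are hypothetical names invented for illustration. They are not part of the report or of any real database API. Documents of differing shapes, semantic triples, and tabular transaction rows share one store, and a single query uses the graph of relationships to select transaction rows.

```python
from typing import Any


# Hypothetical in-memory store for illustration only; not a real product API.
class MultiModelStore:
    """Keeps three data models side by side instead of forcing one schema."""

    def __init__(self) -> None:
        self.documents: dict[str, dict[str, Any]] = {}  # document model: JSON-like dicts keyed by URI
        self.triples: set[tuple[str, str, str]] = set()  # semantic model: (subject, predicate, object)
        self.rows: list[dict[str, Any]] = []  # relational-style rows for transactional records

    def put_document(self, uri: str, doc: dict[str, Any]) -> None:
        self.documents[uri] = doc  # stored as-is; documents need not share a shape

    def add_triple(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))

    def insert_row(self, row: dict[str, Any]) -> None:
        self.rows.append(row)

    def subjects(self, predicate: str, obj: str) -> set[str]:
        """Graph lookup: every subject linked to obj by predicate."""
        return {s for s, p, o in self.triples if p == predicate and o == obj}


def order_total_for_org(store: MultiModelStore, org: str) -> float:
    """Cross-model query: use triples to find an org's customers, then sum their order rows."""
    customers = store.subjects("worksFor", org)
    return sum((row["amount"] for row in store.rows if row["customer"] in customers), 0.0)


if __name__ == "__main__":
    store = MultiModelStore()
    # Two customer documents with different shapes live in the same collection.
    store.put_document("cust/1", {"name": "Ada", "email": "ada@example.com"})
    store.put_document("cust/2", {"fullName": "Grace", "phones": ["555-0100"]})
    # Relationships are stored as facts, not as foreign-key columns.
    store.add_triple("cust/1", "worksFor", "org/acme")
    store.add_triple("cust/2", "worksFor", "org/acme")
    # Orders stay in a tabular form suited to transactions.
    store.insert_row({"order_id": 1, "customer": "cust/1", "amount": 120.0})
    store.insert_row({"order_id": 2, "customer": "cust/2", "amount": 80.0})
    store.insert_row({"order_id": 3, "customer": "cust/3", "amount": 50.0})
    print(order_total_for_org(store, "org/acme"))  # 200.0
```

A real multi-model database would maintain indexes for each model rather than scanning lists and sets. The linear scans here only keep the sketch short.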
Abandoning rigid schemas
Flexibility has become the key to handling the many types of data present in organizations today, and achieving the necessary level of flexibility requires abandoning rigid schema definitions. Without rigid schemas, software developers can examine and work with data as it is, rather than writing application code to munge and reformat it to fit a schema. The ability to deal with data in its “natural” form, and mark it up as needed, makes development more nimble.
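The following Python sketch shows what working with data "as it is" might look like. It rests on stated assumptions. The alias table, `canonical_view`, and the `envelope` wrapper with its `headers`/`instance` keys are hypothetical names chosen for this example, not an API from the report or from any particular product. Documents from two merged organizations keep their original shapes. Canonical fields are derived when the data is read, and markup is added by wrapping each document rather than rewriting it.

```python
import copy
from typing import Any, Optional

# Hypothetical alias table: two merged organizations record the same facts under different keys.
FIELD_ALIASES: dict[str, tuple[str, ...]] = {
    "customer_name": ("name", "fullName"),
    "email": ("email", "contact.email"),
}


def lookup(doc: dict[str, Any], path: str) -> Optional[Any]:
    """Follow a dotted path such as 'contact.email'; return None if any step is missing."""
    current: Any = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def canonical_view(doc: dict[str, Any]) -> dict[str, Any]:
    """Read-time mapping: for each canonical field, take the first alias present in the doc."""
    view: dict[str, Any] = {}
    for canonical, aliases in FIELD_ALIASES.items():
        for alias in aliases:
            value = lookup(doc, alias)
            if value is not None:
                view[canonical] = value
                break
    return view


def envelope(doc: dict[str, Any], source: str) -> dict[str, Any]:
    """Mark up a document by wrapping it; the original is copied verbatim, never reshaped."""
    return {
        "headers": {"source": source, **canonical_view(doc)},
        "instance": copy.deepcopy(doc),
    }


if __name__ == "__main__":
    from_org_a = {"name": "Ada Lovelace", "email": "ada@example.com"}
    from_org_b = {"fullName": "Grace Hopper", "contact": {"email": "grace@example.com"}, "region": "EMEA"}
    for doc, source in ((from_org_a, "org-a"), (from_org_b, "org-b")):
        print(envelope(doc, source)["headers"])
```

Both documents produce the same canonical `customer_name` and `email` headers. Fields the alias table does not know about, such as `region`, survive untouched in `instance`. The design choice is to defer interpretation to read time, so adding a new alias changes every view without migrating stored data.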
In the free report, Laquer examines some of the conflicting needs around data and proposes a solution: a DevOps model of data management combined with a flexible, multi-model database system. Developers need flexibility and the ability to build out systems with as little administrative overhead as possible. DBAs, analysts, compliance officers, and other stakeholders have their own needs with respect to a database system. New technology can help mediate between the different types of data and the needs of various stakeholders, but we also need a new model for data management. The DevOps movement can provide such a model.
To learn more, download the free O’Reilly report “Defining Data-Driven Software Development.”
This post is a collaboration between O’Reilly and MarkLogic. See our statement of editorial independence.