Chapter 7. Manage Data: Improve Lifecycle Management
Imagine that you have successfully implemented a data catalog. Data from data sources is gradually pushed or pulled into it, and everyone in your organization uses it. The data catalog is growing organically, with strong, decentralized nodes like a social network. Assets are assigned metadata (glossary terms, descriptions, ownership, and so on), and the IT landscape of your company is becoming discoverable.
Now that you have a working data catalog, you can use it to better manage the lifecycle of the data in your IT landscape. This is a little bit of a revolution, and once you get it up and running, it will pay off. Accordingly, this chapter covers:
- Management of the lifecycle of data in the IT landscape with the data catalog
- Management of the lifecycles of data assets, terms, and more in the data catalog
At the end of this chapter, I will also discuss data observability, which will push management of data into an even earlier stage of the data lifecycle.
The Value of Data Lifecycle Management and Why the Data Catalog Is a Game Changer
In data science, computer science, data engineering, and adjacent disciplines, the data engineering lifecycle is well understood. This lifecycle is about getting data from source systems to serve in use cases of machine learning, business intelligence, and more. But, as pointed out in Fundamentals of Data Engineering ...