Book description
Today, the world is trying to create and educate data scientists because of the phenomenon of Big Data, and everyone is looking deeply into this technology. But no one is looking at the larger architectural picture of how Big Data needs to fit within existing systems (data warehousing systems). Looking at the larger picture into which Big Data fits gives the data scientist the necessary context for how the pieces of the puzzle should fit together. Most references on Big Data look at only one tiny part of a much larger whole. Until the data gathered can be put into an existing framework or architecture, it can’t be used to its full potential. Data Architecture: A Primer for the Data Scientist addresses the larger architectural picture of how Big Data fits with the existing information infrastructure, an essential topic for the data scientist.
Drawing upon years of practical experience and using numerous examples and an easy-to-understand framework, W.H. Inmon and Daniel Linstedt define the importance of data architecture and how it can be used effectively to harness Big Data within existing systems. You’ll be able to:
- Turn textual information into a form that can be analyzed by standard tools
- Make the connection between analytics and Big Data
- Understand how Big Data fits within an existing systems environment
- Conduct analytics on repetitive and non-repetitive data
- Discusses the often-overlooked value in Big Data, non-repetitive data, and why there is significant business value in using it
- Shows how to turn textual information into a form that can be analyzed by standard tools
- Explains how Big Data fits within an existing systems environment
- Presents new opportunities that are afforded by the advent of Big Data
- Demystifies the murky waters of repetitive and non-repetitive data in Big Data
Table of contents
- Cover
- Title page
- Table of Contents
- Copyright
- Dedication
- Preface
- About the Authors
- 1.1: Corporate Data
- 1.2: The Data Infrastructure
- 1.3: The “Great Divide”
- 1.4: Demographics of Corporate Data
- 1.5: Corporate Data Analysis
- 1.6: The Life Cycle of Data – Understanding Data Over Time
- 1.7: A Brief History of Data
- 2.1: A Brief History of Big Data
- 2.2: What is Big Data?
- 2.3: Parallel Processing
- 2.4: Unstructured Data
- 2.5: Contextualizing Repetitive Unstructured Data
- 2.6: Textual Disambiguation
- 2.7: Taxonomies
- 3.1: A Brief History of Data Warehouse
- 3.2: Integrated Corporate Data
- 3.3: Historical Data
- 3.4: Data Marts
- 3.5: The Operational Data Store
- 3.6: What a Data Warehouse is Not
- 4.1: Introduction to Data Vault
- 4.2: Introduction to Data Vault Modeling
- 4.3: Introduction to Data Vault Architecture
- 4.4: Introduction to Data Vault Methodology
- 4.5: Introduction to Data Vault Implementation
- 5.1: The Operational Environment – A Short History
- 5.2: The Standard Work Unit
- 5.3: Data Modeling for the Structured Environment
- 5.4: Metadata
- 5.5: Data Governance of Structured Data
- 6.1: A Brief History of Data Architecture
- 6.2: Big Data/Existing Systems Interface
  - Abstract
  - The Big Data/Existing Systems Interface
  - The Repetitive Raw Big Data/Existing Systems Interface
  - Exception-Based Data
  - The Nonrepetitive Raw Big Data/Existing Systems Interface
  - Into the Existing Systems Environment
  - The “Context-Enriched” Big Data Environment
  - Analyzing Structured Data/Unstructured Data Together
- 6.3: The Data Warehouse/Operational Environment Interface
- 6.4: Data Architecture – A High-Level Perspective
- 7.1: Repetitive Analytics – Some Basics
- 7.2: Analyzing Repetitive Data
- 7.3: Repetitive Analysis
- 8.1: Nonrepetitive Data
  - Abstract
  - Inline Contextualization
  - Taxonomy/Ontology Processing
  - Custom Variables
  - Homographic Resolution
  - Acronym Resolution
  - Negation Analysis
  - Numeric Tagging
  - Date Tagging
  - Date Standardization
  - List Processing
  - Associative Word Processing
  - Stop Word Processing
  - Word Stemming
  - Document Metadata
  - Document Classification
  - Proximity Analysis
  - Functional Sequencing within Textual ETL
  - Internal Referential Integrity
  - Preprocessing, Postprocessing
- 8.2: Mapping
- 8.3: Analytics from Nonrepetitive Data
- 9.1: Operational Analytics
- 10.1: Operational Analytics
- 11.1: Personal Analytics
- 12.1: A Composite Data Architecture
- Glossary
- Index
Product information
- Title: Data Architecture: A Primer for the Data Scientist
- Author(s): W.H. Inmon, Daniel Linstedt
- Release date: November 2014
- Publisher(s): Morgan Kaufmann
- ISBN: 9780128020913