Book description
Entity resolution is a key analytic technique that enables you to identify multiple data records that refer to the same real-world entity. With this hands-on guide, product managers, data analysts, and data scientists will learn how to add value to data by cleansing, analyzing, and resolving datasets using open source Python libraries and cloud APIs.
Author Michael Shearer shows you how to scale up your data matching processes and improve the accuracy of your reconciliations. You'll be able to remove duplicate entries within a single source and join disparate data sources together when common keys aren't available. Using real-world data examples, this book helps you gain practical understanding to accelerate the delivery of real business value.
With entity resolution, you'll build rich and comprehensive data assets that reveal relationships for marketing and risk management purposes. Assets like these are key to harnessing the full potential of ML and AI. This book covers:
- Challenges in deduplicating and joining datasets
- Extracting, cleansing, and preparing datasets for matching
- Text matching algorithms to identify equivalent entities
- Techniques for deduplicating and joining datasets at scale
- Matching datasets containing persons and organizations
- Evaluating data matches
- Optimizing and tuning data matching algorithms
- Entity resolution using cloud APIs
- Matching using privacy-enhancing technologies
Table of contents
- Preface
- 1. Introduction to Entity Resolution
- 2. Data Standardization
- 3. Text Matching
- 4. Probabilistic Matching
- 5. Record Blocking
- 6. Company Matching
- 7. Clustering
- 8. Scaling Up on Google Cloud
- 9. Cloud Entity Resolution Services
- 10. Privacy-Preserving Record Linkage
- 11. Further Considerations
- Index
- About the Author
Product information
- Title: Hands-On Entity Resolution
- Author(s): Michael Shearer
- Release date: February 2024
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781098148485