Data Wrangling

Book description

Written and edited by some of the world’s top experts in the field, this new volume presents state-of-the-art research and the latest technological breakthroughs in data wrangling, covering its theoretical concepts, practical applications, and tools for solving everyday problems.

Data wrangling is the process of cleaning and unifying messy, complex data sets for easy access and analysis. It typically involves manually converting and mapping data from a raw form into another format so that the data can be consumed and organized more conveniently. Data wrangling is increasingly ubiquitous at today’s top firms.

Data cleaning focuses on removing inaccurate data from a data set, whereas data wrangling focuses on transforming the data’s format, typically by converting “raw” data into a format more suitable for use. Data wrangling is a necessary component of any business. Data wrangling solutions, including applications such as Datameer, Infogix, Paxata, Talend, Tamr, TMMData, and Trifacta, are designed and architected to handle diverse, complex data at any scale.

This book synthesizes the processes of data wrangling into a comprehensive overview, with a strong focus on recent and rapidly evolving agile analytic processes in data-driven enterprises, which businesses and other organizations can use to solve everyday problems and build practical applications. Whether for the veteran engineer, scientist, or other industry professional, this book is a must-have for any library.

Table of contents

  1. Cover
  2. Series Page
  3. Title Page
  4. Copyright Page
  5. 1 Basic Principles of Data Wrangling
    1. 1.1 Introduction
    2. 1.2 Data Workflow Structure
    3. 1.3 Raw Data Stage
    4. 1.4 Refined Stage
    5. 1.5 Produced Stage
    6. 1.6 Steps of Data Wrangling
    7. 1.7 Do’s for Data Wrangling
    8. 1.8 Tools for Data Wrangling
    9. References
  6. 2 Skills and Responsibilities of Data Wrangler
    1. 2.1 Introduction
    2. 2.2 Role as an Administrator (Data and Database)
    3. 2.3 Skills Required
    4. 2.4 Responsibilities as Database Administrator
    5. 2.5 Concerns for a DBA [12]
    6. 2.6 Data Mishandling and Its Consequences
    7. 2.7 The Long-Term Consequences: Loss of Trust and Diminished Reputation
    8. 2.8 Solution to the Problem
    9. 2.9 Case Studies
    10. 2.10 Conclusion
    11. References
  7. 3 Data Wrangling Dynamics
    1. 3.1 Introduction
    2. 3.2 Related Work
    3. 3.3 Challenges: Data Wrangling
    4. 3.4 Data Wrangling Architecture
    5. 3.5 Data Wrangling Tools
    6. 3.6 Data Wrangling Application Areas
    7. 3.7 Future Directions and Conclusion
    8. References
  8. 4 Essentials of Data Wrangling
    1. 4.1 Introduction
    2. 4.2 Holistic Workflow Framework for Data Projects
    3. 4.3 The Actions in Holistic Workflow Framework
    4. 4.4 Transformation Tasks Involved in Data Wrangling
    5. 4.5 Description of Two Types of Core Profiling
    6. 4.6 Case Study
    7. 4.7 Quantitative Analysis
    8. 4.8 Graphical Representation
    9. 4.9 Conclusion
    10. References
  9. 5 Data Leakage and Data Wrangling in Machine Learning for Medical Treatment
    1. 5.1 Introduction
    2. 5.2 Data Wrangling and Data Leakage
    3. 5.3 Data Wrangling Stages
    4. 5.4 Significance of Data Wrangling
    5. 5.5 Data Wrangling Examples
    6. 5.6 Data Wrangling Tools for Python
    7. 5.7 Data Wrangling Tools and Methods
    8. 5.8 Use of Data Preprocessing
    9. 5.9 Use of Data Wrangling
    10. 5.10 Data Wrangling in Machine Learning
    11. 5.11 Enhancement of Express Analytics Using Data Wrangling Process
    12. 5.12 Conclusion
    13. References
  10. 6 Importance of Data Wrangling in Industry 4.0
    1. 6.1 Introduction
    2. 6.2 Steps in Data Wrangling
    3. 6.3 Data Wrangling Goals
    4. 6.4 Tools and Techniques of Data Wrangling
    5. 6.5 Ways for Effective Data Wrangling
    6. 6.6 Future Directions
    7. References
  11. 7 Managing Data Structure in R
    1. 7.1 Introduction to Data Structure
    2. 7.2 Homogeneous Data Structures
    3. 7.3 Heterogeneous Data Structures
    4. References
  12. 8 Dimension Reduction Techniques in Distributional Semantics: An Application Specific Review
    1. 8.1 Introduction
    2. 8.2 Application Based Literature Review
    3. 8.3 Dimensionality Reduction Techniques
    4. 8.4 Experimental Analysis
    5. 8.5 Conclusion
    6. References
  13. 9 Big Data Analytics in Real Time for Enterprise Applications to Produce Useful Intelligence
    1. 9.1 Introduction
    2. 9.2 The Internet of Things and Big Data Correlation
    3. 9.3 Design, Structure, and Techniques for Big Data Technology
    4. 9.4 Aspiration for Meaningful Analyses and Big Data Visualization Tools
    5. 9.5 Big Data Applications in the Commercial Surroundings
    6. 9.6 Big Data Insights’ Constraints
    7. 9.7 Conclusion
    8. References
  14. 10 Generative Adversarial Networks: A Comprehensive Review
    1. List of Abbreviations
    2. 10.1 Introduction
    3. 10.2 Background
    4. 10.3 Anatomy of a GAN
    5. 10.4 Types of GANs
    6. 10.5 Shortcomings of GANs
    7. 10.6 Areas of Application
    8. 10.7 Conclusion
    9. References
  15. 11 Analysis of Machine Learning Frameworks Used in Image Processing: A Review
    1. 11.1 Introduction
    2. 11.2 Types of ML Algorithms
    3. 11.3 Applications of Machine Learning Techniques
    4. 11.4 Solution to a Problem Using ML
    5. 11.5 ML in Image Processing
    6. 11.6 Conclusion
    7. References
  16. 12 Use and Application of Artificial Intelligence in Accounting and Finance: Benefits and Challenges
    1. 12.1 Introduction
    2. 12.2 Uses of AI in Accounting & Finance Sector
    3. 12.3 Applications of AI in Accounting and Finance Sector
    4. 12.4 Benefits and Advantages of AI in Accounting and Finance
    5. 12.5 Challenges of AI Application in Accounting and Finance
    6. 12.6 Suggestions and Recommendation
    7. 12.7 Conclusion and Future Scope of the Study
    8. References
  17. 13 Obstacle Avoidance Simulation and Real-Time Lane Detection for AI-Based Self-Driving Car
    1. 13.1 Introduction
    2. 13.2 Simulations and Results
    3. 13.3 Conclusion
    4. References
  18. 14 Impact of Suppliers Network on SCM of Indian Auto Industry: A Case of Maruti Suzuki India Limited
    1. 14.1 Introduction
    2. 14.2 Literature Review
    3. 14.3 Methodology
    4. 14.4 Findings
    5. 14.5 Discussion
    6. 14.6 Conclusion
    7. References
  19. About the Editors
  20. Index
  21. Also of Interest
  22. End User License Agreement

Product information

  • Title: Data Wrangling
  • Author(s): M. Niranjanamurthy, Kavita Sheoran, Geetika Dhand, Prabhjot Kaur
  • Release date: July 2023
  • Publisher(s): Wiley-Scrivener
  • ISBN: 9781119879688