Book description
Collecting data is relatively easy, but turning raw information into something useful requires that you know how to extract precisely what you need. With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications.
Along the way, you'll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you'll learn how to think about the results you want to achieve, rather than rely on tools to think for you.
- Use graphics to describe data with one, two, or dozens of variables
- Develop conceptual models using back-of-the-envelope calculations, as well as scaling and probability arguments
- Mine data with computationally intensive methods such as simulation and clustering
- Make your conclusions understandable through reports, dashboards, and other metrics programs
- Understand financial calculations, including the time-value of money
- Use dimensionality reduction techniques or predictive analytics to conquer challenging data analysis situations
- Become familiar with different open source programming environments for data analysis
"Finally, a concise reference for understanding how to conquer piles of data."--Austin King, Senior Web Developer, Mozilla
"An indispensable text for aspiring data scientists."--Michael E. Driscoll, CEO/Founder, Dataspora
Table of contents
- Dedication
- A Note Regarding Supplemental Files
- Preface
- 1. Introduction
- I. Graphics: Looking at Data
- 2. A Single Variable: Shape and Distribution
- 3. Two Variables: Establishing Relationships
- 4. Time As a Variable: Time-Series Analysis
- 5. More Than Two Variables: Graphical Multivariate Analysis
- 6. Intermezzo: A Data Analysis Session
- II. Analytics: Modeling Data
- 7. Guesstimation and the Back of the Envelope
- 8. Models from Scaling Arguments
- 9. Arguments from Probability Models
- 10. What You Really Need to Know About Classical Statistics
- 11. Intermezzo: Mythbusting—Bigfoot, Least Squares, and All That
- III. Computation: Mining Data
- 12. Simulations
- 13. Finding Clusters
- 14. Seeing the Forest for the Trees: Finding Important Attributes
- 15. Intermezzo: When More Is Different
- IV. Applications: Using Data
- 16. Reporting, Business Intelligence, and Dashboards
- 17. Financial Calculations and Modeling
- 18. Predictive Analytics
- 19. Epilogue: Facts Are Not Reality
- A. Programming Environments for Scientific Computation and Data Analysis
- B. Results from Calculus
- C. Working with Data
- D. About the Author
- Index
- About the Author
- Colophon
- Copyright
Product information
- Title: Data Analysis with Open Source Tools
- Author(s):
- Release date: November 2010
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9780596802356