Chapter 6. Data Loading, Storage, and File Formats
Accessing data is a necessary first step for using most of the tools in this book. I’m going to focus on data input and output using pandas, though numerous tools in other libraries can help with reading and writing data in various formats.
Input and output typically fall into a few main categories: reading text files and other, more efficient on-disk formats; loading data from databases; and interacting with network sources like web APIs.
6.1 Reading and Writing Data in Text Format
pandas features a number of functions for reading tabular data as a DataFrame object. Table 6-1 summarizes some of them, though read_csv is likely the one you’ll use the most.
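As a minimal sketch of the text-format round trip, assuming a small made-up table held in memory (the column names and values are illustrative only), read_csv parses comma-separated text into a DataFrame and DataFrame.to_csv writes it back out. io.StringIO stands in for a file here; read_csv equally accepts a file path or a URL.

```python
import io

import pandas as pd

# Hypothetical comma-separated data; StringIO plays the role of a file on disk.
csv_text = """a,b,c,message
1,2,3,hello
5,6,7,world
9,10,11,foo
"""

# read_csv uses a comma as its default delimiter and takes the first row as the header.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
print(df.dtypes)  # numeric columns are inferred as integers, 'message' as text

# Writing goes the other way: with no path argument, to_csv returns the CSV as a string.
out = df.to_csv(index=False)  # index=False omits the row labels
print(out)
```

Passing `index=False` keeps the output in the same shape as the input, so reading `out` back with read_csv reproduces the original table.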
Function | Description |
---|---|
read_csv | Load delimited data from a file, URL, or file-like object; use comma as default delimiter |
read_fwf | Read data in fixed-width column format (i.e., no delimiters) |
read_clipboard | Version of read_csv that reads data from the clipboard; useful for converting tables from web pages |
read_excel | Read tabular data from an Excel XLS or XLSX file |
read_hdf | Read HDF5 files written by pandas |
read_html | Read all tables found in the given HTML document |
read_json | Read data from a JSON (JavaScript Object Notation) string representation |
read_msgpack | Read pandas data encoded using the MessagePack binary format |
read_pickle | Read an arbitrary object stored in Python pickle format |
read_sas | Read a SAS dataset stored in one of the SAS system’s custom ... |
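To contrast two other readers from Table 6-1, here is a sketch that loads the same two hypothetical records from fixed-width text and from JSON. The column widths passed to read_fwf are an assumption chosen to fit this particular text; for a real fixed-width file they would come from the file’s documented layout. The JSON string is wrapped in io.StringIO because recent pandas versions deprecate passing literal JSON text directly to read_json.

```python
import io

import pandas as pd

# Hypothetical fixed-width text: each field occupies a fixed span of characters,
# padded with spaces, and there is no delimiter between fields.
fwf_text = (
    "id  name   score\n"
    "1   alice  90.5\n"
    "2   bob    82.0\n"
)

# widths=[4, 7, 5] is an assumption matching the spans above; read_fwf strips
# the padding spaces from each field.
df_fwf = pd.read_fwf(io.StringIO(fwf_text), widths=[4, 7, 5])

# The same records as a JSON array of objects; orient="records" says each
# object is one row.
json_text = (
    '[{"id": 1, "name": "alice", "score": 90.5},'
    ' {"id": 2, "name": "bob", "score": 82.0}]'
)
df_json = pd.read_json(io.StringIO(json_text), orient="records")

print(df_fwf)
print(df_json)
```

Both calls should yield the same columns and values; only the on-disk representation differs.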