Chapter 15. Getting Your Data into Shape
When it comes to making graphs, half the battle occurs before you call any graphing commands. Before you pass your data to the graphing functions, it must first be read in and given the correct structure. The data sets provided with R are ready to use, but when dealing with real-world data, this usually isn’t the case: you’ll have to clean up and restructure the data before you can visualize it.
Data sets in R are most often stored in data frames. They’re typically used as two-dimensional data structures, with each row representing one case and each column representing one variable. Data frames are essentially lists of vectors and factors, all of the same length, where each vector or factor represents one column.
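You can see this list-of-columns structure directly in R. The sketch below uses a small hand-built data frame with illustrative values. The name `people` is hypothetical and is not part of gcookbook. The code confirms that the data frame is a list with one equal-length element per column, and shows how rows map to cases and columns map to variables.

```r
# Hypothetical, hand-built data frame (illustrative values only)
people <- data.frame(
  sex      = factor(c("f", "f", "m")),
  ageYear  = c(11.92, 12.92, 13.92),
  heightIn = c(56.3, 62.3, 62.0)
)

# Under the hood, a data frame is a list with one element per column
is.list(people)          # TRUE
length(people)           # 3 (one list element per column)
sapply(people, length)   # every column has the same length: 3

# Rows are cases, columns are variables
nrow(people)             # 3 cases
ncol(people)             # 3 variables
people[2, ]              # all the information about the second case
people$heightIn          # one variable across all cases
```

Because every column must have the same length, `data.frame()` stops with an error if you try to combine, for example, a length-3 vector with a length-2 vector.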
Here’s the heightweight data set:
library(gcookbook) # For the data set
heightweight
 sex ageYear ageMonth heightIn weightLb
   f   11.92      143     56.3     85.0
   f   12.92      155     62.3    105.0
 ...
   m   13.92      167     62.0    107.5
   m   12.58      151     59.3     87.0
It consists of five columns, with each row representing one case: a set of information about a single person. We can get a clearer idea of how it’s structured by using the str() function:
str(heightweight)
'data.frame':  236 obs. of  5 variables:
 $ sex     : Factor w/ 2 levels "f","m": 1 1 1 1 1 1 1 1 1 1 ...
 $ ageYear : num  11.9 12.9 12.8 13.4 15.9 ...
 $ ageMonth: int  143 155 153 161 191 171 185 142 160 140 ...
 $ heightIn: num  56.3 62.3 63.3 59 62.5 62.5 59 56.5 62 53.8 ...
 $ weightLb: num  85 105 108 92 112 ...
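Each line of the str() output names a column and gives its type. In heightweight these are a factor, "num" (double-precision numbers), and "int" (integers). For a factor, str() lists the level names and then the underlying integer codes. The sketch below reproduces these three column types in a small hypothetical data frame (`df`, with illustrative values), so you can check the same information column by column with class() and levels().

```r
# Hypothetical data frame with one column of each type that str()
# reported for heightweight (illustrative values only)
df <- data.frame(
  sex      = factor(c("f", "m", "m")),  # factor
  ageYear  = c(11.92, 13.92, 12.58),    # double, shown by str() as "num"
  ageMonth = c(143L, 167L, 151L)        # integer, shown by str() as "int"
)

# Compact overview: dimensions first, then one line per column
str(df)

# The same type information, column by column
sapply(df, class)        # "factor" "numeric" "integer"

# A factor stores integer codes plus a set of levels;
# str() prints the level names and then the codes
levels(df$sex)           # "f" "m"
as.integer(df$sex)       # 1 2 2
```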
The first column, ...