Book description
It’s tough to argue with R as a high-quality, cross-platform, open source statistical software product—unless you’re in the business of crunching Big Data. This concise book introduces you to several strategies for using R to analyze large datasets, including three chapters on using R and Hadoop together. You’ll learn the basics of Snow, Multicore, Parallel, Segue, RHIPE, and Hadoop Streaming, including how to find them, how to use them, when they work well, and when they don’t.
With these packages, you can overcome R’s single-threaded nature by spreading work across multiple CPUs, or offloading work to multiple machines to address R’s memory barrier.
- Snow: works well in a traditional cluster environment
- Multicore: popular for multiprocessor and multicore computers
- Parallel: part of the upcoming R 2.14.0 release
- R+Hadoop: provides low-level access to a popular form of cluster computing
- RHIPE: uses Hadoop’s power with R’s language and interactive shell
- Segue: lets you use Elastic MapReduce as a backend for lapply-style operations
Table of contents
- Parallel R
- SPECIAL OFFER: Upgrade this ebook with O’Reilly
- A Note Regarding Supplemental Files
- Preface
- 1. Getting Started
- 2. snow
- Quick Look
- How It Works
- Setting Up
- Working with It
- Creating Clusters with makeCluster
- Parallel K-Means
- Initializing Workers
- Load Balancing with clusterApplyLB
- Task Chunking with parLapply
- Vectorizing with clusterSplit
- Load Balancing Redux
- Functions and Environments
- Random Number Generation
- snow Configuration
- Installing Rmpi
- Executing snow Programs on a Cluster with Rmpi
- Executing snow Programs with a Batch Queueing System
- Troubleshooting snow Programs
- When It Works…
- …And When It Doesn’t
- The Wrap-up
- 3. multicore
- 4. parallel
- 5. A Primer on MapReduce and Hadoop
- 6. R+Hadoop
- 7. RHIPE
- 8. Segue
- 9. New and Upcoming
- About the Authors
- SPECIAL OFFER: Upgrade this ebook with O’Reilly
- Copyright
Product information
- Title: Parallel R
- Author(s):
- Release date: October 2011
- Publisher(s): O'Reilly Media, Inc.
- ISBN: 9781449320331