Chapter 9. Standing Up a Cluster
Now that you have instances up and running in the cloud provider of your choice, they can be set up to run a Hadoop cluster. If you don’t have instances at the ready and want to follow along, then go back to Chapter 6 for AWS, Chapter 7 for Google Cloud Platform, or Chapter 8 for Azure first, and then return here.
The JDK
Hadoop requires a Java runtime to work, and so Java must be installed on each of your new instances. A good strategy is to use the operating system package management capability already on the instances, e.g., yum on Red Hat Linux, apt on Ubuntu. Cloud providers ensure that these capabilities work within their infrastructures, sometimes even providing local mirrors or gateways to help.
Table 9-1 suggests packages to install for some operating systems. As new versions of Java are released, the package names will change.
OS | Package names |
---|---|
Debian or Ubuntu | openjdk-8-jdk or openjdk-7-jdk |
Red Hat or CentOS | java-1.8.0-openjdk or java-1.7.0-openjdk |
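As a rough illustration of how the package names in Table 9-1 might be used, the following sketch maps a package manager to an install command for the Java 8 package. The function names pick_jdk_install_cmd and detect_pkg_manager are hypothetical helpers invented for this example, and the choice of apt-get (rather than apt) and the -y flag are assumptions about how you want to run the install. The sketch only prints the command so that you can review it before running it.

```shell
#!/bin/sh
# Sketch only: helper names below are hypothetical, not part of any tool.

# Print the install command for an OpenJDK 8 package from Table 9-1.
# $1 is the package manager name: "apt-get" or "yum".
pick_jdk_install_cmd() {
    case "$1" in
        apt-get) echo "sudo apt-get install -y openjdk-8-jdk" ;;
        yum)     echo "sudo yum install -y java-1.8.0-openjdk" ;;
        *)       echo "unsupported package manager: $1" >&2; return 1 ;;
    esac
}

# Print the name of the first supported package manager found on the PATH.
detect_pkg_manager() {
    if command -v apt-get >/dev/null 2>&1; then
        echo apt-get
    elif command -v yum >/dev/null 2>&1; then
        echo yum
    else
        return 1
    fi
}
```

On an instance, running pick_jdk_install_cmd "$(detect_pkg_manager)" would print the command to use. On Debian or Ubuntu it is usually worth running sudo apt-get update first so that the package index is current.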
Instead of using a package available natively for your operating system, you can install an Oracle JDK by downloading an installation package directly from Oracle. Since you have root access to your instances, you are free to use whatever means you prefer to install Java.
After you have installed Java, make note of where the Java home directory is (i.e., what the JAVA_HOME environment variable should be set to). You will need to know this location ...
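One possible way to find the Java home directory, assuming a Linux instance with GNU readlink available, is to resolve the java executable through its symbolic links and strip the trailing bin/java. The function name java_home_from_binary is a hypothetical helper for this sketch. With some Java 8 packages the resolved path may end in a jre subdirectory, so check the result before relying on it.

```shell
#!/bin/sh
# Sketch only: java_home_from_binary is a hypothetical helper name.

# Print the directory two levels above the real java executable.
# $1 is a path to a java executable, possibly a symbolic link.
java_home_from_binary() {
    # Follow every symlink in the chain to the real file (GNU readlink -f).
    real_java=$(readlink -f "$1") || return 1
    # Drop the trailing /bin/java to leave the candidate home directory.
    dirname "$(dirname "$real_java")"
}
```

For example, java_home_from_binary "$(command -v java)" prints a candidate value, which you could then export as JAVA_HOME.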