Configuring the Hadoop Gateway node

A Hadoop Gateway, or edge, node is a node that connects to the Hadoop cluster but does not run any of the Hadoop daemons. Its purpose is to provide an access point to the cluster while preventing users from connecting directly to critical components such as the Namenode or the Datanodes.

Another important reason to use an edge node is data distribution across the cluster. If a user logs in to a Datanode and copies data into HDFS with hadoop fs -put file /, then one replica of every block of the file will always be written to the Datanode from which the command was executed, because HDFS places the first replica on the local node when the client runs on a Datanode. This results in an imbalance of data across the nodes. If we upload a file from a node that is not a Datanode, then data will be distributed evenly for all copies ...
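The imbalance described above can be illustrated with a small Python simulation. This is a minimal sketch under a simplified model of HDFS's default placement: rack awareness, free disk space and node load are ignored, so every replica after the first simply goes to a distinct, randomly chosen Datanode. The node names and the functions place_block and simulate_upload are hypothetical, used only for illustration; they are not part of any Hadoop API.

```python
import random
from collections import Counter


def place_block(datanodes, writer, replication, rng):
    """Choose Datanodes for one block's replicas (simplified, hypothetical model)."""
    if writer in datanodes:
        # The client runs on a Datanode: the first replica is written locally.
        first = writer
    else:
        # The client runs on an edge/gateway node: the first replica goes to a random Datanode.
        first = rng.choice(datanodes)
    others = [dn for dn in datanodes if dn != first]
    # Remaining replicas go to distinct, randomly chosen Datanodes (no rack awareness).
    return [first] + rng.sample(others, replication - 1)


def simulate_upload(datanodes, writer, num_blocks, replication=3, seed=0):
    """Count how many replicas each Datanode receives for an upload of num_blocks blocks."""
    rng = random.Random(seed)
    counts = Counter({dn: 0 for dn in datanodes})
    for _ in range(num_blocks):
        counts.update(place_block(datanodes, writer, replication, rng))
    return counts


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
    # Upload run on dn1: dn1 receives exactly 1000 replicas, the other four share the remaining 2000.
    print(simulate_upload(nodes, writer="dn1", num_blocks=1000))
    # Upload run on a hypothetical edge node: the 3000 replicas spread roughly evenly over all five.
    print(simulate_upload(nodes, writer="edge1", num_blocks=1000))
```

In this model the Datanode that runs the upload holds a replica of every block, while an upload from the edge node lets the first replica land anywhere, so no single Datanode is favoured.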
