This is an introduction to installing Hadoop in pseudo-distributed mode. The differences among single-node, pseudo-distributed, and fully distributed modes are introduced here: link. Homebrew is a free and open-source software package management system that simplifies the installation of software on Apple's macOS; I suggest using brew, the Mac package manager, to conveniently install the Hadoop package.
Comments
commented Apr 4, 2016 • edited by CaesarPan
Overall environment setup

I. Prerequisites
1. Homebrew
2. Java

II. Configure SSH
To keep remote login administration of Hadoop, and the credentials shared by Hadoop node users, secure, Hadoop needs to be configured to use the SSH protocol.

2. Configure pseudo-distributed Hadoop
(1) Configure hadoop-env.sh
(2) Configure yarn-env.sh
(3) Configure core-site.xml

6. Inspect via the web UI

IV. Summary
If you follow the steps above, the setup itself is actually quick, but there are plenty of pitfalls while feeling your way through: network speed, paths, and so on, with things crashing now and then. With the environment up, I'll keep working through the theory. After discussing with some friends who also work on this, I still need to brush up on statistics; if anyone in the department is interested, feel free to give it a try.
changed the title to "Installing and Configuring a Pseudo-Distributed Hadoop Environment on Mac OS X El Capitan" Aug 24, 2016
commented Sep 4, 2016
Very detailed, awesome!
commented Nov 30, 2016
Question: why does `ssh localhost` keep asking for a password?
commented Mar 27, 2017
Can Hadoop be installed and run on a Mac using the ordinary Linux tar.gz package? Has the OP tried this?
commented Mar 27, 2017
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
commented Jun 13, 2017
Thanks for the tutorial. I just hit another pitfall: the namenode wouldn't start. It turned out the port the namenode uses was already occupied.
commented Jun 28, 2017
Has anyone run into a job stuck in the Running state?
commented Aug 1, 2017
The first commenter's problem is probably that the datanode did not start, which is why the example can't run either. The namenode's and datanode's IDs may be inconsistent; have you formatted the namenode more than once?
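The ID mismatch described above can be confirmed by comparing the `namespaceID` in each storage directory's VERSION file. The paths and IDs below are fabricated purely for illustration; substitute your own dfs.name.dir / dfs.data.dir locations:

```shell
# Simulate the VERSION files Hadoop writes into its storage dirs
# (in a real cluster these already exist; do NOT overwrite them).
mkdir -p /tmp/demo-dfs/name/current /tmp/demo-dfs/data/current
echo "namespaceID=123456" > /tmp/demo-dfs/name/current/VERSION
echo "namespaceID=654321" > /tmp/demo-dfs/data/current/VERSION

# Compare the two IDs; a mismatch means the datanode's storage was
# created against a namenode that has since been re-formatted.
nn=$(grep '^namespaceID=' /tmp/demo-dfs/name/current/VERSION | cut -d= -f2)
dn=$(grep '^namespaceID=' /tmp/demo-dfs/data/current/VERSION | cut -d= -f2)
[ "$nn" = "$dn" ] && echo "namespaceIDs match" || echo "mismatch: namenode=$nn datanode=$dn"
```

If they differ, the usual remedy is to wipe the datanode's data directory and let a fresh re-format recreate it (losing any HDFS data, of course).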
commented Sep 16, 2018
It keeps printing: 18/09/16 18:45:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
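That NativeCodeLoader warning is expected on Mac OS X, where no native libhadoop build is shipped; Hadoop falls back to the pure-Java implementations and keeps working. If the noise bothers you, it can be silenced with a logging override in conf/log4j.properties (a config fragment):

```properties
# conf/log4j.properties -- hide the native-library fallback warning
log4j.logger.org.apache.hadoop.util.NativeCodeLoader=ERROR
```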
- Prerequisites
- Pseudo-Distributed Operation
Purpose
This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).
Prerequisites
Supported Platforms
- GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
- Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.
Required Software
Required software for Linux and Windows includes:
- Java™ 1.6.x, preferably from Sun, must be installed.
- ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.
Additional requirements for Windows include:
- Cygwin - Required for shell support in addition to the required software above.
Installing Software
If your cluster doesn't have the requisite software, you will need to install it.
For example on Ubuntu Linux:
$ sudo apt-get install ssh
$ sudo apt-get install rsync
On Windows, if you did not install the required software when you installed cygwin, start the cygwin installer and select the packages:
- openssh - the Net category
Download
To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors.
Prepare to Start the Hadoop Cluster
Unpack the downloaded Hadoop distribution. In the distribution, edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.
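For example, the JAVA_HOME line in conf/hadoop-env.sh might look like the fragment below. The JDK path shown is only an example; point it at your own installation (on Mac OS X, running `/usr/libexec/java_home` prints the correct location):

```shell
# conf/hadoop-env.sh -- example path, adjust for your system
export JAVA_HOME=/usr/lib/jvm/java-6-sun
```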
Try the following command:
$ bin/hadoop
This will display the usage documentation for the hadoop script.
Now you are ready to start your Hadoop cluster in one of the three supported modes:
- Local (Standalone) Mode
- Pseudo-Distributed Mode
- Fully-Distributed Mode
Standalone Operation
By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.
The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
$ cat output/*
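The MapReduce grep example above simply counts regex matches across the input files. Outside Hadoop, a rough shell equivalent gives a feel for what it computes (the demo file and its contents are invented here for illustration):

```shell
# Create a tiny stand-in for the conf/*.xml input (hypothetical content).
mkdir -p input_demo
printf '<name>dfs.replication</name>\n<name>dfs.data.dir</name>\n' > input_demo/sample.xml

# Extract every match of the pattern, then count occurrences per distinct
# match, mimicking the map and reduce phases of the grep example.
grep -ohE 'dfs[a-z.]+' input_demo/*.xml | sort | uniq -c | sort -rn
```

This is only an analogy, of course; the point of the Hadoop version is that the same computation scales across many files and machines.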
Pseudo-Distributed Operation
Hadoop can also be run on a single node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.
Configuration
Use the following:
conf/core-site.xml:
conf/hdfs-site.xml:
conf/mapred-site.xml:
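The file contents were omitted above. The conventional single-node values for this generation of Hadoop (1.x-style property names such as fs.default.name and mapred.job.tracker) are sketched below; the ports are the customary defaults, adjust as needed:

```xml
<!-- conf/core-site.xml -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

<!-- conf/hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

<!-- conf/mapred-site.xml -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
```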
Setup passphraseless ssh
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
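If `ssh localhost` still prompts for a password after these steps (as one commenter above reports), OpenSSH's permission requirements are a common culprit: sshd silently ignores the key when the directory or file is group- or world-writable. A quick fix, using the standard OpenSSH paths:

```shell
# OpenSSH refuses authorized keys with loose permissions; tighten them.
mkdir -p ~/.ssh
chmod 700 ~/.ssh
touch ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```

On Mac OS X, also make sure Remote Login is enabled in System Preferences → Sharing, or sshd won't be running at all.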
Execution
Format a new distributed-filesystem:
$ bin/hadoop namenode -format
Start the hadoop daemons:
$ bin/start-all.sh
The hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).
Browse the web interface for the NameNode and the JobTracker; by default they are available at:
- NameNode - http://localhost:50070/
- JobTracker - http://localhost:50030/
Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
Run some of the examples provided:
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
Examine the output files:
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*
or
View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*
When you're done, stop the daemons with:
$ bin/stop-all.sh
Fully-Distributed Operation
For information on setting up fully-distributed, non-trivial clusters see Cluster Setup.
Java and JNI are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.