Installing and configuring a pseudo-distributed Hadoop environment on Mac OS X El Capitan

This is an introduction to installing Hadoop in pseudo-distributed mode. The difference between single-node, pseudo-distributed, and fully distributed operation is introduced here: link. The setup uses Homebrew, a free and open-source software package management system that simplifies the installation of software on Apple's macOS.


Comments

commented Apr 4, 2016
edited by CaesarPan

A bit of rambling first

I'd spent the past few weeks on theory, so over the short holiday I set up a Hadoop environment. Tutorials are everywhere, but the ones for a pseudo-distributed setup on a Mac basically don't work; the blogs just repost one another. After working through part of the official documentation and combining it with the few blog posts that were actually useful, I finally got the environment running. After all, if you can't even set up the environment, there's no point talking about development. I'm also writing this down so that if I ever break Hadoop I won't forget how to install it; anyone interested is welcome to follow along. For Linux there are plenty of tutorials already; if I have time I'll write another post for it. Read on.

Environment overview

I. Prerequisites

1. Homebrew

  • Open a Terminal window and paste the following script
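The script itself did not survive on this page; at the time of writing it was the standard one-liner from brew.sh (the installer URL has since changed, so check brew.sh for the current command):

$ /usr/bin/ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"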

2. Java

  • Download the JDK 8 installer for Mac OS X from the Oracle website: Java SE Downloads

  • Open the downloaded dmg file and double-click the pkg inside it to install

  • Open Terminal and enter

  • The output should be

  • The JDK directory is
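The commands were stripped in extraction; presumably they were the standard version check and Apple's JDK locator (the update number below is a placeholder):

$ java -version
java version "1.8.0_xx"        # example output; your build number will differ
$ /usr/libexec/java_home
/Library/Java/JavaVirtualMachines/jdk1.8.0_xx.jdk/Contents/Home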

3. Xcode

  • Download it from the App Store

  • PS: the download may be slow, but the official source is at least safe

II. Configure SSH

To secure remote login management of Hadoop and sharing among Hadoop node users, Hadoop must be configured to use the SSH protocol

  • Open System Preferences, then Sharing, enable Remote Login, and allow access for All users

  • Open Terminal and enter the following, one at a time

  • Once configured, enter

  • It shows

  • That, or similar last-login time information, means the configuration is complete
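The key-generation commands did not survive on this page; a standard sequence, consistent with what a later comment quotes (that comment also notes rsa worked where dsa did not), is:

$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa        # generate a passphraseless key pair
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys  # authorize the key for localhost login
$ ssh localhost                                    # should log in without a password
Last login: Mon Apr  4 10:00:00 2016               # example of the time information mentioned above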

III. Install and configure Hadoop

1. Install Hadoop

  • In Terminal, enter

  • Output like the following means the installation succeeded
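The command was presumably brew's standard install (the version, and therefore the path, depends on when you run it):

$ brew install hadoop
# brew unpacks Hadoop under /usr/local/Cellar/hadoop/<version>/ and finishes with a
# summary line such as "/usr/local/Cellar/hadoop/2.7.2: ... files, ...", which signals success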

2. Configure pseudo-distributed Hadoop

(1) Configure hadoop-env.sh
  • In Terminal, enter

  • Change the relevant line to
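Neither the path nor the edit survived extraction. With a brew install the file sits under libexec, and the edit that the first comment below refers to is the usual Mac Kerberos workaround; the version directory and the stock line are assumptions:

$ open /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/hadoop-env.sh
# change the stock line
#   export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true"
# to
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -Djava.security.krb5.realm= -Djava.security.krb5.kdc="

"2.7.2" here and in the steps below is a placeholder version; substitute whatever brew installed.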

(2) Configure yarn-env.sh
  • In Terminal, enter

  • Add
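What was added here is not recoverable from the page; tutorials of this era typically appended the same Kerberos workaround for the YARN daemons (an assumption):

$ open /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/yarn-env.sh
# append:
YARN_OPTS="$YARN_OPTS -Djava.security.krb5.realm= -Djava.security.krb5.kdc="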

(3) Configure core-site.xml
  • In Terminal, enter

  • Edit it as
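The XML body was stripped; the conventional pseudo-distributed settings are a working directory and HDFS on localhost:9000 (the tmp path is an assumption):

$ open /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/core-site.xml

<configuration>
  <property>
    <!-- where Hadoop keeps its working files; assumed location -->
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/Cellar/hadoop/hdfs/tmp</value>
  </property>
  <property>
    <!-- the default filesystem: HDFS on the local machine -->
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>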

(4) Configure hdfs-site.xml
  • In Terminal, enter

  • Edit it as
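Again the XML was stripped; on a single node the essential setting is a replication factor of 1:

$ open /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/hdfs-site.xml

<configuration>
  <property>
    <!-- one copy of each block, since there is only one DataNode -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>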

(5) Configure mapred-site.xml
  • In Terminal, enter the following in sequence

  • Edit it as
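"In sequence" presumably covers copying the bundled template before editing; since the next step configures YARN, the standard setting is to run MapReduce on it (paths are assumptions):

$ cd /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop
$ cp mapred-site.xml.template mapred-site.xml
$ open mapred-site.xml

<configuration>
  <property>
    <!-- run MapReduce jobs on YARN rather than the local runner -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>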

(6) Configure yarn-site.xml
  • In Terminal, enter

  • Edit it as
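The stripped XML is, in the standard setup, the shuffle service that MapReduce needs from every NodeManager:

$ open /usr/local/Cellar/hadoop/2.7.2/libexec/etc/hadoop/yarn-site.xml

<configuration>
  <property>
    <!-- auxiliary service that serves map outputs to reducers -->
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>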

3. Format HDFS

  • In Terminal, enter
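The command itself is missing; for Hadoop 2.x installed via brew it would be:

$ hdfs namenode -format
# format only once: reformatting generates a new cluster ID, after which an existing
# DataNode refuses to join (this is the DataNode problem discussed in the comments below)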

4. Start

  • Locate the sbin directory (see the sketch after this list)

(1) Start HDFS
(2) Start MapReduce
(3) Check that everything came up
  • Result
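The scripts and the expected result were stripped; a sketch for a brew install (the version directory is a placeholder):

$ cd /usr/local/Cellar/hadoop/2.7.2/libexec/sbin
$ ./start-dfs.sh     # (1) starts NameNode, DataNode, SecondaryNameNode
$ ./start-yarn.sh    # (2) starts ResourceManager, NodeManager
$ jps                # (3) should list all five daemons plus Jps itself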

5. Run a bundled MapReduce example

  • The pi-estimation example

  • Result
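A sketch of the usual invocation (the jar path and version are assumptions; the last two arguments are the number of maps and the samples per map):

$ hadoop jar /usr/local/Cellar/hadoop/2.7.2/libexec/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar pi 10 100
...
Estimated value of Pi is 3.14800000000000000000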

6. Check via the web UI

  • View status through the web interfaces:
  • Cluster status: http://localhost:8088
  • HDFS status: http://localhost:50070
  • Secondary NameNode status: http://localhost:50090

IV. Summary

If you follow the steps above, configuration is actually quick, but there were plenty of pitfalls while figuring it out: network speed, paths, something would break every now and then.

With the environment ready, it's back to theory for me. After discussing with some friends who also work on this, I still need to brush up on statistics. If anyone in the department is interested, feel free to give this a try.

changed the title to “Installing and configuring a pseudo-distributed Hadoop environment on Mac OS X El Capitan” on Aug 24, 2016

commented Sep 4, 2016

Very detailed, awesome!
I followed the OP's steps, but ./start-dfs.sh gave me trouble:
“2016-09-04 15:24:50.474 java[15275:613152] Unable to load realm info from SCDynamicStore”
A stackoverflow answer says changing the line the OP modified in hadoop-env.sh to the following fixes it (note the double quotes, so that ${HADOOP_OPTS} expands):
HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.realm= -Djava.security.krb5.kdc="
HADOOP_OPTS="${HADOOP_OPTS} -Djava.security.krb5.conf=/dev/null"
That one is solved.
My real problem: after running start-dfs and start-yarn, jps shows
14572 ResourceManager
14242 NameNode
15585 Jps
4432
14450 SecondaryNameNode
14670 NodeManager
In other words, no DataNode is listed; process 4432 is presumably the DataNode, but I don't know what's going on.
The final example won't run either; it fails with:
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /user/yjgu/QuasiMonteCarlo_1472973018410_433909577/in/part0 could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
Does the OP know what's happening here and how to fix it?
Thanks.

commented Nov 30, 2016

A question: why does ssh localhost keep asking for a password?

commented Nov 30, 2016

yorkchu1995:
You may have already configured ssh before.

commented Mar 27, 2017

Can Hadoop be installed and run on a Mac from the plain Linux tar.gz package? Has the OP tried that?

commented Mar 27, 2017

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Running the two commands above didn't give me passwordless login; it still asked for a password. Replacing dsa with rsa fixed it.

commented Jun 13, 2017

Thanks for the tutorial, OP. Just hit another pitfall: the NameNode wouldn't start. It turned out the port the NameNode uses was already taken.

commented Jun 28, 2017

Has anyone had a job stuck in the Running state?

17/06/28 23:16:11 INFO mapreduce.JobSubmitter: number of splits:1
17/06/28 23:16:12 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1498662635832_0002
17/06/28 23:16:13 INFO impl.YarnClientImpl: Submitted application application_1498662635832_0002
17/06/28 23:16:13 INFO mapreduce.Job: The url to track the job: http://MacBook-Air-2.local:8088/proxy/application_1498662635832_0002/
17/06/28 23:16:13 INFO mapreduce.Job: Running job: job_1498662635832_0002

commented Aug 1, 2017

The first commenter's DataNode probably didn't start, which is why the example couldn't run either. The NameNode and DataNode cluster IDs are likely inconsistent; did you format the NameNode more than once?

commented Sep 16, 2018

I keep seeing: 18/09/16 18:45:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable


Purpose

This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).

Prerequisites

Supported Platforms

  • GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
  • Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.

Required Software

Required software for Linux and Windows includes:

  1. Java™ 1.6.x, preferably from Sun, must be installed.
  2. ssh must be installed and sshd must be running to use the Hadoop scripts that manage remote Hadoop daemons.

Additional requirements for Windows include:

  1. Cygwin - Required for shell support in addition to the required software above.

Installing Software

If your cluster doesn't have the requisite software you will need to install it.

For example on Ubuntu Linux:

$ sudo apt-get install ssh
$ sudo apt-get install rsync

On Windows, if you did not install the required software when you installed cygwin, start the cygwin installer and select the packages:

  • openssh - the Net category

Download

To get a Hadoop distribution, download a recent stable release from one of the Apache Download Mirrors.

Prepare to Start the Hadoop Cluster

Unpack the downloaded Hadoop distribution. In the distribution, edit the file conf/hadoop-env.sh to define at least JAVA_HOME to be the root of your Java installation.

Try the following command:
$ bin/hadoop
This will display the usage documentation for the hadoop script.

Now you are ready to start your Hadoop cluster in one of the three supported modes:

  • Local (Standalone) Mode
  • Pseudo-Distributed Mode
  • Fully-Distributed Mode

Standalone Operation

By default, Hadoop is configured to run in a non-distributed mode, as a single Java process. This is useful for debugging.

The following example copies the unpacked conf directory to use as input and then finds and displays every match of the given regular expression. Output is written to the given output directory.
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
$ cat output/*

Pseudo-Distributed Operation

Hadoop can also be run on a single-node in a pseudo-distributed mode where each Hadoop daemon runs in a separate Java process.

Configuration

Use the following:
conf/core-site.xml:
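(The three XML bodies below were lost in extraction; they are reconstructed from the Hadoop 1.x single-node guide this page reproduces.)

<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>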


conf/hdfs-site.xml:
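Likewise for HDFS, reconstructed from the same guide:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>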


conf/mapred-site.xml:
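And for MapReduce, pointing the JobTracker at localhost, reconstructed from the same guide:

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>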

Setup passphraseless ssh


Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost

If you cannot ssh to localhost without a passphrase, execute the following commands:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Execution

Format a new distributed-filesystem:
$ bin/hadoop namenode -format

Start the hadoop daemons:
$ bin/start-all.sh

The hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).

Browse the web interface for the NameNode and the JobTracker; by default they are available at:

  • NameNode - http://localhost:50070/
  • JobTracker - http://localhost:50030/

Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input

Run some of the examples provided:
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'

Examine the output files:

Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*

or

View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*

When you're done, stop the daemons with:
$ bin/stop-all.sh

Fully-Distributed Operation

For information on setting up fully-distributed, non-trivial clusters see Cluster Setup.

Java and JNI are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.