Tuesday, March 23, 2010

Compile hadoop eclipse-plugin 0.20.2 for Eclipse 3.5 on Mac

The original eclipse-plugin of Hadoop 0.20.2 doesn't work well, because the plugin was developed under Eclipse Europa, i.e., 3.3. The symptom in my case is that when I click "Run on Hadoop", nothing happens at all. Here (if you can read Chinese ^_^) is a good way to sidestep the plugin by writing your own Makefile. That is easy; in fact, that is how I finished my first little Hadoop project, but it is also tedious. So I wanted to make the eclipse-plugin itself work. Then I found some other help from:
http://wiki.apache.org/hadoop/EclipsePlugIn
http://trac.nchc.org.tw/cloud/wiki/waue/2010/0211 (helped me a lot)
http://hi.baidu.com/feng2211/blog/item/4ab8a2095eb1b9a42eddd48e.html

http://yjhexy.javaeye.com/blog/591769
All links above except the first one are written in Chinese.
However, this work is not that hard; I finished it in one hour. I will describe my steps below:

1. Modify two files

First, modify this file:
$HADOOP_HOME/src/contrib/build-contrib.xml
and add the property below:

<property name="eclipse.home" location="$Eclipse_HOME" />

Here, $Eclipse_HOME is a placeholder for the installation path of your own Eclipse; put your actual path into the property. For example, since I am working on Mac OS X 10.5, my $Eclipse_HOME is /Applications/eclipse.
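Alternatively (this is just a generic Ant feature, nothing specific to this build), you can skip editing the file and pass the property on the command line when you run the build later, e.g.:

ant compile -Declipse.home=/Applications/eclipse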

The second file that needs to be modified is:

$HADOOP_HOME/src/contrib/eclipse-plugin/src/java/org/apache/hadoop/eclipse/launch/HadoopApplicationLaunchShortcut.java

Comment out the old import of JavaApplicationLaunchShortcut and add a new one, like this:

//import org.eclipse.jdt.internal.debug.ui.launcher.JavaApplicationLaunchShortcut; 
import org.eclipse.jdt.debug.ui.launchConfigurations.JavaApplicationLaunchShortcut;


2. Compile


The basic compile steps are as follows:

cd $HADOOP_HOME
ant compile
ant package


There are several problems I met when executing ant package. Maybe they are not typical, but I think it's better to describe them all here:

  • First, I was told that:

    build.xml:908: 'java5.home' is not defined. Forrest requires Java 5. Please pass -Djava5.home=<base of Java 5 distribution> to Ant on the command-line.

It seems a JDK 5 has to be specified, since Apache Forrest needs JDK 5 to run. Fortunately, Mac OS X ships several JDK versions side by side by default, so I just changed the package command to this:

ant package -Djava5.home=/System/Library/Frameworks/JavaVM.framework/Versions/1.5.0

  • But then, I was told:

    build.xml:904: 'forrest.home' is not defined. Please pass -Djava5.home=<base of Java 5 distribution> to Ant on the command-line.

That's because I didn't have Apache Forrest on my computer. So I went here to download it, then decompressed it to $Forrest_HOME (self-defined). Then I used:

ant package -Djava5.home=/System/Library/Frameworks/JavaVM.framework/Versions/1.5.0 -Dforrest.home=$Forrest_HOME/apache-forrest-0.8

  • Then, the new problem was:

    [exec] Error: JAVA_HOME is not defined correctly.
    [exec] We cannot execute /System/Library/Frameworks/JavaVM.framework/Versions/1.5.0/bin/java

That's because the JDK layout on Mac OS X is different from Unix or Windows, so the expected bin/java does not exist under that path. As a workaround, I just created a link called bin under that JDK directory. After that, everything else worked fine.
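The workaround is essentially the following; the link target is whatever directory actually contains the java executables on your system (on 10.5 that should be Home/bin inside the version directory, but double-check before linking):

cd /System/Library/Frameworks/JavaVM.framework/Versions/1.5.0
# create the bin directory the build expects, pointing at the real java binaries
sudo ln -s Home/bin bin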





3. Result

The resulting jar file is at:

$HADOOP_HOME/build/contrib/eclipse-plugin/hadoop-0.20.3-dev-eclipse-plugin.jar


It's funny that the file name is different from the default one; you can rename it to whatever you like.
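To actually use the plugin (standard Eclipse plugin installation, nothing specific to Hadoop), copy the jar into your Eclipse plugins folder and restart Eclipse, e.g.:

cp $HADOOP_HOME/build/contrib/eclipse-plugin/hadoop-0.20.3-dev-eclipse-plugin.jar /Applications/eclipse/plugins/

If the name matters to you, it presumably comes from the build's version property, so passing something like -Dversion=0.20.2 to ant should give the expected file name; I haven't verified that, and renaming the jar by hand works just as well.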

Sunday, March 21, 2010

Deploying Hadoop on Cluster with NFS

I set up Hadoop on the school cluster, using three machines A, B, and C to build a small cluster for practicing Hadoop.
The three machines share the same $HOME directory over NFS, i.e., whatever is under $HOME is visible no matter which machine you access it from.
The machines are not at my desk; I reach them over SSH. Public/private keys are set up between A, B, and C, so they can log into each other without passwords.

I deployed Hadoop on these three machines, with A as the master and B and C as slaves.
For the configuration I followed a forum post. Since the three machines share one $HOME directory, the configuration only needs to be done once, which is convenient in that respect.
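For concreteness, the relevant bits of the configuration looked roughly like this (a sketch of the standard 0.20-style setup; the host names and the port are the ones that appear in the logs below, the rest is the usual boilerplate):

conf/masters:
host-name-of-A

conf/slaves:
host-name-of-B
host-name-of-C

conf/core-site.xml:
<property>
  <name>fs.default.name</name>
  <value>hdfs://host-name-of-A:9000</value>
</property>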

The first problem I ran into:
The NameNode's web page showed no DataNodes. The DataNode logs recorded:

INFO org.apache.hadoop.ipc.RPC: Server at /IP.Addr.OF.A:9000 not available yet, Zzzzz...

Another symptom: after starting dfs on A, telnetting to port 9000 of A from B or C gave:

User@~$ telnet host-name-of-A 9000
Trying IP.Addr.OF.A ...
telnet: connect to address IP.Addr.OF.A : Connection refused
telnet: Unable to connect to remote host

So the problem was clearly that B and C could not reach port 9000 on A.

To pin the problem down further, I also configured A itself as a slave.
After starting HDFS, it took a while, but eventually one DataNode showed up, namely A.
In other words, A itself had no problem reaching port 9000 on localhost.

At first I suspected that a firewall on A was rejecting incoming requests on non-designated ports.
But after asking the cluster administrator, I learned that no firewall was enabled on A. Then the sharp-eyed administrator spotted something when I listed what was using the port:

user@host-name-of-A:~> lsof -i :9000
COMMAND   PID     USER   FD   TYPE   DEVICE SIZE NODE NAME
java    25231 username   38u  IPv6 14009610       TCP host-name-of-A:cslistener (LISTEN)

The port's TYPE was IPv6. It turns out that if the system supports IPv6, Hadoop binds the port to IPv6 automatically,
while the other machines were connecting over IPv4. With different protocols, of course they could not connect.

Once the cause was found, the rest was easy: google "Hadoop IPv6".
I found a helpful explanation here:
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_%28Single-Node_Cluster%29#Disabling_IPv6
Following it, add the following to hadoop-env.sh:

HADOOP_OPTS=-Djava.net.preferIPv4Stack=true

so that IPv4 is preferred. Problem solved.
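A quick way to confirm the fix is to restart dfs and repeat the check from above; the listener should now show up as IPv4 rather than IPv6, and telnet from B or C should connect:

user@host-name-of-A:~> lsof -i :9000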

But Hadoop would not let me off that easily. The network was fine now, but the next problem was an even bigger headache.
Although I have two slaves, no matter what I did, only one DataNode ever came up.
Checking the logs, the successful node showed:

2010-03-11 17:17:12,689 INFO org.apache.hadoop.hdfs.server.common.Storage: Storage directory $HOME/data/dir is not formatted.
2010-03-11 17:17:12,689 INFO org.apache.hadoop.hdfs.server.common.Storage: Formatting ...

while the failed one showed:

2010-03-11 17:17:09,176 INFO org.apache.hadoop.hdfs.server.common.Storage: Cannot lock storage $HOME/data/dir. The directory is already locked.
2010-03-11 17:17:09,288 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Cannot lock storage $HOME/data/dir. The directory is already locked.

At this point it was clear what was going on: I had put Hadoop's data directory under $HOME, and $HOME is shared over NFS, so every node was using the very same directory.
Once one DataNode started, it locked that directory, and the other one was stuck with nothing it could do.

My first idea was to use the HOSTNAME environment variable to give each node a different directory.
But Hadoop simply doesn't support that, because it uses Java's System.getProperty() to obtain system information,
and getProperty() only knows about a fixed set of values: http://blog.csdn.net/kongqz/archive/2009/03/13/3987198.aspx
hostname is not among them, and nothing else in the list was useful to me.
Another option would be to modify the source and add hostname resolution myself,
but that is too complicated for my purposes.
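A tiny illustration of the limitation (plain Java for illustration, not Hadoop code; the InetAddress lookup at the end is roughly what a source patch would have to add):

import java.net.InetAddress;

public class HostnameCheck {
    public static void main(String[] args) throws Exception {
        // standard system properties are available ...
        System.out.println(System.getProperty("user.home"));   // e.g. /home/username
        // ... but there is no "hostname" key, so this prints null
        System.out.println(System.getProperty("hostname"));
        // getting the host name requires an explicit lookup instead
        System.out.println(InetAddress.getLocalHost().getHostName());
    }
}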

Later, again via google, I found that someone in Taiwan had wanted to do something similar: http://austintodo.pixnet.net/blog/post/23887121
So, following his approach, I put both the data directory and the log directory outside $HOME (see the sketch below), and the problem went away.
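Concretely, that means pointing the DataNode storage at a node-local path in the HDFS configuration, and the logs at a node-local path in hadoop-env.sh. A minimal sketch, assuming each node has local space under /tmp (the property names are the standard 0.20 ones; the paths are only examples):

In conf/hdfs-site.xml:
<property>
  <name>dfs.data.dir</name>
  <value>/tmp/hadoop-username/dfs/data</value>
</property>

In conf/hadoop-env.sh:
export HADOOP_LOG_DIR=/tmp/hadoop-username/logs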
One thing to watch out for at this point: if you reformat the NameNode's data directory, be sure to wipe the data directory on every DataNode as well,
otherwise you will run into the Incompatible namespaceIDs problem.
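So a clean restart looks roughly like this (using the node-local data path from the sketch above; the rm is exactly the step that is easy to forget):

bin/stop-dfs.sh
# on every DataNode: wipe the old block storage so the namespaceIDs cannot mismatch
rm -rf /tmp/hadoop-username/dfs/data
# on the NameNode:
bin/hadoop namenode -format
bin/start-dfs.sh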