Below are instructions to download, install, and run Pig 0.15.0.
Download the pig-0.15.0 tar.gz, extract it, and move it to a new location; on my machine this is /usr/local/pig-0.15.0:
sudo mv pig-0.15.0 /usr/local/
Now set up the environment variables. Below are the variables on my machine (Ubuntu 14+ LTS), added to the .bashrc file:
#PIG VARS
export PIG_HOME='/usr/local/pig-0.15.0'
export PATH=$PIG_HOME/bin:$PATH
# PIG_CLASSPATH points Pig at the Hadoop configuration directory;
# HADOOP_INSTALL should already be set to your Hadoop install path
export PIG_CLASSPATH=$HADOOP_INSTALL/etc/hadoop
#PIG VARS END
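After adding the block, reload .bashrc (source ~/.bashrc) so it takes effect in the current shell. To double-check how the exports expand, here is the same pair run standalone:

```shell
# Same exports as the .bashrc block above
export PIG_HOME='/usr/local/pig-0.15.0'
export PATH=$PIG_HOME/bin:$PATH
# The Pig bin directory should now be the first PATH entry
echo "${PATH%%:*}"
```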
Now start HDFS and YARN with the following commands:
start-dfs.sh
start-yarn.sh
Now run the "pig" command in a terminal; you should see a message like "Connecting to hadoop file system at: hdfs://localhost:9000", followed by the grunt> prompt. (If Hadoop is not running, you can instead start Pig in local mode with "pig -x local", which reads from the local file system.)
Now submit the following code in the grunt shell:
-- load each line of the input file as a single field
a = load '/user/hduser/wordcount/input/file0';
-- split each line into words, one word per record
b = foreach a generate flatten(TOKENIZE((chararray)$0)) as word;
-- group identical words together
c = group b by word;
-- count the occurrences of each word
d = foreach c generate COUNT(b), group;
store d into '/user/hduser/wordcount/output/pig_wordcount';
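For intuition, the count this Pig script produces is the same one the following plain-shell sketch computes on a local copy of the file (the sample text here is illustrative, not the actual file0):

```shell
# Create a small sample input as a stand-in for file0
printf 'hello pig\nhello hadoop\n' > /tmp/file0
# Tokenize on whitespace, then group and count identical words,
# mirroring TOKENIZE + GROUP + COUNT in the Pig script
tr -s ' \t' '\n' < /tmp/file0 | sort | uniq -c | sort -rn
```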
Notes:
/user/hduser/wordcount/input/file0 is an HDFS path; you can use the put command to copy files from the local file system into HDFS.
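The put step mentioned above looks like this (file0 here stands for an illustrative local file; substitute your own):

```shell
# Create the input directory in HDFS if it does not exist yet
hdfs dfs -mkdir -p /user/hduser/wordcount/input
# Copy a local file into HDFS
hdfs dfs -put file0 /user/hduser/wordcount/input/
# Confirm the file arrived
hdfs dfs -ls /user/hduser/wordcount/input
```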
You can also watch the job run via the YARN web UI:
http://localhost:8088/cluster
Verify the output:
hdfs dfs -cat /user/hduser/wordcount/output/pig_wordcount/part-r-00000
Running Pig via a script:
Example input: files input-new and input-new1 (both contain the same data):
mike 3 40000 boston google
mike1 12 40000 california facebook
mike2 13 60000 seattle microsoft
mike3 32 100000 portland microsoft
mike4 32 1000000 portland cisco
santa 25 20000 banglore flipkart
example1.pig (this is the script name; you can store it anywhere, I use /usr/local/pig-0.15.0/scripts/myscripts):
-- both inputs are tab-separated with the same schema
input1 = LOAD '/user/hduser/wordcount/input/input-new' USING PigStorage('\t') AS (name, age:int, salary:int, city, company);
-- input2 is loaded here but not used in the steps below
input2 = LOAD '/user/hduser/wordcount/input/input-new1' USING PigStorage('\t') AS (name, age:int, salary:int, city, company);
-- keep only rows where age > 12
filterinput = FILTER input1 BY age > 12;
STORE filterinput INTO '/user/hduser/wordcount/output/result1';
-- group the filtered rows by city
groupinput = GROUP filterinput BY (city);
STORE groupinput INTO '/user/hduser/wordcount/output/result3';
-- for each city, emit the names in that group and the total salary
calculatesum = FOREACH groupinput GENERATE group, filterinput.name, SUM(filterinput.salary);
STORE calculatesum INTO '/user/hduser/wordcount/output/result2';
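The filter/group/sum pipeline can be sanity-checked against the sample data with a plain awk sketch on a local copy of the file (the Pig job performs the same computation on HDFS):

```shell
# Recreate the sample input locally (tab-separated, as in input-new)
printf 'mike\t3\t40000\tboston\tgoogle\n' > /tmp/input-new
printf 'mike1\t12\t40000\tcalifornia\tfacebook\n' >> /tmp/input-new
printf 'mike2\t13\t60000\tseattle\tmicrosoft\n' >> /tmp/input-new
printf 'mike3\t32\t100000\tportland\tmicrosoft\n' >> /tmp/input-new
printf 'mike4\t32\t1000000\tportland\tcisco\n' >> /tmp/input-new
printf 'santa\t25\t20000\tbanglore\tflipkart\n' >> /tmp/input-new
# FILTER age > 12, GROUP by city, SUM salary per group
awk -F'\t' '$2 > 12 { sum[$4] += $3 } END { for (c in sum) print c, sum[c] }' /tmp/input-new
```

Only mike and mike1 are filtered out (age 3 and 12), so portland should total 1100000, seattle 60000, and banglore 20000.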
Now you can run it with the following command (from the directory where the script is stored):
pig example1.pig
Check the URL http://localhost:8088/cluster (YARN); it will show your job running.
Verify the output:
hdfs dfs -ls /user/hduser/wordcount/output/result2
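To inspect the actual contents rather than just the file listing (the part-file name may differ depending on how the job ran; check the ls output above):

```shell
hdfs dfs -cat /user/hduser/wordcount/output/result2/part-r-00000
```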