Pig Development Guide
1. Simple example
- Upload data to an HDFS directory
[hadoop@uhadoop-******-master1 pig]$ hadoop fs -put /etc/passwd /user/hadoop/passwd
- Start Pig
[hadoop@uhadoop-******-master1 pig]$ pig
- Load the data, splitting each line on ':', and display it
grunt> A = load 'passwd' using PigStorage(':');
grunt> dump A;
Display results:
(root,x,0,0,root,/root,/bin/bash)
……
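The tuple shown above has one field per ':'-separated column of the input line. As a rough illustration of that splitting step, the sketch below does the same thing in plain Java. It is not Pig's loader; the class and method names (ColonSplitSketch, splitFields, asTuple) are hypothetical and exist only for this example.

```java
import java.util.Arrays;
import java.util.List;

// Illustration only: plain Java, not Pig's PigStorage implementation.
class ColonSplitSketch {
    // Split one line on ':' and keep empty fields (limit -1),
    // so that field positions stay aligned with the columns.
    static List<String> splitFields(String line) {
        return Arrays.asList(line.split(":", -1));
    }

    // Render the fields in the parenthesized, comma-separated form used by dump.
    static String asTuple(List<String> fields) {
        return "(" + String.join(",", fields) + ")";
    }

    public static void main(String[] args) {
        String line = "root:x:0:0:root:/root:/bin/bash";
        System.out.println(asTuple(splitFields(line)));
        // prints (root,x,0,0,root,/root,/bin/bash)
    }
}
```

Each of the seven columns of the passwd line becomes one field, which matches the first tuple in the displayed result.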
2. Use a UDF
- Prepare data
Content of the student file
any 9 5
bob 8 4
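Upload the student file (the test script loads the relative path 'student', which resolves to the HDFS home directory of the user running Pig, /user/root here)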
hdfs dfs -put student /user/root/student
- Sample code (myudfs/UPPER.java)
package myudfs;

import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Eval function that returns its first argument in upper case.
public class UPPER extends EvalFunc<String>
{
    public String exec(Tuple input) throws IOException {
        // Return null when there is no input or the first field is null.
        if (input == null || input.size() == 0 || input.get(0) == null)
            return null;
        try {
            String str = (String) input.get(0);
            return str.toUpperCase();
        } catch (Exception e) {
            throw new IOException("Caught exception processing input row ", e);
        }
    }
}
- Compile
cd myudfs
javac -cp $PIG_HOME/pig-0.12.0-cdh5.4.4.jar UPPER.java
cd ..
jar -cf myudfs.jar myudfs
- Test script upper.pig
REGISTER myudfs.jar;
A = LOAD 'student' AS (name: chararray, age: int, gpa: float);
B = FOREACH A GENERATE myudfs.UPPER(name);
DUMP B;
- Execute
pig upper.pig
- Output result
(ANY 9 5)
(BOB 8 4)
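Each output tuple contains the whole input line in upper case, not just the name. A likely explanation, assuming the student file is separated by spaces as shown: a LOAD without a USING clause uses PigStorage with a tab delimiter, so the entire line ends up in the name field and that is what UPPER receives. The sketch below reproduces this in plain Java so it runs without Pig on the classpath. List<Object> is a stand-in for Pig's Tuple, and the names UpperSketch, upper and load are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustration only: List<Object> stands in for org.apache.pig.data.Tuple.
class UpperSketch {
    // Same guard and conversion as UPPER.exec: null for missing input,
    // otherwise the first field in upper case.
    static String upper(List<Object> input) {
        if (input == null || input.isEmpty() || input.get(0) == null) {
            return null;
        }
        return ((String) input.get(0)).toUpperCase();
    }

    // Split a line on a literal delimiter, as a delimiter-based loader would.
    static List<Object> load(String line, String delimiter) {
        String[] parts = line.split(Pattern.quote(delimiter), -1);
        return new ArrayList<Object>(Arrays.asList(parts));
    }

    public static void main(String[] args) {
        String line = "any 9 5";
        // Tab delimiter: the line contains no tab, so it stays one field.
        System.out.println(upper(load(line, "\t"))); // prints ANY 9 5
        // Space delimiter: the name becomes its own field.
        System.out.println(upper(load(line, " ")));  // prints ANY
    }
}
```

If only the upper-cased name is wanted, one option is to give the loader an explicit delimiter in upper.pig, for example USING PigStorage(' '), in the same way section 1 uses PigStorage(':').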