DistributedCache Hadoop

Create a utility method that adds a jar to the distributed cache.

path: local path of the jar
conf: Hadoop configuration

    private static void addJarToDistributedCache(
            String path, Configuration conf)
            throws IOException {

        File jarFile = new File(path);

        // Declare new HDFS location
        Path hdfsJar = new Path("/tmp/"
                + jarFile.getName());

        // Get a handle to the file system configured in conf (HDFS here)
        FileSystem hdfs = FileSystem.get(conf);

        // Copy the jar to HDFS, overwriting any existing file
        // (delSrc = false keeps the local copy, overwrite = true)
        hdfs.copyFromLocalFile(false, true, new Path(path), hdfsJar);

        // Add jar to distributed classPath
        DistributedCache.addFileToClassPath(hdfsJar, conf);
    }
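
Note that only the file name of the local jar is kept when the HDFS location is built, so two jars with the same name from different local directories would end up at the same place under /tmp/ and the second copy would overwrite the first. A minimal sketch of that path derivation in plain Java, without Hadoop on the classpath (the class HdfsTargetPath and the helper hdfsTargetFor are hypothetical names used only for illustration):

```java
import java.io.File;

public class HdfsTargetPath {

    // Hypothetical helper: mirrors the "/tmp/" + jarFile.getName()
    // expression used in addJarToDistributedCache.
    public static String hdfsTargetFor(String localJarPath) {
        return "/tmp/" + new File(localJarPath).getName();
    }

    public static void main(String[] args) {
        // Both local paths map to the same HDFS target, /tmp/MyClass.jar
        System.out.println(hdfsTargetFor("/tmp/MyClass.jar"));
        System.out.println(hdfsTargetFor("/opt/libs/MyClass.jar"));
    }
}
```

If that collision is a concern, one option is to include a job-specific directory in the target path instead of writing directly under /tmp/.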

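Add the jar to the distributed cache before creating the job. The Job makes its own copy of the Configuration when it is constructed, so changes made to conf afterwards are not picked up by the job.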

    public static void main(String[] args) throws Exception {

        // Create Hadoop configuration
        Configuration conf = new Configuration();

        // Add 3rd-party libraries
        String path = "/tmp/MyClass.jar";
        addJarToDistributedCache(path, conf);

        // Create my job
        Job job = new Job(conf, "Hadoop-classpath");
        .../...
    }
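
To see why the order matters, here is a plain-Java model of a job that copies its configuration on construction. It is only a sketch of the copy semantics: SnapshotJob and the key names are hypothetical stand-ins, not Hadoop classes or real Hadoop property names.

```java
import java.util.HashMap;
import java.util.Map;

public class ConfSnapshotDemo {

    // Hypothetical stand-in for a job that takes a private copy
    // of the configuration it is given.
    public static final class SnapshotJob {
        private final Map<String, String> conf;

        public SnapshotJob(Map<String, String> conf) {
            this.conf = new HashMap<>(conf); // copy, not a shared reference
        }

        public String get(String key) {
            return conf.get(key);
        }
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();

        // Set before the job is created: the job sees it.
        conf.put("example.classpath", "/tmp/MyClass.jar");
        SnapshotJob job = new SnapshotJob(conf);

        // Set after the job is created: the job does not see it.
        conf.put("example.late", "too-late");

        System.out.println(job.get("example.classpath")); // /tmp/MyClass.jar
        System.out.println(job.get("example.late"));      // null
    }
}
```

The same reasoning applies to addJarToDistributedCache: it records the jar in conf, so it has to run before conf is handed to the Job.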
