Hadoop job submission source code analysis

1、 Source code flow

// Enter the waitForCompletion() method of the Job class
// 1 Establish a connection (the connect() method)
// 1) Create the proxy used to submit the Job
new Cluster(getConfiguration());
// (1) Determine whether the runtime environment is local or a YARN cluster
initialize(jobTrackAddr, conf);
// 2 Submit the job
submitter.submitJobInternal(Job.this, cluster);
// 1) Create the staging path
Path jobStagingArea = JobSubmissionFiles.getStagingDir(cluster, conf);
// 2) Get the job ID and create the job path
JobID jobId = submitClient.getNewJobID();
// 3) Copy the jar package to the cluster
copyAndConfigureFiles(job, submitJobDir);
rUploader.uploadFiles(job, jobSubmitDir);
// 4) Compute the input splits and generate the split plan files
writeSplits(job, submitJobDir);
maps = writeNewSplits(job, jobSubmitDir);
// 5) Write the XML configuration file to the staging path
writeConf(conf, submitJobFile);
// 6) Submit the job and return the submission status
status = submitClient.submitJob(jobId, submitJobDir.toString(), job.getCredentials());
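
Step 1 of the trace ends in initialize(jobTrackAddr, conf), which decides between the local and YARN environments from the mapreduce.framework.name property. The following is a minimal plain-Java sketch of that decision, not Hadoop's actual code: the class FrameworkSelectionSketch, the method selectClient, the use of a Map in place of Hadoop's Configuration, and the fallback to "local" when the property is unset are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the provider lookup; these are not Hadoop classes.
class FrameworkSelectionSketch {

    // Assumption for this sketch: fall back to "local" when the property is unset.
    static final String DEFAULT_FRAMEWORK = "local";

    // Returns the name of the first provider that accepts the configured framework.
    static String selectClient(Map<String, String> conf) {
        String framework = conf.getOrDefault("mapreduce.framework.name", DEFAULT_FRAMEWORK);

        // Provider list in iteration order: framework name -> client name.
        Map<String, String> providers = new LinkedHashMap<>();
        providers.put("yarn", "YarnClient");
        providers.put("local", "LocalClient");

        // Walk the provider list and pick the one matching the parameter.
        for (Map.Entry<String, String> provider : providers.entrySet()) {
            if (provider.getKey().equals(framework)) {
                return provider.getValue();
            }
        }
        throw new IllegalStateException(
                "No provider for mapreduce.framework.name=" + framework);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("mapreduce.framework.name", "yarn");
        System.out.println(selectClient(conf));
        System.out.println(selectClient(new HashMap<>()));
    }
}
```

Running main selects YarnClient for the first configuration and LocalClient for the empty one. An unknown value is rejected with an exception rather than silently defaulting, which is a design choice of this sketch.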

2、 Key points in the job submission process

  1. In the connect() method, the Cluster object provides the entry point for accessing the MR cluster. Inside Cluster, initialize(jobTrackAddr, conf) calls initProviderList(); the resulting provider list contains YarnClient and LocalClient. A for loop then iterates over the provider list and checks the parameters.

    The mapreduce.framework.name parameter determines the runtime environment:
    If the value is yarn, the job runs in the YARN environment.
    If the value is local, the job runs in the local environment.

  2. The submitter is obtained for the current environment, and it then:

    • Checks whether the output path already exists;

    • Provides a staging temporary directory, generates the job ID, and prepares the path made of the staging directory plus the job ID;

    • Uploads the job.xml configuration file, the split information, and (in YARN mode only) the jar package to that staging directory + job ID path.

      Cluster mode: the jar package is submitted.

      Local mode: the jar package is not submitted.
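
The layout described in point 2 can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the submitter's real implementation: StagingLayoutSketch, submitDir, and filesToUpload are hypothetical names, the staging path and job ID in main are made up, and the file names (job.xml, job.split, job.splitmetainfo, job.jar) are the ones Hadoop conventionally writes, as far as I know.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that mirrors the staging directory + job ID layout.
class StagingLayoutSketch {

    // Joins the staging directory and the job ID into the job submit directory.
    static String submitDir(String stagingDir, String jobId) {
        return stagingDir.endsWith("/") ? stagingDir + jobId : stagingDir + "/" + jobId;
    }

    // Lists the files placed in the submit directory for the given mode.
    static List<String> filesToUpload(String submitDir, boolean yarnMode) {
        List<String> files = new ArrayList<>();
        files.add(submitDir + "/job.xml");            // job configuration
        files.add(submitDir + "/job.split");          // split data
        files.add(submitDir + "/job.splitmetainfo");  // split metadata
        if (yarnMode) {
            files.add(submitDir + "/job.jar");        // jar is only shipped in cluster mode
        }
        return files;
    }

    public static void main(String[] args) {
        // Made-up staging path and job ID, for illustration only.
        String dir = submitDir("/tmp/staging", "job_0001");
        System.out.println("YARN mode:  " + filesToUpload(dir, true));
        System.out.println("Local mode: " + filesToUpload(dir, false));
    }
}
```

The only difference between the two modes in this sketch is the job.jar entry, which matches the cluster versus local distinction above.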

