Spark 3.2 Tutorial (V): Word Frequency Statistics in Java under IDEA

Miss Zhu 2022-02-13 08:37:05 Views: 684


In the previous article, the word frequency statistics program was developed in Scala. In this article we develop the same word count in Java, to compare the differences between the two languages.
The data file again sits on a local drive (see the path in the code); its content is:

apple orange pear
banana lemon apple
pear peach orange

I. Create a Maven Java Module as usual:
II. Modify pom.xml
1. Set the JDK version to 1.8:

<maven.compiler.source>1.8</maven.compiler.source>
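In a typical pom.xml both the source and the target compiler level are set together; a minimal sketch of the properties block (these are the standard Maven compiler plugin property names):

```xml
<properties>
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
</properties>
```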

2. Add the Spark dependency:

<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.13</artifactId>
    <version>3.2.0</version>
</dependency>

3. Develop the Java code; see the comments for details:

package com.alan;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.*;
import scala.Tuple2;

import java.util.*;
import java.util.regex.Pattern;

public class Test1 {

    private static final Pattern SPACE = Pattern.compile(" ");

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local");
        JavaSparkContext sparkContext = new JavaSparkContext(conf);

        // Read the text file
        JavaRDD<String> lines = sparkContext.textFile("d://test/words.txt").cache();

        // Split each line into words
        JavaRDD<String> words = lines.flatMap(new FlatMapFunction<String, String>() {
            @Override
            public Iterator<String> call(String s) {
                return Arrays.asList(SPACE.split(s)).iterator();
            }
        });

        // map: pair each word with an initial count of 1
        JavaPairRDD<String, Integer> wordsOnes = words.mapToPair(new PairFunction<String, String, Integer>() {
            @Override
            public Tuple2<String, Integer> call(String s) {
                return new Tuple2<String, Integer>(s, 1);
            }
        });

        // reduce: sum the counts for each word
        JavaPairRDD<String, Integer> wordsCounts = wordsOnes.reduceByKey(new Function2<Integer, Integer, Integer>() {
            @Override
            public Integer call(Integer value, Integer toValue) {
                return value + toValue;
            }
        });

        // Print the result to the console
        wordsCounts.foreach(new VoidFunction<Tuple2<String, Integer>>() {
            @Override
            public void call(Tuple2<String, Integer> tuple) throws Exception {
                System.out.println(tuple._1() + " " + tuple._2());
            }
        });

        // wordsCounts.saveAsTextFile("d:/test/spark_word");

        sparkContext.stop();
    }
}
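Running the program locally should print each word of the sample file with its count. The line order may vary between runs, since it follows the RDD partition order:

apple 2
orange 2
pear 2
banana 1
lemon 1
peach 1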

As the listing shows, the code is full of anonymous inner classes: a pseudo-functional programming style combined with Scala identifier syntax (Tuple2, _1(), _2()). After all, Spark is developed in Scala; although Scala ultimately compiles to Java classes, writing Java against Spark feels like imitating Scala.
If you are comfortable with lambdas, the code can be rewritten much more compactly.
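For example, each of the four anonymous inner classes above collapses into a one-liner, since Spark's Java function interfaces (FlatMapFunction, PairFunction, Function2, VoidFunction) are all functional interfaces. A sketch of the lambda form, keeping the same file path and local master as the listing above (class name Test1Lambda is chosen here for illustration):

```java
package com.alan;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;

import java.util.Arrays;

public class Test1Lambda {

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("WordCount").setMaster("local");
        JavaSparkContext sparkContext = new JavaSparkContext(conf);

        // Same file as in the listing above
        JavaRDD<String> lines = sparkContext.textFile("d://test/words.txt");

        // flatMap, mapToPair and reduceByKey each take a lambda
        // (or method reference) instead of an anonymous inner class
        JavaPairRDD<String, Integer> wordsCounts = lines
                .flatMap(s -> Arrays.asList(s.split(" ")).iterator())
                .mapToPair(s -> new Tuple2<>(s, 1))
                .reduceByKey(Integer::sum);

        wordsCounts.foreach(t -> System.out.println(t._1() + " " + t._2()));

        sparkContext.stop();
    }
}
```

The whole pipeline now reads as one chained expression, much closer to the Scala version from the previous article.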

copyright:author[Miss Zhu],Please bring the original link to reprint, thank you. https://en.javamana.com/2022/02/202202130837028213.html