Python & Spark Study Notes Series [Chapter 3] Example: Python + Spark + HBase

Actually I'm real · 2022-09-23 10:17:55 · Views: 876


My original data lives in HBase. Below I introduce two ways to operate on HBase through Spark.

The first way: connecting Spark directly to HBase

The code is as follows, but I ran into a problem: when I ran it locally from IDEA, it reported that a class used in the code below could not be found. The error was:



That class lives in hbase-common-1.0.0.jar, and it should be present in other versions as well. I tried various ways of importing this jar package without success, and the error never changed, so I never actually got this example working. If you have a solution, I hope you can leave me a message.
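For what it's worth, the usual way to make an extra jar visible to a PySpark job is to declare it on the Spark configuration (or pass `--jars` to spark-submit) rather than rely on the IDE's classpath. A minimal sketch, assuming a local jar path that you would replace with your own:

```python
# Sketch: point spark.jars at the HBase jar(s) BEFORE the SparkContext is
# created, so the JVM side of PySpark can load the missing class.
# The jar path below is an example, not the author's actual path.
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("local")
        .setAppName("spark_hbase_test")
        # comma-separated list of jars to ship to the driver and executors
        .set("spark.jars", "/path/to/hbase-common-1.0.0.jar"))
sc = SparkContext(conf=conf)
```

The equivalent at submit time is `spark-submit --jars /path/to/hbase-common-1.0.0.jar your_script.py`. This is a configuration sketch only; I cannot confirm it resolves the author's specific error.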

from pyspark import SparkContext, SparkConf
import os

os.environ['JAVA_HOME'] = r'D:\Java\jdk1.8.0_92'

conf = SparkConf().setMaster("local").setAppName("spark_hbase_test")
sc = SparkContext(conf=conf)

host = ',,'  # ZooKeeper quorum hosts (elided here)
table = '2:IndexMessage'
conf = {"hbase.zookeeper.quorum": host, "hbase.mapreduce.inputtable": table}
keyConv = "org.apache.spark.examples.pythonconverters.ImmutableBytesWritableToStringConverter"
valueConv = "org.apache.spark.examples.pythonconverters.HBaseResultToStringConverter"
hbase_rdd = sc.newAPIHadoopRDD(
    "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
    "org.apache.hadoop.hbase.io.ImmutableBytesWritable",  # key class
    "org.apache.hadoop.hbase.client.Result",              # value class
    keyConverter=keyConv,
    valueConverter=valueConv,
    conf=conf)
count = hbase_rdd.count()
print(count)
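If the RDD does load, each record is a (row key, value) pair, and in the Spark examples I have seen, HBaseResultToStringConverter emits one JSON string per cell, separated by newlines. A sketch of parsing such a value with plain Python (the sample record below is invented for illustration):

```python
import json

def parse_hbase_value(value):
    """Split a converter value into a list of per-cell dicts.

    Assumes the HBaseResultToStringConverter format: one JSON object
    per cell, joined by newlines.
    """
    return [json.loads(cell) for cell in value.split("\n")]

# A made-up two-cell value, shaped like the converter's output.
sample = ('{"qualifier": "name", "timestamp": "1", "columnFamily": "info", '
          '"row": "r1", "type": "Put", "value": "Alice"}\n'
          '{"qualifier": "age", "timestamp": "1", "columnFamily": "info", '
          '"row": "r1", "type": "Put", "value": "30"}')

cells = parse_hbase_value(sample)
print(cells[0]["value"], cells[1]["value"])  # prints: Alice 30
```

On a real RDD this would typically be applied with something like `hbase_rdd.flatMapValues(parse_hbase_value)`.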

The second way: create a Hive table mapped onto the HBase table, then use SparkSQL to query Hive; this also achieves the goal of operating on HBase.

Refer to my previous post for how to establish the mapping between Hive and HBase.
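Once the Hive-to-HBase mapping exists, the SparkSQL side is short. A minimal sketch, assuming pyspark is installed, Hive support is configured, and a mapped table named `hbase_mapped_table` exists (that table name is hypothetical):

```python
# Sketch of the second way: query the Hive table that is mapped onto HBase
# via the HBaseStorageHandler. Requires a Spark build/deployment with Hive
# support; the table name below is a placeholder.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("spark_hive_hbase_test")
         .enableHiveSupport()  # lets SparkSQL see the Hive metastore
         .getOrCreate())

# Reads go through Hive's storage handler and hit HBase underneath.
df = spark.sql("SELECT * FROM hbase_mapped_table LIMIT 10")
df.show()
```

This is a configuration-dependent sketch rather than a runnable standalone script, since it needs a live Hive metastore and HBase cluster behind it.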

To be continued

copyright: author [Actually I'm real]. Please include the original link when reprinting. Thank you.