Hive architecture

JOEL-T99 2022-01-26 16:14:58

Hive architecture

Hive uses a client/server (C/S) model. Its architecture is as follows:



Hive stores its data in HDFS. HQL statements are translated into MapReduce, Tez, or Spark jobs, which then run on the Hadoop cluster.
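The engine that HQL is translated for is chosen with the `hive.execution.engine` property, whose values are `mr`, `tez`, and `spark`. As a small sketch (the helper name `set_engine_statement` is made up for illustration), the following builds the session-level `SET` statement and rejects unknown engine names:

```python
# Engines named in the text above; "mr" is the property value for MapReduce.
SUPPORTED_ENGINES = ("mr", "tez", "spark")


def set_engine_statement(engine):
    """Return the HQL statement that selects an execution engine for a session."""
    engine = engine.lower()
    if engine not in SUPPORTED_ENGINES:
        raise ValueError(f"unknown execution engine: {engine!r}")
    return f"SET hive.execution.engine={engine};"


print(set_engine_statement("Tez"))  # SET hive.execution.engine=tez;
```

Whether a given engine is actually usable depends on what is installed on the cluster.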

Hive has three deployment modes: embedded mode, local mode, and remote mode.

Embedded mode (Local/Embedded Metastore Database (Derby)): This mode is generally used for practice and testing. At run time, Hive generates a Derby log file (derby.log) and a metastore_db directory in the deployment directory.

Local mode (Local/Embedded Metastore Server): Metadata is stored in a MySQL database. In this mode, every time bin/hive or hiveserver2 is started, it starts an embedded metastore service internally.

If there are many clients, each one opens its own connection to the database, which puts more load on MySQL.

Remote mode (Remote Metastore Server): Metadata is stored in a MySQL database. In this mode the metastore service is split out of the Hive service and deployed on its own, so that the metastore service and the Hive service run in different processes. This further decouples the architecture, which helps keep Hive stable and improves service efficiency.

Multiple clients can connect at the same time, and clients do not need to know the MySQL username and password. They only need to connect to the metastore service, which makes management easier and improves security.
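In practice the three modes differ mainly in two hive-site.xml properties: `hive.metastore.uris` (empty unless a remote metastore is used) and `javax.jdo.option.ConnectionURL` (the JDBC URL of the metadata database, which defaults to an embedded Derby database). The sketch below infers the mode from a plain dictionary of properties. The function name `metastore_mode` and the host names are made up for illustration, and the rule is a rough heuristic rather than anything Hive itself exposes.

```python
def metastore_mode(props):
    """Guess the Hive deployment mode from hive-site.xml style properties.

    `props` is a plain dict; this helper is illustrative, not part of Hive.
    """
    uris = props.get("hive.metastore.uris", "").strip()
    if uris:
        # Clients talk to a standalone metastore service over Thrift.
        return "remote"
    jdbc_url = props.get(
        "javax.jdo.option.ConnectionURL",
        "jdbc:derby:;databaseName=metastore_db;create=true",  # Hive's default
    )
    if jdbc_url.startswith("jdbc:derby:"):
        # Metadata lives in a Derby database managed by the Hive process.
        return "embedded"
    # The metastore runs inside the Hive process but stores metadata
    # in an external database such as MySQL.
    return "local"


print(metastore_mode({}))  # embedded
print(metastore_mode({"javax.jdo.option.ConnectionURL": "jdbc:mysql://db-host:3306/hive"}))  # local
print(metastore_mode({"hive.metastore.uris": "thrift://metastore-host:9083"}))  # remote
```

Note that only in the remote case can the client configuration omit the database credentials, which matches the security point made above.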


Parser (SQL Parser): Converts the SQL string into an abstract syntax tree (AST). This step is usually done with a third-party library such as ANTLR. The AST is then analyzed semantically, for example to check whether the table exists, whether the fields exist, and whether the SQL contains semantic errors.

Compiler (Physical Plan): Compiles the AST into a logical execution plan.

Optimizer (Query Optimizer): Optimizes the logical execution plan.

Executor (Execution): Converts the logical execution plan into a physical plan that can be run.
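The four stages above can be illustrated with a toy pipeline. This is only a sketch under strong simplifying assumptions: it handles nothing but `SELECT <columns> FROM <table>`, it uses a regular expression instead of ANTLR, the `CATALOG` dictionary stands in for the metastore, and all function names are hypothetical. The optimizer step shown is column pruning, chosen here as an example of a logical-plan optimization.

```python
import re

# Hypothetical catalog standing in for the metastore: table -> column names.
CATALOG = {"employees": ["id", "name", "salary"]}


def parse(sql):
    """Parser: turn a very small SELECT statement into an AST (a dict here)."""
    m = re.fullmatch(r"\s*SELECT\s+(.+?)\s+FROM\s+(\w+)\s*;?\s*", sql, re.IGNORECASE)
    if m is None:
        raise SyntaxError(f"cannot parse: {sql!r}")
    columns = [c.strip() for c in m.group(1).split(",")]
    return {"type": "select", "columns": columns, "table": m.group(2)}


def analyze(ast, catalog):
    """Semantic check: the table and every column must exist in the catalog."""
    if ast["table"] not in catalog:
        raise ValueError(f"table not found: {ast['table']}")
    missing = [c for c in ast["columns"] if c not in catalog[ast["table"]]]
    if missing:
        raise ValueError(f"columns not found: {missing}")
    return ast


def compile_plan(ast, catalog):
    """Compiler: build a naive logical plan (scan all columns, then project)."""
    return [
        ("scan", ast["table"], list(catalog[ast["table"]])),
        ("project", list(ast["columns"])),
    ]


def optimize(plan):
    """Optimizer: column pruning, i.e. scan only the projected columns."""
    if len(plan) == 2 and plan[0][0] == "scan" and plan[1][0] == "project":
        return [("scan", plan[0][1], list(plan[1][1]))]
    return plan


def to_physical(plan):
    """Executor: describe each logical operator as a runnable task."""
    tasks = []
    for op in plan:
        if op[0] == "scan":
            tasks.append(f"map task: read {', '.join(op[2])} from {op[1]}")
        elif op[0] == "project":
            tasks.append(f"map task: keep {', '.join(op[1])}")
    return tasks


ast = analyze(parse("SELECT name, salary FROM employees"), CATALOG)
print(to_physical(optimize(compile_plan(ast, CATALOG))))
# ['map task: read name, salary from employees']
```

A real physical plan would be a DAG of MapReduce, Tez, or Spark stages rather than a list of strings, but the hand-off between stages follows the same order.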


The user interfaces for accessing Hive include the CLI, JDBC/ODBC, HWI, and Thrift.

CLI (Command Line Interface): The command-line interface. When the CLI starts, it also starts a copy of Hive.

JDBC/ODBC: Access Hive from application code through a JDBC driver (Java) or an ODBC driver.

HWI (Hive Web Interface): Access Hive through a browser.

Thrift: A cross-language RPC framework developed at Facebook. Hive integrates it as a service (HiveServer/HiveServer2).

Differences between HiveServer and HiveServer2:

Both allow remote clients to access Hive without starting the CLI. HiveServer supports only a single client at a time. In Hive 0.11.0 this module was rewritten as HiveServer2, which supports multiple concurrent clients and provides better support for open API clients (JDBC, ODBC, ...).
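Remote clients reach HiveServer2 through a JDBC URL of the form `jdbc:hive2://host:port/database`, with 10000 as the usual default port. The helper below is a minimal sketch that only assembles such a URL. Its name and the example host are made up, and it does not cover optional URL parameters such as authentication settings.

```python
def hiveserver2_jdbc_url(host, port=10000, database="default"):
    """Build a HiveServer2 JDBC URL: jdbc:hive2://host:port/database."""
    if not host:
        raise ValueError("host must not be empty")
    return f"jdbc:hive2://{host}:{port}/{database}"


print(hiveserver2_jdbc_url("hs2.example.com"))
# jdbc:hive2://hs2.example.com:10000/default
```

The resulting URL can be handed to any JDBC client, for example Beeline via `beeline -u <url>`.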


Metadata (Metastore): Includes the table name, the database the table belongs to (`default` if none is specified), the table owner, the column and partition fields, the table type (internal or external), the table's data directory, and so on.

Metadata is stored in a Derby database by default, but in practice a MySQL database is usually used.
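To make the list of metadata fields concrete, here is a sketch of a record holding them. The class `TableMetadata` and the sample table are hypothetical and do not mirror the actual metastore schema. The type names `MANAGED_TABLE` and `EXTERNAL_TABLE` are the values Hive uses for internal and external tables.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TableMetadata:
    """Illustrative container for the per-table metadata listed above."""
    name: str
    location: str                      # table data directory, e.g. in HDFS
    database: str = "default"          # database the table belongs to
    owner: str = ""
    columns: List[str] = field(default_factory=list)
    partition_columns: List[str] = field(default_factory=list)
    table_type: str = "MANAGED_TABLE"  # internal; "EXTERNAL_TABLE" for external

    def qualified_name(self):
        return f"{self.database}.{self.name}"


logs = TableMetadata(
    name="logs",
    location="/user/hive/warehouse/logs",
    owner="hive",
    columns=["ts", "message"],
    partition_columns=["dt"],
)
print(logs.qualified_name())  # default.logs
```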

Hive operating mechanism

1️⃣: The client submits an HQL statement, which is sent to the Driver (any database driver, such as JDBC or ODBC) for execution;

2️⃣: The Driver parses the HQL query and validates its syntax;

3️⃣: The compiler sends a metadata request to the Metastore;

4️⃣: The Metastore sends the required metadata back to the compiler as a response;

5️⃣: The compiler checks the requirements and sends the plan back to the Driver;

6️⃣: The Driver sends the execution plan to the execution engine;

7️⃣: The execution engine sends the job to the JobTracker on the NameNode, which assigns it to TaskTrackers on the DataNodes, where the MapReduce job runs. During execution, the execution engine can also perform metadata operations through the Metastore;

8️⃣: The execution engine receives the results from the DataNodes;

9️⃣: The execution engine sends the results to the Driver;

1️⃣0️⃣: The Driver sends the results to the Hive interface.
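The ten steps can be traced with a small simulation. This is a sketch, not Hive code: the function `run_query` is hypothetical, it accepts only `SELECT * FROM <table>`, and two plain dictionaries stand in for the Metastore and for the data held on the DataNodes.

```python
def run_query(hql, metastore, hdfs):
    """Trace a 'SELECT * FROM <table>' query through the ten steps.

    metastore: dict of table name -> column names (stand-in for the Metastore)
    hdfs:      dict of table name -> row tuples (stand-in for the DataNodes)
    """
    trace = ["1 client -> Driver: submit HQL"]

    # Step 2: the Driver parses the query and validates its syntax.
    words = hql.strip().rstrip(";").split()
    if len(words) != 4 or [w.lower() for w in words[:3]] != ["select", "*", "from"]:
        raise SyntaxError(f"unsupported query: {hql!r}")
    table = words[3]
    trace.append("2 Driver: parse query, validate syntax")

    # Steps 3-5: the compiler fetches metadata and hands a plan to the Driver.
    trace.append("3 compiler -> Metastore: metadata request")
    if table not in metastore:
        raise LookupError(f"table not found: {table}")
    trace.append("4 Metastore -> compiler: metadata")
    plan = {"table": table, "columns": metastore[table]}
    trace.append("5 compiler -> Driver: plan")

    # Steps 6-9: the execution engine runs the job and collects the results.
    trace.append("6 Driver -> execution engine: plan")
    trace.append("7 execution engine -> cluster: run job")
    rows = [dict(zip(plan["columns"], row)) for row in hdfs[plan["table"]]]
    trace.append("8 cluster -> execution engine: results")
    trace.append("9 execution engine -> Driver: results")

    # Step 10: the Driver returns the results to the caller.
    trace.append("10 Driver -> client: results")
    return rows, trace


rows, trace = run_query(
    "SELECT * FROM employees;",
    metastore={"employees": ["id", "name"]},
    hdfs={"employees": [(1, "Ann"), (2, "Bob")]},
)
print(rows)  # [{'id': 1, 'name': 'Ann'}, {'id': 2, 'name': 'Bob'}]
print("\n".join(trace))
```

The trace makes the division of labor visible: only the compiler talks to the Metastore during planning, and only the execution engine talks to the cluster.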

