Research on a high-throughput real-time transactional data warehouse scheme

Goose · 2022-06-24 05:47:29 · 946 reads


0. Background

Requirements:

  • Support transactions
  • High throughput with real-time statistical queries; statistical granularity of about 10 minutes or less
  • Multiple data sources, all of which need to be aggregated into a wide table before statistical queries run
  • Query load may be concentrated at the same points in time
  • As much of a black box as possible, so the business has no awareness of it

1. Integrated data warehouse solutions

Tencent Cloud's data warehouse PostgreSQL and TDSQL, PingCAP's TiDB, Alibaba's OceanBase, and Huawei Cloud's DWS are all HTAP products commonly used in the industry; any of them can meet these needs in a one-stop way.

1.1 Cloud Data Warehouse PostgreSQL

Cloud Data Warehouse PostgreSQL (formerly the Snova data warehouse) provides a simple, fast, cost-effective PB-scale cloud data warehouse solution. It is compatible with the Greenplum open-source data warehouse and is a data warehouse service based on an MPP (massively parallel processing) architecture. With this product, you can use the rich ecosystem of PostgreSQL open-source tools for ad hoc query and analysis of massive data in the cloud warehouse, ETL processing, and visual exploration. It benchmarks against Huawei Cloud DWS.

1.1.1 Data access

For data access, the DataX tool can be used to update data, incrementally or offline, from other sources such as MySQL, SQL Server, Oracle, PostgreSQL, HDFS, Hive, and HBase into Cloud Data Warehouse PostgreSQL. Importing the required data via SQL is also supported. If there are multiple data sources, a DataX task can be configured for each of them.
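As an illustration, a DataX job is described by a JSON file that pairs a reader with a writer. The sketch below builds such a job description in Python and serializes it; the hosts, credentials, table and column names are placeholders, and the exact parameter set should be checked against the documentation of the mysqlreader and postgresqlwriter plugins.

```python
import json

# Sketch of a DataX job (assumed shape; verify against the plugin docs):
# sync an orders table from MySQL into the warehouse's PostgreSQL endpoint.
datax_job = {
    "job": {
        "setting": {"speed": {"channel": 3}},
        "content": [{
            "reader": {
                "name": "mysqlreader",
                "parameter": {
                    "username": "sync_user",
                    "password": "******",
                    "column": ["id", "amount", "created_at"],
                    # incremental pull: only rows newer than the last watermark
                    "where": "created_at > '2022-06-24 00:00:00'",
                    "connection": [{
                        "table": ["orders"],
                        "jdbcUrl": ["jdbc:mysql://mysql-host:3306/shop"],
                    }],
                },
            },
            "writer": {
                "name": "postgresqlwriter",
                "parameter": {
                    "username": "dw_user",
                    "password": "******",
                    "column": ["id", "amount", "created_at"],
                    "connection": [{
                        "table": ["ods_orders"],
                        "jdbcUrl": "jdbc:postgresql://pg-host:5432/warehouse",
                    }],
                },
            },
        }],
    }
}

job_json = json.dumps(datax_job, indent=2)  # what you would save as job.json
```

One such job per data source implements the multi-source setup described above.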


DataX supports the following categories of data sources, each with Reader (read) and Writer (write) plugins:

  • RDBMS relational databases (read and write), including a universal RDBMS plugin that supports all relational databases
  • NoSQL data stores (read and write)
  • Unstructured data stores (read and write)

Alternatively, COS data can be loaded and queried as an external table. Cold data that is stored for a long time and rarely queried can also be migrated to COS to reduce cost.

  • Loading COS data
  • Cold data migration

1.1.2 ETL

ETL management for data aggregation inside the warehouse can use the open-source components AirFlow or Azkaban. Once installed and configured, they can run real-time or offline aggregation over multiple tables in the warehouse; AirFlow is used as the example here.
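An AirFlow DAG is essentially a set of tasks plus dependencies between them. As a dependency-only sketch (no AirFlow involved here), the ordering such a DAG enforces can be illustrated with Python's stdlib graphlib; the task names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical 10-minute ETL round: extract from two sources, build the wide
# table, then run the statistical aggregation -- the same ordering an AirFlow
# DAG would enforce via task dependencies.
dag = {
    "extract_orders":   set(),                      # no upstream tasks
    "extract_payments": set(),
    "build_wide_table": {"extract_orders", "extract_payments"},
    "aggregate_stats":  {"build_wide_table"},
}

# A valid execution order: every task appears after all of its predecessors.
order = list(TopologicalSorter(dag).static_order())
```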

1.2 TiDB

TiDB is an open-source distributed relational database designed and developed independently by PingCAP. It is a distributed database product that supports both online transaction processing and online analytical processing (Hybrid Transactional and Analytical Processing, HTAP), with important features such as horizontal scale-out and scale-in, financial-grade high availability, real-time HTAP, cloud-native distributed architecture, and compatibility with the MySQL 5.7 protocol and the MySQL ecosystem. The goal is to provide users with a one-stop OLTP (Online Transactional Processing), OLAP (Online Analytical Processing), and HTAP solution. TiDB suits application scenarios with high requirements for availability and strong consistency and with large data scale.

Data migration:

2. Schemes composed of open-source big data components

2.1 Introduction to common frameworks

Advantages and disadvantages of the OLAP MPP frameworks commonly in use:

The common combinations in the industry, HBase + Phoenix, Kudu + Impala, and ClickHouse, compare as follows:

2.1 HBase + Phoenix

HBase performs well at real-time mass writes and queries, and once Phoenix is introduced, querying becomes much more convenient and some rowkey design problems can be solved. However, later operation and maintenance costs may be higher.

Business aggregation processing: it is easy to write SQL directly with Phoenix, which supports aggregation across multiple tables; complex aggregation operations can be handled with Spark.

Transactionality: HBase supports data modification.

Scaling and operations: EMR supports one-click scaling, and operation and maintenance services can be provided.
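One of the rowkey design problems referred to above is write hotspotting on monotonically increasing keys. A common mitigation, which Phoenix's salted tables automate, is prefixing the key with a hash-derived salt byte; a plain-Python sketch, where the bucket count is an assumption:

```python
import hashlib

SALT_BUCKETS = 8  # number of region-spreading prefixes; a tunable assumption

def salted_rowkey(order_id: str) -> bytes:
    """Prefix a monotonically increasing id with a stable salt byte so that
    sequential writes spread across regions instead of hitting one region."""
    digest = hashlib.md5(order_id.encode()).digest()
    bucket = digest[0] % SALT_BUCKETS
    return bytes([bucket]) + order_id.encode()

keys = [salted_rowkey(f"order-{i:010d}") for i in range(1000)]
buckets_used = {k[0] for k in keys}   # writes now land in several buckets
```

Reads by exact key recompute the same salt, so point lookups still work; range scans must fan out across the buckets, which is the trade-off Phoenix handles for you.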

2.2 Kudu + Impala

Kudu and Impala are both top-level Apache projects contributed by Cloudera. Kudu, as the underlying storage, supports high-concurrency, low-latency KV access while maintaining good scan performance; this feature makes it theoretically able to serve both OLTP-class and OLAP-class queries at the same time. Impala, as a veteran SQL parsing engine, has had the stability and speed of its ad hoc query (Ad-Hoc Query) handling widely verified in the industry. Impala has no storage engine of its own; it is responsible for parsing SQL and connecting to underlying storage engines. At its initial release Impala mainly supported HDFS; after Kudu was released, Impala and Kudu were deeply integrated.

Among the many big data frameworks, Impala occupies a position similar to Hive's, but Impala focuses more on fast ad hoc SQL analysis; for SQL that takes very long to execute, Hive remains more appropriate.

For SQL queries such as GROUP BY, Impala computes in memory, so Impala has high machine-configuration requirements; the official recommendation is 128 GB of memory or more. Hive, whose underlying layer maps such queries to the traditional MapReduce computing framework, has low execution efficiency but good stability, and its machine-configuration requirements are also low.

Execution efficiency is Impala's biggest advantage. For data stored in HDFS, Impala parses and executes much faster than Hive; with the addition of Kudu it is even more powerful, and some queries execute up to 100 times faster.

2.2.1 Comparison

Unlike storage engines such as HBase, Kudu has the following advantages:

  • Fast OLAP-class query processing
  • High compatibility with common Hadoop-ecosystem systems such as MapReduce and Spark, with connector drivers maintained officially
  • Deep integration with Impala; compared with the traditional HDFS + Parquet + Impala architecture, Kudu + Impala performs better in most scenarios
  • A powerful and flexible consistency model that allows the user to choose a consistency model per request, up to strong sequential consistency
  • Can serve OLTP and OLAP requests at the same time, with good performance
  • Kudu is integrated into Cloudera Manager, which is friendly for operations
  • High availability: the Raft consensus algorithm is used as the election model after a master fails, and even if an election fails, the data remains readable
  • Support for structured data with pure columnar storage, saving space while providing more efficient query speed
  • Kudu is a pure columnar storage engine; compared with HBase, which merely groups stored data by column family, Kudu's columnar storage is closer to Parquet, supporting more efficient scan operations while also using less storage space

Columnar storage has these advantages mainly for two reasons:

  1. A typical OLAP query accesses only some of the columns; a columnar storage engine supports on-demand access in this case, instead of reading all the data in each row.
  2. Data laid out together by column generally achieves a higher compression ratio, because values in the same column tend to be more similar to each other.
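The second point can be checked with a small experiment: the same records compressed row-major versus column-major with zlib. Exact ratios depend on the codec and the data, but grouping a low-cardinality column together compresses better.

```python
import zlib

# 10,000 synthetic rows: a low-cardinality date column and a high-cardinality
# id column, a typical mix in a fact table.
dates = ["2022-06-24"] * 10_000
ids = [f"{i:08d}" for i in range(10_000)]

# Row-major layout interleaves the two columns; column-major groups them.
row_major = "\n".join(f"{d},{i}" for d, i in zip(dates, ids)).encode()
col_major = ("\n".join(dates) + "\n" + "\n".join(ids)).encode()

row_size = len(zlib.compress(row_major))
col_size = len(zlib.compress(col_major))   # smaller: similar values adjacent
```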

There are two essential differences between Kudu and HBase:

  • Kudu's data model is closer to a traditional relational database's, whereas HBase is a completely NoSQL design in which everything is bytes.
  • Kudu's on-disk storage model is true columnar storage, and its storage structure design differs greatly from HBase's. In general, purely OLTP requests suit HBase, while mixed OLTP and OLAP requests suit Kudu.

2.2.2 Transactionality

Kudu guarantees the atomicity of single-row operations.

Kudu does not support multi-row transactions, and transaction rollback is not supported.

2.2.3 ETL

For ETL that aggregates across multiple tables, Impala views can be used to create temporary tables over the different data sources; real-time and offline tasks then load the wide tables aggregated from those sources, and the business side runs its various aggregate statistical queries against them.
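The view-as-wide-table pattern can be illustrated with sqlite3 standing in for Impala (the SQL dialect and scale differ; the table names here are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Two "data sources" already landed in the warehouse.
cur.execute("CREATE TABLE orders (order_id INTEGER, user_id INTEGER, amount REAL)")
cur.execute("CREATE TABLE users (user_id INTEGER, region TEXT)")
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, 10, 5.0), (2, 10, 7.5), (3, 11, 2.0)])
cur.executemany("INSERT INTO users VALUES (?, ?)", [(10, "east"), (11, "west")])

# The "wide table" is just a view joining the sources; nothing is materialized.
cur.execute("""
    CREATE VIEW wide_orders AS
    SELECT o.order_id, o.amount, u.region
    FROM orders o JOIN users u ON o.user_id = u.user_id
""")

# Business-side aggregate query against the view.
totals = dict(cur.execute(
    "SELECT region, SUM(amount) FROM wide_orders GROUP BY region").fetchall())
```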

2.3 Apache Cassandra

Looking only at performance, Cassandra is still very strong, but it differs from other databases in that it is masterless, or put the other way round, Cassandra is multi-master. Multi-master means multiple nodes can all accept operations, rather than everything being forwarded to a single node. On a single node, locking is easy: lock a row, and all requests to it simply run serially. So for independent writes to the same row there is a real conflict, and Cassandra's conflict resolution is very crude: last write wins. As a result, Cassandra is not suitable for read-before-write workloads; for that scenario, Cassandra recommends its CAS syntax, but CAS performs poorly, so when using Cassandra, scenarios with many conflicts should be avoided. What counts as many conflicts? For example, many mobile users updating the same piece of data at the same time is a strong conflict.
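Last write wins can be sketched in plain Python with per-cell write timestamps (a simplified model; Cassandra resolves conflicts per column by comparing write timestamps):

```python
def lww_merge(replica_a: dict, replica_b: dict) -> dict:
    """Merge two replicas of a row where each cell is (timestamp, value);
    the cell with the larger timestamp wins, last-write-wins style."""
    merged = dict(replica_a)
    for col, (ts, val) in replica_b.items():
        if col not in merged or ts > merged[col][0]:
            merged[col] = (ts, val)
    return merged

# Two clients updated the same row concurrently via different coordinators.
a = {"balance": (100, "50"), "nick": (90, "alice")}
b = {"balance": (101, "75")}          # later write to the same column

row = lww_merge(a, b)                 # the earlier balance update is lost
```

Note the earlier write to `balance` silently disappears; that silent loss is exactly why read-before-write workloads are a poor fit.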

3. Flink scheme

Data warehouse write performance has a bottleneck. When write pressure is high or real-time requirements are strict, a real-time computing framework can be introduced to aggregate in real time and reduce the computing pressure downstream.

3.1 Upstream and downstream compatibility

See Oceanus: based on Flink, it supports traditional relational databases such as Oracle, the Redis cache, the Hadoop ecosystem's HBase and HDFS, the Kafka message queue, as well as ClickHouse, ES, hippo, HTTP, and many other kinds of source and sink tables.

Note that a source or sink table does not mean a real physical table, like a database table, is actually created inside Oceanus. In fact, source and sink tables are all logical tables: they merely map the configuration items filled in by the business onto the real data sources and destinations.

3.2 Throughput capacity

According to Meituan's comparison of Flink and Storm (Spark Streaming has second-level latency, while Storm and Flink are millisecond-level):

  • Storm's single-thread throughput is about 87,000 records/second, while Flink's single-thread throughput reaches 350,000 records/second; Flink's throughput is roughly 3-5 times Storm's.
  • Near full throughput, Storm's latency (including Kafka read and write time) has a median of about 100 ms and a 99th percentile of about 700 ms; Flink's median is about 50 ms and its 99th percentile about 300 ms. Flink's latency at full throughput is about half of Storm's, and as QPS gradually increases, Flink's latency advantage grows.
  • In summary, the performance of the Flink framework itself is better than Storm's.

3.3 Aggregation processing

Flink can create temporary tables via views, aggregating multiple business tables; the results are not stored, and aggregation happens on demand. The business side can write SQL against the view as needed, without writing a Spark program and without using Spark to build a wide table in Hive before every query, so the process is much simpler.

For complex operations, UDFs are supported.
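As an illustration of the UDF idea, with sqlite3 standing in for Flink SQL (the registration API differs; the table and function names here are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount REAL, coupon REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(10.0, 2.0), (5.0, 0.0)])

# A "complex operation" wrapped as a UDF instead of inlined SQL arithmetic.
def net_amount(amount: float, coupon: float) -> float:
    return round(max(amount - coupon, 0.0), 2)

conn.create_function("net_amount", 2, net_amount)

# The business query calls the UDF like any built-in function.
total_net = conn.execute(
    "SELECT SUM(net_amount(amount, coupon)) FROM orders").fetchone()[0]
```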

3.4 Transactionality

Some transactional needs can be solved with Flink time windows; for example, order counts can be computed over a time window. Traditional database ACID is currently not supported.
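As a sketch of the idea (plain Python, not Flink; the 10-minute width echoes the granularity requirement from the background section), events can be bucketed into tumbling windows by aligning their timestamps:

```python
from collections import Counter
from datetime import datetime, timezone

WINDOW_SECONDS = 600  # ~10-minute statistical granularity

def window_start(ts: datetime) -> datetime:
    """Align an event time down to the start of its tumbling 10-minute window."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % WINDOW_SECONDS, tz=timezone.utc)

# Hypothetical order events.
events = [
    datetime(2022, 6, 24, 5, 1, tzinfo=timezone.utc),
    datetime(2022, 6, 24, 5, 9, tzinfo=timezone.utc),
    datetime(2022, 6, 24, 5, 12, tzinfo=timezone.utc),
]

# Orders per window -- what a tumbling-window COUNT(*) would emit.
orders_per_window = Counter(window_start(e) for e in events)
```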

Flink provides two building blocks for implementing transactional sink connectors: the write-ahead-log (WAL) sink and the two-phase-commit (2PC) sink. A WAL sink writes all results into application state and, after receiving the checkpoint-completion notification, sends the results to the sink system. Because it buffers data in state, a WAL sink can be used with any external sink system. Even so, WAL cannot provide an exactly-once guarantee, and buffering the data inflates the state size, so the WAL model is not perfect. By contrast, a 2PC sink requires the sink system to provide transaction support, or a module that can simulate transactional behavior. For each checkpoint, the sink starts a transaction, adds all received data to the transaction, and writes the data to the sink system without committing it. Only when the transaction receives the checkpoint-completion notification is it committed, and the data actually written to the sink system. This mechanism relies on the sink being able to start a transaction before the checkpoint completes and to commit it after the application recovers from a failure. The 2PC protocol piggybacks on Flink's checkpoint mechanism: a checkpoint barrier is the notification to start a new transaction, each operator's report of its own successful checkpoint is its commit vote, and the job manager's message that a checkpoint succeeded is the instruction to commit the transaction. Compared with WAL sinks, 2PC sinks rely on the sink system and the sink implementation itself to achieve exactly-once semantics; moreover, a 2PC sink continuously writes data to the sink system, whereas the WAL model has the problems described above.
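The commit protocol above can be sketched as a toy Python model (a hypothetical stand-in, not Flink's actual connector API): writes are buffered in an open transaction per checkpoint and only become visible on the checkpoint-complete notification.

```python
class TwoPhaseCommitSink:
    """Toy model of the 2PC sink pattern: per checkpoint, buffer writes in an
    open transaction and commit only on the checkpoint-complete notification."""

    def __init__(self):
        self.open_txns = {}   # checkpoint_id -> buffered records (pre-commit)
        self.committed = []   # what the external system has durably accepted

    def on_checkpoint_barrier(self, checkpoint_id: int):
        # A barrier is the signal to start a new transaction.
        self.open_txns[checkpoint_id] = []

    def write(self, checkpoint_id: int, record):
        # Records go into the transaction but are not yet committed.
        self.open_txns[checkpoint_id].append(record)

    def on_checkpoint_complete(self, checkpoint_id: int):
        # The job manager's success message is the commit instruction.
        self.committed.extend(self.open_txns.pop(checkpoint_id))

sink = TwoPhaseCommitSink()
sink.on_checkpoint_barrier(1)
sink.write(1, "order-1")
sink.write(1, "order-2")
visible_before = list(sink.committed)   # nothing visible before the commit
sink.on_checkpoint_complete(1)          # now both records are visible
```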

3.5 Appendix: comparison of other stream computing frameworks

The real-time computing engines (Storm, Flink, Spark Streaming) compare as follows:

  • API: Storm has a flexible low-level API plus the Trident API with transactional guarantees; Flink has a stream API plus the Table API and Flink SQL support that better suit data development; Spark Streaming has a stream API and the Structured Streaming API, and the more data-development-friendly Spark SQL can also be used.
  • Fault-tolerance mechanism: Storm uses an ACK mechanism; Flink uses distributed state snapshots and savepoints; Spark Streaming uses RDD checkpoints.
  • State management: Storm has Trident State; Flink provides Keyed State and Operator State, with multiple persistence schemes; Spark Streaming offers UpdateStateByKey and other stateful APIs, with multiple persistence schemes.
  • Processing mode: Storm and Flink process streams event by event; Spark Streaming processes micro-batches.
  • Latency: Storm and Flink are millisecond-level; Spark Streaming is second-level.
  • Semantic guarantee: Storm offers At Least Once and Exactly Once; Flink offers Exactly Once and At Least Once; Spark Streaming offers At Least Once.

Compared with Storm and other stream computing frameworks, Flink has the following advantages:

  • A friendlier programming interface. The API Storm provides is too low-level and too simple, so users need a great deal of development work to implement business requirements; the learning cost of Storm programming is also high, requiring familiarity with framework principles and implementation details in a distributed environment. Flink, beyond high-level declarative programming interfaces such as the Table API and SQL, also encapsulates common stream-computing operators such as windows and helps users deal with out-of-order data in stream computing, greatly lowering the development cost of stream applications and reducing unnecessary repeated development.
  • Effective state management support. Most computing programs are stateful: the result of a computation is determined not only by the input but also by the program's current state. Storm's support for program state is very limited, so users generally have to save state data in external storage such as MySQL and HBase and manage access to it themselves; this access to external storage often becomes the performance bottleneck of Storm programs, and much of the time users can only design complex local caches to improve performance. Spark Streaming has only recently provided limited state management support, and its implementation requires a degree of remote access and data migration, so state access efficiency is not high. Flink provides effective support for storing computing-program state: users can easily store and access program state through the interfaces provided, and because the state data is stored locally, access performance is high. On failure, Flink's state management works with its fault-tolerance mechanism to reconstruct the state data, ensuring the correctness of the user program; and when the user changes the program's parallelism, Flink can automatically redistribute state data to the new computing nodes.
  • Rich fault-tolerance semantics. Because Storm lacks effective support for program state, its fault-tolerance support is also weak, and it is hard to guarantee that each input record is processed exactly once in the presence of failures. Flink, relying on the classic Chandy-Lamport algorithm, can take consistent snapshots of the user program's input and state; by rolling back to a snapshot on exceptions, Flink can guarantee EXACTLY-ONCE fault-tolerance semantics. With asynchronous checkpoint and incremental checkpoint techniques, Flink can snapshot user programs at low cost: while a snapshot is being taken, the performance of the user program is almost unaffected.
  • Excellent performance. Flink processes data streams with an event-triggered execution model; compared with Spark Streaming's mini-batch execution model, this greatly reduces the scheduling overhead of program execution. In addition, Flink heavily optimizes its network layer, improving data transfer performance through fine-grained blocking and efficient memory access, and effectively reducing performance degradation under congestion through back pressure and flow control. Together with its avoidance of remote access to state data, Flink shows better performance in practice than other stream computing systems, with lower processing latency and higher throughput.

Summary: the APIs, fault-tolerance mechanisms, and state persistence mechanisms of Flink and Spark Streaming can both solve some of the problems encountered with Storm, but Flink is closer to Storm on data latency, so the industry tends to use Flink for real-time computation and aggregation.

4. References

  1. Sound of dripping water
  2. Meituan Flink
  3. Flink + ClickHouse
  4. Tencent Cloud Oceanus
  5. Flink vs. Storm comparison
  6. Hive transactions
  7. Flink transactions
  8. Tencent Cloud ClickHouse
  9. HBase / Kudu / ClickHouse comparison
  10. HTAP
  11. Apache Cassandra
  12. Kudu and Impala
  13. OceanBase implementation
Copyright: author [Goose]. Please include a link to the original when reprinting. Thank you.