null_wry · 2022-02-13 07:59:25 · Views: 19
Storm is an open-source, distributed, reliable, fault-tolerant data stream processing system. Storm has many use cases, such as real-time analytics, online machine learning, continuous computation, distributed RPC, and ETL. Storm supports horizontal scaling, is highly fault-tolerant, guarantees that every message is processed, and is very fast (in a small cluster, each node can process millions of messages per second). Storm is easy to deploy and operate, and, more importantly, applications can be developed in any programming language.
The input stream of a Storm cluster is managed by a component called a spout. The spout passes data to bolts, and each bolt either saves the data to some kind of storage or passes it on to other bolts. A Storm cluster can thus be seen as a chain of bolt transformations applied to the data emitted by spouts.
In a Storm cluster there are two types of nodes: the master node and the worker nodes.
The master node runs the Nimbus daemon, which is responsible for distributing code across the cluster, assigning tasks to worker nodes, and monitoring for failures. Each worker node runs a Supervisor daemon and executes part of a topology. A Storm topology runs across many worker processes on different machines, and each worker executes a subset of the topology. Coordination between Nimbus and the Supervisors is handled through a Zookeeper cluster.
Topology：the name for a real-time application running on Storm.
Spout：the component that brings the source data stream into a topology. A spout usually reads data from an external data source and converts it into the topology's internal source data. (See the spout diagram.)
Bolt：a component that receives data and then processes it; users implement whatever processing logic they need. (See the bolt diagram.)
Tuple：the lowest-level unit of message passing. A Tuple is an object holding a List of values.
Stream：a continuous sequence of Tuples (representing the flow of data).
Zookeeper provides the coordination service between the Supervisors and Nimbus. The real-time logic of an application is encapsulated in a Storm "topology". A topology is a graph of spouts (data sources) and bolts (data operations) connected by stream groupings.
A spout reads data from a source and emits it into the topology. Spouts are either reliable or unreliable: when Storm fails to process a tuple (a named list of data items), a reliable spout will re-emit that tuple, while an unreliable spout emits each tuple only once and does not care whether it was processed successfully.
All processing in a topology is done by bolts. A bolt receives data from spouts and processes it; for complex stream processing, a bolt may also send tuples to another bolt for further processing.
A stream grouping defines how a stream is partitioned among a bolt's tasks.
Shuffle grouping：tuples are randomly distributed across the bolt's tasks, so that each task receives a roughly equal number of tuples.
Fields grouping：the stream is partitioned by the specified fields. For example, when grouping by the "user-id" field, tuples with the same "user-id" are always sent to the same task, while tuples with different "user-id" values may go to different tasks.
Partial Key grouping：the stream is partitioned by the specified fields, similar to a fields grouping, but load is balanced between downstream tasks.
All grouping：each tuple is replicated to all of the bolt's tasks. This grouping should be used with care.
Global grouping：the entire stream goes to a single one of the bolt's tasks, specifically the task with the lowest ID.
None grouping：you don't care how the stream is grouped. Currently, a none grouping is equivalent to a shuffle grouping. Eventually, however, Storm will push bolts with none groupings to execute in the same thread as the bolt or spout they subscribe to (when possible).
Direct grouping：a special kind of grouping, in which the producer of a tuple decides which task of the consumer receives it.
Local or shuffle grouping：if the target bolt has one or more tasks in the same worker process, tuples are shuffled among only those in-process tasks. Otherwise, this behaves like a normal shuffle grouping.
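The fields-grouping behavior in the list above can be illustrated with a stable hash. This is a sketch of the routing idea, not Storm's actual internals; the function name `fields_grouping_task` is hypothetical:

```python
import zlib

def fields_grouping_task(tup, fields, num_tasks):
    """Pick the destination task from a stable hash of the grouping fields,
    so tuples with equal field values always land on the same task."""
    key = "|".join(str(tup[f]) for f in fields)
    return zlib.crc32(key.encode("utf-8")) % num_tasks

t1 = {"user-id": 42, "action": "click"}
t2 = {"user-id": 42, "action": "buy"}
t3 = {"user-id": 7,  "action": "click"}

# Same "user-id" -> same task, regardless of the other fields.
assert fields_grouping_task(t1, ["user-id"], 4) == fields_grouping_task(t2, ["user-id"], 4)
```

A shuffle grouping, by contrast, would pick the task randomly (or round-robin) with no regard to the tuple's contents, which spreads load evenly but loses the per-key locality that a fields grouping provides.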
Based on the above understanding of Storm, we decided to divide the analysis work by Storm's major components and study the relevant code. The core code lives mainly in the storm-core, storm-client, and storm-server packages. I am mainly responsible for analyzing the zookeeper and worker code, and we will work together on the analysis of stream groupings.
Reference: "Apache Storm Introduction" https://blog.csdn.net/C_FuL/article/details/78497237
Copyright: null_wry. Please include the original link when reprinting: https://en.javamana.com/2022/02/202202130759239814.html