Storm source code analysis

null_ wry 2022-02-13 07:59:25

Storm Source Code Analysis (Part 1): Overview

I. Introduction to Storm

Storm is an open-source, distributed, reliable, fault-tolerant system for processing streams of data. It has many use cases, such as real-time analytics, online machine learning, continuous computation, distributed RPC, and ETL. Storm scales horizontally, is highly fault tolerant, guarantees that every message is processed, and is very fast (in a small cluster, each node can process on the order of a million messages per second). Storm is easy to deploy and operate and, more importantly, applications can be developed in any programming language.

The input stream of a Storm cluster is managed by a component called a spout. The spout passes data to a bolt, and the bolt either saves the data to some kind of storage or passes it on to other bolts. In effect, a Storm cluster is a chain of bolts transforming the data that the spouts bring in.
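The spout-to-bolt flow described above can be sketched in a few lines of plain Python. This is only a toy model that runs in a single process, not Storm code: the names `sentence_spout`, `split_bolt`, and `count_bolt` are made up for illustration, and an in-memory `Counter` stands in for "some kind of storage".

```python
from collections import Counter
from typing import Iterable, Iterator, Tuple


def sentence_spout(lines: Iterable[str]) -> Iterator[Tuple[str]]:
    """Spout: reads from an external source and emits one tuple per line."""
    for line in lines:
        yield (line,)


def split_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str]]:
    """Bolt: processes each tuple and passes the results on to the next bolt."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def count_bolt(stream: Iterable[Tuple[str]]) -> Counter:
    """Final bolt: saves its results (here, in an in-memory Counter)."""
    counts: Counter = Counter()
    for (word,) in stream:
        counts[word] += 1
    return counts


# Hypothetical input data standing in for an external source.
source = ["storm is fast", "storm is reliable"]
word_counts = count_bolt(split_bolt(sentence_spout(source)))
print(dict(word_counts))
```

In a real cluster each of these stages would run as many parallel tasks on different machines; chaining generators only shows the direction in which the data moves.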

II. Storm Components

A Storm cluster has two types of nodes: the master node and the worker nodes.

The master node runs the Nimbus daemon, which is responsible for distributing code around the cluster, assigning tasks to worker nodes, and monitoring for failures. Each worker node runs a Supervisor daemon, which starts and stops worker processes according to what Nimbus has assigned to it. A Storm topology runs across many worker processes on different machines, and each worker process executes a subset of the topology. Coordination between Nimbus and the Supervisors goes through a ZooKeeper cluster.

Topology: the name for a real-time application running in Storm.
Spout: the component in a topology that obtains the source data stream. A spout usually reads data from an external data source and converts it into the topology's internal source data. See the spout diagram.
Bolt: a component that receives data and then processes it; users can implement whatever logic they need. See the bolt diagram.
Tuple: the basic unit of message passing. It is a Tuple object that holds a list of values.
Stream: a continuous sequence of tuples, representing the flow of data.
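To make the Tuple and Stream definitions more concrete, here is a minimal sketch in Python. `SketchTuple`, `get_by_field`, and `click_stream` are hypothetical names chosen for this example and are not Storm classes; the only assumption carried over from Storm is that a tuple is a list of values whose positions have field names.

```python
import itertools
from dataclasses import dataclass
from typing import Any, Iterator, List


@dataclass(frozen=True)
class SketchTuple:
    """A tuple: a list of values, each position labelled with a field name."""
    fields: List[str]
    values: List[Any]

    def get_by_field(self, name: str) -> Any:
        # Look up a value by the name of its field.
        return self.values[self.fields.index(name)]


def click_stream() -> Iterator[SketchTuple]:
    """A stream: tuples keep arriving for as long as someone reads them."""
    for n in itertools.count():
        yield SketchTuple(["user-id", "page"], [f"user-{n % 3}", f"/page/{n}"])


# Read only the first four tuples of the otherwise endless stream.
first_four = list(itertools.islice(click_stream(), 4))
for t in first_four:
    print(t.get_by_field("user-id"), t.get_by_field("page"))
```

The generator never finishes by itself, which mirrors the point that a stream has no fixed end and a topology keeps processing it until it is stopped.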


ZooKeeper provides the coordination service between the Supervisors and Nimbus. The real-time logic of an application is packaged into a Storm "topology". A topology is a graph of spouts (data sources) and bolts (data operations) connected by stream groupings.
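One way to picture a topology as a graph is a plain mapping from each component to the upstream components it subscribes to. The sketch below is only an illustration: the component names, the grouping labels, and the helpers `spouts` and `downstream_of` are all invented for this example and do not correspond to Storm's API.

```python
from typing import Dict, List, Tuple

# component name -> list of (upstream component, grouping label) subscriptions
Topology = Dict[str, List[Tuple[str, str]]]

topology: Topology = {
    "sentence-spout": [],                          # a data source has no inputs
    "split-bolt": [("sentence-spout", "shuffle")],
    "count-bolt": [("split-bolt", "fields:word")],
}


def spouts(topo: Topology) -> List[str]:
    """Components without inputs are the data sources of the graph."""
    return sorted(name for name, inputs in topo.items() if not inputs)


def downstream_of(topo: Topology, component: str) -> List[str]:
    """Components that subscribe to the given component's output stream."""
    return sorted(
        name
        for name, inputs in topo.items()
        if any(source == component for source, _ in inputs)
    )


print(spouts(topology))
print(downstream_of(topology, "split-bolt"))
```

Each edge carries a grouping label because the grouping decides how tuples on that edge are split among the tasks of the receiving bolt.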


A spout reads data from a source and emits it into the topology. Spouts are either reliable or unreliable. When Storm fails to process a tuple (a list of data items), a reliable spout emits it again, whereas an unreliable spout emits each tuple only once and does not care whether it was received successfully.
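The difference between the two kinds of spout comes down to whether emitted tuples are remembered until they are acknowledged. The following sketch assumes a simple in-memory queue as the source. The classes are hypothetical; the method names `next_tuple`, `ack`, and `fail` only loosely mirror the `nextTuple`/`ack`/`fail` callbacks of Storm's spout interface.

```python
from collections import deque
from typing import Any, Deque, Dict, Iterable, Optional, Tuple


class ReliableSpoutSketch:
    """Remembers every emitted tuple until it is acked, and replays failures."""

    def __init__(self, messages: Iterable[Any]):
        self.queue: Deque[Tuple[int, Any]] = deque(enumerate(messages))
        self.pending: Dict[int, Any] = {}  # msg_id -> payload awaiting ack

    def next_tuple(self) -> Optional[Tuple[int, Any]]:
        if not self.queue:
            return None
        msg_id, payload = self.queue.popleft()
        self.pending[msg_id] = payload
        return msg_id, payload

    def ack(self, msg_id: int) -> None:
        # Processing succeeded: the tuple no longer needs to be remembered.
        self.pending.pop(msg_id, None)

    def fail(self, msg_id: int) -> None:
        # Processing failed: put the tuple back so it is emitted again.
        if msg_id in self.pending:
            self.queue.append((msg_id, self.pending.pop(msg_id)))


class UnreliableSpoutSketch:
    """Emits each tuple once and ignores whether it was processed."""

    def __init__(self, messages: Iterable[Any]):
        self.queue: Deque[Any] = deque(messages)

    def next_tuple(self) -> Optional[Any]:
        return self.queue.popleft() if self.queue else None


spout = ReliableSpoutSketch(["a", "b"])
first = spout.next_tuple()     # emits message 0
spout.fail(first[0])           # downstream processing failed
second = spout.next_tuple()    # emits message 1
spout.ack(second[0])
replayed = spout.next_tuple()  # message 0 comes around again
```

The replay in this sketch simply appends the failed tuple to the end of the queue; a real spout would normally ask its external source to redeliver the message instead.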


All processing in a topology is done by bolts. A bolt receives data from a spout and processes it; for complex stream processing, it may also send tuples to another bolt for further processing.

Stream Groupings

A stream grouping defines how a stream is partitioned among a bolt's tasks.

  1. Shuffle grouping: tuples are distributed randomly across the bolt's tasks in such a way that each task gets an equal number of tuples.

  2. Fields grouping: the stream is partitioned by the specified fields. For example, if the stream is grouped by the "user-id" field, tuples with the same "user-id" always go to the same task, while tuples with different "user-id" values may go to different tasks.

  3. Partial Key grouping: the stream is partitioned by the specified fields, as in fields grouping, but the tuples are load balanced between two downstream tasks, which makes better use of resources when the incoming data is skewed.

  4. All grouping: every tuple is replicated to all of the bolt's tasks. Use this grouping with care.

  5. Global grouping: the entire stream goes to a single one of the bolt's tasks, specifically the task with the lowest ID.

  6. None grouping: this grouping says that you do not care how the stream is grouped. Currently, none grouping is equivalent to shuffle grouping. Eventually, though, Storm will push bolts with none groupings down to execute in the same thread as the bolt or spout they subscribe to (when possible).

  7. Direct grouping: this is a special kind of grouping. The producer of a tuple decides which task of the consumer receives it.

  8. Local or shuffle grouping: if the target bolt has one or more tasks in the same worker process, tuples are shuffled among just those in-process tasks. Otherwise, this behaves like a normal shuffle grouping.
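Several of the groupings above boil down to a function that picks one or more target task IDs for each tuple. The sketch below shows how such a choice could be made. It is my own simplification and not Storm's implementation: the function and class names are hypothetical, and the CRC32 hash is used only because it gives the same result on every run.

```python
import random
import zlib
from collections import Counter
from typing import List, Sequence, Set


class ShuffleGroupingSketch:
    """Walks a shuffled list of tasks round-robin, so counts stay even."""

    def __init__(self, task_ids: Sequence[int], seed: int = 0):
        self.order = list(task_ids)
        random.Random(seed).shuffle(self.order)
        self.position = 0

    def choose(self) -> List[int]:
        task = self.order[self.position]
        self.position = (self.position + 1) % len(self.order)
        return [task]


def fields_grouping(key_values: Sequence[object], task_ids: Sequence[int]) -> List[int]:
    """Equal key values always hash to the same task."""
    digest = zlib.crc32(repr(tuple(key_values)).encode("utf-8"))
    return [task_ids[digest % len(task_ids)]]


def all_grouping(task_ids: Sequence[int]) -> List[int]:
    """Every task receives a copy of the tuple."""
    return list(task_ids)


def global_grouping(task_ids: Sequence[int]) -> List[int]:
    """The whole stream goes to the task with the lowest ID."""
    return [min(task_ids)]


def local_or_shuffle_grouping(
    task_ids: Sequence[int], local_task_ids: Set[int], rng: random.Random
) -> List[int]:
    """Prefer tasks in the same worker process; otherwise pick among all."""
    local = [t for t in task_ids if t in local_task_ids]
    return [rng.choice(local if local else list(task_ids))]


tasks = [3, 5, 7, 9]  # hypothetical task IDs of the receiving bolt
shuffle = ShuffleGroupingSketch(tasks)
shuffle_counts = Counter(shuffle.choose()[0] for _ in range(8))
print(shuffle_counts)
print(fields_grouping(["alice"], tasks), global_grouping(tasks))
```

Direct grouping has no chooser here because the producer names the target task itself, and none grouping currently behaves like the shuffle case.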

III. Task Assignment

Based on the understanding of Storm described above, we decided to divide the code analysis along Storm's major components. The core code is mainly in the storm-core, storm-client, and storm-server packages, among others. I am mainly responsible for analyzing the ZooKeeper and worker code, and we will work on the stream groupings analysis together.

Reference: A brief introduction to Apache Storm

Copyright: author [null_ wry]. Please include the original link when reprinting. Thank you.