A 10,000-word deep dive into the Java virtual machine

Java program ape 2022-06-24 07:17:25


Preface

According to the Java Virtual Machine Specification, the Class file format stores data in a pseudo-structure similar to a C language struct. This pseudo-structure contains only two kinds of data: unsigned numbers and tables.

Bytecode instructions

Java bytecode instructions are the instructions that the Java virtual machine can understand and execute. They can be regarded as JVM-level assembly language, or as the smallest execution unit of Java code.

The javac command compiles Java source files into bytecode files, that is, .class files, which contain a large number of bytecode instructions.

The Java virtual machine uses an architecture oriented to the operand stack rather than to registers (the execution process of the two architectures, their differences and their impact are discussed in Chapter 8). As a result, most instructions carry no operands and consist of an opcode only; their arguments are taken from the operand stack.
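
To make the operand-stack architecture concrete, here is a minimal sketch. The class name StackMachineDemo is only an illustration; the instruction listing in the comment is what javac typically emits for such a method and can be checked with javap -c.

```java
class StackMachineDemo {
    // Typical bytecode for the method body (none of these instructions
    // carries an explicit operand; they all work on the operand stack):
    //   iload_0   // push local variable 0 (a) onto the operand stack
    //   iload_1   // push local variable 1 (b)
    //   iadd      // pop two ints, push their sum
    //   ireturn   // pop the int on top and hand it back to the caller
    static int add(int a, int b) {
        return a + b;
    }

    public static void main(String[] args) {
        System.out.println(add(100, 200));
    }
}
```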


Categories of bytecode instructions:

Load and store instructions: mainly the load series, the store series, and the ldc and push series. They move data between the local variable table, the operand stack and the constant pool. (The constant pool needs no special explanation: as the name suggests, it is a pool that holds all kinds of constants, much like a box of props.)

Object operation instructions (creation and read/write access): putfield and getfield are read/write access instructions, as are putstatic and getstatic. This category also includes the new series of instructions and instructions such as instanceof.

Operand stack management instructions: such as pop and dup, which operate only on the operand stack.

Type conversion and arithmetic instructions: such as add, div and l2i. These instructions also operate only on the operand stack.

Control transfer instructions: this category contains the commonly used if series and the goto instructions.

Method invocation and return instructions: mainly the invoke series and the return series. These instructions also mark the opening and closing of a method's own space: invoke brings a new Java method universe into being (a new operand stack and local variable table), and return means the end of that universe.

Public design, private implementation

There are two main ways to implement a virtual machine:

· Translate the input Java virtual machine code into the instruction set of another virtual machine at load time or execution time;

· Translate the input Java virtual machine code into the native instruction set of the host processor at load time or execution time (that is, just-in-time compiler code generation).

Precisely defined virtual machine behavior and a precisely defined object file format should not unduly restrict the creativity of virtual machine implementers. The Java virtual machine is designed to allow many different implementations, each of which can offer new and interesting solutions.

Changes to the Class file

The Class file format is platform neutral (independent of any specific hardware or operating system), compact, stable and extensible. These qualities are an important pillar of two characteristics of the Java technology system: platform independence and language independence.

The Class file is the data entry point of the Java virtual machine's execution engine, and one of the fundamental pillars of the Java technology system.

Virtual machine class loading mechanism

The Java virtual machine loads the data that describes a class from the Class file into memory, then verifies, converts, resolves and initializes that data, finally producing a Java type that the virtual machine can use directly. This process is called the class loading mechanism of the virtual machine.

The life cycle of a class

Unlike languages that link at compile time, Java loads, links and initializes types while the program is running. For example, an application written against an interface can wait until run time to specify its actual implementation class, and through Java's built-in or user-defined class loaders a local application can load a binary stream from the network or elsewhere at run time as part of its program code. Run-time loading is widely used in Java programs.

The Java Virtual Machine Specification strictly stipulates that there are exactly six cases in which a class must be "initialized" immediately (loading, verification and preparation naturally have to start before that):

3) When a class is initialized and its parent class has not yet been initialized, the initialization of the parent class must be triggered first.

4) When the virtual machine starts, the user must specify a main class to execute (the class that contains the main() method), and the virtual machine initializes this main class first.

Interfaces really differ from classes in the third of these six scenarios: when a class is initialized, all of its parent classes must already have been initialized, but when an interface is initialized, its parent interfaces are not all required to be initialized. A parent interface is initialized only when it is actually used (for example, when a constant defined in it is referenced).
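
The difference between classes and interfaces in scenario 3 can be observed with a small sketch. All names here (InitOrderDemo, the EVENTS list, the log helper) are made up for illustration; the interface fields use a method call as initializer so that they are not compile-time constants and accessing them really triggers initialization.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {
    // Hypothetical helper state: records which types ran their initializer.
    static final List<String> EVENTS = new ArrayList<>();

    static int log(String event) {
        EVENTS.add(event);
        return EVENTS.size();
    }

    static class SuperClass {
        static { log("SuperClass init"); }
    }

    static class SubClass extends SuperClass {
        static { log("SubClass init"); }
    }

    interface ParentInterface {
        int X = log("ParentInterface init");
    }

    interface ChildInterface extends ParentInterface {
        int Y = log("ChildInterface init");
    }

    static List<String> run() {
        new SubClass();            // class: the parent class is initialized first
        int y = ChildInterface.Y;  // interface: the parent interface is left alone
        return new ArrayList<>(EVENTS);
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

ParentInterface never appears in the recorded events because nothing in the sketch uses a field declared in it.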

Class loading process

Loading

1) Obtain the binary byte stream that defines the class by its fully qualified name.

2) Convert the static storage structure represented by the byte stream into the run-time data structure of the method area.

3) Generate a java.lang.Class object in memory that represents the class and serves as the access point to the class's data in the method area.

Sources of the binary byte stream of a class

· Read from a ZIP archive. This is very common, and eventually became the basis of the JAR, EAR and WAR formats.

· Obtained from the network. The most typical application of this scenario is the Web Applet.

· Generated by computation at run time. The most common use is dynamic proxy technology: java.lang.reflect.Proxy uses ProxyGenerator.generateProxyClass() to generate the binary byte stream of a proxy class of the form "*$Proxy" for specific interfaces.

· Generated from other files. The typical scenario is JSP, where the corresponding Class file is generated from a JSP file.

· Read from a database. This scenario is relatively rare; for example, some middleware servers (such as SAP Netweaver) can install programs into a database to distribute program code across a cluster.

· Obtained from an encrypted file. This is a typical protection of Class files against decompilation: the Class file is decrypted at load time so that the program's running logic cannot be snooped on.
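
As an illustration of the "generated at run time" source, the sketch below asks java.lang.reflect.Proxy for a proxy of a small interface. The Greeter interface and the handler are invented for this example; the exact name of the generated class is an implementation detail, so the code prints it rather than relying on it.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

class ProxyDemo {
    // Hypothetical interface used only for this sketch.
    interface Greeter {
        String greet(String name);
    }

    static Greeter newGreeter() {
        // Every call on the proxy is routed to this handler.
        InvocationHandler handler = (proxy, method, methodArgs) -> "hello, " + methodArgs[0];
        return (Greeter) Proxy.newProxyInstance(
                Greeter.class.getClassLoader(),
                new Class<?>[] { Greeter.class },
                handler);
    }

    public static void main(String[] args) {
        Greeter greeter = newGreeter();
        // The class behind the proxy has no Class file on disk; its bytes
        // were generated in memory when the proxy was requested.
        System.out.println(greeter.getClass().getName());
        System.out.println(greeter.greet("JVM"));
    }
}
```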

Verification

The verification phase broadly performs four stages of checks: file format verification, metadata verification, bytecode verification and symbolic reference verification.

Keep the Halting Problem in mind here: we cannot use a program to check whether a program will finish running in a finite amount of time.

Preparation

The preparation phase is formally the stage in which memory is allocated for the variables defined in a class (that is, static variables, those modified by static) and the initial values of these class variables are set.

First, memory allocation at this point covers only class variables, not instance variables; instance variables are allocated in the Java heap. Second, the initial value is "normally" the zero value of the data type. Suppose a class variable is defined as:

public static int value=123;

The initial value of the variable value after the preparation phase is 0, not 123, because no Java method has started executing at this point. The putstatic instruction that assigns 123 to value is generated when the program is compiled and stored in the class constructor <clinit>() method, so the assignment of 123 is not performed until the class initialization phase. Table 7-1 lists the zero values of all basic data types in Java.
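
The zero value set during preparation can be made visible with a small trick, sketched below. The names PreparationDemo, peek and readValue are illustrative only: because static initializers run in source order, peek is computed before the assignment of 123 has executed.

```java
class PreparationDemo {
    // Runs first inside <clinit>(): at this moment value still holds the
    // zero value it received in the preparation phase.
    static int peek = readValue();

    // Runs second inside <clinit>(): only now does value become 123.
    static int value = 123;

    static int readValue() {
        return value;
    }

    public static void main(String[] args) {
        System.out.println("peek = " + peek + ", value = " + value);
    }
}
```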

Resolution

The resolution phase is the process in which the Java virtual machine replaces symbolic references in the constant pool with direct references.

Symbolic references are independent of the memory layout of the virtual machine implementation, and the target of a reference need not already be loaded into the virtual machine's memory. A direct reference is a pointer that points directly to the target, a relative offset, or a handle that can locate the target indirectly. Direct references are directly tied to the memory layout of the virtual machine implementation.

1. Class or interface resolution

It is first necessary to determine whether the class is an array type.

If we say that a class D has access to a class C, at least one of the following three rules holds:

· The accessed class C is public and is in the same module as the accessing class D.

· The accessed class C is public and is not in the same module as the accessing class D, but the module of C permits access from the module of D.

· The accessed class C is not public, but is in the same package as the accessing class D.

2. Field resolution

First, the CONSTANT_Class_info symbolic reference indexed by the class_index item of the field table is resolved, that is, the symbolic reference to the class or interface to which the field belongs.

3. Method resolution

First, the symbolic reference to the class or interface to which the method belongs, indexed by the class_index item of the method table, is resolved. If resolution succeeds, we again use C to denote this class.

1) Because the Class file format defines separate constant types for symbolic references to class methods and to interface methods, if the C indexed by class_index in the method table of a class turns out to be an interface, java.lang.IncompatibleClassChangeError is thrown immediately.

2) If the first step passes, look in class C for a method whose simple name and descriptor both match the target. If there is one, return a direct reference to this method and the lookup ends.

3) Otherwise, recursively look in the parent classes of class C for a method whose simple name and descriptor both match the target. If there is one, return a direct reference to this method and the lookup ends.

4) Otherwise, recursively look in the list of interfaces implemented by class C and their parent interfaces for a method whose simple name and descriptor both match the target. If there is a match, class C is an abstract class; the lookup ends and java.lang.AbstractMethodError is thrown.

5) Otherwise, the method lookup is declared to have failed and java.lang.NoSuchMethodError is thrown.

Finally, if the lookup successfully returns a direct reference, access permission for this method is verified; if access is not permitted, java.lang.IllegalAccessError is thrown.

4. Interface method resolution

This is similar to method resolution.

JDK 9 added private static methods to interfaces and introduced modular access constraints, so from JDK 9 onward, access to interface methods may also throw java.lang.IllegalAccessError because of access control.

Initialization

The initialization phase is the process of executing the class constructor <clinit>() method.

· The <clinit>() method is generated by the compiler, which automatically collects the assignment actions of all class variables in the class and the statements in static statement blocks (static{} blocks) and merges them. The order of collection is determined by the order in which the statements appear in the source file. A static statement block can only access variables defined before it; variables defined after it can be assigned in the earlier static statement block, but cannot be accessed.

· The <clinit>() method differs from the constructor of a class (that is, the instance constructor <init>() method from the virtual machine's point of view): it does not need to call the parent class constructor explicitly. The Java virtual machine guarantees that the parent class's <clinit>() method has finished executing before the subclass's <clinit>() method executes. Therefore the first <clinit>() method to be executed in the Java virtual machine must belong to java.lang.Object.

· Because the parent class's <clinit>() method executes first, static statement blocks defined in the parent class take precedence over the variable assignment operations of the subclass.
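
The ordering rules above can be checked with a short sketch. The class names are illustrative; the commented-out line marks the access that javac rejects as an illegal forward reference.

```java
class ClinitOrderDemo {
    static class Parent {
        static int A = 1;
        static {
            A = 2;   // runs as part of Parent's <clinit>()
        }
    }

    static class Sub extends Parent {
        // Sub's <clinit>() runs after Parent's, so A is already 2 here.
        static int B = A;
    }

    static class ForwardReference {
        static {
            i = 0;                      // assigning a variable defined later is allowed
            // System.out.println(i);   // reading it here does not compile
        }
        static int i = 1;               // collected after the block, so it wins
    }

    public static void main(String[] args) {
        System.out.println(Sub.B);
        System.out.println(ForwardReference.i);
    }
}
```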

Class loader

For any class, its uniqueness in the Java virtual machine is established jointly by the class loader that loads it and the class itself. Every class loader has its own independent class namespace.

Comparing whether two classes are "equal" is meaningful only if both were loaded by the same class loader. Otherwise, even if the two classes come from the same Class file and are loaded by the same Java virtual machine, they are necessarily unequal as long as their class loaders differ.

"Equal" here covers the results returned by the equals(), isAssignableFrom() and isInstance() methods of the Class object that represents the class, as well as checks of object ownership made with the instanceof keyword.

From the perspective of the Java virtual machine there are only two kinds of class loader:

One: the bootstrap class loader (Bootstrap ClassLoader), implemented in C++.

Two: all other class loaders, implemented in Java, all of which inherit from the abstract class java.lang.ClassLoader.

From a developer's perspective:

Bootstrap class loader (Bootstrap ClassLoader)

Role: it loads into the virtual machine the class libraries that are stored in the <JAVA_HOME>\lib directory, or in the path specified by the -Xbootclasspath parameter, and that the virtual machine recognizes (recognition is by file name only, for example rt.jar; a library whose name does not match is not loaded even if it is placed in the lib directory). The bootstrap class loader cannot be referenced directly by a Java program. When writing a custom class loader, if a load request needs to be delegated to the bootstrap class loader, simply use null instead.

Extension class loader (Extension ClassLoader)

Application class loader (Application ClassLoader)

This class loader is implemented by sun.misc.Launcher$AppClassLoader. Because it is the return value of the getSystemClassLoader() method of ClassLoader, it is also called the system class loader. It loads the class libraries specified on the user class path (ClassPath), and developers can use it directly. If an application has not defined its own class loader, this is generally the program's default class loader.
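
A quick way to see these loaders is to walk the parent chain from the system class loader, as sketched below. LoaderHierarchyDemo and its helper methods are illustrative names; the implementation class names printed depend on the JDK version in use, so the sketch prints them instead of assuming them.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderHierarchyDemo {
    // The bootstrap class loader is represented by null in Java code.
    static String describe(ClassLoader loader) {
        return loader == null ? "bootstrap (null)" : loader.getClass().getName();
    }

    // Walks from the system (application) class loader up to the bootstrap loader.
    static List<String> chain() {
        List<String> names = new ArrayList<>();
        ClassLoader loader = ClassLoader.getSystemClassLoader();
        while (loader != null) {
            names.add(describe(loader));
            loader = loader.getParent();
        }
        names.add(describe(null));
        return names;
    }

    public static void main(String[] args) {
        // Core classes such as String report null as their loader.
        System.out.println(describe(String.class.getClassLoader()));
        chain().forEach(System.out::println);
    }
}
```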

Parent delegation model

Workflow of the parent delegation model:

When a class loader receives a request to load a class, it does not first try to load the class itself. Instead it delegates the request to its parent loader, and this is true of class loaders at every level, so all load requests should eventually reach the top-level bootstrap class loader. Only when the parent loader reports that it cannot complete the request (the class is not found within its search scope) does the child loader try to load the class itself.

Advantage: a Java class acquires a prioritized hierarchy along with its class loader.

For example, take java.lang.Object, which is stored in rt.jar. Whichever class loader is asked to load this class, the request is ultimately delegated to the bootstrap class loader at the top of the model, so Object is the same class in every class loader environment of the program (see the discussion above of how to compare two classes for "equality"). Conversely, without the parent delegation model, if each class loader loaded the class by itself, there would be several different Object classes and the application would fall into chaos.

Implementation of parent delegation model

The parent delegation model requires every class loader except the top-level bootstrap class loader to have its own parent class loader. However, this parent-child relationship between class loaders is generally not implemented through inheritance (Inheritance); composition (Composition) is usually used to reuse the code of the parent loader.
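
The delegation order can be observed without defining any classes. In the sketch below, RecordingLoader is a hypothetical loader that only records which names reach findClass(), that is, which requests its parents could not satisfy; it relies on the default loadClass() of java.lang.ClassLoader for the delegation itself.

```java
import java.util.ArrayList;
import java.util.List;

class DelegationDemo {
    static class RecordingLoader extends ClassLoader {
        final List<String> asked = new ArrayList<>();

        RecordingLoader(ClassLoader parent) {
            super(parent); // the parent is a held reference (composition), not a superclass
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            asked.add(name);                        // reached only after the parents failed
            throw new ClassNotFoundException(name); // this sketch cannot define classes
        }
    }

    static List<String> run() {
        RecordingLoader loader = new RecordingLoader(ClassLoader.getSystemClassLoader());
        try {
            // Satisfied by the parent chain; findClass() is never consulted.
            Class<?> c = loader.loadClass("java.lang.String");
            if (c != String.class) {
                throw new IllegalStateException("unexpected class: " + c);
            }
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
        try {
            loader.loadClass("no.such.Clazz"); // no parent knows it, so the child is asked
        } catch (ClassNotFoundException expected) {
            // expected: nobody in the chain can load this name
        }
        return loader.asked;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```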

Breaking the parent delegation model

The first break: the parent delegation model was introduced in JDK 1.2. Faced with user-defined class loader code that already existed, and in order to stay compatible with it, there was no longer any technical means of preventing loadClass() from being overridden by subclasses. The only option was to add a new protected method findClass() to java.lang.ClassLoader from JDK 1.2 onward, and to guide users to override this method when writing class loading logic rather than putting code in loadClass().

The second break is caused by a defect of the model itself: a basic type may need to call back into user code.

The third break comes from users' pursuit of program dynamism. "Dynamism" here refers to some very popular terms: code hot swap (Hot Swap), module hot deployment (Hot Deployment), and so on.

The key to OSGi's modular hot deployment is its custom class loader mechanism. Every program module (called a Bundle in OSGi) has its own class loader; when a Bundle needs to be replaced, the Bundle is replaced together with its class loader to achieve hot replacement of the code. In an OSGi environment, class loaders no longer form the tree structure recommended by the parent delegation model but develop further into a more complex network structure.

Java module system

The Java Platform Module System (JPMS) introduced in JDK 9 is an important upgrade to Java technology. To achieve the key goal of modularity, a configurable encapsulation and isolation mechanism, the Java virtual machine also adjusted its class loading architecture so that the module system can run smoothly.

· Access rules for JAR files on the class path: all JAR files and other resource files on the class path are treated as if automatically packaged into an unnamed module (Unnamed Module). This unnamed module has almost no isolation: it can see and use all packages on the class path, all exported packages of the JDK system modules, and all packages exported by modules on the module path.

· Access rules for modules on the module path: a named module (Named Module) on the module path can access only the modules and packages listed in its dependency definitions. Everything in the unnamed module is invisible to named modules; in other words, a named module cannot see the contents of traditional JAR packages.

· Access rules for JAR files on the module path: if a traditional JAR file that contains no module definition is placed on the module path, it becomes an automatic module (Automatic Module). Although it does not contain module-info.class, an automatic module by default depends on all modules on the module path, so it can access the packages exported by all of them; it also exports all of its own packages by default.

From JDK 9 onward, the extension class loader (Extension Class Loader) is replaced by the platform class loader (Platform ClassLoader).

When the platform or application class loader receives a class load request, before delegating to its parent loader it must first determine whether the class belongs to a system module. If such an ownership relationship is found, the request is delegated first to the loader responsible for that module. This can perhaps be counted as the fourth break of the parent delegation model.

Virtual machine execution engine

The Java virtual machine uses the method as its most basic execution unit. The stack frame (Stack Frame) is the data structure behind the virtual machine's method invocation and method execution, and it is also the element of the virtual machine stack (Virtual Machine Stack) in the virtual machine's run-time data area.

A stack frame stores the method's local variable table, operand stack, dynamic link, method return address and other information.

The process of each method from the start of its call to the end of its execution corresponds to a stack frame being pushed onto and popped off the virtual machine stack.

Local variable table

The local variable table (Local Variables Table) is the storage space for a set of variable values, used to hold method parameters and local variables defined inside the method. When a Java program is compiled into a Class file, the max_locals data item of the method's Code attribute determines the maximum capacity of the local variable table that the method needs.

A variable slot can hold one data item of up to 32 bits. The Java data types that occupy no more than 32 bits of storage are boolean, byte, char, short, int, float, reference and returnAddress, eight types in all.

The seventh type, reference, represents a reference to an object instance. A virtual machine implementation should be able to do at least two things through this reference: first, find from it, directly or indirectly, the start address or index of the object's data in the Java heap; second, find from it, directly or indirectly, the type information stored in the method area for the object's data type. Otherwise the syntax conventions defined in the Java Language Specification could not be implemented.

When a method is called, the Java virtual machine uses the local variable table to pass argument values to the parameter variable list, that is, to pass actual parameters to formal parameters. If an instance method is executed (a method not modified by static), the variable slot at index 0 of the local variable table is by default used to pass a reference to the object instance to which the method belongs, and the method can access this implicit parameter through the keyword "this".
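
The slot layout described above shows up directly in the bytecode. The following sketch uses an invented class LocalSlotDemo; the instruction sequences in the comments are what javac typically produces and can be confirmed with javap -c.

```java
class LocalSlotDemo {
    private final int base;

    LocalSlotDemo(int base) {
        this.base = base;
    }

    // Instance method: slot 0 holds `this`, slot 1 holds delta.
    //   aload_0; getfield base; iload_1; iadd; ireturn
    int plus(int delta) {
        return this.base + delta;
    }

    // Static method: there is no receiver, so the parameters start at slot 0.
    //   iload_0; iload_1; iadd; ireturn
    static int plus(int base, int delta) {
        return base + delta;
    }

    public static void main(String[] args) {
        System.out.println(new LocalSlotDemo(40).plus(2));
        System.out.println(plus(40, 2));
    }
}
```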

Operand stack

The operand stack (Operand Stack), often called the operation stack, is a last-in, first-out (LIFO) stack. Like the capacity of the local variable table, the maximum depth of the operand stack is written at compile time into the max_stack data item of the Code attribute.

The interpreting execution engine of the Java virtual machine is called a "stack-based execution engine"; the "stack" in that name is the operand stack.

Dynamic linking

Each stack frame contains a reference to the method it belongs to in the run-time constant pool. This reference is held to support dynamic linking (Dynamic Linking) during method invocation.

The constant pool of a Class file contains a large number of symbolic references, and method invocation instructions in bytecode take symbolic references to methods in the constant pool as arguments. Some of these symbolic references are converted to direct references during class loading or on first use; this conversion is called static resolution. The rest are converted to direct references during each run; this part is called dynamic linking.

Method return address

Once a method starts executing, there are only two ways to exit it.

The first is that the execution engine encounters the bytecode instruction of any method return. There may then be a return value passed to the upper-level method caller (the method that calls the current method is called the caller or calling method); whether the method has a return value, and its type, is determined by which return instruction is encountered. This way of exiting is called "normal method invocation completion" (Normal Method Invocation Completion).

The other way to exit is that an exception is encountered during method execution and is not properly handled within the method body. This way of exiting is called "abrupt method invocation completion" (Abrupt Method Invocation Completion). A method that exits through abrupt completion does not provide any return value to its upper-level caller.

Whichever way the method exits, execution must return to the position where the method was originally called before the program can continue. When a method returns, some information may need to be saved in the stack frame to help restore the execution state of its upper-level calling method.

Generally, when a method exits normally, the value of the calling method's PC counter can serve as the return address, and this counter value is likely to be stored in the stack frame. When a method exits abnormally, the return address is determined by the exception handler table, and this information is generally not stored in the stack frame.

In general, dynamic linking, the method return address and other additional information are grouped together and called stack frame information.

Method call

A method call is not the same as executing the code in the method. The only task of the method call phase is to determine the version of the called method (that is, which method to call); it does not yet involve the actual execution inside the method.

Resolution

The target method of every method call is a symbolic reference in the constant pool of the Class file. During the resolution phase of class loading, those symbolic references to methods that are "knowable at compile time and immutable at run time" are converted to direct references. In other words, the call target is already determined when the program code is written and the compiler compiles it. Calls to such methods are called resolution (Resolution).

Static methods, private methods, instance constructors and parent class methods, four kinds, plus methods modified by final (even though they are called with the invokevirtual instruction): these five kinds of method call have their symbolic references resolved to direct references when the class is loaded. These methods are collectively called "non-virtual methods" (Non-Virtual Method); by contrast, all other methods are called "virtual methods" (Virtual Method).
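
The sketch below puts several of these call kinds side by side. The class names are invented, and the instruction named in each comment is what javac typically emits for that call (verifiable with javap -c); private methods are left out of the sketch.

```java
class InvokeKindsDemo {
    static class Base {
        String name() {
            return "base";
        }
    }

    static class Derived extends Base {
        static String util() {
            return "static";
        }

        final String fixed() {
            return "final";
        }

        @Override
        String name() {
            return "derived/" + super.name(); // parent class method: invokespecial
        }
    }

    static String run() {
        Derived d = new Derived();   // instance constructor <init>: invokespecial
        return Derived.util()        // static method: invokestatic
                + "," + d.fixed()    // final method: invokevirtual, but it can never be overridden
                + "," + d.name();    // ordinary virtual method: invokevirtual, chosen at run time
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```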

Dispatch

1. Static dispatch

The usual English term for this is "Method Overload Resolution"; the word "dispatch" is in itself a dynamic concept.

Human hu = new Man();

In the code above, "Human" is called the static type (Static Type) or apparent type (Apparent Type) of the variable, and the "Man" that follows is called its actual type (Actual Type). Both the static type and the actual type can change in a program. The difference is that a static type changes only at the point of use, the static type of the variable itself is never changed, and the final static type is known at compile time; the result of an actual type change can be determined only at run time, and the compiler does not know what the actual type of an object is when it compiles the program.

Every dispatch action that relies on the static type to decide which version of a method to execute is called static dispatch. The most typical application of static dispatch is method overloading. Static dispatch happens at compile time, so the action of determining the static dispatch is not actually performed by the virtual machine, which is why some sources classify it as "resolution" rather than "dispatch".
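
A minimal sketch of static dispatch follows. The Human, Man and Woman types and the sayHello overloads are illustrative; the point is that the overload is picked from the static type of the argument, and a cast changes the static type at the point of use.

```java
class StaticDispatchDemo {
    static abstract class Human { }
    static class Man extends Human { }
    static class Woman extends Human { }

    static String sayHello(Human guy) { return "hello, guy"; }
    static String sayHello(Man guy) { return "hello, gentleman"; }
    static String sayHello(Woman guy) { return "hello, lady"; }

    static String viaStaticType() {
        Human man = new Man();       // static type Human, actual type Man
        return sayHello(man);        // the compiler selects sayHello(Human)
    }

    static String viaCast() {
        Human man = new Man();
        return sayHello((Man) man);  // the cast changes the static type at this use
    }

    public static void main(String[] args) {
        System.out.println(viaStaticType());
        System.out.println(viaCast());
    }
}
```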

Resolution and dispatch are not mutually exclusive alternatives; they are processes of filtering and determining the target method at different levels. For example, as said before, static methods are resolved during class loading, but static methods can also have overloaded versions, and selecting the overloaded version is done by static dispatch.

Automatic type conversion can happen several times in succession, matching in the order char > int > long > float > double. It will not match overloads taking byte or short, because converting char to byte or short is not safe.

Automatic boxing

After boxing, the value can be converted to a parent type. If there are several parent types, the search proceeds from the bottom of the inheritance hierarchy upward; the higher up a type is, the lower its priority.

It can be seen that variable-length parameters have the lowest overload priority; at that point the character 'a' is treated as an element of a char[] array.

Some automatic conversions that hold for a single parameter, such as char to int, do not hold for variable-length parameters.
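
The priority order can be tried out by removing overloads one at a time. In the sketch below each nested class (hypothetical names A to E) offers one overload fewer than the previous one, and every call passes the same char argument 'a'.

```java
class OverloadPriorityDemo {
    static class A {
        static String pick(int x) { return "int"; }
        static String pick(long x) { return "long"; }
        static String pick(Character x) { return "Character"; }
        static String pick(Object x) { return "Object"; }
        static String pick(char... x) { return "char..."; }
    }

    static class B {
        static String pick(long x) { return "long"; }
        static String pick(Character x) { return "Character"; }
        static String pick(Object x) { return "Object"; }
        static String pick(char... x) { return "char..."; }
    }

    static class C {
        static String pick(Character x) { return "Character"; }
        static String pick(Object x) { return "Object"; }
        static String pick(char... x) { return "char..."; }
    }

    static class D {
        static String pick(Object x) { return "Object"; }
        static String pick(char... x) { return "char..."; }
    }

    static class E {
        static String pick(char... x) { return "char..."; }
    }

    public static void main(String[] args) {
        System.out.println(A.pick('a')); // widening char to int comes first
        System.out.println(B.pick('a')); // then widening to long
        System.out.println(C.pick('a')); // then boxing to Character
        System.out.println(D.pick('a')); // then boxing plus conversion to a parent type
        System.out.println(E.pick('a')); // variable-length parameters come last
    }
}
```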

Dynamic dispatch

Dynamic dispatch is closely tied to another important manifestation of Java language polymorphism: overriding (Override).

According to the Java Virtual Machine Specification, the run-time resolution process of the invokevirtual instruction is roughly divided into the following steps:

1) Find the actual type of the object pointed to by the first element at the top of the operand stack, and denote it C.

2) If a method matching both the descriptor and the simple name in the constant is found in type C, check access permission. If the check passes, return a direct reference to this method and the lookup ends; if not, throw java.lang.IllegalAccessError.

3) Otherwise, apply the search and verification of step 2 to each parent class of C in turn, from bottom to top along the inheritance hierarchy.

4) If no suitable method is ever found, throw java.lang.AbstractMethodError.

Precisely because the first step of invokevirtual is to determine the actual type of the receiver at run time, the invokevirtual instruction does not stop once it has resolved the symbolic reference to the method in the constant pool to a direct reference; it also selects the method version according to the actual type of the receiver. This process is the essence of method overriding in the Java language. We call this process of determining the method's execution version at run time according to the actual type dynamic dispatch.

Since the root of this polymorphism lies in the execution logic of the virtual method call instruction invokevirtual, the conclusion naturally applies only to methods and not to fields, because fields do not use this instruction. In fact, Java has only virtual methods; fields can never be virtual. In other words, fields never take part in polymorphism: when a method of a class accesses a field by name, the name refers to the field that this class can see. When a subclass declares a field with the same name as one in its parent class, both fields exist in the subclass's memory, but the subclass's field hides the parent's field of the same name.

Both lines of output are "I am Son". This is because when the Son class is created, the constructor of Father is called implicitly first, and the call to showMeTheMoney() in Father's constructor is a virtual method call whose actual version is Son::showMeTheMoney(), so the output is "I am Son". At this point, although the parent class's money field has already been initialized to 2, Son::showMeTheMoney() accesses the subclass's money field, whose value is still 0 because it is not initialized until the subclass's constructor executes. The last statement of main() accesses the parent class's money through the static type and outputs 2.
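
The Father and Son behaviour described above can be reproduced with the following sketch. The class and method names follow the paragraph; the initial values 1, 3 and 4, the message wording and the LOG list are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class FieldHasNoPolymorphismDemo {
    // Hypothetical helper: collects the output lines instead of printing them.
    static final List<String> LOG = new ArrayList<>();

    static class Father {
        public int money = 1;

        Father() {
            money = 2;
            showMeTheMoney(); // virtual call: runs Son's version when a Son is being built
        }

        void showMeTheMoney() {
            LOG.add("I am Father, I have $" + money);
        }
    }

    static class Son extends Father {
        public int money = 3; // hides Father.money; assigned only after Father() returns

        Son() {
            money = 4;
            showMeTheMoney();
        }

        @Override
        void showMeTheMoney() {
            LOG.add("I am Son, I have $" + money); // reads Son.money, not Father.money
        }
    }

    static List<String> run() {
        LOG.clear();
        Father guy = new Son();
        LOG.add("This guy has $" + guy.money); // field chosen by the static type Father
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```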

Multiple dispatch and single dispatch of methods

The receiver of a method and the method's parameters are collectively called the method's quantities (the things a dispatch decision can depend on). This definition probably originated in the well-known book "Java and Patterns". Depending on how many kinds of quantity it is based on, dispatch is divided into single dispatch and multiple dispatch. Single dispatch selects the target method according to one quantity; multiple dispatch selects it according to more than one.

From the argument above we can conclude that the Java language today (up to Java 12 and the preview of Java 13, which were current when the book was written) is a statically multiple-dispatch, dynamically single-dispatch language.
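A small sketch may make this conclusion concrete. The class names QQ, Q360, Father and Son and the method hardChoice are hypothetical. At compile time the overload is chosen from two quantities (the static type of the receiver and the type of the argument); at run time only the receiver's actual type matters.

```java
class DispatchSketch {
    static class QQ {}
    static class Q360 {}

    static class Father {
        public String hardChoice(QQ arg) { return "father chose QQ"; }
        public String hardChoice(Q360 arg) { return "father chose 360"; }
    }

    static class Son extends Father {
        @Override
        public String hardChoice(QQ arg) { return "son chose QQ"; }
        @Override
        public String hardChoice(Q360 arg) { return "son chose 360"; }
    }

    static String[] run() {
        Father father = new Father();
        Father son = new Son();
        return new String[] {
            // Static dispatch picks hardChoice(Q360); dynamic dispatch picks Father's version.
            father.hardChoice(new Q360()),
            // Static dispatch picks hardChoice(QQ); dynamic dispatch picks Son's version.
            son.hardChoice(new QQ())
        };
    }

    public static void main(String[] args) {
        for (String line : run()) {
            System.out.println(line);
        }
    }
}
```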

Dynamic language support

JDK 7 brought the first new member of the bytecode instruction set: the invokedynamic instruction.

The key feature of a dynamically typed language is that the main work of type checking happens at run time rather than at compile time. Many languages fit this description, including APL, Clojure, Erlang, Groovy, JavaScript, Lisp, Lua, PHP, Prolog, Python, Ruby, Smalltalk, and Tcl. By contrast, languages that check types at compile time, such as C++ and Java, are the most commonly used statically typed languages.

Before JDK 7, the Java virtual machine lacked support for dynamically typed languages, mainly in method invocation. In the bytecode instruction set before JDK 7, the first operand of each of the four method invocation instructions (invokevirtual, invokespecial, invokestatic, invokeinterface) is a symbolic reference to the called method (a CONSTANT_Methodref_info or CONSTANT_InterfaceMethodref_info constant). As mentioned earlier, a method's symbolic reference is generated at compile time, whereas a dynamically typed language can only determine the method's receiver at run time.

The invokedynamic instruction serves the same purpose as the MethodHandle mechanism. Both address the problem that the dispatch rules of the original four "invoke*" instructions are fixed inside the virtual machine: they move the decision of how to find the target method from the virtual machine to user code, giving users (in the broad sense, including designers of other programming languages) more freedom.
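As a rough illustration of the MethodHandle side of this idea, the sketch below looks up a method by name and type on whatever receiver it is given, so the choice of target is made by user code rather than by an invoke* instruction. Duck, Robot, greet and call are hypothetical names; the java.lang.invoke calls used (MethodHandles.lookup, findVirtual, bindTo, invoke) are standard JDK 7+ API.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class MethodHandleSketch {
    // Two unrelated classes that happen to share a method shape.
    static class Duck {
        public String greet(String name) { return "Quack, " + name; }
    }

    static class Robot {
        public String greet(String name) { return "Beep, " + name; }
    }

    // Finds greet(String):String on the receiver's class and calls it.
    static String call(Object receiver, String arg) {
        try {
            MethodType type = MethodType.methodType(String.class, String.class);
            MethodHandle handle = MethodHandles.lookup()
                    .findVirtual(receiver.getClass(), "greet", type)
                    .bindTo(receiver);
            return (String) handle.invoke(arg);
        } catch (Throwable t) {
            // MethodHandle.invoke is declared to throw Throwable.
            throw new IllegalStateException(t);
        }
    }

    public static void main(String[] args) {
        System.out.println(call(new Duck(), "world"));
        System.out.println(call(new Robot(), "world"));
    }
}
```

Duck and Robot share no interface, yet the same call site reaches both, which is the kind of freedom a dynamically typed language needs.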

Stack based bytecode interpretation execution engine

Read, understand, and only then gain the ability to execute: before most program code is turned into the target code of a physical machine or into an instruction set that a virtual machine can execute, it has to go through the steps in the figure below:

What is the difference between a stack-based instruction set and a register-based one? Take the simplest example and compute "1+1" with each. With a stack-based instruction set the sequence is: iconst_1, iconst_1, iadd, istore_0.

The two iconst_1 instructions push two constants 1 onto the stack in succession. The iadd instruction pops the two values at the top of the stack, adds them, and pushes the result back. Finally istore_0 stores the value at the top of the stack into local variable slot 0. Instructions in this kind of instruction stream usually take no operands: they use the data on the operand stack as input, and their results are stored on the operand stack as well.
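The behaviour of these four instructions can be imitated with a toy interpreter. This is only an illustrative sketch and not how a real JVM is built; TinyStackMachine and run are hypothetical names, and only the three opcodes used above are supported.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class TinyStackMachine {
    // Executes the program and returns the local variable table (one slot here).
    static int[] run(String... program) {
        Deque<Integer> operandStack = new ArrayDeque<>();
        int[] locals = new int[1];
        for (String op : program) {
            switch (op) {
                case "iconst_1":
                    operandStack.push(1);              // push the constant 1
                    break;
                case "iadd": {
                    int right = operandStack.pop();    // pop both operands
                    int left = operandStack.pop();
                    operandStack.push(left + right);   // push the sum back
                    break;
                }
                case "istore_0":
                    locals[0] = operandStack.pop();    // pop into slot 0
                    break;
                default:
                    throw new IllegalArgumentException("unsupported opcode: " + op);
            }
        }
        return locals;
    }

    public static void main(String[] args) {
        int[] locals = run("iconst_1", "iconst_1", "iadd", "istore_0");
        System.out.println("slot 0 = " + locals[0]);
    }
}
```

Note that none of the instructions names its inputs: everything flows through the operand stack.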

With a register-based instruction set, the program might instead consist of a mov instruction that sets the EAX register to 1, followed by an add instruction that adds 1 to that value and leaves the result in EAX. This two-address format is the mainstream of the x86 instruction set: each instruction contains two separate input operands and relies on registers to access and store data.

The main advantage of a stack-based instruction set is portability: registers are provided directly by the hardware, so a program that depends on them directly is inevitably constrained by that hardware.

The main drawback of a stack-architecture instruction set is that, in theory, it executes more slowly; the fact that the instruction sets of all mainstream physical machines use a register architecture supports this indirectly.

for example :

javap reports that this code needs an operand stack of depth 2 and a local variable space of 4 variable slots.
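The listing that this javap output refers to is not shown above. A method of the following shape is consistent with it, assuming three int locals in an instance method: the four slots hold this, a, b and c, and at most two values sit on the operand stack at once while (a + b) * c is evaluated. The name CalcSketch is illustrative.

```java
class CalcSketch {
    public int calc() {
        int a = 100;
        int b = 200;
        int c = 300;
        // a and b are pushed and added, then c is pushed and multiplied:
        // the operand stack never holds more than two values.
        return (a + b) * c;
    }

    public static void main(String[] args) {
        System.out.println(new CalcSketch().calc());
    }
}
```

Compiling this class and running javap -verbose on it is one way to inspect the stack depth and slot count that javac records for calc().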

Tomcat Class loading

OSGi (Open Service Gateway Initiative) is a dynamic modularization specification for the Java language drawn up by the OSGi Alliance (the JPMS introduced in JDK 9, by contrast, is a static module system).

Bytecode generation technology and implementation of dynamic proxy

Bytecode generation is used in javac, in the JSP compilers of web servers, in compile-time-weaving AOP frameworks, and in the very common dynamic proxy technique. Even reflection may cause the virtual machine to generate bytecode at run time to speed up execution.

The "dynamic" in dynamic proxy is relative to the "static" proxy, where the proxy class is actually written out in Java code. Its advantage is not that it saves the work of writing proxy classes, but that the proxy's behavior can be defined while the original class and interface are still unknown. Once the proxy class is decoupled from the original class, it can be reused flexibly in different scenarios.
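A minimal sketch of a JDK dynamic proxy follows. IHello, Hello, WelcomeHandler and wrap are hypothetical names; java.lang.reflect.Proxy and InvocationHandler are the standard API. The handler is written without knowing which interface it will later serve, which is the reuse the paragraph describes.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

class DynamicProxySketch {
    interface IHello {
        String sayHello();
    }

    static class Hello implements IHello {
        @Override
        public String sayHello() { return "hello world"; }
    }

    // Generic proxy behavior: prefix any String result of the wrapped call.
    static class WelcomeHandler implements InvocationHandler {
        private final Object target;

        WelcomeHandler(Object target) { this.target = target; }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            Object result = method.invoke(target, args);
            return (result instanceof String) ? "welcome, " + result : result;
        }
    }

    // Creates a proxy class at run time that implements the given interface.
    static <T> T wrap(T target, Class<T> iface) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] { iface },
                new WelcomeHandler(target));
        return iface.cast(proxy);
    }

    public static void main(String[] args) {
        IHello hello = wrap(new Hello(), IHello.class);
        System.out.println(hello.sayHello());
    }
}
```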

To bridge the gap between JDK versions and deploy code written for a newer JDK on an older one, a class of tools known as Java backporting tools emerged. Retrotranslator and Retrolambda are outstanding representatives of such tools.

The new features in each JDK upgrade fall roughly into the following five categories:

2) Improvements made at the front-end compiler level. These are known as syntactic sugar. Autoboxing and unboxing, for example, is really the Javac compiler inserting calls such as Integer.valueOf() and Float.valueOf() wherever wrapper objects are used; variable-length parameters are converted into an array for parameter passing after compilation; and generic information is erased at compile time (though it remains in the metadata), with the compiler inserting type-cast code where needed.

3) Changes that require support in the bytecode. The dynamic language support added in JDK 7, for example, needed a new invokedynamic bytecode instruction in the virtual machine to implement the related calls. The bytecode instruction set has stayed relatively stable, however, and direct changes at this level are rare.

4) Improvements that require support at the level of the overall JDK structure. A typical example is the Java module system introduced in JDK 9, which touches the JDK structure, the Java syntax, the class loading and linking process, the Java virtual machine, and other layers.

5) Improvements inside the virtual machine. Examples are the redefinition of the Java Memory Model (JMM) by the JSR-133 specification implemented in JDK 5, and the G1, ZGC, and Shenandoah collectors added in JDK 7, JDK 11, and JDK 12. Changes of this kind are largely transparent to programmers writing code and only affect how the program behaves at run time.

The concept of compilation

It may refer to a front-end compiler (more precisely, "the front end of the compiler") turning *.java files into *.class files;

it may refer to the Java virtual machine's just-in-time compiler (often called the JIT compiler) converting bytecode into native machine code at run time;

or it may refer to a static ahead-of-time compiler (often called an AOT compiler) compiling the program directly into binary code for the target machine's instruction set.

The just-in-time compiler's optimizations at run time support the continuous improvement of program execution efficiency, while the front-end compiler's work at compile time supports programmers' coding efficiency and the satisfaction of language users.

Compilation: 1 preparation step and 3 processing steps

1) Preparation: initialize the pluggable annotation processors.

2) Parsing and filling the symbol table, which includes:
· Lexical and syntax analysis. Turn the character stream of the source code into a set of tokens and build an abstract syntax tree.
· Filling the symbol table. Generate symbol addresses and symbol information.

3) Annotation processing by the pluggable annotation processors: this is the phase in which the pluggable annotation processors run. The hands-on part of this chapter designs a pluggable annotation processor to influence Javac's compilation behavior.

4) Analysis and bytecode generation, which includes:
· Attribution check. Check the static information of the syntax.
· Data flow and control flow analysis. Check the dynamic execution of the program.
· Desugaring. Reduce the syntactic sugar that simplifies code writing to its original form.
· Bytecode generation. Convert the information produced by the previous steps into bytecode.

Annotation processing may produce new symbols. If it does, the compiler has to go back to the parsing and symbol-table-filling step and process the new symbols.

Pluggable annotation processors can be seen as a set of compiler plug-ins. While they run, they may read, modify, and add any element of the abstract syntax tree. If they modify the syntax tree during annotation processing, the compiler returns to parsing and filling the symbol table and processes it again, until no annotation processor modifies the tree any more. Each such loop is called a round.

Semantic analysis and bytecode generation

1. Attribution check

The attribution step checks, for example, whether a variable has been declared before it is used and whether the types of a variable and the value assigned to it match.

This step also performs a code optimization called constant folding: writing "a=1+2" in code rather than "a=3" does not add even a single processor clock cycle to the work done at run time.
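One visible effect of constant folding can be observed with String constants. This is a small sketch and the class and method names are hypothetical. Because "a" + "b" is a compile-time constant expression, the compiler folds it into the single literal "ab", which is interned; a concatenation involving a non-constant variable is computed at run time and yields a distinct object.

```java
class ConstantFoldingSketch {
    static final int A = 1 + 2;          // folded by the compiler to 3
    static final String S = "a" + "b";   // folded by the compiler to "ab"

    // True: both sides are the same interned compile-time constant.
    static boolean foldedIsSameObject() {
        return S == "ab";
    }

    // True: x is not a constant variable, so x + "b" is built at run time.
    static boolean runtimeConcatIsDistinct() {
        String x = "a";
        String joined = x + "b";
        return joined != "ab" && joined.equals("ab");
    }

    public static void main(String[] args) {
        System.out.println(A);
        System.out.println(foldedIsSameObject());
        System.out.println(runtimeConcatIsDistinct());
    }
}
```

Comparing strings with == is used here only to expose object identity; ordinary code should use equals().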

2. Data and control flow analysis

Data flow analysis and control flow analysis further verify the program's contextual logic. They can check, for example, whether a local variable is assigned a value before it is used, whether every path through a method returns a value, and whether all checked exceptions are handled correctly.

Syntactic sugar: generics

The essence of generics is the application of parameterized types (Parameterized Type) or parametric polymorphism (Parametric Polymorphism): the data type being operated on can be specified as a special parameter in the method signature. Such type parameters can be used when creating classes, interfaces, and methods, giving generic classes, generic interfaces, and generic methods respectively.

The generics implementation Java chose is called type erasure generics (Type Erasure Generics), while C# chose reified generics (Reified Generics).

Generics in the Java language are different: they exist only in the source code. In the compiled bytecode file, every generic type is replaced by its raw type (Raw Type, explained later), with cast code inserted where needed. For the Java language at run time, therefore, ArrayList<Integer> and ArrayList<String> are actually the same type.
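The claim that two differently parameterized lists share one run-time type can be checked directly. A minimal sketch, with ErasureSketch as an illustrative name:

```java
import java.util.ArrayList;
import java.util.List;

class ErasureSketch {
    // Returns true when both lists report the same Class object at run time.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass());
        System.out.println(new ArrayList<String>().getClass().getName());
    }
}
```

Both lists are plain java.util.ArrayList objects; the type arguments exist only for the compiler.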

In the era before generics, Java arrays already supported covariance (Covariant). When generics were introduced, there were two options:

1) Leave the existing types that need generics (mainly container types) unchanged, and add a parallel set of new, generic versions of them.

2) Generify the existing types in place, without adding any generic versions parallel to the existing types.

We continue to use ArrayList as the example of how Java implements type erasure. Java chose the second option and generified existing types in place. ArrayList became ArrayList<T>, and code that previously used ArrayList directly had to keep working with the same container in the new generic version. That requires every generic instantiation, such as ArrayList<Integer> and ArrayList<String>, to automatically be a subtype of ArrayList; otherwise the type conversions would be unsafe. This gives rise to the concept of the raw type (Raw Type): a raw type should be regarded as the common supertype (Super Type) of all generic instantiations of that type.

How should raw types be implemented? Again there are two options. One is for the Java virtual machine to actually construct types such as ArrayList<Integer> at run time and to derive them automatically from ArrayList, so that the definition of a raw type is satisfied. The other is to simply and crudely turn ArrayList<Integer> back into ArrayList at compile time, and to insert casts and checks automatically only where elements are accessed or modified.

Generics based on this method are called pseudo generics .

Compile this code into a Class file and decompile it with a bytecode decompiler, and the generic types have turned back into raw types.

Drawbacks of Java's erasure-based generics:

1. Support for primitive types (Primitive Types) became a new problem. Since primitives such as int cannot be cast to and from Object, Java simply does not support generics over primitive types: everyone uses ArrayList<Integer> or ArrayList<Long>, casts are inserted automatically anyway, and boxing and unboxing happen automatically when primitives are involved. That decision led to countless wrapper objects being constructed and to boxing and unboxing overhead, which is the main reason Java generics are slow and one of the key problems the Valhalla project sets out to solve.

2. The runtime cannot get generic type information .

Because List<String> and List<Integer> are the same type after erasure, the overload can only be made to work by giving the two methods different return types that are not otherwise needed.

The existence of the Signature attribute also leads to another conclusion: erasure only erases the bytecode in the method's Code attribute. The generic information is still kept in the metadata, which is the fundamental reason we can obtain parameterized types through reflection.
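The point about metadata can be seen through reflection. In this sketch the field names is a hypothetical example; Field.getGenericType and ParameterizedType are standard reflection API. The raw type of the field is List, yet the declared type argument is still recoverable.

```java
import java.lang.reflect.Field;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

class SignatureSketch {
    // Hypothetical field whose declared generic type is read back below.
    static List<String> names;

    // Returns the name of the first type argument declared on the field.
    static String typeArgumentOfNames() {
        try {
            Field field = SignatureSketch.class.getDeclaredField("names");
            Type generic = field.getGenericType();
            ParameterizedType parameterized = (ParameterizedType) generic;
            return parameterized.getActualTypeArguments()[0].getTypeName();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the erased type of the same field.
    static Class<?> rawTypeOfNames() {
        try {
            return SignatureSketch.class.getDeclaredField("names").getType();
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(rawTypeOfNames().getName());
        System.out.println(typeArgumentOfNames());
    }
}
```

This only works for declarations (fields, method signatures, superclasses); the type argument of an individual object created with new ArrayList<String>() is not recorded anywhere.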

Conditional compilation

Define a final constant and use it as the condition of an if statement to separate the code.

Because the compiler optimizes the code, the Java compiler generates no bytecode for a statement whose condition is always false.

Typical use: a program that distinguishes between DEBUG and RELEASE modes.
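A minimal sketch of the DEBUG/RELEASE pattern follows; the constant DEBUG and the method process are illustrative names. Because DEBUG is a compile-time constant set to false, the block guarded by it can be dropped by the compiler.

```java
class ConditionalCompilationSketch {
    // Flip to true and recompile to build the "DEBUG" variant.
    static final boolean DEBUG = false;

    static String process(String input) {
        StringBuilder out = new StringBuilder(input.toUpperCase());
        if (DEBUG) {
            // Only present in the class file when DEBUG is true at compile time.
            out.append(" [debug: length=").append(input.length()).append("]");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(process("release build"));
    }
}
```

Only an if statement with a constant condition works this way; other constructs with constant conditions, such as while (false), are rejected by the compiler as unreachable code.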

Covariance and Contravariance

Contravariance and covariance describe the inheritance relationship after a type transformation (type transformation). The definition: let A and B be types, f(⋅) a type transformation, and ≤ the inheritance relation (for example, A≤B means A is a subclass derived from B);

f(⋅) is contravariant if A≤B implies f(B)≤f(A);

f(⋅) is covariant if A≤B implies f(A)≤f(B);

f(⋅) is invariant if A≤B implies neither of the above, that is, f(A) and f(B) have no inheritance relationship.

Arrays are covariant

Generics are invariant

Generics use wildcards to achieve covariance and contravariance. PECS: producer-extends, consumer-super.

List

List

appleList can be assigned to foodList, but nothing other than null can be added to foodList.
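The rules above can be sketched as follows, assuming hypothetical classes Food and Apple with Apple extending Food. The example shows array covariance failing at run time, an extends wildcard used as a producer, and a super wildcard used as a consumer.

```java
import java.util.ArrayList;
import java.util.List;

class WildcardSketch {
    static class Food {}
    static class Apple extends Food {}

    // Arrays are covariant, so the bad store is only caught at run time.
    static boolean arrayStoreFails() {
        Object[] objects = new String[1];
        try {
            objects[0] = Integer.valueOf(1);
            return false;
        } catch (ArrayStoreException expected) {
            return true;
        }
    }

    // Consumer: a List<? super Apple> accepts apples.
    static void addApples(List<? super Apple> consumer, int howMany) {
        for (int i = 0; i < howMany; i++) {
            consumer.add(new Apple());
        }
    }

    // Producer: a List<? extends Food> can be read as Food.
    static int count(List<? extends Food> producer) {
        return producer.size();
    }

    static int[] demo() {
        List<Apple> appleList = new ArrayList<>();
        addApples(appleList, 2);

        List<? extends Food> foodList = appleList; // covariant view of appleList
        // foodList.add(new Apple());              // would not compile
        foodList.add(null);                        // null is the only value allowed

        List<Food> foods = new ArrayList<>();
        addApples(foods, 1);                       // contravariant use: Food is a supertype of Apple

        return new int[] { count(foodList), count(foods) };
    }

    public static void main(String[] args) {
        int[] sizes = demo();
        System.out.println(arrayStoreFails() + " " + sizes[0] + " " + sizes[1]);
    }
}
```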

Method parameters are covariant and return values are contravariant:

Updated as follows after a discussion with the reader iamzhoug37.

For a call result = method(n), the Liskov substitution principle says that the type of the argument n should be a subtype of the method's formal parameter type, that is, typeof(n)≤typeof(method's parameter); and result should be of a base type of the method's return type, that is, typeof(method's return)≤typeof(result).

Back end compilation

If bytecode is regarded as an intermediate representation (Intermediate Representation, IR) of the programming language, then whenever and in whatever state the compiler converts Class files into binary machine code tied to the local infrastructure (hardware instruction set, operating system), that can be regarded as the back end of the whole compilation process.

Efficient concurrency

Moore's law: at constant price, the number of components that fit on an integrated circuit doubles roughly every 18 to 24 months, and performance doubles with it.

For a long time CPU clock frequency grew exponentially, but in recent years it has stayed at around 4 GHz and cannot be pushed much further. Moore's law is failing.

Number of processors Parallel scale

There is a big gap between a computer's computing speed and the speed of its storage and communication subsystems, so a lot of time is spent on disk I/O, network communication, or database access.

When measuring the performance of a service, transactions per second (Transactions Per Second, TPS) is one of the important indicators. It is the total number of requests the server can respond to on average in one second, and the TPS value is closely related to the program's concurrency.

Java memory model: main memory and working memory

The main purpose of the Java memory model is to define the access rules for the variables in a program, that is, the low-level details of how the virtual machine stores variable values into memory and reads them back out. The variables (Variables) meant here differ from variables in Java programming: they include instance fields, static fields, and the elements that make up array objects, but not local variables or method parameters, because those are thread-private, are never shared, and so are never subject to races. For the sake of performance, the Java memory model does not restrict the execution engine from using processor-specific registers or caches to interact with main memory, nor does it restrict the just-in-time compiler from optimizations such as reordering code.

The Java memory model specifies that all variables are stored in main memory (Main Memory). The name is the same as the main memory of the physical hardware and the two are comparable, but physically it is only a part of the virtual machine's memory. Each thread also has its own working memory (Working Memory, comparable to the processor cache mentioned earlier), which holds copies of the main-memory variables that the thread uses. All of a thread's operations on a variable (reads, assignments, and so on) must take place in working memory; a thread cannot read or write data in main memory directly. Threads cannot access each other's working memory either, so variable values are passed between threads through main memory.

For the specific interaction protocol between main memory and working memory, that is, how a variable is copied from main memory to working memory and how it is synchronized from working memory back to main memory, the Java memory model defines the following 8 operations:

· lock: acts on a variable in main memory; it marks the variable as exclusively owned by one thread.

· unlock: acts on a variable in main memory; it releases a locked variable, after which other threads can lock it.

· read: acts on a variable in main memory; it transfers the variable's value from main memory to the thread's working memory for the subsequent load action.

· load: acts on a variable in working memory; it puts the value that the read operation obtained from main memory into the working-memory copy of the variable.

· use: acts on a variable in working memory; it passes the variable's value in working memory to the execution engine. The virtual machine performs this operation whenever it meets a bytecode instruction that needs the variable's value.

· assign: acts on a variable in working memory; it assigns a value received from the execution engine to the variable in working memory. The virtual machine performs this operation whenever it meets a bytecode instruction that assigns to the variable.

· store: acts on a variable in working memory; it transfers the variable's value in working memory to main memory for the subsequent write operation.

· write: acts on a variable in main memory; it puts the value that the store operation obtained from working memory into the variable in main memory.

To copy a variable from main memory to working memory, read and load must be performed in that order; to synchronize a variable from working memory back to main memory, store and write must be performed in that order. Note that the Java memory model only requires each pair to be performed in order, not consecutively. Other instructions may be inserted between read and load or between store and write: when accessing variables a and b in main memory, one possible order is read a, read b, load b, load a. In addition, the Java memory model requires the 8 basic operations above to satisfy the following rules:

· One of read and load, or one of store and write, may not occur alone. That is, a variable may not be read from main memory without working memory accepting it, and working memory may not initiate a write-back that main memory does not accept.

· A thread may not discard its most recent assign operation. That is, after a variable changes in working memory, the change must be synchronized back to main memory.

· A thread may not synchronize data from its working memory back to main memory for no reason (without any assign operation having occurred).

A volatile variable has two properties. The first is that it guarantees the variable's visibility to all threads. "Visibility" here means that when one thread changes the variable's value, the new value is immediately known to other threads. Ordinary variables cannot do this, because their values are passed between threads through main memory: if thread A changes the value of an ordinary variable and writes it back to main memory, the new value only becomes visible to thread B when B reads main memory after A's write-back has completed.

The second property is that it forbids instruction reordering optimizations. Ordinary variables only guarantee that every place in a method that depends on an assignment's result gets the correct result; they do not guarantee that the order of assignments matches the order in the program code.
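The visibility guarantee is what makes the common "stop flag" idiom work. The sketch below uses hypothetical names (VolatileFlagSketch, stop) and a 5-second join timeout chosen arbitrarily: a worker spins until another thread sets the flag. With the volatile modifier the worker is guaranteed to observe the write; without it, the loop might never terminate on some virtual machines.

```java
class VolatileFlagSketch {
    // volatile: a write by one thread is visible to later reads by others.
    private static volatile boolean stop = false;

    // Starts a spinning worker, signals it, and reports whether it stopped.
    static boolean workerStopsAfterSignal() {
        stop = false;
        Thread worker = new Thread(() -> {
            while (!stop) {
                // busy-wait until the flag change becomes visible
            }
        });
        worker.start();
        stop = true; // signal from the calling thread
        try {
            worker.join(5000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("worker stopped: " + workerStopsAfterSignal());
    }
}
```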

The Java memory model defines special rules for volatile variables. Let T be a thread and V and W two volatile variables; then the read, load, use, assign, store, and write operations must satisfy the following rule:

· T may perform use on V only if T's previous action on V was load, and T may perform load on V only if T's next action on V is use. T's use of V can be regarded as tied to T's load and read of V: the three must occur together and consecutively.

Atomicity

Reads and writes of primitive data types are atomic (the exception is the non-atomic treatment of long and double).

Visibility

The difference between ordinary variables and volatile variables is that the special rules for volatile guarantee that a new value is synchronized to main memory immediately, and that the value is refreshed from main memory immediately before each use.

Besides volatile, Java has two other keywords that provide visibility: synchronized and final. The visibility of synchronized blocks comes from the rule that "before an unlock operation is performed on a variable, the variable must be synchronized back to main memory (by performing store and write)". The visibility of final means that once a final field has been initialized in the constructor, and the constructor has not passed the "this" reference out (letting this escape is very dangerous, because other threads may then access a "half-initialized" object through it), other threads can see the value of the final field.

Ordering

The natural ordering in a Java program can be summed up in one sentence: observed from within a thread, all operations are ordered; observed from another thread, all operations are unordered. The first half refers to "within-thread as-if-serial semantics" (Within-Thread As-If-Serial Semantics); the second half refers to instruction reordering and to the delay in synchronizing working memory with main memory.

The happens-before principle

The following are the "natural" happens-before relationships in the Java memory model. They hold without the help of any synchronizer and can be relied on directly in code. If the relationship between two operations is not in this list and cannot be derived from the rules below, there is no ordering guarantee between them, and the virtual machine may reorder them freely.

· Program Order Rule: within a thread, following control-flow order, an operation written earlier happens-before an operation written later. Note that this is control-flow order rather than program-text order, because branches, loops, and similar structures have to be taken into account.

· Monitor Lock Rule: an unlock operation happens-before a later lock operation on the same lock. "The same lock" must be stressed here, and "later" refers to order in time.

· Volatile Variable Rule: a write to a volatile variable happens-before a later read of that variable; "later" again refers to order in time.

· Thread Start Rule: a call to a Thread object's start() method happens-before every action of the started thread.

· Thread Termination Rule: all operations in a thread happen-before the detection of that thread's termination, which can be detected by Thread::join() returning or by the return value of Thread::isAlive().

· Thread Interruption Rule: a call to a thread's interrupt() method happens-before the interrupted thread's code detecting the interrupt, which can be done with Thread::interrupted().

· Finalizer Rule: the completion of an object's initialization (the end of its constructor) happens-before the start of its finalize() method.

· Transitivity: if operation A happens-before operation B and operation B happens-before operation C, then operation A happens-before operation C.

There is essentially no causal relationship between order in time and the happens-before principle, so when assessing concurrency safety we should not be distracted by timing; everything must be judged by the happens-before principle.
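As a small illustration of how these rules combine, the sketch below relies only on the thread start rule, the thread termination rule, program order and transitivity, with no volatile and no locks. The names HappensBeforeSketch, input and result are hypothetical.

```java
class HappensBeforeSketch {
    // Plain fields: no volatile, no synchronization.
    private static int input;
    private static int result;

    static int compute() {
        input = 20; // (1) written before start()

        // (2) Thread start rule: the worker is guaranteed to see (1).
        Thread worker = new Thread(() -> result = input + 1);
        worker.start();

        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        // (3) Thread termination rule: after join() returns, (2) is visible here.
        return result;
    }

    public static void main(String[] args) {
        System.out.println(compute());
    }
}
```

If the caller read result without calling join(), no rule in the list would order the worker's write before the read, and the value observed would not be guaranteed.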

Three implementations of threads

Kernel-thread implementation (1:1): a kernel-level thread (Kernel-Level Thread, KLT) is a thread supported directly by the operating system kernel (Kernel). Programs generally do not use kernel threads directly but a higher-level interface to them, the lightweight process (Light Weight Process, LWP), which is what we usually call a thread.

Because lightweight processes are built on kernel threads, thread operations require system calls, which are relatively expensive because they switch back and forth between user mode (User Mode) and kernel mode (Kernel Mode). In addition, every lightweight process needs a kernel thread behind it, so lightweight processes consume a certain amount of kernel resources (such as kernel-thread stack space), and the number of lightweight processes a system can support is limited.

User-thread implementation (1:N): broadly speaking, any thread that is not a kernel thread can be considered a kind of user thread (User Thread, UT).

Hybrid implementation with user threads and lightweight processes (N:M).

User threads are still built entirely in user space, so creating, switching, and destroying them stays cheap, and large numbers of concurrent user threads can be supported. The lightweight processes supported by the operating system act as a bridge between user threads and kernel threads: the thread scheduling and processor mapping provided by the kernel can be used, and user threads' system calls go through lightweight processes, which greatly reduces the risk of the whole process being blocked.

Mainstream Java virtual machines use the kernel-thread implementation.

Thread scheduling is the process by which the system assigns the right to use the processor to threads. There are two main approaches: cooperative thread scheduling (Cooperative Threads-Scheduling) and preemptive thread scheduling (Preemptive Threads-Scheduling).

Cooperative scheduling: in a multithreaded system with cooperative scheduling, a thread's execution time is controlled by the thread itself; when it finishes its work, it must actively notify the system to switch to another thread.

The Java language defines 10 thread priority levels (Thread.MIN_PRIORITY to Thread.MAX_PRIORITY), whereas Windows has only 7 thread priority levels.

java Defined thread state :

6 The two states are :

· New (New): a thread that has been created but not yet started is in this state.

· Runnable (Runnable): covers both Running and Ready in operating system thread states. A thread in this state may be executing, or it may be waiting for the operating system to allocate execution time to it.

· Waiting indefinitely (Waiting): threads in this state are not allocated processor time; they wait to be explicitly woken by another thread. The following methods put a thread into the indefinite waiting state: ■ Object::wait() without a timeout parameter; ■ Thread::join() without a timeout parameter; ■ LockSupport::park().

· Timed waiting (Timed Waiting): threads in this state are not allocated processor time either, but they do not need to be explicitly woken by another thread; the system wakes them automatically after a set time. The following methods put a thread into the timed waiting state: ■ Thread::sleep(); ■ Object::wait() with a timeout parameter; ■ Thread::join() with a timeout parameter; ■ LockSupport::parkNanos(); ■ LockSupport::parkUntil().

· Blocked (Blocked): the thread is blocked. The difference between the “blocked state” and the “waiting state” is that a blocked thread is waiting to acquire an exclusive lock, an event that occurs when another thread gives up the lock, whereas a waiting thread is waiting for a period of time to elapse or for a wake-up action. A thread enters this state while it waits to enter a synchronized region.

· Terminated (Terminated): the state of a thread that has finished executing.
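Several of the states above can be observed with Thread::getState(). The following is a minimal sketch, assuming a worker thread that parks in Object::wait() until it is notified; the class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class ThreadStateDemo {
    /** Records the states a worker thread passes through. */
    static List<Thread.State> observe() {
        final Object lock = new Object();
        final boolean[] released = {false}; // guarded by lock
        List<Thread.State> seen = new ArrayList<>();

        Thread worker = new Thread(() -> {
            synchronized (lock) {
                while (!released[0]) {
                    try {
                        lock.wait(); // no timeout, so the thread is WAITING
                    } catch (InterruptedException e) {
                        return;
                    }
                }
            }
        });
        seen.add(worker.getState()); // NEW: created but not started

        worker.start();
        try {
            Thread.State state = worker.getState();
            while (state != Thread.State.WAITING) { // poll until the worker sits in wait()
                Thread.sleep(1);
                state = worker.getState();
            }
            seen.add(state);
            synchronized (lock) {
                released[0] = true;
                lock.notifyAll(); // explicit wake-up from another thread
            }
            worker.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        seen.add(worker.getState()); // TERMINATED: run() has returned
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(observe()); // [NEW, WAITING, TERMINATED]
    }
}
```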

Coroutines

Java's current concurrency mechanism is somewhat at odds with the architectural trend described above. The 1:1 kernel thread model is the mainstream choice for thread implementation in Java virtual machines today, but threads mapped onto operating system threads have inherent drawbacks: switching and scheduling are expensive, and the number of threads the system can hold is quite limited.

The scheduling cost of kernel threads comes mainly from transitions between user mode and kernel mode, and the cost of those transitions comes mainly from responding to interrupts and from saving and restoring the execution context.

How do coroutines handle this? For a business operation that blocks, we use a coroutine rather than a thread. When the coroutine blocks on IO before its time slice has run out, the CPU is not given up; instead another coroutine task is scheduled to carry on computing. Pure computation is usually very fast: in 5 ms the code may already have run through N methods. This makes full use of the time slice and reduces the time the CPU spends switching.
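The idea of switching tasks in user code, without handing the processor back to the kernel, can be sketched with a toy cooperative scheduler. This is only an illustration under simplifying assumptions: each task is modeled as an Iterator whose next() call runs one step and then yields, everything runs on a single thread, and the names CooperativeSchedulerSketch, task and runAll are hypothetical. It is not how any real coroutine library is implemented.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

final class CooperativeSchedulerSketch {
    /** A task is a sequence of steps; finishing a step is its yield point. */
    static Iterator<String> task(String name, int steps) {
        return new Iterator<String>() {
            private int done = 0;
            @Override public boolean hasNext() { return done < steps; }
            @Override public String next() { done++; return name + done; }
        };
    }

    /** Runs all tasks round-robin on the calling thread and returns the step trace. */
    static List<String> runAll(List<Iterator<String>> tasks) {
        Deque<Iterator<String>> ready = new ArrayDeque<>(tasks);
        List<String> trace = new ArrayList<>();
        while (!ready.isEmpty()) {
            Iterator<String> current = ready.poll();
            if (current.hasNext()) {
                trace.add(current.next()); // run one step, then the task yields
                ready.add(current);        // requeue it behind the other tasks
            }
        }
        return trace;
    }

    public static void main(String[] args) {
        // No kernel thread switch happens between the steps of A and B.
        System.out.println(runAll(Arrays.asList(task("A", 2), task("B", 3))));
        // [A1, B1, A2, B2, B3]
    }
}
```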

Thread safety

When multiple threads access an object at the same time, the object is called thread-safe if calls to it produce correct results without the caller having to consider how the runtime schedules and interleaves those threads, add extra synchronization, or perform any other coordination.

Ordered from strongest to weakest “degree of safety”, the data shared by the various operations in the Java language can be divided into the following five categories:

Immutable

Besides String, immutable types include enumeration types and some subclasses of java.lang.Number, such as the numeric wrapper types Long and Double and the big-number types BigInteger and BigDecimal. However, the atomic classes AtomicInteger and AtomicLong, which are also subtypes of Number, are mutable.
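The contrast between an immutable Number subclass and a mutable one can be seen in a few lines. This is a small sketch; the class name ImmutabilityDemo and the helper demo() are made up for illustration.

```java
import java.math.BigInteger;
import java.util.concurrent.atomic.AtomicInteger;

final class ImmutabilityDemo {
    static String demo() {
        BigInteger a = BigInteger.valueOf(10);
        BigInteger b = a.add(BigInteger.ONE); // returns a new object, a is untouched

        AtomicInteger counter = new AtomicInteger(10);
        counter.incrementAndGet();            // changes the same object in place

        return a + " " + b + " " + counter.get();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 10 11 11
    }
}
```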

Absolute thread safety

Relative thread safety

Relative thread safety is what we usually mean by thread safety. It requires that each individual operation on the object be thread-safe, so no extra safeguards are needed for a single call. For certain sequences of consecutive calls, however, the caller may need additional synchronization to ensure correctness.
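java.util.Vector is a convenient way to illustrate this: every single method is synchronized, yet a check followed by a removal is two calls and needs a lock on the caller's side. The sketch below assumes two threads draining the same Vector; RelativeSafetyDemo, removeLast and drainConcurrently are hypothetical names.

```java
import java.util.Vector;
import java.util.concurrent.atomic.AtomicInteger;

final class RelativeSafetyDemo {
    /** Removes and returns the last element, or null if the vector is empty. */
    static Integer removeLast(Vector<Integer> v) {
        // isEmpty(), size() and remove() are each synchronized, but the sequence is not atomic.
        // Vector's methods lock on the Vector instance, so the same monitor is used here.
        synchronized (v) {
            if (v.isEmpty()) {
                return null;
            }
            return v.remove(v.size() - 1);
        }
    }

    /** Two threads drain the same vector; returns how many elements were removed in total. */
    static int drainConcurrently(int size) {
        Vector<Integer> v = new Vector<>();
        for (int i = 0; i < size; i++) {
            v.add(i);
        }
        AtomicInteger removed = new AtomicInteger();
        Runnable drain = () -> {
            while (removeLast(v) != null) {
                removed.incrementAndGet();
            }
        };
        Thread t1 = new Thread(drain);
        Thread t2 = new Thread(drain);
        t1.start();
        t2.start();
        try {
            t1.join();
            t2.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return removed.get();
    }

    public static void main(String[] args) {
        System.out.println(drainConcurrently(1000)); // 1000
    }
}
```

Without the synchronized block in removeLast, one thread could read the size just before the other removes the last element, and remove() would then throw ArrayIndexOutOfBoundsException.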

Thread compatibility

Thread opposition

An example of thread opposition is the suspend() and resume() methods of the Thread class. If two threads hold the same thread object, one trying to suspend it and the other trying to resume it, then under concurrency the target thread is at risk of deadlock whether or not the calls are synchronized: if the thread suspended by suspend() is the very thread that was about to call resume(), a deadlock is certain. For this reason, suspend() and resume() have been deprecated.

Implementing thread safety

1. Mutual exclusion synchronization

Mutual exclusion synchronization is the most common and most important means of guaranteeing correctness under concurrency. It is also known as blocking synchronization (Blocking Synchronization).

Synchronization means that when multiple threads access shared data concurrently, the data is used by only one thread at a time (or by a few, when semaphores are used). Mutual exclusion is a means of achieving synchronization; critical sections (Critical Section), mutexes (Mutex) and semaphores (Semaphore) are common ways to implement it. In the phrase “mutual exclusion synchronization”, then, mutual exclusion is the cause and synchronization is the effect; mutual exclusion is the method and synchronization is the goal.

In Java, the most basic means of mutual exclusion synchronization is the synchronized keyword, a block-structured (Block Structured) synchronization syntax. After compilation by javac, the synchronized keyword produces two bytecode instructions, monitorenter and monitorexit, before and after the synchronized block. Both instructions take a parameter of reference type that identifies the object to lock and unlock.

· A block guarded by synchronized is reentrant for the same thread. This means a thread that enters the synchronized block repeatedly will not lock itself out.

· A block guarded by synchronized unconditionally blocks all later threads from entering until the thread holding the lock has finished and released it. This means that, unlike locks in some databases, there is no way to force the thread that holds the lock to release it, and no way to make a thread that is waiting for the lock give up through interruption or a timeout.
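Reentrancy is easy to see in a small example. The sketch below uses synchronized blocks (not synchronized methods) so that the monitorenter and monitorexit instructions mentioned above apply; the class name ReentrantSyncDemo is made up for illustration.

```java
final class ReentrantSyncDemo {
    private final Object lock = new Object();
    private int count = 0;

    void outer() {
        synchronized (lock) { // javac emits monitorenter here
            count++;
            inner();          // the same thread re-enters a monitor it already holds
        }                     // and monitorexit here (plus one on the exception path)
    }

    void inner() {
        synchronized (lock) { // does not block: the lock is already owned by this thread
            count++;
        }
    }

    int count() {
        synchronized (lock) {
            return count;
        }
    }

    public static void main(String[] args) {
        ReentrantSyncDemo demo = new ReentrantSyncDemo();
        demo.outer();
        System.out.println(demo.count()); // 2
    }
}
```

Running javap -c on the compiled class shows the monitorenter and monitorexit instructions around each block.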

ReentrantLock is also reentrant. In terms of features it is a superset of synchronized: it adds interruptible waiting, optional fair locking, and the ability to bind one lock to multiple conditions.

Since JDK 6 the two perform similarly. synchronized releases the lock automatically, whereas a Lock must be released manually in a finally block.
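A short sketch of the usual ReentrantLock idiom follows: acquire, work inside try, release in finally. It also shows the fairness flag and a timed, interruptible tryLock. The class name ReentrantLockDemo and its methods are made up for illustration.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

final class ReentrantLockDemo {
    // Passing true asks for a fair lock; the no-argument constructor is non-fair.
    private final ReentrantLock lock = new ReentrantLock(true);
    private int value = 0;

    int increment() {
        lock.lock();
        try {
            return ++value;
        } finally {
            lock.unlock(); // must be released by hand, even if the body throws
        }
    }

    /** Gives up if the lock cannot be acquired in time or the thread is interrupted. */
    boolean tryIncrement(long timeoutMillis) {
        try {
            if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return false;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        try {
            value++;
            return true;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        ReentrantLockDemo demo = new ReentrantLockDemo();
        System.out.println(demo.increment());       // 1
        System.out.println(demo.tryIncrement(100)); // true, the lock is free
        System.out.println(demo.increment());       // 3
    }
}
```

Additional wait queues can be created on the same lock with lock.newCondition(), which is what "binding multiple conditions" refers to.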

2. Non-blocking synchronization

This is an optimistic concurrency strategy based on conflict detection. Put simply, it performs the operation first regardless of risk. If no other thread contends for the shared data, the operation succeeds outright. If the data is contended and a conflict occurs, some compensating measure is taken; the most common is to keep retrying until the operation completes without contention. Because this strategy does not need to suspend and block threads, it is called non-blocking synchronization (Non-Blocking Synchronization), and code written this way is often called lock-free (Lock-Free) programming.
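The retry-until-success strategy can be sketched with AtomicInteger::compareAndSet. This is an illustrative example, assuming several threads incrementing one shared counter; CasRetryDemo, increment and run are hypothetical names, and in practice AtomicInteger::incrementAndGet already does this work.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CasRetryDemo {
    /** Optimistic increment: read, compute, then publish only if nobody interfered. */
    static int increment(AtomicInteger shared) {
        while (true) {
            int current = shared.get();
            int next = current + 1;
            if (shared.compareAndSet(current, next)) {
                return next; // no conflict, the update is done
            }
            // Another thread changed the value first: retry with a fresh read.
        }
    }

    /** Starts the given number of threads and returns the final counter value. */
    static int run(int threads, int perThread) {
        AtomicInteger shared = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    increment(shared);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println(run(4, 1000)); // 4000
    }
}
```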

· Compare-and-swap (Compare-and-Swap, hereinafter CAS)

If a variable V has the value A when it is first read, and still has the value A when it is checked just before assignment, does that mean no other thread has changed it? No, it does not: if the value was changed to B in between and then changed back to A, the CAS operation will wrongly conclude that it was never changed. This flaw is known as the “ABA problem” of CAS operations.

The solution is to attach a version number to the value.
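One way to attach a version number in Java is java.util.concurrent.atomic.AtomicStampedReference, which compares an integer stamp along with the reference. The sketch below is an illustration, not something the original text prescribes: it replays an A to B to A change on a single thread and then attempts a CAS with stale information. The class and method names are made up.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicStampedReference;

final class AbaDemo {
    /** A plain CAS cannot tell that the value went A -> B -> A in between. */
    static boolean plainCasAfterAba() {
        AtomicReference<String> ref = new AtomicReference<>("A");
        String seen = ref.get();        // the slow thread reads A
        ref.compareAndSet("A", "B");    // meanwhile: A -> B
        ref.compareAndSet("B", "A");    //            B -> A
        return ref.compareAndSet(seen, "C"); // succeeds although the value was changed
    }

    /** With a stamp acting as version number, the stale CAS is rejected. */
    static boolean stampedCasAfterAba() {
        AtomicStampedReference<String> ref = new AtomicStampedReference<>("A", 0);
        String seen = ref.getReference();
        int seenStamp = ref.getStamp();       // version 0
        ref.compareAndSet("A", "B", 0, 1);    // A -> B, version 1
        ref.compareAndSet("B", "A", 1, 2);    // B -> A, version 2
        return ref.compareAndSet(seen, "C", seenStamp, seenStamp + 1); // version mismatch
    }

    public static void main(String[] args) {
        System.out.println(plainCasAfterAba());   // true
        System.out.println(stampedCasAfterAba()); // false
    }
}
```

String literals are used as values because these classes compare references with ==, and identical literals are the same interned object.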

3. Synchronization-free schemes

Synchronization is only a means of guaranteeing correctness when shared data is contended. If a method involves no shared data in the first place, it needs no synchronization to be correct.

Reentrant code

This kind of code is also called pure code (Pure Code). It can be interrupted at any point during execution to run another piece of code (including a recursive call to itself), and once control returns, the original program shows no errors and its results are unaffected.
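A small contrast may help here. In the sketch below, factorial depends only on its argument, so interleaving or recursion cannot disturb it, while nextId reads and writes shared state and therefore is not reentrant. The names are made up for illustration.

```java
final class ReentrantCodeDemo {
    private static int calls = 0; // shared mutable state

    /** Reentrant: uses only its parameter and locals, so the same input gives the same result. */
    static long factorial(int n) {
        return n <= 1 ? 1 : n * factorial(n - 1);
    }

    /** Not reentrant: the result depends on, and changes, state shared between callers. */
    static int nextId() {
        return ++calls;
    }

    public static void main(String[] args) {
        System.out.println(factorial(5)); // 120
        System.out.println(factorial(5)); // 120 again, regardless of what ran in between
        System.out.println(nextId() != nextId()); // true: two calls, two different results
    }
}
```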

Thread-local storage

The java.lang.ThreadLocal class implements thread-local storage. Each thread's Thread object holds a ThreadLocalMap, which stores a set of key-value pairs keyed by ThreadLocal.threadLocalHashCode with the thread-local variable as the value. A ThreadLocal object is the access point into the current thread's ThreadLocalMap: each ThreadLocal object carries a unique threadLocalHashCode, and that value is used to look up the corresponding thread-local variable in the thread's key-value pairs.
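From the caller's side this looks as follows: two threads use the same ThreadLocal object but each sees only its own value. A minimal sketch with made-up names:

```java
final class ThreadLocalDemo {
    // One ThreadLocal object, but every thread gets its own slot, initially 0.
    private static final ThreadLocal<Integer> LOCAL = ThreadLocal.withInitial(() -> 0);

    /** Returns { value seen by the calling thread, value seen by the worker thread }. */
    static int[] run() {
        int[] seenByWorker = new int[1];
        LOCAL.set(1); // stored in the calling thread's map only

        Thread worker = new Thread(() -> {
            LOCAL.set(LOCAL.get() + 42); // the worker starts from its own initial value 0
            seenByWorker[0] = LOCAL.get();
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return new int[] { LOCAL.get(), seenByWorker[0] };
    }

    public static void main(String[] args) {
        int[] result = run();
        System.out.println(result[0] + " " + result[1]); // 1 42
    }
}
```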

Lock optimization

Spin locks

If the physical machine has more than one processor or processor core, so that two or more threads can run in parallel, we can ask the thread that requested the lock later to “wait a moment” without giving up its processor time, and see whether the thread holding the lock releases it soon. To make the thread wait, we simply have it execute a busy loop (spin). This technique is called a spin lock.

Spin waiting must have a limit. If the thread exceeds the allowed number of spins and still has not acquired the lock, it should be suspended in the traditional way.
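The idea of bounded spinning can be sketched in user code with an AtomicBoolean. This is a toy model and not the virtual machine's implementation: the spin limit of 10 is an assumption chosen for illustration, Thread.yield() merely stands in for suspending the thread, and the lock is not reentrant. BoundedSpinLock and count are hypothetical names.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class BoundedSpinLock {
    private static final int SPIN_LIMIT = 10; // assumed limit, for illustration only
    private final AtomicBoolean held = new AtomicBoolean(false);

    void lock() {
        int spins = 0;
        while (!held.compareAndSet(false, true)) { // busy loop: keep the processor
            if (++spins > SPIN_LIMIT) {
                Thread.yield(); // stand-in for falling back to suspending the thread
                spins = 0;
            }
        }
    }

    void unlock() {
        held.set(false);
    }

    /** Several threads increment a plain int under the spin lock; returns the total. */
    static int count(int threads, int perThread) {
        BoundedSpinLock lock = new BoundedSpinLock();
        int[] counter = new int[1];
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    lock.lock();
                    try {
                        counter[0]++;
                    } finally {
                        lock.unlock();
                    }
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return counter[0];
    }

    public static void main(String[] args) {
        System.out.println(count(4, 1000)); // 4000
    }
}
```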

JDK 6 introduced adaptive spinning. Adaptive means that the spin time is no longer fixed; it is determined by the previous spin time on the same lock and by the state of the lock's owner. If, on the same lock object, a spin wait has just succeeded in acquiring the lock and the thread holding the lock is running, the virtual machine assumes this spin is also likely to succeed and lets it last relatively longer, for example 100 iterations of the busy loop. Conversely, if spinning rarely succeeds for a given lock, the virtual machine may skip spinning altogether on later acquisitions to avoid wasting processor resources.

Lock elimination

Lock elimination means that the virtual machine's just-in-time compiler, at run time, removes locks from code that requests synchronization but in which it detects that no contention on shared data is possible. The main basis for lock elimination is the data from escape analysis: if the compiler determines that none of the heap data in a piece of code can escape and be accessed by other threads, it can treat that data as if it were on the stack, that is, as thread-private, and synchronization is then unnecessary.
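A typical candidate looks like the method below: StringBuffer::append is a synchronized method, but the buffer never leaves the method, so no other thread can ever contend for its lock. Whether the locks are actually removed depends on the JIT compiler and its escape analysis at run time; the code only shows the shape of code that qualifies. The class name is made up.

```java
final class LockEliminationDemo {
    static String concat(String s1, String s2, String s3) {
        StringBuffer sb = new StringBuffer(); // local object, never escapes this method
        sb.append(s1); // each append() locks sb ...
        sb.append(s2); // ... but no other thread can ever see sb,
        sb.append(s3); // so the JIT compiler may drop these locks
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(concat("a", "b", "c")); // abc
    }
}
```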

Lock coarsening

If a series of consecutive operations repeatedly locks and unlocks the same object, or the locking even occurs inside a loop body, the frequent mutual exclusion synchronization causes unnecessary performance loss even when no threads contend. If the virtual machine detects such a string of fragmented operations all locking the same object, it extends (coarsens) the scope of the lock to cover the whole sequence of operations.
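The transformation can be pictured by writing both forms by hand. The sketch below is only an illustration of the idea: fineGrained locks on every iteration, and coarsened is roughly what that code behaves like after the lock scope has been widened. Whether and when the virtual machine does this is up to the JIT compiler; the names are made up.

```java
final class LockCoarseningDemo {
    private final Object lock = new Object();
    private int total = 0;

    /** Locks and unlocks the same object on every iteration. */
    int fineGrained(int n) {
        for (int i = 0; i < n; i++) {
            synchronized (lock) {
                total += i;
            }
        }
        synchronized (lock) {
            return total;
        }
    }

    /** Hand-written equivalent with the lock scope widened around the whole loop. */
    int coarsened(int n) {
        synchronized (lock) {
            for (int i = 0; i < n; i++) {
                total += i;
            }
            return total;
        }
    }

    public static void main(String[] args) {
        System.out.println(new LockCoarseningDemo().fineGrained(5)); // 10
        System.out.println(new LockCoarseningDemo().coarsened(5));   // 10
    }
}
```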

Lightweight locks

“Lightweight” is relative to traditional locks implemented with operating system mutexes, which are therefore called “heavyweight” locks.

Object header of the HotSpot virtual machine (Object Header)

When code is about to enter a synchronized block and the synchronization object is not locked (lock flag bits “01”), the virtual machine first creates a space called a lock record (Lock Record) in the current thread's stack frame, used to store a copy of the lock object's current Mark Word.

The virtual machine then uses a CAS operation to try to update the object's Mark Word to a pointer to the Lock Record. If the update succeeds, the thread owns the lock on the object, and the lock flag bits (the last two bits of the Mark Word) change to “00”, indicating that the object is in the lightweight-locked state. If the update fails, at least one other thread is competing with the current thread for the lock on the object.

In that case the virtual machine first checks whether the object's Mark Word points into the current thread's stack frame. If it does, the current thread already owns the lock on the object and can go straight into the synchronized block; otherwise the lock object has been taken by another thread. If two or more threads contend for the same lock, the lightweight lock is no longer effective and must inflate into a heavyweight lock. The lock flag bits change to “10”, the Mark Word now stores a pointer to the heavyweight lock (mutex), and threads waiting for the lock must enter the blocked state.

Unlocking is also done with a CAS operation. If the object's Mark Word still points to the thread's lock record, a CAS operation swaps the object's current Mark Word with the Displaced Mark Word copied into the thread. If the swap succeeds, the whole synchronization completes successfully. If it fails, another thread has tried to acquire the lock, so the suspended thread must be woken up as the lock is released.
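The locking and unlocking steps above can be mimicked with a toy model in which an AtomicReference plays the role of the Mark Word. This is a heavily simplified illustration and not HotSpot's code: the flag bits are represented by which object the reference holds, reentry counting and real inflation are left out, and every name in it is hypothetical.

```java
import java.util.concurrent.atomic.AtomicReference;

final class LightweightLockModel {
    enum Outcome { ACQUIRED, ALREADY_OWNED, INFLATE }

    /** Stand-in for the Lock Record in the owning thread's stack frame. */
    static final class LockRecord {
        final Thread owner = Thread.currentThread();
        final Object displacedMarkWord; // copy of the Mark Word taken before locking
        LockRecord(Object displacedMarkWord) { this.displacedMarkWord = displacedMarkWord; }
    }

    static final Object UNLOCKED = "unlocked (flag 01)";
    // Toy Mark Word: holds UNLOCKED, or a reference to the owner's LockRecord (flag 00).
    private final AtomicReference<Object> markWord = new AtomicReference<>(UNLOCKED);

    Outcome enter() {
        LockRecord record = new LockRecord(UNLOCKED);
        if (markWord.compareAndSet(UNLOCKED, record)) {
            return Outcome.ACQUIRED; // CAS succeeded: this thread owns the lock
        }
        Object current = markWord.get();
        if (current instanceof LockRecord && ((LockRecord) current).owner == Thread.currentThread()) {
            return Outcome.ALREADY_OWNED; // Mark Word points at this thread's record
        }
        return Outcome.INFLATE; // contention: a real VM would inflate to a heavyweight lock
    }

    boolean exit() {
        Object current = markWord.get();
        if (!(current instanceof LockRecord) || ((LockRecord) current).owner != Thread.currentThread()) {
            return false;
        }
        // Swap the displaced Mark Word back in with a CAS.
        return markWord.compareAndSet(current, ((LockRecord) current).displacedMarkWord);
    }

    /** Helper: what does a second thread see when it tries to enter? */
    static Outcome enterFromOtherThread(LightweightLockModel lock) {
        Outcome[] result = new Outcome[1];
        Thread other = new Thread(() -> result[0] = lock.enter());
        other.start();
        try {
            other.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return result[0];
    }

    public static void main(String[] args) {
        LightweightLockModel lock = new LightweightLockModel();
        System.out.println(lock.enter());               // ACQUIRED
        System.out.println(lock.enter());               // ALREADY_OWNED
        System.out.println(enterFromOtherThread(lock)); // INFLATE
        System.out.println(lock.exit());                // true
    }
}
```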


Biased locking

If a lightweight lock uses CAS operations to eliminate the mutex used for synchronization when there is no contention, a biased lock eliminates the entire synchronization when there is no contention, without even performing the CAS operation.

The lock is biased toward the first thread that acquires it. If no other thread acquires the lock afterward, the thread holding the biased lock never needs to synchronize again.

Suppose biased locking is enabled in the current virtual machine (the enabling parameter is -XX:+UseBiasedLocking, the default in HotSpot since JDK 6). When the lock object is acquired by a thread for the first time, the virtual machine sets the lock flag bits in the object header to “01” and the bias bit to “1”, indicating biased mode. At the same time it uses a CAS operation to record the ID of the acquiring thread in the object's Mark Word. If the CAS succeeds, then every time the thread holding the biased lock later enters a synchronized block guarded by this lock, the virtual machine performs no synchronization operations at all (such as locking, unlocking, or updating the Mark Word).

Once an object's consistent hash code has been computed, the object can no longer enter the biased state. If an object is currently biased and receives a request to compute its consistent hash code, its bias is revoked immediately and the lock inflates into a heavyweight lock. In the heavyweight lock implementation, the object header points to the heavyweight lock, and the ObjectMonitor class that represents the heavyweight lock has a field that records the Mark Word of the unlocked state (flag bits “01”), so the original hash code can be stored there.

copyright:author[Java program ape],Please bring the original link to reprint, thank you. https://en.javamana.com/2022/175/20210702150335488E.html