.NET Metadata, IL, Type and Memory
Metadata and IL Language
Metadata and IL are the foundation of the CLR. Metadata describes the static structure of a program, and IL describes its dynamic execution.
Metadata
Metadata is data that describes other data.
In .NET, metadata describes classes, types, properties, methods, fields, parameters, and attributes. The main categories of .NET metadata are:
Category | Tables / heaps |
---|---|
Definition form | TypeDef, MethodDef, FieldDef, ModuleDef, PropertyDef |
Reference form | AssemblyRef, TypeRef, ModuleRef, MemberRef |
Pointer form | MethodPtr, FieldPtr, ParamPtr |
Heap | #Strings, #Blob, #US, #GUID |
The figure below shows the metadata that describes the `Main` method of class `Program`:
![Metadata for the Main method of class Program](Metadata.png)
IL Language
- IL (Intermediate Language) is what a .NET language compiler produces and what the CLR's JIT compiler consumes. It links programming languages that follow the CLS standard to machine code.
- It is independent of any CPU instruction set and is converted to machine code by the JIT compiler.
- It is still object-oriented: it has classes, objects, inheritance, and polymorphism.

![IL code without reference](ILCodewithoutReference.png)
IL Grammar
```
ldc.i4.2 // load the integer 2 onto the evaluation stack
```
![IL Command](ILCommand.png)
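A quick way to see IL grammar at work is to build a method from raw IL with `System.Reflection.Emit.DynamicMethod`. This is a minimal sketch of our own, not from the original text, and the method name `AddTwoAndThree` is hypothetical. `ldc.i4.2` and `ldc.i4.3` push constants onto the evaluation stack, `add` pops both and pushes their sum, and `ret` returns the value on top of the stack.

```csharp
using System;
using System.Reflection.Emit;

// A dynamic method with the signature: int AddTwoAndThree()
var method = new DynamicMethod("AddTwoAndThree", typeof(int), Type.EmptyTypes);
ILGenerator il = method.GetILGenerator();

il.Emit(OpCodes.Ldc_I4_2); // push the int32 constant 2 onto the evaluation stack
il.Emit(OpCodes.Ldc_I4_3); // push the int32 constant 3
il.Emit(OpCodes.Add);      // pop both values, push their sum
il.Emit(OpCodes.Ret);      // return the value on top of the stack

// Invoking the delegate makes the JIT compile this IL to native code.
var addTwoAndThree = (Func<int>)method.CreateDelegate(typeof(Func<int>));
int result = addTwoAndThree();
Console.WriteLine(result); // 5
```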
How Metadata and IL Work Together
IL refers to metadata through metadata tokens. A token is a 4-byte value: the high byte identifies the metadata table, and the low 3 bytes give the row number in that table. Below is IL code that references metadata.
![IL Code](ILCodewithReferencetoMetadata.png)
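The token layout can be checked with reflection, because `MemberInfo.MetadataToken` exposes the token of a type, method, field, or property. This sketch inspects members of `System.String` (an arbitrary choice) and splits each token into table number and row; `Decode` is a hypothetical helper. The expected table numbers are the ones listed in ECMA-335.

```csharp
using System;
using System.Reflection;

// Split a 4-byte metadata token into its table number (high byte)
// and its row number (low 3 bytes).
static (int Table, int Row) Decode(int token) =>
    ((int)((uint)token >> 24), token & 0x00FFFFFF);

Type stringType = typeof(string);
MethodInfo concat = stringType.GetMethod(nameof(string.Concat), new[] { typeof(string), typeof(string) })!;
FieldInfo empty = stringType.GetField(nameof(string.Empty))!;
PropertyInfo length = stringType.GetProperty(nameof(string.Length))!;

var typeDef = Decode(stringType.MetadataToken);
var methodDef = Decode(concat.MetadataToken);
var fieldDef = Decode(empty.MetadataToken);
var propertyDef = Decode(length.MetadataToken);

// ECMA-335 table numbers: 0x02 TypeDef, 0x04 Field, 0x06 MethodDef, 0x17 Property
Console.WriteLine($"String -> table 0x{typeDef.Table:X2}, row {typeDef.Row}");
Console.WriteLine($"Concat -> table 0x{methodDef.Table:X2}, row {methodDef.Row}");
Console.WriteLine($"Empty  -> table 0x{fieldDef.Table:X2}, row {fieldDef.Row}");
Console.WriteLine($"Length -> table 0x{propertyDef.Table:X2}, row {propertyDef.Row}");
```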
The pipeline by which the JIT loads metadata and IL, compiles them to native code, and calls methods:
- The JIT uses the class loader to load the class structure from metadata into a `CORINFO_CLASS_STRUCT` structure in the CLR.
- After the class structure is loaded into `CORINFO_CLASS_STRUCT`, all methods are recorded in `MethodDesc` structures. Each method's entry holds a pointer to the method's IL code, plus a pointer to a stub called `PreJitStub` that triggers JIT compilation.
- When a method of the class is called for the first time, the JIT compiles its IL code to native (machine) code and replaces the stub in that method's `MethodDesc` with a `jmp` instruction that points to the native code.
- When the method is called again, the native code already exists, so the `jmp` instruction goes straight to it without recompiling.
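The stub-and-patch idea above can be modelled in a few lines. This is a toy analogy, not how the CLR is implemented: the delegate variable `slot` stands in for the method's entry in `MethodDesc`, the first lambda plays the role of `PreJitStub`, and `compiledSquare` stands in for the generated native code (all names are hypothetical).

```csharp
using System;

int compileCount = 0;

// Stands in for the native code the JIT would generate.
Func<int, int> compiledSquare = x => x * x;

// The "method slot" initially points at a stub.
Func<int, int> slot = null!;
slot = x =>
{
    compileCount++;         // the expensive "compile" happens here, once
    slot = compiledSquare;  // patch the slot, like replacing the stub with a jmp
    return slot(x);         // run the freshly "compiled" code
};

int first = slot(3);   // goes through the stub and triggers "compilation"
int second = slot(4);  // goes straight to the compiled delegate
Console.WriteLine($"{first} {second} compiled {compileCount} time(s)"); // 9 16 compiled 1 time(s)
```

The caller always invokes `slot`, so it never needs to know whether compilation has already happened; only the first call pays the cost.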
Initialization in CLR
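The CLR supports value types and reference types. Consider this line of code:
```csharp
Person person = new Person();
```
- `Person person`: declares a local variable of the reference type `Person`. Locals are zero-initialized, so before assignment it holds `null`. For value types, the IL instruction that performs this zero-initialization is `initobj`, which:
  - sets value-type fields to `0`
  - sets reference-type fields to `null`
- `new Person()`: compiled to `newobj`, which:
  - allocates space on the managed heap and initializes the type-object pointer and other members used by the CLR
  - calls the `.ctor` constructor method
  - returns a reference to the new object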
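The sketch below substitutes built-in types for the `Person` class so it runs on its own: a value tuple (a struct) shows the zero-initialization that C# typically emits as `initobj` for `default`, and `StringBuilder` (a class) shows `newobj`. The mapping to IL instructions is what the C# compiler usually emits; the code itself only checks the resulting values.

```csharp
using System;
using System.Text;

// default for a struct: every field is zeroed (C# typically emits initobj here).
(int Age, string? Name) zeroed = default;

// A reference-type local that is never assigned an object is just null.
StringBuilder? notCreated = default;

// newobj: allocate on the managed heap, run the .ctor, return the reference.
var created = new StringBuilder("Person");

Console.WriteLine(zeroed.Age);           // 0
Console.WriteLine(zeroed.Name is null);  // True
Console.WriteLine(notCreated is null);   // True
Console.WriteLine(created.ToString());   // Person
```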
Initialization of some special types
- `ldstr`: creates a string from a string literal
- `newarr`: creates an array
  - only handles one-dimensional arrays
  - arrays with more than one dimension still require `newobj`
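These instructions correspond to ordinary C# code. In this sketch, the comments name the instruction the C# compiler normally emits for each line; that mapping is stated, not checked, by the code. Identical literals loaded with `ldstr` are interned, so they share one string object.

```csharp
using System;

// ldstr: both literals are interned, so they refer to the same object.
string a = "hello";
string b = "hello";
bool sameLiteralObject = object.ReferenceEquals(a, b);

// newarr: a one-dimensional array.
int[] vector = new int[3];

// No newarr form for 2-D arrays: the compiler emits newobj on int[,]'s constructor.
int[,] matrix = new int[2, 3];

Console.WriteLine(sameLiteralObject); // True
Console.WriteLine(vector.Rank);       // 1
Console.WriteLine(matrix.Rank);       // 2
Console.WriteLine(matrix.Length);     // 6
```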
Calling Methods in the CLR
IL has two keywords that describe the kind of method.
- Static methods
  - declared with the `static` keyword; bound at compile time
  - invoked with `call`
- Instance methods
  - declared with the `instance` keyword; bound at run time
  - invoked with `callvirt`
Direct Calling
- `call`: calls the method according to the static type of the reference; mainly used for non-virtual methods. Special cases where it is used for virtual methods:
  - calling a virtual method of `System.Object`, such as `Equals` or `ToString`
  - a value type (which is sealed) calling a virtual method
- `callvirt`: calls the method according to the dynamic type (the actual object) of the reference; mainly used for virtual methods. Special case for non-virtual methods:
  - on reference types it is safer, because it throws a `NullReferenceException` when the reference is `null`
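The difference between `call` and `callvirt` can be observed by emitting both against the same virtual method. This is our own experiment using `DynamicMethod`, and `BuildInvoker` is a hypothetical helper. It invokes `Object.ToString` on a string: `call` binds to `Object.ToString` itself, which returns the type name, while `callvirt` dispatches to the `String.ToString` override. A non-virtual `call` of a virtual method on an arbitrary object is unverifiable IL; modern .NET runs it because it does not verify IL, so treat this as a demonstration rather than a production pattern.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

MethodInfo objectToString = typeof(object).GetMethod(nameof(object.ToString), Type.EmptyTypes)!;

// Emits: ldarg.0; <call | callvirt> object::ToString(); ret
static Func<object, string> BuildInvoker(OpCode callKind, MethodInfo target)
{
    var method = new DynamicMethod("Invoke_" + callKind.Name, typeof(string), new[] { typeof(object) });
    ILGenerator il = method.GetILGenerator();
    il.Emit(OpCodes.Ldarg_0);   // push the object reference
    il.Emit(callKind, target);  // call or callvirt Object.ToString()
    il.Emit(OpCodes.Ret);
    return (Func<object, string>)method.CreateDelegate(typeof(Func<object, string>));
}

var viaCall = BuildInvoker(OpCodes.Call, objectToString);
var viaCallvirt = BuildInvoker(OpCodes.Callvirt, objectToString);

// call: Object.ToString runs, which returns the runtime type's name.
string staticDispatch = viaCall("hello");

// callvirt: dispatches on the real object, so String.ToString runs.
string dynamicDispatch = viaCallvirt("hello");

// callvirt checks for null before making the call.
bool threwOnNull = false;
try { viaCallvirt(null!); }
catch (NullReferenceException) { threwOnNull = true; }

Console.WriteLine($"{staticDispatch} | {dynamicDispatch} | {threwOnNull}");
```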
Indirect Calling
- `calli`: calls the method through a method pointer
  - use the `ldftn` or `ldvirtftn` instruction to get the method pointer
- Reflection-based: `MethodInfo.Invoke()` and dynamic method delegates
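The three indirect styles can be compared side by side on one target method. This sketch is our own; `Math.Abs(int)` is an arbitrary target and `AbsViaCalli` is a hypothetical name. The first part emits `ldftn` followed by `calli` through `DynamicMethod`, the second uses `MethodInfo.Invoke`, and the third uses a delegate, which C# creates with `ldftn` plus `newobj`.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

MethodInfo abs = typeof(Math).GetMethod(nameof(Math.Abs), new[] { typeof(int) })!;

// 1. calli: push a method pointer with ldftn, then call through it.
var dm = new DynamicMethod("AbsViaCalli", typeof(int), new[] { typeof(int) });
ILGenerator il = dm.GetILGenerator();
il.Emit(OpCodes.Ldarg_0);     // push the argument
il.Emit(OpCodes.Ldftn, abs);  // push a pointer to Math.Abs(int)
il.EmitCalli(OpCodes.Calli, CallingConventions.Standard, typeof(int), new[] { typeof(int) }, null);
il.Emit(OpCodes.Ret);
var absViaCalli = (Func<int, int>)dm.CreateDelegate(typeof(Func<int, int>));

// 2. Reflection: arguments and the return value are boxed as object.
int viaReflection = (int)abs.Invoke(null, new object[] { -7 })!;

// 3. Delegate: created with ldftn + newobj, invoked indirectly.
Func<int, int> absDelegate = Math.Abs;

int viaCalli = absViaCalli(-7);
int viaDelegate = absDelegate(-7);
Console.WriteLine($"{viaCalli} {viaReflection} {viaDelegate}"); // 7 7 7
```

Reflection is the most flexible of the three but also the slowest, because every call goes through argument boxing and runtime checks.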
Some Useful Info and Tools
- `mscorlib.dll`: Microsoft Standard Common Object Runtime Library
  - contains the base class `System.Object`
- Constructors
  - `.ctor` method: the instance constructor
  - `.cctor` method: the type (static) constructor
- Compile and decompile tools (run in the .NET SDK PowerShell)
  - `.cs` to `.exe`
    - csc: builds `.cs` files into a metadata module, a `.dll`, or an `.exe`
  - `.cs` to `.il`
    - reflector.exe (needs to be downloaded)
  - `.il` and `.exe`
    - ILASM.exe: IL Assembler, compiles `.il` to `.exe`
    - ILDASM.exe: IL Disassembler, decompiles `.exe` to `.il`
      - in ILDASM, press `Ctrl + M` to view the metadata of the code
Data types in memory
Common Type System
The CTS defines the types and rules shared by the programming languages built on .NET, and it defines how each language's types map to IL. For example, the C# types `int`, `char`, and `string` correspond to `System.Int32`, `System.Char`, and `System.String`.
.NET Structure
.NET Specification
- Common Type System (CTS): defines the basic types of the different .NET languages.
- Common Language Specification (CLS): a subset of the CTS; the minimum set of features a .NET language should support.
- Common Intermediate Language (CIL), also called MSIL: the intermediate language between .NET languages and native code. It is stack-based and is compiled to native code by the JIT compiler.
- Common Language Infrastructure (CLI): includes all of the above.

The types covered by the CTS are shown in the following chart.
.NET Implementation
- Common Language Runtime (CLR): manages the execution of code.
- Framework Class Library (FCL): standard types for fundamental tasks, organized as a namespace tree whose root is `System`.
- .NET Framework: the implementation of the two elements above on the Windows platform.
Primitive types in C# and FCL
C# | FCL |
---|---|
int | System.Int32 |
long | System.Int64 |
float | System.Single |
string | System.String |
object | System.Object |
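
The table can be confirmed at run time: a C# keyword and its FCL name resolve to the very same `System.Type`. This is a minimal sketch covering only the rows above.

```csharp
using System;

// Each C# keyword is an alias for an FCL type, so the Type objects are identical.
bool intIsInt32 = typeof(int) == typeof(Int32);
bool longIsInt64 = typeof(long) == typeof(Int64);
bool floatIsSingle = typeof(float) == typeof(Single);
bool stringIsString = typeof(string) == typeof(String);
bool objectIsObject = typeof(object) == typeof(Object);

Console.WriteLine(typeof(float).FullName); // System.Single
```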
Value Type and Reference Type
Value Type
- Includes `int`, `char`, `long`, `bool`, `float`, `double`, `byte`, `enum`, and `struct`
- Allocated on the thread's stack (when used as a local variable or parameter)
- A value-type variable contains the data itself
- Assigning one value-type variable to another copies the data
- Inherits from `System.ValueType` (which inherits from `System.Object`)
  - `ValueType` overrides the `Equals` method, so it compares the values of two instances instead of their addresses
- Scenarios to use
  - Simple structure with no polymorphism requirement
  - Only stores data, with no behavior required
  - No inheritance requirement
  - Parameter passing and return values
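The copy semantics and value-based equality can be shown with built-in structs. In this sketch, a value tuple (a mutable struct) demonstrates copying on assignment, and `KeyValuePair<,>`, which as far as we know does not override `Equals`, falls back to `ValueType.Equals` and compares field values.

```csharp
using System;
using System.Collections.Generic;

// A value tuple is a mutable struct: assignment copies the data.
(int X, int Y) original = (1, 2);
var copy = original;
copy.X = 10; // modifies only the copy

// ValueType.Equals compares the fields of two instances, not their addresses.
var left = new KeyValuePair<string, int>("age", 30);
var right = new KeyValuePair<string, int>("age", 30);
bool equalByValue = left.Equals(right);

// Inheritance chain: int -> System.ValueType -> System.Object
Type intBase = typeof(int).BaseType!;
Type valueTypeBase = typeof(ValueType).BaseType!;

Console.WriteLine($"{original.X} {copy.X} {equalByValue}"); // 1 10 True
```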
Reference type
- Includes `class`, `interface`, `delegate`, `object`, `string`, and arrays such as `int[]`
- The reference itself is allocated on the thread's stack, but the instance it points to is allocated on the managed heap
- A reference-type variable contains the address of its instance in the managed heap
- Assigning one reference-type variable to another copies the address
- Instances in the managed heap are managed by the GC (garbage collector)
- Reference types are less efficient than value types because they require the GC to manage them
- Inherits from `System.Object`
  - `Equals` by default checks whether two references point to the same address
  - `String` is special: its `Equals` compares values instead of addresses. Every operation that modifies a string creates a new string in the managed heap.
- Scenarios to use
  - Inheritance, polymorphism, and behavior are required
  - Returning several values through parameters (`ref`, `out`)
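The bullets above can be checked with a few built-in reference types. This minimal sketch uses an array for address copying, two `object` instances for default reference equality, and strings for value equality and immutability.

```csharp
using System;

// Arrays are reference types: assignment copies the address, not the data.
int[] first = { 1, 2, 3 };
int[] second = first;
second[0] = 100; // visible through both variables

// Default Equals for reference types compares addresses.
object a = new object();
object b = new object();
bool distinctObjectsEqual = a.Equals(b);

// String compares by value, even for two distinct string objects.
string s1 = "hello";
string s2 = new string(new[] { 'h', 'e', 'l', 'l', 'o' });
bool sameValue = s1 == s2;
bool sameObject = object.ReferenceEquals(s1, s2);

// Strings are immutable: ToUpperInvariant returns a new string.
string upper = s1.ToUpperInvariant();

Console.WriteLine($"{first[0]} {distinctObjectsEqual} {sameValue} {sameObject} {s1} {upper}");
// 100 False True False hello HELLO
```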
Relationship Between the Two Types
- When a value type is inside a reference type (for example, a field of a class), its data is allocated in the managed heap together with the rest of the object
- Use `static public implicit operator XX(YY y)` to write a method that customizes the conversion between different types

Converting between a value type and a reference type requires boxing (value to reference) and unboxing (reference to value).
Note: unboxing can only be applied to a reference that was produced by boxing a value of the same type.
Boxing and unboxing hurt performance; using generics avoids them.
Examples of boxing and unboxing
- ArrayList
- Hashtable
- enum
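
The sketch below shows boxing, the exact-type rule for unboxing, and why `ArrayList` boxes while `List<int>` does not. The collections and values are arbitrary choices for illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

int number = 42;
object boxed = number;     // boxing: copies the int into a new heap object
int unboxed = (int)boxed;  // unboxing: copies the value back out

// Unboxing must target the exact type that was boxed.
bool wrongTypeThrows = false;
try { _ = (long)boxed; }
catch (InvalidCastException) { wrongTypeThrows = true; }

// ArrayList stores object: each int is boxed on Add and unboxed on read.
var arrayList = new ArrayList { 1, 2, 3 };
int sumFromArrayList = 0;
foreach (object item in arrayList) sumFromArrayList += (int)item;

// List<int> stores ints directly: no boxing.
var genericList = new List<int> { 1, 2, 3 };
int sumFromList = 0;
foreach (int item in genericList) sumFromList += item;

Console.WriteLine($"{unboxed} {wrongTypeThrows} {sumFromArrayList} {sumFromList}"); // 42 True 6 6
```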
Parameter Passing
Passing | Value Type | Reference Type |
---|---|---|
By value | Changes inside the method don't affect the caller | Changes to the object's members inside the method affect the caller, but assigning a new object to the parameter doesn't |
By reference | Any change affects the caller | Any change affects the caller |

Passing by reference requires `ref` or `out`:
- `ref`: the variable must be initialized before it is passed
- `out`: the variable must be assigned inside the method
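
Each cell of the table can be reproduced with a small helper. In this sketch the local functions `ChangeValue`, `ChangeArrayElement`, `ReplaceArray`, `ReplaceArrayByRef`, and `Produce` are hypothetical, and an `int[]` stands in for any reference type.

```csharp
using System;

static void ChangeValue(int x) { x = 99; }                           // value type, by value
static void ChangeArrayElement(int[] arr) { arr[0] = 99; }           // reference type, by value: member change
static void ReplaceArray(int[] arr) { arr = new[] { 99 }; }          // reference type, by value: reassignment
static void ReplaceArrayByRef(ref int[] arr) { arr = new[] { 99 }; } // reference type, by reference
static void Produce(out int result) { result = 7; }                  // out: assigned inside the method

int value = 1;
ChangeValue(value);               // value stays 1

int[] data = { 1 };
ChangeArrayElement(data);         // data[0] becomes 99

int[] kept = { 1 };
ReplaceArray(kept);               // kept still points to the original array

int[] replaced = { 1 };           // ref requires initialization before the call
ReplaceArrayByRef(ref replaced);  // replaced now points to the new array

Produce(out int produced);        // out does not require prior initialization

Console.WriteLine($"{value} {data[0]} {kept[0]} {replaced[0]} {produced}"); // 1 99 1 99 7
```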
Memory Management
The .NET CLR manages three areas of memory:
- Thread stack
  - stores instances of value types
  - managed by the OS
  - released when the method ends, so it is highly efficient
- Garbage Collection (GC) Heap
- Large Object Heap
Memory allocation
When an instance method is called, the address of the next instruction (the return address) is pushed onto the thread's stack, and then the method's local value-type variables are pushed. When the method finishes, those values are popped until the return address is reached.

The GC Heap stores instances of reference types and is managed by the GC. Each instance contains a `TypeHandle` that points to its method table in the Loader Heap.

The Loader Heap stores metadata (types). Every type in the Loader Heap is represented by a method table, and the Loader Heap is managed by the AppDomain. The method table contains a part called the Method Slot Table, a list of the type's methods. When a method of a reference type (class) is called, the CLR follows the instance's `TypeHandle` from the GC Heap to the method table in the Loader Heap that holds the type's definition. The CLR then looks up the method in that method table's Method Slot Table and, on the first call, uses the JIT to compile the method's IL code to native code, which is stored in dynamically allocated memory.
![Memory in Heap](MemoryInHeap.png)
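The split between the GC Heap and the Large Object Heap can be observed with `GC.GetGeneration`. This sketch assumes the documented 85,000-byte threshold for large objects: a freshly allocated small array is normally in generation 0, while a large array goes to the LOH, which is collected only with full collections and is reported as an old generation rather than generation 0.

```csharp
using System;

var small = new byte[100];
int smallGeneration = GC.GetGeneration(small);  // normally 0: new small objects start in generation 0

var large = new byte[100_000];                  // at least 85,000 bytes: allocated on the Large Object Heap
int largeGeneration = GC.GetGeneration(large);  // LOH objects count as part of the oldest generation

Console.WriteLine($"small array: gen {smallGeneration}, large array: gen {largeGeneration}");
```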
Garbage Collection
An object is considered garbage when no other live object on the heap and no data on the stack refers to it.
When memory runs low, the CLR automatically collects garbage and releases the memory.
After releasing memory, the GC moves the surviving objects into contiguous space so the heap stays compact, and the generation of each surviving object increases by 1. In the next collection, generation 0 objects are checked first.
In C#, you can call `GC.Collect` to force a collection. You can also implement a `Finalize` method or a `Dispose` method to customize how an object's resources are released.
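Promotion can be watched directly by forcing collections with `GC.Collect` and reading `GC.GetGeneration`. This is a minimal sketch; exact GC behavior is a runtime implementation detail, but on a typical run a surviving object moves from generation 0 to 1 to 2.

```csharp
using System;

var survivor = new object();

int before = GC.GetGeneration(survivor);      // freshly allocated: normally generation 0
GC.Collect();                                 // force a full, blocking collection
int afterFirst = GC.GetGeneration(survivor);  // survived once: promoted by one generation
GC.Collect();
int afterSecond = GC.GetGeneration(survivor); // survived twice: promoted again

GC.KeepAlive(survivor); // keep the object reachable through both collections

Console.WriteLine($"{before} -> {afterFirst} -> {afterSecond} (max generation {GC.MaxGeneration})");
```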
Methods to improve GC efficiency
- Use `Dispose` instead of `Finalize`
- Use the `using` keyword so that `Dispose` is called automatically
- Choose the right type of GC: `Workstation GC` or `Server GC`
- Use the `WeakReference` class for weak references
- Use generics for value-type data to avoid boxing and unboxing
- Set an initial capacity for collections
- Override `ToString` in subclasses
- Use `String.Compare` instead of `==`; check `str.Length == 0` instead of comparing with `""`
- Use `foreach` instead of `for`
- Use a struct if a class is not necessary
- Use `is` to check a type and `as` to convert it
- Use `static readonly` instead of `const`
- Use one-dimensional arrays
- Use multithreading where the system benefits from it
- Avoid throwing lots of exceptions; in `catch` blocks, catch the most specific exception first
- Use `FxCop` to check the efficiency of the code
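Several of these tips can be shown in a few lines. This sketch uses `MemoryStream` and `List<int>` as arbitrary examples: `using` disposes the stream automatically, a preset capacity avoids resizing, `is`/`as` check and convert types without exceptions, and `String.Compare` plus a `Length` check replace `==` comparisons.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// using: Dispose is called automatically when the block ends, even on exceptions.
var stream = new MemoryStream();
using (stream)
{
    stream.WriteByte(42);
}
bool disposed = !stream.CanRead; // a disposed MemoryStream reports CanRead == false

// Preset the capacity so adding 1000 items never triggers a resize.
var numbers = new List<int>(capacity: 1000);
for (int i = 0; i < 1000; i++) numbers.Add(i);

// is checks the type; as converts and yields null instead of throwing.
object value = "text";
bool isString = value is string;
string? asString = value as string;
Uri? asUri = value as Uri;

// String.Compare and a Length check instead of == comparisons.
int comparison = string.Compare("apple", "APPLE", StringComparison.OrdinalIgnoreCase);
string empty = "";
bool isEmpty = empty.Length == 0;

Console.WriteLine($"{disposed} {numbers.Capacity} {isString} {asUri is null} {comparison} {isEmpty}");
```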
Reference
- Wang Tao, 《你必须知道的.NET(第二版)》 (*You Must Know .NET*, 2nd edition), Part 2