.NET Metadata, IL, Type and Memory

Metadata and IL Language

Metadata and IL are the foundations of the CLR. Metadata describes the static structure of a program, and IL
describes its dynamic execution.

Metadata

Metadata is data that describes other, more complex data.

.NET metadata describes classes, types, properties, methods, fields, parameters, and attributes. The main categories of .NET metadata are:

| Type | Name |
| --- | --- |
| Definition Form | TypeDef, MethodDef, FieldDef, ModuleDef, PropertyDef |
| Reference Form | AssemblyRef, TypeRef, ModuleRef, MemberRef |
| Pointer Form | MethodPtr, FieldPtr, ParamPtr |
| Heap | #String, #Blob, #US, #GUID |

Below is the metadata that describes the Main function of class Program:

![Metadata for the Main function of class Program](Metadata.png)

IL Language

  • IL (Intermediate Language) is the language that exists after the source code is compiled and before JIT compilation in the CLR. It links a programming language that follows the CLS standard to machine code.
  • It is independent of the CPU instruction set. It is converted to machine code by the JIT compiler.
  • It is still object-oriented, and it still has classes, objects, inheritance, and polymorphism.

![IL code without reference](ILCodewithoutReference.png)

IL Grammar

    ldc.i4.2        // Push the integer 2 onto the evaluation stack
    stloc.0         // Pop the value from the evaluation stack and store it in local variable 0
    brtrue.s IL_XX  // Pop a value; if it is true (non-zero), jump to label IL_XX

![IL Command](ILCommand.png)

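How Metadata and IL Work Together

IL refers to metadata through metadata tokens. A metadata token is a 4-byte value: the high byte identifies the metadata table, and the low three bytes give the row index within that table. Below is IL code that references metadata.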

![IL Code](ILCodewithReferencetoMetadata.png)
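To make the token layout concrete, here is a minimal sketch that reads a token through reflection and splits it into its two parts. The `TokenDemo` class and its `Split` helper are hypothetical names introduced for illustration; `MemberInfo.MetadataToken` is the real reflection property.

```csharp
using System;
using System.Reflection;

public static class TokenDemo
{
    // Splits a 4-byte metadata token into its table id (high byte)
    // and its row index (low three bytes).
    public static (int Table, int Row) Split(int token)
        => ((token >> 24) & 0xFF, token & 0x00FFFFFF);

    public static void Main()
    {
        MethodInfo main = typeof(TokenDemo).GetMethod(nameof(Main));
        var (table, row) = Split(main.MetadataToken);

        // Method definitions live in the MethodDef table (0x06),
        // so the token has the form 0x06xxxxxx.
        Console.WriteLine($"token=0x{main.MetadataToken:X8} table=0x{table:X2} row={row}");
    }
}
```

The exact row number depends on how the assembly was compiled, so only the table byte is predictable here.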

The pipeline by which the JIT loads metadata and IL, compiles them to native code, and calls methods is as follows.

  • The JIT uses the class loader to load the class structure from metadata into the CORINFO_CLASS_STRUCT structure in the CLR.
  • After the class structure is loaded into CORINFO_CLASS_STRUCT, all methods are saved in MethodDesc entries. Each entry holds a pointer to the method's IL code. It also holds a pointer that triggers JIT compilation, called PreJitStub.
  • When a method in the class is called for the first time, the JIT compiles the IL code to native code (machine code) and replaces that method's MethodDesc entry with a jmp instruction that points to the address of the native code.
  • When the method is called again, the native code already exists, so the jmp instruction runs directly.
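The compile-on-first-call idea in the steps above can be modeled in ordinary C#. This is only an analogy under stated assumptions, not how the CLR is implemented: `MethodSlot` is a hypothetical class, a delegate stands in for native code, and the `compile` callback stands in for the JIT.

```csharp
using System;

// Hypothetical model of a method slot: it starts out pointing at a stub
// that "compiles" the method, then patches itself to point at the result.
public sealed class MethodSlot
{
    private Func<int, int> _entry;
    public int CompileCount { get; private set; }

    public MethodSlot(Func<Func<int, int>> compile)
    {
        _entry = arg =>
        {
            CompileCount++;       // stands in for the JIT compiling the IL
            _entry = compile();   // patch the slot, like the jmp to native code
            return _entry(arg);
        };
    }

    public int Invoke(int arg) => _entry(arg);
}

public static class JitStubDemo
{
    public static void Main()
    {
        var slot = new MethodSlot(() => x => x * 2);
        Console.WriteLine(slot.Invoke(21));     // 42, first call goes through the stub
        Console.WriteLine(slot.Invoke(4));      // 8, later calls skip the stub
        Console.WriteLine(slot.CompileCount);   // 1
    }
}
```

The point of the sketch is that the cost of compilation is paid once, and callers never need to know whether the slot has been patched yet.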

Initialization in CLR

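The CLR supports value types and reference types. Consider the following line of code:

    Person person = new Person();

  • Person person: declares a variable of type Person. For a reference type, the variable holds a reference to the instance; for a value type, it holds the data itself.
  • new Person(): creates and initializes the instance. The IL instruction depends on the kind of type.
    • initobj: used for a value type that is initialized without constructor arguments
      • Initializes the value in place
      • Sets value-type fields to 0 and reference-type fields to null
    • newobj: used for a reference type
      • Allocates space in the managed heap and initializes the type pointer and other members used by the CLR
      • Calls the .ctor constructor method
      • Returns the reference to the new object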
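The observable effect of the two instructions can be checked from C#. In this sketch `PointValue`, `Person`, and `InitDemo` are illustrative names, and the comments about which instruction is emitted describe what the C# compiler typically produces, which you can confirm with ILDASM.

```csharp
using System;

public struct PointValue      // value type
{
    public int X;
    public string Label;
}

public class Person           // reference type
{
    public string Name = "unknown";
}

public static class InitDemo
{
    public static void Main()
    {
        // Default-initializing a struct: compilers typically emit initobj here.
        PointValue p = new PointValue();
        Console.WriteLine(p.X);               // 0
        Console.WriteLine(p.Label == null);   // True

        // Creating a class instance: emitted as newobj, which allocates on the
        // managed heap, runs .ctor, and leaves a reference on the stack.
        Person person = new Person();
        Console.WriteLine(person.Name);       // unknown
    }
}
```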

Initialization of some special types

  • ldstr: loads a string literal
  • newarr: creates an array
    • only handles one-dimensional arrays
    • arrays with more than one dimension still require newobj

Call the method in CLR

IL has two keywords to describe the kind of method.

  • For a static method
    • Declared with the static keyword
    • Bound to its type, so the call target is resolved at compile time
    • Executed with the call instruction
  • For an instance method
    • Declared with the instance keyword
    • Bound to an object instance at run time
    • Usually executed with the callvirt instruction

Direct Calling

  • call: calls the method according to the static type of the reference. It is mainly used for non-virtual methods.
    • Special cases where call is used for a virtual method
      • Calling the virtual methods of System.Object, such as Equals and ToString
      • A value type (a sealed type) calling a virtual method
  • callvirt: calls the method according to the dynamic type (the actual object) of the reference. It is mainly used for virtual methods.
    • Special case where callvirt is used for a non-virtual method
      • On a reference type it is safer, because it checks the reference and throws a NullReferenceException when the reference is null
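The difference between static-type and dynamic-type dispatch shows up directly in C# behavior. The sketch below uses made-up `Animal` and `Dog` classes; the last part relies on the C# compiler emitting callvirt for a non-virtual instance method, which is its normal behavior for a call through a variable.

```csharp
using System;

public class Animal
{
    public string Kind() => "animal";          // non-virtual
    public virtual string Sound() => "...";    // virtual
}

public class Dog : Animal
{
    public new string Kind() => "dog";         // hides, does not override
    public override string Sound() => "woof";
}

public static class DispatchDemo
{
    public static void Main()
    {
        Animal a = new Dog();
        Console.WriteLine(a.Kind());    // animal: resolved by the static type Animal
        Console.WriteLine(a.Sound());   // woof: resolved by the runtime type Dog

        Animal none = null;
        try { none.Kind(); }            // the null check happens before the method body runs
        catch (NullReferenceException) { Console.WriteLine("null check fired"); }
    }
}
```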

Indirect Calling

  • calli: calls the method through a method pointer
    • Use the ldftn or ldvirtftn instruction to get the method pointer
    • Indirect calls can also be made through reflection, with MethodInfo.Invoke(), and through dynamic method delegates
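Ordinary safe C# does not write calli by hand, so the closest everyday equivalents are delegates and reflection. The following sketch shows both; `IndirectCallDemo` and `Square` are illustrative names.

```csharp
using System;
using System.Reflection;

public static class IndirectCallDemo
{
    public static int Square(int x) => x * x;

    public static void Main()
    {
        // A delegate wraps a method pointer; the compiler emits ldftn to obtain it.
        Func<int, int> viaDelegate = Square;
        Console.WriteLine(viaDelegate(5));    // 25

        // Reflection looks the method up in metadata and invokes it late-bound.
        MethodInfo method = typeof(IndirectCallDemo).GetMethod(nameof(Square));
        object result = method.Invoke(null, new object[] { 6 });
        Console.WriteLine(result);            // 36
    }
}
```

The reflection path is far slower than the delegate path because arguments are boxed into an object array and checked at run time.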

Some Useful Info and Tools

  • mscorlib.dll: Microsoft Standard Common Object Runtime Library
    • Contains the base class System.Object
  • Constructors
    • .ctor method: instance constructor
    • .cctor method: type constructor (static constructor)
  • Compile and Decompile Tools (run in the .NET SDK PowerShell)
    • .cs to .exe
      • csc: compiles a .cs file into a .dll or .exe, which contains metadata and IL
    • .cs to .il
      • reflector.exe (must be downloaded separately)
    • .il and .exe
      • ILASM.exe: IL Assembler, compiles .il to .exe
      • ILDASM.exe: IL Disassembler, decompiles .exe to .il
        • In ILDASM, press Ctrl+M to view the metadata of the loaded code.

Data types in memory

Common Type System

The CTS defines the types and rules shared by the programming languages built on .NET. It defines how each language's types map to IL.

For example, in C# the types int, char, and string correspond to System.Int32, System.Char, and System.String.

.NET Structure

.NET Specification

  • Common Type System (CTS): defines the basic types of the different .NET languages.
  • Common Language Specification (CLS): a subset of the CTS. The minimum set that a .NET language should support.
  • Common Intermediate Language (CIL): also known as MSIL. The intermediate language between a .NET language and native code. It is based on a heap and a stack, and it is compiled to native code by the JIT compiler.
  • Common Language Infrastructure (CLI): includes the above elements

The types covered by the CTS are shown in the following chart.

.NET Implementation

  • Common Language Runtime (CLR): controls the execution of code.
  • Framework Class Library (FCL): standard types for fundamental tasks. It is organized as a tree of namespaces whose root is System.
  • .NET Framework: the implementation of the above two elements on the Windows platform.

Primitive type in C# and FCL

| C# | FCL |
| --- | --- |
| int | System.Int32 |
| long | System.Int64 |
| float | System.Single |
| string | System.String |
| object | System.Object |

Value Type and Reference Type

Value Type

  • It includes: int, char, long, bool, float, double, byte, enum, and struct.
  • A local variable of a value type is allocated on the thread's stack
    • The value type itself contains the data
    • Assigning one value-type variable to another copies the data
  • Inherits from System.ValueType (which inherits from System.Object)
    • System.ValueType overrides the Equals method to compare the values of two instances instead of their addresses
  • Scenarios to use
    • Simple structure with no polymorphism requirement
    • Only stores data, with no behavior required
    • No inheritance requirement
    • Passing parameters and returning values
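Copy-on-assignment and value-based equality can be seen in a few lines. `Money` and `ValueTypeDemo` are hypothetical names chosen for this sketch.

```csharp
using System;

public struct Money
{
    public int Amount;
    public Money(int amount) { Amount = amount; }
}

public static class ValueTypeDemo
{
    public static void Main()
    {
        var a = new Money(10);
        var b = a;          // copies the data, not an address
        b.Amount = 99;      // changes only the copy

        Console.WriteLine(a.Amount);                   // 10
        Console.WriteLine(b.Amount);                   // 99
        Console.WriteLine(a.Equals(new Money(10)));    // True: ValueType.Equals compares field values
    }
}
```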

Reference type

  • It includes: class, interface, delegate, object, string, arrays such as int[], and so on.
  • The reference itself is allocated on the thread's stack, but the instance is allocated in the managed heap
    • The reference holds the address of its instance in the managed heap
      • Assigning one reference-type variable to another copies the address
    • Instances in the managed heap are managed by the GC (Garbage Collector)
      • Reference types are generally less efficient than value types because their instances must be managed by the GC
  • Inherits from System.Object
    • By default, Equals compares whether two references point to the same address
    • String is a special case. Equals on a string compares the value instead of the address. Strings are immutable, so every operation that changes a string produces a new string instance in the managed heap.
  • Scenarios to use
    • Inheritance, polymorphism, or behavior is required
    • Returning several values through parameters (ref, out)
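The reference semantics above, including the string special case, can be checked with a short sketch. `Counter` and `ReferenceTypeDemo` are illustrative names.

```csharp
using System;

public class Counter { public int Value; }

public static class ReferenceTypeDemo
{
    public static void Main()
    {
        var a = new Counter { Value = 1 };
        var b = a;              // copies the reference; both point to one instance
        b.Value = 2;
        Console.WriteLine(a.Value);                                 // 2
        Console.WriteLine(a.Equals(new Counter { Value = 2 }));     // False: default Equals compares references

        string s1 = "net";
        string s2 = new string(new[] { 'n', 'e', 't' });   // distinct instance, same characters
        Console.WriteLine(s1.Equals(s2));                   // True: string compares by value
        Console.WriteLine(ReferenceEquals(s1, s2));         // False: two separate heap objects

        string upper = s1.ToUpperInvariant();               // returns a new string
        Console.WriteLine(s1 + " " + upper);                // net NET
    }
}
```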

Relationship of 2 types

  • When a value type is inside a reference type (a field of a class), its data is allocated in the managed heap together with the rest of the instance
  • Define a method of the form public static implicit operator XX(YY y) to customize the conversion between different types
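A user-defined conversion might look like the following sketch. `Celsius` is a hypothetical type; the explicit operator is added here only to contrast it with the implicit one.

```csharp
using System;

public readonly struct Celsius
{
    public double Degrees { get; }
    public Celsius(double degrees) { Degrees = degrees; }

    // Lets a double be used wherever a Celsius is expected.
    public static implicit operator Celsius(double degrees) => new Celsius(degrees);

    // Converting back is explicit, so callers must request it with a cast.
    public static explicit operator double(Celsius c) => c.Degrees;
}

public static class ConversionDemo
{
    public static void Main()
    {
        Celsius c = 36.5;          // implicit conversion, no cast needed
        double raw = (double)c;    // explicit conversion
        Console.WriteLine(raw == c.Degrees);
    }
}
```

A common guideline is to make a conversion implicit only when it cannot lose information or throw.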

To convert between a value type and a reference type, we use boxing (from value to reference) and unboxing (from reference to value).

Note: unboxing can only be applied to a reference that was produced by boxing.

Boxing and unboxing reduce performance. Using generics avoids this cost.

Examples of boxing and unboxing

  • ArrayList
  • Hashtable
  • enum
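A minimal sketch of boxing, unboxing, and the generic alternative follows. `BoxingDemo` is an illustrative name; ArrayList and List<T> are the real collection types.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class BoxingDemo
{
    public static void Main()
    {
        int n = 42;
        object boxed = n;          // boxing: copies the value into a heap object
        int back = (int)boxed;     // unboxing: copies it back out
        Console.WriteLine(back);   // 42

        try { long wrong = (long)boxed; }    // a boxed int cannot be unboxed as long
        catch (InvalidCastException) { Console.WriteLine("invalid unbox"); }

        var untyped = new ArrayList { 1, 2 };    // each int is boxed on Add
        var typed = new List<int> { 1, 2 };      // stored as int, no boxing
        Console.WriteLine((int)untyped[0] + typed[1]);   // 3
    }
}
```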

Parameters Passing

| | Value Type | Reference Type |
| --- | --- | --- |
| Value Passing | Changes inside the method do not affect the caller | Changes to the object's members affect the caller, but assigning a new instance to the parameter does not |
| Reference Passing | Any change affects the caller | Any change affects the caller |

Passing by reference requires the ref or out keyword.

  • ref: the argument must be initialized before it is passed
  • out: the parameter must be assigned inside the method
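The four combinations of value and reference passing can be exercised in one short program. `Box`, `ParameterDemo`, and the method names are hypothetical.

```csharp
using System;

public class Box { public int Value; }

public static class ParameterDemo
{
    public static void ByValue(int n, Box box)
    {
        n = 100;                        // caller's int is unaffected
        box.Value = 100;                // caller sees this member change
        box = new Box { Value = -1 };   // caller does not see the new instance
    }

    public static void ByRef(ref int n, ref Box box)
    {
        n = 200;                        // caller's int changes
        box = new Box { Value = 200 };  // caller's variable now points here
    }

    public static void Produce(out int n) { n = 300; }   // out must be assigned before returning

    public static void Main()
    {
        int i = 1;
        var b = new Box { Value = 1 };

        ByValue(i, b);
        Console.WriteLine($"{i} {b.Value}");   // 1 100

        ByRef(ref i, ref b);
        Console.WriteLine($"{i} {b.Value}");   // 200 200

        Produce(out int o);
        Console.WriteLine(o);                  // 300
    }
}
```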

Memory Management

The .NET CLR divides memory into 3 parts.

  • Thread stack
    • Stores instances of value types
    • Managed by the OS
    • Released when the method that uses the value ends, which makes it very efficient
  • Garbage Collection Heap
  • Large Object Heap

Memory allocation

When an instance method is called, the address of the next instruction is pushed onto the thread's stack, followed by the method's local value-type variables. When the method finishes, those values are popped until the pointer to the next instruction is reached.

The GC Heap stores instances of reference types and is managed by the GC. Each instance contains a TypeHandle, which points to its method table in the Loader Heap.

The Loader Heap stores the metadata (types) and is managed by the AppDomain. Every type in the Loader Heap is a method table. The method table has a part called the Method Slot Table, which is a linked list containing the methods. When a method of a reference type (class) is called, the CLR follows the TypeHandle in the GC Heap to find the metadata that contains this type's definition in the Loader Heap. The CLR then looks up the method in the Method Slot Table of that method table and uses the JIT to compile the method's IL code to native code. The native code is stored in dynamic memory.

![Memory in Heap](MemoryInHeap.png)

Garbage Collection

If an object is referenced neither by other objects in the heap nor by data on the stack, it is regarded as garbage.

When memory runs low, the CLR automatically collects the garbage and releases the memory.

After releasing the memory, the GC moves the surviving objects into a contiguous space so that the heap stays compact. The generation of each surviving object increases by 1, up to generation 2. On the next collection, objects in generation 0 are checked first.

In C#, you can call GC.Collect to force the GC to reclaim memory. You can also implement the Finalize method or the Dispose method to customize what happens when an object is cleaned up.
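Deterministic cleanup with Dispose might be sketched as follows. `TempResource` is a hypothetical class that holds no real resource; IDisposable, the using statement, and GC.SuppressFinalize are the real mechanisms.

```csharp
using System;

public sealed class TempResource : IDisposable
{
    public bool Disposed { get; private set; }

    public void Dispose()
    {
        if (Disposed) return;        // Dispose should be safe to call twice
        Disposed = true;
        GC.SuppressFinalize(this);   // tells the GC no finalizer needs to run
    }
}

public static class DisposeDemo
{
    public static void Main()
    {
        TempResource seen;
        using (var r = new TempResource())
        {
            seen = r;
            Console.WriteLine(r.Disposed);    // False
        }                                      // Dispose runs here, even if an exception is thrown
        Console.WriteLine(seen.Disposed);     // True
    }
}
```

Unlike a finalizer, which runs at a time the GC chooses, Dispose runs at a point the caller controls.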

Methods to improve GC efficiency

  • Use Dispose instead of Finalize
  • Use the using keyword to call Dispose automatically
  • Choose the right type of GC: Workstation GC or Server GC
  • Use the WeakReference class for weak references
  • Use generics for value-type data to avoid boxing and unboxing
  • Set an initial capacity for collections
  • Override ToString in subclasses
  • Use String.Compare instead of ==; check str.Length == 0 instead of comparing with ""
  • Use foreach instead of for
  • Use a struct if a class is not necessary
  • Use is to check the type, and use as to convert the type
  • Use static readonly instead of const
  • Use one-dimensional arrays
  • Use multithreading in the system
  • Avoid throwing many exceptions. In catch blocks, catch the most specific exception first
  • Use FxCop to check the efficiency of the code

Reference

  1. Wang Tao, 《你必须知道的.NET(第二版)》 (What You Must Know About .NET, 2nd edition), Part 2.