Microsoft Mixed Reality Development

There are many application development paths for Microsoft Mixed Reality platforms. This post covers OpenXR, Unity, StereoKit, MRTK, and more.

Overview

Target Devices

  • HoloLens only supports UWP applications
  • Windows MR headsets support both UWP and Win32 applications

WinRT and OpenXR APIs

Similarities

  • Provide applications and middleware (e.g. game engines) with access to the MR features built into the Windows OS
  • Depend on different runtimes (which convert OS data into API data) baked into the Windows OS
  • Require the Windows SDK to call the APIs
  • Support both C++/WinRT and C# interfaces

Differences

|                        | WinRT APIs                      | OpenXR APIs       |
| ---------------------- | ------------------------------- | ----------------- |
| Defined by             | Microsoft                       | Khronos Group     |
| Status                 | Legacy                          | Industry standard |
| OS runtime requirement | Windows Runtime                 | OpenXR runtime    |
| Documentation          | Windows.Perception.* namespaces | Xr…               |

A runtime converts lower-layer OS data into API-layer data; both the Windows Runtime and the OpenXR runtime are baked into the Windows OS.

WinRT API

The WinRT API, also called the Windows Runtime API, is a set of APIs distributed with the Windows SDK that gives applications access to runtime and OS data.

Specifically, the APIs in the following namespaces serve Mixed Reality usage.

| Namespace | Function | Main classes / entry points |
| --- | --- | --- |
| Windows.Perception.People | Classes that describe the user | EyesPose, HandPose, HeadPose |
| Windows.Perception.Spatial | Classes to reason about spatial relationships within the user's surroundings | SpatialAnchor, SpatialStationaryFrameOfReference |
| Windows.Perception.Spatial.Preview | Classes to track spatial nodes, letting the app reason about places and things in the surroundings | SpatialGraphInteropPreview::CreateLocatorForNode |
| Windows.Perception.Spatial.Surfaces | Classes to describe the surfaces observed in the user's surroundings and their triangle meshes | SpatialSurfaceObserver, SpatialSurfaceInfo, SpatialSurfaceMesh |
| Windows.UI.Input.Spatial | Classes that let apps react to gaze, hand gestures, motion controllers, and voice | SpatialPointerPose::TryGetAtTimestamp, SpatialInteractionManager::GetForCurrentView |
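
As a small taste of consuming these namespaces from C++/WinRT, the hypothetical helper below reads the head pose; `coordinateSystem` and `timestamp` are assumed to come from the app's frame of reference and frame prediction.

#include <winrt/Windows.Perception.h>
#include <winrt/Windows.Perception.People.h>
#include <winrt/Windows.Perception.Spatial.h>
#include <winrt/Windows.UI.Input.Spatial.h>

using namespace winrt::Windows::Perception;
using namespace winrt::Windows::Perception::Spatial;
using namespace winrt::Windows::UI::Input::Spatial;

// Query where the user's head is (and where it points) in the given
// coordinate system at a predicted timestamp.
void ReadHeadPose(SpatialCoordinateSystem const& coordinateSystem,
                  PerceptionTimestamp const& timestamp)
{
    SpatialPointerPose pose = SpatialPointerPose::TryGetAtTimestamp(coordinateSystem, timestamp);
    if (pose != nullptr)
    {
        auto headPosition = pose.Head().Position();        // HeadPose, from Windows.Perception.People
        auto headForward = pose.Head().ForwardDirection();
    }
}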

See this post for a more detailed introduction to HoloLens features and related APIs.

OpenXR API

OpenXR provides a generic interface for applications or engines to develop on HoloLens. The HoloLens APIs are wrapped by the OpenXR runtime and exposed as several tiers of extensions (see the sketch after this list):

  • OpenXR core API: systems, sessions, reference spaces
  • OpenXR KHR (official) extensions: Direct3D 11/12 integration
  • OpenXR EXT (multi-vendor) extensions: hand articulation, eye gaze
  • OpenXR MSFT (vendor) extensions: unbounded reference space, spatial anchors, hand interaction, scene understanding, other UWP CoreWindow APIs, other legacy WinRT APIs
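
As an illustrative sketch (not taken from the samples below), the helper here asks the OpenXR runtime which extensions it supports, so an app can decide at startup whether e.g. an MSFT extension may be requested at instance creation.

#include <openxr/openxr.h>
#include <cstring>
#include <vector>

// Enumerate the extensions the runtime offers and look for one by name.
bool IsExtensionSupported(const char* name)
{
    uint32_t count = 0;
    xrEnumerateInstanceExtensionProperties(nullptr, 0, &count, nullptr);

    std::vector<XrExtensionProperties> props(count, { XR_TYPE_EXTENSION_PROPERTIES });
    xrEnumerateInstanceExtensionProperties(nullptr, count, &count, props.data());

    for (auto const& p : props) {
        if (std::strcmp(p.extensionName, name) == 0) return true;
    }
    return false;
}

// e.g. IsExtensionSupported(XR_MSFT_UNBOUNDED_REFERENCE_SPACE_EXTENSION_NAME)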

Some resources are listed in the Reference section at the end of this post.

Native Development

Native development means developing the application directly with the WinRT or OpenXR APIs, without using any middleware such as game engines. This gives developers more access to the raw data of HoloLens, so they can build their own middleware or framework.

Both the WinRT and OpenXR APIs support building UWP or Win32 applications, and both support development in C++ (C++/WinRT for the WinRT APIs) or C#.

Samples using WinRT API (Legacy)

  • UWP sample based on WinRT (using C++/CX, C++/WinRT, C#)
  • Win32 sample based on WinRT (using C++/WinRT)

Since UWP and Win32 applications differ only in how they are packaged and their source code is mostly the same, this section mainly uses the UWP application sample written in C++/WinRT as the reference.

Basic Hologram

Main Working Procedure

  1. Start: the wWinMain function in AppView.cpp (see the sketch after this list).
    1. Create an IFrameworkView:
      1. Create the ApplicationView
      2. Create the CoreWindow
      3. Create m_holographicSpace for the CoreWindow
    2. Start the CoreApplication.
  2. Update: the Update and Render functions in AppMain.cpp.
    1. Update:
      1. Create a new holographicFrame from m_holographicSpace
      2. Get a prediction of where the holographic cameras will be when this frame is presented, via holographicFrame.CurrentPrediction()
      3. Recreate resource views and depth buffers as needed
      4. Get the coordinate system to use as the basis for rendering
      5. Process gaze and gesture input
      6. Process time-based updates in the StepTimer class, e.g. positioning and rotating holograms
      7. Update the constant buffer data
      8. Return the holographicFrame
    2. Render: takes the holographicFrame and renders the current frame to each holographic camera, according to the current app and spatial positioning state
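
For reference, step 1 starts from an entry point roughly like the following (simplified; AppViewSource is the template's IFrameworkViewSource implementation).

// AppView.cpp (entry point, simplified)

#include "AppView.h"

using namespace winrt::Windows::ApplicationModel::Core;

// wWinMain hands an IFrameworkViewSource to CoreApplication, which then
// creates the CoreWindow and drives the IFrameworkView lifecycle
// (Initialize, SetWindow, Load, Run).
int __stdcall wWinMain(HINSTANCE, HINSTANCE, PWSTR, int)
{
    winrt::init_apartment();
    CoreApplication::Run(AppViewSource());
    return 0;
}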

Classes in the Windows.Graphics.Holographic and Windows.Perception.* namespaces

| Name | Description |
| --- | --- |
| HolographicSpace | Portal into the holographic world: controls immersive rendering, receives holographic camera data, and provides access to spatial reasoning APIs. |
| SpatialLocator | Represents the MR device, tracks its motion, and provides coordinate systems relative to its location; can create a stationary frame of reference with its origin placed at the device's position when the app is launched. |
| SpatialAnchor | Provides coordinate systems, with or without easing adjustments applied; can be persisted using the SpatialAnchorStore class. |
| SpatialSurfaceObserver | Provides information about surfaces in application-specified regions of space near the user, in the form of SpatialSurfaceInfo objects. |
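
As a minimal sketch of how these classes come together at startup (assuming `window` is the app's CoreWindow, and the helper name and members are illustrative):

#include <winrt/Windows.Graphics.Holographic.h>
#include <winrt/Windows.Perception.Spatial.h>
#include <winrt/Windows.UI.Core.h>

using namespace winrt::Windows::Graphics::Holographic;
using namespace winrt::Windows::Perception::Spatial;
using namespace winrt::Windows::UI::Core;

// Hypothetical helper: create the holographic space for the app window and
// a stationary frame of reference anchored at the device's launch position.
void AppMain::InitializeHolographicScene(CoreWindow const& window)
{
    // Portal into the holographic world for this CoreWindow.
    m_holographicSpace = HolographicSpace::CreateForCoreWindow(window);

    // The SpatialLocator tracks the device; the stationary frame of
    // reference supplies the coordinate system used for rendering.
    SpatialLocator locator = SpatialLocator::GetDefault();
    m_stationaryReferenceFrame =
        locator.CreateStationaryFrameOfReferenceAtCurrentLocation();
}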

Spatial Mapping Sample

  1. Set up a SpatialSurfaceObserver (a setup sketch follows this list)
    1. Ensure the app has the user's permission for spatial perception
    2. Provide spatial volumes (e.g. a cube or sphere) to define the regions of interest in which the app wants to receive spatial mapping data, using the SetBoundingVolumes method
      • volume with a world-locked spatial coordinate system: for a fixed region of the physical world
      • volume with a body-locked spatial coordinate system: for a region that moves (but does not rotate) with the user
    3. Register for the ObservedSurfacesChanged event
  2. Retrieve spatial surface information
    • subscribing to the event: for a fixed region of the physical world
    • polling: for a dynamic region of the physical world
    • SpatialSurfaceInfo describes a single extant spatial surface, including a unique ID, bounding volume, and time of last change
    • GetObservedSurfaces of the observer returns a map of <GUID, SpatialSurfaceInfo>
  3. Process the asynchronous mesh request (sketched after the class table below)
    1. TryComputeLatestMeshAsync of each SpatialSurfaceInfo asynchronously returns a SpatialSurfaceMesh object, which contains several SpatialSurfaceMeshBuffer objects representing the triangle mesh vertex data
    2. Apply hole filling, hallucination removal, smoothing, and plane finding to the mesh data
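
The following sketch wires up steps 1 and 2. The helper is hypothetical; the bounding-box size and the fire-and-forget coroutine shape are illustrative.

#include <winrt/Windows.Foundation.h>
#include <winrt/Windows.Perception.Spatial.h>
#include <winrt/Windows.Perception.Spatial.Surfaces.h>

using namespace winrt;
using namespace winrt::Windows::Perception::Spatial;
using namespace winrt::Windows::Perception::Spatial::Surfaces;

// Permission, region of interest, and change notifications.
// In a real app, keep the observer alive as a class member.
fire_and_forget SetUpSurfaceObserver(SpatialCoordinateSystem coordinateSystem)
{
    // 1.1 Ensure the app has spatial-perception permission.
    auto status = co_await SpatialSurfaceObserver::RequestAccessAsync();
    if (status != SpatialPerceptionAccessStatus::Allowed) co_return;

    // 1.2 Define a 20 m world-locked cube as the region of interest.
    SpatialSurfaceObserver observer;
    SpatialBoundingBox box{ { 0.0f, 0.0f, 0.0f }, { 20.0f, 20.0f, 20.0f } };
    observer.SetBoundingVolume(SpatialBoundingVolume::FromBox(coordinateSystem, box));

    // 1.3 / 2. React when the set of observed surfaces changes.
    observer.ObservedSurfacesChanged([](SpatialSurfaceObserver const& sender, auto&&)
    {
        // GetObservedSurfaces returns a map of <GUID, SpatialSurfaceInfo>.
        for (auto const& pair : sender.GetObservedSurfaces())
        {
            SpatialSurfaceInfo info = pair.Value();
            // ... request meshes for new or updated surfaces ...
        }
    });
}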

Classes in the Windows.Perception.Spatial.Surfaces namespace

| Class | Function |
| --- | --- |
| SpatialSurfaceObserver | Provides information about surfaces in application-specified regions of space near the user, in the form of SpatialSurfaceInfo objects. |
| SpatialSurfaceInfo | Describes a single extant spatial surface, including a unique ID, bounding volume, and time of last change; provides a SpatialSurfaceMesh asynchronously upon request. |
| SpatialSurfaceMeshOptions | Contains parameters used to customize the SpatialSurfaceMesh objects requested from SpatialSurfaceInfo. |
| SpatialSurfaceMesh | Represents the mesh data for a single spatial surface; the vertex positions, vertex normals, and triangle indices are contained in member SpatialSurfaceMeshBuffer objects. |
| SpatialSurfaceMeshBuffer | Wraps a single type of mesh data. |
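
And a sketch of step 3, requesting the mesh of one surface. The helper is hypothetical, and the triangle density (1000 triangles per cubic meter) is illustrative.

#include <winrt/Windows.Foundation.h>
#include <winrt/Windows.Perception.Spatial.Surfaces.h>

using namespace winrt;
using namespace winrt::Windows::Perception::Spatial::Surfaces;

// Asynchronously request the latest mesh of one observed surface.
fire_and_forget RequestMesh(SpatialSurfaceInfo surfaceInfo)
{
    SpatialSurfaceMeshOptions options;
    options.IncludeVertexNormals(true);

    SpatialSurfaceMesh mesh =
        co_await surfaceInfo.TryComputeLatestMeshAsync(1000.0, options);
    if (mesh == nullptr) co_return;

    // Each SpatialSurfaceMeshBuffer wraps one type of mesh data as a raw IBuffer.
    SpatialSurfaceMeshBuffer positions = mesh.VertexPositions();
    SpatialSurfaceMeshBuffer indices = mesh.TriangleIndices();
    // positions.Data() holds packed vertices in positions.Format();
    // scale them by mesh.VertexPositionScale() before use.
}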

Samples using OpenXR API

UWP and Win32 sample based on OpenXR

The general procedure of a sample using the OpenXR API (a code sketch follows the list):

  1. Application starts; decide which API version and extensions to use
  2. Create the instance
  3. Create the session; the main logic happens here
  4. Destroy the session and the instance, then close the application
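
A compressed sketch of that lifecycle using the OpenXR C API (the extension list is illustrative, and error handling is omitted):

#include <openxr/openxr.h>
#include <cstring>

void RunOpenXrApp()
{
    // 1. Decide which extensions to enable (illustrative choices).
    const char* extensions[] = { "XR_KHR_D3D11_enable", "XR_MSFT_unbounded_reference_space" };

    // 2. Create the instance.
    XrInstanceCreateInfo createInfo{ XR_TYPE_INSTANCE_CREATE_INFO };
    std::strcpy(createInfo.applicationInfo.applicationName, "SampleApp");
    createInfo.applicationInfo.apiVersion = XR_CURRENT_API_VERSION;
    createInfo.enabledExtensionCount = 2;
    createInfo.enabledExtensionNames = extensions;

    XrInstance instance{ XR_NULL_HANDLE };
    xrCreateInstance(&createInfo, &instance);

    // 3. Find the head-mounted display system and create the session;
    //    the main logic (frame loop, spaces, input) lives in the session.
    XrSystemGetInfo systemInfo{ XR_TYPE_SYSTEM_GET_INFO };
    systemInfo.formFactor = XR_FORM_FACTOR_HEAD_MOUNTED_DISPLAY;
    XrSystemId systemId{ XR_NULL_SYSTEM_ID };
    xrGetSystem(instance, &systemInfo, &systemId);

    // ... bind a D3D11 device (XrGraphicsBindingD3D11KHR), call
    //     xrCreateSession, run the frame loop, then xrDestroySession ...

    // 4. Destroy the instance and close the application.
    xrDestroyInstance(instance);
}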

More info about the working procedure can be found here.

Game Engine

Game engines are mainly used to develop games, but they are also well suited to developing Mixed Reality applications.

Unity3d

Unity3d has built-in XR plugins to build applications for different platforms and devices, as well as a Windows XR plugin that calls the WinRT APIs. Both are now deprecated or in legacy support.

Starting from version 2020.3 LTS, Unity3d has enabled the Unity OpenXR plugin for Mixed Reality development, and starting in Unity 2021.2, the Unity OpenXR plugin is the only supported Unity backend for targeting HoloLens 2 and Windows MR headsets. The Unity OpenXR plugin only provides access to the OpenXR core features defined in the standard. To get full access to the features of HoloLens and Windows MR, the Mixed Reality OpenXR plugin is required, which calls the MSFT vendor parts of the OpenXR APIs.

The Unity C# APIs provided by the two OpenXR plugins are as follows.

Unity has the MRTK to help develop applications.

Unreal

Starting from version 4.26, Unreal supports the OpenXR APIs for developing for HoloLens and Mixed Reality headsets.

Unreal also has the MRTK to help develop applications. MRTK-Unreal 0.12 supports OpenXR projects.

MRTK

MRTK, the Mixed Reality Toolkit, is a Microsoft-driven open-source project that provides a set of components and features used to accelerate cross-platform MR app development in Unity and Unreal.

Starting from version 2.7, MRTK-Unity supports the OpenXR plugin. The current plugin structure in Unity is as follows.

Library

StereoKit

StereoKit is a library built on top of the OpenXR APIs. It wraps those APIs and provides easy-to-use C# interfaces for developers, and it can be installed as a NuGet package.

Compared to developing on the native OpenXR APIs, StereoKit encapsulates the low-level APIs. Compared to game engines, StereoKit is more lightweight, more focused on XR development, and offers a code-only development workflow.

App Cycle

This section takes the Holographic D3D 11 UWP Application template (C++/WinRT) as an example to show the app cycle of a Mixed Reality app rendering a stable spinning cube in space.

General procedure

A holographic template application is like a Direct3D application. The main logic iterates the following actions for each frame:

  1. Maintain the holograms' pose and appearance
  2. Retrieve the poses of the stereo rendering cameras (for HoloLens, the left and right displays)
  3. Set the holograms' model transform and the stereo rendering cameras' view and projection matrices in the rendering graphics pipeline

App structure

Main loop

// AppView.cpp

// This method is called after the window becomes active.
// It calls Update() and Render() of the main app in a loop.
void AppView::Run() {
    HolographicFrame previousFrame{ nullptr };

    while (!m_windowClosed) {
        // Calculate the model pose under the SpatialStationaryFrameOfReference.
        HolographicFrame currentFrame = m_main->Update(previousFrame);

        // Calculate each rendering camera's viewProjection matrix, then
        // attach them and the model transform to the rendering graphics pipeline.
        m_main->Render(currentFrame);
        previousFrame = currentFrame;
    }
}

AppMain logic

// HolographicTemplateAppMain.cpp

// Updates the application state once per frame.
HolographicFrame HolographicTemplateAppMain::Update(HolographicFrame const& previousFrame) {
    // Start point for a new frame.
    HolographicFrame holographicFrame = m_holographicSpace.CreateNextFrame();

    // Get a prediction of where the holographic cameras will be when this frame is presented.
    HolographicFramePrediction prediction = holographicFrame.CurrentPrediction();

    // Set the pose for holograms according to the head gaze.
    // Poses are expressed in the SpatialStationaryFrameOfReference space.
    SpatialInteractionSourceState pointerState = m_spatialInputHandler->CheckForInput();
    if (pointerState != nullptr) {
        SpatialPointerPose pose = pointerState.TryGetPointerPose(
            m_stationaryReferenceFrame.CoordinateSystem());
        m_spinningCubeRenderer->PositionHologram(pose);
    }

    // Call the hologram's update function to set the model transform
    // (model space to SpatialStationaryFrameOfReference space).
    m_spinningCubeRenderer->Update(m_timer);

    return holographicFrame;
}

// Renders the current frame to each holographic camera,
// according to the current application and spatial positioning state.
bool HolographicTemplateAppMain::Render(HolographicFrame const& holographicFrame) {
    // Up-to-date frame predictions enhance the effectiveness of image stabilization and
    // allow more accurate positioning of holograms.
    holographicFrame.UpdateCurrentPrediction();
    HolographicFramePrediction prediction = holographicFrame.CurrentPrediction();

    bool atLeastOneCameraRendered = false;

    // For each rendering target (HoloLens display);
    // pCameraResources is the DX::CameraResources entry for this camera.
    for (HolographicCameraPose const& cameraPose : prediction.CameraPoses()) {
        // Set the viewProjection transform (SpatialStationaryFrameOfReference
        // space to rendering plane) based on the current camera pose.
        pCameraResources->UpdateViewProjectionBuffer(
            m_deviceResources,
            cameraPose,
            m_stationaryReferenceFrame.CoordinateSystem());

        // Attach the viewProjection transform for this camera to the graphics pipeline.
        bool cameraActive = pCameraResources->AttachViewProjectionBuffer(m_deviceResources);

        // Call the hologram's render function (which attaches the model
        // transform to the graphics pipeline) only when the camera is ready.
        if (cameraActive) {
            m_spinningCubeRenderer->Render();
            atLeastOneCameraRendered = true;
        }
    }
    return atLeastOneCameraRendered;
}

Hologram logic

// SpinningCubeRenderer.cpp

// This function positions the world-locked hologram two meters in front of the user's heading.
// The position is expressed in the SpatialStationaryFrameOfReference space.
void SpinningCubeRenderer::PositionHologram(SpatialPointerPose const& pointerPose)
{
    // Get the gaze direction relative to the given coordinate system.
    const float3 headPosition = pointerPose.Head().Position();
    const float3 headDirection = pointerPose.Head().ForwardDirection();

    // The hologram is positioned two meters along the user's gaze direction.
    constexpr float distanceFromUser = 2.0f; // meters
    const float3 gazeAtTwoMeters = headPosition + (distanceFromUser * headDirection);

    // This will be used as the translation component of the hologram's
    // model transform, expressed in the SpatialStationaryFrameOfReference space.
    m_position = gazeAtTwoMeters;
}

// Called once per frame. Rotates the cube, and calculates and sets the model transform.
// This is the transform from model space to SpatialStationaryFrameOfReference space.
void SpinningCubeRenderer::Update(DX::StepTimer const& timer) {
    // Use m_position and the elapsed rotation angle to build modelRotation and
    // modelTranslation. The resulting modelTransform converts coordinates from
    // model space to SpatialStationaryFrameOfReference space.
    const float radiansPerSecond = XMConvertToRadians(m_degreesPerSecond);
    const float radians = static_cast<float>(fmod(timer.GetTotalSeconds() * radiansPerSecond, XM_2PI));
    const XMMATRIX modelRotation = XMMatrixRotationY(-radians);
    const XMMATRIX modelTranslation = XMMatrixTranslationFromVector(XMLoadFloat3(&m_position));
    const XMMATRIX modelTransform = XMMatrixMultiply(modelRotation, modelTranslation);

    // Store it into the constant buffer data (transposed for HLSL).
    XMStoreFloat4x4(&m_modelConstantBufferData.model, XMMatrixTranspose(modelTransform));

    // Upload the buffer data into the GPU constant buffer.
    const auto context = m_deviceResources->GetD3DDeviceContext();
    context->UpdateSubresource(m_modelConstantBuffer.Get(), 0, nullptr, &m_modelConstantBufferData, 0, 0);
}

// Renders one frame using the vertex and pixel shaders.
void SpinningCubeRenderer::Render() {
    ID3D11DeviceContext* context = m_deviceResources->GetD3DDeviceContext();

    // Apply the model constant buffer to the vertex shader (register b0).
    context->VSSetConstantBuffers(
        0,
        1,
        m_modelConstantBuffer.GetAddressOf()
    );

    // ... set vertex/index buffers and shaders, then draw the cube ...
}

CameraResources logic

// CameraResources.cpp

// Updates the view/projection constant buffer for a holographic camera.
// The viewProjection transform converts coordinates from the
// SpatialStationaryFrameOfReference space to the 2D rendering plane.
void DX::CameraResources::UpdateViewProjectionBuffer(
    std::shared_ptr<DX::DeviceResources> deviceResources,
    HolographicCameraPose const& cameraPose,
    SpatialCoordinateSystem const& coordinateSystem)
{
    // The system changes the viewport on a per-frame basis for system optimizations.
    auto viewport = cameraPose.Viewport();
    m_d3dViewport = CD3D11_VIEWPORT(viewport.X, viewport.Y, viewport.Width, viewport.Height);

    // The projection transform for each frame is provided by the HolographicCameraPose.
    // This cameraProjectionTransform converts coordinates from the rendering
    // camera space to the 2D rendering plane.
    HolographicStereoTransform cameraProjectionTransform = cameraPose.ProjectionTransform();

    // Get a container object with the view and projection matrices for
    // the given pose in the SpatialStationaryFrameOfReference space.
    // The contained viewTransform converts coordinates from the
    // SpatialStationaryFrameOfReference space to the rendering camera.
    auto viewTransformContainer = cameraPose.TryGetViewTransform(coordinateSystem);
    if (viewTransformContainer == nullptr) {
        return; // the coordinate system could not be located this frame
    }
    HolographicStereoTransform viewCoordinateSystemTransform = viewTransformContainer.Value();

    // Store the viewProjection matrices into the buffer data.
    // These viewProjection matrices convert coordinates from the
    // SpatialStationaryFrameOfReference space to the 2D rendering plane.
    ViewProjectionConstantBuffer viewProjectionConstantBufferData;
    XMStoreFloat4x4(&viewProjectionConstantBufferData.viewProjection[0],
        XMMatrixTranspose(XMLoadFloat4x4(&viewCoordinateSystemTransform.Left) *
                          XMLoadFloat4x4(&cameraProjectionTransform.Left)));
    XMStoreFloat4x4(&viewProjectionConstantBufferData.viewProjection[1],
        XMMatrixTranspose(XMLoadFloat4x4(&viewCoordinateSystemTransform.Right) *
                          XMLoadFloat4x4(&cameraProjectionTransform.Right)));

    // Upload the buffer data into the GPU constant buffer.
    const auto context = deviceResources->GetD3DDeviceContext();
    context->UpdateSubresource(m_viewProjectionConstantBuffer.Get(), 0, nullptr,
        &viewProjectionConstantBufferData, 0, 0);
}


// Gets the viewProjection constant buffer for the HolographicCamera and attaches it to the shader pipeline.
bool DX::CameraResources::AttachViewProjectionBuffer(std::shared_ptr<DX::DeviceResources>& deviceResources) {
    ID3D11DeviceContext* context = deviceResources->GetD3DDeviceContext();

    // Loading is asynchronous: the buffer must exist before it can be attached.
    if (m_viewProjectionConstantBuffer == nullptr) {
        return false;
    }

    context->RSSetViewports(1, &m_d3dViewport);
    context->VSSetConstantBuffers(
        1,
        1,
        m_viewProjectionConstantBuffer.GetAddressOf()
    );
    return true;
}
// Namespace: Windows.Graphics.Holographic

// Represents a holographic scene, with one or more holographic cameras rendering its content.
class HolographicSpace {
    // Creates a HolographicFrame for the next frame to display.
    // Apps use the HolographicFrame returned here to find out the predicted positions of
    // each HolographicCamera at the time of frame display,
    // and render their views based on that prediction.
    HolographicFrame CreateNextFrame();
};

class HolographicFrame {
    // Gets the most recent camera location prediction for the current HolographicFrame.
    HolographicFramePrediction CurrentPrediction();

    // Computes an updated prediction for the CurrentPrediction property.
    void UpdateCurrentPrediction();
};

class HolographicFramePrediction {
    // Gets the camera poses that correspond to the time specified by Timestamp.
    IVectorView<HolographicCameraPose> CameraPoses();

    // Gets the time when the camera poses are predicted to be displayed.
    PerceptionTimestamp Timestamp();
};

class HolographicCameraPose {
    // Viewport rectangle that the app must render to for this camera in this frame.
    Rect Viewport();

    // The stereo projection transforms for the left/right displays, analogous to camera intrinsics.
    HolographicStereoTransform ProjectionTransform();

    // Gets the stereo view transform for this camera pose,
    // expressed as a transform from the specified coordinate system to the left/right display.
    // This method returns null if the specified coordinate system cannot be located at the moment.
    IReference<HolographicStereoTransform> TryGetViewTransform(SpatialCoordinateSystem const& coordinateSystem);
};

// Contains view or projection transforms for stereo rendering.
// One transform is for the left display, and the other is for the right display.
class HolographicStereoTransform {
    float4x4 Left;
    float4x4 Right;
};

References

  1. MS Mixed Reality Docs, Core Concepts, Coordinate System
  2. MS Mixed Reality Docs, Core Concepts, Spatial anchors
  3. MS Mixed Reality Docs, Core Concepts, Scene understanding
  4. MS Mixed Reality Docs, Core Concepts, Spatial Mapping
  5. MS Mixed Reality Docs, Core Building Blocks, Spatial mapping in DirectX
  6. OpenXR Home Page
  7. OpenXR Specification
  8. OpenXR 1.0 Reference Guide
  9. YouTube video, Updates on OpenXR for Mixed Reality