WebXR Hand Tracking — From Joints to Gestures
Hand tracking is one of the most transformative features of modern XR. Instead of reaching for a controller, you reach for the object directly. In this article, we explore the WebXR Hand Tracking API from the ground up: how to access joint data, render hands visually, detect gestures, and build natural interaction patterns.
Prerequisites
- Basic familiarity with WebXR sessions and render loops
- Working knowledge of Three.js for 3D rendering
- A WebXR-compatible headset with hand tracking (Quest 2/3/Pro, PICO 4, Magic Leap 2)
- A browser that supports the hand-tracking feature (Chrome for Android, Meta Quest Browser)
Device Support
| Device | Hand Tracking | Joint Count | Max Hands |
|---|---|---|---|
| Meta Quest 2 | ✅ | 25 per hand | 2 |
| Meta Quest 3 | ✅ | 25 | 2 |
| Meta Quest Pro | ✅ | 25 | 2 |
| PICO 4 | ✅ | 25 | 2 |
| Magic Leap 2 | ✅ controller-based | 25 | 2 |
All devices expose the same XRHand interface with the same 25 joints (wrist included); the difference is in tracking quality and occlusion handling.
Getting Hand Data
Hand tracking is an optional feature — you must request it during session creation:
const session = await navigator.xr.requestSession('immersive-vr', {
  requiredFeatures: ['hand-tracking']
});
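If the app should still start on devices or browsers without hand tracking, the feature can be listed under optionalFeatures instead of requiredFeatures. A minimal sketch, assuming a hypothetical helper named buildSessionInit (not part of WebXR); only the requiredFeatures and optionalFeatures keys come from the API:

```typescript
// Hypothetical helper: builds the options object passed to requestSession().
interface SessionInit {
  requiredFeatures?: string[];
  optionalFeatures?: string[];
}

function buildSessionInit(handsRequired: boolean): SessionInit {
  const features = ['hand-tracking'];
  // Required: session creation is rejected when hand tracking is unavailable.
  // Optional: the session still starts, and inputSource.hand may be missing.
  return handsRequired
    ? { requiredFeatures: features }
    : { optionalFeatures: features };
}

// Usage in a browser:
// navigator.xr.requestSession('immersive-vr', buildSessionInit(false));
```

With the optional variant, the `if (!hand) continue;` style of check becomes the fallback path to controller input.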
Once the session is active, each XRInputSource (representing a hand or controller) has a hand property containing an XRHand object. The XRHand is a Map-like collection of XRHandJoint → XRJointSpace entries.
// In the animation frame callback:
function onXRFrame(time: number, frame: XRFrame) {
  for (const inputSource of session.inputSources) {
    const hand = inputSource.hand;
    if (!hand) continue; // Not a hand-tracking input source
    for (const [jointName, jointSpace] of hand) {
      const pose = frame.getPose(jointSpace, referenceSpace);
      if (pose) {
        const pos = pose.transform.position;
        const rot = pose.transform.orientation;
        // pos.x, pos.y, pos.z are in meters relative to referenceSpace
      }
    }
  }
}
Joint Enumeration
The WebXR spec defines 25 joints per hand, accessible via the XRHandJoint enum:
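- Wrist: wrist
- Metacarpals: thumb-metacarpal, index-finger-metacarpal, etc.
- Proximal phalanges: thumb-phalanx-proximal, index-finger-phalanx-proximal, etc.
- Intermediate phalanges (four fingers only; the thumb has none): index-finger-phalanx-intermediate, etc.
- Distal phalanges: thumb-phalanx-distal, index-finger-phalanx-distal, etc.
- Fingertips: thumb-tip, index-finger-tip, middle-finger-tip, ring-finger-tip, pinky-finger-tip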
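That is 1 wrist joint, 4 thumb joints, and 5 joints for each of the other four fingers. thumb-metacarpal is one of these 25, not an extra joint, so XRHand exposes 25 joints on Quest 3 and Quest Pro as well.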
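To make the count concrete, the joint names can be generated rather than typed out. A small sketch, assuming the naming scheme and ordering of the WebXR Hand Input spec (wrist first, then thumb through pinky); the names buildJointNames and JOINT_NAMES are illustrative only:

```typescript
// Finger prefixes other than the thumb, in spec order.
const FINGERS = ['index-finger', 'middle-finger', 'ring-finger', 'pinky-finger'];

function buildJointNames(): string[] {
  const names: string[] = ['wrist'];
  // The thumb has no intermediate phalanx, so it contributes four joints.
  for (const part of ['metacarpal', 'phalanx-proximal', 'phalanx-distal', 'tip']) {
    names.push(`thumb-${part}`);
  }
  // Each remaining finger contributes five joints.
  for (const finger of FINGERS) {
    for (const part of ['metacarpal', 'phalanx-proximal', 'phalanx-intermediate', 'phalanx-distal', 'tip']) {
      names.push(`${finger}-${part}`);
    }
  }
  return names;
}

const JOINT_NAMES = buildJointNames(); // 1 + 4 + 4 * 5 entries
```

A list like this is handy for validating joint names in configuration or for indexing into a flat pose buffer.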
Bulk Pose Queries with fillPoses()
Querying joints one-by-one with getPose() can be slow — 25 joints × 2 hands = 50 getPose() calls per frame. Use XRFrame.fillPoses() for bulk queries:
// Set up once
const jointSpaces: XRSpace[] = [];
for (const [, space] of hand) {
  jointSpaces.push(space);
}
const jointPoses = new Float32Array(jointSpaces.length * 16); // 4x4 matrices
// Each frame
frame.fillPoses(jointSpaces, referenceSpace, jointPoses);
This is significantly faster because the runtime processes all joints in a single operation.
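The filled buffer holds one 4x4 matrix per joint, so positions have to be read out of it by offset. A minimal sketch, assuming the matrices are column-major like XRRigidTransform.matrix; the helper name jointPositionFromMatrices is hypothetical:

```typescript
interface Vec3 { x: number; y: number; z: number; }

// Each joint occupies 16 consecutive floats. In a column-major 4x4 matrix
// the translation is stored at offsets 12, 13 and 14 of that slice.
function jointPositionFromMatrices(matrices: Float32Array, jointIndex: number): Vec3 {
  const base = jointIndex * 16;
  return {
    x: matrices[base + 12],
    y: matrices[base + 13],
    z: matrices[base + 14],
  };
}
```

fillPoses() also returns a boolean. As far as the spec describes it, a false result means at least one pose could not be determined, so it is worth checking before reading the buffer.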
Rendering Hands
Once you have joint positions, the simplest visualization is a sphere-per-joint with connecting lines:
const jointGeometry = new THREE.SphereGeometry(0.01, 8, 8);
const jointMaterial = new THREE.MeshBasicMaterial({ color: 0x00aaff });
const jointMeshes: THREE.Mesh[] = [];
for (const [, jointSpace] of hand) {
  const mesh = new THREE.Mesh(jointGeometry, jointMaterial);
  scene.add(mesh);
  jointMeshes.push(mesh);
}
// Each frame: update positions
let i = 0;
for (const [, jointSpace] of hand) {
  const pose = frame.getPose(jointSpace, referenceSpace);
  if (pose) {
    jointMeshes[i].position.set(
      pose.transform.position.x,
      pose.transform.position.y,
      pose.transform.position.z
    );
  }
  i++;
}
For a more realistic look, connect joints with cylinders or line segments to form a skeleton:
// Bone connections
const boneConnections: [XRHandJoint, XRHandJoint][] = [
  ['wrist', 'thumb-metacarpal'],
  ['thumb-metacarpal', 'thumb-phalanx-proximal'],
  ['thumb-phalanx-proximal', 'thumb-phalanx-distal'],
  ['thumb-phalanx-distal', 'thumb-tip'],
  ['wrist', 'index-finger-metacarpal'],
  // ... repeat for all fingers
];
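Rather than writing every pair by hand, the full connection list can be derived from the joint naming scheme. A sketch under the assumption that every finger chain hangs off the wrist, as in the list above; buildBoneConnections is a hypothetical helper and uses plain strings instead of the XRHandJoint type so it runs outside a browser:

```typescript
type Bone = [string, string];

function buildBoneConnections(): Bone[] {
  // One chain of joints per digit, ordered from the base to the tip.
  const chains: string[][] = [
    ['thumb-metacarpal', 'thumb-phalanx-proximal', 'thumb-phalanx-distal', 'thumb-tip'],
  ];
  for (const finger of ['index-finger', 'middle-finger', 'ring-finger', 'pinky-finger']) {
    chains.push(
      ['metacarpal', 'phalanx-proximal', 'phalanx-intermediate', 'phalanx-distal', 'tip']
        .map((part) => `${finger}-${part}`)
    );
  }
  const bones: Bone[] = [];
  for (const chain of chains) {
    let parent = 'wrist'; // every chain starts at the wrist
    for (const joint of chain) {
      bones.push([parent, joint]);
      parent = joint;
    }
  }
  return bones;
}
```

The result has 24 entries (25 joints forming a tree), one per line segment or cylinder to draw.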
Skinned Mesh Hands
For production applications, consider using skinned meshes — a single hand model with bones that map to the WebXR joints. Libraries like three-mesh-bvh can help with collision detection on the hand mesh.
Gesture Detection
The real power of hand tracking is gesture recognition. Here are the key patterns:
Pinch Detection
Pinch is detected by measuring the distance between thumb-tip and index-finger-tip:
function getPinchStrength(
  hand: XRHand,
  frame: XRFrame,
  referenceSpace: XRReferenceSpace
): number {
  const thumbPose = frame.getPose(hand.get('thumb-tip')!, referenceSpace);
  const indexPose = frame.getPose(hand.get('index-finger-tip')!, referenceSpace);
  if (!thumbPose || !indexPose) return 0;
  const dx = thumbPose.transform.position.x - indexPose.transform.position.x;
  const dy = thumbPose.transform.position.y - indexPose.transform.position.y;
  const dz = thumbPose.transform.position.z - indexPose.transform.position.z;
  const distance = Math.sqrt(dx * dx + dy * dy + dz * dz);
  // 0.05 meters (~2 inches) = fully open, 0.01 meters = fully pinched
  return Math.max(0, Math.min(1, 1 - (distance - 0.01) / 0.04));
}
A value of 1.0 means fully pinched. Use a threshold (e.g., 0.8) to trigger pinch events.
Point Gesture
Detect when the index finger is extended and other fingers are curled:
function isPointing(hand: XRHand, frame: XRFrame, refSpace: XRReferenceSpace): boolean {
  const indexTip = frame.getPose(hand.get('index-finger-tip')!, refSpace)?.transform.position;
  const indexBase = frame.getPose(hand.get('index-finger-metacarpal')!, refSpace)?.transform.position;
  const middleTip = frame.getPose(hand.get('middle-finger-tip')!, refSpace)?.transform.position;
  const middleBase = frame.getPose(hand.get('middle-finger-metacarpal')!, refSpace)?.transform.position;
  if (!indexTip || !indexBase || !middleTip || !middleBase) return false;
  // Positions are DOMPointReadOnly, which has no distanceTo(), so measure directly
  const dist = (a: DOMPointReadOnly, b: DOMPointReadOnly) =>
    Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);
  // Index counts as extended when its tip-to-base distance clearly exceeds the middle finger's
  const indexLength = dist(indexTip, indexBase);
  const middleLength = dist(middleTip, middleBase);
  return indexLength > middleLength * 1.3;
}
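Comparing two fingers works, but it depends on what the middle finger happens to be doing. An alternative sketch measures each finger on its own: the straight-line distance from base to tip divided by the summed bone lengths, which is close to 1 for a straight finger and drops as it curls. This is a swapped-in heuristic, not the method used above; fingerStraightness, isFingerExtended, and the 0.9 threshold are assumptions to tune per device:

```typescript
interface Vec3 { x: number; y: number; z: number; }

function distance(a: Vec3, b: Vec3): number {
  return Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);
}

// joints: positions of one finger, ordered from its base joint to its tip.
// Returns a ratio in [0, 1]: about 1 when straight, smaller when curled.
function fingerStraightness(joints: Vec3[]): number {
  let pathLength = 0;
  for (let i = 1; i < joints.length; i++) {
    pathLength += distance(joints[i - 1], joints[i]);
  }
  if (pathLength === 0) return 0; // fewer than two joints or a degenerate pose
  return distance(joints[0], joints[joints.length - 1]) / pathLength;
}

// Threshold is an assumed starting point, not a measured value.
function isFingerExtended(joints: Vec3[], threshold = 0.9): boolean {
  return fingerStraightness(joints) >= threshold;
}
```

A pointing check could then require the index finger to be extended while the middle, ring, and pinky fingers are not.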
Gesture State Machine
Raw gesture detection is noisy. Implement a state machine to smooth transitions:
enum GestureState { Idle, Pinching, Pointing, Grab }
class GestureDetector {
  private state: GestureState = GestureState.Idle;
  private pinchHeld = false;
  update(hand: XRHand, frame: XRFrame, refSpace: XRReferenceSpace) {
    const pinch = getPinchStrength(hand, frame, refSpace);
    const pointing = isPointing(hand, frame, refSpace);
    switch (this.state) {
      case GestureState.Idle:
        if (pinch > 0.8) this.transitionTo(GestureState.Pinching);
        else if (pointing) this.transitionTo(GestureState.Pointing);
        break;
      case GestureState.Pinching:
        if (pinch < 0.3) this.transitionTo(GestureState.Idle);
        break;
      // ...
    }
  }
  private transitionTo(newState: GestureState) {
    // Fire onEnter/onExit callbacks
    this.state = newState;
  }
}
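The separate enter (0.8) and exit (0.3) thresholds already give hysteresis on pinch strength. For boolean signals such as pointing, a frame-count debounce is one complementary option: a new value is accepted only after it has been observed for several consecutive updates. A minimal sketch; the StableValue class and the holdFrames parameter are hypothetical and not part of the detector above:

```typescript
// Commits a new value only after it has been observed for `holdFrames`
// consecutive updates; shorter flickers are ignored.
class StableValue<T> {
  private committed: T;
  private candidate: T;
  private count = 0;
  private holdFrames: number;

  constructor(initial: T, holdFrames: number) {
    this.committed = initial;
    this.candidate = initial;
    this.holdFrames = holdFrames;
  }

  update(observed: T): T {
    if (observed === this.committed) {
      this.count = 0; // observation agrees with current state: cancel any pending change
      return this.committed;
    }
    if (observed !== this.candidate) {
      this.candidate = observed; // a different value showed up: restart the count
      this.count = 0;
    }
    this.count++;
    if (this.count >= this.holdFrames) {
      this.committed = observed;
      this.count = 0;
    }
    return this.committed;
  }
}
```

Feeding the raw per-frame result through something like `new StableValue(false, 3)` before it reaches the state machine trades a few frames of latency for fewer spurious transitions.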
Interaction Patterns
Once you have reliable gesture detection, here are the three most common interaction patterns:
Direct Touch
The hand acts as a 3D cursor. A collider (sphere or capsule) at the index-finger-tip joint or near the palm checks for overlap with interactable objects:
const handCollider = new THREE.Mesh(
  new THREE.SphereGeometry(0.02),
  new THREE.MeshBasicMaterial({ visible: false })
);
scene.add(handCollider);
// In frame loop: position collider at index tip
const indexTip = frame.getPose(hand.get('index-finger-tip')!, refSpace);
if (indexTip) {
  handCollider.position.copy(indexTip.transform.position);
  // Check overlap with interactable objects
}
Best for: Grabbing, pushing, pressing buttons, manipulating small objects.
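The overlap check itself can be as simple as a sphere-versus-sphere test. A minimal sketch, assuming each interactable is approximated by a bounding sphere; the Sphere type and spheresOverlap helper are illustrative, not Three.js API:

```typescript
interface Vec3 { x: number; y: number; z: number; }
interface Sphere { center: Vec3; radius: number; }

// True when the two spheres touch or intersect.
function spheresOverlap(a: Sphere, b: Sphere): boolean {
  const dx = a.center.x - b.center.x;
  const dy = a.center.y - b.center.y;
  const dz = a.center.z - b.center.z;
  const reach = a.radius + b.radius;
  // Compare squared distances to avoid a square root per check.
  return dx * dx + dy * dy + dz * dz <= reach * reach;
}
```

For example, a 2 cm fingertip collider and a 3 cm button sphere overlap once their centers are within 5 cm of each other.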
Ray + Pinch
A ray emanates from between the thumb and index finger (or from the palm). Pinch to select, move, release:
const rayOrigin = new THREE.Vector3();
const rayDirection = new THREE.Vector3();
// Compute ray from midpoint of thumb-index web
const thumbBase = frame.getPose(hand.get('thumb-phalanx-proximal')!, refSpace);
const indexBase = frame.getPose(hand.get('index-finger-metacarpal')!, refSpace);
if (thumbBase && indexBase) {
  rayOrigin.lerpVectors(thumbBase.transform.position, indexBase.transform.position, 0.5);
  // Direction = forward from palm, computed from wrist/knuckle orientation
}
Best for: Distant object selection, menu navigation, pointing.
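One way to fill in the ray direction is to point from the wrist toward a knuckle. This is only a sketch under that assumption: buildPinchRay is a hypothetical helper, which knuckle to use (for example index-finger-phalanx-proximal) is a design choice, and many applications blend in head or shoulder position for a steadier ray:

```typescript
interface Vec3 { x: number; y: number; z: number; }
interface Ray { origin: Vec3; direction: Vec3; }

// All arguments are joint positions already read from poses.
function buildPinchRay(wrist: Vec3, thumbBase: Vec3, indexBase: Vec3, knuckle: Vec3): Ray | null {
  // Origin: midpoint between the thumb base and the index base.
  const origin = {
    x: (thumbBase.x + indexBase.x) / 2,
    y: (thumbBase.y + indexBase.y) / 2,
    z: (thumbBase.z + indexBase.z) / 2,
  };
  // Direction (assumption): from the wrist toward the chosen knuckle.
  const dx = knuckle.x - wrist.x;
  const dy = knuckle.y - wrist.y;
  const dz = knuckle.z - wrist.z;
  const length = Math.hypot(dx, dy, dz);
  if (length === 0) return null; // degenerate pose, no usable direction
  return { origin, direction: { x: dx / length, y: dy / length, z: dz / length } };
}
```

The returned origin and unit direction can be copied into a THREE.Raycaster for the actual intersection test.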
Palm UI
Attach a 2D or 3D UI element to the palm space:
const palmSpace = hand.get('wrist'); // WebXR defines no 'palm' joint; the wrist is the closest built-in anchor
// The UI element follows the palm position and orientation
Best for: Quick-access menus, status indicators, tool selection.
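Because XRHand has no dedicated palm joint, a palm anchor has to be estimated from nearby joints. A rough sketch, assuming the phalanx-proximal joints sit at the knuckles; approximatePalm is a hypothetical helper, and the sign of the normal differs between left and right hands, so it may need flipping based on inputSource.handedness:

```typescript
interface Vec3 { x: number; y: number; z: number; }
interface PalmEstimate { center: Vec3; normal: Vec3; }

const sub = (a: Vec3, b: Vec3): Vec3 => ({ x: a.x - b.x, y: a.y - b.y, z: a.z - b.z });

function approximatePalm(
  wrist: Vec3, indexKnuckle: Vec3, middleKnuckle: Vec3, pinkyKnuckle: Vec3
): PalmEstimate | null {
  // Center: halfway between the wrist and the middle knuckle.
  const center = {
    x: (wrist.x + middleKnuckle.x) / 2,
    y: (wrist.y + middleKnuckle.y) / 2,
    z: (wrist.z + middleKnuckle.z) / 2,
  };
  // Normal: cross product of two edges spanning the palm plane.
  const a = sub(indexKnuckle, wrist);
  const b = sub(pinkyKnuckle, wrist);
  const n = {
    x: a.y * b.z - a.z * b.y,
    y: a.z * b.x - a.x * b.z,
    z: a.x * b.y - a.y * b.x,
  };
  const length = Math.hypot(n.x, n.y, n.z);
  if (length === 0) return null; // joints are collinear or coincident
  return { center, normal: { x: n.x / length, y: n.y / length, z: n.z / length } };
}
```

The center gives the UI a position, and the normal can be compared against the direction to the head to show the menu only when the palm faces the user.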
Performance Considerations
| Optimization | Impact |
|---|---|
| Use fillPoses() instead of per-joint getPose() | 3-5x faster on 25 joints |
| Only render hands when they're in view (frustum culling) | Saves GPU time |
| Reduce joint detail based on hand distance from head | LOD: 25 → 12 → 5 joints |
| Skip occlusion-handled joints (inside geometry) | Fewer draw calls |
| Pool joint meshes instead of allocating per frame | Zero GC pressure |
Frame budget tip: Hand tracking data arrives at 30-60 Hz depending on the device. If your render loop runs at 90 Hz, you don't need to update hand visuals every frame — interpolation between samples is fine.
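Interpolating between samples can be sketched as a clamped linear blend between the two most recent joint positions. The Sample type and interpolatePosition helper are hypothetical, and timestamps are assumed to share one clock (for example the time argument of the frame callback):

```typescript
interface Vec3 { x: number; y: number; z: number; }
interface Sample { time: number; position: Vec3; }

// Linearly blends between two tracking samples for the given render time.
function interpolatePosition(prev: Sample, next: Sample, renderTime: number): Vec3 {
  const span = next.time - prev.time;
  // Clamp to [0, 1] so the result never extrapolates beyond either sample.
  const t = span > 0 ? Math.min(1, Math.max(0, (renderTime - prev.time) / span)) : 1;
  return {
    x: prev.position.x + (next.position.x - prev.position.x) * t,
    y: prev.position.y + (next.position.y - prev.position.y) * t,
    z: prev.position.z + (next.position.z - prev.position.z) * t,
  };
}
```

Note that blending between two known samples means the rendered hand lags slightly behind the newest data, which is usually acceptable for visuals but worth keeping in mind for touch interactions.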
Series Cross-References
- Building a WebXR App with Three.js — Integrate hand tracking into a full application scaffold
- WebXR Hit Testing & Depth Sensing — Combine hand input with environment understanding