Architecture
A zero-server pipeline. Webcam frames enter the browser, get processed by WebAssembly, and produce landmark coordinates that drive your React components — all in under 16ms.
The Pipeline
KineEngine — The Singleton Core
The engine is a singleton that downloads the MediaPipe HandLandmarker WASM model once and reuses it across all page navigations. It manages the video element reference and exposes a single detectHands() method that returns 21 landmark coordinates per hand per frame.
// Singleton — model downloads once, persists across navigations
const engine = KineEngine.getInstance();
await engine.initialize(videoElement);
// Called every frame at ~60fps
const result = engine.detectHands(performance.now());
// → { landmarks: [{ x, y, z }[21]], handedness: ["Left"|"Right"] }
- Singleton pattern: The WASM model (~4MB) is loaded only once. Navigating between pages swaps the video element reference without re-downloading.
- GPU delegation: The HandLandmarker runs on WebGL by default, keeping the main thread free for React rendering.
- Two-hand tracking: Configured with numHands: 2, which enables bimanual gestures like Pinch to Zoom.
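The load-once behaviour described above can be sketched in a few lines. This is an illustration under stated assumptions, not KineEngine's actual source: `EngineSketch`, `Detector`, and `Loader` are hypothetical names, and the model download is passed in as a function so the sketch stays self-contained.

```typescript
// Hypothetical stand-in for the loaded hand-landmark model.
interface Detector {
  detect(timestampMs: number): number;
}

// Hypothetical loader: in a real engine this would download the WASM model.
type Loader = () => Promise<Detector>;

class EngineSketch {
  private static instance: EngineSketch | null = null;
  private detectorPromise: Promise<Detector> | null = null;
  private video: unknown = null;

  // A private constructor forces callers through getInstance().
  private constructor() {}

  static getInstance(): EngineSketch {
    if (EngineSketch.instance === null) {
      EngineSketch.instance = new EngineSketch();
    }
    return EngineSketch.instance;
  }

  // Always swaps the video reference; starts the load only on the first call.
  initialize(video: unknown, load: Loader): Promise<Detector> {
    this.video = video;
    if (this.detectorPromise === null) {
      this.detectorPromise = load();
    }
    return this.detectorPromise;
  }

  currentVideo(): unknown {
    return this.video;
  }
}
```

Caching the promise rather than the resolved detector means two components that initialize at the same moment still share a single download.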
KineProvider — The React Bridge
The provider wraps your component tree with a hidden <video> element and a requestAnimationFrame loop. On each frame, it calls the engine, stores the latest landmarks in a ref, and exposes them via the useKine() hook.
// Wrap any component tree to enable gesture detection
<KineProvider>
<AirCursor />
<SwipeArea>{children}</SwipeArea>
</KineProvider>
// Inside any child component:
const { landmarksRef } = useKine();
// landmarksRef.current → NormalizedLandmark[21] (updated every frame)
Landmark data updates at 60fps. Storing it in React state would trigger 60 re-renders per second across the entire tree. Instead, landmarks are stored in a mutable ref (useRef). Individual gesture components read from the ref inside their own animation loops, updating only what they need, typically a single framer-motion spring value.
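The ref-based loop can be illustrated without React. The sketch below assumes a ref is just an object with a mutable `current` field (the shape `useRef` returns) and takes the scheduler as a parameter so it is not tied to a browser. `startLandmarkLoop`, `Schedule`, and `Cancel` are hypothetical names, not part of Kine UI.

```typescript
type Landmark = { x: number; y: number; z: number };
type Ref<T> = { current: T };

// Plays the role of requestAnimationFrame / cancelAnimationFrame.
type Schedule = (cb: (timestampMs: number) => void) => number;
type Cancel = (handle: number) => void;

function startLandmarkLoop(
  detect: (timestampMs: number) => Landmark[] | null,
  ref: Ref<Landmark[] | null>,
  schedule: Schedule,
  cancel: Cancel,
): () => void {
  let handle = 0;
  const tick = (timestampMs: number): void => {
    // Plain assignment: nothing subscribes to it, so nothing re-renders.
    ref.current = detect(timestampMs);
    handle = schedule(tick);
  };
  handle = schedule(tick);
  // The returned function stops the loop, e.g. from an effect cleanup.
  return () => cancel(handle);
}
```

In a browser the last two arguments would be `(cb) => requestAnimationFrame(cb)` and `(h) => cancelAnimationFrame(h)`. Consumers read `ref.current` whenever their own loop runs.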
Gesture Components — The UI Layer
Each gesture component is a self-contained module that reads landmarks from the ref, applies its own physics and thresholding logic, and drives DOM updates via Framer Motion springs.
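As one illustration of the thresholding step, here is a minimal pinch detector with hysteresis. It uses landmarks 4 (thumb tip) and 8 (index finger tip) from the Landmark Reference. The function names and the `enter`/`exit` threshold values are assumptions chosen for the example, not values taken from Kine UI.

```typescript
type Landmark = { x: number; y: number; z: number };

const THUMB_TIP = 4;
const INDEX_TIP = 8;

// 2D distance between thumb tip and index tip in normalized frame units.
function pinchDistance(hand: Landmark[]): number {
  const thumb = hand[THUMB_TIP];
  const index = hand[INDEX_TIP];
  if (thumb === undefined || index === undefined) {
    return Number.POSITIVE_INFINITY; // incomplete hand: treat as "not pinching"
  }
  return Math.hypot(thumb.x - index.x, thumb.y - index.y);
}

// Hysteresis: a pinch starts below `enter` and only ends above `exit`,
// so jitter around a single threshold does not toggle the state every frame.
function nextPinchState(
  wasPinching: boolean,
  distance: number,
  enter = 0.05, // assumed value
  exit = 0.08, // assumed value
): boolean {
  return wasPinching ? distance < exit : distance < enter;
}
```

A component would call `nextPinchState` once per frame inside its own animation loop and feed the result to its spring or click handler.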
Landmark Reference
MediaPipe returns 21 normalized landmarks per hand. The x and y coordinates are normalized to [0, 1] relative to the width and height of the video frame; z is a relative depth measured from the wrist. Here are the key landmarks used by Kine UI components:
| Index | Name | Used By |
|---|---|---|
| 0 | Wrist | Air Scroll, Swipe Area |
| 4 | Thumb Tip | Air Cursor (pinch), Pinch to Zoom |
| 8 | Index Finger Tip | Air Cursor (position), Pinch to Zoom |
| 12 | Middle Finger Tip | Reserved |
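Because x and y are normalized, a component such as Air Cursor has to scale them to pixels before positioning anything. The helper below is a sketch of that mapping. `toViewport` is a hypothetical name, and the default horizontal mirroring (so the cursor follows the hand like a mirror image) is an assumption about how a selfie-style webcam feed would be handled.

```typescript
type Point = { x: number; y: number };

// Maps a normalized landmark to viewport pixels.
// Values are clamped because a hand partly out of frame can fall outside [0, 1].
function toViewport(
  tip: Point,
  width: number,
  height: number,
  mirror = true, // assumed default, not confirmed by Kine UI
): Point {
  const nx = Math.min(1, Math.max(0, tip.x));
  const ny = Math.min(1, Math.max(0, tip.y));
  return {
    x: (mirror ? 1 - nx : nx) * width,
    y: ny * height,
  };
}
```

For example, an index finger tip at `{ x: 0.25, y: 0.5 }` in a 1000 by 800 viewport maps to `{ x: 750, y: 400 }` with mirroring and `{ x: 250, y: 400 }` without.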