
Architecture

A zero-server pipeline. Webcam frames enter the browser, are processed by WebAssembly, and produce landmark coordinates that drive your React components, all in under 16ms per frame.

The Pipeline

Webcam → MediaPipe WASM → 21 Landmarks → Spring Physics → DOM Update
Layer 01

KineEngine — The Singleton Core

The engine is a singleton that downloads the MediaPipe HandLandmarker WASM runtime and model once and reuses them across client-side page navigations. It holds the current video element reference and exposes a single detectHands() method that returns 21 landmark coordinates for each detected hand, every frame.

// Singleton — model downloads once, persists across navigations
const engine = KineEngine.getInstance();
await engine.initialize(videoElement);

// Called every frame at ~60fps
const result = engine.detectHands(performance.now());
// → { landmarks: [{ x, y, z }[21]], handedness: ["Left"|"Right"] }

Key Design Decisions
  • Singleton pattern: The WASM model (~4MB) is loaded only once. Navigating between pages swaps the video element reference without re-downloading.
  • GPU delegation: The HandLandmarker is configured with the GPU (WebGL) delegate, which moves the model's compute onto the GPU and leaves more main-thread time for React rendering.
  • Two-hand tracking: Configured with numHands: 2, enabling bimanual gestures such as Pinch to Zoom.
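The "download once, swap the video" behaviour can be sketched without MediaPipe at all. This is a hypothetical illustration, not KineEngine's source: `EngineSketch`, `HandModel`, and `downloadModel` are invented names, and the download is faked so that the caching is observable.

```typescript
// Hypothetical sketch of the singleton described above (not Kine UI's real code).
// `downloadModel` is a fake stand-in for fetching the WASM runtime and model file;
// it only counts how many times it runs.

interface HandModel {
  id: number;
}

let downloadCount = 0;

async function downloadModel(): Promise<HandModel> {
  downloadCount += 1; // runs synchronously at call time, before any await
  return { id: downloadCount };
}

class EngineSketch {
  private static instance: EngineSketch | null = null;
  private modelPromise: Promise<HandModel> | null = null;
  private video: object | null = null;

  private constructor() {}

  // Every caller gets the same engine object.
  static getInstance(): EngineSketch {
    if (EngineSketch.instance === null) {
      EngineSketch.instance = new EngineSketch();
    }
    return EngineSketch.instance;
  }

  // Each page passes its own <video>; only the first call starts a download.
  initialize(video: object): Promise<HandModel> {
    this.video = video; // swap the reference on every navigation
    if (this.modelPromise === null) {
      // Cache the promise itself so concurrent callers share one in-flight download.
      this.modelPromise = downloadModel();
    }
    return this.modelPromise;
  }

  get currentVideo(): object | null {
    return this.video;
  }
}
```

Caching the promise rather than the resolved model is the design choice that matters here: two pages calling initialize() in quick succession still trigger only one download.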
Layer 02

KineProvider — The React Bridge

The provider wraps your component tree with a hidden <video> element and a requestAnimationFrame loop. On each frame, it calls the engine, stores the latest landmarks in a ref, and exposes them via the useKine() hook.

// Wrap any component tree to enable gesture detection
<KineProvider>
  <AirCursor />
  <SwipeArea>{children}</SwipeArea>
</KineProvider>

// Inside any child component:
const { landmarksRef } = useKine();
// landmarksRef.current → NormalizedLandmark[21] (updated every frame)

Why a Ref Instead of State?

Landmark data updates at 60fps. Storing it in React state would re-render the provider and every component that consumes it 60 times per second. Instead, landmarks are stored in a mutable ref (useRef), and writing to a ref does not trigger a render. Individual gesture components read from the ref inside their own animation loops and update only what they need, typically a single Framer Motion spring value.
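The producer/consumer split can be modelled without React. In this hypothetical sketch, `MutableRef` only mirrors the shape of what useRef returns, and `writeLandmarks` / `readCursorX` are invented names standing in for the provider's frame callback and a gesture component's loop.

```typescript
// Hypothetical, React-free model of the "ref, not state" pattern.
// MutableRef mirrors the shape of the object React's useRef returns.

interface Landmark {
  x: number;
  y: number;
  z: number;
}

interface MutableRef<T> {
  current: T;
}

const INDEX_FINGER_TIP = 8; // MediaPipe hand landmark index

// Provider side: called once per frame with the newest detector output.
// A plain assignment to `.current` schedules no render.
function writeLandmarks(ref: MutableRef<Landmark[] | null>, hand: Landmark[] | null): void {
  ref.current = hand;
}

// Consumer side: runs inside a gesture component's own animation loop.
// It reads the latest landmarks and pushes one number into `setX`; in a real
// component this could be a Framer Motion MotionValue's set() method.
function readCursorX(
  ref: MutableRef<Landmark[] | null>,
  setX: (px: number) => void,
  viewportWidth: number,
): void {
  const tip = ref.current?.[INDEX_FINGER_TIP];
  if (tip === undefined) return; // no hand detected this frame
  setX(tip.x * viewportWidth);
}
```

Because the consumer decides what to update, a frame with no hand simply leaves the animated value where it was.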

Layer 03

Gesture Components — The UI Layer

Each gesture component is a self-contained module that reads landmarks from the ref, applies its own physics and thresholding logic, and drives DOM updates via Framer Motion springs.
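To make the spring physics concrete, here is a generic damped spring stepped once per frame. It is a stand-in, not Framer Motion's internal solver; only the option names stiffness, damping, and mass are borrowed from Framer Motion's spring options, and `stepSpring` is a hypothetical name.

```typescript
// Generic damped spring integrated with semi-implicit Euler.
// Illustrative only: this is NOT Framer Motion's implementation.

interface SpringConfig {
  stiffness: number;
  damping: number;
  mass: number;
}

interface SpringState {
  position: number;
  velocity: number;
}

// Advance the spring by `dt` seconds toward `target`.
function stepSpring(
  state: SpringState,
  target: number,
  cfg: SpringConfig,
  dt: number,
): SpringState {
  const springForce = -cfg.stiffness * (state.position - target); // pulls toward target
  const dampingForce = -cfg.damping * state.velocity; // resists motion
  const acceleration = (springForce + dampingForce) / cfg.mass;
  const velocity = state.velocity + acceleration * dt; // update velocity first...
  const position = state.position + velocity * dt; // ...then position using the new velocity
  return { position, velocity };
}
```

In a component such as Air Cursor, the target would be the fingertip coordinate converted to pixels and dt the time between animation frames in seconds; the spring smooths the jitter in raw landmark positions.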

  • Air Cursor (landmark 8): index fingertip position → cursor X/Y via spring physics.
  • Swipe Area (landmark 0): wrist X velocity → lateral swipe detection with a threshold.
  • Air Scroll (landmark 0): wrist Y delta → scroll velocity, with a pinch-protection lockout.
  • Pinch to Zoom (landmarks 4 + 8): thumb-index distance on both hands → scale factor.
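The thresholding logic for three of these gestures might look roughly like the following. Everything here is an assumption for illustration: the function names, the thresholds and gains, the reading of "pinch protection" as "no scrolling while thumb and index are pinched", and the zoom formula (average thumb-index distance across both hands relative to its value at gesture start).

```typescript
// Hypothetical gesture sketches; thresholds, gains, and formulas are assumptions.

interface Point {
  x: number;
  y: number;
}

// 2D distance in normalized image units (z is ignored here).
function distance2D(a: Point, b: Point): number {
  return Math.hypot(a.x - b.x, a.y - b.y);
}

// Swipe Area: wrist (landmark 0) X velocity in normalized widths per second.
// If the video is mirrored for a selfie view, the directions flip.
function detectSwipe(
  prevWristX: number,
  currWristX: number,
  dtSeconds: number,
  threshold = 1.5,
): "left" | "right" | null {
  if (dtSeconds <= 0) return null;
  const vx = (currWristX - prevWristX) / dtSeconds;
  if (vx > threshold) return "right";
  if (vx < -threshold) return "left";
  return null;
}

// Air Scroll: wrist (landmark 0) Y delta mapped to a per-frame scroll offset in pixels,
// locked out while thumb tip (4) and index tip (8) are pinched together.
function airScrollDelta(
  prevWristY: number,
  currWristY: number,
  thumbTip: Point,
  indexTip: Point,
  gainPx = 2000,
  pinchThreshold = 0.05,
): number {
  if (distance2D(thumbTip, indexTip) < pinchThreshold) return 0; // pinch protection
  return (currWristY - prevWristY) * gainPx;
}

// Pinch to Zoom: average thumb-index distance across both hands,
// relative to the same measure when the gesture started.
function pinchZoomScale(
  start: [Point, Point][], // [thumbTip, indexTip] per hand at gesture start
  current: [Point, Point][], // same hands, this frame
): number {
  const span = (hands: [Point, Point][]) =>
    hands.reduce((sum, [thumb, index]) => sum + distance2D(thumb, index), 0) / hands.length;
  const startSpan = span(start);
  return startSpan > 0 ? span(current) / startSpan : 1;
}
```

Each function is pure and works on one frame's values, so a component can call it from its animation loop and feed the result into a spring or a scroll call.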

Landmark Reference

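MediaPipe returns 21 normalized landmarks per hand. The x and y coordinates are normalized to [0, 1] by the video frame's width and height; z is the landmark's depth relative to the wrist, with smaller values closer to the camera. The key landmarks used by Kine UI components are: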

Index  Name               Used By
0      Wrist              Air Scroll, Swipe Area
4      Thumb Tip          Air Cursor (pinch), Pinch to Zoom
8      Index Finger Tip   Air Cursor (position), Pinch to Zoom
12     Middle Finger Tip  Reserved
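Because x and y are normalized, a component has to scale them to its own box before using them. A minimal sketch, assuming a hypothetical `toPixels` helper and a `mirrored` flag for selfie-style (horizontally flipped) video; the index constants follow the table of landmark indices.

```typescript
// Landmark indices used by the components (MediaPipe hand landmark model).
const HandLandmark = {
  WRIST: 0,
  THUMB_TIP: 4,
  INDEX_FINGER_TIP: 8,
  MIDDLE_FINGER_TIP: 12,
} as const;

interface NormalizedLandmark {
  x: number; // 0..1 across the frame width
  y: number; // 0..1 down the frame height
  z: number; // depth relative to the wrist (unused here)
}

// Hypothetical helper: map a normalized landmark into a width x height box.
// `mirrored` flips X to match a video shown as a mirror image.
function toPixels(
  lm: NormalizedLandmark,
  width: number,
  height: number,
  mirrored = false,
): { x: number; y: number } {
  const nx = mirrored ? 1 - lm.x : lm.x;
  return { x: nx * width, y: lm.y * height };
}
```

For example, the index fingertip would be read as `hand[HandLandmark.INDEX_FINGER_TIP]` and passed to toPixels with the viewport size.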

Performance Characteristics

  • Model size: ~4MB, downloaded once and cached by the browser.
  • Inference: <16ms per frame, GPU-accelerated via WebGL.
  • Server load: 0; all processing runs on the client.
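At 60fps each frame gets 1000 / 60 ≈ 16.7ms, and that budget has to cover inference plus the spring updates and DOM writes that follow it. A small sketch for checking measured stage times against the budget; `frameBudgetMs` and `remainingBudgetMs` are hypothetical names.

```typescript
// Frame budget for a target frame rate: 1000 ms divided by frames per second.
function frameBudgetMs(fps: number): number {
  return 1000 / fps;
}

// Remaining budget after the per-frame stages (e.g. inference, spring updates,
// DOM writes) have run. A negative result means the frame overran its budget.
// In a browser, stage times would come from performance.now() deltas.
function remainingBudgetMs(fps: number, stageTimesMs: number[]): number {
  const used = stageTimesMs.reduce((sum, t) => sum + t, 0);
  return frameBudgetMs(fps) - used;
}
```

For instance, 10ms of inference plus 2ms of other work leaves about 4.7ms of headroom at 60fps.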