
Architecture

A zero-server pipeline. Webcam frames enter the browser, are processed by MediaPipe running in WebAssembly, and produce hand landmark coordinates that drive your React components, all in under 16 ms.

The Pipeline


Webcam → MediaPipe WASM → 21 Landmarks → Spring Physics → DOM Update
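The chain above runs once per video frame. As a rough sketch (the stage types and the `runFrame` helper are hypothetical, not Kine UI's internals), each arrow can be read as one function handing its output to the next, with the whole call measured against the 60 fps frame budget of 1000 / 60 ≈ 16.7 ms:

```typescript
// Hypothetical stage signatures; Kine UI's real internals may differ.
interface NormalizedLandmark { x: number; y: number; z: number; }
interface Point { x: number; y: number; }

type Detect = (timestampMs: number) => NormalizedLandmark[][]; // one 21-point array per hand
type Physics = (hands: NormalizedLandmark[][]) => Point | null; // gesture logic + springs
type Render = (target: Point) => void;                          // e.g. set a CSS transform

// Frame budget at 60 fps: 1000 ms / 60 frames ≈ 16.67 ms.
const FRAME_BUDGET_MS = 1000 / 60;

// Runs the stages once, in pipeline order, and reports whether they fit the budget.
function runFrame(now: () => number, detect: Detect, physics: Physics, render: Render) {
  const start = now();
  const hands = detect(start);         // webcam frame → MediaPipe WASM → landmarks
  const target = physics(hands);       // landmarks → spring physics
  if (target !== null) render(target); // physics output → DOM update
  const elapsedMs = now() - start;
  return { elapsedMs, withinBudget: elapsedMs < FRAME_BUDGET_MS };
}
```

In a browser, `now` would be `() => performance.now()`; the injected clock only exists so the sketch can run outside one.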

Layer 01

KineEngine — The Singleton Core

The engine is a singleton that downloads the MediaPipe HandLandmarker model and its WASM runtime once, then reuses them across all page navigations. It holds the current video element reference and exposes a single detectHands() method that returns 21 landmark coordinates for each detected hand on every frame.


// Singleton — model downloads once, persists across navigations
const engine = KineEngine.getInstance();
await engine.initialize(videoElement);

// Called every frame at ~60fps
const result = engine.detectHands(performance.now());
// → { landmarks: [{ x, y, z }[21]], handedness: ["Left"|"Right"] }
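With two-hand tracking, a component usually wants one specific hand. Reading the result comment above as parallel arrays (entry i of `landmarks` and entry i of `handedness` describe the same hand), a lookup could look like this. `HandResult` and `getHand` are hypothetical names inferred from that comment, not a published Kine UI type:

```typescript
// Shapes inferred from the detectHands() comment; hypothetical, not a published Kine UI type.
interface NormalizedLandmark { x: number; y: number; z: number; }
type Handedness = "Left" | "Right";
interface HandResult {
  landmarks: NormalizedLandmark[][]; // landmarks[i] holds the 21 points of hand i
  handedness: Handedness[];          // handedness[i] labels the same hand i
}

// Returns the 21 landmarks for the requested hand, or null if that hand is not in frame.
function getHand(result: HandResult, side: Handedness): NormalizedLandmark[] | null {
  const i = result.handedness.indexOf(side);
  return i === -1 ? null : (result.landmarks[i] ?? null);
}
```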

Key Design Decisions

▸Singleton pattern — The WASM model (~4MB) is loaded only once. Navigating between pages swaps the video element reference without re-downloading.
▸GPU delegation — The HandLandmarker is configured with the GPU (WebGL) delegate, which moves inference work off the CPU and leaves more main-thread time for React rendering.
▸Two-hand tracking — Configured with numHands: 2, enabling bimanual gestures such as Pinch to Zoom.
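The singleton and navigation behavior described in this list can be sketched as a class that caches its download promise. This is a minimal illustration, not KineEngine's source: `EngineSketch` is a hypothetical name, `loadModel` stands in for the MediaPipe download, the video is typed as `object` instead of `HTMLVideoElement` so the sketch runs outside a browser, and the loader is passed to `getInstance` only to make it testable.

```typescript
// Hypothetical sketch: `loadModel` stands in for the MediaPipe model + WASM download,
// and `object` stands in for HTMLVideoElement so the code runs outside a browser.
class EngineSketch {
  private static instance: EngineSketch | null = null;
  private modelPromise: Promise<unknown> | null = null;
  private video: object | null = null;
  private readonly loadModel: () => Promise<unknown>;
  loadCount = 0; // how many times a download actually started

  private constructor(loadModel: () => Promise<unknown>) {
    this.loadModel = loadModel;
  }

  // The loader is only used on the first call; later calls return the same instance.
  static getInstance(loadModel: () => Promise<unknown>): EngineSketch {
    if (EngineSketch.instance === null) {
      EngineSketch.instance = new EngineSketch(loadModel);
    }
    return EngineSketch.instance;
  }

  // Safe to call on every navigation: only the video reference changes, and
  // concurrent callers share one in-flight download via the cached promise.
  async initialize(video: object): Promise<void> {
    this.video = video;
    if (this.modelPromise === null) {
      this.loadCount++;
      this.modelPromise = this.loadModel().catch((err: unknown) => {
        this.modelPromise = null; // allow a retry if the download failed
        throw err;
      });
    }
    await this.modelPromise;
  }

  get currentVideo(): object | null {
    return this.video;
  }
}
```

Caching the promise rather than the loaded model means two components that mount at the same moment still trigger only one download.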

Layer 02

KineProvider — The React Bridge

The provider wraps your component tree with a hidden <video> element and a requestAnimationFrame loop. On each frame, it calls the engine, stores the latest landmarks in a ref, and exposes them via the useKine() hook.


// Wrap any component tree to enable gesture detection
<KineProvider>
  <AirCursor />
  <SwipeArea>{children}</SwipeArea>
</KineProvider>

// Inside any child component:
const { landmarksRef } = useKine();
// landmarksRef.current → NormalizedLandmark[21] (updated every frame)
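The provider's per-frame work described above amounts to one loop that overwrites a ref. A minimal sketch under stated assumptions: `startLandmarkLoop` is a hypothetical helper, `detect` stands in for `engine.detectHands()`, and the scheduler is injected so the same code can use `requestAnimationFrame` in the browser or a fake queue in a test.

```typescript
// Hypothetical sketch of the provider's frame loop; names are illustrative.
interface NormalizedLandmark { x: number; y: number; z: number; }
interface MutableRef<T> { current: T; } // same shape as React's useRef result

type Detect = (timestampMs: number) => NormalizedLandmark[] | null; // stands in for engine.detectHands()
type Schedule = (cb: (timestampMs: number) => void) => number;      // e.g. requestAnimationFrame
type Cancel = (handle: number) => void;                              // e.g. cancelAnimationFrame

// Writes the latest landmarks into `ref` once per frame. It never calls setState,
// so the loop itself causes no React re-renders. Returns a stop function.
function startLandmarkLoop(
  ref: MutableRef<NormalizedLandmark[] | null>,
  detect: Detect,
  schedule: Schedule,
  cancel: Cancel,
): () => void {
  let running = true;
  let handle = 0;
  const frame = (timestampMs: number): void => {
    if (!running) return;
    ref.current = detect(timestampMs); // overwrite: readers only care about the newest frame
    handle = schedule(frame);
  };
  handle = schedule(frame);
  return () => {
    running = false;
    cancel(handle);
  };
}
```

Inside a provider this would plausibly run in a `useEffect`, passing `requestAnimationFrame` and `cancelAnimationFrame` as the scheduler pair and returning the stop function as the effect cleanup.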

Why a Ref Instead of State?

Landmark data updates at 60 fps. Storing it in React state would re-render the provider and every component that consumes it 60 times per second. Instead, landmarks are stored in a mutable ref (useRef), which React does not track. Individual gesture components read from the ref inside their own animation loops and update only what they need, typically a single Framer Motion spring value.
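On the consumer side, the ref pattern means each component pulls the latest frame when its own loop ticks instead of receiving new props. A sketch, assuming the ref holds a single hand's landmarks (as the useKine() comment suggests) and a target exposing `set(value)`, the method Framer Motion's `MotionValue` provides; `cursorTick` and `Settable` are hypothetical names:

```typescript
// Hypothetical consumer-side sketch. `Settable` matches the set() method of a
// Framer Motion MotionValue; here it is a plain interface so the sketch runs anywhere.
interface NormalizedLandmark { x: number; y: number; z: number; }
interface MutableRef<T> { current: T; }
interface Settable { set(value: number): void; }

const INDEX_FINGER_TIP = 8;

// One tick of a gesture component's own animation loop: read the shared ref,
// derive a single number, and push it into a motion value. No React state is
// touched, so nothing re-renders; a motion value bound to a motion component's
// style updates the DOM directly.
function cursorTick(
  landmarksRef: MutableRef<NormalizedLandmark[] | null>,
  cursorX: Settable,
  viewportWidth: number,
): boolean {
  const tip = landmarksRef.current?.[INDEX_FINGER_TIP];
  if (tip === undefined) return false; // no hand this frame: leave the value as-is
  cursorX.set(tip.x * viewportWidth);
  return true;
}
```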

Layer 03

Gesture Components — The UI Layer

Each gesture component is a self-contained module that reads landmarks from the ref, applies its own physics and thresholding logic, and drives DOM updates via Framer Motion springs.

Air Cursor

Landmark 8

Index fingertip position → cursor X/Y via spring physics
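The Air Cursor's mapping can be sketched as two steps: scale landmark 8 from normalized coordinates to viewport pixels, then let a damped spring chase that target so jitter in the raw landmarks is smoothed out. The component itself uses Framer Motion springs; the hand-rolled integrator below (semi-implicit Euler, mass 1) only illustrates the physics, its stiffness and damping are illustrative values rather than Kine UI's settings, and the x-axis mirroring assumes a selfie-style preview.

```typescript
interface Point { x: number; y: number; }

// Illustrative spring constants (mass = 1); not Kine UI's or Framer Motion's values.
const STIFFNESS = 170;
const DAMPING = 26;

// Map landmark 8 (normalized [0, 1]) to viewport pixels. Mirrors x on the
// assumption that the webcam preview is shown selfie-style; drop `1 -` if not.
function toCursorTarget(indexTip: Point, width: number, height: number): Point {
  return { x: (1 - indexTip.x) * width, y: indexTip.y * height };
}

// One semi-implicit Euler step of a damped spring pulling `pos` toward `target`:
// update velocity from the spring + damping force, then position from the new velocity.
function springStep(pos: Point, vel: Point, target: Point, dt: number): { pos: Point; vel: Point } {
  const ax = -STIFFNESS * (pos.x - target.x) - DAMPING * vel.x;
  const ay = -STIFFNESS * (pos.y - target.y) - DAMPING * vel.y;
  const v = { x: vel.x + ax * dt, y: vel.y + ay * dt };
  return { pos: { x: pos.x + v.x * dt, y: pos.y + v.y * dt }, vel: v };
}
```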

Swipe Area

Landmark 0

Wrist X velocity → lateral swipe detection with threshold
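Swipe detection from wrist velocity can be sketched as a small stateful detector: estimate velocity from consecutive wrist samples, fire when its magnitude crosses a threshold, and enforce a cooldown so one physical swipe fires only once. The threshold, cooldown, and `createSwipeDetector` name are hypothetical; Kine UI's actual values and smoothing are not documented here.

```typescript
// Hypothetical tuning values for illustration, not Kine UI defaults.
const VELOCITY_THRESHOLD = 1.5; // normalized frame widths per second
const COOLDOWN_MS = 500;        // minimum gap between two swipes

interface WristSample { x: number; t: number; } // landmark 0 x in [0, 1], timestamp in ms
type SwipeDirection = "left" | "right";

// Returns a function to call once per frame with the latest wrist sample.
// Directions are in camera-image coordinates; swap them if the preview is mirrored.
function createSwipeDetector(): (sample: WristSample) => SwipeDirection | null {
  let prev: WristSample | null = null;
  let lastFiredAt = -Infinity;
  return (sample) => {
    const last = prev;
    prev = sample;
    if (last === null || sample.t <= last.t) return null; // need two ordered samples
    const velocity = ((sample.x - last.x) / (sample.t - last.t)) * 1000; // per second
    if (Math.abs(velocity) < VELOCITY_THRESHOLD) return null;
    if (sample.t - lastFiredAt < COOLDOWN_MS) return null; // still cooling down
    lastFiredAt = sample.t;
    return velocity > 0 ? "right" : "left";
  };
}
```

Two-sample velocity is noisy at 60 fps; averaging over a short window of samples is a common refinement.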

Air Scroll

Landmark 0

Wrist Y delta → scroll velocity with pinch-protection lockout
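The pinch-protection lockout can be sketched as follows: while the thumb tip (4) and index tip (8) are close enough to count as a pinch, scrolling is suppressed and the wrist baseline is reset, so releasing a pinch does not produce a sudden jump. The threshold, gain, sign convention, and `createAirScroll` name are assumptions for illustration.

```typescript
interface NormalizedLandmark { x: number; y: number; z: number; }

const WRIST = 0;
const THUMB_TIP = 4;
const INDEX_FINGER_TIP = 8;
// Hypothetical tuning values, not Kine UI defaults.
const PINCH_THRESHOLD = 0.05; // normalized thumb-index distance that counts as a pinch
const SCROLL_GAIN = 2000;     // pixels scrolled per unit of normalized wrist travel

// Returns a per-frame function: pass the current hand (or null) and get a scroll
// offset in pixels. Positive means the wrist moved down the frame.
function createAirScroll(): (hand: NormalizedLandmark[] | null) => number {
  let prevY: number | null = null;
  return (hand) => {
    const wrist = hand?.[WRIST];
    const thumb = hand?.[THUMB_TIP];
    const index = hand?.[INDEX_FINGER_TIP];
    if (!wrist || !thumb || !index) {
      prevY = null; // hand lost: restart from a fresh baseline
      return 0;
    }
    const pinching = Math.hypot(thumb.x - index.x, thumb.y - index.y) < PINCH_THRESHOLD;
    if (pinching) {
      prevY = null; // lockout: a pinch belongs to another gesture, so never scroll
      return 0;
    }
    const delta = prevY === null ? 0 : wrist.y - prevY;
    prevY = wrist.y;
    return delta * SCROLL_GAIN;
  };
}
```

The returned value is a per-frame offset in pixels, which a caller could pass to `window.scrollBy(0, dy)`.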

Pinch to Zoom

Landmarks 4+8

Thumb-index distance on both hands → scale factor
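The card does not spell out how the two thumb-index distances combine. One plausible reading, assumed in this sketch, is the common bimanual pattern: each hand must be pinching, and the zoom factor is the ratio of the current distance between the two pinch points to their distance when the gesture began. `createPinchZoom`, the pinch threshold, and the clamp range are hypothetical.

```typescript
interface NormalizedLandmark { x: number; y: number; z: number; }
interface Point { x: number; y: number; }

const THUMB_TIP = 4;
const INDEX_FINGER_TIP = 8;
const PINCH_THRESHOLD = 0.05; // hypothetical: thumb-index distance that counts as a pinch

// Midpoint of the thumb and index tips if this hand is pinching, else null.
function pinchPoint(hand: NormalizedLandmark[]): Point | null {
  const thumb = hand[THUMB_TIP];
  const index = hand[INDEX_FINGER_TIP];
  if (!thumb || !index) return null;
  if (Math.hypot(thumb.x - index.x, thumb.y - index.y) >= PINCH_THRESHOLD) return null;
  return { x: (thumb.x + index.x) / 2, y: (thumb.y + index.y) / 2 };
}

// Returns a per-frame function mapping both hands to a clamped scale factor.
function createPinchZoom(minScale = 0.5, maxScale = 4): (hands: NormalizedLandmark[][]) => number {
  let startDistance: number | null = null; // distance between pinch points at gesture start
  let startScale = 1;                      // scale when the current gesture began
  let scale = 1;
  return (hands) => {
    const a = hands[0] ? pinchPoint(hands[0]) : null;
    const b = hands[1] ? pinchPoint(hands[1]) : null;
    if (!a || !b) {
      startDistance = null; // gesture ended: keep the last scale
      return scale;
    }
    const d = Math.hypot(a.x - b.x, a.y - b.y);
    if (startDistance === null || startDistance < 1e-6) {
      startDistance = d; // first frame of a new gesture anchors the ratio
      startScale = scale;
      return scale;
    }
    scale = Math.min(maxScale, Math.max(minScale, startScale * (d / startDistance)));
    return scale;
  };
}
```

Anchoring each new gesture at the current scale lets the user zoom in several passes without the content snapping back to 1.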

Landmark Reference

MediaPipe returns 21 landmarks per hand. The x and y values are normalized to [0, 1] by the video frame's width and height, while z is a relative depth with the wrist as its origin (smaller values are closer to the camera). The key landmarks used by Kine UI components are:


Index	Name	Used By
0	Wrist	Air Scroll, Swipe Area
4	Thumb Tip	Air Cursor (pinch), Pinch to Zoom
8	Index Finger Tip	Air Cursor (position), Pinch to Zoom
12	Middle Finger Tip	Reserved
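Because x and y are fractions of the frame, converting a landmark to pixels is a multiplication by the frame size. A small helper, with the table's indices as named constants (`LANDMARK` and `toPixels` are hypothetical names, not Kine UI exports):

```typescript
// Indices from the table above (MediaPipe hand landmark numbering).
const LANDMARK = {
  WRIST: 0,
  THUMB_TIP: 4,
  INDEX_FINGER_TIP: 8,
  MIDDLE_FINGER_TIP: 12,
} as const;

interface NormalizedLandmark { x: number; y: number; z: number; }

// x and y are fractions of the frame width and height, so pixels = fraction × size.
// Values are clamped in case a landmark lands slightly outside the frame.
// z is a relative depth, not a [0, 1] fraction, so it is not converted here.
function toPixels(lm: NormalizedLandmark, width: number, height: number): { x: number; y: number } {
  const clamp01 = (v: number) => Math.min(1, Math.max(0, v));
  return { x: clamp01(lm.x) * width, y: clamp01(lm.y) * height };
}
```

Typical use is `toPixels(hand[LANDMARK.INDEX_FINGER_TIP], videoWidth, videoHeight)`; because of the clamp, a fingertip at the frame edge pins the result to the edge.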

Performance Characteristics

▸Model size: ~4 MB, downloaded once and cached by the browser.
▸Inference: under 16 ms per frame, GPU-accelerated via WebGL.
▸Server load: none; all processing runs on the client.