Shipping ML to the Browser: Building Real-Time Document Detection with Custom ONNX Runtime WebAssembly (FOSSASIA Summit 2026)

FOSSASIA Summit 2026: Sunday, 8 March 2026, 9:00 AM to Tuesday, 10 March 2026, 7:30 PM (Asia/Bangkok)


Deploying machine learning models to web browsers presents unique challenges: bundle size limits, modest compute capabilities (we target the median, p50, user's device rather than flagship hardware), and the need for real-time performance without server dependencies. This talk shares our experience building a production document corner detection system that runs entirely client-side, achieving <20ms inference on desktop and <50ms on mobile browsers.

Three key engineering decisions will be covered:

First, why we adapted Google's BlazeFace architecture, originally designed for face detection, using 5×5 depthwise separable convolutions that provide an excellent accuracy-to-compute ratio on mobile GPUs, achieving ~5px corner localization accuracy on MIDV document datasets.

Second, how we achieved an 88% size reduction in ONNX Runtime WebAssembly (15MB -> 1.8MB) by building custom binaries with only the 21 operators our model requires, plus techniques such as disabling RTTI and exceptions and using the ORT format for pre-optimized graphs.

Third, why we chose heatmap regression over direct coordinate prediction, enabling the sub-pixel corner precision critical for downstream perspective correction.

Practical takeaways include:
- why post-processing should be embedded into ONNX exports (avoiding JavaScript argmax overhead)
- when to use SIMD-only vs. threaded WASM builds
- a Docker image for reproducible WASM builds

I will demonstrate the complete pipeline from PyTorch training through browser deployment.
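To illustrate why heatmap regression can yield sub-pixel corner positions, here is a minimal dependency-free Python sketch of one common decoding scheme: take the integer argmax of a corner heatmap, then refine it with an intensity-weighted centroid over a small window. This is not necessarily the decoding used in the talk. The function names, the `radius` and `stride` parameters, and the simple multiply-by-stride mapping back to image pixels are all assumptions made for illustration.

```python
from typing import List, Tuple

Heatmap = List[List[float]]


def decode_corner(heatmap: Heatmap, radius: int = 1) -> Tuple[float, float]:
    """Estimate the (x, y) peak of a 2D heatmap with sub-pixel precision.

    Hypothetical helper: integer argmax followed by an intensity-weighted
    centroid over a (2 * radius + 1)^2 window clipped to the heatmap borders.
    """
    height, width = len(heatmap), len(heatmap[0])

    # Step 1: coarse peak location (integer argmax over all cells).
    peak_y, peak_x = max(
        ((y, x) for y in range(height) for x in range(width)),
        key=lambda p: heatmap[p[0]][p[1]],
    )

    # Step 2: refine with a weighted centroid around the peak.
    total = sum_x = sum_y = 0.0
    for y in range(max(0, peak_y - radius), min(height, peak_y + radius + 1)):
        for x in range(max(0, peak_x - radius), min(width, peak_x + radius + 1)):
            weight = max(heatmap[y][x], 0.0)  # ignore negative activations
            total += weight
            sum_x += weight * x
            sum_y += weight * y

    if total == 0.0:
        # Flat (all-zero) window: fall back to the integer peak.
        return float(peak_x), float(peak_y)
    return sum_x / total, sum_y / total


def to_image_coords(point: Tuple[float, float], stride: int) -> Tuple[float, float]:
    """Map heatmap coordinates to input-image pixels.

    Assumes a plain multiply-by-stride convention; a real model may need
    a half-pixel offset depending on how its heatmaps were generated.
    """
    return point[0] * stride, point[1] * stride


# Toy 3x4 heatmap whose true peak lies between columns 1 and 2.
demo = [
    [0.0, 0.00, 0.00, 0.0],
    [0.0, 0.75, 0.25, 0.0],
    [0.0, 0.00, 0.00, 0.0],
]
corner = decode_corner(demo)
print(corner)                             # (1.25, 1.0)
print(to_image_coords(corner, stride=4))  # (5.0, 4.0)
```

With a plain argmax the toy example would snap to column 1, which at an assumed stride of 4 is a full pixel off in image space. Errors of that size propagate directly into the perspective transform computed from the four corners. The same refinement can in principle be expressed with tensor operations and exported as part of the ONNX graph, in line with the abstract's point about avoiding JavaScript-side argmax.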