Deploying machine learning models to web browsers presents unique challenges: bundle size limits, modest compute capabilities (we target p50 users, not flagship devices), and the need for real-time performance without server dependencies. <br>This talk shares our experience building a production document corner detection system that runs entirely client-side, achieving <20ms inference on desktop and <50ms on mobile browsers. <br><br>Three key engineering decisions will be covered: <br><br>First, why we adapted Google's BlazeFace architecture, originally designed for face detection, using 5×5 depthwise separable convolutions that provide an excellent accuracy-to-compute ratio on mobile GPUs and achieve ~5px corner localization accuracy on MIDV document datasets. <br><br>Second, how we achieved an 88% size reduction in ONNX Runtime WebAssembly (15MB -> 1.8MB) by building custom binaries with only the 21 operators our model requires, plus techniques such as disabling RTTI and exceptions and using the ORT format for pre-optimized graphs.<br><br>Third, why we chose heatmap regression over direct coordinate prediction, enabling the sub-pixel corner precision critical for downstream perspective correction.<br><br>Practical takeaways include: <br>- why to embed post-processing into ONNX exports (avoiding JavaScript argmax overhead)<br>- when to use SIMD-only vs. threaded WASM builds<br>- a Docker image for reproducible WASM builds<br><br>I will demonstrate the complete pipeline from PyTorch training through browser deployment.
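
To illustrate why a heatmap output can give sub-pixel corner positions, here is a minimal sketch in plain Python. It is an assumption for illustration only: the abstract does not say which decoding method the production system uses, and the function names (`gaussian_heatmap`, `hard_argmax_2d`, `centroid_2d`) are hypothetical. It compares a plain argmax, which can only return integer pixel coordinates, with a centre-of-mass (intensity-weighted average) decode over the same heatmap.

```python
import math


def gaussian_heatmap(width, height, cx, cy, sigma=1.5):
    """Synthetic heatmap: a Gaussian bump centred at the fractional point (cx, cy)."""
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


def hard_argmax_2d(heatmap):
    """Integer (x, y) of the largest cell; precision is limited to whole pixels."""
    best_value = -math.inf
    best_xy = (0, 0)
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_value:
                best_value = value
                best_xy = (x, y)
    return best_xy


def centroid_2d(heatmap):
    """Intensity-weighted mean position; assumes non-negative heatmap values."""
    total = 0.0
    x_acc = 0.0
    y_acc = 0.0
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            total += value
            x_acc += value * x
            y_acc += value * y
    return x_acc / total, y_acc / total


if __name__ == "__main__":
    # Hypothetical corner at a fractional location on a 16x16 heatmap.
    hm = gaussian_heatmap(16, 16, cx=6.3, cy=9.7)
    print("argmax:  ", hard_argmax_2d(hm))
    print("centroid:", centroid_2d(hm))
```

On this synthetic input the argmax snaps to the nearest whole pixel, while the centroid recovers the fractional centre closely. A real model's heatmaps are noisier, so a production decode would likely restrict the average to a window around the peak or apply a threshold first.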