This project is a face mask detection system that uses deep learning models to detect whether people are wearing masks. It processes static images, video files, and live webcam feeds in real time.
The pipeline has two stages. First, a Caffe-based DNN Single Shot MultiBox Detector (SSD) model locates the faces in each frame, with a reported accuracy of 98% or higher. Each detected face region of interest (ROI) is then cropped, preprocessed to 224×224 pixels, and classified as Mask or No Mask by a fine-tuned VGG16 model, with a reported accuracy of 95% or higher.
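The hand-off between the two stages can be sketched as below. This is a minimal illustration under stated assumptions, not the project's code: it assumes the detector returns rows in the common SSD layout `[image_id, class_id, confidence, x1, y1, x2, y2]` with corners normalised to the 0..1 range, and the name `detections_to_boxes` and the 0.5 default threshold are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def detections_to_boxes(
    detections: Iterable[Sequence[float]],
    frame_width: int,
    frame_height: int,
    min_confidence: float = 0.5,  # assumed default, tune per deployment
) -> List[Box]:
    """Convert normalised SSD detection rows to clipped pixel boxes."""
    boxes: List[Box] = []
    for row in detections:
        # Assumed row layout: [image_id, class_id, confidence, x1, y1, x2, y2]
        if row[2] < min_confidence:
            continue
        # Scale normalised corners to pixel coordinates.
        x1 = int(row[3] * frame_width)
        y1 = int(row[4] * frame_height)
        x2 = int(row[5] * frame_width)
        y2 = int(row[6] * frame_height)
        # Clip to the frame so a later crop never indexes outside the image.
        x1, y1 = max(0, x1), max(0, y1)
        x2, y2 = min(frame_width, x2), min(frame_height, y2)
        # Drop boxes that collapse to zero area after clipping.
        if x2 > x1 and y2 > y1:
            boxes.append((x1, y1, x2, y2))
    return boxes
```

Each returned box would then be cropped from the frame, resized to 224×224, and passed to the classifier.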
The system is deployed as a Flask web application with a modern, responsive UI supporting photo upload, camera capture, video upload with background processing, and a live MJPEG webcam stream. The dataset (9,525 images) was published on Kaggle by the author and used for training, testing, and validation. Color-coded bounding boxes (green = mask, red = no mask) annotate the output in real time.
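Background video processing with progress tracking can be approximated with a worker thread and a shared progress table. The sketch below is only one possible design and is not taken from the project: `start_job`, `process_video`, `get_progress`, and the in-memory `_progress` registry are hypothetical names, and the frames are passed in as a plain list rather than read from a video file.

```python
import threading
from typing import Any, Callable, Dict, List

# Hypothetical in-memory registry: job_id -> percent complete (0 to 100).
_progress: Dict[str, int] = {}
_lock = threading.Lock()


def process_video(job_id: str, frames: List[Any],
                  handle_frame: Callable[[Any], None]) -> None:
    """Run handle_frame on every frame and record progress as a percentage."""
    total = len(frames)
    for index, frame in enumerate(frames, start=1):
        handle_frame(frame)  # e.g. detect faces, classify, draw boxes
        with _lock:
            _progress[job_id] = index * 100 // total
    with _lock:
        _progress[job_id] = 100  # also covers an empty frame list


def start_job(job_id: str, frames: List[Any],
              handle_frame: Callable[[Any], None]) -> threading.Thread:
    """Start processing in a background thread and return the thread."""
    with _lock:
        _progress[job_id] = 0
    worker = threading.Thread(
        target=process_video, args=(job_id, frames, handle_frame), daemon=True
    )
    worker.start()
    return worker


def get_progress(job_id: str) -> int:
    """Return the percentage done for a job, or 0 if the job is unknown."""
    with _lock:
        return _progress.get(job_id, 0)
```

In a Flask app, the upload route would call something like `start_job` and return immediately, while a second route returns `get_progress(job_id)` for the UI to poll.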
98% face detection accuracy and 95% mask classification accuracy
Real-time webcam stream via MJPEG with live bounding box overlays
Dataset of 9,525 images published on Kaggle by the author
Transfer learning: VGG16 pretrained on ImageNet, fine-tuned on a custom dataset
Flask web app with photo upload, camera capture, and video processing
Background video processing with progress tracking
Color-coded bounding boxes: green (mask) / red (no mask)
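The colour coding can be expressed as a small mapping from the classifier output to a label and a box colour. The sketch assumes the classifier yields a single probability for the Mask class and uses a 0.5 decision threshold; both are assumptions, and `overlay_style` is a hypothetical helper name. The colours are given in BGR order because that is what OpenCV drawing functions expect.

```python
from typing import Tuple

Color = Tuple[int, int, int]

# OpenCV drawing functions take colours as (blue, green, red).
GREEN_BGR: Color = (0, 255, 0)
RED_BGR: Color = (0, 0, 255)


def overlay_style(mask_probability: float,
                  threshold: float = 0.5) -> Tuple[str, Color]:
    """Pick the label text and box colour for one classified face."""
    if mask_probability >= threshold:
        return f"Mask {mask_probability:.0%}", GREEN_BGR
    # Show the confidence of the predicted class, not of the Mask class.
    return f"No Mask {1 - mask_probability:.0%}", RED_BGR
```

The returned pair would typically be passed to `cv2.rectangle` and `cv2.putText` along with the face box coordinates.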
Evaluation Metrics
Accuracy: 95%
Precision: 94%
Recall: 96%
Latency: ~50 ms per image (CPU)
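For reference, accuracy, precision, and recall follow directly from the confusion-matrix counts of a binary classifier. The helper below is a generic sketch, not the project's evaluation script; the name `classification_metrics` is hypothetical, and which class counts as "positive" (Mask or No Mask) is left to the caller because the source does not say.

```python
from typing import Dict


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Compute accuracy, precision, and recall from confusion-matrix counts."""
    total = tp + fp + fn + tn
    return {
        # Share of all predictions that were correct.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Share of predicted positives that were truly positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Share of true positives that the model found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }
```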
[01] Low accuracy on partially occluded faces: mitigated with aggressive augmentation, including random occlusion patches during training
[02] False positives for face-like objects (e.g. posters): fixed by raising the face detection confidence threshold to 0.7
[03] Camera shows black screen in browser: resolved by ensuring the app is accessed via localhost (not 127.0.0.1) and checking browser camera permissions
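The random occlusion augmentation from item [01] can be illustrated as follows. The source does not describe the patch size, fill value, or number of patches, so everything here is an assumption: a single rectangle covering up to 30% of each dimension, filled with zeros. To stay dependency-free the sketch treats an image as nested lists of grayscale values; a real training pipeline would more likely operate on NumPy arrays. `random_occlusion` is a hypothetical name.

```python
import random
from typing import List, Optional

Image = List[List[int]]  # rows of grayscale pixels; stand-in for an image array


def random_occlusion(image: Image, max_fraction: float = 0.3, fill: int = 0,
                     rng: Optional[random.Random] = None) -> Image:
    """Return a copy of image with one random rectangle overwritten by fill."""
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    # Patch size: at least 1 pixel, at most max_fraction of each dimension.
    patch_h = rng.randint(1, max(1, int(height * max_fraction)))
    patch_w = rng.randint(1, max(1, int(width * max_fraction)))
    # Choose a top-left corner so the patch fits entirely inside the image.
    top = rng.randint(0, height - patch_h)
    left = rng.randint(0, width - patch_w)
    occluded = [row[:] for row in image]  # copy rows; the input is not modified
    for y in range(top, top + patch_h):
        for x in range(left, left + patch_w):
            occluded[y][x] = fill
    return occluded
```

Applying this to face crops during training exposes the classifier to partially hidden faces while keeping the original label.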
What I Learned
[01] Transfer Learning: fine-tuning a pre-trained VGG16 model on a custom binary classification dataset
[02] Two-stage detection pipeline: combining a Caffe DNN SSD detector with a CNN classifier
[03] Data Collection & Annotation: building and publishing a 9,525-image dataset on Kaggle
[04] Real-time video processing: MJPEG streaming with OpenCV frame-by-frame inference
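MJPEG streaming, mentioned in item [04], works by sending a sequence of JPEG images as parts of one long multipart HTTP response. The generator below sketches only the framing of those parts and assumes the frames are already JPEG-encoded bytes; the boundary name `frame` and the function name `mjpeg_parts` are hypothetical, not taken from the project.

```python
from typing import Iterable, Iterator

BOUNDARY = b"frame"  # assumed boundary name; must match the response header


def mjpeg_parts(jpeg_frames: Iterable[bytes]) -> Iterator[bytes]:
    """Wrap each JPEG-encoded frame as one part of a multipart stream."""
    for jpeg in jpeg_frames:
        yield (
            b"--" + BOUNDARY + b"\r\n"
            + b"Content-Type: image/jpeg\r\n\r\n"
            + jpeg
            + b"\r\n"
        )
```

In a Flask route this generator would typically be returned as `Response(mjpeg_parts(frames), mimetype="multipart/x-mixed-replace; boundary=frame")`, with each annotated frame encoded beforehand, for example with `cv2.imencode(".jpg", frame)`.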