Computer Vision

Face Mask Detection System

AI-powered real-time face mask detection using deep learning — 98% face detection + 95% mask classification accuracy

Eval Score: 95% accuracy
Latency: ~50 ms/image
Dataset: 9,525 images
Cost/call: On-device
System Overview

This Face Mask Detection System uses deep learning models to detect whether people are wearing masks. It accepts static images, video files, and live webcam feeds, and annotates webcam frames in real time.

The pipeline has two stages. First, a Caffe Single Shot MultiBox Detector (SSD) model locates every face in the frame, with a reported face detection accuracy of 98%. Each detected face region of interest (ROI) is then cropped, resized to 224×224 pixels, normalized for VGG16, and classified as Mask or No Mask by a fine-tuned VGG16 model with a reported accuracy of 95%.
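As a rough illustration of the first stage, the sketch below converts raw SSD output rows into pixel bounding boxes. It assumes the row layout that OpenCV's DNN module typically returns for SSD face detectors, `[image_id, class_id, confidence, x1, y1, x2, y2]` with coordinates normalized to [0, 1]. The function name `decode_detections` and its parameters are hypothetical and not taken from the project code.

```python
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int, float]  # (x1, y1, x2, y2, confidence)


def _clamp(value: int, low: int, high: int) -> int:
    """Keep a pixel coordinate inside [low, high]."""
    return max(low, min(high, value))


def decode_detections(
    rows: Sequence[Sequence[float]],
    frame_w: int,
    frame_h: int,
    min_conf: float = 0.5,
) -> List[Box]:
    """Turn SSD output rows into clamped pixel boxes.

    Assumed row layout: [image_id, class_id, confidence, x1, y1, x2, y2],
    with the four coordinates normalized to the range [0, 1].
    """
    boxes: List[Box] = []
    for row in rows:
        conf = float(row[2])
        if conf < min_conf:
            continue  # drop weak detections
        # Scale normalized coordinates to pixels and clamp to the frame.
        x1 = _clamp(int(row[3] * frame_w), 0, frame_w - 1)
        y1 = _clamp(int(row[4] * frame_h), 0, frame_h - 1)
        x2 = _clamp(int(row[5] * frame_w), 0, frame_w - 1)
        y2 = _clamp(int(row[6] * frame_h), 0, frame_h - 1)
        if x2 <= x1 or y2 <= y1:
            continue  # skip degenerate boxes
        boxes.append((x1, y1, x2, y2, conf))
    return boxes
```

With OpenCV, the rows would usually come from something like `net.forward()[0, 0]` after building a 300×300 blob. Passing `min_conf=0.7` corresponds to the face detection confidence threshold mentioned elsewhere on this page.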

The system is deployed as a Flask web application with a modern, responsive UI supporting photo upload, camera capture, video upload with background processing, and a live MJPEG webcam stream. The dataset (9,525 images) was published on Kaggle by the author and used for training, testing, and validation. Color-coded bounding boxes (green = mask, red = no mask) annotate the output in real time.
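The live MJPEG stream can be pictured as a generator that wraps each JPEG-encoded frame in a multipart chunk. The sketch below shows only that framing step, under the assumption that frames are already JPEG bytes. The boundary name `frame` and the function name `mjpeg_chunks` are illustrative choices, not details from the project.

```python
from typing import Iterable, Iterator

BOUNDARY = "frame"  # hypothetical boundary name
MIMETYPE = f"multipart/x-mixed-replace; boundary={BOUNDARY}"


def mjpeg_chunks(jpeg_frames: Iterable[bytes]) -> Iterator[bytes]:
    """Yield one multipart chunk per JPEG-encoded frame."""
    for jpeg in jpeg_frames:
        yield b"".join(
            [
                b"--" + BOUNDARY.encode("ascii") + b"\r\n",
                b"Content-Type: image/jpeg\r\n",
                b"Content-Length: " + str(len(jpeg)).encode("ascii") + b"\r\n",
                b"\r\n",  # blank line separates headers from the image body
                jpeg,
                b"\r\n",
            ]
        )
```

In a Flask view, a generator like this would typically be returned as `Response(mjpeg_chunks(frames), mimetype=MIMETYPE)`, so the browser replaces the image each time a new chunk arrives.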

Architecture
Input (Image / Video / Webcam)
        ↓
Face Detection (Caffe DNN SSD — 300×300)
        ↓
Face Cropping & Preprocessing (224×224, VGG16 normalize)
        ↓
Batch Prediction (VGG16 — Binary Sigmoid)
        ↓
Post-processing & Annotation (bounding boxes, labels)
        ↓
Output (Annotated Image / Video / Stream)
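The cropping and preprocessing step could look roughly like the NumPy sketch below. It assumes the frame is in OpenCV's BGR channel order and that "VGG16 normalize" means the ImageNet mean subtraction used by Keras's VGG16 `preprocess_input` (BGR order, no scaling). The nearest-neighbour resize is a dependency-free stand-in for `cv2.resize`, and all function names are hypothetical.

```python
import numpy as np

# ImageNet channel means in BGR order, as used for VGG16-style preprocessing.
IMAGENET_BGR_MEAN = np.array([103.939, 116.779, 123.68], dtype=np.float32)


def resize_nearest(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Nearest-neighbour resize to (size, size); stand-in for cv2.resize."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[rows[:, None], cols]


def preprocess_face(frame_bgr: np.ndarray, box, size: int = 224) -> np.ndarray:
    """Crop one face box, resize it, and subtract the channel means."""
    x1, y1, x2, y2 = box
    crop = frame_bgr[y1:y2, x1:x2]
    resized = resize_nearest(crop, size).astype(np.float32)
    return resized - IMAGENET_BGR_MEAN


def make_batch(frame_bgr: np.ndarray, boxes, size: int = 224) -> np.ndarray:
    """Stack all face crops so the classifier can run one batched prediction."""
    if not boxes:
        return np.empty((0, size, size, 3), dtype=np.float32)
    return np.stack([preprocess_face(frame_bgr, b, size) for b in boxes])
```

Batching every face in a frame into one array means the classifier is called once per frame rather than once per face.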
Tech Stack
TensorFlow, Keras, VGG16, OpenCV, Caffe DNN SSD, Flask, Python, NumPy
Key Highlights
Two-stage pipeline: Caffe DNN SSD face detector → VGG16 mask classifier
98% face detection accuracy + 95% mask classification accuracy
Real-time webcam stream via MJPEG with live bounding box overlays
Dataset of 9,525 images published on Kaggle by the author
Transfer learning: VGG16 pretrained on ImageNet, fine-tuned on custom dataset
Flask web app with photo upload, camera capture, and video processing
Background video processing with progress tracking
Color-coded bounding boxes: green (mask) / red (no mask)
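To make the color coding concrete, here is a minimal sketch of how a single sigmoid output might be mapped to a label, a box color, and a caption. It assumes the sigmoid value is the probability of the "No Mask" class and uses a 0.5 decision threshold; both are assumptions, since the class ordering and threshold are not stated here. Colors are in OpenCV's BGR order.

```python
from typing import Tuple

GREEN = (0, 255, 0)  # BGR: mask
RED = (0, 0, 255)    # BGR: no mask


def label_for(
    prob_no_mask: float, threshold: float = 0.5
) -> Tuple[str, Tuple[int, int, int], float]:
    """Map one sigmoid output to (label, BGR color, confidence).

    Assumes the sigmoid output is the probability of the "No Mask" class.
    """
    if prob_no_mask >= threshold:
        return "No Mask", RED, prob_no_mask
    return "Mask", GREEN, 1.0 - prob_no_mask


def caption(prob_no_mask: float) -> str:
    """Text drawn next to the bounding box, e.g. 'Mask: 90.0%'."""
    label, _, conf = label_for(prob_no_mask)
    return f"{label}: {conf * 100:.1f}%"
```

The returned color can be passed directly to OpenCV drawing calls such as `cv2.rectangle` and `cv2.putText`.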
Evaluation Metrics
Accuracy: 95%
Precision: 94%
Recall: 96%
Latency: ~50 ms per image (CPU)
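For reference, the sketch below shows how precision, recall, and F1 relate to confusion-matrix counts. The counts in the usage note are made up for illustration and are not the project's actual confusion matrix. Plugging the reported precision (94%) and recall (96%) into the F1 formula gives roughly 0.95.

```python
from typing import Tuple


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall, f1_from(precision, recall)
```

For example, hypothetical counts of 96 true positives, 6 false positives, and 4 false negatives give a precision of about 0.94 and a recall of 0.96.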
Challenges and Solutions
[01] Low accuracy on partially occluded faces: mitigated with aggressive augmentation, including random occlusion patches during training
[02] False positives for face-like objects (e.g. posters): fixed by raising the face detection confidence threshold to 0.7
[03] Camera shows black screen in browser: resolved by ensuring the app is accessed via localhost (not 127.0.0.1) and checking browser camera permissions
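The random occlusion patches mentioned in the first item could be implemented along the lines of the NumPy sketch below, which blanks out one random rectangle per image. The patch size limit (`max_frac`), the fill value, and the function name are assumptions for illustration. The actual augmentation settings are not given here.

```python
import numpy as np


def random_occlusion(
    image: np.ndarray,
    rng: np.random.Generator,
    max_frac: float = 0.3,
    fill: int = 0,
) -> np.ndarray:
    """Return a copy of `image` with one random rectangle filled in.

    The patch is at most `max_frac` of the image height and width
    (max_frac is expected to be in the range (0, 1]).
    """
    h, w = image.shape[:2]
    # Patch height and width, each at least 1 pixel.
    ph = int(rng.integers(1, max(2, int(h * max_frac) + 1)))
    pw = int(rng.integers(1, max(2, int(w * max_frac) + 1)))
    # Top-left corner chosen so the patch stays inside the image.
    top = int(rng.integers(0, h - ph + 1))
    left = int(rng.integers(0, w - pw + 1))
    out = image.copy()
    out[top:top + ph, left:left + pw] = fill
    return out
```

Applying this only to training images, with a seeded generator such as `np.random.default_rng(0)`, keeps the augmentation reproducible.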
What I Learned
[01] Transfer Learning: fine-tuning a pre-trained VGG16 model on a custom binary classification dataset
[02] Two-stage detection pipeline: combining a Caffe DNN SSD detector with a CNN classifier
[03] Data Collection & Annotation: building and publishing a 9,525-image dataset on Kaggle
[04] Real-time video processing: MJPEG streaming with OpenCV frame-by-frame inference
[05] Flask backend architecture: request handling, file upload management, background processing
[06] Data Augmentation: rotation, zoom, flip, and brightness adjustment for robustness across diverse conditions
[07] Model Evaluation: precision, recall, F1, and confusion matrix analysis across demographic groups
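Background video processing with progress tracking can be sketched with a worker thread and a shared job table, as below. This is one possible design using only the standard library, not necessarily how the project does it. The names `start_job` and `progress`, and the in-memory `_jobs` dictionary, are hypothetical.

```python
import threading
import uuid
from typing import Callable, Dict, Iterable, Tuple

_jobs: Dict[str, dict] = {}
_lock = threading.Lock()


def start_job(
    frames: Iterable, process_frame: Callable, total: int
) -> Tuple[str, threading.Thread]:
    """Process frames on a worker thread and record progress per job."""
    job_id = uuid.uuid4().hex
    with _lock:
        _jobs[job_id] = {"done": 0, "total": total, "status": "running"}

    def worker() -> None:
        try:
            for frame in frames:
                process_frame(frame)
                with _lock:
                    _jobs[job_id]["done"] += 1
            final = "finished"
        except Exception:
            final = "failed"
        with _lock:
            _jobs[job_id]["status"] = final

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return job_id, thread


def progress(job_id: str) -> dict:
    """Snapshot a job's status and completion percentage."""
    with _lock:
        job = dict(_jobs[job_id])
    pct = 100.0 * job["done"] / job["total"] if job["total"] else 0.0
    return {"status": job["status"], "percent": round(pct, 1)}
```

A Flask upload route could call `start_job` and return the job id, while a second route returns `progress(job_id)` as JSON for the page to poll.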