Computer Vision

Face Mask Detection System

AI-powered real-time face mask detection using deep learning — 98% face detection + 95% mask classification accuracy


Eval Score: 95% accuracy

Latency: ~50 ms/image

Dataset: 9,525 images

Cost/call: On-device

System Overview

A Face Mask Detection System that uses two deep learning models to detect whether people are wearing masks. The system processes static images, video files, and live webcam feeds in real time.

The pipeline uses a two-stage approach: first, a Caffe DNN Single Shot MultiBox Detector (SSD) model detects all faces in the frame with 98%+ accuracy. Each detected face ROI is then cropped, preprocessed to 224×224 pixels, and classified as Mask / No Mask by a fine-tuned VGG16 model achieving 95%+ accuracy.

The system is deployed as a Flask web application with a modern, responsive UI supporting photo upload, camera capture, video upload with background processing, and a live MJPEG webcam stream. The dataset (9,525 images) was published on Kaggle by the author and used for training, testing, and validation. Color-coded bounding boxes (green = mask, red = no mask) annotate the output in real time.
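The live MJPEG stream mentioned above can be sketched as a generator that wraps each JPEG-encoded frame as one part of a multipart HTTP response. This is a minimal sketch, not the project's actual code: the function name `mjpeg_chunks` and the boundary name `frame` are assumptions made for illustration.

```python
from typing import Iterable, Iterator

# Hypothetical boundary name; it must match the boundary declared in the
# response's mimetype.
BOUNDARY = b"frame"
PART_HEADER = b"--" + BOUNDARY + b"\r\nContent-Type: image/jpeg\r\n\r\n"


def mjpeg_chunks(jpeg_frames: Iterable[bytes]) -> Iterator[bytes]:
    """Yield one multipart chunk per JPEG-encoded frame."""
    for jpeg in jpeg_frames:
        # Each part is: boundary line, content type, blank line, JPEG bytes.
        yield PART_HEADER + jpeg + b"\r\n"
```

In a Flask route, a generator like this would typically be returned as `Response(mjpeg_chunks(frames), mimetype="multipart/x-mixed-replace; boundary=frame")`, with each annotated frame encoded via `cv2.imencode(".jpg", frame)[1].tobytes()`. The browser then replaces the displayed image each time a new part arrives.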

Architecture

Input (Image / Video / Webcam)
        ↓
Face Detection (Caffe DNN SSD — 300×300)
        ↓
Face Cropping & Preprocessing (224×224, VGG16 normalize)
        ↓
Batch Prediction (VGG16 — Binary Sigmoid)
        ↓
Post-processing & Annotation (bounding boxes, labels)
        ↓
Output (Annotated Image / Video / Stream)
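The flow above can be sketched as a small orchestration function. This is an illustrative skeleton under stated assumptions, not the project's implementation: the three callables (`detect_faces`, `crop_and_preprocess`, `classify_batch`) are hypothetical stand-ins for the SSD detector, the 224×224 preprocessing step, and the VGG16 classifier.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def run_pipeline(
    frame,
    detect_faces: Callable[[object], List[Box]],
    crop_and_preprocess: Callable[[object, Box], object],
    classify_batch: Callable[[Sequence[object]], Sequence[float]],
) -> List[Tuple[Box, float]]:
    """Return (box, score) pairs for every face found in one frame."""
    boxes = detect_faces(frame)
    if not boxes:
        # No faces: skip the classifier call entirely.
        return []
    crops = [crop_and_preprocess(frame, box) for box in boxes]
    # All faces in the frame go through the classifier in a single batch.
    scores = classify_batch(crops)
    return [(box, float(score)) for box, score in zip(boxes, scores)]
```

Batching the crops means a frame with several faces costs one classifier call rather than one call per face, which is the likely motivation for the "Batch Prediction" stage.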

Tech Stack

TensorFlow, Keras, VGG16, OpenCV, Caffe DNN SSD, Flask, Python, NumPy

Key Highlights

Two-stage pipeline: Caffe DNN SSD face detector → VGG16 mask classifier

98% face detection accuracy + 95% mask classification accuracy

Real-time webcam stream via MJPEG with live bounding box overlays

Dataset of 9,525 images published on Kaggle by the author

Transfer learning: VGG16 pretrained on ImageNet, fine-tuned on custom dataset

Flask web app with photo upload, camera capture, and video processing

Background video processing with progress tracking

Color-coded bounding boxes: green (mask) / red (no mask)
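The color coding can be sketched as a mapping from the classifier's sigmoid output to a label and a box color. This sketch assumes the output is the probability of the "Mask" class and uses a 0.5 decision threshold; the page states neither, and the actual class ordering may be reversed.

```python
from typing import Tuple

# OpenCV drawing functions expect colors in BGR order.
GREEN = (0, 255, 0)
RED = (0, 0, 255)


def label_and_color(
    p_mask: float, threshold: float = 0.5
) -> Tuple[str, Tuple[int, int, int]]:
    """Map a sigmoid score to a display label and a bounding-box color.

    Assumes p_mask is the probability of the "Mask" class.
    """
    if p_mask >= threshold:
        return f"Mask: {p_mask * 100:.0f}%", GREEN
    return f"No Mask: {(1 - p_mask) * 100:.0f}%", RED
```

The returned pair can be passed to `cv2.rectangle` and `cv2.putText` to draw the box and caption on the frame.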

Evaluation Metrics

Accuracy: 95%

Precision: 94%

Recall: 96%

Latency: ~50 ms per image (CPU)
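The page does not report an F1 score, but it follows from the precision and recall above as their harmonic mean. A short worked calculation, using only the reported figures:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Reported values: precision 94%, recall 96%.
f1 = f1_score(0.94, 0.96)
print(f"F1 = {f1:.3f}")  # prints "F1 = 0.950"
```

An F1 of about 0.95 is consistent with the reported 95% accuracy.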

Challenges & Solutions

[01] Low accuracy on partially occluded faces: mitigated with aggressive augmentation, including random occlusion patches during training

[02] False positives for face-like objects (e.g. posters): fixed by raising the face detection confidence threshold to 0.7

[03] Camera shows a black screen in the browser: resolved by accessing the app via localhost (not 127.0.0.1) and checking browser camera permissions
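The 0.7 confidence threshold can be sketched as a filter over the detector's raw output. This sketch assumes each detection row follows the layout OpenCV's DNN module produces for SSD models, `[image_id, class_id, confidence, x1, y1, x2, y2]`, with coordinates normalized to the 0 to 1 range. The function name `filter_detections` is hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixels


def filter_detections(
    rows: Iterable[Sequence[float]],
    frame_w: int,
    frame_h: int,
    min_conf: float = 0.7,
) -> List[Tuple[Box, float]]:
    """Keep confident detections and convert them to clamped pixel boxes."""
    kept = []
    for row in rows:
        conf = float(row[2])
        if conf < min_conf:
            continue  # drop weak detections such as face-like posters
        # Scale normalized coordinates to pixels and clamp to the frame.
        x1 = max(0, int(row[3] * frame_w))
        y1 = max(0, int(row[4] * frame_h))
        x2 = min(frame_w - 1, int(row[5] * frame_w))
        y2 = min(frame_h - 1, int(row[6] * frame_h))
        if x2 > x1 and y2 > y1:  # skip degenerate boxes
            kept.append(((x1, y1, x2, y2), conf))
    return kept
```

With OpenCV, `rows` would typically be `detections[0, 0]`, where `detections` is the array returned by `net.forward()`. Clamping matters because SSD boxes can extend slightly past the frame edge, which would otherwise produce empty crops.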

What I Learned

[01] Transfer Learning: fine-tuning a pre-trained VGG16 model on a custom binary classification dataset

[02] Two-stage detection pipeline: combining a Caffe DNN SSD detector with a CNN classifier

[03] Data Collection & Annotation: building and publishing a 9,525-image dataset on Kaggle

[04] Real-time video processing: MJPEG streaming with OpenCV frame-by-frame inference

[05] Flask backend architecture: request handling, file upload management, background processing

[06] Data Augmentation: rotation, zoom, flip, and brightness adjustment for robustness across varied conditions

[07] Model Evaluation: precision, recall, F1, and confusion matrix analysis across demographic groups
