Overview
This project trains a custom YOLOv5 model to detect and classify vehicles in real time across 15 classes (11 brands and 4 body types). The system reaches a mean Average Precision (mAP) of 0.8 across these classes, demonstrating the effectiveness of modern computer vision techniques for automotive applications.
We developed and compared four different YOLOv5 model variants (s, m, l, x) to find the optimal configuration for vehicle detection tasks. The project includes comprehensive dataset creation, model training, and performance analysis using Google Colab's free GPU resources.
What is YOLO?
YOLO (You Only Look Once) changed object detection by processing each image in a single forward pass, which makes it fast enough for real-time applications. Unlike earlier detectors that evaluate many sliding windows or region proposals per image, YOLO divides the image into a grid and predicts objects for all cells simultaneously.
How YOLO Works:
- Grid Division: Splits image into M×M grid cells
- Object Detection: Each cell predicts objects within it
- Classification: Identifies what type of object it found
- Confidence Scoring: Rates how sure it is about each detection
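The grid idea can be illustrated with a few lines of Python. This is a minimal sketch of the original YOLO formulation, in which the cell containing an object's center is responsible for predicting it; YOLOv5 adds anchor boxes and several grid scales on top of this. The function name and the sample box are illustrative assumptions, not code from the project.

```python
def responsible_cell(box, image_size, grid_size):
    """Return (row, col) of the grid cell that contains the box center.

    box is (x_min, y_min, x_max, y_max) in pixels,
    image_size is (width, height), grid_size is M for an M x M grid.
    """
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    center_x = (x_min + x_max) / 2
    center_y = (y_min + y_max) / 2
    # Clamp so a center exactly on the right/bottom edge stays in the last cell.
    col = min(int(center_x / width * grid_size), grid_size - 1)
    row = min(int(center_y / height * grid_size), grid_size - 1)
    return row, col


# Hypothetical car box in a 640x640 image, on a 7x7 grid.
print(responsible_cell((100, 200, 300, 400), (640, 640), 7))  # (3, 2)
```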
Problem Statement
Can we train an artificial intelligence model to distinguish between different car brands and types as accurately as a human expert?
We decided to tackle the challenging problem of multi-class vehicle detection - not just finding cars in images, but identifying their specific brands (BMW, Ford, Toyota, etc.) and types (Coupe, Sedan, SUV, Truck) simultaneously.
Key Challenges:
- Multi-class Detection: Distinguishing between 15 different vehicle categories
- Brand Recognition: Identifying subtle visual differences between car manufacturers
- Type Classification: Categorizing vehicles by body style and function
- Real-time Performance: Achieving fast inference speeds for practical applications
- Dataset Quality: Creating high-quality labeled training data
Solution
Our approach employed YOLOv5 with a comprehensive methodology covering dataset creation, model training, and performance evaluation across multiple configurations.
Dataset: 15 Vehicle Categories
Vehicle Brands (11 Classes)
Acura, BMW, Chevrolet, Ford, Honda, Infiniti, Lexus, Mercedes, Nissan, Subaru, Toyota
Vehicle Types (4 Classes)
Coupe, Sedan, SUV, Truck
Research Methodology
Dataset Creation
1000+ images manually labeled with bounding boxes using the labelImg annotation tool
Model Training
4 YOLOv5 variants tested with different configurations
Performance Analysis
Comprehensive evaluation across metrics and configurations
Insights & Learning
Key findings for optimal model configuration and deployment
Technical Implementation
Step-by-Step Implementation
1. Environment Setup
Set up the development environment using Google Colab for free GPU access.
# Mount Google Drive for data storage
from google.colab import drive
drive.mount('/content/gdrive')
# Clone YOLOv5 repository
!git clone https://github.com/ultralytics/yolov5
%cd yolov5
!pip install -r requirements.txt
2. Dataset Preparation
Collected 1000+ vehicle images and manually labeled each one with bounding boxes.
- Data Collection: 1000+ vehicle images from various sources
- Annotation: Bounding boxes using labelImg tool
- Data Split: 70% Train / 20% Val / 10% Test
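A 70/20/10 split like the one listed above could be produced as follows. This is a sketch only: the file names are hypothetical, the fixed seed is an assumption made for reproducibility, and the project's actual splitting procedure is not described in this write-up.

```python
import random


def split_dataset(items, train_frac=0.7, val_frac=0.2, seed=0):
    """Shuffle items and split them into train / valid / test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is repeatable
    n_train = round(len(items) * train_frac)
    n_val = round(len(items) * val_frac)
    return {
        "train": items[:n_train],
        "valid": items[n_train:n_train + n_val],
        "test": items[n_train + n_val:],  # whatever remains (about 10%)
    }


# Hypothetical image names standing in for the real dataset.
image_names = [f"img_{i:04d}.jpg" for i in range(1000)]
splits = split_dataset(image_names)
print({name: len(files) for name, files in splits.items()})
```

Splitting by file name keeps each image and its label file in the same subset, provided both are moved together.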
# Dataset Configuration (data.yaml)
train: ../train/images
val: ../valid/images
test: ../test/images
nc: 15 # number of classes
names: ['BMW', 'Ford', 'Toyota', 'Coupe', 'Sedan', 'SUV', ...]
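Alongside the images, YOLOv5 expects one label text file per image, where each line has the form `class x_center y_center width height` with coordinates normalized to the 0-1 range. labelImg can save annotations in this format directly. If boxes are instead available as pixel corners, the conversion looks roughly like this; the function name and the sample box are illustrative assumptions.

```python
def to_yolo_line(class_id, box, image_size):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO label line."""
    x_min, y_min, x_max, y_max = box
    img_w, img_h = image_size
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical box covering the middle of a 640x480 image, class index 1.
print(to_yolo_line(1, (160, 120, 480, 360), (640, 480)))
# 1 0.500000 0.500000 0.500000 0.500000
```

The class index refers to the position of the class in the `names` list of data.yaml, so the order of that list must match the order used during annotation.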
3. Model Training
Trained multiple YOLOv5 variants to find the best performing configuration.
# Train YOLOv5s model
!python train.py --img 640 --batch 16 --epochs 300 \
    --data '../data.yaml' \
    --weights yolov5s.pt \
    --name vehicle_detection \
    --cache
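To compare all four variants, the same command can be generated once per set of pre-trained weights. The sketch below only builds the command strings; the per-variant run names are hypothetical, and the larger models may need a smaller batch size than 16 to fit in Colab GPU memory.

```python
VARIANTS = ["yolov5s", "yolov5m", "yolov5l", "yolov5x"]


def train_command(variant, img=640, batch=16, epochs=300, data="../data.yaml"):
    """Build a train.py command line for one YOLOv5 variant."""
    return (
        f"python train.py --img {img} --batch {batch} --epochs {epochs} "
        f"--data {data} --weights {variant}.pt "
        f"--name vehicle_detection_{variant} --cache"  # hypothetical run name
    )


for variant in VARIANTS:
    print(train_command(variant))
```

In a Colab cell, each printed line can be run by prefixing it with `!`. Giving every run its own `--name` keeps the resulting weights and logs in separate folders under runs/train.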
4. Testing & Evaluation
After training, we tested our models on unseen data and analyzed their performance.
# Run detection on test images
!python detect.py --weights runs/train/vehicle_detection/weights/best.pt \
    --img 640 --conf 0.5 \
    --source ../test/images
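When detect.py is additionally run with its `--save-txt` and `--save-conf` options, it writes one text file per image whose lines hold the class index, the normalized box, and the confidence. A small parser such as the following can summarize those files. It is a sketch: the class list is a shortened, hypothetical stand-in for the `names` entry in data.yaml, and the sample lines are made up.

```python
from collections import Counter


def parse_detection_line(line):
    """Parse 'class x_center y_center width height [confidence]'."""
    parts = line.split()
    class_id = int(parts[0])
    x_center, y_center, width, height = map(float, parts[1:5])
    confidence = float(parts[5]) if len(parts) > 5 else None
    return {
        "class_id": class_id,
        "box": (x_center, y_center, width, height),
        "conf": confidence,
    }


def count_by_class(lines, names):
    """Count detections per class name, skipping blank lines."""
    counts = Counter()
    for line in lines:
        if line.strip():
            detection = parse_detection_line(line)
            counts[names[detection["class_id"]]] += 1
    return counts


example_names = ["BMW", "Ford", "Toyota"]  # hypothetical, shortened class list
example_lines = [  # made-up detections, not project output
    "0 0.50 0.50 0.20 0.30 0.91",
    "2 0.30 0.40 0.10 0.10 0.77",
    "0 0.70 0.60 0.20 0.20 0.64",
]
print(count_by_class(example_lines, example_names))
```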
YOLOv5 Architecture Details
Key Innovation: Single Pass Detection
- Grid Division: Image split into M×M cells
- Direct Prediction: Each cell predicts objects within it
- Real-time Speed: No multiple passes needed
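- End-to-end: The original YOLO network used 24 convolutional layers followed by 2 fully connected layers; YOLOv5 is fully convolutional, with a CSP-based backbone and a PANet-style neck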
YOLOv5 Improvements
- PyTorch Framework: Modern implementation
- Multi-scale Detection: Three output grids at strides 32, 16, and 8 (13×13, 26×26, and 52×52 for a 416×416 input)
- Advanced Augmentation: Mosaic, MixUp techniques
- Model Variants: s/m/l/x for different needs
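The detection grid sizes follow directly from the input resolution and the strides of the three output layers. The sketch below assumes the default YOLOv5 strides of 8, 16, and 32 and shows why the grids differ between the 416 and 640 input sizes.

```python
STRIDES = (8, 16, 32)  # assumed default strides of the three detection layers


def grid_sizes(img_size, strides=STRIDES):
    """Return the side length of each detection grid for a square input."""
    return [img_size // stride for stride in strides]


for img_size in (416, 640):
    print(img_size, grid_sizes(img_size))
# 416 [52, 26, 13]
# 640 [80, 40, 20]
```

The finest grid (stride 8) is the one best suited to small objects, which is one reason larger input sizes tend to help localization.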
Results & Performance
Key Achievements
- 0.8 mAP Score: Mean Average Precision across all vehicle classes
- 4 Model Variants: YOLOv5s, m, l, x compared systematically
- 15 Vehicle Classes: Brands + Types detected simultaneously
- <7h Training Time: Even for the largest models on Colab
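For readers new to the metric: a prediction counts as correct when its box overlaps a ground-truth box of the same class by at least some Intersection over Union (IoU) threshold, and mAP is the mean of the per-class Average Precision values. The sketch below shows those two pieces only. The AP numbers in it are made up for illustration and are not results from this project, and this write-up does not state which IoU threshold its 0.8 figure refers to.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - intersection
    return intersection / union if union > 0 else 0.0


def mean_average_precision(ap_per_class):
    """Average the per-class AP values into a single mAP score."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Two boxes that share half their area: intersection 50, union 150.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))

# Illustrative AP values only, not measured results.
illustrative_ap = {"BMW": 0.9, "Sedan": 0.7}
print(mean_average_precision(illustrative_ap))
```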
What We Discovered
Best Model Performance
- YOLOv5s performed surprisingly well for our dataset size
- YOLOv5l showed best loss function optimization
- All models achieved 0.8 mAP with proper training
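- Validation accuracy exceeded training accuracy, which suggests good generalization, although training-time augmentation such as Mosaic can also make training metrics look worse than validation metrics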
Training Insights
- 300 epochs were sufficient for convergence
- Transfer learning significantly improved results
- Confidence thresholds of 0.5-0.9 worked best
- No overfitting observed across any model variant
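The effect of the confidence threshold is easy to see on a toy example: raising it removes uncertain detections, trading recall for precision. The detections below are made up for illustration and are not output from the trained models.

```python
def filter_by_confidence(detections, threshold):
    """Keep only (label, confidence) pairs at or above the threshold."""
    return [d for d in detections if d[1] >= threshold]


# Made-up detections for a single image.
detections = [("BMW", 0.93), ("Sedan", 0.58), ("Truck", 0.41), ("Ford", 0.87)]

for threshold in (0.5, 0.7, 0.9):
    kept = filter_by_confidence(detections, threshold)
    print(threshold, [label for label, _ in kept])
# 0.5 ['BMW', 'Sedan', 'Ford']
# 0.7 ['BMW', 'Ford']
# 0.9 ['BMW']
```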
Image Size Impact
- 416×416 and 640×640 both achieved 0.8 mAP
- Larger images showed better localization accuracy
- Training time difference was manageable on Colab
- Model robustness across input sizes confirmed
Model Variant Comparison
All four YOLOv5 variants achieved similar performance:
- YOLOv5s: 7M params, 0.8 mAP - Best for small datasets
- YOLOv5m: 21M params, 0.8 mAP - Balanced performance
- YOLOv5l: 47M params, 0.8 mAP - Best loss optimization
- YOLOv5x: 87M params, 0.8 mAP - Highest capacity
Lessons Learned
Key Takeaways
- For Small Datasets: YOLOv5s is optimal - smaller models can achieve similar performance with proper training
- Transfer Learning Works: Pre-trained weights significantly improve both training speed and final accuracy
- Google Colab Viable: Free GPU resources sufficient for academic deep learning projects
- Data Quality Matters: High-quality annotations and systematic dataset organization are crucial
Technical Insights
- Parameter Tuning: Proper selection of confidence thresholds and image sizes is critical
- Model Selection: For academic projects, smaller models often perform as well as larger ones
- Training Strategy: 300 epochs with transfer learning gave the best results in our experiments
- Evaluation Metrics: mAP provides comprehensive performance assessment across all classes
Practical Applications
- YOLO is excellent for real-time vehicle detection systems
- Multi-class detection enables sophisticated automotive applications
- Cloud-based training makes deep learning accessible for academic projects
- Proper dataset creation is more important than model complexity
Try It Yourself
Everything you need to replicate this project is available in the interactive Jupyter notebook. The notebook includes complete Google Colab setup instructions, dataset preparation guide, step-by-step training code with explanations, and performance analysis.
Requirements
- Google account (for Colab access)
- Basic Python knowledge
- Understanding of machine learning concepts
- ~2-4GB of Google Drive storage
- Time: 4-6 hours for full implementation
Quick Start Guide
- Open notebook in Google Colab
- Run environment setup cells
- Upload your dataset (or use ours)
- Execute training cells
- Test your trained model!
Tips for Success
- Start Small: Begin with YOLOv5s for faster training and testing
- Use Transfer Learning: Always start with pre-trained weights for better results
- Monitor Training: Watch the loss curves to ensure proper convergence
- Quality Data: Invest time in high-quality dataset annotation
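Monitoring can also be done programmatically. Recent YOLOv5 versions write a results.csv file into each run folder; the sketch below assumes that file has whitespace-padded column names such as `val/box_loss`, which may differ between versions. It flags a run whose validation loss has stopped improving. The sample CSV text is synthetic, and the helper names are hypothetical.

```python
import csv
import io


def load_results(csv_text):
    """Parse results.csv text into a list of {column: float} rows."""
    reader = csv.reader(io.StringIO(csv_text))
    header = [name.strip() for name in next(reader)]  # strip column padding
    return [
        {key: float(value) for key, value in zip(header, row)}
        for row in reader
        if row
    ]


def val_loss_rising(rows, key="val/box_loss", patience=3):
    """True if the last `patience` epochs are all worse than the earlier best."""
    values = [row[key] for row in rows]
    if len(values) <= patience:
        return False
    best_before = min(values[:-patience])
    return all(value > best_before for value in values[-patience:])


# Synthetic example, not project data.
sample = """   epoch,  train/box_loss,  val/box_loss
       0,           0.090,         0.080
       1,           0.070,         0.060
       2,           0.060,         0.050
       3,           0.050,         0.055
       4,           0.040,         0.060
       5,           0.030,         0.070
"""
rows = load_results(sample)
print(val_loss_rising(rows))  # True: validation loss rises while training loss falls
```

A falling training loss combined with a rising validation loss is the usual sign of overfitting, so a `True` result here would be a reason to stop earlier or add more data.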