Overview

This project implements a custom object detection model that identifies and classifies 15 vehicle categories in real time using YOLOv5. The system reaches a mean Average Precision (mAP) of 0.8 across the vehicle brand and body-type classes, showing that modern computer vision techniques are effective for automotive applications.

We trained and compared four YOLOv5 model variants (s, m, l, x) to find the best configuration for vehicle detection. The project covers dataset creation, model training, and performance analysis using Google Colab's free GPU resources.

What is YOLO?

YOLO (You Only Look Once) changed object detection by processing each image in a single forward pass, which makes it fast enough for real-time applications. Unlike earlier detectors that evaluate many candidate regions or sliding windows separately, YOLO divides the image into a grid and predicts the objects in every cell simultaneously.

How YOLO Works:

  1. Grid Division: Splits image into M×M grid cells
  2. Object Detection: Each cell predicts objects within it
  3. Classification: Identifies what type of object it found
  4. Confidence Scoring: Rates how sure it is about each detection
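
To make the grid idea concrete, here is a minimal sketch of how a bounding-box center maps to the grid cell responsible for it. The function name `responsible_cell` and the 7×7 grid are illustrative assumptions, not part of the project code or the YOLOv5 repository.

```python
def responsible_cell(x_center, y_center, img_w, img_h, grid_size):
    """Return (row, col) of the grid cell that contains a box center.

    Coordinates are in pixels; grid_size is M for an M x M grid.
    """
    if not (0 <= x_center < img_w and 0 <= y_center < img_h):
        raise ValueError("box center lies outside the image")
    col = int(x_center * grid_size / img_w)
    row = int(y_center * grid_size / img_h)
    return row, col


# Hypothetical example: a 640x640 image divided into a 7x7 grid.
print(responsible_cell(320, 100, 640, 640, 7))  # (1, 3)
```

The cell found this way is the one that predicts the object's box, class, and confidence score.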

Problem Statement

Can we train an artificial intelligence model to distinguish between different car brands and types as accurately as a human expert?

We decided to tackle the challenging problem of multi-class vehicle detection - not just finding cars in images, but identifying their specific brands (BMW, Ford, Toyota, etc.) and types (Coupe, Sedan, SUV, Truck) simultaneously.

Key Challenges:

  • Multi-class Detection: Distinguishing between 15 different vehicle categories
  • Brand Recognition: Identifying subtle visual differences between car manufacturers
  • Type Classification: Categorizing vehicles by body style and function
  • Real-time Performance: Achieving fast inference speeds for practical applications
  • Dataset Quality: Creating high-quality labeled training data

Solution

Our approach employed YOLOv5 with a comprehensive methodology covering dataset creation, model training, and performance evaluation across multiple configurations.

Dataset: 15 Vehicle Categories

Vehicle Brands (11 Classes)

Acura, BMW, Chevrolet, Ford, Honda, Infiniti, Lexus, Mercedes, Nissan, Subaru, Toyota

Vehicle Types (4 Classes)

Coupe, Sedan, SUV, Truck

Research Methodology

Dataset Creation

1000+ images manually labeled with bounding boxes using the labelImg annotation tool

Model Training

4 YOLOv5 variants tested with different configurations

Performance Analysis

Comprehensive evaluation across metrics and configurations

Insights & Learning

Key findings for optimal model configuration and deployment

Technical Implementation

Step-by-Step Implementation

1. Environment Setup

Set up the development environment using Google Colab for free GPU access.

# Mount Google Drive for data storage
from google.colab import drive
drive.mount('/content/gdrive')

# Clone YOLOv5 repository
!git clone https://github.com/ultralytics/yolov5
%cd yolov5
!pip install -r requirements.txt

2. Dataset Preparation

We collected 1000+ vehicle images and manually labeled them with bounding boxes using the labelImg annotation tool.

  • Data Collection: 1000+ vehicle images from various sources
  • Annotation: Bounding boxes using labelImg tool
  • Data Split: 70% Train / 20% Val / 10% Test

# Dataset Configuration (data.yaml)
train: ../train/images
val: ../valid/images
test: ../test/images

nc: 15  # number of classes
names: ['BMW', 'Ford', 'Toyota', 'Coupe', 'Sedan', 'SUV', ...]
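
The 70/20/10 split listed above could be produced along the following lines. This is a sketch under stated assumptions: the function `split_dataset`, the fixed seed, and the placeholder file names are hypothetical and do not come from the project notebook.

```python
import random


def split_dataset(items, train_frac=0.7, val_frac=0.2, seed=0):
    """Shuffle items and split them into train/val/test lists.

    The test split receives whatever remains after train and val.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # fixed seed keeps the split reproducible
    n_train = round(len(items) * train_frac)
    n_val = round(len(items) * val_frac)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Hypothetical file names standing in for the labeled images.
image_files = [f"img_{i:04d}.jpg" for i in range(1000)]
train, val, test = split_dataset(image_files)
print(len(train), len(val), len(test))  # 700 200 100
```

Each image's label file has to be moved with it, so in practice the split is applied to image/label pairs rather than to images alone.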

3. Model Training

Trained multiple YOLOv5 variants to find the best performing configuration.

# Train YOLOv5s model
!python train.py --img 640 --batch 16 --epochs 300 \
                 --data '../data.yaml' \
                 --weights yolov5s.pt \
                 --name vehicle_detection \
                 --cache
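
Because four variants were compared, the same command has to be repeated with different pre-trained weights. A minimal sketch of generating those commands follows; the helper `build_train_command` and the per-variant run names are assumptions made for illustration, while the flags are the ones used in the command above.

```python
def build_train_command(variant, img=640, batch=16, epochs=300, data="../data.yaml"):
    """Build the train.py command line for one YOLOv5 variant (s, m, l, or x)."""
    if variant not in ("s", "m", "l", "x"):
        raise ValueError(f"unknown YOLOv5 variant: {variant}")
    return [
        "python", "train.py",
        "--img", str(img),
        "--batch", str(batch),
        "--epochs", str(epochs),
        "--data", data,
        "--weights", f"yolov5{variant}.pt",
        # Hypothetical naming convention that keeps each run's results apart.
        "--name", f"vehicle_detection_{variant}",
        "--cache",
    ]


for variant in ("s", "m", "l", "x"):
    print(" ".join(build_train_command(variant)))
```

The larger variants may need a smaller `--batch` value to fit in Colab GPU memory; the value 16 is simply carried over from the YOLOv5s command.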

4. Testing & Evaluation

After training, we tested our models on unseen data and analyzed their performance.

# Run detection on test images
!python detect.py --weights runs/train/vehicle_detection/weights/best.pt \
                  --img 640 --conf 0.5 \
                  --source ../test/images
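
The `--conf 0.5` flag discards detections whose confidence score is below 0.5. The effect can be sketched in a few lines; the tuple layout and the sample detections below are made up for illustration and are not output from the trained models.

```python
def filter_by_confidence(detections, threshold=0.5):
    """Keep detections whose confidence is at least the threshold.

    Each detection is a (class_name, confidence, box) tuple, a
    hypothetical layout chosen for this example.
    """
    return [d for d in detections if d[1] >= threshold]


# Made-up detections for illustration only.
detections = [
    ("BMW", 0.91, (34, 50, 210, 180)),
    ("Sedan", 0.62, (34, 50, 210, 180)),
    ("Truck", 0.31, (300, 80, 520, 260)),
]
for name, conf, box in filter_by_confidence(detections, 0.5):
    print(name, conf, box)
```

Raising the threshold trades recall for precision: fewer false positives survive, but weaker correct detections are dropped as well.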

YOLOv5 Architecture Details

Key Innovation: Single Pass Detection

  • Grid Division: Image split into M×M cells
  • Direct Prediction: Each cell predicts objects within it
  • Real-time Speed: No multiple passes needed
  • End-to-end: One network maps pixels directly to boxes (the original YOLO used 24 conv + 2 FC layers)

YOLOv5 Improvements

  • PyTorch Framework: Modern implementation
  • Multi-scale Detection: Three grids at strides 32, 16, and 8 (13×13, 26×26, 52×52 for a 416×416 input; 20×20, 40×40, 80×80 for 640×640)
  • Advanced Augmentation: Mosaic, MixUp techniques
  • Model Variants: s/m/l/x for different needs
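
The 13×13, 26×26, and 52×52 grids correspond to a 416×416 input divided by the three detection strides (32, 16, and 8). The short sketch below computes the grid sizes for both input sizes used in this project; the function name `grid_sizes` is an illustrative assumption.

```python
def grid_sizes(img_size, strides=(32, 16, 8)):
    """Return the side length of each detection grid for a square input."""
    for stride in strides:
        if img_size % stride != 0:
            raise ValueError(f"{img_size} is not divisible by stride {stride}")
    return [img_size // stride for stride in strides]


print(grid_sizes(416))  # [13, 26, 52]
print(grid_sizes(640))  # [20, 40, 80]
```

The coarse grid handles large objects and the fine grid handles small ones, which is why a larger input size tends to help with small or distant vehicles.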

Results & Performance

Key Achievements

  • 0.8 mAP Score: Mean Average Precision across all vehicle classes
  • 4 Model Variants: YOLOv5s, m, l, x compared systematically
  • 15 Vehicle Classes: Brands + Types detected simultaneously
  • <7h Training Time: Even for the largest models on Colab
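
For readers new to the metric: a detection counts as correct when its box overlaps a ground-truth box by enough intersection over union (IoU), and mAP is the mean of the per-class average precision (AP) values. The sketch below shows both pieces. The function names are assumptions, and the AP values are placeholders rather than results from this project.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mean_average_precision(ap_per_class):
    """mAP is the unweighted mean of the per-class average precisions."""
    return sum(ap_per_class.values()) / len(ap_per_class)


# Two 10x10 boxes overlapping in a 5x5 region: IoU = 25 / 175.
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))

# Placeholder AP values, not measurements from this project.
print(mean_average_precision({"class_a": 0.5, "class_b": 1.0}))  # 0.75
```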

What We Discovered

Best Model Performance

  • YOLOv5s performed surprisingly well for our dataset size
  • YOLOv5l showed best loss function optimization
  • All models achieved 0.8 mAP with proper training
  • Validation performance exceeded training performance, which suggests good generalization (training-time augmentation can also contribute to this gap)

Training Insights

  • 300 epochs were sufficient for convergence
  • Transfer learning significantly improved results
  • Confidence thresholds of 0.5-0.9 worked best
  • No overfitting was observed in any model variant

Image Size Impact

  • 416×416 and 640×640 both achieved 0.8 mAP
  • Larger images showed better localization accuracy
  • Training time difference was manageable on Colab
  • Model robustness across input sizes confirmed

Model Variant Comparison

All four YOLOv5 variants achieved similar performance:

  • YOLOv5s: 7M params, 0.8 mAP - Best for small datasets
  • YOLOv5m: 21M params, 0.8 mAP - Balanced performance
  • YOLOv5l: 47M params, 0.8 mAP - Best loss optimization
  • YOLOv5x: 87M params, 0.8 mAP - Highest capacity

Lessons Learned

Key Takeaways

  • For Small Datasets: YOLOv5s is optimal - smaller models can achieve similar performance with proper training
  • Transfer Learning Works: Pre-trained weights significantly improve both training speed and final accuracy
  • Google Colab Viable: Free GPU resources sufficient for academic deep learning projects
  • Data Quality Matters: High-quality annotations and systematic dataset organization are crucial

Technical Insights

  • Parameter Tuning: Proper selection of confidence thresholds and image sizes is critical
  • Model Selection: For academic projects, smaller models often perform as well as larger ones
  • Training Strategy: 300 epochs with transfer learning were sufficient for convergence in our experiments
  • Evaluation Metrics: mAP provides comprehensive performance assessment across all classes

Practical Applications

  • YOLO is excellent for real-time vehicle detection systems
  • Multi-class detection enables sophisticated automotive applications
  • Cloud-based training makes deep learning accessible for academic projects
  • Proper dataset creation is more important than model complexity

Try It Yourself

Everything you need to replicate this project is available in the interactive Jupyter notebook. The notebook includes complete Google Colab setup instructions, dataset preparation guide, step-by-step training code with explanations, and performance analysis.

Requirements

  • Google account (for Colab access)
  • Basic Python knowledge
  • Understanding of machine learning concepts
  • ~2-4GB of Google Drive storage
  • Time: 4-6 hours for full implementation

Quick Start Guide

  1. Open notebook in Google Colab
  2. Run environment setup cells
  3. Upload your dataset (or use ours)
  4. Execute training cells
  5. Test your trained model!

Tips for Success

  • Start Small: Begin with YOLOv5s for faster training and testing
  • Use Transfer Learning: Always start with pre-trained weights for better results
  • Monitor Training: Watch the loss curves to ensure proper convergence
  • Quality Data: Invest time in high-quality dataset annotation