Spam Email Filter

📅 2022 · Tech: Statistics, Machine Learning, Python

In this project, I built an email spam filter and compared two classification approaches: a Bayesian baseline and a support vector machine (SVM). The goal was to distinguish legitimate (ham) emails from spam accurately, which improves both user experience and mail security.

Highlights

Bayesian Baseline: Implemented a probabilistic model as a benchmark for classification accuracy.
SVM Model: Designed and tuned an SVM-based classifier to better separate spam from ham emails.
Data Preprocessing: Cleaned and tokenized email text, extracted features, and structured training data.
Model Development: Trained both models on real-world datasets and refined them through feature engineering and validation.
Evaluation & Comparison: Measured performance using precision, recall, F1-score, and accuracy to compare the models’ effectiveness.
Real-world Applicability: Demonstrated relevance for mail services seeking to reduce inbox spam automatically.
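
To illustrate the preprocessing and Bayesian baseline steps, here is a minimal sketch rather than the project's actual code. It assumes the baseline is a multinomial naive Bayes model with add-one (Laplace) smoothing and that tokenization is a simple regex split. The `tokenize` helper, the `NaiveBayesSpamFilter` class, and the toy messages are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Hypothetical baseline: multinomial naive Bayes with add-one smoothing."""

    def fit(self, texts, labels):
        # Count how often each token appears in spam and in ham messages.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        return self

    def _log_score(self, tokens, label):
        # log P(label) + sum of log P(token | label), smoothed by adding one.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # tokens never seen in training are skipped
                score += math.log((counts[tok] + 1) / denom)
        return score

    def predict(self, text):
        tokens = tokenize(text)
        return max(("spam", "ham"), key=lambda label: self._log_score(tokens, label))


# Toy training data, invented purely for illustration.
train_texts = [
    "win free money now",
    "claim your free prize today",
    "meeting agenda for monday",
    "lunch with the project team",
]
train_labels = ["spam", "spam", "ham", "ham"]

spam_filter = NaiveBayesSpamFilter().fit(train_texts, train_labels)
print(spam_filter.predict("free prize money"))
print(spam_filter.predict("project meeting on monday"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied. A margin-based model such as an SVM would typically be trained on the same token features using a library rather than written by hand.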

Outcomes

The project produced a working spam classification pipeline built on both a probabilistic model and a margin-based model. The SVM classifier improved substantially on the Bayesian baseline.
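
A comparison like this rests on the four metrics named in the highlights. The sketch below shows how they can be computed from true and predicted labels, treating spam as the positive class. The `classification_metrics` function and the label lists are hypothetical examples, not results from this project.

```python
def classification_metrics(y_true, y_pred, positive="spam"):
    """Return precision, recall, F1, and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # spam caught
    fp = sum(t != positive and p == positive for t, p in pairs)  # ham flagged as spam
    fn = sum(t == positive and p != positive for t, p in pairs)  # spam missed
    correct = sum(t == p for t, p in pairs)

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Invented labels for illustration only.
y_true = ["spam", "spam", "ham", "ham", "spam"]
y_pred = ["spam", "ham", "ham", "spam", "spam"]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Precision matters most when a legitimate email landing in the spam folder is costly, while recall measures how much spam still reaches the inbox. F1 balances the two, which makes it a convenient single number for comparing two models.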

Attachments

📄 Full Report: Download PDF