Spam Email Filter
📅 2022 · Tech: Statistics, Machine Learning, Python
In this project, I developed an email spam filter using classification
techniques. The goal was to build accurate models to distinguish
between legitimate and spam emails, improving user experience and mail
security.
Highlights
- Bayesian Baseline: Implemented a probabilistic model as a benchmark
  for classification accuracy.
- SVM Model: Designed and tuned an SVM-based classifier to better
  separate spam from ham emails.
- Data Preprocessing: Cleaned and tokenized email text, extracted
  features, and structured the training data.
- Model Development: Built and tuned both models on real-world
  datasets, refining them through feature engineering and validation.
- Evaluation & Comparison: Measured precision, recall, F1-score, and
  accuracy to compare the models’ effectiveness.
- Real-world Applicability: Demonstrated relevance for mail services
  seeking to reduce inbox spam automatically.
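The preprocessing and Bayesian baseline steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes a multinomial Naive Bayes model with add-one (Laplace) smoothing and a simple regex tokenizer, neither of which is confirmed by the project description. The names `tokenize` and `NaiveBayesBaseline` and the four toy emails are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed scheme)."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesBaseline:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        # Count how often each token appears in each class.
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        # The vocabulary is every token seen in any class during training.
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            # Start from the log prior, then add smoothed log likelihoods.
            score = math.log(self.doc_counts[c] / total_docs)
            for token in tokenize(text):
                if token not in self.vocab:
                    continue  # ignore tokens never seen in training
                count = self.word_counts[c][token]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[c] = score
        return max(scores, key=scores.get)


# Toy training data, invented for this example.
train_texts = [
    "win a free prize now",
    "free money claim your prize",
    "meeting agenda for monday",
    "lunch with the project team",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayesBaseline().fit(train_texts, train_labels)
print(model.predict("claim your free prize"))
print(model.predict("agenda for the team meeting"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the add-one smoothing keeps a token that was seen in only one class from driving the other class's probability to zero.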
Outcomes
The project resulted in a functional spam classification pipeline
using both probabilistic and margin-based models. The SVM model showed
significant improvements over the baseline.
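For reference, the comparison metrics can be computed as in the sketch below. It treats spam as the positive class, which is an assumption, and the function name `classification_metrics` is hypothetical. The labels and predictions are invented to show the arithmetic and are not results from this project.

```python
def classification_metrics(y_true, y_pred, positive="spam"):
    """Return precision, recall, F1, and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Made-up labels and predictions, purely illustrative.
y_true = ["spam", "spam", "spam", "ham", "ham", "ham", "ham", "ham"]
y_pred = ["spam", "spam", "ham", "spam", "ham", "ham", "ham", "ham"]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Running the same function on each model's predictions for a shared held-out set gives directly comparable numbers. For a spam filter, precision is often watched most closely, since a legitimate email misfiled as spam usually costs more than a spam message that slips through.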