Project Details
Few-Shot Learning for Spam Detection with Large Language Models (2025)
Project Overview
This project investigates adapting Large Language Models (LLMs) to binary spam detection, using the SmolLM2-135M-Instruct model within a Bayesian Inverse Classification framework. Unlike discriminative approaches, this method leverages the generative capabilities of the LLM to model the likelihood of the email text given a candidate label.
We evaluate this approach through three progressive stages: Zero-Shot Learning (baseline evaluation), Naive Prompting (inference with richer prompt context), and Full Fine-Tuning (optimizing all model parameters on the Enron Spam dataset). Additionally, we analyze the implementation and mechanics of Key-Value (KV) caching in decoder-only transformers.
Technical Approach
Bayesian Inverse Classification Framework
We treat classification as a generative task in which the model evaluates the likelihood of the input text conditioned on each candidate label. Concretely, for an email x and labels y in {spam, ham}, we score P(x | y) as the product of next-token probabilities of the email under a label-conditioned prompt and predict the label with the highest prior-weighted likelihood; by Bayes' rule, maximizing P(x | y)P(y) is equivalent to maximizing the posterior P(y | x). This mitigates the bias often seen in direct discriminative predictions by leveraging the model's pre-trained generative priors.
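As a concrete illustration, the sketch below scores log P(email | label) with SmolLM2-135M-Instruct by summing next-token log-probabilities of the email tokens under a label-conditioned prefix, then picks the higher-scoring label. The prompt template, the Hub model id, and the helper names are assumptions for illustration, not the project's exact implementation.

```python
# Sketch of Bayesian Inverse Classification scoring (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "HuggingFaceTB/SmolLM2-135M-Instruct"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()

LABELS = ["spam", "ham"]  # candidate labels y

def label_log_likelihood(email: str, label: str) -> float:
    """Return log P(email | label): sum of log-probs of the email tokens
    conditioned on a label-specific prefix."""
    prefix = f"The following email is {label}:\n"  # assumed prompt template
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    email_ids = tokenizer(email, add_special_tokens=False,
                          return_tensors="pt").input_ids
    full_ids = torch.cat([prefix_ids, email_ids], dim=1)
    with torch.no_grad():
        logits = model(full_ids).logits              # (1, T, vocab_size)
    # Log-probability assigned to the token actually observed at each position.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = full_ids[:, 1:]
    token_ll = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only positions that predict email tokens (those after the prefix).
    start = prefix_ids.shape[1] - 1
    return token_ll[0, start:].sum().item()

def classify(email: str) -> str:
    # With a uniform prior over labels, argmax_y P(x | y) = argmax_y P(y | x).
    scores = {y: label_log_likelihood(email, y) for y in LABELS}
    return max(scores, key=scores.get)

print(classify("Congratulations! You won a free cruise, click here now."))
```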
Key-Value (KV) Cache Analysis
We analyze the implementation, efficiency, and memory trade-offs of KV caching during autoregressive generation. The cache stores Key and Value vectors of processed tokens, reducing computational complexity from O(t²) to O(t) per step, though it creates a memory bottleneck for long sequences.
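The minimal decoding loop below shows how the cache is threaded through Hugging Face Transformers via use_cache=True and past_key_values: the prompt is processed once (prefill), and every subsequent step feeds only the newest token, whose query attends to the cached Keys and Values. Greedy decoding and the model id are assumptions for illustration.

```python
# Minimal sketch of KV-cached autoregressive decoding with Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "HuggingFaceTB/SmolLM2-135M-Instruct"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()

input_ids = tokenizer("Subject: You have won", return_tensors="pt").input_ids

with torch.no_grad():
    # Prefill: run the whole prompt once and populate the KV cache.
    out = model(input_ids, use_cache=True)
    past_key_values = out.past_key_values          # per-layer Key/Value tensors
    next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)

    for _ in range(20):                            # greedy decoding steps
        # Feed ONLY the newest token; its query attends to the cached
        # Keys/Values, so each step costs O(t) attention instead of O(t^2).
        out = model(next_id, past_key_values=past_key_values, use_cache=True)
        past_key_values = out.past_key_values
        input_ids = torch.cat([input_ids, next_id], dim=1)
        next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)

print(tokenizer.decode(input_ids[0]))
```

The cached tensors grow linearly with sequence length (Key and Value tensors per layer and head), which is the memory bottleneck noted above for long sequences.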
Experimental Results
Zero-Shot Baseline
The zero-shot baseline achieved 52.25% accuracy, only marginally better than the 50% expected from random guessing in binary classification.
Naive Prompting
Enriching the prompt with additional task context improved performance only marginally over the zero-shot baseline.
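For reference, a hypothetical version of such a richer prompt is sketched below: a task instruction and a few labeled emails are prepended to the same label-conditioned prefix used for likelihood scoring. The instruction wording and in-context examples are illustrative assumptions, not the prompts actually used.

```python
# Hypothetical "rich context" prompt construction for naive prompting.
FEW_SHOT_EXAMPLES = [
    ("Meeting moved to 3pm, see the attached agenda.", "ham"),
    ("URGENT: claim your prize by sending your bank details.", "spam"),
]

def build_prompt(email: str, label: str) -> str:
    """Prepend a task instruction and labeled examples before the
    label-conditioned prefix used for likelihood scoring."""
    header = ("You are an email spam filter. "
              "Classify each email as spam or ham.\n\n")
    shots = "".join(f"The following email is {y}:\n{x}\n\n"
                    for x, y in FEW_SHOT_EXAMPLES)
    return header + shots + f"The following email is {label}:\n{email}"

print(build_prompt("Win a free iPhone today!!!", "spam"))
```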
Full Fine-Tuning
Fully fine-tuning the model on the Enron Spam dataset improved performance over the prompting-based stages, validating the Bayesian Inverse Classification approach when combined with task-specific training.
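A minimal sketch of this stage is shown below: the model is trained with the standard causal language-modeling loss on label-conditioned sequences, so the likelihoods used by the inverse-classification scoring adapt to the Enron distribution. The toy data, prompt template, and hyperparameters are assumptions rather than the project's exact training recipe.

```python
# Minimal full fine-tuning sketch (causal LM loss on label-conditioned text).
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "HuggingFaceTB/SmolLM2-135M-Instruct"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.train()

# Toy stand-ins for the Enron Spam training split (80% of the data).
train_pairs = [
    ("Quarterly results attached for review before the board meeting.", "ham"),
    ("Click now to claim your $1000 gift card, limited time only!", "spam"),
]

def encode(email: str, label: str) -> dict:
    # Same label-conditioned template assumed by the scoring sketch above.
    text = f"The following email is {label}:\n{email}"
    enc = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    enc["labels"] = enc["input_ids"].clone()  # targets are shifted inside the model
    return {k: v.squeeze(0) for k, v in enc.items()}

loader = DataLoader([encode(x, y) for x, y in train_pairs],
                    batch_size=1, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

for epoch in range(1):
    for batch in loader:                 # batch tensors have shape (1, T)
        loss = model(**batch).loss       # cross-entropy over next-token predictions
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```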
Project Information
Course: CPEN 455 - Deep Learning
Dataset: Enron Spam Dataset (80% train, 20% validation)
Model: SmolLM2-135M-Instruct
Tools: Python, PyTorch, Transformers, Hugging Face, Bayesian Methods
- Bayesian Inverse Classification
- Zero-Shot Learning
- Few-Shot Learning
- Full Fine-Tuning
- KV Cache Optimization
- Decoder-Only Transformer Architecture