Project Details
Few-Shot Learning for Spam Detection with Large Language Models (2025)
Project Overview
This project investigates adapting Large Language Models (LLMs) to binary spam detection, using the SmolLM2-135M-Instruct model within a Bayesian Inverse Classification framework. Unlike discriminative approaches, this method leverages the generative capabilities of the LLM to model the likelihood of the email text given a candidate label.
We evaluate this approach through three progressive stages: Zero-Shot Learning (baseline evaluation), Naive Prompting (inference with richer prompt context), and Full Fine-Tuning (optimizing all model parameters on the Enron Spam dataset). Additionally, we analyze the implementation and mechanics of Key-Value (KV) caching in decoder-only transformers.
Technical Approach
Bayesian Inverse Classification Framework
We treat classification as a generative task in which the model evaluates the likelihood of the input text conditioned on each candidate label. Concretely, for an email x and labels y in {spam, ham}, we score P(x | y) as the product of next-token probabilities of the email under a label-conditioned prompt and predict the label with the highest prior-weighted likelihood; by Bayes' rule, maximizing P(x | y)P(y) is equivalent to maximizing the posterior P(y | x). This mitigates the bias often seen in direct discriminative predictions by leveraging the model's pre-trained generative priors.
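As a concrete illustration, the sketch below scores log P(email | label) with SmolLM2-135M-Instruct by summing next-token log-probabilities of the email tokens under a label-conditioned prefix, then picks the higher-scoring label. The prompt template, the Hub model id, and the helper names are assumptions for illustration, not the project's exact implementation.

```python
# Sketch of Bayesian Inverse Classification scoring (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "HuggingFaceTB/SmolLM2-135M-Instruct"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()

LABELS = ["spam", "ham"]  # candidate labels y

def label_log_likelihood(email: str, label: str) -> float:
    """Return log P(email | label): sum of log-probs of the email tokens
    conditioned on a label-specific prefix."""
    prefix = f"The following email is {label}:\n"  # assumed prompt template
    prefix_ids = tokenizer(prefix, return_tensors="pt").input_ids
    email_ids = tokenizer(email, add_special_tokens=False,
                          return_tensors="pt").input_ids
    full_ids = torch.cat([prefix_ids, email_ids], dim=1)
    with torch.no_grad():
        logits = model(full_ids).logits              # (1, T, vocab_size)
    # Log-probability assigned to the token actually observed at each position.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = full_ids[:, 1:]
    token_ll = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only positions that predict email tokens (those after the prefix).
    start = prefix_ids.shape[1] - 1
    return token_ll[0, start:].sum().item()

def classify(email: str) -> str:
    # With a uniform prior over labels, argmax_y P(x | y) = argmax_y P(y | x).
    scores = {y: label_log_likelihood(email, y) for y in LABELS}
    return max(scores, key=scores.get)

print(classify("Congratulations! You won a free cruise, click here now."))
```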
Key-Value (KV) Cache Analysis
We analyze the implementation, efficiency, and memory trade-offs of KV caching during autoregressive generation. The cache stores Key and Value vectors of processed tokens, reducing computational complexity from O(t²) to O(t) per step, though it creates a memory bottleneck for long sequences.
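The minimal decoding loop below shows how the cache is threaded through Hugging Face Transformers via use_cache=True and past_key_values: the prompt is processed once (prefill), and every subsequent step feeds only the newest token, whose query attends to the cached Keys and Values. Greedy decoding and the model id are assumptions for illustration.

```python
# Minimal sketch of KV-cached autoregressive decoding with Transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "HuggingFaceTB/SmolLM2-135M-Instruct"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()

input_ids = tokenizer("Subject: You have won", return_tensors="pt").input_ids

with torch.no_grad():
    # Prefill: run the whole prompt once and populate the KV cache.
    out = model(input_ids, use_cache=True)
    past_key_values = out.past_key_values          # per-layer Key/Value tensors
    next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)

    for _ in range(20):                            # greedy decoding steps
        # Feed ONLY the newest token; its query attends to the cached
        # Keys/Values, so each step costs O(t) attention instead of O(t^2).
        out = model(next_id, past_key_values=past_key_values, use_cache=True)
        past_key_values = out.past_key_values
        input_ids = torch.cat([input_ids, next_id], dim=1)
        next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)

print(tokenizer.decode(input_ids[0]))
```

The cached tensors grow linearly with sequence length (Key and Value tensors per layer and head), which is the memory bottleneck noted above for long sequences.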
Experimental Results
Zero-Shot Baseline
The zero-shot baseline achieved 52.25% accuracy, only marginally better than the 50% expected from random guessing in binary classification.
Naive Prompting
Enriching the prompt with additional task context improved performance only marginally over the zero-shot baseline.
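For reference, a hypothetical version of such a richer prompt is sketched below: a task instruction and a few labeled emails are prepended to the same label-conditioned prefix used for likelihood scoring. The instruction wording and in-context examples are illustrative assumptions, not the prompts actually used.

```python
# Hypothetical "rich context" prompt construction for naive prompting.
FEW_SHOT_EXAMPLES = [
    ("Meeting moved to 3pm, see the attached agenda.", "ham"),
    ("URGENT: claim your prize by sending your bank details.", "spam"),
]

def build_prompt(email: str, label: str) -> str:
    """Prepend a task instruction and labeled examples before the
    label-conditioned prefix used for likelihood scoring."""
    header = ("You are an email spam filter. "
              "Classify each email as spam or ham.\n\n")
    shots = "".join(f"The following email is {y}:\n{x}\n\n"
                    for x, y in FEW_SHOT_EXAMPLES)
    return header + shots + f"The following email is {label}:\n{email}"

print(build_prompt("Win a free iPhone today!!!", "spam"))
```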
Full Fine-Tuning
Fully fine-tuning the model on the Enron Spam dataset improved performance over the prompting-based stages, validating the Bayesian Inverse Classification approach when combined with task-specific training.
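A minimal sketch of this stage is shown below: the model is trained with the standard causal language-modeling loss on label-conditioned sequences, so the likelihoods used by the inverse-classification scoring adapt to the Enron distribution. The toy data, prompt template, and hyperparameters are assumptions rather than the project's exact training recipe.

```python
# Minimal full fine-tuning sketch (causal LM loss on label-conditioned text).
import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "HuggingFaceTB/SmolLM2-135M-Instruct"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.train()

# Toy stand-ins for the Enron Spam training split (80% of the data).
train_pairs = [
    ("Quarterly results attached for review before the board meeting.", "ham"),
    ("Click now to claim your $1000 gift card, limited time only!", "spam"),
]

def encode(email: str, label: str) -> dict:
    # Same label-conditioned template assumed by the scoring sketch above.
    text = f"The following email is {label}:\n{email}"
    enc = tokenizer(text, truncation=True, max_length=512, return_tensors="pt")
    enc["labels"] = enc["input_ids"].clone()  # targets are shifted inside the model
    return {k: v.squeeze(0) for k, v in enc.items()}

loader = DataLoader([encode(x, y) for x, y in train_pairs],
                    batch_size=1, shuffle=True)
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

for epoch in range(1):
    for batch in loader:                 # batch tensors have shape (1, T)
        loss = model(**batch).loss       # cross-entropy over next-token predictions
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```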
Project Information
Course: CPEN 455 - Deep Learning
Dataset: Enron Spam Dataset (80% train, 20% validation)
Model: SmolLM2-135M-Instruct
Tools: Python, PyTorch, Transformers, Hugging Face, Bayesian Methods
- Bayesian Inverse Classification
- Zero-Shot Learning
- Few-Shot Learning
- Full Fine-Tuning
- KV Cache Optimization
- Decoder-Only Transformer Architecture