Fake News Detection System
This project develops a machine learning system that automatically detects whether a news headline is real or fake using Natural Language Processing techniques.
Problem Statement
Fake and misleading news spreads largely unchecked on digital platforms. It is especially hard to distinguish from genuine information when it comes from high-profile sources that the public already trusts.
Therefore, there is an urgent need for an AI-based system that automatically classifies news as real or fake from linguistic patterns and textual features, offering not just a binary result but a transparent, evidence-backed assessment of its trustworthiness.
Literature Review / Market Research
Fake news spreads rapidly on social media platforms, where information circulates without verification. Traditional fact-checking relies on human verification, which is slow and does not scale to large volumes of data.
Previous research has applied Naive Bayes, Logistic Regression, and SVM classifiers to this task. NLP techniques analyze word frequency and writing style, while Bag-of-Words and TF-IDF representations convert headlines into numerical vectors.
SVM performs well for high-dimensional text classification. Deep learning models such as CNNs and RNNs need large datasets and substantial computation. For medium-sized datasets, classical machine learning is efficient and interpretable.
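To make the difference between the two representations concrete, here is a minimal sketch in plain Python. The three headlines are invented for illustration, the helper names are hypothetical, and the TF-IDF weighting shown (raw count times log(N / document frequency)) is only one common variant; libraries often apply smoothing and normalization on top of it.

```python
import math
from collections import Counter

# Hypothetical toy headlines, invented for illustration only.
headlines = [
    "scientists confirm water found on mars",
    "shocking secret doctors hide from you",
    "scientists hide shocking mars secret",
]

tokenized = [h.split() for h in headlines]
# The vocabulary fixes one vector position per distinct word.
vocabulary = sorted({word for tokens in tokenized for word in tokens})


def bag_of_words(tokens):
    """Return raw word counts, one entry per vocabulary word."""
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]


def document_frequency(word):
    """Number of headlines that contain the word at least once."""
    return sum(1 for tokens in tokenized if word in tokens)


def tf_idf(tokens):
    """Weight each count by log(N / df), so rarer words score higher."""
    counts = Counter(tokens)
    n_docs = len(tokenized)
    return [
        counts[word] * math.log(n_docs / document_frequency(word))
        for word in vocabulary
    ]


bow_vectors = [bag_of_words(tokens) for tokens in tokenized]
tfidf_vectors = [tf_idf(tokens) for tokens in tokenized]
```

In this sketch a word that occurs in only one headline (such as "water") gets a higher TF-IDF weight than a word shared by two headlines (such as "scientists"), whereas Bag-of-Words gives both a count of 1.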
Research Gap / Innovation
Most systems analyze full articles and require heavy computation. However, users usually see only headlines, so a lightweight real-time system is needed.
This project uses headline-based detection with Bag-of-Words features and a Linear SVM classifier. The model is fast, efficient, and suitable for real-time applications.
The innovation is a simple tool allowing users to quickly verify headlines through a user interface while maintaining good accuracy.
SYSTEM METHODOLOGY
Dataset / Input
The system uses the Fake and Real News Dataset from Kaggle, which contains approximately 14,000 headlines (7,000 fake and 7,000 real). Preprocessing includes lowercase conversion, punctuation removal, and tokenization, followed by Bag-of-Words feature extraction using CountVectorizer.
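The three text-cleaning steps can be sketched with the Python standard library alone. The function name preprocess and the sample headline are assumptions made for illustration; the project's actual preprocessing code is not shown in this report and may differ.

```python
import string
from typing import List

# Translation table that deletes every ASCII punctuation character.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(headline: str) -> List[str]:
    """Lowercase, strip punctuation, then split on whitespace."""
    lowered = headline.lower()
    no_punct = lowered.translate(_PUNCT_TABLE)
    return no_punct.split()


# Hypothetical headline, invented for illustration only.
tokens = preprocess("BREAKING: Aliens Land in Paris, Officials Say!")
print(tokens)
```

Note that scikit-learn's CountVectorizer lowercases text and tokenizes on word boundaries by default, so in practice some of these steps may already be covered by the vectorizer itself.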
Model / Architecture
Each headline is converted into a feature vector and classified using a Linear Support Vector Machine (SVM).
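For reference, the standard textbook form of a linear SVM can be written as follows. This is a general formulation, not a statement of the exact loss or settings used in this project: x is the Bag-of-Words vector of a headline, w and b are the learned weights and bias, C is the regularization strength, and encoding the labels as y_i in {-1, +1} (for example +1 for fake) is an assumption made here for the notation.

```latex
% Decision rule: which side of the hyperplane the headline vector falls on
\[
\hat{y}(\mathbf{x}) = \operatorname{sign}\left(\mathbf{w}^{\top}\mathbf{x} + b\right)
\]

% Soft-margin training objective with hinge loss
\[
\min_{\mathbf{w},\, b}\;
\frac{1}{2}\lVert \mathbf{w} \rVert^{2}
+ C \sum_{i=1}^{n} \max\left(0,\; 1 - y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right)\right)
\]
```

Because prediction is a single dot product between w and a sparse vector, classifying one headline is very cheap, which fits the real-time goal.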
Workflow:
Input headline → preprocessing → feature extraction → SVM prediction → Real/Fake output.
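The workflow above could be wired together roughly as follows with scikit-learn. This is a sketch under stated assumptions, not the project's actual code: the six training headlines and their labels are invented, the label encoding (1 = fake, 0 = real) is an assumption, and classify is a hypothetical helper name. A model trained on so few examples is not meaningful; the point is only the shape of the pipeline.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy training data, invented for illustration only.
# Assumed label encoding: 1 = fake, 0 = real.
train_headlines = [
    "You won't believe what this celebrity did to cure cancer",
    "Shocking secret the government does not want you to know",
    "Miracle pill melts fat overnight, doctors stunned",
    "Central bank raises interest rates by a quarter point",
    "City council approves budget for new public library",
    "Researchers publish study on regional rainfall trends",
]
train_labels = [1, 1, 1, 0, 0, 0]

# CountVectorizer handles lowercasing, tokenization, and Bag-of-Words counts;
# LinearSVC fits a linear SVM on the resulting sparse vectors.
model = make_pipeline(
    CountVectorizer(lowercase=True),
    LinearSVC(random_state=0),
)
model.fit(train_headlines, train_labels)


def classify(headline: str) -> str:
    """Run one headline through the pipeline and map the label to text."""
    predicted = model.predict([headline])[0]
    return "Fake" if predicted == 1 else "Real"


print(classify("Doctors stunned by shocking miracle cure"))
```

Bundling the vectorizer and classifier in one pipeline object means a user interface only needs to call classify on the raw headline string, and the same vocabulary is guaranteed to be used at training and prediction time.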
Process Flow
Results & Analysis
The Dummy Classifier baseline achieved 51% accuracy, roughly what random guessing yields on a balanced dataset. The SVM model achieved 84% accuracy by learning which word-occurrence patterns in headlines distinguish fake news from real news. It separates the two classes with a maximum-margin hyperplane in the high-dimensional feature space and predicts quickly enough for real-time applications.
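The comparison against a baseline can be illustrated with a small plain-Python sketch. All labels and predictions below are invented, and the function names are hypothetical; the sketch assumes the baseline always predicts the most frequent training class, which is one common dummy strategy and not necessarily the one used in this project.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(train_labels, n_test):
    """Predict the most frequent training label for every test item."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    return [most_common_label] * n_test


# Hypothetical labels and model predictions, invented for illustration only.
train_labels = ["fake"] * 6 + ["real"] * 5
test_labels = ["fake", "real", "real", "fake", "real", "fake"]
model_predictions = ["fake", "real", "fake", "fake", "real", "fake"]

baseline_predictions = majority_baseline(train_labels, len(test_labels))
baseline_accuracy = accuracy(test_labels, baseline_predictions)
model_accuracy = accuracy(test_labels, model_predictions)

print(f"baseline: {baseline_accuracy:.2f}, model: {model_accuracy:.2f}")
```

On a test set that is half fake and half real, a constant prediction is right about half the time, which is why a baseline near 50% is the reference point a trained classifier has to beat.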
Academic Credits
Project Guide
Kirti Paliwal
Team Member 1
Aaryav Krishna
2427030754
Team Member 2
Krishnav Agrawal
2427030767