ID: 2427030754, 2427030767

Fake News Detection System

This project develops a machine learning system that automatically detects whether a news headline is real or fake using Natural Language Processing techniques.

Problem Statement

The uncontrolled spread of fake and misleading news on digital platforms, particularly when it comes from high-profile sources that the public inherently trusts, makes it difficult to distinguish genuine information from disinformation.

Therefore, there is an urgent requirement for an AI-based system that can automatically analyze and classify news as real or fake based on deep linguistic patterns and textual features, offering not just a binary result but a transparent, evidence-backed assessment of its contextual trustworthiness.

Literature Review / Market Research

Fake news spreads rapidly on social media platforms, where information circulates without verification. Traditional fact-checking relies on human verification, which is too slow to keep up with large volumes of data.

Previous research has used Naive Bayes, Logistic Regression, and SVM classifiers. These approaches rely on NLP techniques that analyze word frequency and writing style, with Bag-of-Words and TF-IDF converting headlines into numerical vectors.

SVM performs well for high-dimensional text classification. Deep learning models such as CNNs and RNNs require large datasets and substantial computation. For medium-sized datasets, classical machine learning is efficient and interpretable.

Research Gap / Innovation

Most systems analyze full articles and require heavy computation. However, users usually see only headlines, so a lightweight real-time system is needed.

This project uses headline-based detection with Bag-of-Words features and a Linear SVM classifier. The model is fast, efficient, and suitable for real-time applications.

The innovation is a simple tool allowing users to quickly verify headlines through a user interface while maintaining good accuracy.

SYSTEM METHODOLOGY

Dataset / Input

The system uses the Fake and Real News Dataset from Kaggle, which contains approximately 14,000 headlines (7,000 fake and 7,000 real). Preprocessing consists of lowercase conversion, punctuation removal, tokenization, and Bag-of-Words feature extraction using CountVectorizer.
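The preprocessing steps listed above could look roughly like the following sketch. The helper name `preprocess` and the sample headlines are hypothetical; the project's actual notebook may implement these steps differently.

```python
import string

from sklearn.feature_extraction.text import CountVectorizer


def preprocess(headline: str) -> str:
    """Lowercase a headline and strip ASCII punctuation (hypothetical helper)."""
    lowered = headline.lower()
    return lowered.translate(str.maketrans("", "", string.punctuation))


# Hypothetical headlines, invented for illustration.
headlines = [
    "BREAKING: Aliens Land in Paris!",
    "Markets close higher, analysts say.",
]
cleaned = [preprocess(h) for h in headlines]

# CountVectorizer tokenizes the cleaned text and builds the
# Bag-of-Words matrix (one row per headline, one column per word).
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(cleaned)

print(cleaned)
print(X.shape)
```

Note that CountVectorizer already lowercases and drops punctuation with its default settings, so the explicit helper mainly serves to make each step visible.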

Model / Architecture

Each headline is converted into a feature vector and classified using a Linear Support Vector Machine (SVM).

Workflow: Input headline → preprocessing → feature extraction → SVM prediction → Real/Fake output.
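The workflow above maps naturally onto a scikit-learn pipeline. The following is a minimal sketch, not the project's actual code: the training headlines, labels, and the `classify` helper are invented for illustration, and `LinearSVC` with default settings is an assumption about how the Linear SVM was configured.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC

# Hypothetical training headlines and labels, invented for illustration only.
train_headlines = [
    "government announces new budget for schools",
    "central bank holds interest rates steady",
    "city council approves road repair plan",
    "university publishes annual research report",
    "shocking miracle cure doctors hate",
    "celebrity secretly replaced by clone",
    "aliens built pyramids claims insider",
    "you will not believe this weird trick",
]
train_labels = ["real"] * 4 + ["fake"] * 4

# Preprocessing and feature extraction (CountVectorizer) followed by
# the linear SVM, chained so a raw headline goes in and a label comes out.
model = Pipeline([
    ("bow", CountVectorizer(lowercase=True)),
    ("svm", LinearSVC()),
])
model.fit(train_headlines, train_labels)


def classify(headline: str) -> str:
    """Return the predicted label for a single raw headline."""
    return model.predict([headline])[0]


label = classify("shocking trick doctors will not believe")
print(label)
```

Bundling the vectorizer and classifier in one object ensures that a headline entered through a user interface is transformed with the same vocabulary that was learned during training.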

Live Execution

Click below to run the working model on Google Colab.

VIEW CODE / DEMO

Process Flow

Input Headline → Preprocessing (Lowercase, Remove Punctuation) → Bag-of-Words Vectorization → Linear SVM Model → Binary Prediction (Real / Fake)

Results & Analysis

Our Model (SVM): 84% accuracy

The Dummy Classifier baseline achieved 51% accuracy, approximately equal to random guessing on a balanced dataset. The SVM model achieved 84% accuracy by learning word-usage patterns from news headlines. It separates fake and real news with a maximum-margin hyperplane in the high-dimensional feature space, and its prediction is fast enough for real-time applications.
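A baseline comparison of this kind can be sketched as follows. The toy headlines, the 50/50 split, and the `most_frequent` dummy strategy are all assumptions made for illustration; the source does not state which strategy or split was used, and accuracy on such a tiny toy set says nothing about the 51% and 84% figures reported above.

```python
from sklearn.dummy import DummyClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical balanced toy data, invented for illustration only.
headlines = [
    "government announces new budget for schools",
    "central bank holds interest rates steady",
    "city council approves road repair plan",
    "university publishes annual research report",
    "shocking miracle cure doctors hate",
    "celebrity secretly replaced by clone",
    "aliens built pyramids claims insider",
    "you will not believe this weird trick",
]
labels = ["real"] * 4 + ["fake"] * 4

# Stratified split keeps the real/fake ratio equal in both halves.
X_train, X_test, y_train, y_test = train_test_split(
    headlines, labels, test_size=0.5, stratify=labels, random_state=0
)

# The baseline ignores the text and always predicts one class.
baseline = make_pipeline(CountVectorizer(), DummyClassifier(strategy="most_frequent"))
svm = make_pipeline(CountVectorizer(), LinearSVC())

baseline_acc = accuracy_score(y_test, baseline.fit(X_train, y_train).predict(X_test))
svm_acc = accuracy_score(y_test, svm.fit(X_train, y_train).predict(X_test))

print(f"Baseline accuracy: {baseline_acc:.2f}")
print(f"SVM accuracy:      {svm_acc:.2f}")
```

On a balanced test set, a constant-prediction baseline scores 50%, which is why a result close to 51% indicates that the baseline has learned nothing from the text.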

Academic Credits

Project Guide

Kirti Paliwal

Team Member 1

Aaryav Krishna

2427030754

Team Member 2

Krishnav Agrawal

2427030767