AI/ML Scraper

YoutubeScrapper

An R&D project to explore alternative content recommendation algorithms. It scrapes public YouTube metadata to build a graph of related videos based on semantic similarity of transcripts and tags, rather than view counts.

Flutter, J2EE, MySQL, GitLab

Problem

YouTube's native algorithm often creates filter bubbles. We wanted to find a way to surface semantically related content that a user might otherwise miss.

Solution

A Java Enterprise backend that manages a fleet of scrapers. Data is processed to extract keywords and entities, which are then used to calculate cosine similarity scores between videos.
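To make the similarity step concrete, here is a minimal sketch of cosine similarity over keyword counts. It assumes each video is reduced to a bag of extracted keywords; the class and method names (`CosineSimilaritySketch`, `termFrequencies`, `cosine`) are hypothetical and not taken from the project's code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not the project's actual implementation.
class CosineSimilaritySketch {

    // Builds a term-frequency vector from a video's extracted keywords.
    static Map<String, Integer> termFrequencies(String... keywords) {
        Map<String, Integer> tf = new HashMap<>();
        for (String keyword : keywords) {
            tf.merge(keyword.toLowerCase(), 1, Integer::sum);
        }
        return tf;
    }

    // cos(a, b) = (a . b) / (|a| * |b|); returns 0.0 if either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += entry.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (normA * normB);
    }

    // Euclidean length of a sparse vector.
    private static double norm(Map<String, Integer> vector) {
        double sum = 0.0;
        for (int value : vector.values()) {
            sum += (double) value * value;
        }
        return Math.sqrt(sum);
    }
}
```

Because the score depends only on shared keywords and not on view counts, two videos with identical keyword sets score 1.0 and two with no keywords in common score 0.0, regardless of popularity.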

# System Architecture

A classic three-tier architecture. The J2EE application server manages business logic and scraper scheduling, MySQL stores video metadata and relationship graphs, and the Flutter client consumes a REST API.
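The scraper scheduling mentioned above could look roughly like the sketch below. It uses the standard `ScheduledExecutorService` as a stand-in; a real Java EE deployment would more likely rely on container-managed scheduling, and the source does not say which mechanism the project uses. `ScraperSchedulerSketch`, `scheduleAll`, and the `scrapeJob` callback are hypothetical names.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical scheduler for illustration; the project's real scheduling code is not shown in the source.
class ScraperSchedulerSketch implements AutoCloseable {

    private final ScheduledExecutorService pool;

    ScraperSchedulerSketch(int workers) {
        this.pool = Executors.newScheduledThreadPool(workers);
    }

    // Schedules one recurring job per topic. `scrapeJob` is a placeholder for the real scraper call.
    void scheduleAll(List<String> topics, Consumer<String> scrapeJob, long delayMillis) {
        for (String topic : topics) {
            pool.scheduleWithFixedDelay(() -> {
                try {
                    scrapeJob.accept(topic);
                } catch (RuntimeException e) {
                    // An uncaught exception would suppress all later runs of this job, so log and continue.
                    System.err.println("scrape failed for " + topic + ": " + e.getMessage());
                }
            }, 0, delayMillis, TimeUnit.MILLISECONDS);
        }
    }

    // Stops all scheduled jobs.
    @Override
    public void close() {
        pool.shutdownNow();
    }
}
```

A fixed delay between runs (rather than a fixed rate) keeps a slow scrape from piling up overlapping executions of the same job.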

Key Features

  • Metadata Scraping Engine
  • Semantic Similarity Calculation
  • Cross-platform Mobile UI
  • Keyword Extraction
  • History Tracking
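
As an illustration of the keyword extraction feature, the sketch below counts word frequencies after dropping stop words and returns the most frequent terms. This is a deliberately simple frequency-based stand-in; the source does not describe the project's extraction method, and the stop-word list, the ASCII-only tokenizer, and the name `KeywordExtractionSketch` are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical frequency-based extractor; a stand-in for whatever the project actually uses.
class KeywordExtractionSketch {

    // Tiny illustrative stop-word list (assumption); a real list would be much longer.
    private static final Set<String> STOP_WORDS =
            Set.of("the", "a", "an", "and", "or", "of", "to", "in", "is", "for", "on", "with");

    // Returns up to `limit` keywords, most frequent first, ties broken alphabetically.
    static List<String> topKeywords(String text, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        // ASCII-only tokenization: splits on anything that is not a lowercase letter or digit.
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (token.length() < 2 || STOP_WORDS.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((x, y) -> {
            int byCount = Integer.compare(y.getValue(), x.getValue());
            return byCount != 0 ? byCount : x.getKey().compareTo(y.getKey());
        });
        List<String> keywords = new ArrayList<>();
        for (int i = 0; i < Math.min(limit, entries.size()); i++) {
            keywords.add(entries.get(i).getKey());
        }
        return keywords;
    }
}
```

The resulting keyword lists can serve as the input vectors for a similarity calculation between videos.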

[Figure 1.0: YoutubeScrapper High-Level Architecture. Diagram components: UI, Monolith, Auth, Core, Data, Jobs, DB.]

# User Flow & Journey

Step 01: Search

The user enters a topic keyphrase.

Step 02: Processing

The backend searches the local index or triggers a scraper.

Step 03: Analysis

The similarity engine ranks related videos.

Step 04: Display

A list of highly relevant, often overlooked videos is shown.
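
Steps 02 and 03 can be sketched as follows. The sketch assumes the scraper runs only when the local index has no entry for the keyphrase, and that each video carries a map of keyword weights; both are assumptions, as are the names `RelatedVideoFlowSketch`, `Video`, `candidates`, and `rank`. The in-memory map stands in for the MySQL-backed index.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical flow for illustration; names and data shapes are assumptions.
class RelatedVideoFlowSketch {

    // Minimal video record: an id plus keyword weights used for similarity.
    static final class Video {
        final String id;
        final Map<String, Double> weights;

        Video(String id, Map<String, Double> weights) {
            this.id = id;
            this.weights = weights;
        }
    }

    // In-memory stand-in for the local index.
    private final Map<String, List<Video>> localIndex = new HashMap<>();
    private final Function<String, List<Video>> scraper;

    RelatedVideoFlowSketch(Function<String, List<Video>> scraper) {
        this.scraper = scraper;
    }

    // Step 02: answer from the local index, invoking the scraper only on a miss.
    List<Video> candidates(String keyphrase) {
        return localIndex.computeIfAbsent(keyphrase.trim().toLowerCase(), scraper);
    }

    // Step 03: order videos by cosine similarity to the query vector, best first.
    static List<Video> rank(Map<String, Double> query, List<Video> videos, int limit) {
        List<Video> sorted = new ArrayList<>(videos);
        sorted.sort(Comparator.comparingDouble((Video v) -> cosine(query, v.weights)).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }

    // Cosine similarity of two sparse vectors; 0.0 if either has zero length.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> entry : a.entrySet()) {
            dot += entry.getValue() * b.getOrDefault(entry.getKey(), 0.0);
        }
        double norms = Math.sqrt(sumOfSquares(a)) * Math.sqrt(sumOfSquares(b));
        return norms == 0.0 ? 0.0 : dot / norms;
    }

    private static double sumOfSquares(Map<String, Double> vector) {
        double sum = 0.0;
        for (double value : vector.values()) {
            sum += value * value;
        }
        return sum;
    }
}
```

Normalizing the keyphrase before the lookup means repeated searches that differ only in case or surrounding whitespace reuse the cached result instead of triggering another scrape.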

Flow Analysis

Search and discover flow.
