AI/ML Scraper

YoutubeScrapper

An R&D project to explore alternative content recommendation algorithms. It scrapes public YouTube metadata to build a graph of related videos based on semantic similarity of transcripts and tags, rather than view counts.

Flutter, J2EE, MySQL, GitLab

Problem

YouTube's native algorithm often creates filter bubbles. We wanted to find a way to surface semantically related content that a user might otherwise miss.

Solution

A Java Enterprise backend manages a fleet of scrapers. The scraped metadata is processed to extract keywords and entities, which are then used to calculate cosine similarity scores between videos.
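The keyword and similarity steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name `KeywordSimilarity`, the tiny stop-word list, and the use of raw term counts as vector weights are all assumptions made for the example (entity extraction is left out entirely).

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not taken from the project: keyword counts + cosine similarity.
final class KeywordSimilarity {

    // Assumed, deliberately tiny stop-word list; a real one would be far longer.
    private static final Set<String> STOP_WORDS =
            Set.of("a", "an", "the", "and", "or", "of", "to", "in", "is", "for");

    /** Builds a term-frequency vector from free text such as a transcript or tag list. */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> counts = new HashMap<>();
        // Split on anything that is not a letter or a digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (token.isEmpty() || STOP_WORDS.contains(token)) {
                continue;
            }
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    /** Cosine similarity of two sparse vectors; returns 0.0 if either vector is empty. */
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> entry : a.entrySet()) {
            Integer other = b.get(entry.getKey());
            if (other != null) {
                dot += entry.getValue() * other.doubleValue();
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    private static double norm(Map<String, Integer> vector) {
        double sumOfSquares = 0.0;
        for (int count : vector.values()) {
            sumOfSquares += (double) count * count;
        }
        return Math.sqrt(sumOfSquares);
    }
}
```

For instance, the texts "java scraper" and "java flutter" share one term out of two each, so their score is 1 / (√2 · √2) = 0.5. Texts with no terms in common score 0.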

# System Architecture

A classic three-tier architecture: a J2EE application server manages business logic and scraper scheduling, MySQL stores video metadata and the relationship graph, and a Flutter client consumes a REST API.
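One way to picture the relationship graph kept in the data tier is as a list of scored edges between video IDs. The sketch below is purely illustrative: the `RelationshipGraph` and `Edge` names, the pairwise loop, and the `minScore` cut-off are assumptions, and the real schema and graph-building strategy are not described here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToDoubleBiFunction;

// Hypothetical in-memory view of the graph; each Edge could map to one row in an edge table.
final class RelationshipGraph {

    /** An undirected, scored link between two videos (assumed shape). */
    static final class Edge {
        final String sourceId;
        final String targetId;
        final double score;

        Edge(String sourceId, String targetId, double score) {
            this.sourceId = sourceId;
            this.targetId = targetId;
            this.score = score;
        }
    }

    /**
     * Compares every unordered pair of videos once and keeps the pairs whose
     * similarity is at least minScore. The similarity function is supplied by the caller.
     */
    static List<Edge> buildEdges(List<String> videoIds,
                                 ToDoubleBiFunction<String, String> similarity,
                                 double minScore) {
        List<Edge> edges = new ArrayList<>();
        for (int i = 0; i < videoIds.size(); i++) {
            for (int j = i + 1; j < videoIds.size(); j++) {
                String source = videoIds.get(i);
                String target = videoIds.get(j);
                double score = similarity.applyAsDouble(source, target);
                if (score >= minScore) {
                    edges.add(new Edge(source, target, score));
                }
            }
        }
        return edges;
    }
}
```

The pairwise loop is quadratic in the number of videos, which is acceptable for a sketch but would likely need batching or candidate pruning at any real scale.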

Key Features

Metadata Scraping Engine
Semantic Similarity Calculation
Cross-platform Mobile UI
Keyword Extraction
History Tracking

[Diagram: a monolith containing Auth, Core, Data, and Jobs modules]

Figure 1.0: YoutubeScrapper High-Level Architecture

# User Flow & Journey

Step 01: Search
The user enters a topic keyphrase.

Step 02: Processing
The backend searches its local index or triggers a scraper.

Step 03: Analysis
The similarity engine ranks related videos.

Step 04: Display
A list of highly relevant, often overlooked videos is shown.

Flow Analysis

Search and discover flow.
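The four steps of this flow can be sketched as a single method. This is a simplified illustration under stated assumptions: `DiscoveryFlow`, its in-memory `localIndex`, and the injected `scraper` and `relevance` functions are hypothetical stand-ins for the real backend components, and the sketch assumes the scraper runs only when the keyphrase is missing from the local index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToDoubleBiFunction;

// Hypothetical orchestration of the search-and-discover flow.
final class DiscoveryFlow {

    // Stand-in for the local index: keyphrase -> candidate video IDs.
    private final Map<String, List<String>> localIndex = new HashMap<>();
    // Stand-in for a scraper job: fetches candidate video IDs for a keyphrase.
    private final Function<String, List<String>> scraper;
    // Stand-in for the similarity engine: scores a video ID against the keyphrase.
    private final ToDoubleBiFunction<String, String> relevance;

    DiscoveryFlow(Function<String, List<String>> scraper,
                  ToDoubleBiFunction<String, String> relevance) {
        this.scraper = scraper;
        this.relevance = relevance;
    }

    /** Step 01 supplies the keyphrase; the method returns the list shown in Step 04. */
    List<String> search(String keyphrase, int limit) {
        // Step 02: use the local index, calling the scraper only on a miss.
        List<String> candidates = localIndex.computeIfAbsent(keyphrase, scraper);
        if (candidates == null) {
            return Collections.emptyList();
        }
        // Step 03: rank candidates by descending relevance.
        List<String> ranked = new ArrayList<>(candidates);
        ranked.sort(Comparator
                .comparingDouble((String id) -> relevance.applyAsDouble(keyphrase, id))
                .reversed());
        // Step 04: return the top results for display.
        return new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size())));
    }
}
```

Because results are cached per keyphrase, a repeated search for the same topic is served from the index without another scraper run. A real system would also need some policy for refreshing stale entries.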
