An R&D project to explore alternative content recommendation algorithms. It scrapes public YouTube metadata to build a graph of related videos based on semantic similarity of transcripts and tags, rather than view counts.
YouTube's native algorithm often creates filter bubbles. We wanted to find a way to surface semantically related content that a user might otherwise miss.
A Java Enterprise backend manages a fleet of scrapers. Scraped transcripts and tags are processed to extract keywords and entities, which are then used to compute cosine similarity scores between videos.
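To make the scoring step concrete, here is a minimal sketch of cosine similarity over keyword term-frequency vectors. It is illustrative only: the class name `CosineSimilarity`, the `termFrequencies` helper, and the choice of raw term counts as weights are assumptions, not the project's actual representation.

```java
import java.util.HashMap;
import java.util.Map;

final class CosineSimilarity {

    /** Builds a term-frequency vector from extracted keywords (assumed weighting: raw counts). */
    public static Map<String, Double> termFrequencies(String... keywords) {
        Map<String, Double> tf = new HashMap<>();
        for (String keyword : keywords) {
            tf.merge(keyword.toLowerCase(), 1.0, Double::sum);
        }
        return tf;
    }

    /** Cosine of the angle between two sparse vectors; 0.0 if either vector is empty. */
    public static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        for (Map.Entry<String, Double> entry : a.entrySet()) {
            Double other = b.get(entry.getKey());
            if (other != null) {
                dot += entry.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // avoid dividing by zero for videos with no keywords
        }
        return dot / (normA * normB);
    }

    private static double norm(Map<String, Double> v) {
        double sumOfSquares = 0.0;
        for (double x : v.values()) {
            sumOfSquares += x * x;
        }
        return Math.sqrt(sumOfSquares);
    }
}
```

Two videos tagged `java, graph` and `java, flutter` share one of two terms each, so `cosine` returns roughly 0.5; videos with no keywords in common score 0.0.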
Classic 3-tier architecture. A J2EE application server manages business logic and scraper scheduling, MySQL stores video metadata and the relationship graph, and a Flutter client consumes a REST API.
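As a rough illustration of the relationship graph held in the data tier, the sketch below models it in memory as a weighted adjacency list. The names `RelationshipGraph`, `link`, and `topRelated` are hypothetical, and the idea that each edge mirrors a stored row of (source ID, target ID, score) is an assumption; the actual MySQL schema is not described here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class RelationshipGraph {

    /** One weighted edge; assumed to mirror a stored row (source_id, target_id, score). */
    static final class Edge {
        final String targetId;
        final double score;

        Edge(String targetId, double score) {
            this.targetId = targetId;
            this.score = score;
        }
    }

    private final Map<String, List<Edge>> adjacency = new HashMap<>();

    /** Records the edge in both directions, since cosine similarity is symmetric. */
    public void link(String videoA, String videoB, double score) {
        adjacency.computeIfAbsent(videoA, k -> new ArrayList<>()).add(new Edge(videoB, score));
        adjacency.computeIfAbsent(videoB, k -> new ArrayList<>()).add(new Edge(videoA, score));
    }

    /** Returns up to {@code limit} neighbour IDs, highest similarity first. */
    public List<String> topRelated(String videoId, int limit) {
        List<Edge> edges = new ArrayList<>(adjacency.getOrDefault(videoId, new ArrayList<>()));
        edges.sort(Comparator.comparingDouble((Edge e) -> e.score).reversed());
        List<String> ids = new ArrayList<>();
        for (Edge edge : edges.subList(0, Math.min(limit, edges.size()))) {
            ids.add(edge.targetId);
        }
        return ids;
    }
}
```

In a relational store the same lookup would typically be a query ordered by score with a row limit; the in-memory version only shows the shape of the data.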
The user enters a topic keyphrase.
The backend searches its local index or triggers a scraper.
The similarity engine ranks the related videos.
A list of highly relevant, often overlooked videos is shown.
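The steps above can be sketched as a single lookup-then-rank method. This is a simplified outline under stated assumptions: videos are reduced to plain title strings, the scraper is triggered only when the local index has no entry for the keyphrase, and candidates are scored against the keyphrase itself. `RecommendationFlow`, `recommend`, and the injected `scraper` and `similarity` functions are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToDoubleBiFunction;

final class RecommendationFlow {

    private final Map<String, List<String>> localIndex = new HashMap<>();
    private final Function<String, List<String>> scraper;
    private final ToDoubleBiFunction<String, String> similarity;

    public RecommendationFlow(Function<String, List<String>> scraper,
                              ToDoubleBiFunction<String, String> similarity) {
        this.scraper = scraper;
        this.similarity = similarity;
    }

    /** Returns up to {@code limit} videos for the keyphrase, most similar first. */
    public List<String> recommend(String keyphrase, int limit) {
        // Step 1: normalise the user's keyphrase so lookups are case-insensitive.
        String key = keyphrase.trim().toLowerCase();

        // Step 2: use the local index; on a miss (assumed trigger), run the scraper and cache the result.
        List<String> candidates = localIndex.computeIfAbsent(key, scraper);
        if (candidates == null) {
            return new ArrayList<>(); // scraper found nothing
        }

        // Step 3: rank candidates by similarity to the keyphrase, highest first.
        List<String> ranked = new ArrayList<>(candidates);
        ranked.sort(Comparator
                .comparingDouble((String video) -> similarity.applyAsDouble(key, video))
                .reversed());

        // Step 4: return the top results for display.
        return new ArrayList<>(ranked.subList(0, Math.min(limit, ranked.size())));
    }
}
```

Passing the scraper and similarity function in through the constructor keeps the flow testable without network access; a repeated query for the same keyphrase is served from the index and does not call the scraper again.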