Can you do more with 10 gigabytes of lyrics than just sing along? We, Detjon, Ivan and Luca, asked ourselves this question, and our answer is: yes, you can! In a group project, we not only analyzed a huge collection of song lyrics but also built an interactive website. It is modeled on the well-known platform Genius, with one special twist: this is where machine learning meets music analysis.

From lyrics to data points: The path from data set to application

Our project began with a huge dataset: over 10 gigabytes of song lyrics from various artists, genres and decades. Before we could analyze the data, we first had to clean it. We filtered out irrelevant content, standardized formats and prepared the texts for machine learning.
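To give an idea of what such a preparation step can look like, here is a minimal sketch. It assumes that section markers appear in square brackets (for example "[Chorus]") and that lowercasing and stripping punctuation are acceptable for the later analysis. The function name clean_lyrics and the individual rules are illustrative assumptions, not the exact pipeline of the project.

```python
import re
import unicodedata

# Assumption: section markers look like "[Chorus]" or "[Verse 1]".
SECTION_TAG = re.compile(r"\[[^\]]*\]")
NON_WORD = re.compile(r"[^\w\s']")
WHITESPACE = re.compile(r"\s+")


def clean_lyrics(raw: str) -> str:
    """Normalize one lyrics string for later analysis (hypothetical helper)."""
    text = unicodedata.normalize("NFKC", raw)   # unify Unicode variants
    text = SECTION_TAG.sub(" ", text)           # drop "[Chorus]"-style markers
    text = text.lower()
    text = NON_WORD.sub(" ", text)              # strip punctuation, keep apostrophes
    return WHITESPACE.sub(" ", text).strip()    # collapse runs of whitespace


print(clean_lyrics("[Chorus]\nHello,   WORLD! Don't stop"))
# prints: hello world don't stop
```

Because the word pattern is Unicode-aware, umlauts and other non-ASCII letters survive the cleaning, which matters for non-English lyrics.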

Core Concept

While digging into our song lyrics dataset, we stumbled upon some surprisingly entertaining findings. Alongside standard attributes like artist, views, year, features, and genre, the data occasionally took wild detours into the bizarre. For instance, some “artists” listed were actually authors, politicians, and even fictional characters. Apparently, someone decided that a speech by a senator counts as a top-charting single.

Even more amusing: a handful of songs were apparently released in the year 10. Yes, ten. We’re still trying to figure out whether Julius Caesar had a Spotify account. These unexpected quirks made analyzing the data not only insightful but genuinely hilarious.
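Quirks like these can be caught with a simple plausibility check before any analysis. The sketch below is one possible way to do it. The year bounds, the blocklist and the Song class are hypothetical, and the sample rows are made up rather than taken from the real dataset.

```python
from dataclasses import dataclass

# Hypothetical bounds: adjust them to the period your data should cover.
MIN_YEAR = 1900
MAX_YEAR = 2025

# Hypothetical blocklist, filled in by hand after inspecting the data.
NON_MUSICIANS = frozenset({"Some Senator"})


@dataclass(frozen=True)
class Song:
    title: str
    artist: str
    year: int


def is_plausible(song: Song) -> bool:
    """Keep a song only if its year is in range and its artist is not blocklisted."""
    return MIN_YEAR <= song.year <= MAX_YEAR and song.artist not in NON_MUSICIANS


# Made-up rows for illustration; they are not taken from the real dataset.
songs = [
    Song("Modern Hit", "Some Band", 2015),
    Song("Ancient Banger", "Some Band", 10),
    Song("Floor Speech", "Some Senator", 2008),
]
kept = [song for song in songs if is_plausible(song)]
print([song.title for song in kept])
# prints: ['Modern Hit']
```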

Want to explore the raw dataset yourself?
👉 Check it out on Kaggle

Stay tuned—we’ll be sharing more lyrical oddities and what they might (or might not) tell us about music, culture, and data quality.

🎯 Our Goals Were Clear

  • Recognize frequencies and trends in song lyrics
  • Examine linguistic patterns, for example the choice of words according to genre
  • Use machine learning models to analyze or classify songs
  • Visualize the whole thing interactively on a dedicated website
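The first two goals largely come down to counting words per group. As a rough sketch of that idea, the following snippet counts the most frequent words per genre. The stop-word list is deliberately tiny, the function name top_words_by_genre is our own, and the sample lyrics are invented.

```python
from collections import Counter, defaultdict

# Tiny stop-word list for illustration only; a real one would be much longer.
STOP_WORDS = {"the", "a", "and", "i", "you", "my", "in", "to"}


def top_words_by_genre(songs, n=2):
    """Return the n most frequent non-stop-words for each genre."""
    counts = defaultdict(Counter)
    for genre, lyrics in songs:
        words = [w for w in lyrics.lower().split() if w not in STOP_WORDS]
        counts[genre].update(words)
    return {genre: counter.most_common(n) for genre, counter in counts.items()}


# Invented (genre, lyrics) pairs, assumed to be cleaned already.
sample = [
    ("pop", "love love my heart"),
    ("rap", "money money in the street"),
    ("pop", "heart and love"),
]
print(top_words_by_genre(sample))
# prints: {'pop': [('love', 3), ('heart', 2)], 'rap': [('money', 2), ('street', 1)]}
```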

Technology meets music: this is how we did it

We applied various methods from the fields of exploratory data analysis and machine learning:

Text Classification – Which genre fits this song? We analyzed lyrics and trained models that predict a song’s genre based on linguistic features such as word choice, length, and tone. This allows us to automatically assign genres to new texts without any audio input.
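To illustrate the principle, here is a small multinomial Naive Bayes classifier written in plain Python. This is a swapped-in teaching example and not necessarily the model used in the project: it only looks at word choice (not length or tone), the class name is our own, and the training lines are invented. In practice a library such as scikit-learn would usually do this job.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesGenreClassifier:
    """Multinomial Naive Bayes over word counts with add-one smoothing."""

    def fit(self, texts, genres):
        self.word_counts = defaultdict(Counter)   # genre -> word -> count
        self.genre_counts = Counter(genres)       # genre -> number of songs
        for text, genre in zip(texts, genres):
            self.word_counts[genre].update(text.lower().split())
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        # Words never seen in training carry no information and are skipped.
        words = [w for w in text.lower().split() if w in self.vocabulary]
        total_songs = sum(self.genre_counts.values())
        best_genre, best_score = None, -math.inf
        for genre, n_songs in self.genre_counts.items():
            counts = self.word_counts[genre]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_songs / total_songs)            # log prior
            for word in words:
                score += math.log((counts[word] + 1) / denom)  # smoothed likelihood
            if score > best_score:
                best_genre, best_score = genre, score
        return best_genre


# Invented training data, far too small for real use.
texts = ["love heart baby tonight", "heart love dance",
         "money street hustle", "street money flow beat"]
genres = ["pop", "pop", "rap", "rap"]

model = NaiveBayesGenreClassifier().fit(texts, genres)
print(model.predict("love on the dance floor"))  # prints: pop
print(model.predict("money on my mind"))         # prints: rap
```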

Sentiment Analysis – Is the text more positive or melancholic? Using NLP tools like TextBlob, we examined whether a song expresses a positive, neutral, or negative mood. This enabled us to explore how emotional tone has evolved over time in popular music.
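TextBlob reports a polarity score between -1.0 (negative) and 1.0 (positive). One way to turn that score into the three moods mentioned above is a simple threshold, sketched below. The threshold of 0.1 and the function name mood_label are assumptions for illustration, not values from the project.

```python
def mood_label(polarity: float, threshold: float = 0.1) -> str:
    """Map a polarity score in [-1.0, 1.0] to 'positive', 'neutral' or 'negative'."""
    if not -1.0 <= polarity <= 1.0:
        raise ValueError("polarity must lie between -1.0 and 1.0")
    if polarity > threshold:
        return "positive"
    if polarity < -threshold:
        return "negative"
    return "neutral"


# With TextBlob installed, the score for one text could be obtained like this:
#   from textblob import TextBlob
#   polarity = TextBlob("I love this song").sentiment.polarity

print(mood_label(0.6))    # prints: positive
print(mood_label(0.0))    # prints: neutral
print(mood_label(-0.4))   # prints: negative
```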

Core Concept

We analyzed how sentiment in German song lyrics has evolved since the 1950s.
👉 The results show a clear decline in mood — lyrics have become noticeably more melancholic over the decades.
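A trend like this can be computed by grouping the per-song sentiment scores by decade and averaging them. The sketch below shows the idea. The function name is our own, and the (year, polarity) pairs are made up purely to demonstrate the grouping, so they are not our results.

```python
from collections import defaultdict
from statistics import mean


def mean_polarity_by_decade(scored_songs):
    """Average polarity per decade from (year, polarity) pairs."""
    buckets = defaultdict(list)
    for year, polarity in scored_songs:
        decade = year // 10 * 10          # 1958 -> 1950, 2019 -> 2010
        buckets[decade].append(polarity)
    return {decade: round(mean(values), 3)
            for decade, values in sorted(buckets.items())}


# Made-up scores for illustration; these are not measured values.
scored = [(1958, 0.4), (1953, 0.2), (1987, 0.1), (2015, -0.1), (2019, -0.3)]

for decade, avg in mean_polarity_by_decade(scored).items():
    print(f"{decade}s: {avg:+.2f}")
```

Plotting these averages over time then shows at a glance whether the mood rises or falls.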

Word Embeddings – Which words often appear in similar contexts? With models like Word2Vec, we revealed semantic relationships between words. For example, words like love, heart, and pain frequently appeared together across various genres, highlighting common lyrical themes.
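Word2Vec learns dense vectors from the words that surround each word. The underlying intuition can be shown without any training: count each word's neighbors within a small window and compare the resulting count vectors. The sketch below does exactly that. It is a simplified stand-in and not Word2Vec itself, the function names are our own, and the example lines are invented.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    """Count, for every word, which words occur within `window` positions of it."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Invented lines, already tokenized by splitting on whitespace.
lines = ["my heart is breaking", "my soul is breaking", "count the money tonight"]
vectors = context_vectors([line.split() for line in lines])

# "heart" and "soul" share all their neighbors; "heart" and "money" share none.
print(cosine(vectors["heart"], vectors["soul"]) > cosine(vectors["heart"], vectors["money"]))
# prints: True
```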

The website: Analysis meets design

“What good is the best analysis if nobody can use it?” That’s why we presented the results in a Flask-based web app. It has the look and feel of the Genius platform, but offers significantly more interactivity:

  • Dynamic plots that respond to user actions
  • Filter options by genre, year or artist
  • Direct insight into the results of machine learning

This turns raw data into real insights that are accessible and understandable for all music fans and data enthusiasts.
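Behind filter options like these there is usually a small function that narrows down the song list. Here is a minimal sketch of such a function. The field names and the function filter_songs are hypothetical and the rows are invented, so the real app may be structured differently.

```python
def filter_songs(songs, genre=None, year=None, artist=None):
    """Return the songs matching every filter that is set; None means 'any'."""
    def matches(song):
        return (
            (genre is None or song["genre"] == genre)
            and (year is None or song["year"] == year)
            and (artist is None or song["artist"].lower() == artist.lower())
        )
    return [song for song in songs if matches(song)]


# Invented rows for illustration.
catalog = [
    {"title": "Song A", "artist": "Band One", "year": 1999, "genre": "pop"},
    {"title": "Song B", "artist": "Band Two", "year": 1999, "genre": "rap"},
    {"title": "Song C", "artist": "Band One", "year": 2010, "genre": "pop"},
]

# In a Flask route, the filter values could come from the query string,
# for example request.args.get("genre") and request.args.get("year", type=int).
print([s["title"] for s in filter_songs(catalog, genre="pop")])
# prints: ['Song A', 'Song C']
print([s["title"] for s in filter_songs(catalog, year=1999, artist="band two")])
# prints: ['Song B']
```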

Our conclusion: data is worth a thousand words

Music is emotional, but it also follows patterns. With our project, we wanted to show how much potential there is in song lyrics if you look at them in a data-driven way. From word choice to mood, from artist style to genre progression: Data analysis and machine learning can make music readable in a whole new way.

💬 What about you?

Which song would you analyze and why?
Feel free to send us an email. We’re curious to hear your lyrical favorites!