Snoop Kovv — Snoop Dogg songs written by a Hidden Markov Model

Kai Brooks
Jan 14, 2020

“Roll joints bigger than birds” — Snoop Dogg or Kovv or both

Markov Models are great. We use them in reinforcement learning, speech recognition, random walk modeling, gesture recognition, and a host of other applications critical to countless industries. However, their most important (and sadly, least recognized) use is to write Snoop Dogg songs. Today, I will correct this oversight.

Why can’t we use something like an LSTM to do this?

Oh, we absolutely could, and I’ve done so. The upside is that an LSTM (a neural network architecture) picks up more context from the text, such as syntax, and can learn to treat slang spellings (mother vs. mutha’) as grammatically similar.

The downside of the LSTM is that it takes a lot of time and a large corpus to train correctly, and the output isn’t all that different from what we get with a hidden Markov model, which requires no training and runs in about 10 seconds.

“This is the least professional use of MATLAB I have ever seen” — Anonymous Portland State University researcher

I just came here to download the code

I got you:

For fun, I wrote this one in MATLAB R2018a, but there’s nothing here that’s specific to MATLAB, and we could refactor this project into another language.

Markov and Me

Markov models are effectively probabilistic state machines. If we are in state A, there’s a probability of transitioning to state B, or C, or back to A, or to any number of other states.

Markov Chain (discrete time)

In the diagram above, if we are in state A, there’s a 70% chance of transitioning to state B, and a 30% chance of staying in state A. If we are in state B, there’s a 40% chance of transitioning to state A and a 60% chance of staying in state B.
With this Markov chain, we can begin in any state and use probability to move around to different states, creating a path as we go. So, in this chain, a path might look like A-B-B-B-A-B-A-A… and so on until we decide to stop.
Now, this is just a two-state Markov model, but a model can have any number of states and transitions, and a given state doesn’t necessarily connect to every other state.
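To make the two-state example concrete, here is a minimal Python sketch of that walk. It is not the author's MATLAB code; the names `TRANSITIONS` and `walk` are made up for illustration, and only the probabilities come from the example above.

```python
import random

# Transition probabilities from the two-state example in the text.
# Outer key: current state. Inner dict: chance of each next state.
TRANSITIONS = {
    "A": {"A": 0.3, "B": 0.7},
    "B": {"A": 0.4, "B": 0.6},
}

def walk(start, steps, rng):
    """Return the path taken over `steps` transitions, starting at `start`."""
    path = [start]
    for _ in range(steps):
        options = TRANSITIONS[path[-1]]
        # random.choices draws one next state, weighted by its probability.
        next_state = rng.choices(list(options), weights=list(options.values()))[0]
        path.append(next_state)
    return path

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only so reruns give the same path
    print("-".join(walk("A", 10, rng)))
```

Each run with a different seed gives a different path, which is the whole point: the structure stays fixed and the randomness does the writing.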

How does Snoop Kovv do this?

Snoop Kovv starts with a block of text (a corpus): the top 10 Snoop Dogg songs. For each word, it looks at which words come after it and how often each one appears. Snoop Kovv then generates a gigantic, thousand-state Markov chain, where the probability of transitioning from one word to another is the fraction of the time the second word follows the first in the corpus.
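The counting step can be sketched in a few lines of Python. This is an illustration under assumptions, not the original MATLAB: `build_chain` is a hypothetical helper, `TOY_CORPUS` is a made-up stand-in for the real lyrics, and counting pairs within each line (rather than across line breaks) is my guess.

```python
from collections import Counter, defaultdict

def build_chain(text):
    """Map each word to {next_word: probability}, counting pairs within each line."""
    counts = defaultdict(Counter)
    for line in text.lower().splitlines():
        words = line.split()
        # Pair every word with the word that follows it on the same line.
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    chain = {}
    for word, followers in counts.items():
        total = sum(followers.values())
        # Normalize raw counts so each word's outgoing probabilities sum to 1.
        chain[word] = {nxt: n / total for nxt, n in followers.items()}
    return chain

# Made-up toy corpus, NOT the real lyrics.
TOY_CORPUS = """rolling up to the party
rolling down the street
rolling up the hill"""

chain = build_chain(TOY_CORPUS)
print(chain["rolling"])
```

In this toy corpus "rolling" is followed by "up" twice and "down" once, so "up" gets two thirds of the probability and "down" gets the rest.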

As an example, look at the word “rolling”:

State diagram for “rolling”. These probabilities are all made up because the actual corpus has over 1000 states.

Suppose we’re in state ‘rolling’. There’s a 35% chance the next state is ‘up,’ 20% chance the next state is ‘down,’ 5% chance the next state is ‘you,’ and so on.

Each of the adjoining states also has its own set of states it connects to (which I didn’t draw here), forming a state machine ‘web.’

Now, we can start at a random state (random word), and move around to different states, recording the path we took. For instance, Rolling-Up-To-The… up until however long we want.
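Here is a rough Python sketch of that walk over words. The chain below is hypothetical, with made-up probabilities in the spirit of the "rolling" diagram, and `generate` is an illustrative name rather than anything from the original project.

```python
import random

# Hypothetical chain with made-up probabilities.
CHAIN = {
    "rolling": {"up": 0.6, "down": 0.4},
    "up": {"to": 1.0},
    "down": {"to": 0.5, "the": 0.5},
    "to": {"the": 1.0},
    "the": {"street": 0.5, "party": 0.5},
}

def generate(chain, start, max_words, rng):
    """Walk the chain from `start`, recording each word visited."""
    words = [start]
    while len(words) < max_words:
        followers = chain.get(words[-1])
        if not followers:
            break  # dead end: this word has no recorded successor
        nxt = rng.choices(list(followers), weights=list(followers.values()))[0]
        words.append(nxt)
    return words

if __name__ == "__main__":
    print(" ".join(generate(CHAIN, "rolling", 8, random.Random())))
```

The `max_words` cap is a crude stand-in for a proper stopping rule; without it (and without dead ends) the walk would happily go on forever.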

In theory, we can transition states forever, making a single, infinitely long line of text, kind of like repeatedly hitting the middle suggestion on a text message auto-complete. However, Snoop Kovv terminates lines after a given length, with a given probability, based on how long the lines in the corpus are.

Snoop Kovv also terminates lines early when it hits a symbol like a period or exclamation point. Depending on how often these symbols end the line in the corpus, Snoop Kovv follows the same probability. This probability gives the machine-written text roughly the same flow as the original.
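One way to get those two statistics out of a corpus is sketched below in Python. This is my own guess at a workable approach, not the original implementation: the helper names are hypothetical, the sample lines are made up, and I assume punctuation has already been split into its own tokens.

```python
import random
from collections import Counter

END_MARKS = {".", "!", "?"}

def line_length_sampler(lines, rng):
    """Return a function that draws a line length seen in the corpus."""
    lengths = [len(line.split()) for line in lines if line.split()]
    return lambda: rng.choice(lengths)

def end_probability(lines):
    """For each end mark, the fraction of its appearances that close a line."""
    seen, ended = Counter(), Counter()
    for line in lines:
        tokens = line.split()
        for i, token in enumerate(tokens):
            if token in END_MARKS:
                seen[token] += 1
                if i == len(tokens) - 1:
                    ended[token] += 1
    return {mark: ended[mark] / seen[mark] for mark in seen}

# Made-up lines with punctuation as separate tokens (an assumption).
LINES = ["hey ! what is up", "come on now !", "one two three ."]

if __name__ == "__main__":
    print(end_probability(LINES))
    print(line_length_sampler(LINES, random.Random())())
```

During generation, a line would stop when it reaches the sampled length, or when it lands on an end mark and a random draw falls under that mark's probability.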

What’s the difference between a memory and memoryless system?

Snoop Kovv only looks at the current state (word) when determining the next state (next word). This is a memoryless system, as nothing in the past influences the next state.

We could add memory to this system, whereby the system looks not only at the current word but also at the previous word to determine the next state. In this sense, each ‘state’ would be a pair of consecutive words, and the probability of transition would be the probability of a word appearing after those two words in a row.

This memory chain can be any length, not just two words. The longer the chain, the more coherent the output, but the less original, since long chains increasingly reproduce the corpus word for word.
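A small Python sketch shows how little changes when we add memory: the state becomes a tuple of words instead of a single word. `build_chain` and its `order` parameter are illustrative names, and the word list is a made-up toy example.

```python
from collections import Counter, defaultdict

def build_chain(words, order):
    """Map each tuple of `order` consecutive words to {next_word: probability}."""
    counts = defaultdict(Counter)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        counts[state][words[i + order]] += 1
    return {
        state: {w: n / sum(c.values()) for w, n in c.items()}
        for state, c in counts.items()
    }

# Made-up toy text.
words = "up and down and up and over".split()

first_order = build_chain(words, 1)   # memoryless: state is one word
second_order = build_chain(words, 2)  # memory: state is a pair of words

print(first_order[("and",)])
print(second_order[("down", "and")])
```

With one word of context, "and" can lead to three different words. With two words of context, "down and" has only one continuation, which is the coherence-versus-originality trade-off in miniature.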

The Snoop Kovv state diagram

This is the actual diagram for the corpus:

Snoop Kovv state diagram or Seoul subway map?

Instead of a number on each state transition line, we have a color-coded graph, ranging from blue for a low transition probability to red for a high transition probability.

Let’s zoom in on that pile in the middle:

Like a close up of Snoop Lion’s dreads

It’s way too messy to make out anything useful, but we can see the individual nodes and the thousands of interconnecting lines. Also, we can vaguely make out where some of the more ‘popular’ words lie, based on the color density of the diagram. For instance, in the upper-left and upper-center, the words there have more connections, versus some of the words on the outskirts of the previous image, like ‘sixty-four’ up top.

Data as a table

We can also look at the output as a table.

In this view, the vertical (left) side is the state we’re in, and the horizontal (top) side is the state we’re transitioning to. For example, if we’re in state 2, there’s a 100% chance of transitioning to state 10. If we’re in state 10, there’s a 0.27% chance of transitioning to state 4.
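For a feel of how such a table comes together, here is a hedged Python sketch that builds a small transition matrix from a token list. The function name and the toy tokens are made up, and sorting the state names alphabetically is an assumption on my part.

```python
def transition_matrix(words):
    """Return (states, matrix) where matrix[i][j] = P(next is states[j] | current is states[i])."""
    states = sorted(set(words))  # assumed ordering: plain sorted order
    index = {word: i for i, word in enumerate(states)}
    n = len(states)
    counts = [[0] * n for _ in range(n)]
    for current, following in zip(words, words[1:]):
        counts[index[current]][index[following]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state with no recorded successor gets an all-zero row.
        matrix.append([c / total if total else 0.0 for c in row])
    return states, matrix

# Made-up toy tokens, with "!" as its own state.
states, matrix = transition_matrix("roll up ! roll down ! roll up".split())

for name, row in zip(states, matrix):
    print(f"{name:>5}", [round(p, 2) for p in row])
```

Rows are the current state and columns are the next state, matching the layout described above. Most entries are zero, which is also true of the real table: most words never follow most other words.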

State names

Here we see the associated state names. State 1 is !, state 23 is act, and so on.

The output

Let’s look at this thing in action:

Roll joints bigger than birds
Snoop Dooop Doo

Compare to LSTM

For fun, I ran this same corpus through an LSTM (long short-term memory recurrent neural network) I’ve been working on. It trained over 5000 epochs, and took about a day. Here are some verses, including the original spelling:

Move in oh! Oh yeah , there's something about you beautiful,
I just want you to know oh!

You're my favorite girl oh yeah ,
There's something about you I got a toun' on that
Back me throw it susull thing in the came woblding the doggs
Pow dogg you ganns and sloow in the droogs se way,

We doin' what we do in back of the 'lac ,
I'm like I'm up all for that and every night her body gets strapped

How 'bout that gangsta, gang, gangsta

Now, this corpus is far too short for an LSTM to do well (and the network needs about ten times the training), but that also emphasizes the power of the Markov model. With a smaller corpus and less ‘training time,’ the Markov model produces similarly coherent results.

Other applications

Another application of this is combining the LSTM with the Markov model. The LSTM generates the text, and the Markov model serves as a ‘grammar check’ whenever words seem a bit off.

For instance, if the text string is “d-o-double-…,” the Markov chain should know that, probabilistically speaking, this string of words should end in “g.” “Gin and …” should end in “juice,” not just any word that follows any arbitrary “and.”
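As a rough sketch of that idea in Python (I haven't built this; the function names, the `min_prob` threshold, and the toy word list are all hypothetical), the check could look up the last two words and swap in the most likely follower when the proposed word almost never appears in that context:

```python
from collections import Counter, defaultdict

def build_counts(words, order=2):
    """Count which word follows each run of `order` consecutive words."""
    counts = defaultdict(Counter)
    for i in range(len(words) - order):
        counts[tuple(words[i:i + order])][words[i + order]] += 1
    return counts

def correct(context, proposed, counts, min_prob=0.1):
    """Keep `proposed` if it is plausible after `context`, else return the likeliest follower."""
    followers = counts.get(tuple(context))
    if not followers:
        return proposed  # unseen context: nothing to check against
    total = sum(followers.values())
    if followers[proposed] / total >= min_prob:
        return proposed
    return followers.most_common(1)[0][0]

# Made-up toy text standing in for the corpus.
counts = build_counts("gin and juice salt and pepper gin and juice".split())

print(correct(["gin", "and"], "pepper", counts))
print(correct(["salt", "and"], "pepper", counts))
```

Using two words of context is what separates “gin and” from any arbitrary “and.” A one-word context would see "juice" and "pepper" as equally fair game.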

I end this with a quote from the man himself:

Probably a direct quote