If you take a corpus of text and arrange all the words in descending order of frequency, you’ll see that the frequency of the nth-ranked word is approximately 1/n of the frequency of the most used word. This is the most famous example of Zipf’s law:
an empirical law stating that when a list of measured values is sorted in decreasing order, the value of the n-th entry is often approximately inversely proportional to n.
— Wikipedia
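To make the 1/n claim concrete, here is a minimal Python sketch that ranks the words of a text and puts each observed count next to the count Zipf’s law would predict from the top word. The function name `rank_frequency`, the tokenizing regex, and the toy sentence are my own assumptions for illustration; a sentence this small will not follow the law closely, so try it on a real book-length corpus.

```python
from collections import Counter
import re

def rank_frequency(text):
    """Return (rank, word, observed count, Zipf-predicted count) rows."""
    # Assumption: a "word" is any run of lowercase letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words).most_common()  # sorted by descending count
    if not counts:
        return []
    top = counts[0][1]  # count of the most used word
    # Zipf's law predicts that the word at rank n occurs about top / n times.
    return [(rank, word, count, top / rank)
            for rank, (word, count) in enumerate(counts, start=1)]

sample = "the cat and the dog and the bird saw the cat"
for rank, word, observed, predicted in rank_frequency(sample):
    print(f"{rank:>2} {word:<5} observed={observed} predicted={predicted:.2f}")
```

Swapping `sample` for the full text of a novel should show the observed and predicted columns tracking each other far more closely than in this toy case.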
It is also the discrete counterpart of the Pareto distribution, which underlies the Pareto principle: roughly 20% of the causes are responsible for 80% of the effects.
The fascinating thing about this law is how often it occurs naturally, seemingly everywhere.
It’s also found in city populations, solar flare intensities, protein sequences and immune receptors, the amount of traffic websites get, earthquake magnitudes, the number of times academic papers are cited, last names, the firing patterns of neural networks, ingredients used in cookbooks, the number of phone calls people received, the diameter of Moon craters, the number of people that die in wars, the popularity of opening chess moves, even the rate at which we forget.
— The Zipf Mystery - VSauce (must watch)
Typing random text with spaces gives a mathematical explanation for Zipf’s law in language, i.e., any given short word is more probable to be typed than any given long word. But why the law occurs in all of these other places is non-trivial, if not a complete mystery.
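The random typing argument is easy to try out. The sketch below is only an illustration under my own assumptions (a four-letter alphabet plus space, every key equally likely, hypothetical function names `monkey_words` and `mean_count_by_length`): it types random characters, splits them into words, and reports how often a typical distinct word of each length shows up.

```python
import random
from collections import Counter, defaultdict

def monkey_words(n_chars, letters="abcd", seed=0):
    """Type n_chars random characters (letters plus space) and split into words."""
    rng = random.Random(seed)
    alphabet = letters + " "  # assumption: every key is equally likely
    text = "".join(rng.choice(alphabet) for _ in range(n_chars))
    return text.split()  # runs of spaces produce no empty words

def mean_count_by_length(words):
    """Average number of occurrences per distinct word, grouped by word length."""
    per_length = defaultdict(list)
    for word, count in Counter(words).items():
        per_length[len(word)].append(count)
    return {length: sum(c) / len(c) for length, c in sorted(per_length.items())}

averages = mean_count_by_length(monkey_words(100_000))
for length in (1, 2, 3):
    print(length, round(averages[length], 1))
```

Each extra letter makes a specific word less likely, so the average count drops sharply with length: short "words" dominate the top ranks even though nobody chose them on purpose.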
A common explanation for Zipf-ian distributions is preferential attachment processes.
They occur when something - money, views, attention, variation, friends, jobs, anything really is given out according to how much is already possessed.
— The Zipf Mystery - VSauce
Viral things go more viral. Used things are used more.
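A rough way to see "used things are used more" in action is a rich-get-richer simulation. This is a minimal sketch, not the model from the video: the function name `preferential_attachment`, the `p_new` parameter, and the choice of a simple urn-style process are all my assumptions. At each step, either a brand new item appears, or one unit goes to an existing item picked in proportion to what it already has.

```python
import random
from collections import Counter

def preferential_attachment(steps, p_new=0.1, seed=0):
    """Hand out `steps` units; return the per-item totals, largest first."""
    rng = random.Random(seed)
    owners = [0]   # one entry per unit handed out; the value is the item's id
    next_id = 1
    for _ in range(steps):
        if rng.random() < p_new:
            # Assumption: with probability p_new a new item enters with one unit.
            owners.append(next_id)
            next_id += 1
        else:
            # Picking a random unit picks its owner in proportion to its total.
            owners.append(rng.choice(owners))
    return sorted(Counter(owners).values(), reverse=True)

totals = preferential_attachment(20_000)
print("top 5 items:", totals[:5])
print("median item:", totals[len(totals) // 2])
```

The few items that got lucky early end up with totals far above the typical item, which is the heavy-tailed shape the quote describes.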
It’s just weird that it rules so much around us. The world doesn’t feel predictable, but this somehow tells me it is? That all our life amounts to a few moments, even if we live for years. All the knowledge we gather, all the media we consume, all the moments we live: we are going to forget most of it because, as stated above, our forgetting curve is Zipf-ian.
I look at all the books I’ve read and realize that I can’t remember every detail from them, it’s a little disappointing. I mean, why even bother if the Pareto Principle dictates that my ‘Zipf-ian’ mind will consciously remember pretty much only the titles and a few basic reactions years later.
Ralph Waldo Emerson makes me feel better. He once said, “I cannot remember the books I’ve read any more than the meals I have eaten. Even so, they have made me.”
— The Zipf Mystery - VSauce