Iain (“an ex-physicist currently working as a data scientist”) scraped Dark Lyrics and built a dataset of lyrics to 222,623 songs by 7,364 metal bands, then used traditional natural language processing techniques to analyze them.
Iain’s post is a good tour through the natural language processing toolkit: bag-of-words Bayesian filtering, log-likelihood ratios, term frequency-inverse document frequency (TF-IDF), cosine distance, and more. The results are sometimes fun and interesting. The post's main value, though, is as a primer on how each technique works and when you might use it.
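To illustrate two of the techniques named above, here is a minimal Python sketch of TF-IDF weighting and cosine distance. It is not Iain's code. The whitespace tokenizer, the toy documents, and the unsmoothed idf = log(N / df) formula are assumptions chosen for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer (an assumption for illustration)
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse {term: weight} dict per document.

    tf  = raw count of the term in the document
    idf = log(N / df), where df is the number of documents containing the term
    """
    tokenized = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    df = Counter()
    for counts in tokenized:
        df.update(counts.keys())  # each document counts once per term
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in counts.items()}
        for counts in tokenized
    ]


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an all-zero vector as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


# Hypothetical toy documents, not taken from the lyrics dataset
docs = [
    "burn the ashes of eternity",
    "flames burn in the veins of the beast",
    "the committee noted the fiscal report",
]
vecs = tfidf_vectors(docs)
print(round(cosine_distance(vecs[0], vecs[1]), 3))
print(round(cosine_distance(vecs[0], vecs[2]), 3))
```

The third document shares only "the" with the first. Because "the" appears in every document, its idf is log(1) = 0, so those two vectors are orthogonal and their cosine distance is 1.0. The first two documents share "burn" and "of", which appear in only two of the three documents, so they come out closer.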
Most Metal Words
Rank  Word            Metalness
1     burn            3.81
2     cries           3.63
3     veins           3.59
4     eternity        3.56
5     breathe         3.54
6     beast           3.54
7     gonna           3.53
8     demons          3.53
9     ashes           3.51
10    soul            3.40
11    sorrow          3.40
12    sword           3.38
13    goodbye         3.28
14    dreams          3.28
15    gods            3.24
16    pray            3.22
17    reign           3.15
18    tear            3.12
19    flames          3.12
20    scream          3.11

Least Metal Words
Rank  Word            Metalness
1     particularly    -6.47
2     indicated       -6.32
3     secretary       -6.29
4     committee       -6.16
5     university      -6.09
6     relatively      -6.08
7     noted           -5.85
8     approximately   -5.75
9     chairman        -5.69
10    employees       -5.67
11    attorney        -5.66
12    membership      -5.64
13    administrative  -5.61
14    considerable    -5.60
15    academic        -5.51
16    literary        -5.49
17    agencies        -5.48
18    measurements    -5.47
19    fiscal          -5.45
20    residential     -5.45
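The tables above don't say how "metalness" is computed. One plausible reading, since the least metal words look like formal general-English vocabulary, is a log ratio of each word's relative frequency in the lyrics corpus to its relative frequency in a reference corpus. The Python sketch below implements that assumed definition. The function name `metalness`, the add-one smoothing, and the toy corpora are all hypothetical. See the linked post for the measure Iain actually used.

```python
import math
from collections import Counter


def metalness(word, metal_counts, reference_counts, smoothing=1.0):
    """Log ratio of a word's relative frequency in two corpora (assumed definition).

    Positive values: the word is over-represented in the metal corpus.
    Negative values: the word is over-represented in the reference corpus.
    Adding `smoothing` to every count keeps words missing from one corpus
    finite; this is an assumption, not necessarily what the original did.
    """
    vocab = set(metal_counts) | set(reference_counts)
    metal_total = sum(metal_counts.values()) + smoothing * len(vocab)
    ref_total = sum(reference_counts.values()) + smoothing * len(vocab)
    # Counter returns 0 for unseen words, so a missing key is safe here
    p_metal = (metal_counts[word] + smoothing) / metal_total
    p_ref = (reference_counts[word] + smoothing) / ref_total
    return math.log(p_metal / p_ref)


# Hypothetical toy corpora, not the real lyrics or reference data
metal = Counter("burn burn the flames burn the ashes".split())
reference = Counter("the committee noted the fiscal report the committee".split())

ranked = sorted(
    set(metal) | set(reference),
    key=lambda w: metalness(w, metal, reference),
    reverse=True,  # most metal first
)
print(ranked[0], ranked[-1])
```

With these toy counts, "burn" ranks as the most metal word and "committee" as the least. A word common to both corpora, like "the", lands near zero.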
Heavy Metal and Natural Language Processing – Part 1 [Iain/Degenerate State]
(via Waxy)