Analysing YouTube Comments: Stuff Made Here


Stuff Made Here, Shane Wighton’s YouTube channel, is one of my favourites. It is an engineering-focused channel where he makes videos about the inventions he builds. I have been watching since he started in March 2020, and if you haven’t seen his content yet, I definitely recommend checking it out. For this project, I downloaded the comments on his videos and counted which words people use most.


Word-count table for Stuff Made Here comments
Word cloud from Stuff Made Here comments

Interesting Words

One thing I noticed during the project was how many typos people make when writing ‘lockpickinglawyer’. Other interesting words included ‘releaselplcut’, ‘teamlockpickinglawyer’ and ‘unpicklockeble’.
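The post doesn't say how the misspellings were spotted. One way to surface them automatically is fuzzy string matching. The sketch below uses Python's standard difflib module; the helper name find_variants, the 0.8 similarity cutoff and the sample words are all hypothetical, not taken from the real comment data.

```python
import difflib


def find_variants(target, words, cutoff=0.8):
    """Return words that closely resemble target (similarity >= cutoff) but differ from it."""
    # get_close_matches requires n > 0, so guard against an empty word list
    matches = difflib.get_close_matches(target, words, n=max(1, len(words)), cutoff=cutoff)
    return [w for w in matches if w != target]


# Hypothetical sample tokens, standing in for the counted vocabulary
sample = ["lockpickinglawyer", "lockpickinglayer", "lockpicklawyer", "lockpickinlawyer", "screwdriver"]
print(find_variants("lockpickinglawyer", sample))
```

Run over the full vocabulary, a lower cutoff finds more variants but also more unrelated words, so the threshold is a trade-off to tune by eye.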

Making the visualizations

I made all the visualizations in Python. I used youtube-comment-downloader to fetch every comment into JSON files, one comment per line, and the Natural Language Toolkit (NLTK) to tokenize the text, filter out stopwords and non-alphabetic tokens, and count the remaining words.

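import json
import os

import json_lines
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
from progress.bar import Bar

# NLTK's tokenizer models and stopword list must be downloaded once first, e.g. with
# nltk.download('punkt') (or 'punkt_tab' on newer NLTK versions) and nltk.download('stopwords').

files = os.listdir("rawdata")
stop_words = set(stopwords.words('english'))  # a set gives fast lookups and avoids shadowing the module

bar = Bar('Progress: ', max=len(files))
data = {}
for file in files:
    # Each file contains one JSON object per comment, one object per line
    with open(os.path.join('rawdata', file), 'r') as f:
        for comment in json_lines.reader(f):
            for word in word_tokenize(comment['text']):
                word = word.lower()
                # Keep only alphabetic tokens that are not stopwords
                if word not in stop_words and word.isalpha():
                    if word in data:
                        data[word] += 1
                    else:
                        data[word] = 1
    bar.next()
bar.finish()

# Sort the words from most to least frequent
data = {k: v for k, v in sorted(data.items(), key=lambda item: item[1], reverse=True)}

# Save the counts so the word cloud script can load them
with open("wordcount.json", "w") as f:
    json.dump(data, f)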
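The counting loop above can also be written with collections.Counter, which starts missing keys at zero and so removes the need for the "first occurrence vs. increment" branch. This is an alternative sketch, not the original script: to stay self-contained it parses line-delimited JSON with the standard json module instead of json_lines, swaps NLTK's word_tokenize for a simple regex tokenizer, and uses a tiny hypothetical stopword set, so its tokens will differ slightly from NLTK's.

```python
import json
import re
from collections import Counter

# Hypothetical stand-in for NLTK's English stopword list (the real list is much longer)
STOP_WORDS = {"the", "a", "is", "this", "i", "to", "and", "of", "it"}


def iter_comment_texts(lines):
    """Yield the 'text' field of each JSON object in line-delimited JSON, skipping blank lines."""
    for line in lines:
        if line.strip():
            yield json.loads(line)["text"]


def count_words(texts, stop_words=STOP_WORDS):
    """Count lowercase, purely alphabetic words across texts, skipping stopwords."""
    counts = Counter()
    for text in texts:
        # Regex tokenizer used in place of nltk.word_tokenize (an assumption for self-containment)
        for word in re.findall(r"\w+", text):
            word = word.lower()
            if word.isalpha() and word not in stop_words:
                counts[word] += 1  # missing keys start at 0, so no if/else is needed
    return counts


# Hypothetical sample input in the same shape the script reads: one JSON object per line
raw_lines = [
    '{"text": "This is the best channel!"}',
    '{"text": "The lock is unpickable... until the LockPickingLawyer shows up."}',
    "",
    '{"text": "Best lock build yet"}',
]
counts = count_words(iter_comment_texts(raw_lines))
print(counts.most_common(2))  # → [('best', 2), ('lock', 2)]
```

With a Counter, the sorting step becomes dict(counts.most_common()), which orders words from most to least frequent just like the sorted(...) comprehension.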
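The word cloud was then generated from the saved counts in wordcount.json with the wordcloud library, using a mask image (mask.png) to give it its shape:

import json

import numpy as np
from PIL import Image
from wordcloud import WordCloud

# Load the mask image; pure white areas of the mask are left empty
mask = np.array(Image.open("mask.png"))
with open("wordcount.json", "r") as f:
    data = json.load(f)

# With a mask, WordCloud uses the mask's shape and ignores width/height
wc = WordCloud(width=3888, height=5180, background_color="white", max_words=6000,
               mask=mask, max_font_size=1000, random_state=32)
wc.generate_from_frequencies(data)
wc.to_file("wordcloud.png")  # output filename is illustrative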
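The post doesn't describe what mask.png looks like. WordCloud treats pure white (255) pixels in a mask as off-limits and places words only on the remaining area. As a hedged illustration, this sketch builds a simple circular mask array with NumPy instead of loading an image; the function name circle_mask and the 400-pixel size are hypothetical.

```python
import numpy as np


def circle_mask(size):
    """Return a size x size uint8 array: 0 inside a centred circle, 255 outside it."""
    y, x = np.ogrid[:size, :size]
    centre = (size - 1) / 2
    radius = size / 2
    outside = (x - centre) ** 2 + (y - centre) ** 2 > radius ** 2
    # White (255) pixels are masked out; words are drawn only where the value is 0
    return np.where(outside, 255, 0).astype(np.uint8)


mask = circle_mask(400)
print(mask.shape, mask[0, 0], mask[200, 200])
```

An array like this can be passed as the mask argument of WordCloud in place of the loaded image, which produces a round word cloud 400 pixels across.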


Student | Developer | Photographer