Globywood

The story of movie industries across time.

Introduction
Selecting the data
Diversity analysis of actors
How long will I be in the theater?
A deeper look at movie summaries

data meme

Selecting the data

This project uses the CMU Movie Summary Corpus, a collection of 42,306 movie plot summaries extracted from Wikipedia, along with metadata about the movies and their characters. Since our aim is to analyze each large movie industry on its own, we only consider movies that come from a single country. More specifically, we keep movies from the countries with the largest movie industries that also have enough samples in the dataset. This leaves us with movies from the United States of America, India, the United Kingdom, Japan, and France. Splitting our dataset by decade gives us the following distribution of movies across time:

country distribution
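
For readers who want to reproduce this kind of selection, here is a minimal sketch of the steps described above: keep single-country movies, restrict to the five selected countries, and count movies per decade. It is not our exact pipeline. The toy rows, the column names (`name`, `release_year`, `countries`), the placeholder keys (`id1`, `id2`), and the helper `select_single_country` are all assumptions made for illustration; we only assume that each movie's countries are stored as a JSON-like mapping from an identifier to a country name.

```python
import json

import pandas as pd

# Hypothetical toy rows standing in for the movie metadata.
# Column names and the "id1"/"id2" keys are placeholders, not real identifiers.
movies = pd.DataFrame({
    "name": ["A", "B", "C", "D", "E"],
    "release_year": [1994, 2001, 1987, 2009, 1975],
    "countries": [
        '{"id1": "United States of America"}',
        '{"id1": "India"}',
        '{"id1": "United States of America", "id2": "United Kingdom"}',
        '{"id1": "France"}',
        '{}',
    ],
})

# The five countries kept in the analysis.
SELECTED = [
    "United States of America",
    "India",
    "United Kingdom",
    "Japan",
    "France",
]


def select_single_country(df, selected=SELECTED):
    """Keep movies produced in exactly one country from `selected`."""
    # Parse each JSON-like string into a list of country names.
    parsed = df["countries"].apply(lambda s: list(json.loads(s).values()))
    # Keep only movies with exactly one country (drops co-productions and empty entries).
    single = df[parsed.apply(len) == 1].copy()
    single["country"] = parsed[single.index].apply(lambda names: names[0])
    # Restrict to the selected countries.
    single = single[single["country"].isin(selected)].copy()
    # Bucket release years into decades, e.g. 1994 -> 1990.
    single["decade"] = (single["release_year"] // 10) * 10
    return single


selected_movies = select_single_country(movies)

# Number of movies per decade (rows) and country (columns).
counts = (
    selected_movies.groupby(["decade", "country"])
    .size()
    .unstack(fill_value=0)
)
print(counts)
```

In this sketch, co-productions such as movie "C" are dropped entirely rather than assigned to one of their countries, which matches the single-country rule stated above.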

Our samples are ready, so let’s take a look at the diversity of actors!