Exercise 1: Hollywood's gender divide
In this exercise you will work with the data underlying this article on “Hollywood’s gender divide and its effect on films”.
The article visualizes movie data based on whether the movie passes the Bechdel test. From the beginning of the article:
To pass, films need to satisfy three requirements:
- It has at least two women in it
- Who talk to each other, about
- Something besides a man
I have provided similar data on the GitHub page (if you want to see how these data were created, check out this page).
You can load the data as follows:
The variable names are as follows:
Most of the names are self-explanatory, but some of them are not.
votes
: Count of votes on IMDB
vote_mean
: Mean rating on IMDB
role
: Indicator for actor, director or writer
count
: Actor count
count_male
: Count of male actors
count_female
: Count of female actors
bechdel_test
: Count of how many of the Bechdel requirements the film passes
Questions
- Read the article and discuss in groups whether you find the argument and visualizations convincing.
- Generate the following new variables:
  mean_male
  : the fraction of male actors in the movie
  mean_female
  : the fraction of female actors
  status
  : a variable that takes three values ("all male", "mixed" or "all female") based on the gender composition of the cast
- Grouped operations: Discuss how best to investigate how the Bechdel score varies with the gender of the director/writer. Write R code to carry out your ideas and visualize the results.
- Think about other interesting relationships you could investigate. For example, how do ratings, gross earnings and Bechdel score relate to each other? Could the results you found previously be related to the age of the director rather than to gender? Discuss and visualize.
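One possible way to approach the new variables and the grouped question is sketched below in base R. It ASSUMES the data are in long format, with one row per movie and role (`role` taking the values "actor", "director" and "writer") and head counts for that role; check this against the real data before relying on it. The toy data frame, including the `title` column, is hypothetical. The same grouping can be written with dplyr's `group_by()` and `summarise()`.

```r
# Hypothetical toy data in an ASSUMED long format: one row per movie and role,
# with head counts for that role. Titles and numbers are made up.
df <- data.frame(
  title        = rep(c("Film A", "Film B", "Film C"), each = 3),
  role         = rep(c("actor", "director", "writer"), times = 3),
  count        = c(10, 1, 2,  8, 1, 1,  6, 2, 1),
  count_male   = c( 7, 1, 2,  8, 0, 1,  0, 1, 0),
  count_female = c( 3, 0, 0,  0, 1, 0,  6, 1, 1),
  bechdel_test = rep(c(2, 0, 3), each = 3)
)

# Fractions of men and women in each row; dividing by `count` means the two
# need not sum to 1 if some people's gender is unknown
df$mean_male   <- df$count_male / df$count
df$mean_female <- df$count_female / df$count

# Three-level gender composition (NA if nobody's gender is recorded)
df$status <- ifelse(df$count_male + df$count_female == 0, NA_character_,
             ifelse(df$count_female == 0, "all male",
             ifelse(df$count_male == 0, "all female", "mixed")))

# Keep director and writer rows, then average the Bechdel score
# by role and gender composition
crew <- df[df$role %in% c("director", "writer"), ]
res <- aggregate(bechdel_test ~ role + status, data = crew, FUN = mean)
res

# Number of films behind each mean (small groups give noisy averages)
table(crew$role, crew$status)
```

For the visualization, a bar chart of `res` (for example ggplot2's `geom_col()` with one bar per status, coloured by role) is one option; plotting `bechdel_test` against the continuous `mean_female` of the crew avoids collapsing mixed teams into a single category.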
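For the relationship between ratings, gross earnings and Bechdel score, a correlation matrix is a quick first look. The sketch below assumes movie-level data (one row per movie, e.g. after keeping only the actor rows) and a hypothetical `gross` column; the values are made up. Because the Bechdel score is an ordinal 0 to 3 count, a rank-based (Spearman) correlation is arguably a better fit than Pearson for that variable.

```r
# Hypothetical movie-level toy data; `gross` is an ASSUMED column name
movies <- data.frame(
  vote_mean    = c(6.5, 7.2, 5.8, 8.1, 6.9),
  gross        = c(120, 45, 300, 80, 60),  # made-up values
  bechdel_test = c(3, 1, 0, 3, 2)
)

# Pairwise Pearson correlations between rating, earnings and Bechdel score
cm <- cor(movies)
round(cm, 2)

# Spearman rank correlation between Bechdel score and rating
rho <- cor(movies$bechdel_test, movies$vote_mean, method = "spearman")
rho
```

Base R's `pairs(movies)` gives a scatter-plot matrix of the same variables, and `boxplot(vote_mean ~ bechdel_test, data = movies)` shows how ratings vary across Bechdel scores.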