Data Science in the Film Industry Part 2: Movie Trailers and Artificial Intelligence
Advances in technology have led to ground-breaking work in machine learning, which is now reshaping many industries, including film. From predicting people’s preferences to creating short movies, machine learning in the film industry has taken large leaps over the past few years. Data scientists in this industry study the data collected from audiences to understand their preferences and predict which movies will be best received. The goal is to estimate how successful a movie will be, as well as its total profit.
In the article “Data Science in the Film Industry Part 1: What is my Preference?”, we went through the process of gauging the public’s interests in terms of entertainment. Different algorithms have been formulated to determine the public’s interests, which have been taken into consideration when creating new movies. In addition, there has been experimentation with machine learning to create short films. Before diving into the film-making process, it is important to understand the movie process from the starting point: the customer’s preference.
How do these production companies know what types of movies the audience will enjoy? Do Data Scientists wave their hands over a crystal ball and get a glimpse of the future?
To win over as many viewers as possible, trailer creators include highlights of the movie along with elements that resonate with their target audience.
In 2018, 20th Century Fox released information about how they used machine learning to analyze the content of movie trailers. The machine vision system analyzes each frame in the trailer and labels the objects and events it contains. The data collected from different trailers are then compared and used to predict the movie preferences of those who viewed a certain trailer. For example, similarities in the identified objects in a pair of movie trailers are used to predict whether the audience of the first movie will choose to watch the second one.
Object-sequences are better at predicting people’s preferences because they represent trailers more efficiently. Convolutional Neural Networks (CNNs) are used for this task.
A. Convolutional Neural Network
A Convolutional Neural Network is a model that takes in an image, learns to assign a level of importance to the different objects and features in it, and uses them to tell one image from another.
Dialogue-heavy movies are likely to contain sequences of close-ups of the actors in a scene, with the camera cutting back and forth between faces depending on who is talking. The convolution filters pick out the most important object sequences of this kind.
In the video convolution model, video frames are taken from movie trailers. The algorithm samples each video at one frame per second. From each frame, a 1024-dimensional image feature vector is extracted using the Inception V3 model. A convolution layer then applies 1024 convolution filters, each of size 8x1024, to the resulting sequence of frame features, so the layer has the dimensions 8x1024x1024. “These filters are convoluted along the temporal dimension with a stride value of 2 for dimensionality reduction” (Sagar).
B. Video Convolution Network
In a nutshell, the machine learns a group of filters where each filter captures a certain object-sequence that suggests a particular action. For instance, a filter pair could learn that images of a country road and a driving car suggest that someone is driving down a country road. The network’s task is to have the object-specific temporal convolutional filters learn these object-sequence templates, on the assumption that the same groups of object-sequences recur both within a trailer and across trailers, and that storylines and actions map to distinct object-sequences.
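The temporal convolution described above can be illustrated with a minimal sketch in plain Python. This is not Merlin's actual code: the function name temporal_conv, the toy sizes, and the random inputs are assumptions made for illustration. Only the kernel width of 8 frames and the stride of 2 come from the description above; the 1024-dimensional features and 1024 filters are shrunk so the example runs quickly.

```python
import random


def temporal_conv(frames, filters, stride=2):
    """Slide each filter along the time axis of a sequence of frame features.

    frames:  list of T feature vectors of length D (one per sampled frame)
    filters: list of F filters, each a list of K weight vectors of length D
    returns: list of output steps, each holding F filter responses
    """
    kernel = len(filters[0])  # number of consecutive frames each filter spans
    outputs = []
    for start in range(0, len(frames) - kernel + 1, stride):
        window = frames[start:start + kernel]
        step = []
        for filt in filters:
            # Response = sum of elementwise products over the K x D window.
            response = sum(
                w * x
                for w_row, x_row in zip(filt, window)
                for w, x in zip(w_row, x_row)
            )
            step.append(response)
        outputs.append(step)
    return outputs


# Toy sizes (assumed): the article describes D = 1024, K = 8, F = 1024.
random.seed(0)
T, D, K, F = 20, 16, 8, 4
frames = [[random.random() for _ in range(D)] for _ in range(T)]
filters = [
    [[random.uniform(-1, 1) for _ in range(D)] for _ in range(K)]
    for _ in range(F)
]

features = temporal_conv(frames, filters, stride=2)
print(len(features), "time steps,", len(features[0]), "responses per step")
```

With a stride of 2, the output has roughly half as many time steps as the input, which is the dimensionality reduction the quotation refers to. A strong response from one filter at one time step would indicate that the object-sequence it has learned appears in that 8-frame window.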
In the past, many different movie studios have tried to predict customer preferences. At 20th Century Fox, Julie Rieger, President, Chief Data Strategist and Head of Media, and Miguel Campo-Rembado, SVP of Data Science, assembled a group of data scientists to research the way audiences interact with different movie plots. The team has made progress in the field by working with Google Cloud and developing models to identify audiences’ preferences.
Analyzing a movie’s script is not optimal because a script is only the basic framework of the plot and contains little that can actually be used to spark public interest. The team decided to focus on movie trailers instead, because those are a good indicator of whether a movie will be well received by the public. The team joined forces with Google’s Advanced Solutions Lab to develop Merlin Video, “a computer vision tool that learns dense representations of movie trailers to help predict a specific trailer’s future moviegoing audience.”
The data scientists relied on the flexibility of Cloud ML Engine to iterate and test rapidly without compromising the deep learning model’s fundamental parts. Using Cloud Dataflow alongside it made the data generated in Data Studio easier to understand and work with. System maintenance, which mostly involves data ingestion, is handled by the data scientists.
After finalizing the framework of the system, the team used the dataset of available YouTube videos and analyzed them. The team used the model from Google to study labels in the video, such as color, face type, object characteristics, etc.
Logan and Merlin
20th Century Fox created Merlin, an artificial intelligence system that predicts movie attendance using recommendation systems, with the aid of the open-source AI framework TensorFlow and Google servers. To test Merlin’s capabilities, the team chose the film Logan, a dark, slightly unconventional superhero movie. The purpose was to predict which other movies the people who watched Logan would prefer.
First, Merlin ran the trailer and labeled all the objects seen, such as “car” and “forest.”
After the label frequencies were determined, the 20th Century Fox team wanted to compare them with the labels generated for other film trailers to find similar movies. The positions of the labels in each trailer also matter in the final analysis. It is challenging to take those into consideration along with the plethora of other elements that determine the audience’s preferences. Merlin’s task was to analyze all these parts simultaneously.
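As a rough illustration of the comparison step, the sketch below turns per-frame labels into label frequencies and scores two trailers with cosine similarity. Cosine similarity is an assumption chosen for simplicity, not the measure Merlin is documented to use, and the trailer names and labels are hypothetical toy data. The sketch also ignores the positions of the labels, which the text notes are important in the real analysis.

```python
from collections import Counter
from math import sqrt


def label_frequencies(frame_labels):
    """Turn per-frame label lists into relative label frequencies."""
    counts = Counter(label for labels in frame_labels for label in labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def cosine_similarity(a, b):
    """Cosine similarity between two sparse frequency dictionaries."""
    dot = sum(value * b.get(label, 0.0) for label, value in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical labels for three made-up trailers, one list per sampled frame.
trailer_a = [["car", "forest"], ["forest", "tree"], ["car"]]
trailer_b = [["tree", "forest"], ["forest"], ["light"]]
trailer_c = [["office", "face"], ["face"], ["face", "desk"]]

freq_a = label_frequencies(trailer_a)
freq_b = label_frequencies(trailer_b)
freq_c = label_frequencies(trailer_c)

print("a vs b:", round(cosine_similarity(freq_a, freq_b), 3))
print("a vs c:", round(cosine_similarity(freq_a, freq_c), 3))
```

Trailers that share many labels score close to 1, while trailers with no labels in common score 0.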
The engineers at 20th Century Fox state that the information in the graph is useful because it correlates with the genre of a film. For instance, a trailer with quick shots probably belongs to an action movie, whereas long close-ups of a character most likely come from the trailer of a drama. A person who buys a ticket for one type of movie has probably seen other movies with similar trailers. By comparing the different trailers, Merlin can predict the preferences of the people who watched the film Logan.
The graph shows the top 20 films viewed by the audience who also watched the film Logan. The right column shows Merlin’s predictions and the left column shows the data that was actually collected. The middle column indicates that Merlin got quite a few of the predictions correct. Although Merlin does not put the films in the same order of preference, it is still able to recognize the top 5 movies.
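One simple way to quantify this kind of result is to check how many of the predicted top films also appear in the actual top films, regardless of order. The sketch below is an assumed evaluation, not the one 20th Century Fox reported, and the film names are hypothetical placeholders rather than the real lists from the graph.

```python
def top_k_overlap(predicted, actual, k):
    """Fraction of the top-k predicted items that appear in the top-k actual items."""
    if k <= 0:
        raise ValueError("k must be positive")
    return len(set(predicted[:k]) & set(actual[:k])) / k


# Hypothetical ranked lists: placeholder names, not the films in the graph.
actual = ["Film A", "Film B", "Film C", "Film D", "Film E", "Film F"]
predicted = ["Film C", "Film A", "Film E", "Film B", "Film D", "Film G"]

print("top-5 overlap:", top_k_overlap(predicted, actual, 5))
print("top-6 overlap:", top_k_overlap(predicted, actual, 6))
```

Because the comparison uses sets, a prediction that recovers the same top 5 films in a different order still scores a full overlap.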
Some of Merlin’s incorrect predictions shed light on how the machine works. For example, Merlin forecasts that Logan fans will enjoy The Legend of Tarzan. This could be because both trailers feature an abundance of “trees” and “lights.”
Benjamin is an artificial intelligence program that uses a long short-term memory (LSTM) network to write short films. Ross Goodwin fed Benjamin a large number of prompts and scripts from science fiction movies. Benjamin then learned to predict which letters usually follow one another, as well as the usual order of words and phrases. An LSTM is much more useful than a Markov chain in this case because it can draw on much longer strings of preceding letters, which makes it better at predicting whole phrases and paragraphs instead of just a handful of words. Soon, Benjamin was able to take on more complex tasks such as determining the structure of the movie, developing the script, and giving stage directions.
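To make the comparison concrete, here is a minimal sketch of the Markov-chain baseline mentioned above, not of Benjamin's LSTM. A character-level Markov chain picks the next letter by looking only at a fixed, short window of preceding letters, which is the limitation an LSTM is meant to overcome. The function names and the training sentence are made up for illustration.

```python
import random
from collections import defaultdict


def build_markov_model(text, order):
    """Map every `order`-character context to the characters that follow it."""
    if order < 1:
        raise ValueError("order must be at least 1")
    model = defaultdict(list)
    for i in range(len(text) - order):
        model[text[i:i + order]].append(text[i + order])
    return model


def generate(model, seed, length, rng):
    """Extend `seed` one character at a time using only the last len(seed) characters."""
    if not seed:
        raise ValueError("seed must not be empty")
    order = len(seed)
    out = seed
    for _ in range(length):
        choices = model.get(out[-order:])
        if not choices:  # context never seen in training text: stop early
            break
        out += rng.choice(choices)
    return out


# Hypothetical training text standing in for a corpus of science fiction scripts.
corpus = "the ship left the station and the crew watched the stars from the deck"
model = build_markov_model(corpus, order=3)
print(generate(model, "the", 40, random.Random(0)))
```

Every choice here depends only on the previous three characters, so the output can drift quickly from one phrase to an unrelated one. Raising the order helps only until the contexts become too rare to have been seen in the training text.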
Benjamin struggled with names because of their unpredictability and unconventionality. Because of this, Goodwin made all the names in the film single letters. In the film Sunspring, the characters are named H, H2, and C. Interestingly, in the first run-through, there were two characters with the name H. To avoid confusion, one character’s name was changed to H2.
Another film created by Benjamin was Zone Out. Benjamin took about 48 hours to put together a film after going through thousands of old films and videos of professional actors acting in front of a green screen.
Figure 5: The given prompt for the film (Source: Tech2)
Oscar Sharp and Ross Goodwin gave Benjamin complete control of the film, meaning the AI was to make the film on its own. Benjamin used voice-generating technologies and face-swapping mechanisms to create the film. In addition, Benjamin developed the script using the actors’ voice recordings to put sentences together.
It’s important to remember that this technology is not perfect. Despite a few shortcomings, Merlin and Benjamin are prime examples of how software development has evolved over the past decade. This is just the beginning for artificial intelligence; many improvements will be made on the way to reaching close to perfection.