Predicting video game hits with Machine Learning
In this project, I analyze sales data from 8K video games, identify the variables most correlated with hits (games that sell over 1M units), and implement a prediction model to separate winners from losers. Bonus: What 2016 games can still become hits?
1. Data exploration and analysis
Median sales (in millions of units) vs. critic scores
The following four heatmaps show how game sales vary according to critic scores, which are split into six scoring groups. Additionally, each heatmap segments the data further by one of the following features: genre, developer, publisher, and platform (in order of appearance).
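The table behind a heatmap like this can be sketched with a pandas pivot. This is a minimal sketch on made-up rows, not the project's actual code: `Global_Sales` is an assumed column name for sales in millions of units, and the bin edges for the six scoring groups are also an assumption.

```python
import pandas as pd

# Tiny made-up sample. Global_Sales (millions of units) is an assumed column name.
games = pd.DataFrame({
    "Genre": ["Action", "Action", "Racing", "Racing", "Action", "Racing"],
    "Critic_Score": [72, 78, 85, 91, 95, 88],
    "Global_Sales": [0.4, 0.6, 1.2, 3.0, 5.0, 1.8],
})

# Six scoring groups; these edges are an assumption, not taken from the article.
bins = [0, 50, 60, 70, 80, 90, 101]
labels = ["<50", "50s", "60s", "70s", "80s", "90s"]
games["Score_Group"] = pd.cut(
    games["Critic_Score"], bins=bins, labels=labels, right=False
)

# Median sales per (genre, score group): the values a heatmap would color.
median_sales = games.pivot_table(
    index="Genre",
    columns="Score_Group",
    values="Global_Sales",
    aggfunc="median",
    observed=True,  # keep only score groups that actually occur in the sample
)
print(median_sales)
```

Swapping `"Genre"` for a developer, publisher, or platform column would give the tables for the other three heatmaps.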
Under each heatmap, we identify the categories where games sell best. This is done for okay, good, and great games, defined as games with scores in the 70s, 80s, and 90s, respectively.
- Genres where great games sell best: Racing, Action
- Genres where good games sell best: Platform, Action/Shooter
- Genres where okay games sell best: Fighting, Misc
- Developers whose great games sell best: Nintendo, Electronic Arts
- Developers whose good games sell best: Nintendo, Namco
- Developers whose okay games sell best: Nintendo, Traveller’s Tales
Interpretation: In the great scores column (last), Nintendo has the highest median sales (in millions of units) per game, at 4.6M. Interestingly, Nintendo also has the highest median sales per game in both the good and okay scoring columns.
- Publishers who sell great games best: Microsoft Game Studios, Warner Bros. Interactive Entertainment
- Publishers who sell good games best: Nintendo, Warner Bros. Interactive Entertainment
- Publishers who sell okay games best: Nintendo, Microsoft Game Studios
- Platforms where great games sell/sold best: PS, X360
- Platforms where good games sell/sold best: PS3, XOne
- Platforms where okay games sell/sold best: PS3, X360
It’s interesting how sensitive game sales in the whole PlayStation line seem to be to high critic scores, especially when sales in the mid-score ranges look relatively on par with other consoles (or at least exhibit a narrower spread).
Top values in the dataset
(By platform, developer, publisher and genre)
Platforms with most games in dataset:
- PS2
- DS
- PS3
Developers with most games in dataset:
- Ubisoft
- EA Sports
- EA Canada
Publishers with most games in dataset:
- Electronic Arts
- Activision
- Namco Bandai Games
Genres with most games in dataset:
- Action
- Sports
- Misc
Dataset correlations
(For numeric and categorical variables)
Strongest correlations:
- Critic score-to-global sales: We’ll take a closer look at this in the next two sections.
- Year of release-to-platform: This makes sense since new platforms are released periodically.
Note: Categorical columns (platform, genre, publisher) were converted to numeric IDs in order of game count, as listed in the previous section, so the category with the most games gets the lowest ID. Their slightly negative correlations with global sales can be interpreted as “the higher the ID number, the smaller the [platform, genre, publisher], and thus the slightly lower the sales figure”.
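The encoding described in the note can be sketched as follows. This is an illustration on made-up rows: the helper `encode_by_count` is a hypothetical name, and `Global_Sales` is an assumed column name.

```python
import pandas as pd

# Made-up sample; Global_Sales is an assumed column name.
games = pd.DataFrame({
    "Platform": ["PS2", "PS2", "PS2", "DS", "DS", "PS3"],
    "Global_Sales": [2.0, 1.5, 1.0, 0.8, 0.6, 0.3],
})

def encode_by_count(column: pd.Series) -> pd.Series:
    """Hypothetical helper: map each category to its rank by game count (0 = most games)."""
    order = column.value_counts().index  # value_counts sorts by descending count
    ranks = {category: rank for rank, category in enumerate(order)}
    return column.map(ranks)

games["Platform_ID"] = encode_by_count(games["Platform"])

# Pearson correlation between the numeric ID and sales.
corr = games[["Platform_ID", "Global_Sales"]].corr()
print(corr)
```

In this toy sample the larger platforms happen to sell more, so the ID-to-sales correlation comes out negative, which mirrors the interpretation in the note.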
Critic score vs. global sales
(For all years in the dataset)
Kinda messy, right?
We’ll go ahead and use bins to get a better feel for the relationship…
For years ≥ 2014
The relationship looks much clearer now. It’s interesting how the slope gets steeper in the 80s. It seems that once a video game gets a high critic score, every additional point has a higher impact. For example, in this 2014–16 subset, an 8-point increase in critic score seems to have a positive effect on sales of about 250k units when starting from a score of 65, but about 1M when starting from 77. Go big or go home, right?
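Here is a rough sketch of the filter-then-bin idea: keep recent releases, group critic scores into bins, and compare average sales from one bin to the next. The rows are made up for illustration, the bin edges are an assumption, and `Global_Sales` is an assumed column name.

```python
import pandas as pd

# Made-up rows; Global_Sales (millions of units) is an assumed column name.
games = pd.DataFrame({
    "Year_of_Release": [2013, 2014, 2014, 2015, 2015, 2016, 2016],
    "Critic_Score": [90, 62, 66, 74, 78, 84, 88],
    "Global_Sales": [4.0, 0.2, 0.3, 0.5, 0.7, 1.4, 2.0],
})

# Keep only releases from 2014 onward.
recent = games[games["Year_of_Release"] >= 2014].copy()

# Group scores into 10-point bins (edges are an assumption).
recent["Score_Bin"] = pd.cut(
    recent["Critic_Score"], bins=[60, 70, 80, 90], right=False
)

# Mean sales per bin, then the change from each bin to the next.
binned = recent.groupby("Score_Bin", observed=True)["Global_Sales"].mean()
steps = binned.diff()
print(binned)
print(steps)
```

A larger step between the upper bins than between the lower ones is what a steepening slope looks like in table form.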
Defining hits as those with sales above 1 million units
This will be the target in our prediction model, where we’ll predict if a game will be a hit or not. The target is binary: 1 if Hit, else 0.
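A minimal sketch of building that binary target, assuming sales are stored in millions of units in a column named `Global_Sales` (an assumed name). The strict `>` follows the "above 1 million" wording; whether a game at exactly 1M counts as a hit is a judgment call.

```python
import pandas as pd

# Made-up rows; Global_Sales (millions of units) is an assumed column name.
games = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "Global_Sales": [0.35, 1.0, 2.4],
})

# 1 if the game sold more than 1 million units, else 0.
games["Hit"] = (games["Global_Sales"] > 1).astype(int)
print(games)
```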
Here’s the relationship between critic scores and VG hits using a 5% sample:
As expected, hits are mostly found near high critic scores, while non-hits vary in score but begin to thin out in the high score ranges (as indicated by the regression curve steepening near the 70s).
2. Prediction model
(For predicting the likelihood that a given game reaches sales of 1 million units or higher, referred to as “hit” games. A classification approach is applied to separate hits from non-hits.)
Generating features and train/test splitting
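One common way to do this step is to one-hot encode the categorical columns and hold out part of the data for testing. The sketch below uses made-up rows and an arbitrary 75/25 split; both are assumptions, not the project's actual settings.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up rows using the column names that appear in the feature ranking.
games = pd.DataFrame({
    "Platform": ["PS3", "X360", "PS3", "Wii", "X360", "Wii", "PS3", "X360"],
    "Genre": ["Action", "Sports", "Shooter", "Action",
              "Sports", "Action", "Shooter", "Action"],
    "Publisher": ["Nintendo", "Activision", "Ubisoft", "Nintendo",
                  "Electronic Arts", "Nintendo", "Ubisoft", "Activision"],
    "Year_of_Release": [2010, 2011, 2012, 2009, 2013, 2010, 2014, 2012],
    "Critic_Score": [88, 65, 91, 70, 60, 85, 72, 90],
    "Hit": [1, 0, 1, 0, 0, 1, 0, 1],
})

# One-hot encode categoricals: "Publisher" becomes Publisher_Nintendo, etc.
X = pd.get_dummies(
    games.drop(columns="Hit"), columns=["Platform", "Genre", "Publisher"]
)
y = games["Hit"]

# Hold out 25% for testing; stratify keeps the hit ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
print(X_train.shape, X_test.shape)
```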
Testing prediction accuracy with a Random Forest Classifier (RFC) and Logistic Regression (LR)
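A sketch of how the two models could be compared on held-out data with scikit-learn. It runs on synthetic data from `make_classification` rather than the game dataset, and the hyperparameters are placeholder assumptions, so the printed accuracies say nothing about the real results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the game features and the Hit target.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

models = {
    "Random forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Logistic regression": LogisticRegression(max_iter=1000),
}

# Fit each model on the training part and score it on the held-out part.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.3f}")
```

Since hits are the minority class, plain accuracy can look flattering; precision and recall on the hit class are worth checking too.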
Ranking feature performance
Feature ranking (top 10):
1. Critic_Score (0.323090)
2. Year_of_Release (0.158740)
3. Publisher_Nintendo (0.030405)
4. Genre_Action (0.024856)
5. Publisher_Activision (0.018035)
6. Genre_Sports (0.016918)
7. Publisher_Electronic Arts (0.016917)
8. Genre_Shooter (0.015722)
9. Platform_PS3 (0.015634)
10. Publisher_Ubisoft (0.014164)
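A ranking in this format can be produced from a fitted random forest's `feature_importances_` attribute. That this is how the list above was generated is an assumption on my part. The sketch uses synthetic data in which the target depends only on `Critic_Score`, so the top feature is predictable; `Noise` is a made-up column.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 300

# Synthetic features; "Noise" is a made-up column with no signal.
X = pd.DataFrame({
    "Critic_Score": rng.integers(30, 100, size=n),
    "Year_of_Release": rng.integers(2000, 2017, size=n),
    "Noise": rng.normal(size=n),
})
# Synthetic target driven only by Critic_Score.
y = (X["Critic_Score"] > 80).astype(int)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances, sorted from most to least important.
ranking = pd.Series(
    forest.feature_importances_, index=X.columns
).sort_values(ascending=False)

for position, (feature, importance) in enumerate(ranking.items(), start=1):
    print(f"{position}. {feature} ({importance:.6f})")
```

The importances sum to 1, so each value reads as a share of the forest's total impurity reduction.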
3. Which 2016 video games can still become hits?
Video games with highest probability of becoming hits:
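A list like this can be produced by scoring games that are not yet hits with the trained classifier's `predict_proba` and sorting by the hit probability. The sketch below uses a single feature and made-up games ("Game A" etc. are placeholders) to keep it short.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Made-up training data with one feature.
train = pd.DataFrame({
    "Critic_Score": [55, 60, 65, 70, 85, 88, 92, 95],
    "Hit": [0, 0, 0, 0, 1, 1, 1, 1],
})
# Made-up candidate games that have not reached 1M sales yet.
candidates = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "Critic_Score": [58, 90, 75],
})

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(train[["Critic_Score"]], train["Hit"])

# predict_proba columns follow model.classes_ ([0, 1]); column 1 is P(Hit).
candidates["Hit_Probability"] = model.predict_proba(
    candidates[["Critic_Score"]]
)[:, 1]

top = candidates.sort_values("Hit_Probability", ascending=False)
print(top)
```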
Article By:
IswaryaSivakumar
Instagram: iswarya._.vijaysiva
LinkedIn: Iswarya
Quora: iswarya