Making sense of over 30 million Steam reviews

It should go without saying, but finding the right game for you is absolutely key to enjoying games and avoiding boredom. But how do you do it? You could rely on recommendations or professional game reviews, but there is no guarantee that you share the same tastes as those people, and you’re putting your trust in a very select few. Instead, you could turn to Steam reviews, and understand the opinion of tens, if not hundreds, of thousands of people who are real gamers and have little bias or agenda.

The problem

It sounds great in theory, but there are two problems with that:

  1. How do you even begin to process that volume of reviews? And no joke – the most popular games genuinely have that number. You certainly can’t read them all, and beyond classifying a game’s reviews as anywhere between ‘overwhelmingly positive’ and ‘overwhelmingly negative’, Steam does very little to help you make sense of them.
  2. A lot of the reviews on Steam are absolute hot garbage. Low-effort content, memes or worse.
Steam reviews for Apex Legends. Not exactly helpful

The solution

To help make sense of the sheer number of Steam reviews available, and turn them from car crash into the incredible data resource they should be, we’ve done the following:

  • Built various Python scripts to download reviews en masse from Steam’s review API and run a text analysis on the content of each individual review. The coding script identifies which of 22 different themes each review touches on – from graphics, to music/sound, to narrative, to gameplay and more.
  • Aggregated all this data for over 2,500 games (selected as top sellers or high-performing games based on positive review %).
  • Put it all in an easily accessible Google sheet for you to use.
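
For the curious, the download step can be sketched roughly as follows. This is a minimal illustration and not the actual scripts behind the sheet: it assumes Steam’s public appreviews endpoint with cursor-based paging, and the function names (review_url, fetch_page, download_reviews) and the max_reviews limit are made up for the example.

```python
import json
import urllib.parse
import urllib.request


def review_url(app_id, cursor="*", per_page=100):
    """Build the request URL for one page of reviews (assumed parameters)."""
    params = urllib.parse.urlencode({
        "json": 1,
        "filter": "recent",
        "language": "english",
        "num_per_page": per_page,
        "cursor": cursor,  # "*" asks for the first page
    })
    return f"https://store.steampowered.com/appreviews/{app_id}?{params}"


def fetch_page(app_id, cursor):
    """Download and decode a single page of reviews."""
    with urllib.request.urlopen(review_url(app_id, cursor), timeout=30) as resp:
        return json.load(resp)


def download_reviews(app_id, max_reviews=500, fetch=fetch_page):
    """Follow the cursor from page to page and collect the review texts."""
    texts, cursor, seen = [], "*", set()
    while len(texts) < max_reviews and cursor not in seen:
        seen.add(cursor)  # guard against a cursor that repeats
        page = fetch(app_id, cursor)
        batch = page.get("reviews", [])
        if not batch:
            break  # no more reviews to read
        texts.extend(item["review"] for item in batch)
        cursor = page.get("cursor", cursor)
    return texts[:max_reviews]
```

Passing the page fetcher in as an argument means the paging logic can be tried out against canned data without hitting Steam at all.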

The sheet and suggested usage

You can access the sheet here. At the time of writing it has 4 tabs:

  1. About. Version history and contact details.
  2. Notes. Nothing much that hasn’t been said above, other than details about which words go into each theme.
  3. Game look-up. Select a game (from the 2,500 we have data for) and see what it does well on vs other games in the database.
  4. Raw data. Allows sorting and broader comparison between titles.
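
To give a feel for how word lists turn into theme data, here is a rough sketch of keyword-based theme tagging. It assumes a review counts towards a theme if it contains at least one of that theme’s words; the three themes and their words below are invented placeholders (the real lists are in the Notes tab), and themes_in and theme_shares are illustrative names only.

```python
import re
from collections import Counter

# Placeholder word lists, NOT the ones used for the sheet.
THEME_WORDS = {
    "graphics": {"graphics", "visuals", "textures"},
    "music/sound": {"music", "soundtrack", "audio"},
    "narrative": {"story", "plot", "narrative"},
}


def themes_in(review):
    """Return the set of themes whose words appear in one review."""
    words = set(re.findall(r"[a-z']+", review.lower()))
    return {theme for theme, vocab in THEME_WORDS.items() if words & vocab}


def theme_shares(reviews):
    """Share of reviews mentioning each theme, between 0.0 and 1.0."""
    counts = Counter()
    for review in reviews:
        counts.update(themes_in(review))
    total = len(reviews)
    return {
        theme: (counts[theme] / total if total else 0.0)
        for theme in THEME_WORDS
    }
```

Counting each theme once per review, rather than once per word, keeps a single long rant about the soundtrack from outweighing many short reviews.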

Let’s have a look at an example of this in action, for the divisive CD Projekt Red title, Cyberpunk 2077:

The Steam review data for Cyberpunk 2077

What does this tell us? Well – the game’s world is its strength, with reviewers much more frequently commenting on this than average. It’s also strong narratively, has memorable characters and a great game feel. It doesn’t do quite as well with ‘gameplay enemies’ – a theme that likely picks up on the game’s weak AI. Hidden from view are a whole range of other themes, the most notable being the one around ‘bugs and stability’, which the game has had well-publicised issues with.
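
The ‘more frequently than average’ comparison can be sketched along these lines. This is one plausible way to do it and not necessarily how the sheet calculates things: it assumes each game has a share of reviews mentioning each theme, and ranks themes by how far a game sits above or below the average across all games. The function name and data layout are hypothetical.

```python
from statistics import mean


def relative_strengths(game, all_games):
    """Rank a game's themes by mention share minus the all-games average.

    all_games maps game name -> {theme: share of reviews mentioning it}.
    Positive values mean the theme comes up more often than average.
    """
    diffs = {}
    for theme, share in all_games[game].items():
        baseline = mean(other[theme] for other in all_games.values())
        diffs[theme] = share - baseline
    # Largest positive gap first, largest negative gap last.
    return sorted(diffs.items(), key=lambda pair: pair[1], reverse=True)
```

The themes at the top of the returned list are what a game is talked about for more than most; those at the bottom come up less than usual.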

And how do you use this? Knowing these things, you can better judge whether the game is for you. Are you a sucker for characters, plot and game world? If so, then Cyberpunk 2077 is a good choice. But if you prefer games with a much more solid gameplay loop (which Cyberpunk 2077 doesn’t particularly excel at), then you may want to pass on this one.

The sheet gives you all this information and more (such as average playtime so you can assess value better) for 2,500+ games. So start exploring today!

Feedback

The plan is to periodically update the sheet with new releases, and also build upon its functionality to give users even greater value from the data. If there’s something you think would be useful to see, get in touch here.

So one last time, give it a go!
