Exploring Board Game Trends: Insights and Analysis

Company Overview:

You run a board game company that tracks sales and customer reviews of various board games. You have a dataset containing information about different board games, their sales figures, customer reviews, and more.

Dataset Description

The dataset bgg_dataset.csv contains the following columns:

  • ID: Unique identifier for each board game.
  • Name: Name of the board game.
  • Year Published: Year the game was published.
  • Min Players: Minimum number of players required to play the game.
  • Max Players: Maximum number of players the game supports.
  • Play Time: Average playing time in minutes.
  • Min Age: Minimum recommended age to play the game.
  • Users Rated: Number of users who rated the game.
  • Rating Average: Average rating of the game.
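  • BGG Rank: Rank of the game on BoardGameGeek (BGG); a lower number is better.
  • Complexity Average: Average complexity (weight) rating of the game.
  • Owned Users: Number of users who own the game.
  • Mechanics: Game mechanics used by the game.
  • Domains: Game domains (categories) the game belongs to.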

Dataset: bgg_dataset.csv

Tasks

Data Loading and Cleaning:

  1. Load the dataset into a pandas DataFrame.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
board_games = pd.read_csv("bgg_dataset.csv", sep=";", decimal=",")
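
The `sep=";"` and `decimal=","` arguments suggest that the file uses semicolons between fields and commas as the decimal mark. A minimal sketch of the effect, using a made-up two-row sample in place of the real file (the game names and values below are illustrative assumptions, not rows from bgg_dataset.csv):

```python
import io
import pandas as pd

# Hypothetical sample in the assumed layout of bgg_dataset.csv:
# semicolon-separated fields, comma as the decimal mark.
sample_csv = (
    "Name;Rating Average;Complexity Average\n"
    "Game A;8,79;3,86\n"
    "Game B;7,61;2,41\n"
)

# With decimal="," pandas parses "8,79" as the float 8.79.
parsed = pd.read_csv(io.StringIO(sample_csv), sep=";", decimal=",")

# Without it, the same column is read as text instead of numbers.
unparsed = pd.read_csv(io.StringIO(sample_csv), sep=";")

print(parsed.dtypes)
print(unparsed.dtypes)
```

Checking `dtypes` right after loading is a cheap way to confirm that the numeric columns were actually parsed as numbers.
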
  2. Check for and handle any missing values.
# Check Missing Values
board_games.isnull().sum()
ID                       16
Name                      0
Year Published            1
Min Players               0
Max Players               0
Play Time                 0
Min Age                   0
Users Rated               0
Rating Average            0
BGG Rank                  0
Complexity Average        0
Owned Users              23
Mechanics              1598
Domains               10159
dtype: int64
# Fill Missing Values for column Domains 
board_games.fillna({'Domains': 'Unknown'}, inplace=True)

# Fill Missing Values for column Mechanics
board_games.fillna({'Mechanics': 'Unknown'}, inplace=True)

# Delete rows where year is not set
board_games.dropna(subset=['Year Published'], inplace=True)

## Could also drop the rows with missing values
# board_games = board_games.dropna()
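
Whether to fill or drop depends on how much of a column is missing. The sketch below, on a made-up four-row sample (not the real data), reports the share of missing values per column and then applies the same fill-then-drop steps as above:

```python
import numpy as np
import pandas as pd

# Made-up sample with gaps in the same columns handled above.
games = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game D"],
    "Year Published": [2017.0, np.nan, 2019.0, 2020.0],
    "Mechanics": ["Dice Rolling", None, "Hand Management", None],
    "Domains": [None, None, None, "Strategy Games"],
})

# Share of missing values per column, largest first.
missing_share = games.isnull().mean().sort_values(ascending=False)
print(missing_share)

# Fill the text columns with a placeholder, then drop rows lacking a year.
cleaned = games.fillna({"Mechanics": "Unknown", "Domains": "Unknown"})
cleaned = cleaned.dropna(subset=["Year Published"])
print(cleaned)
```

A column that is mostly empty, as Domains is in this sample, is usually a better candidate for a placeholder than for dropping rows, since dropping would discard most of the data.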

Data Exploration:

  1. Display the first few rows of the DataFrame.
board_games.head(5) # Could be set to any number to see more rows
> Data omitted for brevity
  2. Get a summary of the dataset (mean, median, standard deviation, etc.).
board_games.describe()
> Data omitted for brevity
  3. Find the total number of games in the dataset.
# nunique() skips missing IDs, so rows without an ID are not counted here
total_games = board_games['ID'].nunique()
total_games
20327
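
Because the ID column had missing values in the check above, counting unique IDs and counting rows can give different totals. A small sketch of the difference on made-up data (four rows, one without an ID):

```python
import numpy as np
import pandas as pd

# Made-up sample: four rows, one of them without an ID.
games = pd.DataFrame({
    "ID": [101.0, 102.0, np.nan, 104.0],
    "Name": ["Game A", "Game B", "Game C", "Game D"],
})

rows = len(games)                            # counts every row
known_ids = games["ID"].nunique()            # skips NaN by default
all_ids = games["ID"].nunique(dropna=False)  # counts NaN as one extra value

print(rows, known_ids, all_ids)  # prints: 4 3 4
```

Which number is the right "total" depends on whether rows without an ID should count as games.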

Sales Analysis:

  1. Identify the top 5 board games by the number of owners.
board_games.nlargest(5, 'Owned Users')[['Name', 'Owned Users']] # Option 1
# board_games.sort_values('Owned Users', ascending=False).head(5)[['Name', 'Owned Users']] # Option 2

            Name  Owned Users
98      Pandemic     155312.0
394        Catan     154531.0
177  Carcassonne     149337.0
60     7 Wonders     112410.0
92     Codenames     107682.0
  2. Find the average rating for games published each year.
# First, we set the values to the correct data type 
board_games['Rating Average'] = board_games['Rating Average'].astype(float)
board_games['Year Published'] = board_games['Year Published'].astype(int)
board_games.groupby('Year Published').agg({'Rating Average': ['mean', 'max', 'min']}).sort_values('Year Published', ascending=False).head(10)


                Rating Average
                    mean   max   min
Year Published
2022            8.270000  8.27  8.27
2021            7.902292  9.54  5.05
2020            7.470146  9.43  3.16
2019            7.087672  9.31  1.10
2018            6.910215  9.12  3.00
2017            6.790330  9.43  3.55
2016            6.645346  8.82  3.31
2015            6.546667  9.06  3.09
2014            6.465856  9.10  3.50
2013            6.422682  9.14  1.05
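
A yearly mean is only as reliable as the number of games behind it. In the table above, 2022 has identical mean, max, and min, which suggests it may rest on a single game. One way to guard against this, sketched on made-up ratings, is to include a count in the aggregation and filter on it (the column names `n_games` and `mean_rating` and the threshold are illustrative choices):

```python
import pandas as pd

# Made-up ratings; 2022 has a single game, similar to the table above.
games = pd.DataFrame({
    "Year Published": [2020, 2020, 2021, 2021, 2021, 2022],
    "Rating Average": [7.0, 8.0, 7.5, 8.5, 8.0, 8.27],
})

# Named aggregation: one output column per (input column, function) pair.
per_year = games.groupby("Year Published").agg(
    n_games=("Rating Average", "count"),
    mean_rating=("Rating Average", "mean"),
)

# Keep only years with at least `min_games` entries (arbitrary threshold).
min_games = 2
reliable = per_year[per_year["n_games"] >= min_games]
print(reliable)
```
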
  3. Determine the year with the highest number of games published.
# Year with the highest number of games published
top_year = board_games['Year Published'].value_counts().idxmax()
print(f'Year with Highest Number of Games Published: {top_year}')
Year with Highest Number of Games Published: 2017
board_games.groupby('Year Published').count().sort_values('Name', ascending=False).rename(columns={'Name': 'Total'}).head(5)[['Total']]

                Total
Year Published
2017             1274
2016             1257
2018             1254
2019             1134
2015             1131

Player Analysis:

  1. Find the game with the widest range of players (max_players - min_players).
board_games['Range'] = board_games['Max Players'] - board_games['Min Players']
board_games.nlargest(1, 'Range')[['Name', 'Range']]

                                             Name  Range
7025  Start Player: A Kinda Collectible Card Game    997
  2. Calculate the average playing time for games with different minimum age requirements.
board_games.groupby('Min Age').agg({'Play Time': 'mean'})

          Play Time
Min Age
0        163.941600
1         17.500000
2          9.000000
3         14.578947
4         17.313433
5         19.305046
6         22.352236
7         27.632880
8         37.649446
9         46.558282
10        54.121964
11        92.525773
12       157.144345
13        75.918800
14       162.181461
15       118.136054
16       354.970060
17        60.847458
18        76.198830
21        59.545455
25        20.000000
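
The means above jump around (for example 354.97 minutes at Min Age 16), which could be caused by a few very long games rather than by a real trend. A sketch on made-up play times shows how a single extreme value moves the mean while the median stays put:

```python
import pandas as pd

# Made-up play times; one extreme value in the Min Age 12 group.
games = pd.DataFrame({
    "Min Age": [8, 8, 8, 12, 12, 12],
    "Play Time": [30, 45, 60, 60, 90, 6000],
})

# Compare mean and median side by side, with the group size for context.
summary = games.groupby("Min Age")["Play Time"].agg(["mean", "median", "count"])
print(summary)
```

In this sample the Min Age 12 group has a mean of 2050 minutes but a median of 90, so the median is the more representative figure when outliers are present.
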

Rating Analysis:

  1. Plot the distribution of average ratings.
# Distribution of average ratings
plt.hist(board_games['Rating Average'], bins=20, edgecolor='black')
plt.title('Distribution of Average Ratings')
plt.xlabel('Average Rating')
plt.ylabel('Frequency')
plt.show()

[Figure: Average Rating Distribution]

  2. Identify any correlations between the average rating and the number of users who rated the game.
board_games[['Rating Average', 'Users Rated']].corr()

                Rating Average  Users Rated
Rating Average        1.000000     0.169651
Users Rated           0.169651     1.000000
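
`corr()` computes the Pearson correlation by default, which only measures linear association. Since rating counts are typically very skewed, a rank-based (Spearman) correlation may be worth comparing. The sketch below uses made-up data and computes Spearman as the Pearson correlation of the ranks:

```python
import pandas as pd

# Made-up data: rating rises with the number of raters, but not linearly.
games = pd.DataFrame({
    "Rating Average": [5.0, 6.0, 7.0, 8.0, 9.0],
    "Users Rated": [10, 100, 1000, 10000, 100000],
})

# Default Pearson correlation on the raw values.
pearson = games.corr().loc["Rating Average", "Users Rated"]

# Spearman correlation equals the Pearson correlation of the ranks.
spearman = games.rank().corr().loc["Rating Average", "Users Rated"]

print(round(pearson, 3), round(spearman, 3))
```

Here the relationship is perfectly monotonic, so the rank correlation is 1, while the Pearson value is lower because the relationship is not a straight line.
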
  3. Find the top 10 games by average rating that have been rated by at least 100 users.
board_games[board_games['Users Rated'] >= 100].nlargest(10, 'Rating Average')[['Name', 'Rating Average']]

      Name                                       Rating Average
2639  Arena: The Contest                         8.99
2797  Dungeon Universalis                        8.99
3794  Roads to Gettysburg II: Lee Strikes North  8.94
2004  Core Space                                 8.93
2145  Aeon's End: Outcasts                       8.88
5     Gloomhaven: Jaws of the Lion               8.87
6268  Anno Domini 1666                           8.87
3985  World At War 85: Storming the Gap          8.85
2226  High Frontier 4 All                        8.83
720   Kanban EV                                  8.82
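
The ranking depends heavily on the minimum number of ratings. A small helper makes that threshold explicit so it can be varied; `top_rated` and its parameters are hypothetical names introduced for this sketch, and the data is made up:

```python
import pandas as pd

def top_rated(df: pd.DataFrame, min_votes: int = 100, n: int = 10) -> pd.DataFrame:
    """Return the n best-rated games that have at least min_votes ratings."""
    eligible = df[df["Users Rated"] >= min_votes]
    return eligible.nlargest(n, "Rating Average")[["Name", "Users Rated", "Rating Average"]]

# Made-up sample: Game A has the best rating but only 5 votes.
games = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game D"],
    "Users Rated": [5, 150, 5000, 20000],
    "Rating Average": [9.5, 8.9, 8.5, 7.0],
})

loose = top_rated(games, min_votes=100, n=2)    # Game B, Game C
strict = top_rated(games, min_votes=1000, n=2)  # Game C, Game D
print(loose)
print(strict)
```

Raising the threshold removes niche titles with few but enthusiastic raters, which is worth keeping in mind when reading the list above.
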

Complexity Analysis:

  1. Determine the correlation between the average complexity (weight) and average rating of the games.
board_games[['Complexity Average', 'Rating Average']].corr()

                    Complexity Average  Rating Average
Complexity Average            1.000000        0.480833
Rating Average                0.480833        1.000000
  2. Identify the games with the highest and lowest complexity ratings.
# nlargest/nsmallest return one-row DataFrames; .iloc[0] extracts that row
most_complex = board_games.nlargest(1, 'Complexity Average').iloc[0]
least_complex = board_games.nsmallest(1, 'Complexity Average').iloc[0]
print(f'Most Complex Game: {most_complex["Name"]} with weight {most_complex["Complexity Average"]}')
print(f'Least Complex Game: {least_complex["Name"]} with weight {least_complex["Complexity Average"]}')

Most Complex Game: Empire (Third Edition) with weight 5.0
> Output for the least complex game omitted

Wishlist Analysis:

  1. Determine the top 5 most wished-for games.
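# The dataset has no wishlist column, so this weighted score is only a rough proxy
# for how wanted a game is, and it could be incorrect.
# Caveats: a lower BGG Rank is better, so adding it with a positive weight favors
# worse-ranked games, and the columns are on very different scales, so the
# large-valued columns (Owned Users, Users Rated) dominate the result.
board_games['Wishability Score'] = (
    board_games['Owned Users'] * 0.4 +
    board_games['Users Rated'] * 0.3 +
    board_games['BGG Rank'] * 0.2 +
    board_games['Rating Average'] * 0.1
)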
board_games.nlargest(5, 'Wishability Score')['Name']
98        Pandemic
394          Catan
177    Carcassonne
60       7 Wonders
97        Dominion
Name: Name, dtype: object
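
Since the columns in the score live on very different scales and a lower BGG Rank is better, one possible variation is to rescale each column to 0-1 and invert the rank before weighting. This is a sketch on made-up data, not the scoring used above; `min_max` and `Normalized Score` are hypothetical names, and the weights are simply carried over:

```python
import pandas as pd

def min_max(series: pd.Series) -> pd.Series:
    """Rescale a numeric column to the 0-1 range (assumes it is not constant)."""
    return (series - series.min()) / (series.max() - series.min())

# Made-up sample of three games.
games = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "Owned Users": [150000, 20000, 500],
    "Users Rated": [100000, 15000, 300],
    "BGG Rank": [100, 1, 5000],
    "Rating Average": [7.6, 8.8, 6.0],
})

# Rank 1 is best, so invert the rescaled rank before weighting it.
rank_score = 1 - min_max(games["BGG Rank"])

games["Normalized Score"] = (
    min_max(games["Owned Users"]) * 0.4
    + min_max(games["Users Rated"]) * 0.3
    + rank_score * 0.2
    + min_max(games["Rating Average"]) * 0.1
)
print(games.nlargest(3, "Normalized Score")[["Name", "Normalized Score"]])
```

With rescaled inputs, each weight reflects the intended share of the score instead of being swamped by whichever column has the largest raw values.
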
  2. Determine the correlation between Owned Users and Users Rated.
board_games[['Users Rated','Owned Users']].corr()

             Users Rated  Owned Users
Users Rated     1.000000     0.986028
Owned Users     0.986028     1.000000

Important: This analysis is for educational purposes only and should not be used for commercial purposes.

Conclusion

In the last couple of years, the board game industry has seen a significant rise in popularity, with games like Pandemic, Catan, and Carcassonne leading the pack in ownership at roughly 150,000 owners each in this dataset. The average rating of games has been on the rise, with newer games receiving higher ratings than older ones: the yearly mean climbs from about 6.42 for games published in 2013 to about 7.90 for 2021. The complexity of a game has a moderate positive correlation with its rating (about 0.48), which suggests that players tend to appreciate more intricate game mechanics, although a correlation alone does not show why. The number of owners and the number of users who rated a game move almost in lockstep (correlation of about 0.99). By analyzing the data on player preferences and game ratings, board game companies can gain valuable insights into the evolving trends within the gaming community, helping them develop more engaging and successful games.

http://luislizama.com/2024/06/05/Board-Game-Analysis/

Author

Luis Lizama

Posted on

2024-06-05

Updated on

2024-06-06

Licensed under