Exploring Board Game Trends: Insights and Analysis
Company Overview:
You run a board game company that tracks sales and customer reviews of various board games. You have a dataset containing information about different board games, their sales figures, customer reviews, and more.
Dataset Description
The dataset bgg_dataset.csv contains the following columns:
ID: Unique identifier for each board game.
Name: Name of the board game.
Year Published: Year the game was published.
Min Players: Minimum number of players required to play the game.
Max Players: Maximum number of players the game supports.
Play Time: Average playing time in minutes.
Min Age: Minimum recommended age to play the game.
Users Rated: Number of users who rated the game.
Rating Average: Average rating of the game.
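BGG Rank: Rank of the game on BoardGameGeek (1 is the best).
Complexity Average: Average complexity (weight) rating of the game.
Owned Users: Number of people who own the game.
Mechanics: Game mechanics the game uses.
Domains: Domains (categories) the game belongs to.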
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

board_games = pd.read_csv("bgg_dataset.csv", sep=";", decimal=",")
Check for and handle any missing values.
# Check Missing Values
board_games.isnull().sum()
ID 16
Name 0
Year Published 1
Min Players 0
Max Players 0
Play Time 0
Min Age 0
Users Rated 0
Rating Average 0
BGG Rank 0
Complexity Average 0
Owned Users 23
Mechanics 1598
Domains 10159
dtype: int64
# Fill Missing Values for column Domains
board_games.fillna({'Domains': 'Unknown'}, inplace=True)
# Fill Missing Values for column Mechanics
board_games.fillna({'Mechanics': 'Unknown'}, inplace=True)
# Delete rows where year is not set
board_games.dropna(subset=['Year Published'], inplace=True)
## Could also drop the rows with missing values
# board_games = board_games.dropna()
Data Exploration:
Display the first few rows of the DataFrame.
board_games.head(5) # Could be set to any number to see more rows
> Data omitted for brevity
Get a summary of the dataset (mean, median, standard deviation, etc.).
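A minimal sketch of this step, assuming the usual pandas `describe()` call is what is wanted here. The `sample` DataFrame is an invented stand-in for `board_games` (its values are made up for illustration); on the real data the same calls would be applied to `board_games` directly.

```python
import pandas as pd

# Invented stand-in for board_games; the numbers are illustrative only.
sample = pd.DataFrame({
    "Play Time": [30, 45, 60, 120],
    "Min Age": [8, 10, 12, 14],
    "Rating Average": [6.0, 7.0, 7.5, 8.5],
})

# describe() reports count, mean, std, min, quartiles and max for each
# numeric column; the "50%" row is the median.
summary = sample.describe()
print(summary)

# median() gives the medians directly when only that statistic is needed.
medians = sample.median(numeric_only=True)
print(medians)
```

On the full dataset, `board_games.describe()` only covers numeric columns by default, so text columns such as Name, Mechanics and Domains are left out of the summary.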
Find the average rating for games published each year.
# First, we set the values to the correct data type
board_games['Rating Average'] = board_games['Rating Average'].astype(float)
board_games['Year Published'] = board_games['Year Published'].astype(int)
(
    board_games.groupby('Year Published')
    .agg({'Rating Average': ['mean', 'max', 'min']})
    .sort_values('Year Published', ascending=False)
    .head(10)
)
                Rating Average
                          mean   max   min
Year Published
2022                  8.270000  8.27  8.27
2021                  7.902292  9.54  5.05
2020                  7.470146  9.43  3.16
2019                  7.087672  9.31  1.10
2018                  6.910215  9.12  3.00
2017                  6.790330  9.43  3.55
2016                  6.645346  8.82  3.31
2015                  6.546667  9.06  3.09
2014                  6.465856  9.10  3.50
2013                  6.422682  9.14  1.05
Determine the year with the highest number of games published.
# Year with the highest number of games published
top_year = board_games['Year Published'].value_counts().idxmax()
print(f'Year with Highest Number of Games Published: {top_year}')

# Distribution of average ratings
plt.hist(board_games['Rating Average'], bins=20, edgecolor='black')
plt.title('Distribution of Average Ratings')
plt.xlabel('Average Rating')
plt.ylabel('Frequency')
plt.show()
Identify any correlations between the average rating and the number of reviews.
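No code accompanies this step, so here is a hedged sketch of one way to do it. It assumes that "number of reviews" maps to the Users Rated column and uses the Pearson correlation that pandas computes by default. The `sample` rows are invented for illustration.

```python
import pandas as pd

# Invented sample rows; column names follow the dataset description.
sample = pd.DataFrame({
    "Users Rated": [100, 200, 300, 400, 500],
    "Rating Average": [6.2, 6.1, 6.4, 6.3, 6.5],
})

# corr() on two columns returns a 2x2 matrix of Pearson coefficients.
corr_matrix = sample[["Rating Average", "Users Rated"]].corr()
print(corr_matrix)

# The off-diagonal cell is the coefficient between the two columns.
rating_vs_reviews = corr_matrix.loc["Rating Average", "Users Rated"]
print(f"Correlation between rating and number of ratings: {rating_vs_reviews:.3f}")
```

A value near 0 would suggest little linear relationship, while values near 1 or -1 would suggest a strong one. The coefficient from the toy rows says nothing about the real dataset.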
Most Complex Game: Empire (Third Edition) with weight 5.0
Least Complex Game: Empire (Third Edition) with weight 5.0
Both lines report the same game with the same weight of 5.0, so the least-complex lookup appears to have returned the maximum a second time; this output does not actually identify the least complex game.
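The code behind this output is not shown. One way to look up both ends of the complexity scale, sketched here on invented sample rows, is to pair `idxmax()` with `idxmin()` and read single cells, which keeps the printed line free of Series index and dtype details. The helper name `complexity_extremes` is hypothetical.

```python
import pandas as pd

# Invented sample rows; column names follow the dataset description.
sample = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "Complexity Average": [1.2, 4.6, 2.8],
})

def complexity_extremes(df):
    """Return the rows with the highest and lowest Complexity Average."""
    # idxmax/idxmin return row labels; .loc with one label returns that row.
    most = df.loc[df["Complexity Average"].idxmax()]
    least = df.loc[df["Complexity Average"].idxmin()]
    return most, least

most, least = complexity_extremes(sample)
print(f"Most Complex Game: {most['Name']} with weight {most['Complexity Average']}")
print(f"Least Complex Game: {least['Name']} with weight {least['Complexity Average']}")
```

If several games tie for the maximum or minimum, `idxmax()` and `idxmin()` return only the first matching row.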
Wishlist Analysis:
Determine the top 5 most wished-for games.
# We could use a formula to determine when a board game is wanted or not
# This is a formula I think could give a Wishability Score, could be incorrect
board_games['Wishability Score'] = (
    board_games['Owned Users'] * 0.4
    + board_games['Users Rated'] * 0.3
    + board_games['BGG Rank'] * 0.2
    + board_games['Rating Average'] * 0.1
)
board_games.nlargest(5, 'Wishability Score')['Name']
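The score adds raw columns whose scales differ widely (user counts versus a rating out of 10), and it adds BGG Rank directly even though rank 1 is the best. A possible variant, offered only as a sketch, rescales each column to 0..1 and inverts the rank before applying the same weights. The helpers `min_max` and `normalized_wishability` and the `sample` rows are hypothetical, and this variant is not the formula used above.

```python
import pandas as pd

# Invented sample rows; column names follow the dataset description.
sample = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C"],
    "Owned Users": [50000, 20000, 1000],
    "Users Rated": [40000, 15000, 800],
    "BGG Rank": [10, 1, 5000],
    "Rating Average": [7.8, 8.6, 6.1],
})

def min_max(series):
    """Rescale a numeric Series to the 0..1 range."""
    span = series.max() - series.min()
    if span == 0:
        # A constant column carries no ranking information.
        return series * 0.0
    return (series - series.min()) / span

def normalized_wishability(df):
    """Hypothetical variant of the Wishability Score on rescaled columns."""
    # Invert the rank so that rank 1 (the best) scores 1.0.
    rank_score = 1.0 - min_max(df["BGG Rank"])
    return (
        min_max(df["Owned Users"]) * 0.4
        + min_max(df["Users Rated"]) * 0.3
        + rank_score * 0.2
        + min_max(df["Rating Average"]) * 0.1
    )

sample["Normalized Wishability"] = normalized_wishability(sample)
top = sample.nlargest(2, "Normalized Wishability")["Name"].tolist()
print(top)
```

With rescaled columns the weights 0.4, 0.3, 0.2 and 0.1 describe the relative influence of each column more directly, but the weights themselves remain a judgment call.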
Determine the correlation between the Owned Users and Users Rated
board_games[['Users Rated','Owned Users']].corr()
             Users Rated  Owned Users
Users Rated     1.000000     0.986028
Owned Users     0.986028     1.000000
Important: This analysis is for educational purposes only and should not be used for commercial purposes.
Conclusion
In the last couple of years, the board game industry has seen a significant rise in popularity, with games like Pandemic, Catan, and Carcassonne leading the pack in terms of ownership and user ratings. The average rating of games has been on the rise, with newer games receiving higher ratings than older ones: the yearly mean climbs from about 6.42 for games published in 2013 to about 7.90 for games published in 2021. The complexity of a game seems to have a positive correlation with its rating, suggesting that players appreciate more intricate game mechanics. By analyzing the data on player preferences and game ratings, board game companies can gain valuable insights into the evolving trends within the gaming community, helping them develop more engaging and successful games.
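The ownership and complexity claims are not backed by code earlier in the post. The sketch below shows how they could be checked, using invented sample rows and a hypothetical helper `check_conclusion_claims`. Running it on `board_games` rather than `sample` would show whether the claims hold for the real data.

```python
import pandas as pd

# Invented sample rows; column names follow the dataset description.
sample = pd.DataFrame({
    "Name": ["Game A", "Game B", "Game C", "Game D"],
    "Owned Users": [1200, 54000, 8000, 31000],
    "Complexity Average": [1.0, 2.0, 3.0, 4.0],
    "Rating Average": [6.0, 8.0, 7.0, 9.0],
})

def check_conclusion_claims(df, top_n=3):
    """Return the most-owned game names and the complexity/rating correlation."""
    most_owned = df.nlargest(top_n, "Owned Users")["Name"].tolist()
    # Pearson correlation; a positive value is consistent with the claim.
    complexity_vs_rating = df["Complexity Average"].corr(df["Rating Average"])
    return most_owned, complexity_vs_rating

most_owned, complexity_vs_rating = check_conclusion_claims(sample)
print(most_owned)
print(round(complexity_vs_rating, 3))
```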