WNBA Statistics Project
Data Analysis & Visualization
Introduction
The goal of this project was to learn and practice data modeling calculations on WNBA statistics, and to build experience with CSV parsing and data management.
Research Questions
Question 1: What is the average Player Efficiency Rating among WNBA players from 1997-2019?
- Who: WNBA Players
- What: Basketball, WNBA Statistics
- Where: WNBA League in North America
- When: 1997-2019 WNBA Seasons
- Data needed: CSV with WNBA Player Statistics highlighting Player Efficiency Rating
Question 2: What is the average age of WNBA players from 1997-2019?
- Who: WNBA Players
- What: Basketball, WNBA Statistics
- Where: WNBA League in North America
- When: 1997-2019 WNBA Seasons
- Data needed: CSV with WNBA Player Statistics highlighting age and season
Question 3: Who has the highest Player Efficiency Rating among WNBA players from 1997-2019?
- Who: WNBA Players
- What: Basketball, WNBA Statistics
- Where: WNBA League in North America
- When: 1997-2019 WNBA Seasons
- Data needed: CSV with WNBA Player Statistics highlighting Player Name and Player Efficiency Rating
Data Source
The WNBA and sports statistics websites collect detailed data on WNBA players and their performance metrics.
- Name: wnba-player-stats
- Source: FiveThirtyEight WNBA Stats
- Author: Neil Paine
- Publication Date: May 25, 2020
- Publisher: FiveThirtyEight
- Format: CSV (comma-separated values)
- Size: 530 KB
- Records: 3,884
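After downloading, a quick sanity check is to compare the number of data rows with the listed record count. The sketch below is a minimal example that assumes the file has a single header row; `count_records` is a hypothetical helper, not part of the original project.

```python
import csv

def count_records(path):
    """Count data rows in a CSV file, excluding the single header row."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        # csv.reader yields one item per record, even if a quoted field spans lines
        return sum(1 for _ in reader)
```

If the listed count refers to data rows, `count_records('../data/raw/wnba-player-stats.csv')` would be expected to return 3884. Counting records with `csv.reader` instead of counting raw lines keeps quoted fields that contain line breaks from inflating the total.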
Sample Data Processing:
```python
import csv

# Load the raw CSV and print the first three data rows.
with open('../data/raw/wnba-player-stats.csv', 'r', newline='') as f:
    reader = csv.reader(f, delimiter=",", quotechar='"')
    next(reader, None)  # skip the header row
    data_read = [row for row in reader]

for row in data_read[:3]:
    print(row)
```
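Indexing rows by position (`row[1]`, `row[3]`) depends on the column order staying fixed. A minimal alternative sketch uses the standard library's `csv.DictReader`, which reads each row as a dictionary keyed by the header row. `read_rows` is a hypothetical helper, and the column names in the usage comment (`Player`, `Age`) are assumptions to check against the file's actual header.

```python
import csv

def read_rows(path):
    """Read every data row of a CSV file as a dict keyed by the header row."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

# Hypothetical usage; 'Player' and 'Age' are assumed header names.
# rows = read_rows('../data/raw/wnba-player-stats.csv')
# for row in rows[:3]:
#     print(row['Player'], row['Age'])
```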
Data Processing
Process for extracting, transforming, and cleaning data:
The key columns were Player Name, Year, Age, and Player Efficiency Rating. Numeric columns were converted from text to numbers for analysis, and empty numeric cells were filled with 0 to mark missing statistics.
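The conversion and fill step can be sketched with a small helper. `to_float` is hypothetical, not taken from the project; it assumes numeric cells hold plain decimal strings and raises `ValueError` for anything else.

```python
def to_float(cell, default=0.0):
    """Convert a CSV cell to float; empty or whitespace-only cells become `default`."""
    cell = cell.strip()
    return float(cell) if cell else default
```

The default of 0 matches the fill rule described above. Passing `default=None` instead lets a caller tell missing values apart from real zeros, which matters for averages: a missing age filled with 0 is counted as a zero-year-old player in the mean.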
Data Extraction Functions:
```python
import csv

def get_player_name():
    """Yield the player name (column 1) for every data row."""
    with open('../data/raw/wnba-player-stats.csv', 'r', newline='') as f:
        reader = csv.reader(f, delimiter=",", quotechar='"')
        next(reader, None)  # skip the header row
        for row in reader:
            yield row[1]

def get_age():
    """Yield each player's age (column 3) as an int; empty cells become 0."""
    with open('../data/raw/wnba-player-stats.csv', 'r', newline='') as f:
        reader = csv.reader(f, delimiter=",", quotechar='"')
        next(reader, None)  # skip the header row
        for row in reader:
            yield int(row[3]) if row[3] != '' else 0
```
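The two functions above repeat the same open, skip-header, and loop logic. A hedged refactoring sketch folds them into one generator parameterized by column index and conversion. `get_column` is a hypothetical name; the column indices in the usage comments are the ones the functions above already use. Like them, it streams rows from the file rather than building a full list first.

```python
import csv

def get_column(path, index, convert=str, default=None):
    """Yield one column of a CSV file, skipping the header row.

    Empty cells yield `default`; all other cells are passed through `convert`.
    """
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        for row in reader:
            cell = row[index]
            yield default if cell == "" else convert(cell)

# Equivalents of get_player_name() and get_age():
# names = get_column('../data/raw/wnba-player-stats.csv', 1)
# ages = get_column('../data/raw/wnba-player-stats.csv', 3, int, default=0)
```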
Visualizations
This visualization shows the number of players per year in the WNBA from 1997 to 2019. The count typically falls between 150 and 250 players per season, peaking at 219 active players in 2002.
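The data behind this chart can be sketched as a count of players per season from (player, year) pairs. The sketch assumes a player could appear in more than one row in a season (for example, one row per team), so it collects names in a set to count each player once per year. `players_per_year` is a hypothetical helper, and how the pairs are read from the file (which column holds the year) is left as an assumption.

```python
from collections import defaultdict

def players_per_year(pairs):
    """Count distinct players per season from (player, year) pairs, sorted by year."""
    seen = defaultdict(set)
    for player, year in pairs:
        seen[year].add(player)  # a set ignores repeat rows for the same player
    return {year: len(players) for year, players in sorted(seen.items())}
```

For example, `players_per_year([("A", 2002), ("B", 2002), ("A", 2003)])` returns `{2002: 2, 2003: 1}`. Because the result is ordered by year, its keys and values can be passed directly to a plotting library as x and y values.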
This visualization shows the average age of players in each WNBA season. The average age remains relatively stable, staying within 25 to 30 years every season and peaking at 27 in 2003.
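A per-season average age can be sketched from (year, age) pairs, for instance ages from `get_age()` matched with a year column. `average_age_per_year` and the source of the year values are hypothetical. Because `get_age()` yields 0 for a missing age, the sketch skips zero ages by default; `skip_missing=False` gives a plain average that includes them.

```python
from collections import defaultdict

def average_age_per_year(pairs, skip_missing=True):
    """Average age per season from (year, age) pairs, sorted by year.

    With skip_missing=True, an age of 0 (the missing-age placeholder
    produced by get_age) is left out of that season's average.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for year, age in pairs:
        if skip_missing and age == 0:
            continue
        totals[year] += age
        counts[year] += 1
    return {year: totals[year] / counts[year] for year in sorted(counts)}
```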
Conclusion
The analysis produced the following results for the three research questions:
- Average Player Efficiency Rating (1997-2019): 11.90
- Average Age of Players (1997-2019): 26.3 years
- Highest single-season Player Efficiency Rating: 78.9
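The three summary figures can be computed together from (player, age, PER) tuples. This is a minimal sketch with a hypothetical `summarize` helper; it assumes the values have already been converted to numbers. Since Question 3 asks who holds the highest rating, it returns the player's name along with the value.

```python
from statistics import mean

def summarize(records):
    """Return (average PER, average age, (player, PER) with the highest PER).

    `records` is an iterable of (player, age, per) tuples with numeric age and PER.
    """
    records = list(records)  # allow several passes over the data
    avg_per = mean(per for _, _, per in records)
    avg_age = mean(age for _, age, _ in records)
    top_player, _, top_per = max(records, key=lambda r: r[2])
    return avg_per, avg_age, (top_player, top_per)
```

PER is volatile for players with very few minutes, so filtering on minutes played before taking the maximum is a common refinement; that filter is not part of the original analysis.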
The per-season analysis shows that the average age of players hovers around 26 years throughout 1997-2019, which points to a stable age profile across the league.