The goal of this project was to practice data modeling calculations on WNBA statistics and to learn CSV parsing and data management.
Question 1: What is the average Player Efficiency Rating among WNBA players from 1997-2019?
The WNBA may already collect statistics on its players, so the WNBA website is one place to look for more details. Other websites also collect WNBA player data, and those may be useful sources for answering these questions.
import csv

with open('../data/raw/wnba-player-stats.csv', 'r') as f:
    reader = csv.reader(f, delimiter=",", quotechar='"')
    next(reader, None)  # skip the header row
    data_read = [row for row in reader]

# Preview the first three data rows
for i in range(0, 3):
    print(data_read[i])
['montgre01w', 'Renee Montgomery', '2019', '32', 'ATL', '34', '-9.8', 'G', '34', '949', '69.5%', '11.1', '.520', '.727', '.176', '1.0', '4.1', '18.0', '1.7', '0.5', '16.8', '17.5', '0.4', '0.5', '0.9', '.039', '-2.4', '1.22']
['williel01w', 'Elizabeth Williams', '2019', '26', 'ATL', '34', '-9.8', 'C-F', '32', '909', '66.6%', '16.7', '.521', '.000', '.477', '11.4', '12.1', '7.8', '1.4', '4.7', '12.9', '15.9', '1.6', '1.0', '2.7', '.117', '+0.6', '2.51']
['sykesbr01w', 'Brittney Sykes', '2019', '25', 'ATL', '34', '-9.8', 'G', '34', '880', '64.5%', '11.3', '.445', '.308', '.259', '3.0', '9.1', '19.6', '1.2', '1.5', '14.8', '23.1', '-0.8', '0.8', '0.0', '-.001', '-3.4', '0.70']
Fields or Column Headers
Process for extracting, transforming, cleaning incoming data:
The columns I plan to use are Player Name, Year, Age, and Player Efficiency Rating. Player Name is already a string, which is what I need. Year is trickier: it looks numeric, but I keep it as a string because it is categorical rather than quantitative. Age and Player Efficiency Rating need to be converted to integer and float types, respectively. Some numerical cells are empty, which in this case means that no statistics were recorded for that player. They show up as empty strings, so I fill them with 0 before converting. I'll use a generator for each column to pull out the values I want, then put the data into lists that I can manipulate to answer the questions I came up with.
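The conversion rules described above can be sketched in a single pass over the file. This is only an illustration, not the notebook's own code: the helper names `to_int`, `to_float`, and `read_columns` are hypothetical, the sample rows and header are made up, and the column positions (1, 2, 3, 11) are taken from the indices used in the cells below.

```python
import csv
import io

def to_int(cell, default=0):
    """Convert a CSV cell to int, mapping an empty cell to `default`."""
    return int(cell) if cell.strip() else default

def to_float(cell, default=0.0):
    """Convert a CSV cell to float, mapping an empty cell to `default`."""
    return float(cell) if cell.strip() else default

def read_columns(lines):
    """Yield (name, year, age, per) tuples, one per data row.

    Year stays a string because it is used as a category label.
    """
    reader = csv.reader(lines, delimiter=",", quotechar='"')
    next(reader, None)  # skip the header row
    for row in reader:
        yield row[1], row[2], to_int(row[3]), to_float(row[11])

# Made-up sample with 12 columns; the second row has empty age and PER cells.
sample = (
    "id,name,year,age,c4,c5,c6,c7,c8,c9,c10,per\n"
    "a01,Player A,2019,32,x,x,x,x,x,x,x,11.1\n"
    "b01,Player B,2018,,x,x,x,x,x,x,x,\n"
)
records = list(read_columns(io.StringIO(sample)))
print(records)
```

One thing to keep in mind with this choice: a missing value stored as 0 still counts toward any average computed later, so it pulls the mean down. Skipping rows with empty cells is an alternative if that is not the intended behavior.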
import csv

def get_player_name():
    """Yield the Player Name column (index 1) for every data row."""
    with open('../data/raw/wnba-player-stats.csv', 'r') as f:
        reader = csv.reader(f, delimiter=",", quotechar='"')
        next(reader, None)  # skip the header row
        data_read = [row for row in reader]
    for i in range(len(data_read)):
        yield data_read[i][1]
def get_year():
    """Yield the Year column (index 2) as a string for every data row."""
    with open('../data/raw/wnba-player-stats.csv', 'r') as f:
        reader = csv.reader(f, delimiter=",", quotechar='"')
        next(reader, None)  # skip the header row
        data_read = [row for row in reader]
    for i in range(len(data_read)):
        yield data_read[i][2]
def get_age():
    """Yield the Age column (index 3) as an int; empty cells become 0."""
    with open('../data/raw/wnba-player-stats.csv', 'r') as f:
        reader = csv.reader(f, delimiter=",", quotechar='"')
        next(reader, None)  # skip the header row
        data_read = [row for row in reader]
    for i in range(len(data_read)):
        if data_read[i][3] == '':
            data_read[i][3] = 0  # no age recorded for this row
        yield int(data_read[i][3])
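def get_PER():
    """Yield the Player Efficiency Rating column (index 11) as a float; empty cells become 0."""
    with open('../data/raw/wnba-player-stats.csv', 'r') as f:
        reader = csv.reader(f, delimiter=",", quotechar='"')
        next(reader, None)  # skip the header row
        data_read = [row for row in reader]
    print(data_read[12][11])  # spot check of a single PER cell
    for i in range(len(data_read)):
        if data_read[i][11] == '':
            data_read[i][11] = 0  # no PER recorded for this row
        yield float(data_read[i][11])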
player_names = list(get_player_name())
years = list(get_year())
ages = list(get_age())
PERS = list(get_PER())
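import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
import numpy as np
from collections import Counter

# `c` is not defined anywhere in this notebook export; this assumes it maps
# each year label to the number of rows recorded for that year.
c = Counter(years)

y = []
x = []
for year in c:
    x.append(year)
    y.append(c[year])
# Reverse the file's order (the preview rows above start with 2019)
y.reverse()
x.reverse()
figure(num=None, figsize=(20, 6), dpi=80, facecolor='w', edgecolor='k')
plt.bar(x, y, align='center', color='#aaddff')
plt.ylim(0, 300)
plt.ylabel('Number of Players')
plt.xlabel('Year')
plt.title('Number of Players Per Year')
plt.show()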
This visualization shows the number of players per year in the WNBA from 1997-2019. The count stays between roughly 150 and 220 players per year, peaking in 2002 with 219 active players.
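The per-year counts and the peak year can also be read off programmatically rather than from the chart. The sketch below assumes the counts come from tallying the year label of each row; `sample_years` is made-up data standing in for the notebook's `years` list.

```python
from collections import Counter

# Made-up year labels standing in for the `years` list built earlier.
sample_years = ["2019", "2019", "2018", "2018", "2018", "2017"]

counts = Counter(sample_years)                     # year label -> number of rows
peak_year, peak_count = counts.most_common(1)[0]   # year with the most rows
ordered = sorted(counts.items())                   # chronological order for plotting

print(peak_year, peak_count)
print(ordered)
```

This counts rows, not distinct players. If a player appears in more than one row for the same season, counting unique player IDs per year would be the safer measure.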
import matplotlib.pyplot as plt
from matplotlib.pyplot import figure
import numpy as np

avg_ages = []
with open('../data/raw/wnba-player-stats.csv', 'r') as f:
    reader = csv.reader(f, delimiter=",", quotechar='"')
    next(reader, None)  # skip the header row
    data_read = [row for row in reader]

# x holds the year labels built for the previous chart
for year in x:
    age_sum = 0
    count = 0
    for i in range(len(data_read)):
        if data_read[i][2] == year:
            age_sum += int(data_read[i][3])
            count += 1
    avg_ages.append(age_sum/count)

figure(num=None, figsize=(20, 6), dpi=80, facecolor='w', edgecolor='k')
plt.bar(x, avg_ages, align='center', color='#aaddff')
plt.ylim(0, 30)
plt.ylabel('Average Age of Players')
plt.xlabel('Year')
plt.title('Average Age of Players Per Year')
plt.show()
This visualization shows the average age of players per year in the WNBA. There is little fluctuation: the average stays in the mid-to-upper 20s every season, with the highest average age, 27, occurring in 2003.
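The per-year average can also be computed in one pass over the rows instead of rescanning the data for every year. This is a sketch under assumptions: `average_age_by_year` is a hypothetical helper, the sample rows are made up, and rows with an empty age are skipped here rather than counted as 0.

```python
from collections import defaultdict

def average_age_by_year(rows):
    """Return {year: mean age} from (year, age) string pairs.

    Rows with an empty age are skipped so they do not lower the average.
    """
    totals = defaultdict(lambda: [0, 0])  # year -> [sum of ages, row count]
    for year, age in rows:
        if age == "":
            continue
        totals[year][0] += int(age)
        totals[year][1] += 1
    return {year: s / n for year, (s, n) in sorted(totals.items())}

# Made-up (year, age) pairs; the last row has no recorded age.
sample = [("2019", "32"), ("2019", "26"), ("2018", "25"), ("2018", "")]
averages = average_age_by_year(sample)
print(averages)
```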
In conclusion, I was able to answer all three of my questions. For the first question, I passed PERS to the mean() function: mean(PERS) returned 11.900386299253112, the average Player Efficiency Rating in the WNBA from 1997-2019. For the second question, I did the same with the ages: mean(ages) returned 26.29513262941025. For the last question, max(PERS) returned 78.9. As an extension of my second question, I also found the average age of players per year in the WNBA from 1997-2019. The graph above shows that the average age always hovers around 26. Overall, my analysis returned some pretty interesting statistics for WNBA players from 1997-2019.
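For reference, the summary calls mentioned above can be sketched as follows. The notebook does not show where `mean` was imported from; this assumes the standard library's `statistics.mean`. The sample lists are made-up stand-ins for `PERS` and `ages`.

```python
from statistics import mean

# Made-up values standing in for the PERS and ages lists.
sample_pers = [11.1, 16.7, 11.3]
sample_ages = [32, 26, 25]

avg_per = mean(sample_pers)               # question 1: average PER
avg_age = mean(sample_ages)               # question 2: average age
max_per = max(sample_pers)                # question 3: highest PER
best_index = sample_pers.index(max_per)   # row position of the highest PER

print(avg_per, avg_age, max_per, best_index)
```

Because the lists are built in row order, an index such as `best_index` can be used to look up the matching player name and year for the maximum value.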