Our Blog

Recently I have been very interested in learning some new technologies, especially those that help gather data for use in Tableau. The NBA has a very robust API, and there are a few great blog posts out there that helped get me started: Greg Rada, Daniel Forsyth, and Savvas Tjortjoglou. All material can be found here: https://github.com/charris7/TabNBA

Background & Requirements

So here is what we are going to do:

  • Using Python – scrape all of the shot chart data for every player on the PPG leaders list from the 2014-15 regular season and send that data to a nice little CSV
  • Using the Tableau Extract API & Python – read that CSV and turn it into a Tableau Data Extract
  • If you are here just for the data and only have Tableau Public, I’ll post an Excel file with all of the data at the bottom of the post

Here is what you need to replicate this:

  • An install of Python 2.7.x
  • I downloaded Anaconda – a package that ships with loads of Python goodies and an application named Spyder that we can use to write and test our code
  • https://www.continuum.io/downloads
  • Download the Tableau Extract API and watch this video for help installing and referencing the module with Python

Python, Scraping, & Creating The Extract

Step 1: Import functionality

import requests
import json
import pandas as pd
import goldsberry
import csv
import datetime
import dataextract as tde

The first section of the code is used to bring in all the modules we need to accomplish our goals. I’m not going to attempt to explain them all, as I’m not fully sure of all their capabilities, but I do know that getting familiar with pandas is huge for working with data in Python.

Step 2: Define variables

#NBA seasons to pull data for
seasons = ['2013-14','2014-15','2015-16','2016-17']

#This part grabs all of the unique player identifiers for the current NBA season
#Do not modify this section
players = goldsberry.PlayerList(Season='2016-17')
players = pd.DataFrame(players.players())
players = players['PERSON_ID'].tolist()
players = [str(i) for i in players]

The second section of code is used to define the NBA season(s) and players we are interested in pulling data for. In this example we are grabbing data from the 2013-14 NBA season all the way to the last game played in the 2016-17 season. To make things easy, instead of tying players back to individual seasons, our code will only look at players who are active in the 2016-17 season and try to find shot data in the previous seasons.

Step 3: Create a csv with the necessary headers

#Create a base .csv file to append the shot chart data to
#Create a single row that includes only the column headers
with open('TabNBA.csv', 'w') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)

Humor me if this is the worst solution ever… I’m a novice. I needed a frame to store my result set in and the best way I could think to do so was to create a csv that is simply made up of column headers. Working with the headers from the NBA’s api was a little confusing to me, so this was my way around it. Now that we have a csv with headers, we can append results to it without worry.

Step 4: Create a function to send data to our csv that loops through each individual player and season

#Function that accepts a season and player, and appends all data received from the api to TabNBA.csv that was created in the step above
def shot_charts(season, player):
    shot_chart_url = 'http://stats.nba.com/stats/shotchartdetail?Period=0&VsConference=&LeagueID=00&LastNGames=0&TeamID=0&Position=&PlayerPosition=&Location=&Outcome=&ContextMeasure=FGA&DateFrom=&StartPeriod=&DateTo=&OpponentTeamID=0&ContextFilter=&RangeType=&Season='+season+'&AheadBehind=&PlayerID='+player+'&EndRange=&VsDivision=&PointDiff=&RookieYear=&GameSegment=&Month=0&ClutchTime=&StartRange=&EndPeriod=&SeasonType=Regular+Season&SeasonSegment=&GameID='
    headers = {
        'user-agent': ('Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36'), # noqa: E501
        'Dnt': ('1'),
        'Accept-Encoding': ('gzip, deflate, sdch'),
        'Accept-Language': ('en'),
        'origin': ('http://stats.nba.com')
    response = requests.get(shot_chart_url, headers=headers) 
    data = json.loads(response.text)
    headers = data['resultSets'][0]['headers']
    shot_data = data['resultSets'][0]['rowSet']    
    df = pd.DataFrame(shot_data, columns=headers)
    df['SEASON'] = season
    df['DATE_OF_GAME'] = df['GAME_DATE'].apply(lambda x: pd.to_datetime(str(x), format='%Y%m%d'))
#    g = df.groupby(['PLAYER_ID','SEASON'])
#    g['GAME_ID'].rank(method='dense')
#    df['GAME_NUMBER']=g['GAME_ID'].rank(method='dense')
    with open('TabNBA.csv', 'a') as f:
        df.to_csv(f, header=False)

This is the fun part. If you are coming from a Tableau background.. think of a Python function as a calculated field that gets transformed by a parameter (player). We telling the code to take two inputs named season and player, send those inputs to the URL string, and then request, receive, and organize the data the URL returns.

#Run the function above for all currently active NBA players, for all seasons in the seasons list    
for season in seasons:
    for player in players:
        shot_charts(season, player)

The little bit of code below the function simply says: For each individual player in each individual season.. run the shot_chart function. At the end of the shot_chart function, the organized data gets appended to the .csv we created all the way back in step one. Pretty cool right?

Step 5: Create a Tableau data extract file and fill it up with the data from our .csv

#Create TDE file
tdefile = tde.Extract('TabNBA.tde')

#Read data from the TabNBA.csv that was created in the steps above
csvReader = csv.reader(open('TabNBA.csv','rb'), delimiter=',', quotechar='"')

#Create TDE structure
if tdefile.hasTable('Extract'):
    table = tdefile.openTable('Extract')
    tableDef = table.getTableDefinition()
    tableDef = tde.TableDefinition()
    tableDef.addColumn('SHOT_NUMBER',         tde.Type.INTEGER)
    tableDef.addColumn('GRID_TYPE',           tde.Type.CHAR_STRING)
    tableDef.addColumn('GAME_ID',             tde.Type.INTEGER)
    tableDef.addColumn('GAME_EVENT_ID',       tde.Type.INTEGER)
    tableDef.addColumn('PLAYER_ID',           tde.Type.INTEGER)
    tableDef.addColumn('PLAYER_NAME',         tde.Type.CHAR_STRING)
    tableDef.addColumn('TEAM_ID',             tde.Type.INTEGER)
    tableDef.addColumn('TEAM_NAME',           tde.Type.CHAR_STRING)
    tableDef.addColumn('PERIOD',              tde.Type.INTEGER)
    tableDef.addColumn('MINUTES_REMAINING',   tde.Type.INTEGER)
    tableDef.addColumn('SECONDS_REMAINING',   tde.Type.INTEGER)
    tableDef.addColumn('EVENT_TYPE',          tde.Type.CHAR_STRING)
    tableDef.addColumn('ACTION_TYPE',         tde.Type.CHAR_STRING)
    tableDef.addColumn('SHOT_TYPE',           tde.Type.CHAR_STRING)
    tableDef.addColumn('SHOT_ZONE_BASIC',     tde.Type.CHAR_STRING)
    tableDef.addColumn('SHOT_ZONE_AREA',      tde.Type.CHAR_STRING)
    tableDef.addColumn('SHOT_ZONE_RANGE',     tde.Type.CHAR_STRING)
    tableDef.addColumn('SHOT_DISTANCE',       tde.Type.INTEGER)
    tableDef.addColumn('LOC_X',               tde.Type.INTEGER)
    tableDef.addColumn('LOC_Y',               tde.Type.INTEGER)
    tableDef.addColumn('SHOT_ATTEMPTED_FLAG', tde.Type.INTEGER)
    tableDef.addColumn('SHOT_MADE_FLAG',      tde.Type.INTEGER)
    tableDef.addColumn('GAME_DATE',           tde.Type.INTEGER)    
    tableDef.addColumn('HTM',                 tde.Type.CHAR_STRING)
    tableDef.addColumn('VTM',                 tde.Type.CHAR_STRING)
    tableDef.addColumn('SEASON',              tde.Type.CHAR_STRING)
    tableDef.addColumn('DATE_OF_GAME',        tde.Type.DATE)
    #tableDef.addColumn('GAME_NUMBER',         tde.Type.INTEGER)  
    table = tdefile.addTable('Extract',tableDef)
#Put data into the TDE
newrow = tde.Row(tableDef)
for line in csvReader:    
    newrow.setInteger(0, int(line[0]))
    newrow.setInteger(2, int(line[2]))
    newrow.setInteger(3, int(line[3]))
    newrow.setInteger(4, int(line[4]))
    newrow.setInteger(6, int(line[6]))
    newrow.setInteger(8, int(line[8]))
    newrow.setInteger(9, int(line[9]))
    newrow.setInteger(10, int(line[10]))
    newrow.setInteger(17, int(line[17]))
    newrow.setInteger(18, int(line[18]))
    newrow.setInteger(19, int(line[19]))
    newrow.setInteger(20, int(line[20]))
    newrow.setInteger(21, int(line[21]))
    newrow.setInteger(22, int(line[22]))
    date = datetime.datetime.strptime(line[26], "%Y-%m-%d")
    newrow.setDate(26, date.year, date.month, date.day)
    #newrow.setInteger(27, int(line[27]))


The Extract API tutorial really did all the work here. Using the example in Tableau’s video, I was able to see what needed to be accomplished and convert it to my own use case. One of the reasons I love Tableau is there free content. Always easy to understand and always helpful.

You can see in the code that we are using Python to read the results from our csv. We are then using the extract api to create our tde and define all of its metadata. Pretty simple stuff.. you just need to know what columns you want in the tde and what data types they are. Tableau will do all the rest of the work as usual by separating them into dimensions and measures when you import the data.

Comments ( 2 )

  • themarkscard says:

    Wow, sorry I know this is way old news at this point, but it seems to go in and out every time I try it. Not working anymore unfortunately.. and I don’t know how to get it back.

  • CJ says:

    does shotchartdetail still work for you? i find it broken this season, no matter what

Leave A Comment

Your email address will not be published. Required fields are marked *