Recently, in my data science program, we were given a project where we needed to create a model to predict house prices. The training data set we received included property data such as square footage, bathroom count, and construction grade. The data I was most excited to analyze was the location data. We were given columns containing latitude and longitude coordinates, and the first thing I wanted to do was create a heat map showing the distribution of property prices.
Let’s take a peek at our data:
import pandas as pd
import numpy as np
For the first part of this blog, please click here.
Jumping right back into this: we last left off with our cleaned data set, made up of statistics from our qualified NBA players’ first three seasons. This left us with around 1,300 rows of players and 100+ columns of basketball stats. As a reminder, our target variable is a player being selected to an All-NBA team in seasons four through six. Let’s begin looking deeper into our data now.
First let’s check on what our class distribution looks like:
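As a rough illustration of what a class distribution check involves, here is a minimal sketch. The labels below are made up for the example (they are not the real NBA data), and the helper name class_distribution is hypothetical.

```python
from collections import Counter

# Made-up labels for illustration only (NOT the real data set):
# 1 = selected to an All-NBA team in seasons four through six, 0 = not selected.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]


def class_distribution(labels):
    """Return {class: (row count, share of all rows)} for a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: (n, n / total) for cls, n in sorted(counts.items())}


distribution = class_distribution(labels)
for cls, (n, share) in distribution.items():
    print(f"class {cls}: {n} rows ({share:.1%})")
```

A heavily lopsided split like the one in this toy list is worth knowing about early, because it affects which evaluation metrics are meaningful later on.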
What is the NBA Leap and how do we define it? I think Zach Lowe, NBA Analyst for ESPN, puts it best:
“A player’s identity typically begins to crystallize in his third or fourth NBA season. Young players have learned the ropes, and veterans have departed or aged, vacating heavy-duty roles that need filling. Everyone involved — players, agents, executives — looks to see what emerges as a player nears the expiration of his rookie contract.”
More or less, the NBA Leap is when a young NBA player transitions from a productive teammate into a bona fide NBA star.
Utilizing Grid-Search on hyperparameters allows data scientists to optimize how well their models learn from the data. Before we dive into what hyperparameters are and how grid-search applies to them, let’s figure out where they fit into the overall machine learning process. The flowchart below illustrates where both hyperparameters and hyperparameter tuning (i.e., Grid-Search) take place during the machine learning process.
Like many terms and acronyms you come across when entering the world of machine learning, understanding the difference between ‘parameters’ and ‘hyperparameters’ can be difficult. Let’s take a quick moment to break these two terms down.
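To make the distinction concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: the toy data is made up, the tiny ridge-style model is a stand-in for whatever model you are tuning, and a single hold-out validation split is used where a real grid search would typically use cross-validation. The weight w and intercept b are parameters (learned from the data); alpha and fit_intercept are hyperparameters (chosen before fitting).

```python
from itertools import product

# Made-up toy data for illustration: (x, y) pairs.
train = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]
valid = [(5.0, 10.1), (6.0, 11.9)]


def fit(data, alpha, fit_intercept):
    """Fit y = w * x + b with an L2 penalty on w.

    w and b are PARAMETERS: they are learned from the data.
    alpha and fit_intercept are HYPERPARAMETERS: we set them up front.
    """
    n = len(data)
    mean_x = sum(x for x, _ in data) / n if fit_intercept else 0.0
    mean_y = sum(y for _, y in data) / n if fit_intercept else 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    w = sxy / (sxx + alpha)
    b = mean_y - w * mean_x
    return w, b


def mse(data, w, b):
    """Mean squared error of the fitted line on a data set."""
    return sum((y - (w * x + b)) ** 2 for x, y in data) / len(data)


# The grid: every combination of these hyperparameter values gets tried.
grid = {"alpha": [0.0, 0.1, 1.0, 10.0], "fit_intercept": [True, False]}

results = []
for alpha, fit_intercept in product(grid["alpha"], grid["fit_intercept"]):
    w, b = fit(train, alpha, fit_intercept)        # learn the parameters
    score = mse(valid, w, b)                       # score on held-out data
    results.append((score, alpha, fit_intercept))

best_score, best_alpha, best_intercept = min(results)
print(f"best: alpha={best_alpha}, fit_intercept={best_intercept}, "
      f"validation MSE={best_score:.4f}")
```

The loop is the whole idea behind grid search: fit once per combination, score each fit on data the model has not seen, and keep the combination with the best score.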
A big part of assessing the performance of logistic regression models is analyzing the evaluation metrics. In contrast to linear regression, where error is determined by how far estimates are from actuals, with classification modeling each prediction is either correct or incorrect. Due to this distinction, when evaluating your model, it’s critical to look into these evaluation metrics.
In this post, we’ll touch upon the 4 major evaluation metrics but mainly focus on Precision and Recall. We’ll dive into what these metrics calculate, how they can influence other important metrics and the circumstances when one could be more important than the other.
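As a quick sketch of what Precision and Recall calculate, the snippet below computes both from a small set of made-up labels and predictions. The data and function names are hypothetical. F1 and accuracy are included on the assumption that they are among the other metrics in question; F1 in particular shows how Precision and Recall feed into another metric.

```python
# Made-up true labels and model predictions for illustration (1 = positive).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]


def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def safe_divide(numerator, denominator):
    """Return 0.0 instead of raising when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


tp, fp, fn, tn = confusion_counts(y_true, y_pred)

# Precision: of everything predicted positive, how much was actually positive?
precision = safe_divide(tp, tp + fp)
# Recall: of everything actually positive, how much did the model find?
recall = safe_divide(tp, tp + fn)
# F1: harmonic mean of precision and recall.
f1 = safe_divide(2 * precision * recall, precision + recall)
# Accuracy: share of all predictions that were correct.
accuracy = safe_divide(tp + tn, tp + fp + fn + tn)

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"f1={f1:.3f} accuracy={accuracy:.3f}")
```

In this toy example the model is right 70% of the time overall yet finds only half of the actual positives, which is the kind of gap that accuracy alone can hide.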
Prior to starting at Flatiron School, I was buried under balance sheets and P&Ls working as an accountant. Like my other accounting brethren, I spent the majority of my time either in an ERP system, like SAP or NetSuite, or hammering away at VLOOKUPs and various other formulas in Excel.
With accounting, there are only so many ways things can be done. Publicly traded companies must follow guidelines stated by GAAP, the SEC and other governing bodies for regulatory reasons. A lot of the work can be relatively procedural due to this. So, you might understand why I was so overwhelmed…
Data Science | Sports Analytics | Data Visualization