Recently, in my data science program, we were given a project where we needed to create a model to predict house prices. The training data set we received included property data such as square footage, bathroom count, and construction grade. The data I was most excited to analyze first was the location data: we were given columns containing latitude and longitude coordinates, and the first thing I wanted to do was create a heat map showing the distribution of property prices.
Let’s take a peek at our data:
import pandas as pd
import numpy as np
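One way to turn latitude and longitude into a heat map is to bin both coordinates into a grid and average the prices that fall in each cell. The sketch below does this with `pd.cut` and `pivot_table`. The column names `lat`, `long`, and `price`, the helper `price_heat_grid`, and the tiny data set are hypothetical stand-ins, not the project's actual data.

```python
import pandas as pd


def price_heat_grid(df: pd.DataFrame, n_bins: int = 10) -> pd.DataFrame:
    """Mean price per (latitude bin, longitude bin) cell.

    Assumes hypothetical column names 'lat', 'long' and 'price'.
    """
    binned = df.assign(
        # pd.cut splits each coordinate range into n_bins equal-width bins;
        # labels=False returns the integer bin index instead of an interval
        lat_bin=pd.cut(df["lat"], bins=n_bins, labels=False),
        long_bin=pd.cut(df["long"], bins=n_bins, labels=False),
    )
    # Rows = latitude bins, columns = longitude bins, cells = mean price
    grid = binned.pivot_table(
        index="lat_bin", columns="long_bin", values="price", aggfunc="mean"
    )
    # Put higher latitudes (north) at the top, as on a map
    return grid.sort_index(ascending=False)


# Tiny synthetic example; real data would come from the project's data file
homes = pd.DataFrame({
    "lat": [47.0, 47.0, 47.0, 48.0, 48.0],
    "long": [-122.0, -122.0, -121.0, -122.0, -121.0],
    "price": [100_000, 300_000, 200_000, 300_000, 400_000],
})
grid = price_heat_grid(homes, n_bins=2)
print(grid)
```

The resulting grid could then be drawn with a plotting function such as seaborn's `heatmap()` or matplotlib's `imshow()`; a finer `n_bins` trades smoother detail for more empty cells.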
Almost everyone who has worked with any type of data has likely used Microsoft Excel’s PivotTable function. It’s a quick, user-friendly tool that lets users calculate, aggregate, and summarize data sets, enabling further analysis of patterns and trends. Excel provides an intuitive GUI that allows analysts to simply click, drag, and drop data and apply whichever aggregation function they choose. It’s a fantastic tool and a great aid when building Excel visualizations for business presentations.
Python’s Pandas library — which specializes in tabular data, similar to Excel — also has a .pivot_table() function that works in the same…
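As a rough illustration of that parallel, the sketch below builds a small, made-up sales table and pivots it the way you might in Excel: regions as Rows, quarters as Columns, and summed revenue as Values. The column names and figures are hypothetical.

```python
import pandas as pd

# Hypothetical sales data standing in for a worksheet
sales = pd.DataFrame({
    "region": ["East", "East", "West", "West", "West"],
    "quarter": ["Q1", "Q2", "Q1", "Q1", "Q2"],
    "revenue": [100, 150, 200, 50, 300],
})

# Like dragging 'region' to Rows, 'quarter' to Columns and 'revenue' to Values (Sum)
summary = sales.pivot_table(
    index="region",
    columns="quarter",
    values="revenue",
    aggfunc="sum",
    margins=True,        # adds an 'All' row and column, like Excel's Grand Totals
    margins_name="All",
)
print(summary)
```

Swapping `aggfunc="sum"` for `"mean"` or `"count"` is the code equivalent of changing the Value Field Settings in Excel.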
A year ago, if you had asked me whether there was a better way to analyze tabular data than Excel, I would likely have shaken my head ‘no.’ I had spent the first 8 years of my career building up my Excel skills from beginner to advanced while working in various corporate finance roles. An accounting department is a prime example of where Microsoft Excel or Google Sheets likely outshines Python’s Pandas package, as spreadsheets are more user-friendly to non-techies and make it easy to build reports such as account reconciliations. …
For the first part of this blog, see the previous post.
Jumping right back in: we left off with our cleaned data set, made up of statistics from our qualified NBA players’ first three seasons. This left us with around 1,300 rows of players and 100+ columns of basketball stats. As a reminder, our target variable is whether a player is selected to an All-NBA team in seasons four through six. Let’s start looking deeper into our data.
First, let’s check what our class distribution looks like:
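A quick way to check this in pandas is `value_counts()`, which gives raw counts, and `value_counts(normalize=True)`, which gives each class’s share. The sketch below assumes a hypothetical binary target column named `all_nba` (1 = selected to an All-NBA team in seasons four through six) and uses a tiny made-up frame in place of the real data set.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned data set; 'all_nba' is an assumed
# name for the binary target (1 = All-NBA selection in seasons 4-6)
players = pd.DataFrame({"all_nba": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]})

counts = players["all_nba"].value_counts()                 # rows per class
shares = players["all_nba"].value_counts(normalize=True)   # fraction per class
print(counts)
print(shares)
```

If one class dominates, accuracy alone can look deceptively good, so the share of the minority class is worth noting before choosing evaluation metrics.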
What is the NBA Leap and how do we define it? I think Zach Lowe, NBA Analyst for ESPN, puts it best:
“A player’s identity typically begins to crystallize in his third or fourth NBA season. Young players have learned the ropes, and veterans have departed or aged, vacating heavy-duty roles that need filling. Everyone involved — players, agents, executives — looks to see what emerges as a player nears the expiration of his rookie contract.”
More or less, the NBA Leap is when a young NBA player transitions from a productive teammate into a bona fide NBA star.
Using grid search to tune hyperparameters allows data scientists to optimize how well their models learn from the data. Before we dive into what hyperparameters are and how grid search applies to them, let’s figure out where they fit into the overall machine learning process. The flowchart below illustrates where both hyperparameters and hyperparameter tuning (i.e., grid search) take place during the machine learning process.
Like many terms and acronyms you come across when entering the world of machine learning, the difference between ‘parameters’ and ‘hyperparameters’ can be difficult to grasp. Let’s take a quick moment to break these two terms down.
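To make the distinction concrete, here is a minimal sketch in plain NumPy, swapped in for a library tool such as scikit-learn’s `GridSearchCV`, which this post doesn’t show. The polynomial degree and the ridge penalty `alpha` are hyperparameters we pick before training; the weights returned by `fit_ridge` are parameters the model learns from the data. The grid search tries every `(degree, alpha)` combination and keeps the one with the lowest validation error. The helper names, the synthetic data, and the grid values are all hypothetical, and a single hold-out split is used here rather than cross-validation.

```python
import itertools

import numpy as np


def fit_ridge(X, y, alpha):
    """Learn the *parameters* (weights) for a given *hyperparameter* alpha."""
    n_features = X.shape[1]
    # Closed-form ridge solution: w = (X^T X + alpha * I)^-1 X^T y
    # (for simplicity this also penalizes the intercept column)
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)


def poly_features(x, degree):
    """Expand a 1-D input into columns [1, x, x^2, ..., x^degree]."""
    return np.vander(x, degree + 1, increasing=True)


def grid_search(x_train, y_train, x_val, y_val, param_grid):
    """Try every combination in param_grid; keep the lowest validation MSE."""
    best = None
    names = list(param_grid)
    for values in itertools.product(*param_grid.values()):
        params = dict(zip(names, values))
        weights = fit_ridge(
            poly_features(x_train, params["degree"]), y_train, params["alpha"]
        )
        preds = poly_features(x_val, params["degree"]) @ weights
        mse = float(np.mean((preds - y_val) ** 2))
        if best is None or mse < best[1]:
            best = (params, mse)
    return best


rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 200)
y = 1 + 2 * x - 3 * x ** 2 + rng.normal(0, 0.1, 200)  # synthetic quadratic data
grid = {"degree": [1, 2, 3], "alpha": [0.0, 0.1, 1.0, 10.0]}
best_params, best_mse = grid_search(x[:150], y[:150], x[150:], y[150:], grid)
print(best_params, best_mse)
```

Because the data are quadratic, a straight line (degree 1) underfits badly, so the search should settle on degree 2 or 3; the exact `alpha` it picks depends on the random split.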
A big part of assessing the performance of logistic regression models is analyzing their evaluation metrics. In contrast to linear regression, where error is measured by how far estimates are from actual values, in classification modeling each prediction is either correct or incorrect. Because of this distinction, it’s critical to look closely at these evaluation metrics when evaluating your model.
In this post, we’ll touch upon the 4 major evaluation metrics but mainly focus on Precision and Recall. We’ll dive into what these metrics calculate, how they can influence other important metrics, and the circumstances when one could be more important than the other.
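As a reference point, the sketch below computes four common classification metrics from scratch, assuming the four in question are accuracy, precision, recall, and F1 score. It counts true positives, false positives, false negatives, and true negatives, then derives each metric from those counts. The helper name and the example labels are hypothetical; in practice, `sklearn.metrics` offers `accuracy_score`, `precision_score`, `recall_score`, and `f1_score` for the same job.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from two label sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Precision: of everything predicted positive, how much really was positive?
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: of everything actually positive, how much did we catch?
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1: harmonic mean of precision and recall
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Here the model has 2 true positives, 1 false positive, 1 false negative, and 4 true negatives, so accuracy is 0.75 while precision and recall are both 2/3.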
Prior to starting at Flatiron School, I was buried under balance sheets and P&Ls working as an accountant. Like my fellow accounting brethren, I spent the majority of my time either in an ERP system, like SAP or NetSuite, or hammering away at VLOOKUPs and various other formulas in Excel.
With accounting, there are only so many ways things can be done. Publicly traded companies must follow guidelines set by GAAP, the SEC, and other governing bodies for regulatory reasons, so a lot of the work is relatively procedural. So, you might understand why I was so overwhelmed…