Beginner's guide to geospatial data plotting

Recently, in my data science program, we were given a project where we needed to create a model to predict house prices. The training data set included property features such as square footage, bathroom count, and construction grade. The data I was most excited to analyze was the location data. We were given columns containing latitude and longitude coordinates, and the first thing I wanted to do was create a heat map showing the distribution of property prices.

Let’s take a peek at our data:

import pandas as pd
import numpy as np
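The housing data itself isn't reproduced here, so the sketch below uses a tiny made-up DataFrame with assumed column names (`lat`, `long`, `price`). It shows one possible first step toward a price heat map: bin the coordinates into grid cells and average the price in each cell.

```python
import pandas as pd

# Hypothetical sample; the real data set's column names may differ.
homes = pd.DataFrame({
    "lat":   [47.51, 47.52, 47.61, 47.62, 47.61],
    "long":  [-122.26, -122.25, -122.33, -122.34, -122.26],
    "price": [200_000, 300_000, 700_000, 900_000, 500_000],
})

# Split each coordinate range into 2 equal-width bins (use more for real data).
homes["lat_bin"] = pd.cut(homes["lat"], bins=2, labels=False)
homes["long_bin"] = pd.cut(homes["long"], bins=2, labels=False)

# Average price per grid cell: rows are latitude bins, columns are longitude bins.
# Cells with no homes come back as NaN.
price_grid = homes.pivot_table(index="lat_bin", columns="long_bin",
                               values="price", aggfunc="mean")
print(price_grid)
```

A grid like this can then be handed to a plotting library to draw the actual heat map.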

An Excel favorite in Python

Almost everyone who has worked with any type of data has likely used Microsoft Excel’s PivotTable function. It’s a quick, user-friendly tool that lets users calculate, aggregate, and summarize data sets, enabling further analysis of patterns and trends. Excel provides an intuitive GUI that allows analysts to simply click, drag, and drop data and apply whichever aggregation function they choose. It’s a fantastic tool that also helps when building Excel visualizations for business presentations.

Python’s Pandas library — which specializes in tabular data, similar to Excel — also has a .pivot_table() function that works in the same…
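As a rough illustration of the comparison, here is a minimal sketch of `pivot_table` on a made-up sales table. The DataFrame, its column names, and the numbers are invented for this example and are not from the original post.

```python
import pandas as pd

# Hypothetical sales records, invented for illustration.
sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "West"],
    "product": ["A", "B", "A", "A", "B"],
    "revenue": [100, 150, 200, 50, 300],
})

# Rows = region, columns = product, cells = summed revenue,
# much like dragging fields into an Excel PivotTable.
summary = sales.pivot_table(index="region", columns="product",
                            values="revenue", aggfunc="sum")
print(summary)
```

Swapping `aggfunc="sum"` for `"mean"` or `"count"` plays the same role as changing the summary function in Excel's value field settings.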

When Pandas outshines Excel

A year ago, if you had asked me whether there was a better way to analyze tabular data than Excel, I would have likely shaken my head ‘no.’ I had spent the first 8 years of my career building up my Excel skills from beginner to advanced while working in various corporate finance roles. An accounting department is a prime example of where Microsoft Excel or Google Sheets likely outshines Python’s Pandas package, as spreadsheets are friendlier to non-technical users and make it easy to build reports such as account reconciliations. …

Classification of Future All-NBA Stars

If you haven’t already, please read Part I & Part II.


  • Business Question: can we predict whether NBA players will be selected to All-NBA teams in seasons 4–6, based on their first 3 seasons in the league?
  • Dataset: after web scraping and data formatting, we were left with ~1,300 qualified players and over 100 columns of statistical features.
  • Metric: Precision, in an effort to minimize false positives.
  • EDA: only 7% of qualified players hit our target. …

Classification of Future All-NBA Stars

For the first part of this blog, please click here.

Jumping right back in: we left off with our cleaned data set, made up of statistics from our qualified NBA players’ first three seasons. This left us with around 1,300 rows of players and 100+ columns of basketball stats. As a reminder, our target variable is a player being selected to an All-NBA team in seasons four through six. Let’s look deeper into our data now.

Exploratory Data Analysis

First let’s check on what our class distribution looks like:
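The original output isn't reproduced here, so as a stand-in, this is a minimal sketch of one way to check class balance with pandas. The `all_nba` column name and the 100-row sample are assumptions, chosen only to mirror the roughly 7% positive rate mentioned in this series.

```python
import pandas as pd

# Hypothetical target column: 1 = selected to an All-NBA team, 0 = not.
players = pd.DataFrame({"all_nba": [0] * 93 + [1] * 7})

# normalize=True returns each class's share instead of raw counts.
class_share = players["all_nba"].value_counts(normalize=True)
print(class_share)
```

A split this lopsided is worth knowing about early, since a model that always predicts "not selected" would still look accurate.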

An Intro to Model Tuning & Evaluation

Machine Learning Process

Running a grid search over hyperparameters lets data scientists optimize how well their models learn from the data. Before we dive into what hyperparameters are and how grid search applies to them, let’s figure out where they fit into the overall machine learning process. The flow chart below illustrates where both hyperparameters and hyperparameter tuning (i.e. grid search) take place during that process.
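To make the idea concrete, here is a hand-rolled sketch of grid search using only the standard library. The toy data, the small nearest-neighbor classifier, and the names `k` and `min_share` are all made up for illustration. In practice, scikit-learn's `GridSearchCV` automates the same loop and adds cross-validation.

```python
from itertools import product

# Toy 1-D data as (feature, label) pairs; invented for illustration.
train = [(1.0, 0), (2.0, 0), (3.0, 0), (3.5, 1), (6.0, 1), (7.0, 1), (8.0, 1)]
valid = [(2.5, 0), (3.4, 0), (6.5, 1), (7.5, 1)]

def knn_predict(x, k, min_share):
    """Predict 1 when more than min_share of the k nearest neighbors are 1."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    share = sum(label for _, label in nearest) / k
    return 1 if share > min_share else 0

def validation_accuracy(k, min_share):
    """Fraction of validation points classified correctly."""
    hits = sum(knn_predict(x, k, min_share) == y for x, y in valid)
    return hits / len(valid)

def grid_search(param_grid, score_fn):
    """Score every combination of values in param_grid; return the best."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:  # ties keep the earlier combination
            best_params, best_score = params, score
    return best_params, best_score

best_params, best_score = grid_search(
    {"k": [1, 3, 5], "min_share": [0.5, 0.7]}, validation_accuracy
)
print(best_params, best_score)
```

The grid here has 3 × 2 = 6 combinations. The number of combinations multiplies with every hyperparameter added, which is why grid search gets expensive quickly.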

Parameter vs Hyperparameter; Internal vs External

Like many terms and acronyms you come across when entering the world of machine learning, the difference between ‘parameters’ and ‘hyperparameters’ can be difficult to grasp. Let’s take a quick moment to break these two terms down.
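One way to see the internal vs external split is a tiny gradient-descent line fit. This is a sketch on made-up data, not code from the original post: `slope` and `intercept` are parameters the model learns internally, while `learning_rate` and `n_epochs` are hyperparameters we set from the outside before training starts.

```python
# Toy data generated from y = 2x + 1 (invented for illustration).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

def fit_line(xs, ys, learning_rate, n_epochs):
    """Fit y = slope * x + intercept by gradient descent on mean squared error."""
    slope, intercept = 0.0, 0.0  # parameters: learned from the data
    n = len(xs)
    for _ in range(n_epochs):
        errors = [slope * x + intercept - y for x, y in zip(xs, ys)]
        grad_slope = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_intercept = 2 / n * sum(errors)
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept

# Hyperparameters: chosen by us before training, never updated by it.
slope, intercept = fit_line(xs, ys, learning_rate=0.05, n_epochs=2000)
print(round(slope, 2), round(intercept, 2))
```

Changing `learning_rate` or `n_epochs` changes how well (or whether) the fit converges, which is exactly what hyperparameter tuning explores.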

An Introduction to Evaluation Metrics


A big part of assessing the performance of classification models is analyzing the evaluation metrics. In contrast to linear regression, where error is measured by how far estimates are from actuals, in classification each prediction is either correct or incorrect. Because of this distinction, it’s critical to look closely at these evaluation metrics when evaluating your model.

In this post, we’ll touch upon the 4 major evaluation metrics but mainly focus on Precision and Recall. We’ll dive into what these metrics calculate, how they can influence other important metrics and the circumstances when one could be more important than the other.
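As a quick sketch of how these metrics are calculated, the function below derives them from the four confusion-matrix counts. It assumes the four metrics are accuracy, precision, recall, and F1 (the post does not list them here), and the labels are made up for illustration.

```python
def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical labels: 3 actual positives, and a cautious model that flags only one.
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this example the model never raises a false alarm, so precision is perfect, yet it misses two of the three real positives, so recall is low. That trade-off is the reason one metric can matter more than the other depending on the problem.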


A Beginner's Guide to Python Coding

Prior to starting at Flatiron School, I was buried under balance sheets and P&Ls, working as an accountant. Like my accounting brethren, I spent the majority of my time either in an ERP system, like SAP or NetSuite, or hammering away at ‘V-lookups’ and various other formulas in Excel.

With accounting, there are only so many ways things can be done. Publicly traded companies must follow guidelines stated by GAAP, the SEC and other governing bodies for regulatory reasons. A lot of the work can be relatively procedural due to this. So, you might understand why I was so overwhelmed…

Ryan Lewis

Data Science | Sports Analytics | Data Visualization
