Why Streamlit and a few tips on deploying a data dashboard app from Python.

Photo by Mark Cruz on Unsplash

I recently built and deployed a data application from my Mac using Python and did not write one line of HTML, CSS, or JavaScript — all made possible with a nifty package called Streamlit.

I recently deployed an early alpha version of a data dashboard with Streamlit. Although the app is in its infancy, I’m happy to have it in some kind of production state, and I am excited to keep improving it. Based on my experience, this article covers some of the biggest gotchas, along with tips for breaking through a few final issues that you may encounter with a Streamlit deployment. …

How to transform a Pandas DataFrame to JSON with flare-like hierarchy to produce D3 Sunburst visualizations.

Photo by Jeremy Bishop on Unsplash

Data is king but colors are cool!

If your data and insights are worth telling, the right colors and design can make or break the crucial connection to your audiences. As a result, whether you are trying to win a competition or just trying to turn in a class assignment, chances are that colors can help make your case.

As I learned from personal experience, tricky data transformations on the back end often put the desired visualization out of reach. For example, despite the various tools in Pandas, I was surprised to discover that there is no clear-cut way to transform a DataFrame into the exact JSON format that D3 requires. When short on time, instead of mucking around with code, a seemingly easy option is to click, copy, and paste data, but this type of manual work is error-prone and does not scale. …
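To make the target format concrete, here is a minimal sketch of one way to nest a DataFrame into a flare-like hierarchy. It is not necessarily the approach the full article takes. The function name `to_flare`, the column names, and the sample data are all hypothetical, and the leaf key is assumed to be `value` (some D3 examples use `size` instead).

```python
import json

import pandas as pd


def to_flare(df, levels, value_col, root_name="flare"):
    """Nest rows of df into {"name": ..., "children": [...]} dictionaries."""
    # Aggregate first so that every path through the levels is unique.
    grouped = df.groupby(levels, as_index=False)[value_col].sum()
    root = {"name": root_name, "children": []}
    for record in grouped.to_dict("records"):
        node = root
        for level in levels:
            name = str(record[level])
            children = node.setdefault("children", [])
            # Reuse the child with this name if it exists, else create it.
            child = next((c for c in children if c["name"] == name), None)
            if child is None:
                child = {"name": name}
                children.append(child)
            node = child
        # Only the deepest node (the leaf) carries a value.
        node["value"] = float(record[value_col])
    return root


# Hypothetical sample data: counts by region and city.
sample = pd.DataFrame(
    {
        "region": ["West", "West", "East"],
        "city": ["Seattle", "Portland", "Boston"],
        "count": [3, 2, 5],
    }
)
flare = to_flare(sample, levels=["region", "city"], value_col="count")
flare_json = json.dumps(flare, indent=2)
```

Grouping before nesting means duplicate paths are summed rather than silently overwritten, which seems like the safer default for counts.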

The bool on when NA is not False, False is False, and NA is not Null.

Photo by Nerfee Mirandilla on Unsplash

When something is nothing, and nothing is something… For boolean data in Pandas, there is a crucial difference between NaN, Null, NA, and bools. Here is a brief on when and how to use them.

Task: Clean a Pandas DataFrame comprising boolean (true/false) values to optimize memory. A constraint is to retain all null values as nulls, i.e. don’t turn null values to False because that is a meaningful change.

Action: Explicitly transform column dtypes, i.e. use float32 instead of float64 to conserve memory and bool instead of object.

Problem: When selected columns are transformed to bool, the rows evaluate to either all True or all False, and the result is a bad headache. …
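The problem is easy to reproduce. The sketch below uses a hypothetical three-row column and shows one common way around it, the nullable "boolean" extension dtype in Pandas. This is offered as an illustration under those assumptions, not as the exact fix described in the full article.

```python
import numpy as np
import pandas as pd

# Hypothetical column of true/false flags with one missing value (object dtype).
flags = pd.Series([True, False, np.nan], dtype=object)

# Plain bool cannot store a missing value, so NaN is coerced to True.
naive = flags.astype(bool)

# The nullable "boolean" extension dtype keeps the missing value as <NA>.
nullable = flags.astype("boolean")

# Floats can be downcast to save memory while NaN is retained.
scores = pd.Series([1.5, np.nan, 3.0]).astype("float32")

print(naive.tolist())     # [True, False, True]
print(nullable.tolist())  # [True, False, <NA>]
```

Note the capitalization: "boolean" is the nullable Pandas dtype, while bool is the NumPy dtype that has no slot for missing data.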

How to visualize area plots and trend lines over grouped time periods with interactive Plotly graph objects.

Photo by Hari Nandakumar on Unsplash

In Brief: Create time series plots with regression trend lines by leveraging Pandas Groupby(), for-loops, and Plotly Scatter Graph Objects in combination with Plotly Express Trend Lines.


  • Data: Counts of things or different groups of things by time.
  • Objective: Visualize a time series of data, by subgroup, on a daily, monthly, or yearly basis with a trend line.
  • Issues: Confusion over syntax for Plotly Express and Plotly Graph Objects and combining standard line charts with regression lines.
  • Environment: Python, Plotly, and Pandas
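The data preparation behind such a chart can be sketched without any plotting code. The example below is an assumption-laden illustration: it groups events by subgroup, counts them per month, and fits a straight-line trend with NumPy rather than with Plotly Express. The function name `monthly_counts_with_trend` and the column names are hypothetical, and the plotting step itself is left out.

```python
import numpy as np
import pandas as pd


def monthly_counts_with_trend(df, date_col, group_col):
    """Return {group: DataFrame of monthly counts plus a linear trend}."""
    results = {}
    for name, group in df.groupby(group_col):
        # Count events per calendar month ("MS" = month start).
        counts = group.set_index(date_col).resample("MS").size()
        # Fit count = slope * month_index + intercept by least squares.
        # Assumes each group spans at least two months.
        x = np.arange(len(counts))
        slope, intercept = np.polyfit(x, counts.to_numpy(), deg=1)
        results[name] = pd.DataFrame(
            {"count": counts, "trend": slope * x + intercept}
        )
    return results


# Hypothetical events: one row per thing that happened, tagged by group.
events = pd.DataFrame(
    {
        "date": pd.to_datetime(
            [
                "2020-01-05", "2020-02-10", "2020-02-20",
                "2020-03-01", "2020-03-15", "2020-03-30",
                "2020-01-01", "2020-02-01",
            ]
        ),
        "group": ["a", "a", "a", "a", "a", "a", "b", "b"],
    }
)
series_by_group = monthly_counts_with_trend(events, "date", "group")
```

Each resulting frame holds one column for the observed counts and one for the fitted line, so a loop over the dictionary could add two traces per group to a figure.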

Pitfalls to avoid when deploying a data app from Python with Streamlit.

Photo by Science in HD on Unsplash

Deploy Code — Crash App — Learn Lessons

What happens when you deploy a data app without coding for memory optimization? In my case, at least, the app crashed and I spent days painfully refactoring code. If you are luckier or smarter (or both), then you have nothing to worry about. Otherwise, consider lessons from my mistakes and some helpful resources to avoid your own special headaches.

How to avoid my optimization mistakes to deploy your app for the win.

Coding for the Web Vs. Coding for Me

I have always acknowledged the importance of writing optimized code but I did not fully appreciate what it meant until deploying a Web app. On my laptop, even the most poorly written code will likely run, albeit slowly. However, the consequences on the Web are far more severe — memory leaks and inefficient code can cripple the experience. …

Have wacky dates in your data? Instead of dropping or filtering them, impute or substitute them with a reasonable best guess.

Photo by Ramón Salinero on Unsplash

The easy choice is to drop missing or erroneous data, but at what cost?

Dealing with missing, null, or erroneous values is one of the most painful and common exercises that we encounter in data science and in machine learning. In some cases, it is acceptable or even preferred to drop such records, because algorithms are fragile and can seize up on missing values. However, in other cases when the inclusion of every record is important, what then? The easy choice is to drop missing or erroneous data, but at what cost?

What if a single record represents a real person and dropping this record means a person’s story is not counted? …
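As a rough illustration of keeping every record, the sketch below replaces unparseable or implausible dates with the middle valid date and flags which rows were touched. This is one simple strategy chosen for the example, not necessarily the method in the full article. The function name `impute_dates`, the plausible date window, and the sample values are all hypothetical.

```python
import pandas as pd


def impute_dates(raw, earliest, latest):
    """Fill bad dates with the middle valid date; return (filled, imputed_flag)."""
    # Anything that cannot be parsed becomes NaT instead of raising an error.
    dates = pd.to_datetime(raw, errors="coerce")
    # Treat dates outside the plausible window as erroneous as well.
    in_range = dates.between(pd.Timestamp(earliest), pd.Timestamp(latest))
    dates = dates.where(in_range)
    # Best guess: the middle element of the sorted valid dates.
    valid = dates.dropna().sort_values()
    best_guess = valid.iloc[len(valid) // 2]
    imputed = dates.isna()
    return dates.fillna(best_guess), imputed


# Hypothetical raw column with a wacky year, junk text, and a null.
raw = pd.Series(
    ["2019-05-01", "2019-05-03", "1900-01-01", "not a date", None, "2019-05-02"]
)
filled, imputed = impute_dates(raw, earliest="2000-01-01", latest="2030-12-31")
```

Returning the flag alongside the filled column keeps the substitution transparent, so downstream analysis can still separate observed dates from guesses.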

Getting Started

A beginner’s guide to understanding the M-Step of Expectation-Maximization in Gaussian Mixture Models.

Photo by Element5 Digital on Unsplash

I like the kind of math that can be explained to me like I’m five years old.

In this Article

My take on explaining, like I’m five years old, the math behind a key component of Gaussian Mixture Models (GMM) known as Expectation-Maximization (EM) and how to translate the concepts to Python. The focus of this story is on the M of EM, or M-Step.

Note: This is not a comprehensive explanation of the end-to-end GMM algorithm. For a deeper dive, check out this article from Towards Data Science, another one on GMM, documentation from scikit-learn, or Wikipedia.

Source: Based on my notes from studying machine learning; the source materials are derived from and credited to this university class. …
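For orientation, the M-Step can be sketched in a few lines of NumPy. The code below uses the standard textbook update equations (mixing weights, means, and covariances computed from the responsibilities produced by the E-Step) and is not taken from the full article. The function name `m_step` and the toy data are hypothetical.

```python
import numpy as np


def m_step(X, resp):
    """Update GMM parameters from data X (N x D) and responsibilities (N x K)."""
    n_samples, n_features = X.shape
    n_components = resp.shape[1]
    # Effective number of points assigned to each component.
    nk = resp.sum(axis=0)
    # Mixing weights: each component's share of the data.
    weights = nk / n_samples
    # Means: responsibility-weighted average of the points.
    means = (resp.T @ X) / nk[:, None]
    # Covariances: responsibility-weighted scatter around each mean.
    covariances = np.empty((n_components, n_features, n_features))
    for k in range(n_components):
        diff = X - means[k]
        covariances[k] = (resp[:, k, None] * diff).T @ diff / nk[k]
    return weights, means, covariances


# Toy example with hard assignments: two points per cluster.
X = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 10.0], [12.0, 10.0]])
resp = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
weights, means, covariances = m_step(X, resp)
```

With hard 0/1 responsibilities, the updates reduce to ordinary per-cluster averages, which makes the toy case easy to check by hand.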

How to randomly sample NumPy arrays in Python without scikit-learn or Pandas.

Photo by Sergi Viladesau on Unsplash

Although there are packages such as sklearn and Pandas that manage trivial tasks like randomly selecting and splitting samples, there may be times when you need to perform these tasks without them.

In this article we will learn how to randomly select and manage data in NumPy arrays for machine learning without scikit-learn or Pandas.

Split and Stack Arrays

In machine learning, a common way to think about data structures is to have features and targets. In a simple case, let’s say we have data about animals that are either dogs or cats. …
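A minimal sketch of the idea, assuming a made-up dataset of ten animals with two features each and a target of 0 for cat or 1 for dog: stack features and targets so each row stays paired, shuffle the rows, and then split them back apart. The variable names and the 20 percent test share are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical data: 10 animals, 2 features each; target 0 = cat, 1 = dog.
features = np.arange(20, dtype=float).reshape(10, 2)
targets = np.arange(10) % 2

# Stack so that each row carries its own target in the last column.
data = np.column_stack((features, targets))

# permutation() returns a shuffled copy, shuffling along the first axis (rows).
shuffled = rng.permutation(data)

# Hold out the first 20 percent of the shuffled rows as the test set.
n_test = int(len(shuffled) * 0.2)
test, train = shuffled[:n_test], shuffled[n_test:]

# Split each part back into features and targets.
X_train, y_train = train[:, :-1], train[:, -1]
X_test, y_test = test[:, :-1], test[:, -1]
```

Shuffling the stacked array, rather than shuffling features and targets separately, is what keeps each label attached to the right row.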

Troubleshooting GeoPandas installation in an Anaconda (Conda) environment with PyCharm on macOS.

Photo by delfi de la Rua on Unsplash

GeoPandas is a great tool to analyze geospatial data in Python, but getting it to work can be tricky. Unlike other libraries, GeoPandas has a web of intricate dependencies and, as I recently discovered, the smallest environmental issue can be enough to derail the whole project. While troubleshooting, I found plenty of related support articles but nothing that specifically resolved my situation.

In this Article

How to install GeoPandas for Python in a conda virtual environment with PyCharm on macOS.

Problems to be Solved

The recommended installation method, based on the documentation, is to use conda, which installs GeoPandas and manages all of its dependencies. But, depending on your base environment and other imports, this may fail. …

Learning to code with DrRacket? Here’s an unofficial starter guide to Beginning Student Language (BSL), Intermediate Student Language (ISL), ISL with Lambdas (ISL+), and Racket.

Photo by Wengang Zhai on Unsplash

There’s No Love for Racket

There’s no love, that is, for Racket the programming and teaching language. At the very least, that’s the vibe I get from a recent Medium.com search. For example, the top stories for Racket are mostly about corruption schemes or extortion. However, somewhere in the top five search results for Racket, there ought to be something about the basics of program design or functional programming.

My friends, it is time to lift DrRacket out of the basement of programming blogs with more content on BSL, ISL, ISL+, and Racket. …


Justin Chae

@justinhchae Grad student (MSAI at Northwestern) seeking Summer ’21 Internship (SWE, AI/ML, Data Science). My views are my views. LinkedIn justin-chae
