CODEX

How to Identify CPU and Memory Inefficiencies

To take your code to the next level, you need to ensure that your solutions are as efficient as possible! But how do we identify the bottlenecks in speed or memory usage? We profile!

Image by Tianyi Ma on Unsplash

CPU Usage and Timing

With a Custom Decorator

Decorators are functions that take other functions as arguments, which makes them an example of higher-order functions. In short, they add functionality (pun intended) without modifying the original function. In this case, our fn_timer decorator takes in a function and wraps it with a timer: it stores a start time, runs the function, stores an end time, and returns the elapsed time!

import time
from functools…
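The preview cuts off the original code here. What follows is a minimal sketch of the idea described above, not the author's exact implementation. The decorator times each call with `time.perf_counter` and uses `functools.wraps` so the decorated function keeps its name and docstring. Two choices here are assumptions made for this sketch: returning a `(result, elapsed)` tuple, and the example workload `slow_sum`, which is hypothetical.

```python
import time
from functools import wraps


def fn_timer(func):
    """Wrap func so every call is timed with a high-resolution clock."""
    @wraps(func)  # keep func's __name__ and __doc__ on the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()            # start time
        result = func(*args, **kwargs)         # run the original function
        elapsed = time.perf_counter() - start  # end time minus start time, in seconds
        print(f"{func.__name__} took {elapsed:.6f} s")
        # Assumption for this sketch: hand back both the result and the timing.
        return result, elapsed
    return wrapper


@fn_timer
def slow_sum(n):
    """Hypothetical workload: sum the integers 0..n-1."""
    return sum(range(n))


total, seconds = slow_sum(1_000_000)
```

Returning a tuple changes the decorated function's return type. If you want a drop-in replacement, the less intrusive choice is to print the elapsed time and return only `result`.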


What I learned interviewing with companies of all sizes in one week

This will be a more personal blog post than some of my previous technical articles, and the reason is right there in the title! This week I interviewed at several companies of very different sizes, and I wanted to share my experience. This is, of course, only my subjective opinion.

Image Courtesy of Dylan Gillis on Unsplash

Small

My first interview was with a small health-tech startup with fewer than 50 employees. While the role was technical, few of the questions (at this point, at least) were technical; instead, there was a much heavier emphasis on company culture, teamwork and collaboration, and authentic buy-in for the product and mission. When…


Data Science

Proteomics with Biopython

Human biology is incredibly complex. Even with our ever-growing understanding, our answers only uncover more and more questions. The completion of the Human Genome Project gave many scientists confidence that we could solve pressing issues in biology through genomics. However, as our understanding of biology has grown, we’ve recognized that other factors influence how an organism’s genome is utilized. Thus, new fields of study were born to address these interconnected and flexible domains, including transcriptomics (study of mRNA) and proteomics (study of proteins).

GIF by Author

As I covered in my previous blog post, the Biopython package is quite powerful and can visualize and…


Data Science

Your One Stop Shop for All Things NetworkX

Graph theory is an incredibly potent data science tool that allows you to visualize and understand complex interactions. As part of an open-source project, I’ve collected information from many primary sources to build a graph of relationships between professional theatre lighting designers in New York City.

Image by Author

I used NetworkX, a Python package for constructing graphs. Its defaults are mostly usable, but leveraging matplotlib lets us customize almost every conceivable aspect of the graph. I knew what I wanted it to look like in my head, but after many hours of searching through documentation and Stack Overflow, I decided to create…


Data Science

Creating Insightful and Beautiful Bioinformatics Visualizations

There’s a misconception that science is dry, humorless, and overly technical, but that doesn’t need to be the case! Effective visualizations can transform raw DNA into stunning figures that relay a lot of information with clarity.

Biopython is the go-to open-source package for bioinformatics, or computational molecular biology, in Python. It provides custom classes and methods, parsers for standard biological file formats, and interfaces to other bioinformatics programs such as BLAST, EMBOSS, and ClustalW. And, for the purposes of this article, it can create amazing visuals from biological data.

We’ll start by diving into Biopython’s methods for dealing with DNA…


Data Science

Why data ≠ analysis

As I researched some applications of data in precision medicine, I came across an interesting claim in “From Big Data to Precision Medicine,” an article in Frontiers in Medicine. The article states:

“However, ‘Big data’ no longer means what it once did. The term has expanded and now refers not to just large data volume, but to our increasing ability to analyze and interpret those data. Tautologies such as ‘data analytics’ and ‘data science’ have emerged to describe approaches to the volume of available information as it grows ever larger.” [1]

Image Courtesy of Dhruv Weaver on Unsplash

We should probably start by understanding what a tautology…


Data Science

Understanding a key assumption in statistics and its implications

The main purpose of data science generally, and machine learning specifically, is to use the past to predict the future. Beyond the specific assumptions of various statistical models, the inescapable assumption is that the future can be predicted from past events.

Image Courtesy of Ary Attab on Unsplash

We assume that there is some function we can describe that takes in our observed data and outputs the probability of some future event: P(Y | X), the probability of output Y given inputs X. We have to construct our model carefully to ensure that the information provided by X is useful and reliable for predicting Y. …
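To make P(Y | X) concrete, here is a minimal sketch that estimates it by counting how often each outcome Y followed a given input X in past data. The frequency-counting approach, the `conditional_probability` helper and the weather/ride observations are all hypothetical illustrations, not the article's model. The estimate is only useful for prediction if future observations come from the same process as past ones, which is exactly the assumption under discussion.

```python
from collections import Counter


def conditional_probability(history, x):
    """Estimate P(Y | X = x) as relative frequencies over past (x, y) pairs.

    Only meaningful for prediction if the future is generated by the same
    process as these past pairs (the key assumption of the article).
    """
    outcomes = Counter(y for xi, y in history if xi == x)
    total = sum(outcomes.values())
    if total == 0:
        raise ValueError(f"no past observations with X = {x!r}")
    return {y: count / total for y, count in outcomes.items()}


# Hypothetical past observations: (weather, did_ride)
history = [
    ("sunny", True), ("sunny", True), ("sunny", False),
    ("rainy", False), ("rainy", False), ("rainy", True), ("rainy", False),
]
print(conditional_probability(history, "rainy"))  # {False: 0.75, True: 0.25}
```

Real models replace this lookup table with a learned function so they can generalize to X values never seen before, but they rest on the same assumption.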


Python

Scraping a ‘table’ from a webpage

A recent passion project exploring Broadway demanded some data that I was having trouble tracking down. But then, success! I found a webpage with a table of earnings and performance information for each Broadway show! The problem? How to get that data into a usable format.

The answer? Time to scrape! While web scraping is an amazing tool in the data scientist’s toolbox, it wasn’t something I had much exposure to, especially ‘in the wild’.

I hope this guide will be useful for those looking to get into web scraping or to build some more experience.

Image Courtesy of Fotis Fotopoulos on Unsplash

Step 1: Requests

We’ll start…
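The preview stops before any code. What follows is a hedged sketch of the parsing half of the job, using only Python's built-in `html.parser`. This is a swapped-in technique, not the author's approach. It assumes the data lives in an ordinary `<table>` of `<tr>`/`<td>`/`<th>` elements, which may not hold for the page in question (note the quotes around ‘table’ in the title). In practice the HTML string would come from the Requests step, e.g. `requests.get(url).text`. The `TableParser` class, the inline snippet, and its show names and figures are hypothetical, chosen so the example runs offline.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []    # finished rows, each a list of cell strings
        self._row = None  # row currently being built
        self._cell = None # text fragments of the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:  # ignore whitespace between tags
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical page content standing in for requests.get(url).text
html = """
<table>
  <tr><th>Show</th><th>Gross</th></tr>
  <tr><td>Show A</td><td>$1,000</td></tr>
  <tr><td>Show B</td><td>$2,500</td></tr>
</table>
"""
parser = TableParser()
parser.feed(html)
parser.close()
header, *body = parser.rows
records = [dict(zip(header, row)) for row in body]
print(records)  # [{'Show': 'Show A', 'Gross': '$1,000'}, {'Show': 'Show B', 'Gross': '$2,500'}]
```

Libraries such as Beautiful Soup or `pandas.read_html` handle messy real-world markup more gracefully. The stdlib version just shows what the parsing step has to do.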


Data Science

Insights for customer segmentation and extrinsic factors

After a few back-to-back projects featuring neural networks and deciding to build a NN from scratch, I wanted to start an EDA-focused project to keep my pandas skills sharp and get back to data science’s roots: generating insightful, actionable understanding and recommendations from data.

Image Courtesy of Yuanbin Du on Unsplash

The Data

The Capital Bikeshare dataset is widely used. It can be found on the UCI Machine Learning Repository or data.world. It contains some 17,000 observations from 2011 and 2012. While the data is fairly clean, it’s a little tricky because the grain of the data, or what one observation represents, is one hour of…
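Because one row represents one hour, questions about whole days require rolling the rows up to a coarser grain. Below is a minimal sketch using only the standard library. The column names `dteday` (date), `hr` (hour) and `cnt` (rental count) are assumptions based on the UCI hourly file. The sample rows, their counts and the `daily_totals` helper are made up for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical excerpt; column names are assumed, counts are invented.
SAMPLE = """dteday,hr,cnt
2011-01-01,0,10
2011-01-01,1,20
2011-01-01,2,30
2011-01-02,0,5
2011-01-02,1,7
"""


def daily_totals(csv_text):
    """Roll hourly rows (one row = one hour) up to one total per day."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["dteday"]] += int(row["cnt"])
    return dict(totals)


print(daily_totals(SAMPLE))  # {'2011-01-01': 60, '2011-01-02': 12}
```

With pandas, the same roll-up is `df.groupby("dteday")["cnt"].sum()`.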


Machine Learning

Going Beyond “import keras”

I’m sure you’ve heard this advice before. I know I did. I read it in other blogs, I heard it in many podcasts and interviews, and even had the thought myself, “I really should do that.” But for a long time, I didn’t.

If you want to set yourself apart from other data scientists, this is a great project. More than showcasing your mathematical and programming understanding, it shows that you have another important, intangible quality: follow-through. Teams want a colleague who does what they say they will and who will go the extra mile!

Image Courtesy of Nick Hillier on Unsplash

It’s never been…

Aren Carpenter

Data Scientist. Exploring the intersection between AI and Healthcare/Oncology. Flatiron alum.
