To take your code to the next level, you need to ensure that your solutions are as efficient as possible! But how do we identify the bottlenecks in speed or memory usage? We profile!
Decorators are functions that take other functions as arguments, which makes them an example of higher-order functions. In short, they add functionality (pun intended) without modifying the original function. In this case, our fn_timer decorator takes in a function and adds a timer by storing a start time, running the function, then storing an end time and returning the elapsed time!
import time from…
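Since the original snippet is cut off above, here is a minimal sketch of how a timing decorator like fn_timer could be written. It is not the article's actual code: returning the result alongside the elapsed time, using time.perf_counter, and the slow_sum workload are all assumptions made for illustration.

```python
import time
from functools import wraps


def fn_timer(func):
    """Wrap func so that each call returns (result, elapsed_seconds)."""

    @wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        start = time.perf_counter()          # store a start time
        result = func(*args, **kwargs)       # run the original function
        elapsed = time.perf_counter() - start  # end time minus start time
        return result, elapsed

    return wrapper


@fn_timer
def slow_sum(n):
    """Hypothetical workload used only to demonstrate the decorator."""
    return sum(range(n))


total, seconds = slow_sum(1_000_000)
print(f"slow_sum returned {total} in {seconds:.4f} s")
```

Returning the result as well as the timing is a design choice: it lets you time a function without losing its output. If you only care about the timing, the wrapper could return elapsed alone.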
This will be a more personal blog than some of my previous technical articles. The reason is there in the title! I had several interviews this week at companies of very different sizes and I wanted to share my experience. This is, of course, only my subjective opinion.
My first interview was with a small health-tech startup with fewer than 50 employees. While the role was technical, few of the questions (at this point at least) were technical. Instead, there was a much heavier emphasis on company culture, teamwork and collaboration, and authentic buy-in for the product and mission. When…
Human biology is incredibly complex. Even with our ever-growing understanding, our answers only uncover more and more questions. The completion of the Human Genome Project gave many scientists confidence that we could solve pressing issues in biology through genomics. However, as our understanding of biology has grown, we’ve recognized that other factors influence how an organism’s genome is utilized. Thus, new fields of study were born to address these interconnected and flexible domains, including transcriptomics (study of mRNA) and proteomics (study of proteins).
As I covered in my previous blog, the Biopython package is quite powerful and can visualize and…
Graph theory is an incredibly potent data science tool that allows you to visualize and understand complex interactions. As part of an open-source project, I’ve collected information from many primary sources to build a graph of relationships between professional theatre lighting designers in New York City.
I used NetworkX, a Python package for constructing graphs, which has mostly usable defaults, but leveraging matplotlib allows us to customize almost every conceivable aspect of the graph. I knew what I wanted it to look like in my head, but after many hours of searching through documentation and Stack Overflow I decided to create…
There’s a misconception that science is dry, humorless, and overly technical, but that doesn’t need to be the case! Effective visualizations can transform raw DNA into stunning figures that relay a lot of information with clarity.
Biopython is the go-to open-source package for bioinformatics, or computational molecular biology, in Python. It provides custom classes and methods, parsers for standard biological file formats, and interfaces to other bioinformatics programs, such as BLAST, EMBOSS, or ClustalW. And for the purposes of this article, it can create amazing visuals using biological data.
We’ll start by diving into Biopython’s methods for dealing with DNA…
As I researched some applications for data in precision medicine, I came across an interesting claim in “From Big Data to Precision Medicine,” an article in Frontiers in Medicine. The article states:
“However, ‘Big data’ no longer means what it once did. The term has expanded and now refers not to just large data volume, but to our increasing ability to analyze and interpret those data. Tautologies such as ‘data analytics’ and ‘data science’ have emerged to describe approaches to the volume of available information as it grows ever larger.” [1]
We should probably start with understanding what a tautology…
The main purpose of data science generally, and machine learning specifically, is to use the past to predict the future. Beyond the specific assumptions of various statistical models, the inescapable assumption is that the future can be predicted from past events.
We assume that there is some function we can describe that takes in our observed data and outputs a probability of some future event — P(Y | X), a probability of Y output given X inputs. We have to construct our model carefully to ensure that the information provided by X is useful and reliable for predicting Y. …
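As a rough illustration of the idea of P(Y | X), the sketch below estimates a conditional probability as a simple relative frequency over past observations. The conditional_probability helper and the weather data are hypothetical, invented only for this example; a real model would be far more careful about how X is constructed.

```python
def conditional_probability(observations, x, y):
    """Estimate P(Y = y | X = x) as a relative frequency over (x, y) pairs."""
    x_count = sum(1 for obs_x, _ in observations if obs_x == x)
    if x_count == 0:
        return None  # x was never observed, so the past tells us nothing
    xy_count = sum(1 for obs in observations if obs == (x, y))
    return xy_count / x_count


# Hypothetical past observations of (X, Y) = (sky condition, outcome)
history = [
    ("cloudy", "rain"),
    ("cloudy", "rain"),
    ("cloudy", "dry"),
    ("clear", "dry"),
    ("clear", "dry"),
    ("clear", "rain"),
]

p_rain_given_cloudy = conditional_probability(history, "cloudy", "rain")
print(f"P(rain | cloudy) is roughly {p_rain_given_cloudy:.2f}")
```

The None branch makes the inescapable assumption visible: if the past contains no examples of a given X, this estimate has nothing to say about the future.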
A recent passion project exploring the data of Broadway demanded some data that I was having trouble tracking down. But then, success! I found a webpage with a table of earnings and performance information by Broadway show! The problem? How to get that data into a usable format.
The answer? Time to scrape! While web scraping is an amazing tool in the data scientist’s toolbox, it wasn’t something I had much exposure to, especially ‘in the wild’.
I hope this guide will be useful for those looking to get into web scraping or to build some more experience.
We’ll start…
After a few back-to-back projects featuring neural networks (which you can explore here and here) and deciding to build a NN from scratch, I wanted to start an EDA-focused project to keep my pandas skills sharp and get back to data science’s roots: generating insightful, actionable understanding and recommendations from data.
The Capital Bikeshare dataset is widely available. It can be found on UCI or data.world. It contains some 17,000 observations from 2011 and 2012. While the data is fairly clean, it’s a little tricky because the grain of the data, or what one observation represents, is one hour of…
I’m sure you’ve heard this advice before. I know I did. I read it in other blogs, I heard it in many podcasts and interviews, and even had the thought myself, “I really should do that.” But for a long time, I didn’t.
If you want to set yourself apart from other data scientists, this is a great project. More than showcasing your mathematical and programming understanding, it shows that you have another important and intangible quality: follow-through. Teams want a colleague who does what they say they will and who will go the extra mile!
Data Scientist. Exploring the intersection between AI and Healthcare/Oncology. Flatiron alum.