"Most Read" Data Science Articles in 2014



  • The Current State of Machine Intelligence
    I spent the last three months learning about every artificial intelligence, machine learning, or data related startup I could find — my current list has 2,529 of them to be exact. Yes, I should find better things to do with my evenings and weekends but until then...

  • A Data Analyst's Blog Is Transforming How New Yorkers See Their City
    It may have been the fire hydrants that certified Ben Wellington as the king of New York's "open data" movement. Earlier this year Wellington pored over New York City's parking ticket data and identified two hydrants on consecutive blocks that were generating $55,000 a year in tickets, all from cars that appeared to be parking legally...

  • Starting data analysis/wrangling with R: Things I wish I'd been told
    R is a very powerful open source environment for data analysis, statistics and graphing, with thousands of packages available. After my previous blog post about likert-scales and metadata in R, a few of my colleagues mentioned that they were learning R through a Coursera course on data analysis. I have been working quite intensively with R for the last half year, and thought I'd try to document and share a few tricks, and things I wish I'd have known when I started out...

  • On Starting a New Job
    I am starting a new job in November. This is not a prank like last time. But before the grand reveal of where, first I’ll subject you to a lengthy blog post about my thoughts about the how and why. Hopefully this provides an additional perspective to the excellent posts by Lana Yarosh and Jason Yip on their experiences on the computer/information science academic job market. But those of you who know the rhythms of the academic job market are already realizing that (spoiler alert), I’m not starting a tenure-track faculty role. Instead, I’m going to spend the next few years being a data scientist...

  • Bayes Rule in an animated gif
    Say Pr(A)=5% is the prevalence of a disease (% of red dots on top fig). Each individual is given a test with accuracy Pr(B|A)=Pr(no B|no A)=90%. The O in the middle turns into an X when the test fails. The rate of Xs is 1-Pr(B|A). We want to know the probability of having the disease if you tested positive: Pr(A|B). Many find it counterintuitive that this probability is much lower than 90%; this animated gif is meant to help...

  • How To Choose A Data Science Project For Your Data Science Portfolio
    You want to create a data science portfolio to showcase you can “do” data science. That you know how to take in a data set, clean it up, use various techniques to extract useful information from it, and then communicate the results. The problem is that you aren’t sure where to start, what projects to do, what languages to use, or even what techniques to use...
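
For readers who want to check the numbers in the Bayes Rule excerpt above, here is a minimal sketch in Python. It is not taken from the linked article; the function name and parameter names are made up for illustration, and it assumes the excerpt's figures: Pr(A)=5% and Pr(B|A)=Pr(no B|no A)=90%.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """Return Pr(A|B): the probability of disease given a positive test.

    prevalence  = Pr(A)          (5% in the excerpt)
    sensitivity = Pr(B|A)        (90% in the excerpt)
    specificity = Pr(no B|no A)  (90% in the excerpt)
    The names are illustrative assumptions, not the article's notation.
    """
    # Diseased people who test positive.
    true_positive = prevalence * sensitivity
    # Healthy people who test positive anyway.
    false_positive = (1.0 - prevalence) * (1.0 - specificity)
    # Bayes' rule: Pr(A|B) = Pr(B|A) Pr(A) / Pr(B).
    return true_positive / (true_positive + false_positive)


p = posterior_given_positive(prevalence=0.05, sensitivity=0.90, specificity=0.90)
print(f"Pr(A|B) = {p:.3f}")  # prints "Pr(A|B) = 0.321"
```

With these inputs, 4.5% of people are true positives and 9.5% are false positives, so only about 32% of positive tests belong to people who have the disease, which is why the answer is so much lower than 90%.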


If you're interested in reading the rest of our "most read" articles this year (i.e., from other quarters), you can check them out here:
