Receive the Data Science Weekly Newsletter every Thursday

Easy to unsubscribe at any time. Your e-mail address is safe.

Data Science Weekly Newsletter
Issue 24
May 8, 2014

Editor's Picks

  • Wine Classification using Neural Networks
    Neural networks can solve some really interesting problems once they are trained. They are particularly well suited for complex decision boundary problems over many variables. In this demo we will try to build a neural network that can classify wines from three wineries by thirteen attributes...
  • Spark is on fire
    Spark is on the rise, to an even greater degree than I thought last month...



Data Science Articles & Videos

  • How to create a Data-Driven Organization: One Year On
    A year ago, I wrote a well-received post here entitled "How do you create a data-driven organization?". I had just joined Warby Parker and set out my various thoughts on the subject at the time, covering topics such as understanding the business and customer, skills and training, infrastructure, dashboards and metrics. One year on, I decided to write an update. So, how did we do?...
  • Spatial Localization of Recent Ancestors for Admixed Individuals
    Ancestry analysis from genetic data plays a critical role in studies of human disease and evolution. Recent work has introduced explicit models for the geographic distribution of genetic variation and has shown that such explicit models yield superior accuracy in ancestry inference over non-model-based methods. Here we extend such work to introduce a method that models admixture between ancestors from multiple sources...
  • Smart Umbrellas 'could collect Rain Data'
    How would you fancy being a mobile weather station? Rolf Hut, from Delft University of Technology in The Netherlands, plans to turn our umbrellas into rain gauges. His prototype smart brolly has a sensor that detects raindrops falling on its canvas, and uses Bluetooth to send this information via a phone to a computer...
  • Eurovision 2014: First predictions
    For the last two years, I’ve been publishing the results of a statistical model for predicting the results of the Eurovision Song Contest. This year’s final takes place on Saturday in an abandoned shipyard in Copenhagen, so it’s time for some more predictions. I’ve made some small changes to the model this year, which have had huge consequences for the results, which I think should be a lot more accurate now....
  • Kaggle LSHTC4 Winning Solution
    Our winning submission to the 2014 Kaggle competition for Large Scale Hierarchical Text Classification (LSHTC) consists mostly of an ensemble of sparse generative models extending Multinomial Naive Bayes. This document describes the models and software used to build our solution...
  • Intuition for Simulated Annealing
    This post develops the intuition behind simulated annealing via lots of pictures. It's self-contained and ought to be accessible to those without a math-centric background. It also serves as a gentle introduction to more technical discussions...



Training & Resources

  • Yann LeCun will be doing an AMA in /r/MachineLearning on May 15 4PM EST
    I'm happy to announce Director of AI Research at Facebook/NYU Professor Yann LeCun will be stopping by /r/MachineLearning on May 15 4:00-6:00 PM EST for an AMA. Based on the success of the last AMA, a thread will be created before the official AMA time for those who won't be able to attend...
  • Billion Words: Because today's language modeling standard should be higher
    We [Google Research] are releasing scripts that convert a set of public data into a language model consisting of over a billion words, with standardized training and test splits, described in an arXiv paper. Along with the scripts, we’re releasing the processed data in one convenient location, along with the training and test data...
  • JHU Data Science: More is More
    Today Jeff Leek, Brian Caffo, and I are launching 3 new courses on Coursera as part of the Johns Hopkins Data Science Specialization...
  • 15 In-Depth Data Scientist Interviews
    Over the past few months we have been lucky enough to conduct in-depth interviews with 15 different Data Scientists for our blog. The 15 interviewees have varied roles and focus areas: from start-up founders to academics to those working at more established companies; working across healthcare, energy, retail, agriculture, travel, dating, SaaS and more...


Books


  • Data Just Right: Introduction to Large-Scale Data & Analytics
    Released Dec 2013, this book is well rated (4.7 out of 5 stars on Amazon)...
    "If you work with expensive enterprise strength data management/analysis products like SAS and Oracle and you want a book that will give you a map to cover the open source tools for dealing with "big data" (i.e., Hadoop, Hive, and Pig) get this. It does an amazingly good job of explaining the utility of the various tools that are used to manage *HUGE* data."...

