Data Science Weekly Newsletter

Issue 171

March 2, 2017

Editor's Picks

On the Origin of Deep Learning
This paper is a review of the evolutionary history of deep learning models. It starts from the genesis of neural networks, when associationist models of the brain were studied, moves on to the models that dominated the last decade of deep learning research, such as convolutional neural networks, deep belief networks, and recurrent neural networks, and extends to popular recent models such as variational autoencoders and generative adversarial nets...

Mathematicians becoming data scientists: Should you? How to?
I was talking the other day with a former student at UW, Sarah Rich, who’s done degrees in both math and CS and then went off to Twitter. I asked her: so what would you say to a math Ph.D. student who was wondering whether they would like being a data scientist in the tech industry? How would you know whether you might find that kind of work enjoyable? And if you did decide to pursue it, what’s the strategy for making yourself a good job candidate?...

Self-driving cars in the browser
The goal of this project was to create a fully self-learning agent that would be able to control a car in a 2D top-down environment. Written solely in JavaScript...

Yhat Demo Webinar

Join Yhat cofounder Greg Lamp for a live tour of Yhat's product suite using a beer recommender algorithm as an example. We'll demo our open-source Python IDE, Rodeo, our centralized data science hub, Bandit, and finally our model deployment platform, ScienceOps. The webinar will take place on Wednesday, March 22 at 2 PM EST. Get your invite to the Yhat webinar today!

Beyond The Tip: A Data-Driven Exploration of Archer
Archer has run for 7 seasons with an 8th on the way. It follows the title character and a team of spies and administrative staff as they battle rival spy agencies, the KGB, arms dealers, drug lords, kidnappers, paramilitarios, Welsh separatists, cyborgs, clones, tigers, crocodiles, alligators, and if we're in the Orinoco drainage basin, the black caiman, which can grow up to 20 feet long. In an attempt to better see that structure, we've used data analysis and data visualization of the captioning of the shows...

'Computer bots are like humans, having fights lasting years'
Researchers say 'benevolent bots', otherwise known as software robots, that are designed to make articles on Wikipedia better often end up having online fights lasting years over changes in content...

Voronoï playground : interactive weighted Voronoï study
This block experiments with weighted Voronoï diagrams. Weighted Voronoï diagrams come in several flavours (additive/multiplicative, powered/not-powered, 2D/3D and higher dimensions, ...), but this block focuses on the 2D additive weighted power diagram. It helps me to understand the basics (properties, underlying computations, meanings, ...) of such diagrams...
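
As a rough illustration of the additive weighted power diagram mentioned above (a sketch of the general idea, not code from the linked block): each site carries a weight, and a point belongs to the site with the smallest power distance, i.e. the squared Euclidean distance minus the weight. The function names and sample sites are made up for illustration.

```python
def power_distance(point, site, weight):
    """Squared Euclidean distance from point to site, minus the site's weight."""
    dx = point[0] - site[0]
    dy = point[1] - site[1]
    return dx * dx + dy * dy - weight


def owning_site(point, sites, weights):
    """Index of the site whose power-diagram cell contains the point."""
    return min(
        range(len(sites)),
        key=lambda i: power_distance(point, sites[i], weights[i]),
    )


# Hypothetical sites on the x-axis, 4 units apart.
sites = [(0.0, 0.0), (4.0, 0.0)]

# With equal weights the diagram is an ordinary Voronoi diagram.
print(owning_site((1.0, 0.0), sites, [0.0, 0.0]))

# A larger weight on the second site grows its cell toward the first.
print(owning_site((1.0, 0.0), sites, [0.0, 12.0]))
```

Raising one site's weight moves the cell boundary away from that site, which is how a nearby point can end up in the cell of a farther but heavier site.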

Using Deep Learning and Google Street View to Estimate the Demographic Makeup of the US
Here, we present a method that determines socioeconomic trends from 50 million images of street scenes, gathered in 200 American cities by Google Street View cars. Using deep learning-based computer vision techniques, we determined the make, model, and year of all motor vehicles encountered in particular neighborhoods. Data from this census of motor vehicles, which enumerated 22M automobiles in total (8% of all automobiles in the US), was used to accurately estimate income, race, education, and voting patterns, with single-precinct resolution...

Surprise Maps: Showing the Unexpected
Surprise maps are useful when the raw numbers, by themselves, don’t tell us much: visual patterns might look complex but convey only statistical noise, or patterns may look simple but hide the really interesting features...

What’s wrong with my time series? Model validation without a hold-out set
Time series modeling sits at the core of critical business operations such as supply and demand forecasting and quick-response algorithms like fraud and anomaly detection. Small errors can be costly, so it’s important to know what to expect of different error sources. The trouble is that the usual approach of cross-validation doesn’t work for time series models. The reason is simple: time series data are autocorrelated so it’s not fair to treat all data points as independent and randomly select subsets for training and testing. In this post I’ll go through alternative strategies for understanding the sources and magnitude of error in time series...
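
To make the autocorrelation point concrete, here is a minimal sketch (not taken from the linked post) that computes the sample autocorrelation of a series at a given lag. Values far from zero mean that neighbouring points carry information about each other, which is why random train/test splits leak information. The function name and the toy series are assumptions for illustration.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of a sequence of numbers at a positive lag."""
    n = len(series)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(series) - 1")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    covariance = sum(
        (series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)
    )
    return covariance / variance


trend = list(range(20))      # steadily increasing series
zigzag = [1, -1] * 10        # alternating series

print(autocorrelation(trend, 1))   # strongly positive
print(autocorrelation(zigzag, 1))  # strongly negative
```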

Billion-scale similarity search with GPUs
This paper tackles the problem of better utilizing GPUs for this task. While GPUs excel at data-parallel tasks, prior approaches are bottlenecked by algorithms that expose less parallelism, such as k-min selection, or make poor use of the memory hierarchy. We propose a design for k-selection that operates at up to 55% of theoretical peak performance, enabling a nearest neighbor implementation that is 8.5x faster than prior GPU state of the art...
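
For readers new to the terminology, k-selection means picking the k smallest values out of a larger set, here the k smallest distances between a query and the database vectors. The sketch below is a plain CPU brute-force version using Python's heapq, intended only to show what is being computed. It is not the paper's GPU algorithm, and the names and sample vectors are made up for illustration.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_nearest(query, vectors, k):
    """Return the k (distance, index) pairs closest to the query, nearest first."""
    scored = ((squared_l2(query, v), i) for i, v in enumerate(vectors))
    # heapq.nsmallest performs the k-selection step over all distances.
    return heapq.nsmallest(k, scored)


vectors = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.5, 0.0)]
print(k_nearest((0.0, 0.0), vectors, 2))
```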

Deep and Hierarchical Implicit Models
Implicit probabilistic models are a very flexible class for modeling data. They define a process to simulate observations, and unlike traditional models, they do not require a tractable likelihood function. In this paper, we develop two families of models: hierarchical implicit models and deep implicit models. They combine the idea of implicit densities with hierarchical Bayesian modeling and deep neural networks...

Data Scientist - Gilt Groupe - NYC
The Data team is composed of data engineers and data scientists, and sits within the Gilt Tech organization. Data engineers extract, load and transform data, then empower business users to build dashboards and interpret data. Data scientists use the tools of statistics and machine learning to solve hard problems around the business. We have data crying out for attention. Whether you’re interested in consumer behavior, pricing and online commerce, retail and fashion, logistics and operations - we have rich, clean data to tackle nearly any subject...

Matplotlib Tutorial: Python Plotting
The tutorial focuses on explaining some key concepts of this Python data visualization package with some answers to FAQs...

How to build a scaleable crawler to crawl million pages with a single machine in just 2 hours
There’ve been lots of articles about how to build a Python crawler. If you are a newbie in Python and not familiar with multiprocessing or multithreading, perhaps this tutorial will be the right choice for you...

Pre-trained word vectors
We are publishing pre-trained word vectors for 90 languages, trained on Wikipedia. These are vectors in dimension 300, trained with the default parameters of fastText...
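
A minimal sketch of reading such vectors in pure Python, assuming the plain-text `.vec` layout in which the first line holds the vocabulary size and the dimension, and each following line holds a word followed by its space-separated components. The function name and the tiny inline sample are made up for illustration; a real file would be opened with `open(path, encoding="utf-8")` and have dimension 300.

```python
import io


def load_vectors(lines):
    """Parse .vec-style text lines into a {word: [float, ...]} dictionary."""
    lines = iter(lines)
    # Header line: "<number of words> <dimension>" (assumed layout).
    count, dim = (int(field) for field in next(lines).split())
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) != dim + 1:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(value) for value in parts[1:]]
    return vectors


# Tiny made-up sample with dimension 3.
sample = io.StringIO("2 3\nhello 0.1 0.2 0.3\nworld -0.5 0.0 1.5\n")
vectors = load_vectors(sample)
print(vectors["world"])
```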

Data Scientists at Work
"A collection of interviews with 16 of the world's most influential and innovative data scientists from across the spectrum of this hot new profession - from Yann LeCun at Facebook to Jake Porway at DataKind"...

For a detailed list of books covering Data Science, Machine Learning, AI, and associated programming languages, check out our resources page...
