Data Science Weekly Newsletter

Issue 372

January 7, 2021

Editor's Picks

Why is Artificial Intelligence So Useless for Business?
AI research has structural problems that limit how much it can impact business. But understanding why gives a way to determine what will work and what won’t, as well as reveal new business opportunities. Each business process is a chance for automation, and therefore an opportunity for a business to save an entire industry much time and money and pocket some of the profits. If one can understand the AI research well enough, understand the business process thoroughly enough, and then gather enough data to train a model, then one can build a profitable company out of it...

A Case for Cooperation Between Machines and Humans
A computer scientist argues that the quest for fully automated robots is misguided, perhaps even dangerous. His decades of warnings are gaining more attention...

Symbolic Mathematics Finally Yields to Neural Networks
Neural networks have always lagged in one conspicuous area: solving difficult symbolic math problems. The situation changed late last year when Guillaume Lample and François Charton unveiled a successful first approach to solving symbolic math problems with neural networks. As a result, Lample and Charton’s program could produce precise solutions to complicated integrals and differential equations — including some that stumped popular math software packages with explicit problem-solving rules built in. Not that that’s happened yet, of course. But it’s clear that the team has answered the decades-old question — can AI do symbolic math? — in the affirmative...

A Message From This Week's Sponsor

From notebooks to production

Valohai is the MLOps platform for your whole machine learning team. The platform offers hosted notebooks for data scientists experimenting with models and pipelines for ML engineers automating the retraining and optimization of models. Best of all, your work as a data scientist can be seamlessly integrated into the company-wide ML workflow. Every experiment is automatically versioned in a central repository so you can always reproduce what you did two days or even months ago.

Data Science Articles & Videos

How to Serve Models
There are many ways to serve ML (machine learning) models, but these are the 3 most common patterns I have observed over the years...1. Materialize/compute predictions offline and serve them through a database, 2. Use the model within the main application, so that model serving/deployment can be done with the main application deployment, and 3. Use the model separately in a microservice architecture where you send input and get output...In this post, I want to go over these different architectures/patterns and then outline the advantages and disadvantages in a more objective manner...
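For readers who want a concrete picture, the three patterns can be sketched roughly as follows. This is a minimal illustration and not code from the linked post: `toy_model`, `materialize`, `App`, `model_service`, and `call_service` are hypothetical names, the "database" is a plain dictionary, and the microservice boundary is simulated with JSON strings instead of a real network call.

```python
import json
from typing import Callable, Dict, List


def toy_model(features: List[float]) -> float:
    """Hypothetical stand-in for a trained model: returns the mean feature."""
    return sum(features) / len(features)


# Pattern 1: compute predictions offline and serve them from a store.
# A dict stands in for the database table keyed by entity id.
def materialize(inputs: Dict[str, List[float]]) -> Dict[str, float]:
    return {key: toy_model(features) for key, features in inputs.items()}


# Pattern 2: the model lives inside the main application and ships with it.
class App:
    def __init__(self, model: Callable[[List[float]], float]) -> None:
        self.model = model

    def handle_request(self, features: List[float]) -> Dict[str, float]:
        return {"score": self.model(features)}


# Pattern 3: the model sits behind its own service boundary.
# Requests and responses are serialized, as they would be over HTTP.
def model_service(request_body: str) -> str:
    features = json.loads(request_body)["features"]
    return json.dumps({"score": toy_model(features)})


def call_service(features: List[float]) -> float:
    response = model_service(json.dumps({"features": features}))
    return json.loads(response)["score"]


if __name__ == "__main__":
    store = materialize({"user_1": [1.0, 3.0]})
    print(store["user_1"])                            # lookup, no model call at request time
    print(App(toy_model).handle_request([1.0, 3.0]))  # in-process prediction
    print(call_service([1.0, 3.0]))                   # prediction across a serialized boundary
```

The trade-off the sketch hints at: pattern 1 only covers inputs known in advance, pattern 2 ties model releases to application releases, and pattern 3 adds serialization and an extra hop in exchange for independent deployment.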

Machine Learning and Causal Inference [Video]
This talk will review a series of recent papers that develop new methods based on machine learning methods to approach problems of causal inference, including estimation of conditional average treatment effects and personalized treatment assignment policies. Approaches for randomized experiments, environments with unconfoundedness, instrumental variables, and panel data will be considered...

Machine Learning is not just about Deep Learning [Reddit Discussion]
I understand how mind-blowing the potential of deep learning is, but the truth is, the majority of companies in the world don't care about it, or do not need that level of machine learning expertise...What I see is that most youngsters join this bandwagon of machine learning with hopes of working on these mind-blowing ideas, but when they do get a job at a decent company with good pay and are asked to produce "mediocre" models, they feel like losers...Since when did the people who use Gradient Boosting, Logistic Regression, and Random Forest become oldies and mediocre...The result is that most of the [people] we interview for a role know very little about the basics and hardly anything about the underlying maths. They just know how to use the packages on already prepared data...

Pose Animator - An open source tool to bring SVG characters to life in the browser via motion capture
The PoseNet and Facemesh (from Mediapipe) TensorFlow.js models made real-time human perception in the browser possible through a simple webcam. As an animation enthusiast who struggles to master the complex art of character animation, I saw hope and was really excited to experiment using these models for interactive, body-controlled animation...The result is Pose Animator, an open-source web animation tool that brings SVG characters to life with body detection results from a webcam. This blog post covers the technical design of Pose Animator, as well as the steps for designers to create and animate their own characters...

A Complete 4-Year Course Plan for an Artificial Intelligence Undergraduate Degree
Having been out of school for a while now, I’ve had a lot of time to reflect on how well certain courses prepared me for my career in artificial intelligence and machine learning. I finally decided to put my thoughts to the page and design a complete curriculum for a 4-year undergraduate degree in artificial intelligence...These courses are intended to provide both breadth and depth to newcomers in the fields of artificial intelligence and computer science. This curriculum is inspired heavily by the courses that I took and is a reflection of the skills I believe are necessary to succeed in an artificial intelligence career today...While you might be able to acquire some knowledge of AI through a single Coursera class, my emphasis here is instead on developing a deep conceptual understanding coupled with practical application of those concepts...

Modern Rule-Based Models
Machine learning models come in many shapes and sizes. While deep learning models currently have the lion’s share of coverage, there are many other classes of models that are effective across many different problem domains. This post gives a short summary of several rule-based models that are closely related to tree-based models (but are less widely known)...To start, let’s discuss the concept of rules more generally...

DE⫶TR: End-to-End Object Detection with Transformers
We believe that object detection should not be more difficult than classification, and should not require complex libraries for training and inference. DETR is very simple to implement and experiment with, and we provide a standalone Colab Notebook showing how to do inference with DETR in only a few lines of PyTorch code. Training code follows this idea - it is not a library, but simply a main.py importing model and criterion definitions with standard training loops...For details see End-to-End Object Detection with Transformers by Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, and Sergey Zagoruyko...

Why TinyML will be huge
In this episode of the Data Exchange I speak with Pete Warden, Staff Research Engineer at Google...Most recently, Pete has been focused on implementing machine learning in ultra-low power systems (TinyML)...Our conversation focused on TinyML and other topics including: a) The early days of using deep learning for computer vision, b) TensorFlow – Pete was part of the team at Google that originated TF, c) What is TinyML and why it is going to be an important topic in the years ahead, d) Privacy and security in the context of TinyML, and more...

Self Supervised Representation Learning in NLP
While Computer Vision has made amazing progress on self-supervised learning only in the last few years, self-supervised learning has been a first-class citizen in NLP research for quite a while...The Word2Vec paper from 2013 popularized this paradigm and the field has rapidly progressed applying these self-supervised methods across many problems...At the core of these self-supervised methods lies a framing called “pretext task” that allows us to use the data itself to generate labels and use supervised methods to solve unsupervised problems...In this post, I will provide an overview of the various pretext tasks that researchers have designed to learn representations from a text corpus without explicit data labeling. The focus of the article will be on the task formulation rather than the architectures implementing them...
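To make the "pretext task" idea concrete, here is a small sketch of how labels can be generated from raw text alone. It shows two well-known formulations, skip-gram context prediction (as used by Word2Vec) and masked-token prediction; these are illustrative choices and may not match the exact set of tasks covered in the linked post. The function names and the `[MASK]` placeholder are assumptions made for this example.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 1) -> List[Tuple[str, str]]:
    """Build (center, context) training pairs from unlabeled tokens."""
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def masked_examples(tokens: List[str], mask: str = "[MASK]") -> List[Tuple[List[str], str]]:
    """Hide one token at a time; the hidden token becomes the label."""
    return [
        (tokens[:i] + [mask] + tokens[i + 1:], tokens[i])
        for i in range(len(tokens))
    ]


if __name__ == "__main__":
    sentence = "the cat sat".split()
    for pair in skipgram_pairs(sentence):
        print(pair)
    for inputs, label in masked_examples(sentence):
        print(inputs, "->", label)
```

In both cases the supervision signal comes from the text itself, which is what lets a supervised training loop run on an unlabeled corpus.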

Training

Help meet the growing demand in data analytics.

The Data Analytics Career Track is a 6-month, self-paced online course that will pair you with your own industry expert mentor as you learn skills to land a role in data analytics. Technical skills are not enough to get hired, but with Springboard’s data analytics course, you’ll gain the strategic thinking, problem-solving, and communication skills hiring managers are looking for. Learn more today
*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Data Scientist - Amazon Demand Forecasting - New York

The Amazon Demand Forecasting team seeks a Data Scientist with strong analytical and communication skills to join our team. We develop sophisticated algorithms that involve learning from large amounts of data, such as prices, promotions, similar products, and a product's attributes, in order to forecast the demand of over 190 million products world-wide. These forecasts are used to automatically order more than $200 million worth of inventory weekly, establish labor plans for tens of thousands of employees, and predict the company's financial performance. The work is complex and important to Amazon. With better forecasts we drive down supply chain costs, enabling the offer of lower prices and better in-stock selection for our customers...

Want to post a job here? Email us for details >> team@datascienceweekly.org

Training & Resources

Papers With Code
The mission of Papers With Code is to create a free and open resource with Machine Learning papers, code and evaluation tables...We believe this is best done together with the community and powered by automation...We've already automated the linking of code to papers, and we are now working on automating the extraction of evaluation metrics from papers...

The Big Bad NLP Database
A collection of 481 NLP datasets for various tasks in Natural Language Processing...

The Ultimate Guide to Linear Regression
In this post we are going to discuss the linear regression model used in machine learning. Modeling for this post will mean using a machine learning technique to learn - from data - the relationship between a set of features and what we hope to predict. Let’s bring in some data to make this idea more concrete...
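As a rough illustration of "learning the relationship from data", the sketch below fits a one-feature linear regression by ordinary least squares in plain Python. The data points and the `fit_line` and `predict` helpers are made up for this example and are not taken from the linked guide.

```python
from typing import List, Tuple


def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Return (slope, intercept) minimizing squared error for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both left unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(x: float, slope: float, intercept: float) -> float:
    return slope * x + intercept


if __name__ == "__main__":
    # Toy data generated from y = 2x + 1 with no noise.
    xs = [1.0, 2.0, 3.0, 4.0]
    ys = [3.0, 5.0, 7.0, 9.0]
    slope, intercept = fit_line(xs, ys)
    print(slope, intercept)
    print(predict(5.0, slope, intercept))
```

With several features the same idea generalizes to solving the normal equations, which libraries such as scikit-learn handle for you.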

Books

Seven Databases in Seven Weeks: A Guide to Modern Databases and the NoSQL Movement
"A book that tries to cover multiple databases is a risky endeavor; a book that also provides hands-on exercises for each is even riskier, but if implemented well it leads to a great package. I loved the specific exercises the authors covered. A must-read for all big data architects who don’t shy away from coding..."...

For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page.
P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
