Receive the Data Science Weekly Newsletter every Thursday

Easy to unsubscribe at any time. Your e-mail address is safe.

Data Science Weekly Newsletter
Issue 436
March 31, 2022

Editor's Picks

  • Stop aggregating away the signal in your data
    By aggregating our data in an effort to simplify it, we lose the signal and the context we need to make sense of what we’re seeing...For five years as a data analyst, I forecasted and analyzed Google’s revenue. For six years as a data visualization specialist, I’ve helped clients and colleagues discover new features of the data they know best. Time and time again, I’ve found that by being more specific about what’s important to us and embracing the complexity in our data, we can discover new features in that data...
  • Taking our next step in the City by the Bay
    This morning in San Francisco, a fully autonomous all-electric Jaguar I-PACE, with no human driver behind the wheel, picked up a Waymo engineer to get their morning coffee and go to work. Since sharing that we were ready to take the next step and begin testing fully autonomous operations in the city, we’ve begun fully autonomous rides with our San Francisco employees. They now join the thousands of Waymo One riders we’ve been serving in Arizona, making fully autonomous driving technology part of their daily lives...
  • Graph machine learning with missing node features
    Graphs are a core asset at Twitter, describing how users interact with each other through Follows, Tweets, Topics, and conversations. Graph Neural Networks (GNNs) are a powerful tool that allow learning on graphs by leveraging both the topological structure and the feature information for each node. However, GNNs typically run under the assumption of a full set of features available for all nodes...This post aims to show that feature propagation is an efficient and scalable approach for handling missing features in graph machine learning applications and that it works surprisingly well despite its simplicity...
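The feature propagation idea in the last pick can be illustrated with a minimal sketch. This is not the implementation from the Twitter post: the function name `propagate_features`, the scalar per-node feature, and the use of plain neighbor averaging (rather than whatever graph normalization the authors use) are all assumptions made to keep the example short. Known features stay fixed while missing ones are repeatedly replaced by the average of their neighbors.

```python
def propagate_features(edges, features, num_iters=40):
    """Fill missing (None) scalar node features by diffusing known values.

    Hypothetical helper for illustration only. Uses simple neighbor
    averaging, an assumed simplification of feature propagation.
    """
    n = len(features)
    neighbors = [[] for _ in range(n)]
    for u, v in edges:  # undirected graph
        neighbors[u].append(v)
        neighbors[v].append(u)

    known = [f is not None for f in features]
    # Missing features start at zero.
    x = [f if f is not None else 0.0 for f in features]

    for _ in range(num_iters):
        new_x = []
        for i in range(n):
            if known[i] or not neighbors[i]:
                # Known values are reset to their original value each round;
                # isolated nodes have nothing to average and keep their value.
                new_x.append(x[i])
            else:
                new_x.append(sum(x[j] for j in neighbors[i]) / len(neighbors[i]))
        x = new_x
    return x


# Path graph 0 - 1 - 2 - 3 where only the two end nodes have a feature.
filled = propagate_features([(0, 1), (1, 2), (2, 3)], [0.0, None, None, 3.0])
print([round(v, 3) for v in filled])  # prints [0.0, 1.0, 2.0, 3.0]
```

On this toy path the missing values settle on a smooth interpolation between the known end points, which is the intuition behind filling features from the graph structure.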



A Message From This Week's Sponsor



Retool is the fast way to build an interface for any database

With Retool, you don't need to be a developer to quickly build an app or dashboard on top of any data set. Data teams at companies like NBC use Retool to build any interface on top of their data, whether it's a simple read-write visualization or a full-fledged ML workflow. Drag and drop UI components, like tables and charts, to create apps. At every step, you can jump into the code to define the SQL queries and JavaScript that power how your app acts and connects to data. The result: less time on repetitive work and more time to discover insights.


Data Science Articles & Videos

  • A Roadmap for Big Model
    The Beijing Academy of Artificial Intelligence and others have released their 200-page Roadmap for scaling the largest Foundation Models...
  • Expert opinion: Regulating AI in Europe
    The subject of this paper is the European Commission proposal for the Artificial Intelligence Act (‘the AI Act’), published on the 21 April 2021 and the draft Council position also since published...It is supported by a policy briefing which provides specific recommendations for EU policymakers for changes to be implemented into the final version of the AI Act. The briefing will also be of interest to global policymakers with an interest in emerging AI regulation...
  • Using AI to deliver more inclusive biographical content on Wikipedia
    Wikipedia, consistently ranked one of the top 10 most visited websites, is often the first stop for many people looking for information about historical figures and changemakers. But not everyone is equally represented on Wikipedia. Only about 20 percent of biographies on the English site are about women, according to the Wikimedia Foundation, and we imagine that percentage is even smaller for women from intersectional groups, such as women in science, women in Africa, and women in Asia...For my PhD project as a computer science student at the Université de Lorraine, CNRS, in France, I worked with my adviser, Claire Gardent, to develop a new way to address this imbalance using artificial intelligence...
  • How LinkedIn Personalized Performance for Millions of Members using Tensorflow.js
    The Performance team at LinkedIn optimizes latency to load web and mobile pages...At LinkedIn we have used the relationship between engagement and speed to selectively customize the features on LinkedIn Lite - a lighter, faster version of LinkedIn, specifically built for mobile web browsers...To do this, we trained a deep neural network to identify if a request to LinkedIn would result in a fast page load in real time...
  • Exploring Plain Vision Transformer Backbones for Object Detection
    We explore the plain, non-hierarchical Vision Transformer (ViT) as a backbone network for object detection...Surprisingly, we observe: (i) it is sufficient to build a simple feature pyramid from a single-scale feature map (without the common FPN design) and (ii) it is sufficient to use window attention (without shifting) aided with very few cross-window propagation blocks...
  • Are we being too harsh on junior candidates? [Reddit Discussion]
    As part of our hiring process for ML Engineers, we're looking for both senior and juniors, but for the latter we request a home task, which is about building an arch and train process for a small dataset, the domain is vision, and we don't expect an actually trained model, but something that shows that the candidate has some basic knowledge..."Mistakes" people are making...
  • How Does AI Improve Human Decision-Making? Evidence from the AI-Powered Go Program
    How does artificial intelligence (AI) improve human decision-making? Answering this question is challenging because it is difficult to assess the quality of each decision and to disentangle AI’s influence on decisions...Our analysis of 750,990 moves in 25,033 games by 1,242 professional players reveals that APGs significantly improved the quality of the players’ moves as measured by the changes in winning probability with each move. We also show that the key mechanisms are reductions in the number of human errors and in the magnitude of the most critical mistake during the game...
  • Domain Specific Architectures for Deep Neural Networks: Three Generations of Tensor Processing Units (TPUs) [Video]
    The recent success of deep neural networks (DNN) has inspired a resurgence in domain specific architectures (DSAs) to run them...DNNs have two phases: training, which constructs accurate models, and inference, which serves those models. Google's first generation Tensor Processing Unit (TPUv1) offered 50X improvement in performance per watt over conventional architectures for inference. We naturally asked whether a successor could do the same for training...This talk reviews TPUv1 and explores how Google built the first production DSA supercomputer for the much harder problem of training, which was deployed in 2017...
  • Deep Neural Networks and Tabular Data: A Survey
    This work provides an overview of state of the art deep learning methods for tabular data. We start by categorizing them into three groups: data transformations, specialized architectures, and regularization models. We then provide a comprehensive overview of the main approaches in each group. A discussion of deep learning approaches for generating tabular data is complemented by strategies for explaining deep models on tabular data. Our primary contribution is to address the main research streams and existing methodologies in this area, while highlighting relevant challenges and open research questions...
  • Data Science at Shopify
    This week’s guest is Wendy Foster, Director of Engineering & Data Science at Shopify. We discussed applications of data science within Shopify, how they organize their data teams, the lifecycle of a data science project within the company, and how they approach emerging challenges like Responsible AI, large language models, and multimodal models...



Summit



You're invited to the first-ever Metrics Store Summit

Transform is hosting the first-ever industry summit on the metrics layer. The first-ever Metrics Store Summit on April 26, 2022 will bring discussions around the semantic layer into one event, providing context with use cases for metrics stores, highlighting applications for metrics, and sharing ideas from leaders across the modern data stack. You can expect to hear from Airbnb, Slack, Spotify, Atlan, Hex, Mode, Hightouch, AtScale and many more in this action-packed 1-day event. We would love to see you there! Register today for free.

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!



Training & Resources

  • A detailed guide to colors in data vis style guides
    I’ve heard you’re interested in creating a color palette as part of a data vis style guide. Maybe you decided to use a custom design theme at Datawrapper to make your charts more consistent-looking, and our support team asked you for some colors. Maybe you’re the first proper data vis designer at your organization, and want to bring order to chaos. Or maybe you want to redesign an existing palette because your requirements have changed...This guide is very extensive — and can be a bit overwhelming. If you’re designing your very first color palette, don’t sweat. It’s simple...



P.S. Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them on board :)

All the best,
Hannah & Sebastian