Data Science Weekly Newsletter

Issue 411

October 7, 2021

Editor's Picks

Failing to freelance in data visualization
After freelancing in data visualization for a little over a year, I decided to call it quits. I reflect on what went wrong and what you could learn from my mistakes...

Colorizing Klimt’s Vanished Paintings with Artificial Intelligence and Klimt Experts
Gustav Klimt’s three masterpieces: Medicine, Jurisprudence, and Philosophy were destroyed during the Second World War. Only black and white photos and articles describing the paintings remain...Once the color information was sourced, Emil Wallner...developed an algorithm to use Dr. Smola’s research to restore the Faculty Paintings. Instead of manually coloring the paintings, Wallner’s algorithm does a statistical analysis of Klimt’s existing artworks and learns how to mimic Klimt’s colorization style...

Machine Learning Practices Outside Big Tech: How Resource Constraints Challenge Responsible Development
We contribute a qualitative analysis of 17 interviews with stakeholders from organizations which are less represented in prior studies. We uncover a number of tensions which are introduced or exacerbated by these organizations' resource constraints -- tensions between privacy and ubiquity, resource management and performance optimization, and access and monopolization. Increased academic focus on these practitioners can facilitate a more holistic understanding of ML limitations, and so is useful for prescribing a research agenda to facilitate responsible ML development for all...

A Message From This Week's Sponsor

Online Data Science Programs from Drexel University
Find your algorithm for success with an online data science degree from Drexel University. Gain essential skills in tool creation and development, data and text mining, trend identification, and data manipulation and summarization by using leading industry technology to apply to your career. Learn more.

Data Science Articles & Videos

SeriesHeat
Search for a TV Series to see a heatmap of average IMDb ratings for each episode. The "Flip" toggle determines if the Seasons are shown as columns or rows. Click on a cell to see its IMDb page...
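
For readers curious how such a heatmap grid might be laid out, here is a minimal sketch in plain Python. It is not SeriesHeat's code: the function name `ratings_grid`, the `flip` parameter, and the ratings are all made up for illustration. It only shows the idea of arranging per-episode ratings by season and transposing the grid when the toggle is on.

```python
def ratings_grid(episodes, flip=False):
    """Arrange (season, episode, rating) tuples into a 2-D grid.

    By default each row is a season and each column an episode number.
    With flip=True the grid is transposed, so seasons become columns.
    Cells with no rating are filled with None.
    """
    by_cell = {(season, number): rating for season, number, rating in episodes}
    seasons = sorted({season for season, _, _ in episodes})
    numbers = sorted({number for _, number, _ in episodes})
    grid = [[by_cell.get((s, n)) for n in numbers] for s in seasons]
    if flip:
        # Transpose rows and columns.
        grid = [list(column) for column in zip(*grid)]
    return grid


# Made-up ratings, purely illustrative.
episodes = [(1, 1, 8.1), (1, 2, 7.9), (2, 1, 8.5), (2, 2, 9.0), (2, 3, 8.7)]
grid = ratings_grid(episodes)
flipped = ratings_grid(episodes, flip=True)
```

The resulting nested list could then be handed to any plotting library that draws heatmaps from a 2-D array.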

How to Build Your Data Analytics Team
As businesses recognize the decisive power of data to achieve business goals, most are hoping to put data in the driver’s seat of their business and product strategies. This entails putting together a strong data team which can effectively propagate its insights across different areas of the business. Unfortunately, this is no easy task...To be truly data driven, companies need to build three capabilities: data strategy, data governance and data analytics...

How I Got a Job at DeepMind as a Research Engineer (without a Machine Learning Degree!)
I recently landed a job at DeepMind as a Research Engineer! It’s a dream come true for me, I still can’t believe it! (if you feel like an imposter sometimes, trust me, you’re not alone…)...I don’t have a Ph.D. in ML...I don’t have a master's in ML...In fact, I don’t have any type of degree in ML...So how in the world did I pull it off? (hint: I’m not that smart, you can do it!)...In this blog, I’ll try to tell you the whole story. I’ll be very transparent in order to help you...out as much as possible.

Differentiable biology: using deep learning for biophysics-based and data-driven modeling of molecular mechanisms
In this Perspective, we describe an emerging ‘differentiable biology’ in which phenomena ranging from the small and specific (for example, one experimental assay) to the broad and complex (for example, protein folding) can be modeled effectively and efficiently, often by exploiting knowledge about basic natural phenomena to overcome the limitations of sparse, incomplete and noisy data. By distilling differentiable biology into a small set of conceptual primitives and illustrative vignettes, we show how it can help to address long-standing challenges in integrating multimodal data from diverse experiments across biological scales. This promises to benefit fields as diverse as biophysics and functional genomics...

Robservable: Use of Observable notebooks (or parts of them) as htmlwidgets in R
This package allows the use of Observable notebooks (or parts of them) as htmlwidgets in R...Note that it is not an iframe embedding a whole notebook – cells are <div> included directly in your document or application. You can choose what cells to display, update cell values from R, and add observers to cells to get their values back into a Shiny application...The following GIF shows a quick example of reusing a bar chart race notebook inside R with our own data...

Deep Neural Networks and Tabular Data: A Survey
This work provides an overview of state-of-the-art deep learning methods for tabular data. We start by categorizing them into three groups: data transformations, specialized architectures, and regularization models. We then provide a comprehensive overview of the main approaches in each group. A discussion of deep learning approaches for generating tabular data is complemented by strategies for explaining deep models on tabular data. Our primary contribution is to address the main research streams and existing methodologies in this area, while highlighting relevant challenges and open research questions...

Amazon launches new Alexa Prize SimBot Challenge
Human-robot interaction has long been investigated within the field of artificial intelligence, including using dialogue as the interaction mechanism for completing tasks. The SimBot Challenge will focus on navigation, object manipulation, and machine perception and reasoning within a virtual world...

Simulation-based Bayesian inference for multi-fingered robotic grasping
Multi-fingered robotic grasping is an undeniable stepping stone to universal picking and dexterous manipulation. Yet, multi-fingered grippers remain challenging to control because of their rich nonsmooth contact dynamics or because of sensor noise. In this work, we aim to plan hand configurations by performing Bayesian posterior inference through the full stochastic forward simulation of the robot in its environment, hence robustly accounting for many of the uncertainties in the system...

Exploring the Limits of Large Scale Pre-training
Recent developments in large-scale machine learning suggest that by scaling up data, model size and training time properly, one might observe that improvements in pre-training would transfer favorably to most downstream tasks. In this work, we systematically study this phenomenon and establish that, as we increase the upstream accuracy, the performance of downstream tasks saturates. In particular, we investigate more than 4800 experiments on Vision Transformers, MLP-Mixers and ResNets with number of parameters ranging from ten million to ten billion, trained on the largest scale of available image data (JFT, ImageNet21K) and evaluated on more than 20 downstream image recognition tasks...

Rules-based labelling tool for NLP
After working in ML for more than a decade, I became frustrated over time with the lack of tools to create baselines using simple rules and heuristics. It is well known that most business problems can achieve decent baselines using only heuristics. So this is why I have just open-sourced DataQA, a rules-based labelling tool for NLP...
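
To make the idea of a heuristic baseline concrete, here is a minimal sketch of rules-based labelling using only Python's standard `re` module. It does not use DataQA or reflect its API; the rule set, the label names, and the `label` helper are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical rules: each label is paired with a regular expression.
# The first rule that matches wins.
RULES = [
    ("REFUND", re.compile(r"\b(refund|money back)\b", re.IGNORECASE)),
    ("SHIPPING", re.compile(r"\b(deliver\w*|shipping|package)\b", re.IGNORECASE)),
]


def label(text, default="OTHER"):
    """Return the label of the first matching rule, or the default label."""
    for name, pattern in RULES:
        if pattern.search(text):
            return name
    return default


# Made-up example texts.
texts = ["I want my money back", "Where is my package?", "Love the product"]
labels = [label(t) for t in texts]
```

Labels produced this way can serve as a quick baseline to compare a trained model against.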

Training

Join Impact 2021 on November 3, 2021: The First-Ever Data Observability Summit. Join today's leading data pioneers. Hear from data leaders pioneering the technologies & processes shaping data engineering. Featuring the first Chief Data Scientist of the U.S., the founder of the Data Mesh, and many more! Get Your Free Ticket ...

*Sponsored post. If you want to be featured here, or as our main sponsor, contact us!

Jobs

Entry Level Data Scientist: 2022 - IBM - Multiple Locations
As a Data Scientist at IBM, you will help transform our clients’ data into tangible business value by analyzing information, communicating outcomes and collaborating on product development. Work with Best in Class open source and visual tools, along with the most flexible and scalable deployment options. Whether it’s investigating patient trends or weather patterns, you will work to solve real world problems for the industries transforming how we live.

Want to post a job here? Email us for details >> team@datascienceweekly.org

Training & Resources

Clustering with Scikit-Learn in Python
This tutorial demonstrates how to apply clustering algorithms with Python to a dataset with two concrete use cases. The first example uses clustering to identify meaningful groups of Greco-Roman authors based on their publications and their reception. The second use case applies clustering algorithms to textual data in order to discover thematic groups. After finishing this tutorial, you will be able to use clustering in Python with Scikit-learn applied to your own data...
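
As a taste of what clustering with scikit-learn can look like, here is a minimal sketch. It is not taken from the tutorial: the data is made up (two invented numeric features for six hypothetical authors), and k-means with two clusters is just one reasonable choice among the algorithms scikit-learn offers.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Made-up data: one row per author, two invented numeric features.
X = np.array([
    [1.0, 200.0],
    [1.2, 220.0],
    [0.9, 210.0],
    [8.0, 900.0],
    [8.3, 950.0],
    [7.9, 880.0],
])

# Scale the features so neither one dominates the distance computation.
X_scaled = StandardScaler().fit_transform(X)

# Fit k-means with two clusters; random_state makes the run repeatable.
model = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = model.fit_predict(X_scaled)
```

Here `labels` holds one cluster index per row. Which group receives index 0 or 1 is arbitrary, so only the grouping itself is meaningful.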

FedJAX: Federated Learning Simulation with JAX
We are excited to introduce FedJAX, a JAX-based open source library for federated learning simulations that emphasizes ease-of-use in research...In this post we discuss the library structure and contents of FedJAX. We demonstrate that on TPUs FedJAX can be used to train models with federated averaging on the EMNIST dataset in a few minutes, and the Stack Overflow dataset in roughly an hour with standard hyperparameters...
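
For context, the aggregation step at the heart of federated averaging can be sketched in a few lines of plain Python. This is not FedJAX code and uses none of its API. The function name `federated_average` and the numbers are made up, and the sketch assumes each client has already trained locally and reports a flat list of weights along with its number of training examples.

```python
def federated_average(client_weights, client_sizes):
    """Average client weight vectors, weighted by each client's example count."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(n_params)
    ]


# Made-up example: two clients, two parameters each.
client_weights = [[1.0, 2.0], [3.0, 4.0]]
client_sizes = [1, 3]  # the second client holds three times as much data
global_weights = federated_average(client_weights, client_sizes)
```

In a full simulation this step would repeat every round: the server sends out the current weights, clients train locally, and the server averages the results.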

How to Easily Draw Neural Network Architecture Diagrams
Instead of explaining the model in words, diagram visualizations are way more effective in presenting and describing a neural network’s architecture...We have probably written enough code for the rest of the year, so let’s take a look at a simple no-code tool for drawing custom architecture diagrams — diagrams.net (formerly known as draw.io)...

Books

Hands-On Machine Learning with scikit-learn and Scientific Python Toolkits
Integrate scikit-learn with various tools such as NumPy, pandas, imbalanced-learn, and scikit-surprise and use it to solve real-world machine learning problems...

For a detailed list of books covering Data Science, Machine Learning, AI and associated programming languages check out our resources page...

P.S., Enjoy the newsletter? Please forward it to your friends and colleagues - we'd love to have them onboard :) All the best, Hannah & Sebastian
