
Assaf Pinhasi

365 Followers


Published in Feature Stores for ML · Dec 8, 2022

Feature pipelines and feature stores — deep dive into system engineering and analytical tradeoffs

Introduction: Feature calculation and serving are the heart of every classical machine learning system. The goal of these components is to capture analytical signals in the data and reduce them to features that allow the model to predict the outcome accurately. Calculating and serving features at “decent” scale…

MLOps

17 min read



Nov 23, 2022

What I’ve learnt about deep work after years of being distracted

What is deep work? Cal Newport, a computer science professor at Georgetown University, defines deep work as: “… a professional activity performed in a state of distraction-free concentration... These efforts create new value, improve your skill, and are hard to replicate.” If you want to do deep work, you need to focus for long…

Productivity

9 min read



Published in Towards Data Science · Nov 15, 2022

Building Spark Data Pipelines in the Cloud — What You Need to Get Started

Common engineering challenges and recipes for solutions — Introduction: Over the last ten years or so, authoring and executing Spark jobs has become considerably simpler, mainly thanks to: high-level APIs, which make it easier to express logic; and managed cloud-based platforms — highly scalable object storage and one-click ephemeral clusters based on spot instances make it infinitely…

Data Engineering

20 min read



Published in Towards Data Science · Oct 20, 2022

From Raw Videos to GAN Training

Implementing a data pipeline and a lightweight Deep Learning data lake using ClearML on AWS — Introduction: Hour One is an AI-centric start-up whose main product transforms text into videos of virtual human presenters. Generating realistic, smooth, and compelling videos of human presenters speaking and gesturing in multiple languages, based on text alone, is a challenging task that requires training complex Deep Learning models — and…

Deep Learning

15 min read



Published in Towards Data Science · Jun 10, 2022

Deep Lake — an architectural blueprint for managing Deep Learning data at scale — part I

Introduction: In the past few years, machine learning data management practices have evolved dramatically, with the introduction of new design patterns and tools such as feature stores, data and model monitoring practices, and feature generation frameworks. Most advances in data management for machine learning are focused on classical (feature-based) data, and…

Deep Learning

15 min read



Dec 26, 2021

What I’ve learnt from two years of being an expert hands-on technology consultant

What led me here? A couple of years ago, I decided I needed a change. I was wrapping up 2.5 years as a VP of R&D at a medical imaging startup, building deep-learning-based solutions for the radiology domain. I was leading a group of 35 engineers and DL researchers in a very complex and…

Consulting

14 min read



Published in Towards Data Science · Jul 26, 2021

How to run CPU-based Workloads for Deep Learning Using Thousands Of Spot Instances on AWS and GCP Without Getting a Headache

Deep learning is notorious for consuming large amounts of GPU resources during training. However, multiple parts of the Deep Learning workflow require large amounts of CPU resources: running large-scale inference jobs, and pre-processing input data and materializing it on disk in preparation for training. These…

Deep Learning

10 min read



Published in PyTorch · Jun 29, 2021

A Step by Step Guide to Building A Distributed, Spot-based Training Platform on AWS Using TorchElastic and Kubernetes

This is part II of a two-part series, describing our solution for running distributed training on spot instances using TorchElastic and Kubernetes. Part I introduced our overall technology selection, design principles and benchmarks. In Part II, we will walk you through the process of creating a simplified version of the…

PyTorch

10 min read



Published in PyTorch · Jun 17, 2021

How 3DFY.ai Built a Multi-Cloud, Distributed Training Platform Over Spot Instances with TorchElastic and Kubernetes

Deep Learning development is becoming more and more about minimizing the time from idea to trained model. To shorten this lead time, researchers need access to a training environment that supports running multiple experiments concurrently, each utilizing several GPUs. Until recently, training environments with tens or hundreds of GPUs were…

PyTorch

9 min read



May 26, 2021

How to convert a PyTorch DataParallel project to use DistributedDataParallel

Omri Bar

Great writeup, thanks for sharing. I am under the impression that the backward pass is in fact the synchronization barrier. When your "main" worker is busy doing things like running validation or writing checkpoints, for example at the end of the nth epoch, the other workers have already started the next epoch. They will complete exactly one mini-batch and then get "stuck", since they wait for the "main" worker to exchange gradients with them.
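The scheduling effect described in this comment can be sketched with a toy example. Plain Python threads stand in for DDP workers, and a `threading.Barrier` stands in for the gradient all-reduce that happens during the backward pass — this illustrates only the blocking behaviour, and is not actual torch.distributed code:

```python
import threading
import time

NUM_WORKERS = 3
# Stands in for DDP's gradient all-reduce: no rank proceeds until all arrive.
barrier = threading.Barrier(NUM_WORKERS)
events = []
lock = threading.Lock()

def worker(rank, extra_work):
    # The "main" worker (rank 0) is delayed by validation / checkpointing.
    time.sleep(extra_work)
    with lock:
        events.append((rank, "reached allreduce"))
    barrier.wait()  # fast workers block here until the busy rank arrives
    with lock:
        events.append((rank, "past allreduce"))

threads = [
    threading.Thread(target=worker, args=(r, 0.2 if r == 0 else 0.0))
    for r in range(NUM_WORKERS)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(events)
```

Because every rank appends its "reached" event before calling `barrier.wait()`, all "reached" entries precede any "past" entry: the fast workers finish their one mini-batch and then sit blocked until the busy "main" worker reaches the gradient exchange, mirroring the behaviour the comment describes.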

1 min read


Machine Learning and Engineering Leader and consultant. https://www.linkedin.com/in/assafpinhasi

Following
  • Ben Rogojan
  • Cassie Kozyrkov
  • Alberto Romero
  • Anna Geller
  • Ori Cohen
