Chaim Rand, Author at Towards Data Science
https://towardsdatascience.com
Publish AI, ML & data-science insights to a global community of data professionals.

Optimizing PyTorch Model Inference on AWS Graviton
https://towardsdatascience.com/optimizing-pytorch-model-inference-on-aws-graviton/
Wed, 10 Dec 2025
Tips for accelerating AI/ML on CPU — Part 2

The post Optimizing PyTorch Model Inference on AWS Graviton appeared first on Towards Data Science.

Optimizing PyTorch Model Inference on CPU
https://towardsdatascience.com/optimizing-pytorch-model-inference-on-cpu/
Mon, 08 Dec 2025
Flyin’ Like a Lion on Intel Xeon

On the Challenge of Converting TensorFlow Models to PyTorch
https://towardsdatascience.com/on-the-challenge-of-converting-tensorflow-models-to-pytorch/
Fri, 05 Dec 2025
How to upgrade and optimize legacy AI/ML models

Overcoming the Hidden Performance Traps of Variable-Shaped Tensors: Efficient Data Sampling in PyTorch
https://towardsdatascience.com/overcoming-the-hidden-performance-traps-of-variable-shaped-tensors-efficient-data-sampling-in-pytorch/
Wed, 03 Dec 2025
PyTorch Model Performance Analysis and Optimization — Part 11

Capturing and Deploying PyTorch Models with torch.export
https://towardsdatascience.com/capturing-and-deploying-pytorch-models-with-torch-export/
Wed, 20 Aug 2025
A demonstration of PyTorch’s exciting new export feature on a HuggingFace model

Maximizing AI/ML Model Performance with PyTorch Compilation
https://towardsdatascience.com/maximizing-ai-ml-model-performance-with-pytorch-compilation/
Mon, 18 Aug 2025
Since its inception in PyTorch 2.0 in March 2023, the evolution of torch.compile has been one of the most exciting things to follow. Given that PyTorch’s popularity was due to its “Pythonic” nature, its ease of use, and its line-by-line (a.k.a., eager) execution, the success of a just-in-time (JIT) graph compilation mode should not have been taken […]

The Crucial Role of NUMA Awareness in High-Performance Deep Learning
https://towardsdatascience.com/the-crucial-role-of-numa-awareness-in-high-performance-deep-learning/
Thu, 10 Jul 2025
PyTorch Model Performance Analysis and Optimization — Part 10

Pipelining AI/ML Training Workloads with CUDA Streams
https://towardsdatascience.com/pipelining-ai-ml-training-workloads-with-cuda-streams/
Thu, 26 Jun 2025
PyTorch Model Performance Analysis and Optimization — Part 9

A Caching Strategy for Identifying Bottlenecks on the Data Input Pipeline
https://towardsdatascience.com/a-caching-strategy-for-identifying-bottlenecks-on-the-data-input-pipeline/
Thu, 26 Jun 2025
PyTorch Model Performance Analysis and Optimization — Part 8

The Case for Centralized AI Model Inference Serving
https://towardsdatascience.com/the-case-for-centralized-ai-model-inference-serving/
Wed, 02 Apr 2025
Optimizing highly parallel AI algorithm execution

Debugging the Dreaded NaN
https://towardsdatascience.com/debugging-the-dreaded-nan/
Thu, 27 Feb 2025
Capturing and reproducing failures in PyTorch training with Lightning

Efficient Metric Collection in PyTorch: Avoiding the Performance Pitfalls of TorchMetrics
https://towardsdatascience.com/efficient-metric-collection-in-pytorch-avoiding-the-performance-pitfalls-of-torchmetrics/
Fri, 07 Feb 2025
Metric collection is an essential part of every machine learning project, enabling us to track model performance and monitor training progress. Ideally, metrics should be collected and computed without introducing any additional overhead to the training process. However, just like other components of the training loop, inefficient metric computation can introduce unnecessary overhead, increase training-step […]

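The overhead the excerpt above refers to typically comes from reading metric values back to the host every training step, which forces a GPU-to-CPU sync. A minimal sketch of the general remedy (plain PyTorch rather than TorchMetrics, and not code from the article): accumulate the metric on-device and sync only once at the end of the epoch.

```python
import torch

# Pick the device; the pattern is the same on CPU, but the sync cost
# it avoids only matters on an accelerator.
device = "cuda" if torch.cuda.is_available() else "cpu"

correct = torch.zeros((), dtype=torch.long, device=device)
total = 0

for _ in range(100):  # stand-in for the training loop
    # Random stand-ins for model predictions and labels.
    preds = torch.randint(0, 10, (32,), device=device)
    labels = torch.randint(0, 10, (32,), device=device)
    correct += (preds == labels).sum()  # stays on-device: no sync here
    total += labels.numel()

# A single host-device sync, once per epoch instead of once per step.
accuracy = correct.item() / total
print(f"accuracy: {accuracy:.3f}")
```

Calling `.item()` (or printing a tensor) inside the loop would block the host on the device every step; deferring it is the core of the optimization.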
Optimizing Transformer Models for Variable-Length Input Sequences
https://towardsdatascience.com/optimizing-transformer-models-for-variable-length-input-sequences-19fb88fddf71/
Tue, 26 Nov 2024
How PyTorch NestedTensors, FlashAttention2, and xFormers can Boost Performance and Reduce AI Costs

Increasing Transformer Model Efficiency Through Attention Layer Optimization
https://towardsdatascience.com/increasing-transformer-model-efficiency-through-attention-layer-optimization-fefa6f87b1d6/
Mon, 18 Nov 2024
How paying "better" attention can drive ML cost savings

On the Programmability of AWS Trainium and Inferentia
https://towardsdatascience.com/on-the-programmability-of-aws-trainium-and-inferentia-cd455826e26c/
Fri, 01 Nov 2024
Accelerating AI/ML Model Training with Custom Operators — Part 4

AI Model Optimization on AWS Inferentia and Trainium
https://towardsdatascience.com/ai-model-optimization-on-aws-inferentia-and-trainium-cfd48e85d5ac/
Sun, 20 Oct 2024
Tips for accelerating ML with AWS Neuron SDK

Implementing Sequential Algorithms on TPU
https://towardsdatascience.com/implementing-sequential-algorithms-on-tpu-41d75c6aaa95/
Mon, 07 Oct 2024
Accelerating AI/ML Model Training with Custom Operators — Part 3.A

The Rise of Pallas: Unlocking TPU Potential with Custom Kernels
https://towardsdatascience.com/the-rise-of-pallas-unlocking-tpu-potential-with-custom-kernels-67be10ab846a/
Sun, 06 Oct 2024
Accelerating AI/ML Model Training with Custom Operators — Part 3

Training AI Models on CPU
https://towardsdatascience.com/training-ai-models-on-cpu-3903adc9f388/
Sun, 01 Sep 2024
Revisiting CPU for ML in an Era of GPU Scarcity

Unleashing the Power of Triton: Mastering GPU Kernel Optimization in Python
https://towardsdatascience.com/unleashing-the-power-of-triton-mastering-gpu-kernel-optimization-in-python-160a3f52701e/
Tue, 13 Aug 2024
Accelerating AI/ML Model Training with Custom Operators — Part 2
