Chaim Rand, Author at Towards Data Science
https://towardsdatascience.com
Publish AI, ML & data-science insights to a global community of data professionals.

Optimizing PyTorch Model Inference on AWS Graviton
https://towardsdatascience.com/optimizing-pytorch-model-inference-on-aws-graviton/
Wed, 10 Dec 2025
Tips for accelerating AI/ML on CPU — Part 2

The post Optimizing PyTorch Model Inference on AWS Graviton appeared first on Towards Data Science.

Optimizing PyTorch Model Inference on CPU
https://towardsdatascience.com/optimizing-pytorch-model-inference-on-cpu/
Mon, 08 Dec 2025
Flyin’ Like a Lion on Intel Xeon

On the Challenge of Converting TensorFlow Models to PyTorch
https://towardsdatascience.com/on-the-challenge-of-converting-tensorflow-models-to-pytorch/
Fri, 05 Dec 2025
How to upgrade and optimize legacy AI/ML models

Overcoming the Hidden Performance Traps of Variable-Shaped Tensors: Efficient Data Sampling in PyTorch
https://towardsdatascience.com/overcoming-the-hidden-performance-traps-of-variable-shaped-tensors-efficient-data-sampling-in-pytorch/
Wed, 03 Dec 2025
PyTorch Model Performance Analysis and Optimization — Part 11

Capturing and Deploying PyTorch Models with torch.export
https://towardsdatascience.com/capturing-and-deploying-pytorch-models-with-torch-export/
Wed, 20 Aug 2025
A demonstration of PyTorch’s exciting new export feature on a HuggingFace model

Maximizing AI/ML Model Performance with PyTorch Compilation
https://towardsdatascience.com/maximizing-ai-ml-model-performance-with-pytorch-compilation/
Mon, 18 Aug 2025
Since its inception in PyTorch 2.0 in March 2023, the evolution of torch.compile has been one of the most exciting things to follow. Given that PyTorch’s popularity was due to its “Pythonic” nature, its ease of use, and its line-by-line (a.k.a., eager) execution, the success of a just-in-time (JIT) graph compilation mode should not have been taken […]

The Crucial Role of NUMA Awareness in High-Performance Deep Learning
https://towardsdatascience.com/the-crucial-role-of-numa-awareness-in-high-performance-deep-learning/
Thu, 10 Jul 2025
PyTorch Model Performance Analysis and Optimization — Part 10

Pipelining AI/ML Training Workloads with CUDA Streams
https://towardsdatascience.com/pipelining-ai-ml-training-workloads-with-cuda-streams/
Thu, 26 Jun 2025
PyTorch Model Performance Analysis and Optimization — Part 9

A Caching Strategy for Identifying Bottlenecks on the Data Input Pipeline
https://towardsdatascience.com/a-caching-strategy-for-identifying-bottlenecks-on-the-data-input-pipeline/
Thu, 26 Jun 2025
PyTorch Model Performance Analysis and Optimization — Part 8

The Case for Centralized AI Model Inference Serving
https://towardsdatascience.com/the-case-for-centralized-ai-model-inference-serving/
Wed, 02 Apr 2025
Optimizing highly parallel AI algorithm execution

Debugging the Dreaded NaN
https://towardsdatascience.com/debugging-the-dreaded-nan/
Thu, 27 Feb 2025
Capturing and reproducing failures in PyTorch training with Lightning

Efficient Metric Collection in PyTorch: Avoiding the Performance Pitfalls of TorchMetrics
https://towardsdatascience.com/efficient-metric-collection-in-pytorch-avoiding-the-performance-pitfalls-of-torchmetrics/
Fri, 07 Feb 2025
Metric collection is an essential part of every machine learning project, enabling us to track model performance and monitor training progress. Ideally, metrics should be collected and computed without introducing any additional overhead to the training process. However, just like other components of the training loop, inefficient metric computation can introduce unnecessary overhead, increase training-step […]

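The overhead the excerpt above refers to typically comes from reading metric values back to the host every training step, which forces a GPU-to-CPU sync. A minimal sketch of the general remedy (plain PyTorch rather than TorchMetrics, and not code from the article): accumulate the metric on-device and sync only once at the end of the epoch.

```python
import torch

# Pick the device; the pattern is the same on CPU, but the sync cost
# it avoids only matters on an accelerator.
device = "cuda" if torch.cuda.is_available() else "cpu"

correct = torch.zeros((), dtype=torch.long, device=device)
total = 0

for _ in range(100):  # stand-in for the training loop
    # Random stand-ins for model predictions and labels.
    preds = torch.randint(0, 10, (32,), device=device)
    labels = torch.randint(0, 10, (32,), device=device)
    correct += (preds == labels).sum()  # stays on-device: no sync here
    total += labels.numel()

# A single host-device sync, once per epoch instead of once per step.
accuracy = correct.item() / total
print(f"accuracy: {accuracy:.3f}")
```

Calling `.item()` (or printing a tensor) inside the loop would block the host on the device every step; deferring it is the core of the optimization.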
Optimizing Transformer Models for Variable-Length Input Sequences
https://towardsdatascience.com/optimizing-transformer-models-for-variable-length-input-sequences-19fb88fddf71/
Tue, 26 Nov 2024
How PyTorch NestedTensors, FlashAttention2, and xFormers can Boost Performance and Reduce AI Costs

Increasing Transformer Model Efficiency Through Attention Layer Optimization
https://towardsdatascience.com/increasing-transformer-model-efficiency-through-attention-layer-optimization-fefa6f87b1d6/
Mon, 18 Nov 2024
How paying "better" attention can drive ML cost savings

On the Programmability of AWS Trainium and Inferentia
https://towardsdatascience.com/on-the-programmability-of-aws-trainium-and-inferentia-cd455826e26c/
Fri, 01 Nov 2024
Accelerating AI/ML Model Training with Custom Operators — Part 4

AI Model Optimization on AWS Inferentia and Trainium
https://towardsdatascience.com/ai-model-optimization-on-aws-inferentia-and-trainium-cfd48e85d5ac/
Sun, 20 Oct 2024
Tips for accelerating ML with AWS Neuron SDK

Implementing Sequential Algorithms on TPU
https://towardsdatascience.com/implementing-sequential-algorithms-on-tpu-41d75c6aaa95/
Mon, 07 Oct 2024
Accelerating AI/ML Model Training with Custom Operators — Part 3.A

The Rise of Pallas: Unlocking TPU Potential with Custom Kernels
https://towardsdatascience.com/the-rise-of-pallas-unlocking-tpu-potential-with-custom-kernels-67be10ab846a/
Sun, 06 Oct 2024
Accelerating AI/ML Model Training with Custom Operators — Part 3

Training AI Models on CPU
https://towardsdatascience.com/training-ai-models-on-cpu-3903adc9f388/
Sun, 01 Sep 2024
Revisiting CPU for ML in an Era of GPU Scarcity

Unleashing the Power of Triton: Mastering GPU Kernel Optimization in Python
https://towardsdatascience.com/unleashing-the-power-of-triton-mastering-gpu-kernel-optimization-in-python-160a3f52701e/
Tue, 13 Aug 2024
Accelerating AI/ML Model Training with Custom Operators — Part 2
