TDS Newsletter: How to Keep LLMs Effective and Reliable Over Time

On the nitty-gritty details of evaluations, guardrails, and ongoing optimization

Oct 9, 2025


Photo by Patricia Serna via Unsplash

Never miss a new edition of The Variable, our weekly newsletter featuring a top-notch selection of editors’ picks, deep dives, community news, and more.


Those of you who’ve worked with LLM-powered applications know this: by now, building and deploying these tools is (relatively) straightforward, but maintaining their reliability and long-term value for the organization is not.

There’s no magic solution to this challenge, but several approaches have emerged that make life easier for data and ML professionals. Our weekly highlights zoom in on the nitty-gritty details of evaluations, guardrails, and ongoing optimization, so if you’d like to expand your LLM know-how and be more effective in your role, read on.

AI Engineering and Evals as New Layers of Software Work

Clara Chong’s compelling premise is that “the real work is about solving business problems with the tools we already have.” She unpacks AI’s impact on tech workers’ daily rhythms: writing code might have become a lot easier (or at least faster), but ensuring it follows the best practices of eval-driven development introduces several layers of complexity into your projects.


Notes on LLM Evaluation

If you’re ready to dig deeper into the intricacies of evals, Felipe Adachi recently shared a comprehensive, step-by-step guide to the components that make up a robust pipeline. It zooms in on data preparation, the choices you might face along the way, and the adjustments you’ll need to implement once the results are in.


RAG Explained: Reranking for Better Answers

Retrieval-augmented generation is a technique for improving LLM performance, but it, too, often requires tuning and optimization of its own. Maria Mouschoutzi introduces us to reranking, which reorders retrieved passages by relevance before they reach the model, and its potential to boost the relevance of LLM outputs.
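
For readers who’d like a feel for the idea before diving into the article, here is a minimal sketch of reranking in Python. It is not taken from Maria’s post: the `overlap_score` function is a hypothetical, toy stand-in for the trained relevance model (often a cross-encoder) that a real pipeline would likely use, and the candidate passages are made up for illustration.

```python
from typing import Callable


def tokenize(text: str) -> set[str]:
    """Lowercase and split on whitespace (deliberately naive)."""
    return set(text.lower().split())


def overlap_score(query: str, passage: str) -> float:
    """Toy relevance score: Jaccard overlap between query and passage tokens.

    Assumption: a stand-in for a learned reranking model, used only so the
    example runs without external dependencies.
    """
    q, p = tokenize(query), tokenize(passage)
    if not q or not p:
        return 0.0
    return len(q & p) / len(q | p)


def rerank(
    query: str,
    passages: list[str],
    score_fn: Callable[[str, str], float] = overlap_score,
    top_k: int = 2,
) -> list[str]:
    """Reorder retrieved passages by relevance and keep the best top_k."""
    ordered = sorted(passages, key=lambda p: score_fn(query, p), reverse=True)
    return ordered[:top_k]


# Pretend these came back from a first-stage retriever, in arbitrary order.
candidates = [
    "Bananas are rich in potassium.",
    "Reranking reorders retrieved passages by relevance to the query.",
    "A retriever returns candidate passages for a query.",
]

top = rerank("how does reranking order retrieved passages", candidates, top_k=2)
for passage in top:
    print(passage)
```

Swapping `score_fn` for a stronger scorer is the main design lever here: the first-stage retriever stays cheap and broad, while the reranker spends more effort on a short list of candidates.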


Introducing the AI-3P Assessment Framework: Score AI Projects Before Committing Resources

Sometimes, tweaking a tool post-deployment might just be too little, too late. Marina Tosic presents a novel framework to help you avoid that fate by focusing on projects that are likelier to succeed.


This Week’s Most-Read Stories

From DataViz basics to AI agents, here are the recent articles that resonated the most with our audience.

How to Build Effective Agentic Systems with LangGraph, by Eivind Kjosbakken


Data Visualization Explained (Part 2): An Introduction to Visual Variables, by Murtaza Ali


MCP in Practice, by Sruly Rosenblat, Ilan Strauss, Isobel Moure, and Tim O’Reilly


Meet Our New Authors

We hope you take the time to explore the excellent work from the latest cohort of TDS contributors:

Nidhin Karunakaran Ponon offers practical insights on guardrails for your AI applications (and how to create them).

How To Build Effective Technical Guardrails for AI Applications

Kenneth McCarthy charts the visual “fingerprints” of 20 languages with the help of basic statistics.

What Makes a Language Look Like Itself?

Ankit Singh Chauhan published a lucid writeup of recent research that promises “a smarter way to scale reasoning tasks without wasting a massive amount of computation.”

Smarter, Not Harder: How AI’s Self-Doubt Unlocks Peak Performance

We love publishing articles from new authors, so if you’ve recently written an interesting project walkthrough, tutorial, or theoretical reflection on any of our core topics, why not share it with us?