In the Author Spotlight series, TDS Editors chat with members of our community about their career path in data science and AI, their writing, and their sources of inspiration. Today, we’re thrilled to share our conversation with Shuai Guo.
Shuai is an industrial AI researcher working with physics, data, and machine learning to solve real-world problems in engineering, security, and intelligent systems. He holds a PhD at the intersection of computational mechanics and machine learning. His work spans various topics, including anomaly detection, digital twins, physics-informed learning, and LLM/agentic applications.
Your LangGraph piece walks the reader through the process of building a deep research agent. When you actually tried it end-to-end, what surprised you the most, and what would you do differently next time?
I would say what surprised me the most was how easily the deep research agent can make mistakes when run end-to-end. That whole “generate query → search → reflect → repeat” loop looks great on paper, but it falls apart pretty fast. There are two main issues I remember clearly. First, from time to time, the agent starts mixing up what it found with what it remembers from pre-training. This is not ideal, as I want the LLM only to synthesize information and identify knowledge gaps, while fully relying on the web search to ground the answer.
Another issue that constantly gives me headaches is information contamination, i.e., when search brings back similar-looking content but the model treats it like it’s exactly what you asked for. For example, I once tested the deep research agent by researching a specific bug report (say, issue #4521 of a codebase), and the search returned content related to issue #4522; the agent then started mixing in its symptoms as if they were all the same problem.
Beyond these two main issues, I also experienced challenges in handling conflicting information and determining sufficiency for terminating the deep research. None of those problems can be solved by simply adding more search results or running more iterations.
The key realization for me is that guardrails are at least as critical as the agent architecture, if not more so, if we want to go beyond “just a demo” and build a system that actually works in production. I think the mindset of “test-driven development” fits nicely here: define what “good” looks like before you build. Next time, I’d start by defining clear rules, and then build the agent architecture around those constraints.
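As a minimal sketch of the kind of guardrail Shuai describes for the contamination problem, one could filter near-miss search results before the LLM ever sees them. All names here (`filter_contaminated`, the snippets) are illustrative, not from his actual implementation:

```python
import re

def filter_contaminated(query_id: str, snippets: list[str]) -> list[str]:
    """Keep only snippets that mention the exact identifier being researched.

    Near-miss results (e.g. issue #4522 when researching #4521) are dropped
    before synthesis, so the model cannot blend their symptoms together.
    """
    # Match the identifier as a whole token, not as a prefix of a longer number.
    pattern = re.compile(rf"{re.escape(query_id)}(?!\d)")
    return [s for s in snippets if pattern.search(s)]

snippets = [
    "Issue #4521: crash on startup when the config file is missing.",
    "Issue #4522: slow startup caused by network retries.",  # near miss
    "Fix for #4521 merged in release 2.3.1.",
]
grounded = filter_contaminated("#4521", snippets)
```

Defining a rule like this up front, then building the agent loop around it, is one way to apply the test-driven mindset he mentions.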
You’ve written that analytical AI (SQL/BI + classical ML) isn’t going away just because agents are hot. If you were designing a modern data stack today, what work would you give to agents and what would you keep in the analytics lane?
Analytical AI is reproducible and numerically precise. LLM-based agents, on the other hand, are good at digesting unstructured context, translating results, and communicating with people. When allocating tasks between analytical AI and agentic AI, my rule of thumb is: if a task is more quantitatively geared, I default to analytical AI; if it’s more qualitatively geared (e.g., synthesis, storytelling, or judgment), I consider LLMs/agents the better fit.
Consider the concrete problem of building a customer churn prediction system. At a high level, it usually involves two steps: identifying the at-risk customers, and acting on them. For the first step of flagging at-risk customers, I would lean on analytical AI to engineer informative features, train gradient boosting models on historical behavioral data, and use the trained models to calculate churn propensity scores. In addition, I would also run a SHAP analysis to get feature importance scores for explaining the predictions. Every step is precise and reproducible, and there are a ton of best practices available for getting accurate and reliable results.
But then comes the fun part: what do you actually do with those predictions? This is where the LLM-based agents can take over. They can draft personalized retention emails by pulling in the customer’s history, maybe suggest relevant product features they haven’t tried yet, and adjust the tone based on how their past support tickets went. There is no math here. Just speaking in a contextually smart way.
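A minimal sketch of the handoff point between the two lanes might look like the following. The field names and thresholds are hypothetical; the point is that the analytical lane produces the precise numbers, and the agent only receives packaged results for the qualitative drafting:

```python
from dataclasses import dataclass

@dataclass
class Customer:
    name: str
    churn_score: float          # from the analytical lane (e.g. gradient boosting)
    top_risk_factor: str        # e.g. from a SHAP-style attribution
    unused_features: list[str]  # candidate suggestions for the retention email

def build_retention_prompt(c: Customer) -> str:
    """Package analytical results into context for an LLM agent to draft from."""
    return (
        f"Draft a retention email for {c.name}. "
        f"Churn risk: {c.churn_score:.0%}, driven mainly by {c.top_risk_factor}. "
        f"Suggest these untried product features: {', '.join(c.unused_features)}."
    )

customers = [
    Customer("Acme Corp", 0.82, "declining weekly logins", ["usage alerts", "SSO"]),
    Customer("Beta LLC", 0.15, "none", []),
]
# Threshold comes from the analytical lane, not from the LLM.
at_risk = [c for c in customers if c.churn_score >= 0.7]
prompts = [build_retention_prompt(c) for c in at_risk]
```

The LLM never computes the score or the attribution; it only turns reproducible numbers into contextually appropriate language.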
What’s one skill you invested in early that now gives you an advantage as AI tools get more capable?
Systems thinking.
To me, systems thinking is basically asking how to decompose systems into components. How do different components talk to each other? What are the handoff points? Where are the feedback loops? If I touch this, what else changes?
I picked this up at university. I majored in aerospace engineering with a focus on aero-engine design. The thing about jet engines is that everything affects everything, and studying it really helped me develop three habits: decompose the system, define clean interfaces, and always look out for coupling effects.
It is true that AI tools are getting more capable, e.g., we get better coding assistants, more effective RAG pipelines, and LLMs that can handle longer context, but most of the advancements happen in narrow slices. Instead of always chasing the hottest tool and trying to incorporate it somehow into my existing work, systems thinking helps me put the big picture front and center. For an LLM application, I would always start by sketching the components, determining the interactions and inputs/outputs between the components, making sure checks and guardrails are added, and then swapping components as tools improve.
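One way to sketch the “clean interfaces, swappable components” idea in code is with structural typing. This is an illustrative toy (the component names and the keyword retriever are made up for the example), but it shows how a pipeline can stay fixed while individual parts are replaced as tools improve:

```python
from typing import Protocol

class Retriever(Protocol):
    def search(self, query: str) -> list[str]: ...

class Synthesizer(Protocol):
    def summarize(self, query: str, docs: list[str]) -> str: ...

class KeywordRetriever:
    """Trivial stand-in; could be swapped for a vector store without touching the pipeline."""
    def __init__(self, corpus: list[str]):
        self.corpus = corpus
    def search(self, query: str) -> list[str]:
        return [d for d in self.corpus if query.lower() in d.lower()]

class ExtractiveSynthesizer:
    """Trivial stand-in; could be swapped for an LLM call behind the same interface."""
    def summarize(self, query: str, docs: list[str]) -> str:
        return " ".join(docs)

def answer(query: str, retriever: Retriever, synthesizer: Synthesizer) -> str:
    docs = retriever.search(query)
    # Guardrail at the handoff point: refuse rather than hallucinate.
    if not docs:
        return "No grounded answer found."
    return synthesizer.summarize(query, docs)

corpus = ["LangGraph models agent workflows as graphs with cycles."]
pipeline_answer = answer("langgraph", KeywordRetriever(corpus), ExtractiveSynthesizer())
missing = answer("kubernetes", KeywordRetriever(corpus), ExtractiveSynthesizer())
```

The handoff points (the `Protocol` signatures) are where coupling effects hide, so keeping them explicit makes the “if I touch this, what else changes?” question answerable.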
In fact, building LLM applications reminds me a lot of designing jet engines: new technology comes and goes, but a solid system design compounds value.
If you zoom out, what part of data science or AI is changing too fast right now, and what part isn’t changing fast enough?
I think multi-agent AI systems are definitely one of the hottest fields, and they are moving very fast. We see fancy demos (be it a coding assistant or a research assistant) every now and then. New open-source frameworks that enable developers to efficiently build their own multi-agent applications also pop up constantly. All of this is exciting. But here is the thing: are we pushing out these complicated systems way faster than we understand how they’ll actually behave in practice?
That’s where I see the gap: the whole “assurance” layer around those multi-agent systems isn’t evolving fast enough. To address this challenge, we can (and probably should) treat those multi-agent systems just like any other industrial system. In the manufacturing industry, it is a common practice to adopt data-driven approaches to assist system design, control, condition monitoring, and fault analysis. This same approach could benefit multi-agent systems as well. For instance, how about we use Bayesian optimization to design the multi-agent architecture? How about using ML-based anomaly detection to monitor the agents’ performance and catch security threats?
The good news is there’s momentum building. We’re seeing observability platforms for LLMs, evaluation frameworks, etc., and they’re laying the groundwork for applying those industrial-grade, data-driven methods. I see a lot of opportunities in this space and that’s what gets me excited: the chance to bring the rigor of industrial systems to agentic AI and make those tools reliable and trustworthy.
To learn more about Shuai’s work and stay up-to-date with his latest articles, you can follow him on TDS or LinkedIn.