Human Won’t Replace Python

“Very few people know Python. Everybody knows ‘Human’.”

– Jensen Huang, CEO of NVIDIA

Note: This article was co-authored with Eitan Wagner.

When AI Breaks the Language Barrier

This you know: the internet is full of statements like “coding is dead”, “AI is the new software engineer”, “software development will be obsolete by 2030”. Behind these predictions stands a captivating argument: that we are undergoing another iteration in the evolution of programming. Low-level languages, such as Assembly, gave way to higher-level languages, like C and Python. Since then, Python programmers have comfortably ignored the Assembly level. Similarly — so the argument goes — natural language can now replace the classic programming language and become the tool for building software. Furthermore, once the shift to natural language is complete, we will produce amazing production-level software products while blissfully unaware of the underlying layers of “classic” code.

At first glance, this argument has merit, especially given the trajectory of the historical precedents it is inspired by. Programming languages have been moving up the degree of abstraction and expressibility for decades now. It only makes sense to follow this trend to its natural culmination, to reach the top of the abstraction hierarchy: human language. Additionally, language is just a vessel for ideas, isn’t it? So, as long as the ideas can be expressed, the specific language we use seems like a pesky detail that does not matter much.

Placing the ideas in the center and language as little more than a technical tool to express them lends itself to restating the previous argument, thus: People have always had amazing and creative ideas for new products, but till recently, they could communicate them to a computer only by stating them in the computer’s own language. Programmers, in this view, were polyglots who knew languages that the average person did not, and this was their superpower. Only they knew how to coax the computers into doing their bidding, akin to wizards who know the secret intricate phrasing that can bring the untamed elements under control. Today, however, computers have advanced and can comprehend our human language, and thus a new era has begun in which everyone can build software without having to learn a special language. Furthermore, this will make programming languages redundant for (almost) everyone, not just these newcomers. Python and Java will follow the way of Assembly and Machine Code, as they will have little to no practical advantage over natural language.

These were the exact sentiments expressed by Jensen Huang, CEO of NVIDIA, in London Tech back in June 2025:

“AI is the great equalizer. Let me explain why. For the last 50 years, 60 years, Computer Science became a field of science and it was available to tens of millions of people out of billions of people. This technology was hard to use. We had to learn programming languages, we had to architect it, we had to design these computers that are very complicated. Tens of millions of people were able to benefit from this particular field but now, all of a sudden, there’s a new programming language. This new programming language is called “human”. Anybody — most people don’t know C++, very few people know Python. Everybody knows “human”. The way you program a computer today [is] to ask the computer to do something for you, even write a program, generate images, write a poem. Just ask it nicely.”

Elegant and convincing as they may sound, theories and predictions must be analyzed with care. Does this claim or prediction hold water in practice? The evidence thus far is inconclusive. More and more code is written via AI Agents, more and more non-programmers are using Vibe Coding platforms (such as Base44) to create, and some companies are freezing plans for hiring engineers — but classic programming is still alive and kicking. Back in March 2025, Dario Amodei, the CEO of Anthropic, stated that:

“we are not far from a world — I think we will be there in 3–6 months — where AI is writing 90% of the code, and then in 12 months we might be in a world where AI is writing essentially all of the code”.

Yet seven months in (we write this in Oct 2025), it seems that human programmers are still some of the highest earners. There are indications that AI programming might not be as useful as some had hoped it would be. In a recent, much-discussed research paper by METR, it was found that AI slowed experienced programmers down rather than speed them up, and accept less than 50% of AI code generations. There are even sites dedicated to collecting AI horror stories, indicating how these agents are unreliable. When it comes to going all-in on coding in “Human”, a new type of job title is emerging that hints at trouble brewing there: the “Vibe Code Cleanup Specialist”. These are but a few of the indications that the road to ditching classic coding — if indeed we are on that road — is not a smooth one, at the very least.

AI Horror Stories: The monsters are indeed hiding in the closet.

How to make sense of these seemingly contradicting patterns, the clear power of AI agents versus their mixed success in the field? Being in the midst of a revolution is always a confusing time, since it’s hard to know what are passing trends and experiments that are doomed to fail, and what are temporary setbacks and teaching experiences that set the stage for the big shift once we work out the kinks.

What is needed at this time is a strong conceptual framework within which to analyze where we are and where we are going. In what follows, we attempt to present such a framework and use it to argue that programmers and programming languages are here to stay, and natural language is not the next step in the coding hierarchy.

The Critical Difference

Let us get right to the crux of the matter: The reason programming languages are here to stay is that they (unlike natural languages) are formal, and thus programs written in them constitute a sequence of fully-specified instructions.

When executing the command x = 1+2, x will always receive the value 3 after execution. The same applies to any command in any piece of software — there is no ambiguity about the intended behavior of the command. It is this property that allows us to trust software completely, to know that code that works today will work tomorrow, that code on one machine will behave the same on another machine, etc.

To be sure, the behavior of the computer is only fully specified at the level that the commands address. The command x = 1+2 specifies precisely what the value stored in “x” will be, but does not specify where in the physical memory this information is stored. Thus, such a command is fully specified at the level of interest to the programmer as stated in their command (summing 1 and 2 and storing the result in a location pointed to by the variable x), but under-specified with respect to other implementation details, which are delegated to the lower levels of programming and might behave differently under different system conditions (e.g., available memory addresses).

All this holds for programming languages, which are formal. An instruction in natural language, on the other hand, is inherently underspecified, even at the level of interest the instruction relates to. For example, if a woman asks her husband to “go get some milk from the supermarket”, the husband will naturally assume (most times) that the verb “get” means “purchase”, rather than “steal”. The point here is that the command (“get milk”) does not fully specify how the action should be executed, leaving it to the husband to fill in the gaps when performing the task.

This is a well-known and commonplace feature of language and human communication. Jokes are intended to be understood as such implicitly, and indeed, many times over-explaining and fully specifying the intent will ruin the humorous effect. The under-specificity of human statements is sometimes utilized masterfully, with different layers of meaning intended to different simultaneous listeners (as any parent who shares information with their spouse while the children are listening knows fully well). This also leads to frequent misunderstandings in our conversations, even when talking to people who share our cultural or occupational contexts. Certainly, in business and professional environments, misunderstandings are routine, e.g., every product manager knows how difficult it is to communicate specifications for a computer project unambiguously, because what one assumes to be obvious is not always what the other considers so – as this classic video demonstrated in a cute fashion:

In the world of AI-assisted programming, this issue is also well-known. Tell an AI you want the unit tests of your code to pass; they are just as likely to fix your code as they are to modify the tests. In another recent example, when OpenAI’s o1 model played chess against a chess engine (Stockfish), it decided to hack Stockfish and rewrite its code in order to win. Cases like this one are often hailed as examples of “Intelligence”, but on a more technical level, these behaviors are examples of underspecified natural language instructions. “The task is to ‘win against a powerful chess engine’ — not necessarily to win fairly in a chess game”, o1 wrote in its “private” scratchpad. It thus adopted one possible “fleshing out” of the underspecified guidelines. Whether this behavior was intended by the programmer is anyone’s guess (as indeed, one might argue that when one cheats, one does not “win”, highlighting again the underspecified nature of the instruction).¹

There is also the flip side to this feature of natural language in the context of LLMs. Given a target piece of code or a specific image, and a powerful LLM at your disposal, does a prompt that generates precisely that code or image exist? Also, a followup: assuming such a prompt exists, do we know how to reverse-engineer and find that prompt? The answer to both of these is likely to be negative, under reasonable assumptions². Natural language, therefore, seems unsuitable for the precise formulation of goals and tasks that programming languages were designed to perform.

Herein lies the rub: computers have been able to integrate into human society because they are predictable — programmers can state with confidence what the computer is told to do (or, if there is a mistake in a piece of code, the programmers can review, find, and fix those instructions to achieve such confidence). In the move from formal to non-formal languages as a mode of programming, we forever lose the certainty that instructions were defined tightly enough for the computer to act out our intent. We similarly lose control over the eventual alignment of the machine with our intent, as there is no guarantee that there is a command (e.g., prompt) that can capture our intent in a manner that the LLM will follow it.

Most importantly, this is an inherent property of the communication medium — natural language vs. formal language, which constitutes the input into the system. As a result, this limitation cannot be dealt with by any degree of improvement to the AI system itself, whether during training or inference. To be sure, providing more context and data can help narrow the range of uncertainty, but not transport us to a world of equal certainty and control to those of formal languages. Even in future GPT-17 or Claude-19.5 models, the input via natural language will be as underspecified as it is today.

Coding as Translation

“The hard thing about building software is deciding what to say, not saying it”

– Dr. Fredrick Brooks, “No Silver Bullet”

Having drawn a clear distinction between the two types of languages, we can now shed new light on what happens when we move from one to the other, and, most importantly, what happens when we offload this step from people (programmers) to computers (AI Agents).

Let us begin by considering programmers as translators: a special class of translators that translate from one type of language (human, natural, underspecified) to another (formal, fully specified). What can we learn from this analogy, from the challenges of translation in general, to our specific use case?

Translation is never as straightforward as it seems to the outsider. Different languages have different structures and conventions, making it difficult to achieve a perfect translation. Moving from English to French, for example, means moving from a genderless language to a gendered one, and in some cases, such a move will drastically modify the way a paragraph is read and received. Or consider translating the lyrics of a song, and the challenges this entails: maintaining rhythm, meaning, wordplay, cultural references, etc. All of these are non-trivial when crossing the inter-language barrier.

When facing these challenges, the translator is therefore not just converting the same meaning from one representation to another. Instead, there are choices they make, a hierarchy of importance constructed (consciously or subconsciously) between the different dimensions of meaning they are trying to preserve when moving to the new language. One translator might be doing so for a singer, and therefore emphasize matching the original tune, even at the cost of restructuring whole verses. Another might be doing so to assist non-native speakers in understanding the original lyrics, and therefore prioritize precise verbal translation even if the result does not remotely rhyme.

What is true for human languages is doubly so when translating instructions from English into a formal language, such as a computer program. The first issue facing the programmer is that of any translator: the move into the new language might not go smoothly. The new (programming) language might constrain the programmer in ways that the source language did not³. Similarly, phrases that can be stated simply in natural language might require complete restructuring in the target programming language, and vice versa. Programming languages have conventions and styles like any other language,⁴ and those unfamiliar with them will generate unreadable code riddled with catastrophic errors.

Furthermore, the move from an underspecified to a fully specified language forces the programmer/translator to reach greater clarity in their understanding of the task at hand. The process of spelling out in detail (=full specification) how various cases should be handled is not just one of writing out what is already known; it’s a process of discovering and uncovering all the hidden assumptions and ramifications that the under-specificity of the source language allowed to be masked.

It is important to be clear about this point: many times, what is uncovered by the process of coding is not “what was originally meant in the human spec”, but rather “what was left unstated in the human spec because it was never fully thought through”.⁵ To write in code, one must specify things that were not previously specified, and thus, this is a creative process, and one of discovery of the actual needs and direction that the coding should take.⁶

It is from this fact that we can finally realize the significant changes that take place whenever we offload this translation step from humans to machines:

First, when we let an AI write our code, we remove ourselves from the process of discovery and become oblivious to crucial aspects of the concrete product we have built. Decisions about tradeoffs of cost vs. speed vs. stability will take place without us knowing of them or even being aware that such a tradeoff was needed. Decisions about which parts of the code should be modular and which can be more rigid will also happen, once again, without us even knowing that a junction was reached and a turn taken. Importantly, much of this happens because by abdicating our roles as coders, we lack awareness of the finer details of the code, and thus can give a command that we think is fully specified, but is anything but that.
Second, unlike the human programmer, the AI bridges the gap between an underspecified instruction and a fully specified code by (educated) guessing. It generates the code randomly so that it aligns with what it saw during training for similar instructions. The key here is “randomly” — anything that is covered by the underspecified phrasing might emerge during that process if it has any support in the training data. While such code might be well-written in some basic technical sense, it will naturally come with side effects, some of which will be harmless, but others can have problematic unforeseen implications.

This statistically-guided code generation is fundamentally different from the discovery-guided process the human programmer undergoes. The decisions of the human programmer are intentional — they are acting while aware, in some sense, of how each line of code will impact the broader system they work within. This includes other parts of the product; non-coding stakeholders (manager, co-workers, investors, customers); and their own needs and desires (work-life balance, reputation, etc.). AI-coding agents lack all this context or set of goals, and thus cannot be on the lookout for stumbling blocks of these sorts that the vibe-coder might retroactively wish they knew about.

The bottom line is that there will always be a tradeoff: the more we leave unspecified in our prompts to an AI, the less production-ready our code will be. The more we allow the AI to make decisions for us that we didn’t know needed to be made (since we didn’t go through the process of discovery mentioned above), the more we will need to revisit those decisions before releasing a product and vouching for its reliability to the public – our designated users and paying customers.

Autonomy, Responsibility, Vibe Coding

We mentioned here the future users of what we code, and indeed they are a commonly-unaddressed component of the system and its dynamics. They inadvertently determine where we can use vibe-coding (= coding purely in “human”) in our development pipeline. AI agents that can act autonomously on our behalf are extremely powerful, and putting that power to use in the right manner can lead to fantastic results. But while both people and machines can be autonomous, only people can take responsibility for the code they create. As we will argue below, this dictates where vibe-coding in particular is a viable approach to coding.

What is “autonomy” (or “agency”)? Using the distinction we explored between the two types of languages, we believe it is possible to demystify the term and make it useful in a technical sense.⁷ Autonomy, in our view, is the ability of a device or computational entity to achieve a goal within a given space, where the instructions for acting (= the goal and “allowable” actions) were underspecified. Given a system with technical constraints (i.e., its physical and computational limitations), and assuming that the system is trained to follow user instructions, then the more the user leaves unsaid in their instructions, the more autonomy the system has. A chatbot instructed to “do good” has more autonomy than a chatbot told “do good by setting up an orphanage in NYC”, and even more than one told to “do good by setting up an orphanage in NYC following all legal codes, whether local, state, or federal”.⁸

Being autonomous in this sense says nothing, however, about what determined the actions of the AI. An AI is indeed autonomous in that it is able to take an under-specified command and move to a fully-specified one, but the manner in which it makes this transition is fully determined by its program, model, prompt, and random seed. It is the user, in this view, who made the choice to issue an under-specified set of instructions and hope the AI does not fill in the holes by an unexpected interpretation of what they instructed.⁹

In light of all the above, responsibility for anything an AI does falls squarely on the user’s shoulders. They must explain what assurances they had that the AI would not go, say, on a murderous rampage on their way to make a cup of coffee. The answer might be in the service agreement with the company that developed the AI, but of course, this simply shifts the demand of accountability to some other human entity, and never to the AI. The chain of responsibility linking back to a human or a group of humans never breaks.

The degree of responsibility and ramifications of failure is the key component determining where vibe-coding is adopted. PoCs, side projects, and exploratory code are all cases where it flourishes, since the user does not care about many aspects of the product being built. They want to get something basic working with some core logic, without having to think about many of the details (e.g, a backend developer wants a UI to call his API, not caring about color schemes, package support, web vs. mobile support, etc.). In these cases, anything reasonable goes, and responsibility is not important since nobody will be relying heavily on the system produced.

Furthermore, the fact that prototypes for ideas can be generated with such ease using vibe-coding can be a huge productivity amplifier. The reason for this stems directly from our analysis above: seeing a fully-specified instantiation of a statement in natural language can help developers, product managers, and customers clarify to themselves what it is they really want – it’s part of the exploration and discovery flow. Back in 1986, this was stated eloquently by Dr. Fredrick Brooks, in his paper “No Silver Bullet”:

“The hardest single part of building a software system is deciding precisely what to build… I would go a step further and assert that it is really impossible for a client, even working with a software engineer, to specify completely, precisely, and correctly the exact requirements of a modern software product before having built and tried some versions of the product he is specifying.

Therefore one of the most promising of the current technological efforts, and one which attacks the essence, not the accidents, of the software problem, is the development of approaches and tools for rapid prototyping of systems as part of the iterative specification of requirements.”

It is precisely here that coders are more than happy to leverage the autonomous nature of the AI coding agent, and in which we marvel at all the decisions we did not have to make to get something up and running. But to extrapolate from these cases out into the world of production-grade code with millions of dollars at stake would be a category error.

Conclusions

In this article we tried to lay out the claim that classical coding is not going to disappear any time soon, and will not be replaced by “human”. We argued that if we are to retain many of the properties we have today from software products, formal language must be the language we use to speak to computers. In our closing remarks, we would like to share some parting thoughts about the future of programming.

Earlier, we likened programmers to translators, but perhaps a better analogy is this: “Code is the bureaucracy of the world of procedural ideas, and programmers are the legislators that write it”. Implementing a socio-political idea in the real world requires breaking the idea down into bureaucratic processes that define responsibilities, resources, and measurements of the implementation. Programming, in a similar fashion, translates abstract technological and business proposals into concrete, measurable processes that can be realistically executed in an actual mechanical system with limited resources¹⁰. Without making this transition, a computer cannot act, just as a policy cannot take place in the real world just by a politician stating it — it has to be codified into law.

When a person learns to program, they indeed learn a new language (just as a new bank teller is required to learn the internal terminology of the banking industry), but the skills they develop on the job and which transfer from one programming language to the next, are rather different than language skills. These include (though are not limited to):

How to break down large problems into smaller, modular, and solvable sub-problems;
How to define software processes that can be executed, tracked, and debugged;
How to encapsulate different parts of a system, so that their inputs and outputs are fully-specified and can serve as interfaces/contracts vis-a-vis other components;

and so on¹¹. These are the deeper skills programmers bring to their job.

What has changed for coding to Production with AI coding agents? In our mind, the main change is that now, the more you know about coding, the more you can freely instruct the AI what to do. With more experience, more code becomes “boilerplate” for you, since you know what you want, can instruct more clearly what that is, and detect implementation dangers more easily. Experienced programmers know how to direct the AI to the structures and packages to use, how to provide concrete examples and code snippets that precisely and formally capture their intent, and how to recognize whether the produced code is adequate.

So these deeper skills are the ones that pay dividends. How does one build them? Like they always did: by actually sitting and coding yourself. Building projects, failing, debugging, and then building them again. Start without AI, use it when you must, dissect your mistakes, and repeat. Only this way can you become a person who knows how to break down problems realistically and effectively, into logical components that actually map to things that exist and are feasible, and that will also correspond with customer needs.

The coming months and years will definitely see major changes in what programmers spend their time on, as will the kind expertise they will need in order to be effective in their jobs. It’s hard to predict how the future of programming will look. New AI-first programming languages might emerge, and programmers might need to learn their profession from scratch. Still, we believe that the programmers who thrive in the coming years will be those who understand that their core skill isn’t writing perfect syntax — it’s translating ambiguous human needs into formal, executable specifications. In this new era, that skill becomes more valuable than ever, not less. The bureaucracy of procedural ideas still needs thoughtful legislators.

Footnotes

This is also a repeating theme in Asimov’s “I, Robot”, where the concept of “harming a human” (as in the Second Rule of Robotics) is underspecified, causing robots to behave in surprising ways. For example, refusing to provide information to a human lest the information hurts their feelings.
These behaviors are only surprising because human language allows us to state things in such an underspecified manner in the first place. In Asimov’s stories, human society was able to embed uncertainty in robot behavior when formulating the Three Laws of Robotics, without being aware of the extent of that uncertainty. Such uncertainty is not possible in programming languages. See more on this later on. ↩︎
Such as, we don’t consider prompts that spell out, letter-by-letter or pixel-by-pixel, how the program or image should be constructed.
To be semi-formal, consider the possible strings of length N. For 32 characters in our vocabulary (letters with space and some punctuation), there are 32^N=2^5N possible strings. Let us compare this to small 250×250-pixel images with RGB values from 0 to 255. There are 256^3*250*250=2^1500000 possible images. For the strings to cover all possible images, N must be ~300K characters. A small image thus corresponds to a string of length comparable to a novel. This calculation is of course an upper bound, and the space of images people might want to produce is smaller. The calculation here simply demonstrates how much more of the space there is than natural language can truly capture. ↩︎
Modern programming languages are constantly evolving precisely for this purpose. New needs arise and thus new functionalities need to be made available for programmers, so that they may program with ease and find the right “word” (command) to match the meaning in their mind. ↩︎
The idea of recursion, for example, is a common one in programming but very hard to grasp for non-programmers. Even loops tend to be confusing to the uninitiated. This will lead to very different specifications at the natural-language input level, depending on whether you know of this convention (=experienced programmer) or not (layman). ↩︎
For this reason it is not resolved via the classic approach of “more context”, since the additional context is not available to the user at the outset – they have not discovered it yet. ↩︎
A simple example: in Linux systems, rm is the command to delete a file or folder. A human that prompts the code to “delete folder X” does not specify what should be done with the contents of the folder and under what conditions should deletion be prevented, but every version of the rm command does fully specify the behavior regarding those contents. Only through writing out the code are we forced to resolve what human language allowed us to leave unspecified. ↩︎
There seems to be much confusion when people use this term in the context of AI, much of it stemming from our natural anthropomorphizing tendencies. People discuss AI autonomy by stating things like “the drone will be able to decide on its own to do so and so”, attributing decision-making power to drones, robots, and other AI entities surrounding us. Such statements are nice shorthand, but sweep under the rug the fact that drones simply follow their code to the letter, with zero initiative or free will. Furthermore, since with humans’ autonomy leads to responsibility for one’s actions, we are already beginning to hear some attributing “blame” to AI agents when they do harm. How long before some suggest AIs should be tried in front of a court of their (AI) peers and punished? ↩︎
What we describe here can be called “weak autonomy”, in which the user provides an under-specified goal to the AI system, and the AI system, trained to act upon instruction, selects one of the (effectively infinite) fully specified plans of action using some statistical model. Weak autonomy thus applies to systems we have today, which have goals dictated or inserted into them from external sources (in our case, humans). “Strong autonomy”, which is the ability of a system to have its own goals, possibly as a package deal of gaining consciousness, is beyond the scope of our discussion here. ↩︎
This is true even in very simple scenarios. If a Python programmer wrote x=3+5 and a debugger revealed that after this operation x==1, we know where to lay the blame: something in the underlying system is buggy, and the programmer is off the hook. Now consider a parallel universe where this simple operation was replaced by a prompt stating “set x to the sum of the following two numbers: three and five” – would we know where the problem is? Could we as easily exempt the programmer in this scenario? No, for it is possible – however unlikely – that the AI took this command in a different direction than intended – e.g., perhaps it did x = 3+5 (mod 7). And then there are cases where x will end up being x="eight"… The fact that this is even a remote possibility highlights how we lose control and certainty with each shift to natural language, and the programmer will bear the responsibility for building a product with looser components. ↩︎
This point also has links to the Church-Turing Thesis, which is outside the scope of our discussion here. ↩︎
Another place where we can see a hint to this distinction is in the common usage of pseudo-code when discussing a new algorithm or flow. Pseudo-code is what you get when you “undress” the code from its language-specific syntax and remain with the idea in its formalized form. ↩︎