For several months now, GPT-5 has symbolized a new ambition for generative artificial intelligence: redesigned architecture, native multimodality, and enhanced reasoning capabilities.

OpenAI presents it as a technological leap, a claim supported by certain benchmarks in which GPT-5 outperforms GPT-4o on coding, mathematics, and image-analysis tasks. Yet despite these advances, the model remains unfinished: powerful but unstable, more refined yet still fallible. GPT-5 marks a step forward, not a breakthrough.
It is a more sophisticated model, but not a revolutionary one.
The main change is in its architecture: an internal router selects the most appropriate sub-model in real time, depending on the request.
This logic improves the accuracy of its reasoning and its ability to follow complex instructions. Multimodality (text, image, audio) is also becoming more fluid. However, the gap with GPT-4 remains less spectacular than expected. In symbolic logic and general reasoning tests, some competitors still outperform it, and the GAIA benchmark, designed to evaluate performance on complex tasks, ranks GPT-5 Medium behind other models. Above all, beyond the scores, the limitations appear in concrete use: in operational output such as infographics, presentations, or structured code, GPT-5 often lags behind certain competitors, notably Gemini. Generative AI is now progressing through successive adjustments rather than genuine qualitative leaps.
Power that also amplifies risks.
Because it can generate text, images, audio, or code, GPT-5 inevitably increases the risks associated with misinformation. Deepfakes are becoming more realistic, while detection methods are becoming less reliable. This has led the European Union to impose, via the AI Act, a transparency requirement on synthetic content, and France to strengthen the fight against information manipulation with the SREN law¹. But these measures alone will not be enough. Education on how to use AI, source verification, and a culture of reasonable doubt remain the first lines of defense. More powerful AI requires more vigilant users.
Technical limitations remind us that what AI produces is not automatically true.
Despite the announced improvements, GPT-5 continues to produce factual errors, inconsistent deductions, and approximations. Its performance is superior, but its reliability is not guaranteed. Independent analyses point to persistent difficulties: fragile reasoning without step-by-step instructions, simple mathematical errors, and poorly executed complex code projects. Even its hallucination rate, although declining according to Vectara², remains incompatible with total delegation to the machine. These weaknesses fuel a broader debate: has generative AI reached a plateau? Several researchers believe that current architectures are showing their limitations and that a real technological breakthrough will be necessary to reach a new milestone.
A model that is still very Anglo-centric.
Like all major American and Chinese models, GPT-5 has been trained primarily on English-language data. The result is plain to see: cultural biases, misunderstood French nuances, and a fluctuating tone. Since its launch, GPT-5 has been criticized for having a colder style than GPT-4, a direct consequence of internal optimizations. For less represented languages, its output still feels mechanical. Nuance, subtext, local culture: all of this remains difficult to reproduce. European models are progressing, but American and Chinese dominance in data and computing power remains overwhelming.
GPT-5 confirms that generative AI has become an indispensable tool, while remaining far from truly autonomous. The more mixed reception of GPT-5.2 illustrates the current tensions in the sector: in a context of heightened competition, particularly in the face of rapid advances by Gemini, some observers describe a release perceived as rushed, with uneven performance depending on the use case. This climate fuels expectations that are sometimes disappointed and reminds us that the race to make announcements does not, on its own, guarantee a lasting qualitative leap.
Its limitations show that no model can replace human analysis, discernment, or responsibility. Companies would therefore be wise to integrate these technologies with a clear head: AI serves to augment teams, not replace them. The future of the sector will not be determined solely by the size or speed of model deployment, but by the ability to combine innovation, transparency, and controlled use. GPT-5 marks a new stage, and human vigilance remains the determining factor.
¹ The SREN law (Securing and Regulating the Digital Space), enacted in May 2024, strengthens the framework for large platforms in terms of protecting the public and combating the manipulation of information, particularly content generated or altered by AI. The text expands ARCOM’s powers to identify, report, and sanction deepfakes, and imposes new transparency obligations on digital players. Sources: Légifrance (Law No. 2024-449 of May 21, 2024), National Assembly.
² Vectara is a platform specializing in assessing the factual reliability of large language models. It has developed a hallucination evaluation model (Hughes Hallucination Evaluation Model, HHEM) and a Hallucination Leaderboard, which are used to measure and compare the rate of “hallucinations,” i.e., factually incorrect or invented information, of different LLMs when they summarize or generate content.
Written by Mathieu Changeat, Co-founder of Dydu, published on économie matin.