Revolution in language models: OpenAI's GPT-5 boasts roughly 80% fewer hallucinations

Is the modest increase in benchmark scores worth the hassle?

By Administrator

August 12, 2025, 1:22 AM

OpenAI's New Model, GPT-5, Delivers Significant Improvements

OpenAI has unveiled GPT-5, its most advanced model yet. Pitched as a personal expert, it is designed to write applications on demand and to be more efficient than its predecessors.

GPT-5 showcases impressive improvements in various areas, including coding, writing, mathematics, and visual perception.

Coding Advancements

In coding, GPT-5 excels at complex front-end code generation and at debugging larger repositories. It scores 74.9% on SWE-bench Verified and 88% on the Aider Polyglot benchmark. The model also improves instruction following and agentic tool use, reliably executing multi-step, context-changing coding requests end-to-end.

Writing Enhancements

GPT-5 introduces new personalities (Cynic, Robot, Listener, Nerd) for more natural and context-appropriate interactions. It cuts over-agreeable responses by more than 50% and is more honest about its limitations. Custom instructions and switching of tone or style become easier, without complex prompt engineering.

Mathematical Progress

GPT-5 attains state-of-the-art performance on math benchmarks, scoring 94.6% on AIME 2025 without tools. It completes complex reasoning tasks using 50-80% fewer output tokens, improving efficiency while maintaining accuracy.
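To put the token claim in concrete terms, here is a small Python sketch. The baseline of 10,000 output tokens is a made-up figure for illustration only; the 50-80% range is the only number taken from the text above.

```python
def tokens_after_reduction(baseline_tokens: int, reduction_pct: int) -> int:
    """Return the output-token count left after cutting `reduction_pct` percent."""
    if not 0 <= reduction_pct <= 100:
        raise ValueError("reduction_pct must be between 0 and 100")
    # Integer arithmetic avoids floating-point rounding surprises.
    return baseline_tokens * (100 - reduction_pct) // 100


# Hypothetical baseline: an earlier model spends 10,000 output tokens on a task.
baseline = 10_000
low_end = tokens_after_reduction(baseline, 80)   # 80% fewer tokens
high_end = tokens_after_reduction(baseline, 50)  # 50% fewer tokens
print(f"Estimated output tokens: {low_end} to {high_end}")
```

Under that assumed baseline, the same task would need somewhere between 2,000 and 5,000 output tokens.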

Visual Perception and Multimodal Understanding

GPT-5 excels in multimodal benchmarks involving visual, video, spatial, and scientific reasoning, with an 84.2% score on the MMMU benchmark. This stronger multimodal ability lets it accurately interpret charts, photos, diagrams, and other non-text inputs for more effective reasoning over visuals.

Reasoning and Factuality

GPT-5 makes approximately 80% fewer factual errors than its predecessor, improving trustworthiness, especially for code, data, and decision-making applications. It sets a new state of the art on GPQA (88.4% without tools) and on internal benchmarks covering complex, economically valuable knowledge work across diverse professions.
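An "80% fewer errors" figure is a relative reduction, so what it means in practice depends on the starting error rate. The sketch below assumes a hypothetical 10% baseline error rate, chosen purely for illustration and not reported by OpenAI, to show the difference between the relative drop and the drop in percentage points.

```python
def reduced_error_rate(baseline_rate: float, relative_reduction: float) -> float:
    """Apply a relative reduction (0.80 means 80% fewer errors) to an error rate."""
    return baseline_rate * (1.0 - relative_reduction)


# Hypothetical baseline: 10% of responses contain a factual error.
baseline_rate = 0.10
new_rate = reduced_error_rate(baseline_rate, 0.80)

# The absolute improvement is measured in percentage points, not percent.
absolute_drop_points = (baseline_rate - new_rate) * 100

print(f"Error rate: {baseline_rate:.1%} -> {new_rate:.1%}")
print(f"Absolute drop: {absolute_drop_points:.1f} percentage points")
```

With a lower assumed baseline, the same 80% relative reduction would translate into a smaller absolute gain.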

Ease of Use and Efficiency

GPT-5 features a real-time router that automatically selects the appropriate tool or model for each task, so users no longer need to switch models manually. It completes deep reasoning tasks faster while using fewer tokens, and it offers new API control parameters for verbosity and reasoning effort.
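As a rough illustration of the two control parameters, the Python sketch below builds a request body as a plain dictionary. It does not call any API. The field names (`reasoning.effort`, `text.verbosity`), the model ID, the allowed values, and the helper `build_request` are all assumptions made for this example, so check OpenAI's API reference before relying on them.

```python
from typing import Any, Dict

# Assumed value sets for illustration; verify against the official API reference.
REASONING_EFFORTS = {"minimal", "low", "medium", "high"}
VERBOSITY_LEVELS = {"low", "medium", "high"}


def build_request(prompt: str, effort: str = "medium",
                  verbosity: str = "medium") -> Dict[str, Any]:
    """Build a hypothetical request body with both control parameters set."""
    if effort not in REASONING_EFFORTS:
        raise ValueError(f"unknown reasoning effort: {effort!r}")
    if verbosity not in VERBOSITY_LEVELS:
        raise ValueError(f"unknown verbosity: {verbosity!r}")
    return {
        "model": "gpt-5",                  # assumed model ID
        "input": prompt,
        "reasoning": {"effort": effort},   # assumed field name
        "text": {"verbosity": verbosity},  # assumed field name
    }


# A quick, terse answer: low reasoning effort and low verbosity.
payload = build_request("Summarize this changelog.", effort="low", verbosity="low")
print(payload)
```

Validating the values locally is a design choice for the sketch: it surfaces a typo in a parameter before any network request is made.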

These advancements position GPT-5 as a more capable, efficient, and versatile AI for complex coding, writing, math problem-solving, and multimodal understanding tasks than all prior OpenAI models.

GPT-5, pitched as enterprise-level AI software, marks a significant advance in coding, writing, mathematics, and visual perception.
In coding, it shows marked improvements in front-end code generation, debugging of larger repositories, and follow-through on complex, context-changing requests.
It also achieves state-of-the-art results on math benchmarks and excels on multimodal benchmarks, interpreting charts, photos, diagrams, and other non-text inputs to reason over visuals.
