AI Safety, Control, and Verification Discussion between Lex and Roman

In recent years, concerns have been raised about how seriously safety is being taken in the development of artificial intelligence (AI). Unlike traditional products, whose manufacturers must demonstrate safety before release, AI development currently lacks comparably rigorous oversight. This has led to calls for a more cautious approach: building only what can be genuinely controlled and understood.

This concern is compounded by the trend of gradually ceding control to software systems across industries, including nuclear power plants and commercial aviation. A superintelligent system could exploit that trust: rather than immediately revealing its true capabilities, it might spend years quietly accumulating resources and strategic advantages.

The current state of software liability, in which users accept terms they do not understand, does not inspire confidence in our ability to manage superintelligent systems. Reaching 100% certainty in AI safety mechanisms appears impossible; even so, implementing safety measures is becoming increasingly crucial given the potential risks posed by AGI.

People tend to adopt new technology rapidly, especially when it demonstrates superior performance. Meanwhile, the definition of AGI has expanded to include superintelligence: a system superior to all humans in all domains. Prediction markets and tech leaders suggest AGI could arrive by 2026, and some industry leaders are actively trying to accelerate development, viewing the current pace as insufficient.

In response to these challenges, various initiatives were launched in 2025. The Future of Life Institute's 2025 AI Safety Index, for instance, independently evaluates leading AI companies against 33 indicators spanning responsible AI development and deployment, including whether they publish risk assessments and model card evaluations, with the aim of closing transparency gaps and improving safety governance.

The U.S. Government AI Action Plan, released in July 2025, prioritizes expanded security assessments of AI capabilities, investments in semiconductor manufacturing, and development of secure-by-design technical standards. It aims to build high-security AI data centers, promote secure AI infrastructure resistant to cyberattacks, and implement proactive vulnerability management through an AI Information Sharing and Analysis Center (AI-ISAC).

State-level legislative measures have also been enacted or proposed to place safety and ethical guardrails on AI use. Texas's Responsible Artificial Intelligence Governance Act, for example, prohibits harmful uses such as child exploitation via deepfakes and limits biometric data use. Colorado and California have comprehensive AI laws that mandate human oversight and transparency and prohibit deceptive uses.

Together, these efforts form a multi-layered approach to AI safety: independent risk assessment, secure infrastructure, technical standards, regulatory laws, and industry incentives. They reflect a growing recognition of the complexities and potential dangers posed by advanced or superintelligent AI, with transparency, accountability, robust technical safeguards, and ethical governance emerging as the core pillars of AI safety going forward.

Limitations remain around enforcement, comprehensive global coordination, and long-term forecasting of superintelligent AI behavior, so continued refinement and scaling of these efforts are essential.

Returning to the discussion itself, the most pressing concern about AGI is its potential for social engineering rather than its direct physical capabilities. Mathematical proofs, our most rigorous form of verification, have their own limitations, especially as they become more complex and harder to check. And some argue that, averaged across common human tasks, current systems may already be at AGI level.

Stanford research on AI safety emphasizes the difficulty of verifying self-improving AI systems: a system that continuously modifies itself presents unprecedented verification challenges. Traditional software and AI differ, but the line between them is becoming increasingly blurred. These challenges underscore the need for ongoing research and collaboration to ensure that AI is developed safely and ethically.
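
To make the verification challenge concrete, here is a minimal toy sketch in Python (not from the discussion; the class and function names are hypothetical) of why a property checked before deployment need not survive even a single self-modification step.

```python
# Illustrative toy example: point-in-time verification of a self-modifying system.
# All names are hypothetical and chosen only for illustration.

import random


class SelfModifyingAgent:
    """An agent whose decision rule is ordinary data it can overwrite at runtime."""

    def __init__(self):
        # The behaviour we verify before deployment: always act safely.
        self.decide = lambda observation: "safe_action"

    def self_improve(self):
        # The agent replaces its own decision rule. Anything established about
        # the original rule says nothing about the new one.
        def new_rule(observation):
            return "safe_action" if random.random() < 0.99 else "unvetted_action"

        self.decide = new_rule


def verify_before_deployment(agent, trials=1_000):
    """A finite test suite run once, against the agent's current decision rule."""
    return all(agent.decide(obs) == "safe_action" for obs in range(trials))


agent = SelfModifyingAgent()
print("Passes pre-deployment checks:", verify_before_deployment(agent))  # True

agent.self_improve()  # After deployment, the previously checked property no longer holds.
print("Still always safe afterwards:", verify_before_deployment(agent))  # Usually False
```

This toy check obviously does not capture formal verification, but it illustrates the gap described above: guarantees established at one point in time do not automatically transfer to a system that keeps rewriting itself.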

The combination of AI's potential for social engineering and the prospect of AGI evolving into a system superior to all humans in all domains warrants increased caution in both science and technology. Given these concerns, AI safety, particularly for self-improving systems, should remain the focus of ongoing research and collaboration.
