
Artificial Intelligence Expansion: Is Bigger Necessarily Better?


In the fast-paced world of Artificial Intelligence (AI), a shift in approach is gaining traction among the scientific community. The adoption of scientific methods, such as hypothesis-driven development (HDD), is being proposed as a way to address the problems arising from AI's rapid pace of development.

HDD, a methodology rooted in centuries of scientific progress, aims to make AI projects more meaningful, streamlined, and focused on achieving specific objectives. By embedding HDD principles into AI/ML projects, organizations can temper the challenges brought by speed, ensuring each model iteration is validated against real-world criteria related to quality, impact, privacy, copyright, and investment.

One of the key benefits of HDD is its ability to enhance quality and reduce risks. Through iterative testing and validation of hypotheses based on objective criteria and real user feedback, false assumptions are uncovered early, and solutions that fail or cause harm are avoided.
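The kind of iterative validation described above can be made concrete with a simple statistical check: before accepting a new model iteration, test the hypothesis that it actually outperforms the current baseline on an objective criterion. The sketch below uses a standard two-proportion z-test with hypothetical evaluation counts; the numbers and the 0.05 significance threshold are illustrative assumptions, not figures from the article.

```python
import math

def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Two-proportion z-test: does model B's success rate exceed model A's?"""
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (successes_b / n_b - successes_a / n_a) / se
    # One-sided p-value from the standard normal CDF
    p_value = 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return z, p_value

# Hypothesis: the new iteration answers evaluation queries correctly more
# often than the current baseline (hypothetical counts for illustration).
z, p = two_proportion_z(successes_a=780, n_a=1000, successes_b=830, n_b=1000)
accept_iteration = p < 0.05  # only ship if the hypothesis survives the test
```

A rejected hypothesis at this stage is exactly the early discovery of a false assumption that HDD aims for: the iteration is discarded before it reaches users.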

Moreover, HDD prioritizes impactful features aligned with business and ethical goals, helping to avoid unintended negative impacts by carefully measuring desired versus real outcomes. It also incorporates privacy and copyright considerations as hypotheses to be tested, ensuring compliance and ethical respect for data use before large investments are made.

By following a "fail fast, fail cheap" approach enabled by hypothesis testing, HDD lowers investment losses by accelerating learning from failures and avoiding costly full-scale projects on unvalidated ideas. This approach has been demonstrated to increase the return on investment (ROI) in AI-driven pharmaceutical R&D by improving prediction success and reducing wasted efforts.
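One way to picture "fail fast, fail cheap" is as a sequence of stage gates ordered from cheapest to most expensive, where the project halts at the first hypothesis that fails. The gate names and pass/fail outcomes below are hypothetical; only the ordering principle comes from the text.

```python
def run_stage_gates(gates):
    """Run checks in order; stop at the first failure (fail fast, fail cheap)."""
    for name, check in gates:
        if not check():
            return f"stopped at: {name}"
    return "all hypotheses validated"

# Hypothetical gates, cheapest first, so failures cost as little as possible.
gates = [
    ("offline accuracy beats baseline", lambda: True),
    ("no privacy-sensitive fields in training data", lambda: False),
    ("pilot users report higher task success", lambda: True),  # never reached
]
result = run_stage_gates(gates)
```

Because the privacy check fails before the expensive pilot study runs, the investment loss is limited to the two cheap checks already performed.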

Furthermore, HDD facilitates transparent, data-driven decision-making, including integrating multimodal datasets to improve prediction accuracy while managing noise and biases. This approach improves the trustworthiness and robustness of AI/ML models.

However, privacy and copyright concerns arise because most models are poorly documented, making compliance with new regulations like the EU AI Act difficult to verify. To address this, one proposal is to encourage AI engineers and data scientists to outline an objective with hypotheses before starting any AI or ML project.

The reproducibility crisis in scientific research, driven by an overwhelming volume of papers and publication pressure, is a parallel issue. As of November 6, 2023, Hugging Face offered approximately 400,000 ML models, a significant increase from the ~84,000 models available in November 2022. Such rapid growth may leave a large share of available models at low quality due to insufficient testing and review.

The return on investment for an AI project can be compromised if the resulting product is of low quality or lacks a clear user purpose. To counteract this, a shift from an exploratory approach to a hypothesis-driven development method is proposed, encouraging AI engineers and data scientists to define clear objectives and hypotheses before starting a project.

Many new AI models are not human (or nature) centric and lack a clear use case or positive impact. Many scientific breakthroughs did not start in the lab; they started with a thought that was shaped into a hypothesis. Hugging Face and other prominent platforms could one day require engineers and scientists to preregister their objectives and hypotheses before they can start working on a model.
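A preregistration like the one suggested above could be as lightweight as a structured record filed before training begins. The field names and example values below are hypothetical, a minimal sketch of what such a record might capture rather than any platform's actual schema.

```python
from dataclasses import dataclass

@dataclass
class Preregistration:
    """Hypothetical record an engineer files before any model work starts."""
    objective: str        # what the project is trying to achieve, and for whom
    hypothesis: str       # the testable claim the project will validate
    success_metric: str   # the objective criterion used to judge the hypothesis
    threshold: float      # the effect size that counts as success
    privacy_reviewed: bool = False

    def is_complete(self):
        # A registration only counts once every field is filled in
        # and the privacy review has been acknowledged.
        return all([self.objective, self.hypothesis, self.success_metric]) \
            and self.privacy_reviewed

reg = Preregistration(
    objective="Reduce support-ticket triage time",
    hypothesis="A fine-tuned classifier routes tickets faster than manual triage",
    success_metric="median triage time",
    threshold=0.20,  # target: 20% reduction
    privacy_reviewed=True,
)
```

Forcing these fields to exist up front is the point: a project that cannot articulate an objective, a hypothesis, and a success metric is exactly the kind of exploratory effort the article argues against.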

In summary, by embracing hypothesis-driven development in AI/ML projects, organizations can safeguard against downsides common in rapid AI/ML deployments. This approach ensures each model iteration is validated against real-world criteria related to quality, impact, privacy, copyright, and investment, thus promoting innovation while minimizing risks.
