
Meta introduces Emu Edit: Accurate image adjustments guided by text commands

Multi-task training helps Emu Edit interpret edit instructions that earlier systems struggled to follow.

In a notable development, researchers from Meta's AI lab have introduced Emu Edit, an artificial intelligence system designed to advance instruction-based image editing. Built on a multi-task learning approach, the system aims to close the gap between AI models and humans in following edit instructions.

Emu Edit's innovative design integrates both precise recognition and generative tasks within a unified framework. This approach allows the system to handle diverse editing instructions more effectively than previous systems, which often required separate architectures, training methods, and parameter settings for different editing tasks. By learning multiple tasks simultaneously, Emu Edit can better understand the semantics of an image and the user's intent, resulting in higher-quality edits that maintain semantic consistency and content preservation.

At the heart of Emu Edit's functionality is a text classifier that predicts the most appropriate task embedding from the instruction. This embedding guides the model to apply the correct type of transformation, such as a "texture change" or "object removal". Ablation studies confirmed that multi-task training across vision and editing tasks improves performance on region-based edits.
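As a rough illustration of this design, the sketch below shows how a predicted task identity could select a learned embedding that then conditions the editing model. The class and attribute names (TaskPredictor, task_embeddings), the dimensions, and the number of tasks are assumptions for illustration, not Meta's actual implementation.

```python
import torch
import torch.nn as nn

# Hypothetical sketch: names and shapes are illustrative, not Meta's code.
NUM_TASKS = 16   # the training data is grouped into 16 editing/vision tasks
EMBED_DIM = 768  # assumed width of the instruction encoder output

class TaskPredictor(nn.Module):
    """Predicts which task an edit instruction describes, then returns the
    corresponding learned task embedding used to condition the editor."""

    def __init__(self, num_tasks: int = NUM_TASKS, dim: int = EMBED_DIM):
        super().__init__()
        # One learned embedding per task (e.g. "texture change", "object removal").
        self.task_embeddings = nn.Embedding(num_tasks, dim)
        # Simple classifier head over a pooled instruction representation.
        self.classifier = nn.Linear(dim, num_tasks)

    def forward(self, instruction_features: torch.Tensor) -> torch.Tensor:
        # instruction_features: (batch, dim) pooled text-encoder output.
        logits = self.classifier(instruction_features)
        task_id = logits.argmax(dim=-1)        # most likely task per instruction
        return self.task_embeddings(task_id)   # embedding that guides the edit

# Usage sketch: the returned embedding would be injected into the diffusion
# editor alongside the instruction text and the input image.
predictor = TaskPredictor()
fake_instruction = torch.randn(1, EMBED_DIM)   # stand-in for encoded text
print(predictor(fake_instruction).shape)        # torch.Size([1, 768])
```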

Emu Edit is trained on a dataset covering 16 distinct tasks grouped into three categories: region-based editing, free-form editing, and vision tasks. Region-based editing tasks include adding, removing, or substituting objects and changing textures. Free-form editing tasks involve global style changes and text editing. Vision tasks include object detection, segmentation, depth estimation, and more.
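For a more structured view, the grouping described above could be laid out along the lines of the snippet below; the individual task names are illustrative stand-ins rather than the paper's full list of 16 tasks.

```python
# Illustrative task taxonomy; the exact task names are assumptions.
TASK_CATEGORIES = {
    "region_based_editing": [
        "add_object", "remove_object", "replace_object", "texture_change",
    ],
    "free_form_editing": [
        "global_style_change", "text_editing",
    ],
    "vision_tasks": [
        "object_detection", "segmentation", "depth_estimation",
    ],
}
```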

One of the key advantages of Emu Edit is its ability to adapt to wholly new tasks like image inpainting via "task inversion" with just a few examples. Emu Edit has demonstrated state-of-the-art performance on automated metrics for faithfulness of edits and preservation of unrelated image regions.
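The research describes task inversion only at a high level, but the general idea, keeping the editor frozen and optimising just a new task embedding from a handful of examples, could look roughly like the following sketch. The function name and the editor_loss callable are hypothetical placeholders, not the authors' code.

```python
import torch

# Hypothetical sketch of "task inversion": the editor's weights stay frozen and
# only a fresh task embedding is optimised from a handful of examples.
# `editor_loss` is a placeholder for the frozen model's editing/denoising loss.
def task_inversion(editor_loss, few_shot_examples, dim=768, steps=100, lr=1e-2):
    # Trainable embedding for the unseen task (e.g. image inpainting).
    new_task_embedding = torch.randn(dim, requires_grad=True)
    optimizer = torch.optim.Adam([new_task_embedding], lr=lr)

    for _ in range(steps):
        for example in few_shot_examples:
            optimizer.zero_grad()
            # Gradients flow only into the task embedding; the editor is untouched.
            loss = editor_loss(example, new_task_embedding)
            loss.backward()
            optimizer.step()

    return new_task_embedding.detach()

# Toy usage with a dummy loss; real use would plug in the frozen editor's loss.
dummy_loss = lambda example, embedding: ((embedding - example) ** 2).mean()
examples = [torch.randn(768) for _ in range(4)]
print(task_inversion(dummy_loss, examples).shape)  # torch.Size([768])
```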

The multi-task approach of Emu Edit provides two key advantages: improved recognition abilities for accurate region-based edits and exposure to a wide range of image transformations beyond editing alone. Emu Edit's performance was showcased on the Emu Edit test set, where it outperformed several state-of-the-art baselines on both semantic edit alignment and content preservation metrics.

The authors of the research have also released a benchmark that covers seven different image editing tasks, inviting other researchers to test and compare their systems with Emu Edit. This move is expected to further advance the field of instruction-based image editing and drive innovation in AI systems capable of understanding and executing complex natural language edit instructions.

In summary, the multi-task learning approach in Emu Edit improves instruction-based image editing by unifying understanding and generation tasks into a single, coherent framework that produces higher fidelity, semantically accurate, and user-aligned edits compared to previous specialized systems.

Emu Edit combines precise recognition and generative tasks within a unified framework, which allows it to handle diverse editing instructions more effectively than previous systems. Its multi-task learning approach not only enhances its recognition abilities for accurate region-based edits but also exposes it to a wide range of image transformations, contributing to its strong performance in instruction-based image editing.
