Apple’s New Open Source AI Model for Image Editing: MGIE and Multimodal Large Language Models (MLLMs)
Apple, a company known for its innovative technology and consumer electronics, has recently entered the AI game with its new open source AI model for image editing, named MGIE (MLLM-Guided Image Editing). This groundbreaking technology, developed in collaboration with researchers from the University of California, Santa Barbara, utilizes multimodal large language models (MLLMs) to interpret text-based commands when manipulating images.
The significance of MGIE lies in its ability to transform simple or ambiguous text prompts into detailed and clear instructions that the photo editor itself can follow. For instance, if a user types in “make this image of a pepperoni pizza more healthy,” MLLMs can interpret it as “add vegetable toppings” and edit the image accordingly. This level of interpretation is crucial as human instructions can sometimes be too brief for current methods to capture and follow.
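The two-stage flow described above (expand a terse prompt, then hand the clearer instruction to the editor) can be sketched in a few lines of Python. This is a toy illustration only, not MGIE's code: a lookup table stands in for the MLLM, and the names `TOY_EXPANSIONS`, `EditPlan`, `expand_instruction`, and `plan_edit` are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the MLLM: a real model would generate the
# detailed instruction; here a lookup table fakes that step.
TOY_EXPANSIONS = {
    "make this image of a pepperoni pizza more healthy": "add vegetable toppings",
}


@dataclass(frozen=True)
class EditPlan:
    user_prompt: str  # the terse instruction the user typed
    expressive: str   # the clearer instruction handed to the editor


def expand_instruction(prompt: str) -> str:
    """Return a more explicit instruction, or the prompt unchanged if unknown."""
    return TOY_EXPANSIONS.get(prompt.strip().lower(), prompt)


def plan_edit(prompt: str) -> EditPlan:
    """Stage 1: expand the prompt. Stage 2 (the actual image edit) is omitted."""
    return EditPlan(user_prompt=prompt, expressive=expand_instruction(prompt))


if __name__ == "__main__":
    plan = plan_edit("make this image of a pepperoni pizza more healthy")
    print(plan.expressive)
```

Keeping the expanded instruction alongside the original prompt makes it easy to inspect what the editor was actually asked to do, which is the point of the interpretation step.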
MGIE’s capabilities extend beyond major image edits. It can also crop, resize, and rotate images, as well as improve brightness, contrast, and color balance through text prompts. Furthermore, it can edit specific areas of an image, such as modifying the hair, eyes, and clothes of a person in the image or removing elements in the background.
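For readers unfamiliar with the global adjustments mentioned above, here is a minimal sketch of what brightness, contrast, and rotation mean at the pixel level, using a tiny grayscale image stored as nested lists. These are conventional textbook formulas and hypothetical helper names; MGIE is a learned model, and nothing here is taken from its code.

```python
def clamp(value: float) -> int:
    """Round to the nearest integer and clamp to the 8-bit range 0..255."""
    return max(0, min(255, int(round(value))))


def adjust_brightness(pixels, offset):
    """Add a constant offset to every pixel."""
    return [[clamp(p + offset) for p in row] for row in pixels]


def adjust_contrast(pixels, factor):
    """Scale each pixel's distance from mid-gray (128) by `factor`."""
    return [[clamp(128 + (p - 128) * factor) for p in row] for row in pixels]


def rotate_90_clockwise(pixels):
    """Rotate the image a quarter turn clockwise."""
    return [list(row) for row in zip(*pixels[::-1])]


if __name__ == "__main__":
    image = [[0, 100], [200, 255]]
    print(adjust_brightness(image, 20))
    print(adjust_contrast(image, 2.0))
    print(rotate_90_clockwise(image))
```

A text-driven editor has to decide which of these kinds of operations a prompt such as "make it brighter" implies, and by how much; the functions above only show the arithmetic once that decision has been made.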
Apple’s MGIE model is currently available for experimentation through GitHub, and a demo is also hosted on Hugging Face Spaces. While Apple has yet to announce any plans to incorporate this technology into its products, the potential applications are vast.
The use of MLLMs in image editing is a significant step forward because the model, rather than the user, does the work of turning a terse prompt into instructions the image editor can follow. Editing images through text commands opens up new possibilities for content creators, designers, and artists.
Moreover, the use of open source AI models like MGIE allows for collaboration and innovation within the AI community. Developers and researchers can build upon this technology, contributing to its growth and refinement. This approach fosters a culture of continuous improvement and innovation, ultimately benefiting consumers and businesses alike.
In conclusion, MGIE is a significant contribution from Apple to the field of AI. By using multimodal large language models (MLLMs) to interpret text-based commands, it makes image editing more accessible, and its open source release invites collaboration and refinement from the wider AI community. Apple’s entry into the AI game reflects its commitment to innovation and its potential to shape the future of technology.