The MGIE (MLLM-Guided Image Editing) model, developed by Apple together with researchers at the University of California, allows users to resize, crop, flip and add filters to photos through text prompts, without the need for photo editing software.
The tool handles both simple and more complex editing tasks, including altering specific objects in a photo to change their shape or make them brighter.
The model uses multimodal large language models (MLLMs) in two ways: first to interpret the user's prompt, and then to 'imagine' what the edit would look like in practice, such as turning up the brightness of the sky in an image when the user asks for a bluer sky.
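The two-stage flow described above can be sketched in pseudocode-style Python. This is purely illustrative: the function names, the lookup table and the string-based "editing" are stand-ins invented for this sketch, not Apple's actual API or model internals.

```python
# Illustrative sketch of MGIE's two-stage idea (names and logic are hypothetical):
# stage 1 derives an explicit, visually grounded instruction from a terse prompt;
# stage 2 applies the edit guided by that instruction.

def derive_expressive_instruction(prompt: str) -> str:
    """Stand-in for the MLLM step that 'imagines' what the edit means concretely."""
    # A real system would query an MLLM; a toy lookup table stands in for it here.
    expansions = {
        "make the sky bluer": "increase brightness and saturation in the sky region",
        "make it more healthy": "add vegetable toppings to the pizza",
    }
    return expansions.get(prompt, prompt)

def edit_image(image_name: str, prompt: str) -> str:
    """Stand-in for the editor that is guided by the explicit instruction."""
    instruction = derive_expressive_instruction(prompt)
    return f"{image_name} edited via: {instruction}"

print(edit_image("pizza.jpg", "make it more healthy"))
```

The point of the intermediate step is that a vague prompt like "make it more healthy" is first turned into a concrete, visual instruction before any pixels are touched.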
Users simply type out what they want to change, and the edit is made.
The researchers themselves took an image of a pepperoni pizza and typed the prompt "make it more healthy", which added vegetable toppings.
They said in the paper: "Instead of brief but ambiguous guidance, MGIE derives explicit visual-aware intention and leads to reasonable image editing.
"We conduct extensive studies from various editing aspects and demonstrate that our MGIE effectively improves performance while maintaining competitive efficiency.
"We also believe the MLLM-guided framework can contribute to future vision-and-language research."
Although Apple has made MGIE available to download through GitHub, along with a web demo on Hugging Face Spaces, the company hasn't announced any plans for the model beyond this research.