Can you imagine a gray cat? Good. Now imagine him with white fur. Now imagine him walking on the Great Wall of China. Done? In those moments, a rapid series of neuronal activations in your brain produced variations of the image, based on your prior knowledge of the world.
Easy to imagine, for us human beings. For an artificial intelligence, however, it's a completely different story. Despite advances in neural networks that match or exceed human performance at certain tasks, computers are still far from humans' ability to imagine things.
Imagine? Impossible for an AI. At least until yesterday.
Now, a USC research team has developed an artificial intelligence that uses human-like abilities to imagine a never-before-seen object with different attributes. The paper, titled “Zero-Shot Synthesis with Group-Supervised Learning,” was released in May, and follow-up research has flourished ever since.
“We were inspired by human visual generalization capabilities to try to simulate human imagination in machines,” says the study's lead author Yunhao Ge. “Humans can separate their learned knowledge by attributes, for example shape, pose, position, color, and then recombine them to imagine a new object. Our paper attempts to simulate this process using neural networks.”
The generalization problem of artificial intelligence
Suppose we want to create an artificial intelligence system that generates images of cars. We start by providing the algorithm with some images of a car. The task is then to generate many types of cars, in any color, from multiple angles. This is a serious challenge: creating neural networks capable of extracting the underlying rules and applying them to a wide range of examples never seen before. Today's networks, however, are trained on sample characteristics as a whole, without taking into account the individual attributes of an object.
In this new study, the researchers attempted to overcome this limitation.
The secret? It's called disentanglement
The research team's work was based on the application of a method called disentanglement. Disentanglement can be used to generate deepfakes, for example: synthesizing new images and videos that replace one person's identity with another's while keeping the original movement.
The new approach takes a group of sample images, not one sample at a time like traditional algorithms, and extracts the similarities between them to achieve something called “controllable disentangled representation learning”.
Then, it recombines this knowledge to obtain “controllable novel image synthesis”. We could use the verb “imagine”.
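The recombination step above can be pictured with a toy sketch. This is not the authors' code: the `encode`, `swap`, and `decode` functions below are hypothetical stand-ins, and the "latents" are just labeled attributes rather than learned vectors. The point is the mechanism: each sample is encoded into separate per-attribute slots, and new combinations are formed by swapping slots between samples before decoding.

```python
def encode(sample):
    # Hypothetical encoder: in the real method this would be a neural
    # network producing one latent slot per attribute; here the slots
    # are simply the labeled attributes themselves.
    return dict(sample)

def swap(latent_a, latent_b, attrs):
    # Keep latent_a's slots, but take the listed attribute slots
    # from latent_b -- the core "recombine" operation.
    combined = dict(latent_a)
    for attr in attrs:
        combined[attr] = latent_b[attr]
    return combined

def decode(latent):
    # Hypothetical decoder: renders the latent as a text description
    # instead of an image.
    return "{color} {shape}, {pose}".format(**latent)

cat = {"shape": "cat", "color": "gray", "pose": "sitting"}
dog = {"shape": "dog", "color": "white", "pose": "running"}

# "Imagine" a never-seen combination: the cat, but with the dog's color.
novel = swap(encode(cat), encode(dog), attrs=["color"])
print(decode(novel))  # -> "white cat, sitting"
```

Because each attribute lives in its own slot, any subset of slots can be swapped independently, which is what makes the synthesis "controllable".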
It's a process very similar to how we humans extrapolate: when we see the color of one object, we can easily apply it to any other object by replacing the original color with the new one. Using the disentanglement technique, the team generated a new dataset containing 1.56 million images that could aid future research in the field.
Imagining helps to understand the world
While disentanglement is not a new idea, the researchers say their framework can be compatible with almost any type of data or knowledge. This broadens the opportunities for applications.
In the field of medicine, for example, disentanglement could help doctors and biologists discover more useful drugs by separating medical function from other properties and then recombining them to synthesize new medicine. Getting machines to “imagine” could also help create safer artificial intelligence. For example, allowing autonomous vehicles to imagine and avoid dangerous scenarios never seen before during training.
“Deep learning has already demonstrated unsurpassed performance and promise in many fields. Too often, however, this has happened through superficial mimicry and without a deeper understanding of the separate attributes that make each object unique,” said Laurent Itti, professor of computer science. “This new disentanglement approach, for the first time, truly unleashes a new sense of imagination in AI systems, bringing them closer to human understanding of the world.”