It's becoming extremely easy to alter a video (and I don't think that's purely a good thing), and the latest developments in AI are truly impressive.
A collaboration between giants (Stanford and Princeton Universities plus the Max Planck Institute for Informatics and Adobe) makes it possible to alter the speech in a video simply by modifying the textual transcription, and without creating the "dubbing" effect.
In other words, the person speaking on video will appear to say different words, with the lip movements modified to match.
To obtain this somewhat disturbing result, the algorithm "learns" the phonemes and how the subject in the video pronounces them, and creates an accurate 3D model of the subject's face, capable of replicating all the sounds and movements. At that point it is enough to edit the text of the speech, and the algorithm will replace the original sentence.
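To make the "edit the text" step more concrete, here is a minimal sketch of how a tool might work out which words changed between the original and the edited transcript. This is not the researchers' code: the function name `transcript_edits` and the sample sentences are my own assumptions, and the hard part (re-synthesizing the face and lip movements for the changed span) is not covered at all.

```python
from difflib import SequenceMatcher


def transcript_edits(original: str, edited: str):
    """Return (operation, old_words, new_words) for each changed span.

    Hypothetical helper for illustration only: it compares the two
    transcripts word by word and reports what was replaced, inserted
    or deleted. A real system would then have to regenerate the video
    for those spans.
    """
    old_words = original.split()
    new_words = edited.split()
    # autojunk=False keeps frequent words from being ignored in long transcripts
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    edits = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            edits.append((op, old_words[i1:i2], new_words[j1:j2]))
    return edits


# Example: a single word is changed in the transcript
print(transcript_edits(
    "the quarterly revenue rose four percent",
    "the quarterly revenue rose nine percent",
))
```

Only the spans reported here would need new footage; everything marked as unchanged could be left exactly as it was shot.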
Currently the algorithm needs at least 40 minutes of footage of a person to "train" itself to replicate them in a video.
Here is a video demonstrating how the system works:
Huge ethical doubts
It is clear that this mechanism makes it possible for anyone to modify a speech (perhaps by a politician or other public figure), inserting elements of hatred or disinformation and spreading the result as original and genuine. This only increases concerns about the spread of deepfake-based systems.
On the other hand, there is a positive side: the enormous savings in editing that come from not having to reshoot entire scenes because of small pronunciation errors.
As for the rest, I am sure that "anti-counterfeiting" methods will be developed for videos too, such as dynamic watermarks or other watermarks that make the work of artificial intelligence more complex, in a competition between reality and manipulation that already seems destined to characterize the coming years.