It is becoming extremely easy to alter a video (and I don't think that is entirely a good thing), and the latest developments in AI are truly impressive.
A collaboration between giants (Stanford University, Princeton University, the Max Planck Institute for Informatics and Adobe) makes it possible to alter the speech in a video simply by editing the text transcript, without creating the “dubbing” effect.
In other words, the person speaking in the video will appear to say the new words, with the lip movements modified to match.
To achieve this somewhat disturbing result, the algorithm “learns” the phonemes and how the subject in the video pronounces them, and builds an accurate 3D model of the subject's face that can replicate all the sounds and movements. At that point you just edit the text of the speech, and the algorithm replaces the original sentence.
Currently the algorithm needs at least 40 minutes of video of a person to "train" before it can replicate them.
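To make the "just edit the text" step more concrete, here is a minimal sketch of how a tool might work out which words in the transcript were changed. This is only an illustration and not the researchers' actual code: the function name `find_transcript_edits` is hypothetical, and it uses Python's standard `difflib` to compare the two transcripts word by word. A real system would then have to map each changed span to phonemes and to the matching video frames, which is not shown here.

```python
from difflib import SequenceMatcher


def find_transcript_edits(original: str, edited: str):
    """Return (operation, old_words, new_words) for each changed span.

    Hypothetical helper for illustration only: it compares the two
    transcripts word by word and reports what was replaced, deleted
    or inserted. It does not touch audio or video.
    """
    old_words = original.split()
    new_words = edited.split()
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)

    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # keep only the spans that differ
            edits.append((tag, old_words[i1:i2], new_words[j1:j2]))
    return edits


if __name__ == "__main__":
    before = "we will meet on Monday morning"
    after = "we will meet on Friday morning"
    for operation, old, new in find_transcript_edits(before, after):
        print(operation, old, "->", new)
```

In this example only the word "Monday" is flagged as replaced by "Friday", so only that short stretch of the video would need to be re-synthesized rather than the whole sentence.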
Here is a video demonstrating how the system works:
Huge ethical concerns
Clearly, this mechanism makes it possible for anyone to alter a speech (perhaps one given by a politician or another public figure), inserting hateful content or disinformation and circulating the result as original and authentic. This only increases concerns about the spread of deepfake-based systems.
On the other hand, there are some positive sides too, chiefly the enormous savings in editing: small pronunciation errors can be fixed without re-shooting entire scenes.
Beyond that, I am sure that "anti-counterfeiting" methods will also be developed for video, such as dynamic watermarks or watermarks that make the work of artificial intelligence even harder, in a contest between reality and manipulation that already seems destined to characterize the coming years.