A method and apparatus for videorealistic speech animation is disclosed.
A human subject is recorded with a video camera as he or she utters a
predetermined speech corpus. After the corpus is processed automatically, a
visual speech module is learned from the data; this module is capable of
synthesizing the human subject's mouth uttering entirely novel utterances
that were not recorded in the original video. The synthesized utterance is
then re-composited onto a background sequence that contains natural head
and eye movement. The final output is videorealistic in the sense that it
looks like a video camera recording of the subject. The two key
components of this invention are 1) a multidimensional morphable model
(MMM) that synthesizes new, previously unseen mouth configurations from a
small set of mouth image prototypes; and 2) a regularization-based
trajectory synthesis technique, trained automatically from the recorded
video corpus, that can synthesize trajectories in MMM space corresponding
to any desired utterance.
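The prototype-combination idea behind the first component can be illustrated with a minimal sketch. The sketch below blends prototype mouth images with convex weights; the actual MMM also warps prototypes along optical-flow correspondences before blending, so this pure texture blend (and the `morph` helper name) is an illustrative simplification, not the disclosed model.

```python
import numpy as np

def morph(prototypes, alphas):
    """Blend prototype mouth images with convex weights (illustrative only:
    the full MMM also morphs shape via optical flow, not just texture)."""
    alphas = np.asarray(alphas, dtype=float)
    alphas = alphas / alphas.sum()              # normalize to a convex combination
    stack = np.stack([np.asarray(p, dtype=float) for p in prototypes])
    return np.tensordot(alphas, stack, axes=1)  # weighted sum over prototypes

# Example with two tiny 2x2 "images": an equal-weight blend lands halfway
# between the all-zeros and all-ones prototypes.
proto_a = np.zeros((2, 2))
proto_b = np.ones((2, 2))
mid = morph([proto_a, proto_b], [0.5, 0.5])
```

Even this simplified form shows why a small prototype set suffices: the blend weights, not the pixel data, become the low-dimensional parameters that the trajectory synthesis stage operates on.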
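The second component, trajectory synthesis by regularization, can be sketched as a regularized least-squares problem: pull the trajectory toward per-frame targets while penalizing non-smooth motion. The sketch below assumes each frame has a target mean and variance in MMM parameter space (a 1-D parameter for clarity), and uses a second-difference smoothness penalty; these names, shapes, and the specific penalty are illustrative assumptions, not the patent's exact formulation.

```python
import numpy as np

def synthesize_trajectory(means, variances, lam=1.0):
    """Solve for a trajectory y minimizing a per-frame target term,
        sum_t (y_t - mu_t)^2 / var_t,
    plus lam times a second-difference smoothness regularizer
    (illustrative formulation; 1-D MMM parameter per frame)."""
    T = len(means)
    W = np.diag(1.0 / np.asarray(variances, dtype=float))  # target weights
    D = np.zeros((T - 2, T))                               # second-difference operator
    for t in range(T - 2):
        D[t, t], D[t, t + 1], D[t, t + 2] = 1.0, -2.0, 1.0
    A = W + lam * (D.T @ D)                                # normal equations
    b = W @ np.asarray(means, dtype=float)
    return np.linalg.solve(A, b)

# Example: a closed-open-closed mouth target. Tight variances pin the
# trajectory to the targets; the regularizer rounds the transitions.
mu = [0.0] * 5 + [1.0] * 5 + [0.0] * 5
var = [0.01] * 15
y = synthesize_trajectory(mu, var, lam=0.5)
```

The trade-off is controlled by the variances and `lam`: loose variances or a large `lam` let the smoothness term dominate, yielding the coarticulation-like blending between neighboring targets that makes the synthesized mouth motion look natural.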