AAS2F: Ambiguity-Aware Speech-to-Face Synthesis with Speaker-Conditioned Diffusion Models

Steps to use the demo:

  1. Upload or record a speech audio clip. Please provide at least 5 seconds of speech.
  2. The model works best with English speech, but it should also work with other languages.
  3. When you have finished recording or uploading the audio, click the 'Generate' button to start generation.
  4. After a few seconds, the generated face images are displayed on the right.
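If you prepare clips offline, you can check them against the 5-second minimum from step 1 before uploading. The sketch below assumes the clip is an uncompressed PCM WAV file and uses only Python's standard `wave` module. The helper names `clip_duration_seconds` and `is_long_enough` are hypothetical and are not part of the AAS2F demo. Note that the check measures total clip length, not the amount of actual speech in it.

```python
import wave

# Minimum clip length suggested in step 1 of the demo instructions.
MIN_SECONDS = 5.0


def clip_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds (hypothetical helper)."""
    with wave.open(path, "rb") as wav:
        # Number of audio frames divided by frames per second gives seconds.
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path: str, min_seconds: float = MIN_SECONDS) -> bool:
    """Return True if the clip is at least `min_seconds` long (hypothetical helper)."""
    return clip_duration_seconds(path) >= min_seconds
```

For MP3 or other compressed formats, convert the clip to WAV first, since the `wave` module only reads uncompressed WAV data.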