AAS2F: Ambiguity-Aware Speech-to-Face Synthesis with Speaker-Conditioned Diffusion Models
Steps to use the demo:
- Upload or record a speech audio clip. Please provide at least 5 seconds of speech.
- The demo works best with English speech, but it should also work with other languages.
- Once the audio is recorded or uploaded, click the 'Generate' button to start generation.
- After a few seconds, the generated images will be displayed on the right.
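
Before uploading, it can help to confirm that a clip meets the 5-second minimum mentioned in the steps above. The following is a minimal sketch, not part of the demo itself: it assumes the clip is an uncompressed WAV file and uses only Python's standard `wave` module. The helper names `wav_duration_seconds` and `is_long_enough` and the constant `MIN_SECONDS` are hypothetical and chosen for illustration.

```python
import wave

# Minimum clip length requested in the steps above (in seconds).
MIN_SECONDS = 5.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration = number of audio frames / frames per second.
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path: str, min_seconds: float = MIN_SECONDS) -> bool:
    """Check whether the clip at `path` lasts at least `min_seconds`."""
    return wav_duration_seconds(path) >= min_seconds
```

For example, `is_long_enough("my_clip.wav")` returns `True` for a clip of 5 seconds or more (the file name here is a placeholder). Other formats such as MP3 would need a different reader, since `wave` only handles WAV files.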