AAS2F: Ambiguity-Aware Speech-to-Face Synthesis with Speaker-Conditioned Diffusion Models
Steps to use the demo:
- Upload or record a speech clip to generate face images conditioned on the speaker's voice. Please provide at least 5 seconds of speech. The model is trained on English speech, so it works best with English, but it should also work with other languages.
- Click the 'Generate' button to start the generation process.
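
Before uploading, it can help to confirm that a clip meets the 5-second minimum mentioned in the steps above. The following is a minimal sketch, assuming the clip is an uncompressed WAV file; the helper names `wav_duration_seconds` and `is_long_enough` are hypothetical and are not part of the demo itself.

```python
import wave

# Minimum speech length requested in the steps above.
MIN_SECONDS = 5.0


def wav_duration_seconds(path: str) -> float:
    """Return the duration of an uncompressed WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Duration is the number of audio frames divided by the sample rate.
        return wav.getnframes() / wav.getframerate()


def is_long_enough(path: str, min_seconds: float = MIN_SECONDS) -> bool:
    """Hypothetical helper: check that a clip meets the minimum length."""
    return wav_duration_seconds(path) >= min_seconds
```

Calling `is_long_enough("my_clip.wav")` on a local recording would indicate whether the clip is likely long enough. Other formats (MP3, FLAC, and so on) would need a different reader, which this sketch does not cover.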