AAS2F: Ambiguity-Aware Speech-to-Face Synthesis with Speaker-Conditioned Diffusion Models

Upload or record a speech audio clip and generate face images conditioned on the speaker's voice. Please provide at least 5 seconds of speech.
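The 5-second minimum can be checked before a clip is submitted. Below is a minimal sketch that assumes the clip is saved as an uncompressed PCM WAV file. The helper names `clip_duration_seconds` and `is_long_enough` are hypothetical and are not part of the AAS2F demo's code.

```python
import wave

# Minimum speech length requested by the demo (in seconds).
MIN_SECONDS = 5.0


def clip_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / float(wav.getframerate())


def is_long_enough(path: str, min_seconds: float = MIN_SECONDS) -> bool:
    """Hypothetical pre-upload check: True if the clip meets the minimum length."""
    return clip_duration_seconds(path) >= min_seconds
```

This check measures total clip length only. It cannot tell whether the clip actually contains 5 seconds of speech rather than silence.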
