AudioLDM: Text-to-Audio Generation with Latent Diffusion Models

GPTKB entity

Statements (33)
Predicate Object
gptkbp:instanceOf gptkb:academic_journal
text-to-audio generation model
gptkbp:assesses Foley sound generation
audio captioning
text-to-audio retrieval
gptkbp:author gptkb:Mark_D._Plumbley
gptkb:Wenwu_Wang
Haohe Liu
Zehua Chen
Yi Yuan
Xinhao Mei
Xubo Liu
Danilo Mandic
gptkbp:citation 100+
gptkbp:contribution proposes a latent diffusion model for text-to-audio generation
gptkbp:field gptkb:machine_learning
generative models
audio synthesis
https://www.w3.org/2000/01/rdf-schema#label AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
gptkbp:input text prompt
gptkbp:output audio waveform
gptkbp:publicationYear 2023
gptkbp:publishedIn gptkb:arXiv
gptkbp:relatedTo gptkb:Stable_Diffusion
text-to-image generation
gptkbp:repository https://github.com/haoheliu/AudioLDM
gptkbp:trainer gptkb:Clotho
AudioCaps
ESC-50
gptkbp:url https://arxiv.org/abs/2301.12503
gptkbp:uses latent diffusion models
gptkbp:bfsParent gptkb:AudioLDM
gptkbp:bfsLayer 7