BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis

Yichong Leng* (University of Science and Technology of China), Zehua Chen* (Imperial College London), Junliang Guo (Microsoft Research), Haohe Liu (University of Surrey), Jiawei Chen (South China University of Technology), Xu Tan^ (Microsoft Research Asia), Danilo Mandic (Imperial College London), Lei He (Microsoft Azure Speech), Xiang-Yang Li (University of Science and Technology of China), Tao Qin (Microsoft Research Asia), Sheng Zhao (Microsoft Azure Speech), Tie-Yan Liu (Microsoft Research Asia)

*Equal contribution.

Binaural audio plays a significant role in constructing immersive augmented and virtual realities. Recording binaural audio in the real world is expensive, so synthesizing it from mono audio has attracted increasing attention. This synthesis involves not only the basic physical warping of the mono audio but also room reverberation and head/ear-related filtering, which are difficult to simulate accurately with traditional digital signal processing. In this paper, we formulate the synthesis process from a different perspective by decomposing the binaural audio into a common part that is shared by the left and right channels and a specific part that differs in each channel. Accordingly, we propose BinauralGrad, a novel two-stage framework equipped with diffusion models to synthesize the two parts respectively. Specifically, in the first stage, the common information of the binaural audio is generated with a single-channel diffusion model conditioned on the mono audio; based on it, the binaural audio is generated by a two-channel diffusion model in the second stage. By combining this two-stage perspective with advanced generative models (i.e., diffusion models), BinauralGrad is able to generate accurate, high-fidelity binaural audio samples. Experimental results show that on a benchmark dataset, BinauralGrad outperforms the existing baselines by a large margin on both objective and subjective evaluation metrics (Wave L2: 0.128 vs. …).
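The two-stage decomposition described above can be illustrated with a small NumPy sketch. It assumes, which the abstract does not state, that the common part is the mean of the left and right channels, so each channel's specific part is its deviation from that mean. It also uses a standard DDPM-style forward noising step, x_t = sqrt(ᾱ_t) · x_0 + sqrt(1 − ᾱ_t) · ε, with an illustrative linear noise schedule. The names `decompose`, `linear_alpha_bars` and `q_sample`, the schedule values and the toy signals are all hypothetical. No trained networks are involved, so the snippet only shows which target each stage would be trained to denoise.

```python
import numpy as np


def decompose(binaural: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split a (2, T) binaural signal into a common part and per-channel specific parts.

    Assumption (hypothetical): the common part is the mean of the left and right
    channels, and the specific part is what remains in each channel.
    """
    if binaural.ndim != 2 or binaural.shape[0] != 2:
        raise ValueError("expected an array of shape (2, T)")
    common = binaural.mean(axis=0)   # shape (T,): information shared by both channels
    specific = binaural - common     # shape (2, T): common is broadcast over channels
    return common, specific


def linear_alpha_bars(num_steps: int, beta_start: float = 1e-4,
                      beta_end: float = 0.05) -> np.ndarray:
    """Cumulative products alpha_bar_t of (1 - beta_t) for a linear beta schedule.

    The schedule values are illustrative, not taken from the paper.
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)


def q_sample(x0: np.ndarray, t: int, alpha_bars: np.ndarray,
             rng: np.random.Generator) -> tuple[np.ndarray, np.ndarray]:
    """DDPM forward process: x_t = sqrt(a_bar_t) * x0 + sqrt(1 - a_bar_t) * eps."""
    eps = rng.standard_normal(x0.shape)
    a_bar = alpha_bars[t]
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    return x_t, eps


# Usage with synthetic toy signals (not real recordings).
rng = np.random.default_rng(0)
num_samples = 16000                     # e.g. 1 s of audio at 16 kHz
mono = rng.standard_normal(num_samples)
# Toy binaural target: different gains and a small delay on the right channel.
binaural = np.stack([0.9 * mono, 0.7 * np.roll(mono, 8)])

common, specific = decompose(binaural)
alpha_bars = linear_alpha_bars(50)

# Stage 1 target: the single-channel common part (conditioned on mono).
x_t_stage1, _ = q_sample(common, t=25, alpha_bars=alpha_bars, rng=rng)
# Stage 2 target: the two-channel binaural audio (conditioned on mono and the common part).
x_t_stage2, _ = q_sample(binaural, t=25, alpha_bars=alpha_bars, rng=rng)
```

Under the mean-based assumption, the left and right specific parts are exact negatives of each other, (l − r)/2 and (r − l)/2, so adding the common part back to either one recovers the original channel.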