text-video-to-audio
A unified Any2Audio generation framework guided by Chain-of-Thought (CoT) reasoning.