Audio-Oscar:
A Multi-Agent System for Complex Audio Scene Generation, Orchestration, and Refinement
Abstract
In recent years, audio generation has made significant progress in tasks such as text-to-speech (TTS), text-to-audio (TTA), and text-to-music (TTM). However, generating long-form, controllable audio from complex audio scene descriptions remains challenging, because such scenes often require coordinated speech, sound effects, music, and songs, together with temporal structure and post-production. In this work, we introduce Audio-Oscar, a multi-agent framework for generating audio from complex scene descriptions. Audio-Oscar coordinates a set of specialist agents, each responsible for a different aspect of the audio scene: role modeling and voice design, speech generation, fine-grained timeline planning, model selection, non-speech audio generation, and audio post-production. Audio-Oscar further incorporates feedback-driven refinement of the generated audio. In addition, to address the lack of suitable benchmarks for evaluating audio generation from complex audio scene descriptions, we construct ASG-Bench (Audio Scene Generation Benchmark), which includes both scene descriptions paired with reference audio and text-only scene descriptions. Each scene is annotated with target audio events and temporal statements, which are used to evaluate whether the generated audio faithfully realizes the required scene content and temporal structure. Experimental results show that Audio-Oscar can effectively generate audio that matches complex scene descriptions.
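ASG-Bench annotates each scene with target audio events and temporal statements, but this page does not say how a statement is scored. The sketch below is one plausible reading, under stated assumptions: an event detector has already turned the generated audio into labeled (start, end) intervals, and each temporal statement is a hypothetical triple such as `("bell", "after", "steam")`. `Event`, `holds`, `event_recall`, and `temporal_accuracy` are illustrative names, not part of ASG-Bench.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A detected audio event on the scene timeline (times in seconds)."""
    label: str
    start: float
    end: float


def holds(a: Event, relation: str, b: Event) -> bool:
    """Check one pairwise temporal relation between two events."""
    if relation == "before":
        return a.end <= b.start
    if relation == "after":
        return a.start >= b.end
    if relation == "overlaps":
        return a.start < b.end and b.start < a.end
    raise ValueError(f"unknown relation: {relation!r}")


def event_recall(detected: list[Event], targets: list[str]) -> float:
    """Fraction of annotated target events found among the detections."""
    if not targets:
        return 0.0
    found = {e.label for e in detected}
    return sum(label in found for label in targets) / len(targets)


def temporal_accuracy(detected: list[Event],
                      statements: list[tuple[str, str, str]]) -> float:
    """Fraction of temporal statements satisfied by the detected events.

    Assumes (hypothetically) one instance per label; a statement that
    mentions an undetected event counts as unsatisfied.
    """
    if not statements:
        return 0.0
    by_label = {e.label: e for e in detected}
    satisfied = sum(
        a in by_label and b in by_label and holds(by_label[a], rel, by_label[b])
        for a, rel, b in statements
    )
    return satisfied / len(statements)


if __name__ == "__main__":
    # Toy detections loosely modeled on the steampunk engine-room prompt.
    detected = [Event("piston", 0.0, 30.0), Event("steam", 5.0, 6.5),
                Event("bell", 28.0, 30.0)]
    print(event_recall(detected, ["piston", "steam", "gears", "bell"]))  # 0.75
    print(temporal_accuracy(detected, [("bell", "after", "steam"),
                                       ("steam", "overlaps", "piston")]))  # 1.0
```

Separating content (event recall) from structure (temporal accuracy) mirrors the two questions the benchmark annotations are meant to answer; a real scorer would also need to handle repeated events and detector timing tolerance.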
Method Overview
Multi-agent collaborative framework for complex audio generation.
Given a natural-language scene description, our system decomposes it into independent audio elements (dialogue, sound effects, ambient sound, and music). Specialized agents generate each element, and the results are assembled on a structured timeline to produce coherent, editable audio scenes.
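One way to picture the structured timeline is as a list of clips, one per generated audio element, each placed at an onset time and summed into the final mix. The sketch below assumes mono floating-point audio at a fixed sample rate, with plain additive mixing and hard clipping; `Clip`, `Timeline`, and `SAMPLE_RATE` are hypothetical names, and the actual system may use a richer representation (fades, per-track effects, loudness control).

```python
from dataclasses import dataclass, field

SAMPLE_RATE = 16_000  # assumed: mono audio at 16 kHz


@dataclass
class Clip:
    """One generated audio element placed on the timeline."""
    track: str            # e.g. "dialogue", "effects", "ambient", "music"
    start: float          # onset in seconds on the scene timeline
    samples: list[float]  # mono samples in [-1, 1]
    gain: float = 1.0     # linear gain applied when mixing

    @property
    def offset(self) -> int:
        """Onset converted to a sample index."""
        return round(self.start * SAMPLE_RATE)


@dataclass
class Timeline:
    clips: list[Clip] = field(default_factory=list)

    def add(self, clip: Clip) -> None:
        self.clips.append(clip)

    def tracks(self) -> dict[str, list[Clip]]:
        """Group clips by track, e.g. for per-track editing."""
        grouped: dict[str, list[Clip]] = {}
        for clip in self.clips:
            grouped.setdefault(clip.track, []).append(clip)
        return grouped

    def render(self) -> list[float]:
        """Sum every clip into one buffer, then hard-clip to [-1, 1]."""
        # Length is computed in samples so rounding can never overflow the buffer.
        length = max((c.offset + len(c.samples) for c in self.clips), default=0)
        mix = [0.0] * length
        for clip in self.clips:
            for i, sample in enumerate(clip.samples):
                mix[clip.offset + i] += clip.gain * sample
        return [max(-1.0, min(1.0, x)) for x in mix]


if __name__ == "__main__":
    timeline = Timeline()
    timeline.add(Clip("ambient", 0.0, [0.5] * SAMPLE_RATE))               # 1 s bed
    timeline.add(Clip("effects", 0.5, [0.5] * (SAMPLE_RATE // 2), 2.0))   # 0.5 s hit
    mix = timeline.render()
    print(len(mix) / SAMPLE_RATE, "seconds rendered")
```

Keeping each element as its own clip, rather than rendering the scene in one pass, is what makes the timeline editable: moving or replacing one clip leaves the others untouched.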
Structured Planning
Decomposes complex scenes into plannable audio elements with explicit timeline representation.
Multi-Agent Collaboration
Specialized agents handle speech, sound effects, ambient, and music generation independently.
Fine-grained Editing
Supports local regeneration and post-processing without regenerating the entire scene.
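The agent split and local editing described above can be sketched as a scene that routes each element to a generator for its agent type and caches the result, so revising one element reruns only that element's generator. The callables stand in for the specialist agents (speech, sound effects, ambient, music); `Element`, `Scene`, `AgentFn`, and the routing keys are hypothetical, and the real agents, prompts, and caching policy are not described on this page.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical agent interface: (prompt, duration in seconds) -> mono samples.
AgentFn = Callable[[str, float], list]


@dataclass
class Element:
    element_id: str
    agent: str                     # key into the agent table, e.g. "speech"
    prompt: str
    start: float                   # onset in seconds on the timeline
    duration: float                # length in seconds
    audio: Optional[list] = None   # cached output of the last generation


class Scene:
    def __init__(self, elements: list[Element], agents: dict[str, AgentFn]):
        self.elements = {e.element_id: e for e in elements}
        self.agents = agents

    def _generate(self, element: Element) -> None:
        # Route the element to the agent responsible for its type.
        element.audio = self.agents[element.agent](element.prompt, element.duration)

    def generate_all(self) -> None:
        for element in self.elements.values():
            self._generate(element)

    def regenerate(self, element_id: str, new_prompt: Optional[str] = None) -> None:
        """Re-run only one element; all other cached audio is left untouched."""
        element = self.elements[element_id]
        if new_prompt is not None:
            element.prompt = new_prompt
        self._generate(element)
```

For example, after `generate_all()` has produced a dialogue line and a rain bed, calling `regenerate("rain", "light drizzle")` invokes only the ambient agent, and the cached dialogue audio is reused as-is.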
Audio Samples & Demos
Generated audio for a variety of prompts.
Text-to-Audio
T2A EN 1
Immersive cyberpunk city alleyway at night: persistent heavy rain hitting metal dumpsters and sizzling on hot neon signs, a sputtering electrical zap from a broken streetlamp, a low-flying hover vehicle roaring past overhead with a distinct Doppler effect, and a distant, muffled synth-bass thrumming from an underground club.
T2A EN 2
Magical forest soundscape evolving over 30 seconds: starts with a gentle, shimmering wind scattering dry leaves, gradually transitioning into crisp, exotic bird calls echoing through tall trees, accompanied by a fast-flowing, sparkling creek, and underneath it all, a subtle, ethereal, high-frequency hum that feels mystical.
T2A EN 3
Complex industrial soundscape inside a steampunk engine room: a dominant, rhythmic heavy piston thudding like a heartbeat, intermittent high-pressure steam releases hissing sharply, metallic gears grinding and clicking, the continuous deep roaring rumble of a massive furnace, and a clanging brass bell chiming twice at the very end.
T2A ZH 1
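A dense soundscape of a storm closing in on a seaside radio tower: wind whistling through metal cables, distant waves, rain building from light to heavy, a loose hatch banging repeatedly, radio static, and a persistent low, uneasy atmosphere underneath.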
T2A ZH 2
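A claustrophobic winter soundscape inside a fragile log cabin during a polar blizzard: icy gales howling across the roof with high-pitched shrieks, thick wooden beams creaking and cracking violently under the heavy snow, a loose windowpane rattling and banging wildly, and, in stark contrast, the quiet crackle of firewood in a small iron stove in the middle of the room.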
T2A ZH 3
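A soundscape of a heavy steam locomotive speeding through a rainy night: the heavy rumble of massive metal pistons driving back and forth, the fierce roar of the fire in the boiler devouring coal, a deafening long blast as high-pressure steam is released, dense rain sweeping across the scorching metal body with a hissing 'tss-tss', and the rhythmic, cold clank of the wheels striking the rail joints.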
Speech
Speech EN 1
Lena whispers, 'Don't touch the console yet.' Marcus replies, 'If we wait any longer, the whole grid goes dark.' Their voices should feel close, urgent, and restrained, with a low server-room ambience underneath.
Speech EN 2
Rain lashes against the tall glass windows of the control tower. Thunder rolls in the distance, radio static cuts in and out, and radar screens beep softly in the background. Nora says quietly, 'Flight 712 just disappeared from the radar.' Elias replies, 'No, it didn't disappear. Someone erased it.' Nora turns toward him. 'That's impossible.' Elias whispers, 'Then explain why the backup log is gone too.' A burst of radio static fills the room. A faint pilot's voice comes through, distorted and broken. Nora says, 'Was that them?' Elias answers, 'I don't know. But don't respond yet.' Nora asks, 'Why?' Elias says, 'Because they may not be the only ones listening.' The scene should feel tense and cinematic, with rain, thunder, radar beeps, radio interference, glass vibration, and low control-room ambience.
Speech EN 3
A narrow city street at night, slick with recent rain. Neon signs buzz above small storefronts, distant traffic rolls past the intersection, and a subway rumbles faintly beneath the pavement. Lena whispers, 'Don't turn around yet.' Marcus replies quietly, 'Tell me they're not following us.' Lena says, 'I can't. They crossed the street when we did.' A car passes slowly, tires hissing over wet asphalt. Marcus lowers his voice. 'How many?' Lena answers, 'Two. Maybe three.' A phone vibrates in Marcus's pocket, but he doesn't answer it. Marcus says, 'If we run, they'll know we saw them.' Lena breathes in sharply. 'And if we don't?' Marcus says, 'Then we reach the corner and disappear into the crowd.' Their voices should feel close, low, urgent, and restrained, with wet street ambience, neon buzzing, distant traffic, muffled subway rumble, footsteps, dripping water, a humming vending machine, and occasional passing cars underneath.
Speech ZH 1
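In an empty subway station, wind moans from deep in the tunnel, a broken fluorescent light buzzes and flickers, and water drips in the distance. Rachel whispers, 'The radio picked up that distress call again.' Owen: 'Play it for me.' After a burst of rustling static, a broken human voice comes through, then is quickly swallowed by noise. Rachel frowns: 'This recording is from yesterday.' Owen looks toward the dark tunnel: 'No, it's being transmitted right now.' Rachel's voice tightens: 'But yesterday we already found that person.' Owen says softly, 'I know. So don't respond.' The background should include subway-tunnel echo, fluorescent-light hum, radio noise, dripping water, faint vibration from distant rails, and a low wind. Both voices should sound close and cautious, as if they are keeping their voices down so that whatever is in the tunnel cannot hear them.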
Speech ZH 2
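Late at night, a narrow city street glistens after the rain. Neon signs above small shops hum faintly, cars pass through a distant intersection, and the low rumble of a passing subway train rises faintly from underground. A nearby vending machine runs quietly, water keeps dripping from a fire escape, and footsteps echo between the brick walls on either side. Lena whispers, 'Don't turn around yet.' Marcus answers in a low voice, 'Tell me they're not following us.' Lena says, 'I can't. They crossed the street with us.' A car drives slowly past, its tires rolling over the wet asphalt with a fine, hissing friction. Marcus asks quietly, 'How many?' Lena answers, 'Two. Maybe three.' The phone in Marcus's pocket vibrates once, but he doesn't answer it. Marcus says, 'If we run, they'll know we spotted them.' Lena draws a sharp breath: 'And if we don't run?' Marcus says, 'We walk to the corner, then blend into the crowd and disappear.' Their voices should be close, low, urgent, and restrained, with wet street ambience, neon hum, distant traffic, a muffled underground subway rumble, footsteps, dripping water, the low drone of a vending machine, and the occasional passing car in the background.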
Speech ZH 3
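In a cramped museum security room, old surveillance monitors give off a dim glow. The air conditioner clicks, the cameras make a faint mechanical whir as they pan, and the wall clock ticks in the silence. Nora whispers, 'Camera three just moved on its own.' Elias replies, 'The system isn't in automatic mode.' Nora says, 'Then who's controlling it?' One monitor suddenly flashes white for a second, and faint footsteps come from the empty gallery. Elias says, 'Don't trigger the alarm.' Nora turns to him: 'Why?' Elias lowers his voice: 'Because the alarm is on the same network.' Another camera clicks, and a screen shows a dark corridor. Nora whispers, 'Something is standing by the east door.' Elias says, 'Don't stare at it for too long.' The scene should include monitor electrical hum, camera motor sounds, clock ticking, air-conditioner vibration, distant footsteps, gallery reverb, and a cold security-room atmosphere.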
Complex Audio Scenes
Scene EN 1
In a narrow back alley behind the night market, rainwater is dripping from the cracked plastic awning above a small noodle stall, landing in uneven puddles with soft, scattered 'plip, plop' sounds. A rusty exhaust fan mounted on the brick wall spins lazily, giving off a tired 'whirr... clack... whirr...' every few seconds. Steam rises from a dented metal soup pot, hissing gently as the old vendor stirs thick broth with a long ladle. The gas stove underneath keeps burning with a low blue 'fwoooosh,' occasionally sputtering when raindrops splash too close to the flame. A young man sits on a plastic stool, his wet jacket squeaking faintly as he shifts his shoulders. He breaks apart a pair of disposable chopsticks with a dry 'snap,' then blows on the steaming noodles in front of him. Across the table, his friend shakes rainwater off an umbrella, sending droplets tapping rapidly onto the concrete floor. The friend laughs and says, 'You dragged me across three blocks in this weather just for a bowl of noodles?' The young man slurps a mouthful loudly, the broth and noodles making a warm 'shrrrp' sound, then replies, 'Not just noodles. This place tastes exactly like the night before we left home.' Behind them, a scooter passes through the alley, its engine buzzing and fading into the rain. The vendor drops a handful of fresh noodles into boiling water, and the pot erupts into a rolling 'blurble-blurble' boil. Metal tongs scrape against the pot rim with a sharp 'clang,' and a ceramic bowl is set down on the counter with a heavy 'tok.' The friend grows quiet for a moment, then says softly, 'Then order one more. We've got time tonight.' Just then, a nearby neon sign flickers violently, buzzing 'bzzzt-bzzzt,' before snapping fully awake in a red glow, lighting up the rain like sparks in the dark alley.
Scene EN 2
Before dawn at a small fishing harbor, cold wind sweeps across the dark water, pushing waves against the concrete pier with heavy 'slap... slosh... slap' impacts. Dozens of moored fishing boats tug against their ropes, wooden hulls knocking together with hollow 'thunk, thunk' sounds. Metal hooks swing from a rack beside the pier, clinking softly whenever the wind catches them. A distant buoy bell rings somewhere in the fog, slow and uneven — 'dong... dong...' Under a dim dock lamp, an old fisherman crouches beside a pile of tangled nets. He pulls at the wet rope with both hands, and the net fibers stretch and scrape with a sticky 'shrrrk.' Beside him, a younger man stacks plastic fish crates onto a cart, each crate landing with a hollow 'clack.' The wheels of the cart squeal as he pushes it over the uneven pier. He glances at the black clouds above the water and says, 'The weather report said the storm would turn south.' The old fisherman spits into the sea and replies, 'Weather reports don't know this harbor. The wind changed an hour ago.' A small diesel engine coughs to life on one of the boats: 'chug-chug-chug... CHUG.' Exhaust smoke drifts low over the water. A loose tarp snaps violently in the wind, cracking like a whip. The younger man tightens his hood and says, 'So we're not going out?' The old fisherman pauses, listening to the waves hit the outer breakwater. 'We go out ten minutes late,' he says. 'Not one minute earlier.' Suddenly, a thick rope under tension slips from a rusted cleat. It whips across the pier with a violent 'THWACK,' slamming into a stack of metal buckets and sending them crashing everywhere in a chaotic 'KLANG-KLANG-KLANG.' One boat jerks sideways, its hull scraping hard against the dock. The old fisherman grabs the young man by the sleeve and shouts over the wind, 'Hold the bow line! Now!' Above them, thunder rolls across the harbor in a deep, endless 'grrrrrrr-BOOM,' and every mast begins to rattle like bones in the dark.
Scene EN 3
Inside a cramped old bookstore on a stormy evening, rain hammers against the front windows with a dense, restless 'tat-tat-tat-tat.' The wooden sign outside swings in the wind, its rusty chain groaning softly before tapping against the glass with a dull 'tok... tok.' Tall shelves packed with yellowed books lean slightly toward the narrow aisles, and every now and then the building creaks as if taking a deep, tired breath. A small electric heater beneath the counter buzzes weakly, fighting the damp air with a thin 'mmmmmm.' An elderly shopkeeper stands behind the counter, repairing the spine of a torn book with a brush and glue. The brush strokes make faint sticky sounds against the paper, while the pages whisper 'frrrp' whenever he turns them. Near the back of the shop, a student in a soaked hoodie pulls a heavy encyclopedia from a shelf. The book resists for a moment, scraping against its neighbors with a dry 'shhhhk,' then drops into his arms with a heavy 'thud.' The student walks to the counter and places the book down. Dust puffs into the light of a green desk lamp. 'I found the one you told me about,' he says, trying to catch his breath. The old shopkeeper looks up over his glasses and says, 'That book is not for casual reading.' The student lowers his voice. 'I'm not reading casually. The name inside it matches the letter my grandmother left me.' A flash of lightning turns the whole shop white for an instant. The thunder follows almost immediately, rolling over the roof with a deep 'KRAK-BOOOOM.' Several books shift on the shelves. Somewhere behind the wall, a loose pipe rattles violently. The old shopkeeper reaches for the encyclopedia, but before his fingers touch it, the book opens by itself. Its dry pages begin flipping rapidly — 'frrr-frrr-frrr-frrr' — though there is no wind inside the room. The heater cuts out with a tiny 'click,' the lights dim, and from between the pages comes the faint sound of a woman humming an old lullaby.
Scene ZH 1
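In a spacious old auto repair shop, an air compressor in the corner chugs with a dull, rhythmic "put-put-put," and an old incandescent bulb hanging overhead, with a faulty contact, occasionally crackles with a faint electrical "zzzt." A drop of thick waste oil falls from the chassis of a car raised high on a lift and hits the sheet-metal drip pan below with a crisp "drip." A man lies on a mechanic's creeper, straining to turn a wrench. Suddenly the wrench slips, and the heavy steel tool slams into the exhaust pipe with a deafening "clang!" He rolls out, the creeper's wheels rumbling across the rough concrete floor, wipes the black grease from his face, and sighs: "The chassis bolts on this old Ford might as well be welded on. I've been fighting them all afternoon, and my back is about to give out." Nearby, his coworker is tossing used spark plugs into a metal bucket; the small metal pieces hit the empty bottom with a string of light, ringing "ding-ding-clink" sounds. He grabs a greasy rag, wipes his hands, leans against a huge tire, and laughs: "Quit fighting it. Spray some rust remover on it and let it sit overnight. Want to take a run up the mountain road tomorrow night? I just tuned the carburetor on my motorcycle. After the ride we can catch the breeze at the summit and grab some stir-fried rice noodles from that roadside stall to get rid of the oil smell." The man pushes himself up off the floor, and his neck gives a faint "crack" as he rolls it. He tosses the wrench into the red sheet-metal toolbox beside him, setting off a loud, jumbled metallic "clatter." A hint of anticipation shows through his tired eyes as he answers with a smile: "Sure, but let's get one thing straight: on the way down, don't go crazy on the throttle again. I don't want to be fishing you out of the roadside bushes in the middle of the night. Tomorrow at eight, meet at the old gas station." Just as he finishes, someone at the test bay deep in the shop suddenly floors the accelerator, and a deafening V8 roar, "VROOOM, vrrrm!", erupts like a beast, its violent sound waves making the tools on the rack rattle and hum.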
Scene ZH 2
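In the back kitchen of a small restaurant just before closing, the air is thick with steam, garlic, hot oil, and the damp metallic smell of the sink. The wok burner shoots out fierce blue flames, roaring a steady 'whooosh' under the blackened wok. Hot oil spatters against the side of the wok, popping and sizzling 'tsss-tsss-tsss'; on the cutting board beside it, a cleaver falls rapidly, tapping out a dense 'tak-tak-tak-tak' rhythm. At the back, the dishwasher groans low while spewing heavy bursts of water. A cook in a grease-stained apron is tossing fried rice, the grains hitting the wok with a dry, rustling 'shh-la.' Next to him, a young server carries a stack of ceramic bowls that clink nervously against one another. The server glances toward the dining room and says, 'Table seven is asking if the kitchen is still open.' The cook pauses for half a second, then laughs without looking up: 'Tell table seven the kitchen is closed, unless they're here to wash dishes.' The metal order printer suddenly starts buzzing 'zzt-zzt-zzt' and spits out another new order. The cook tears off the slip, squints at it, and groans: 'Three chow mein, two dumpling soups, one extra-spicy beef.' The server laughs helplessly: 'So... is the kitchen still closed?' The cook tosses minced garlic into the wok, which bursts into a sharp, fragrant sizzle. 'Emotionally, it's closed,' he says. 'Technically, not yet.' Just then, the old range hood above the stove starts coughing violently. The motor stutters with an 'urgh-urgh... clack,' then suddenly surges into a deep metallic roar. A loose tray on a shelf is shaken, inching toward the edge. The server reaches for it, but it's too late. The tray falls and crashes into a pile of pots, and the whole kitchen erupts in a catastrophic 'CLANG, BANG, clatter.' The cook freezes, then points his spatula at the server and says, 'Now you're definitely washing dishes.'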
Scene ZH 3
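In a small mountain temple during a rainstorm, rainwater pours in streams from the curved roof tiles and pounds the stone steps with a continuous 'whoosh.' The bamboo grove around the courtyard bends under the wind and rain, its leaves rubbing together in waves of restless rustling. Inside the main hall, incense burns slowly before an ancient statue, its glowing tip occasionally crackling softly. A bronze bell hanging near the entrance sways slightly whenever the wind slips through the gap in the door, giving a low, soft 'dong.' A young traveler kneels by the door, wringing out his soaked sleeves. Drops fall onto the wooden floor with a soft 'pat, pat, pat.' An old monk sits beside a small brazier, stirring the charcoal with iron tongs. The coals shift with a dry 'crack,' and orange sparks briefly rise into the damp air. Shivering, the traveler says, 'I thought the road down the mountain wasn't closed yet.' The monk pours hot tea into a cracked clay cup and replies, 'When the mountain wants to close, it closes by itself.' Thunder rolls behind the clouds. The temple lanterns sway gently, making the statue's shadow drift slowly across the wall. The traveler takes the cup with both hands, the ceramic rim touching his cold fingers with a soft 'tap.' He asks, 'Do people often get lost here?' The monk looks out at the rain-darkened courtyard and says, 'Only those who keep walking after they hear the bell.' Just then, the bronze bell at the entrance rings once on its own. 'Dong.' The monk's hand stops in midair above the brazier. Outside, through the curtain of rain, footsteps come up the stone path: slow, wet, and deliberate. 'Splash... step... splash... step...' The traveler turns toward the door. A sudden gust sweeps through the hall and, with a 'puff,' blows out two candles at once. The bell rings again, louder this time, and from deep within the wind and rain, something knocks three times on the temple gate.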
Model Comparisons
Comparison of our system with baseline models.
Speech
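| Prompt | Ming-Omni-TTS | Ours |
|---|---|---|
| "speaker_1: So if you want to succeed, I'd recommend reading these books. speaker_2: I'll take a look when I have time. speaker_1: If it were me reading them, I'd find them really. speaker_2: Boring. speaker_1: Right, boring and dull. After all, it's classical Chinese, you can't understand it unless those." | | |
| "speaker_1: Knowing what parents are thinking, and letting parents know what kids are thinking too. speaker_2: Right. speaker_1: It reduces conflict. speaker_2: Right. I feel that making these movies or TV shows is actually pretty good; it helps both sides understand each other better. I think if a parent and a child watch a TV series together, they'll get a lot out of it. speaker_1: So do you have any other good movies to recommend to me?" | | |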
Text-to-Audio
| Prompt | Stable Audio | Ours |
|---|---|---|
| "Immersive cyberpunk city alleyway at night: persistent heavy rain hitting metal dumpsters and sizzling on hot neon signs, a sputtering electrical zap from a broken streetlamp, a low-flying hover vehicle roaring past overhead with a distinct Doppler effect, and a distant, muffled synth-bass thrumming from an underground club." | | |
| "Complex industrial soundscape inside a steampunk engine room: a dominant, rhythmic heavy piston thudding like a heartbeat, intermittent high-pressure steam releases hissing sharply, metallic gears grinding and clicking, the continuous deep roaring rumble of a massive furnace, and a clanging brass bell chiming twice at the very end." | | |
Audio Scene
| Prompt | Any2Speech | WavJourney | Ours |
|---|---|---|---|
| An energetic electronic house track with crisp hi-hats, a driving snare/clap, and bright synth layers kicks in immediately, establishing a lively, modern atmosphere. Over this steady musical bed, an enthusiastic male narrator with a clear British RP accent launches straight into the story, saying, "to seven chassis, the Dutch company is famous for producing insane roadsters with ludicrous power and barely any weight." The music maintains a consistent volume underneath his voice as he continues, "The D8 GTO uses a two and a half liter five cylinder TFSI engine from Audi, specially tuned to achieve three hundred and forty to three hundred and eighty horsepower." He explains further without pause, "This standard engine is used all over Audi's model range. Due to the fact it is relatively small and turbocharged, it is easily tunable, and this is exactly why Donkervoort uses it." The narration remains dry and centered, with no ambient car noises or reverb, keeping the focus entirely on the voice and the rhythmic background track. After a brief breath marked by a minimalist visual cue, the narrator shifts tone slightly to address the audience directly: "So there you are. Can you think of any other hero cars with humble engines? Let me know in the comments below and don't forget to like or dislike the video." As soon as he finishes the word "video," the music cuts out abruptly, replaced by a single, sharp cinematic sound of a car door slamming shut: a solid "thunk" that serves as a definitive full stop to the segment. | | | |
| A quiet outdoor ambience hums with a faint, low-level electronic hiss as the scene opens on the water. The sharp, rhythmic clicking of a fishing reel cuts through the stillness, accompanied by the soft splash of water being disturbed. A man's voice, close and breathy with excitement, says, "There he is," followed quickly by, "Got him that time." Wet thuds and gentle splashing continue as the fish struggles near the surface, the reel's mechanical whir persisting underneath. The man speaks softly to his catch, "Come here buddy," his tone warm and satisfied. As the action settles, the reel falls silent and the water calms, and enthusiastic music with electric guitar, bass, and drums comes in. The man shifts into a concluding tone, announcing, "Well, that is gonna wrap it up for me today." A distinct plastic clack sounds as he handles his gear, perhaps closing a tackle box or setting down equipment. Without pause, he transitions into a promotional message: "Be sure to check out Two More Cast Tackle Box, top link in the video description." His voice remains clear and steady over the quiet background hiss. He adds, "Bobber fishing is a fun tactic this time of year," then observes, "It's another female, white belly, a little bit bigger belly," describing the fish he holds. Just as he begins another sentence ("Well that is gonna…"), the audio cuts off abruptly, leaving only the faint ambient hiss for a split second before silence. | | | |
| A steady, retro electronic loop kicks in immediately, driven by crisp hi-hats ticking in eighth notes and a snare snapping on beats two and four, all wrapped in a bright major-key harmony that plays without interruption. Over this energetic bed, a young adult male voice speaks rapidly and clearly, his tone neutral and informative, saying, "This is the rarest object in the back rooms, and it's also the most powerful. It's object number 66, or the Leviathan's Tooth, as it's been nicknamed. The object was originally a full clay tablet that was 29 by 29 inches in size. Over the years it's been broken and smashed more, so it's not that big." The music maintains its constant volume and rhythm underneath, never dipping or swelling, as he continues, "It was discovered in 2003, and it's said that any person who touches this object will instantly gain the power it possesses, and all the knowledge that it's said to have. As I just said, the tablet's been broken so it's less powerful now, but its original purpose was apparently to transport people instantly from the back rooms to reality using the law of equivalent exchange, meaning a soul..." The sentence cuts off abruptly mid-word, and the music stops dead at the exact same moment, plunging the scene into total silence with no trailing reverb or fade. | | | |
| At the start, an old gymnasium is filled with the strong, echoing thump, thump, thump of basketballs hitting the floor, along with the sharp squeak of sneakers on the court. A group of teenagers' shouts overlap: "Here!" "Pass!" In the background, the low hum of the ventilation system is ever-present. A coach blows his whistle; the shrill blast cuts through the air, and everyone instantly falls silent, leaving only the sound of a rolling basketball and heavy breathing. The coach shouts, "Time out! Look at your defense!" As he speaks, he claps his hands hard with a crisp smack. The players gather around, their footsteps scattered, followed by the sound of water bottles being unscrewed and of drinking. He continues, "Three minutes left in the second quarter, stay focused!" Before he finishes, a stirring rock intro starts playing over the PA, the electric guitar suddenly blaring to pump up the crowd. But the coach signals to cut the music with a downward gesture, and the music stops abruptly, leaving only a faint lingering echo. He points at the court and says, "Stay on number 9." The players spread out, and the sneaker squeaks grow dense again. One player dribbles, the rhythm of the ball hitting the floor speeding up, then takes a jump shot; his sneakers land with a dull thud, the ball whooshes through the air, clangs off the rim, bounces away, and lands far off on the floor, bouncing several times. Scattered applause and whistles come from the stands. The coach yells, "Rebound!!" Someone on the sideline presses the timer, which beeps crisply. After a short pause, the referee's whistle sounds again and play resumes. Sneakers, shouts, and dribbling weave back together into a chaotic, energetic wall of sound, until an electronic buzzer sounds over the PA and the sharp end-of-first-quarter signal echoes through the whole arena. Everyone slows down, and the breathing and footsteps gradually fade. | | | |
| From the very start of the program, a soft, slightly retro piano pad slowly seeps into the space, creating a late-night radio atmosphere that lasts throughout. The host's voice is steady and rich. He begins slowly: "Welcome to Late-Night Echoes. I'm your old friend, Lin Shen." After he speaks, he gently turns a sheet of paper, producing a delicate rustle, as if sorting through a stack of old letters. He then reads, "Tonight's first letter comes from a listener who signs as 'Glimmer'..." As he reads, his voice carries a barely perceptible catch, while the piano in the background keeps circling low, spreading through the whole room like moonlight. When he says, "She says the last time she saw him was on a rainy night three years ago," a distant roll of thunder comes from outside the window right on cue, dull and long, as if annotating the line. During a short pause, he takes a sip of tea, followed by a clear swallow. He continues, "So she learned to walk in the rain, looking for his shadow anywhere she could hear the sound of water." As soon as he finishes, a fine rain sweeps across the whole space from far to near, lasting about seven or eight seconds and gradually blending into the piano. He sets down the cup and taps the table twice with his fingertips, making two soft knocks, then shifts tone: "But time heals all wounds in the end, doesn't it? Yesterday, she sent another letter saying she has moved to the seaside." As he reads the new letter, the soft wash of ocean waves faintly joins the background, gentle and soothing, as if drawing closer from far away. He reads, "She says the wind here is salty, but free." Before he finishes, the piano is suddenly interrupted by a burst of signal-interference noise; after a brief, harsh moment, everything calms down again. The host gives a light laugh and says apologetically, "Sorry, the recording equipment is getting a bit old." He then stays silent for two seconds, leaving only the rain and faint waves intertwined in the background. He speaks slowly again, his voice so low it sounds as if he is talking to himself: "Well then, this is where we say goodbye for tonight's program." The instant after he says the words "Good night," every sound (piano, rain, waves, and signal noise) cuts off at once, leaving complete silence. | | | |
| Torrential rain pounds a tin roof with a dense pattering, mixed with occasional low growls of thunder. A reporter stands under the eaves, his voice cut in and out by the rain: "We're on the outskirts of a village surrounded by floodwater, and the water level is still rising." Before he finishes, the loud crack of a snapping tree trunk comes from the distance, followed by a rush of water. A villager shouts anxiously, "Hurry! The levee over there is about to give way!" Immediately, the heavy footsteps of people running blend into the curtain of rain, and someone blows a whistle. The reporter raises his voice: "Everyone is organizing an evacuation, but the roads are already a mess of mud." Just then, a gust of wind sweeps through, the rain instantly intensifies, and the metal sheets on the roof bang and clatter in the wind. Wind and rain merge into one, and the human voices within them sound small yet resilient. Breathless, the reporter says, "All we can do is wait for the rescue team..." His words are cut off by a huge clap of thunder, and the audio ends abruptly at that moment. | | | |