c# - How to find the audio format of the selected voice of the SpeechSynthesizer

Published: 2020-12-15 18:01:31 | Category: Encyclopedia | Source: compiled from the web

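In a text-to-speech application written in C#, I use the SpeechSynthesizer class. It has an event named SpeakProgress that fires for every spoken word. For some voices, however, the parameter e.AudioPosition is not synchronized with the output audio stream, and the output wave file plays faster than this position indicates (see this related question).

In any case, I am trying to find exact information about the bit rate and other properties of the selected voice. In my experience, if I can initialize the wave file with this information, the synchronization problem goes away. However, when SupportedAudioFormats does not contain this information, I know of no other way to find it. For example, the "Microsoft David Desktop" voice provides no supported formats in its VoiceInfo, yet it appears to support a PCM 16000 Hz, 16-bit format.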

var formats = CurVoice.VoiceInfo.SupportedAudioFormats;

if (formats.Count > 0)
{
    // The voice advertises at least one format, so use the first one.
    var format = formats[0];
    reader.SetOutputToWaveFile(CurAudioFile, format);
}
else
{
    var format = // How can I find it if the voice hasn't provided it?
    reader.SetOutputToWaveFile(CurAudioFile, format);
}

Solution

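UPDATE: This answer has been edited following investigation. Initially I suggested from memory that SupportedAudioFormats probably just reflects (possibly misconfigured) registry data. Investigation has shown that, for me on Windows 7, this is definitely the case, and anecdotal reports back it up on Windows 8.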

The problem with SupportedAudioFormats

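System.Speech wraps the venerable COM speech API (SAPI). Some voices are registered as 32-bit and others as 64-bit, and they can be misconfigured (in a 64-bit machine's registry, compare HKLM/Software/Microsoft/Speech/Voices with HKLM/Software/Wow6432Node/Microsoft/Speech/Voices).

I pointed ILSpy at System.Speech and its VoiceInfo class, and I am fairly convinced that SupportedAudioFormats comes solely from registry data. Enumerating SupportedAudioFormats can therefore return zero results if your TTS engine is not properly registered for your application's platform target (x86, Any CPU or 64-bit), or if the vendor simply does not provide this information in the registry.

A voice may still support different, additional or fewer formats than it advertises, because that is up to the speech engine (code) rather than the registry (data). So relying on the advertised list can be a shot in the dark. Standard Windows voices are often more consistent in this respect than third-party voices, but even they do not necessarily provide a useful SupportedAudioFormats.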
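To see what each registry view actually advertises, something like the following sketch can list the voice tokens under both views. It assumes the tokens sit under a Tokens subkey of the paths mentioned above and that the advertised formats are stored in a string value named AudioFormats under each token's Attributes key; the class and method names are made up for illustration. On .NET Core or later, Microsoft.Win32.Registry may need to be referenced as a package.

```csharp
using System;
using Microsoft.Win32;

static class VoiceRegistryDump
{
    // Assumed token location; the 32-bit view of this path is what shows up
    // under Wow6432Node on 64-bit Windows.
    const string TokensPath = @"SOFTWARE\Microsoft\Speech\Voices\Tokens";

    static void Dump(RegistryView view)
    {
        using (var hklm = RegistryKey.OpenBaseKey(RegistryHive.LocalMachine, view))
        using (var tokens = hklm.OpenSubKey(TokensPath))
        {
            Console.WriteLine("[" + view + "]");
            if (tokens == null)
            {
                Console.WriteLine("  (no voice tokens found)");
                return;
            }

            foreach (var name in tokens.GetSubKeyNames())
            {
                using (var attrs = tokens.OpenSubKey(name + @"\Attributes"))
                {
                    // Assumption: formats are advertised in an "AudioFormats" value.
                    var formats = attrs == null ? null : attrs.GetValue("AudioFormats") as string;
                    Console.WriteLine("  " + name + ": AudioFormats = " + (formats ?? "(missing)"));
                }
            }
        }
    }

    static void Main()
    {
        Dump(RegistryView.Registry64);
        Dump(RegistryView.Registry32);
    }
}
```

A voice that appears in only one view, or whose AudioFormats value is missing, would be a plausible explanation for an empty SupportedAudioFormats list in a process of the other bitness.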

Finding this information the hard way

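I have found that it is still possible to get the current format of the current voice, but doing so relies on reflection to access the internals of the System.Speech SAPI wrappers.

Consequently, this is quite fragile code, and I would not recommend using it in production.

NOTE: The code below requires Speak() to have been called once so that the voice is set up; further calls into the internals would be needed to force that setup without Speak(). However, calling Speak("") to say nothing works just fine.

Implementation: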

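using System;
using System.Reflection;
using System.Runtime.InteropServices;
using System.Speech.Synthesis;

// Mirrors the native WAVEFORMATEX. The native struct is packed to 18 bytes,
// so Pack = 1 keeps the marshaller from reading past a buffer of that size.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct WAVEFORMATEX
{
    public ushort wFormatTag;
    public ushort nChannels;
    public uint nSamplesPerSec;
    public uint nAvgBytesPerSec;
    public ushort nBlockAlign;
    public ushort wBitsPerSample;
    public ushort cbSize;
}

// Relies on non-public members of System.Speech; these names may change between versions.
WAVEFORMATEX GetCurrentWaveFormat(SpeechSynthesizer synthesizer)
{
    // Non-public property on SpeechSynthesizer that exposes the internal synthesis object.
    var voiceSynthesis = synthesizer.GetType()
                                    .GetProperty("VoiceSynthesizer", BindingFlags.Instance | BindingFlags.NonPublic)
                                    .GetValue(synthesizer, null);

    // Non-public CurrentVoice(bool) method returning the internal voice wrapper.
    var ttsVoice = voiceSynthesis.GetType()
                                 .GetMethod("CurrentVoice", BindingFlags.Instance | BindingFlags.NonPublic)
                                 .Invoke(voiceSynthesis, new object[] { false });

    // Raw WAVEFORMATEX bytes cached by the voice wrapper.
    var waveFormat = (byte[])ttsVoice.GetType()
                                     .GetField("_waveFormat", BindingFlags.Instance | BindingFlags.NonPublic)
                                     .GetValue(ttsVoice);

    // Pin the array so the marshaller can read it, and always release the handle.
    var pin = GCHandle.Alloc(waveFormat, GCHandleType.Pinned);
    try
    {
        return (WAVEFORMATEX)Marshal.PtrToStructure(pin.AddrOfPinnedObject(), typeof(WAVEFORMATEX));
    }
    finally
    {
        pin.Free();
    }
}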
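As a variation, the same fields can be read straight from the byte array without pinning. The following is only a sketch: ParsedWaveFormat and WaveFormatBytes are hypothetical names, and it assumes the buffer holds at least the first 16 bytes of a WAVEFORMATEX in little-endian order (the trailing cbSize field is ignored).

```csharp
using System;

// Hypothetical holder for the decoded fields; not part of System.Speech.
sealed class ParsedWaveFormat
{
    public ushort FormatTag;
    public ushort Channels;
    public uint SamplesPerSec;
    public uint AvgBytesPerSec;
    public ushort BlockAlign;
    public ushort BitsPerSample;
}

static class WaveFormatBytes
{
    // Decodes the fixed-offset WAVEFORMATEX fields from a raw buffer.
    public static ParsedWaveFormat Parse(byte[] data)
    {
        if (data == null)
            throw new ArgumentNullException(nameof(data));
        if (data.Length < 16)
            throw new ArgumentException("Expected at least 16 bytes of wave format data.", nameof(data));
        // BitConverter uses the machine's byte order; the native layout is little-endian.
        if (!BitConverter.IsLittleEndian)
            throw new NotSupportedException("This sketch assumes a little-endian machine.");

        return new ParsedWaveFormat
        {
            FormatTag = BitConverter.ToUInt16(data, 0),
            Channels = BitConverter.ToUInt16(data, 2),
            SamplesPerSec = BitConverter.ToUInt32(data, 4),
            AvgBytesPerSec = BitConverter.ToUInt32(data, 8),
            BlockAlign = BitConverter.ToUInt16(data, 12),
            BitsPerSample = BitConverter.ToUInt16(data, 14)
        };
    }
}
```

Inside GetCurrentWaveFormat, this would replace the GCHandle and Marshal calls with a single WaveFormatBytes.Parse(waveFormat) call, at the cost of spelling out the offsets by hand.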

Usage:

SpeechSynthesizer s = new SpeechSynthesizer();
s.Speak("Hello");
var format = GetCurrentWaveFormat(s);
Debug.WriteLine($"{s.Voice.SupportedAudioFormats.Count} formats are claimed as supported.");
Debug.WriteLine($"Actual format: {format.nChannels} channel {format.nSamplesPerSec} Hz {format.wBitsPerSample} audio");

To test it, I renamed Microsoft Anna's AudioFormats registry entry under HKLM/Software/Wow6432Node/Microsoft/Speech/Voices/Tokens/MS-Anna-1033-20-Dsk/Attributes, which causes SpeechSynthesizer.Voice.SupportedAudioFormats to have no elements when queried. This is the output in that situation:

0 formats are claimed as supported.
Actual format: 1 channel 16000 Hz 16 audio
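To close the loop with the question, the discovered values can be turned into a SpeechAudioFormatInfo and passed to SetOutputToWaveFile. This is a minimal sketch that assumes the voice produces plain PCM with 8 or 16 bits and one or two channels; WaveFileOutput, ToAudioFormat and the output file name are illustrative, and the hard-coded numbers are the ones from the test output above.

```csharp
using System;
using System.Speech.AudioFormat;
using System.Speech.Synthesis;

static class WaveFileOutput
{
    // Maps raw PCM parameters onto the enums that SpeechAudioFormatInfo expects.
    public static SpeechAudioFormatInfo ToAudioFormat(int samplesPerSec, int bitsPerSample, int channels)
    {
        AudioBitsPerSample bits;
        if (bitsPerSample == 8) bits = AudioBitsPerSample.Eight;
        else if (bitsPerSample == 16) bits = AudioBitsPerSample.Sixteen;
        else throw new NotSupportedException("Only 8-bit and 16-bit PCM are handled in this sketch.");

        AudioChannel channel;
        if (channels == 1) channel = AudioChannel.Mono;
        else if (channels == 2) channel = AudioChannel.Stereo;
        else throw new NotSupportedException("Only mono and stereo are handled in this sketch.");

        return new SpeechAudioFormatInfo(samplesPerSec, bits, channel);
    }

    static void Main()
    {
        using (var synth = new SpeechSynthesizer())
        {
            // In practice these values would come from GetCurrentWaveFormat(synth)
            // after a Speak("") call, rather than being hard-coded.
            var format = ToAudioFormat(16000, 16, 1);
            synth.SetOutputToWaveFile("output.wav", format);
            synth.Speak("Hello");
        }
    }
}
```

Whether this resolves the AudioPosition drift for a given voice is something to verify by ear; the question only reports that matching the wave file to the voice's real format helped in the asker's experience.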

(Editor: 李大同)
