Some VoIP phones (including some Cisco and Linksys models) incorrectly use the description "G729a/8000" in SDP. This is incorrect because G729a is only an alternative method of encoding the audio: it still generates data decodable by either G729 or G729a, so there is no difference in terms of codec negotiation. Since the SDP RFC allows static payload types to be overridden by the textual rtpmap description, this can cause problems when calling from these phones to endpoints that adhere to the RFC. Such endpoints will not recognise 'G729a' as 'G729' unless the codec is renamed in the phone's settings or a specific workaround for the bug is in place.
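The kind of workaround mentioned above can be sketched as a small alias step applied while parsing rtpmap lines. This is a minimal illustration, not the behaviour of any particular SIP stack: the names `ENCODING_ALIASES` and `parse_rtpmap` are hypothetical, and the alias table contains only the one misnaming discussed here.

```python
import re

# Hypothetical alias table: non-standard encoding names seen in the wild,
# mapped to the name the receiver actually knows. Only G729a is listed here.
ENCODING_ALIASES = {"G729A": "G729"}

# Matches "a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]".
RTPMAP_RE = re.compile(r"^a=rtpmap:(\d+)\s+([^/\s]+)/(\d+)(?:/(\d+))?\s*$")


def parse_rtpmap(line):
    """Return (payload_type, encoding_name, clock_rate), or None if the
    line is not an rtpmap attribute. Known misnamings are normalised."""
    match = RTPMAP_RE.match(line.strip())
    if match is None:
        return None
    payload_type, name, clock_rate, _channels = match.groups()
    # Compare names case-insensitively, then apply the alias table.
    canonical = name.upper()
    canonical = ENCODING_ALIASES.get(canonical, canonical)
    return int(payload_type), canonical, int(clock_rate)


if __name__ == "__main__":
    for sdp_line in ("a=rtpmap:18 G729a/8000", "a=rtpmap:18 G729/8000"):
        print(sdp_line, "->", parse_rtpmap(sdp_line))
```

With this step in place, both the incorrect "G729a/8000" and the correct "G729/8000" resolve to the same codec before negotiation, so the textual description no longer overrides static payload type 18 with an unknown name.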

Neural vocoders, such as WaveGlow, have become an important component in recent high-quality text-to-speech (TTS) systems. In this paper, we propose Efficient WaveGlow (EWG), a flow-based generative model serving as an efficient neural vocoder. Similar to WaveGlow, EWG has a normalizing flow backbone where each flow step consists of an affine coupling layer and an invertible 1×1 convolution. To reduce the number of model parameters and enhance the speed without sacrificing the quality of the synthesized speech, EWG improves WaveGlow in three aspects. First, the WaveNet-style transform network in WaveGlow is replaced with an FFTNet-style dilated convolution network. Next, to reduce the computation cost, group convolution is applied to both audio and local condition features. Finally, the local condition is shared among the transform network layers in each coupling layer. As a result, EWG reduces both the number of floating-point operations (FLOPs) required to generate one second of audio and the number of model parameters by a factor of more than 12. Experimental results show that EWG reduces real-world inference time by more than a factor of two, without any obvious reduction in speech quality.
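To make the affine coupling layer mentioned above concrete, here is a minimal sketch in plain Python. It is not the EWG or WaveGlow implementation: `toy_transform` is a hypothetical stand-in for the transform network, the local condition (e.g. mel-spectrogram features) is omitted, and the input is assumed to have even length. The point is only that the layer is exactly invertible regardless of what the transform network computes.

```python
import math


def toy_transform(x_a):
    """Hypothetical stand-in for the transform network: maps the first
    half of the input to a per-element log-scale and shift."""
    mean = sum(x_a) / len(x_a)
    log_s = [0.5 * math.tanh(v) for v in x_a]
    t = [mean + 0.1 * v for v in x_a]
    return log_s, t


def coupling_forward(x):
    """Affine coupling: keep x_a, transform x_b as x_b * exp(log_s) + t.
    Returns the output and the log-determinant of the Jacobian."""
    half = len(x) // 2
    x_a, x_b = x[:half], x[half:]
    log_s, t = toy_transform(x_a)
    y_b = [b * math.exp(ls) + sh for b, ls, sh in zip(x_b, log_s, t)]
    return x_a + y_b, sum(log_s)


def coupling_inverse(y):
    """Inverse pass: x_a is unchanged, so log_s and t can be recomputed
    from it and the affine map undone."""
    half = len(y) // 2
    y_a, y_b = y[:half], y[half:]
    log_s, t = toy_transform(y_a)
    x_b = [(b - sh) * math.exp(-ls) for b, ls, sh in zip(y_b, log_s, t)]
    return y_a + x_b


if __name__ == "__main__":
    x = [0.3, -1.2, 0.8, 2.0]
    y, log_det = coupling_forward(x)
    print("recovered:", coupling_inverse(y))
```

Because only the second half of the input is modified, the Jacobian is triangular and its log-determinant is simply the sum of the log-scales, which is what makes this layer cheap to use in a normalizing flow.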


