
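Claim the latest Alibaba Cloud product promotions and coupons. Alibaba Cloud Document Intelligence is a multimodal document recognition and understanding engine built on years of accumulated technology. It provides text extraction and document processing for all kinds of documents, and supports diverse document processing needs in general, industry-specific, and custom scenarios.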
To jointly optimize the waveform-domain adversarial loss, we employ a multi-period discriminator (MPD) [20-21] and a multi-scale discriminator (MSD) [20-21] to examine the speech signal from two different perspectives. The MSD method is derived from the MelGAN vocoder. Through average pooling, the length of the speech sequence is halved successively. Convolution is then performed on the speech signals at the different scales. Finally, the result is flattened and output. The MPD method folds the single-channel audio sequence into two-channel audio with different fixed lengths, called periods, and then applies 2-D convolution to the folded data. The disadvantage of this approach is that the folded data on each channel is mixed with artifacts of different frequencies. To make up for this defect, we propose a multi-length discriminator (MLD) to improve the ability to discriminate synthetic audio from real audio as much as possible. First, the single-channel audio is folded into multi-channel audio by wavelet transform [20]. Then 1-D dilated convolution is applied as in [22]. In this way, each channel in the folded data contains few or no artifacts of other frequencies, ensuring the stability and accuracy of the discrimination.
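The two input transformations described above, successive halving for the MSD and folding by period for the MPD, can be illustrated with a minimal pure-Python sketch. The function names, the pooling window of 2, the default of 3 scales, and the zero padding at the end of the signal are assumptions made for illustration; the text does not specify them, and this is not the authors' implementation. The wavelet-based folding of the MLD is not covered here.

```python
def avg_pool_halve(signal):
    """Average non-overlapping pairs of samples, halving the sequence length.

    Assumption: window 2, stride 2. A trailing odd sample is dropped.
    """
    return [(signal[i] + signal[i + 1]) / 2 for i in range(0, len(signal) - 1, 2)]


def multi_scale_views(signal, num_scales=3):
    """Return the signal at successively halved resolutions (MSD-style inputs).

    num_scales is a hypothetical default, not a value from the text.
    """
    views = [list(signal)]
    for _ in range(num_scales - 1):
        views.append(avg_pool_halve(views[-1]))
    return views


def fold_by_period(signal, period):
    """Fold a 1-D sequence into rows of length `period` (MPD-style input).

    Assumption: the tail is zero-padded to a multiple of `period`.
    """
    padded = list(signal) + [0.0] * (-len(signal) % period)
    return [padded[i:i + period] for i in range(0, len(padded), period)]


if __name__ == "__main__":
    wave = [float(n) for n in range(8)]
    print([len(v) for v in multi_scale_views(wave)])
    print(fold_by_period(wave, 3))
```

Each row returned by `fold_by_period` holds one period of consecutive samples, so samples that are `period` steps apart line up in the same column, which is the structure a 2-D convolution over the folded data would see.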
The generator of PLCNet is a symmetric encoder-decoder structure with skip connections and residual units. The encoder and decoder each have 4 sub-modules. Each sub-module of the encoder consists of 3 residual units and a down-sampling module, and each sub-module of the decoder consists of an up-sampling module and 3 residual units. The residual unit alternates between 1-D dilated convolution with a kernel size of 7 and 1-D convolution with a kernel size of 1. The dilation rate is gradually increased through (1, 3, 9). The input is first transformed by a 1-D convolution with a kernel size of 7, then the encoder maps the 16 kHz waveform to a 50 Hz representation through down-sampling blocks with factors (2, 4, 5, 8), implemented as strided convolutions. The decoder uses transposed convolution to up-sample in reverse order, restoring the features to the same dimension as the speech. The number of channels is doubled when down-sampling and halved when up-sampling. The middle bottleneck layer acts as a bridge between the encoder and decoder and consists of three 1-D convolutions with a kernel size of 7. Skip connections are used between the corresponding layers of the encoder and decoder to allow information such as phase or alignment to pass through. We use the ELU activation function [19] and weight normalization in the generator to guarantee the stability of adversarial training. Finally, the output of the decoder is a mono signal, with tanh limiting the output range to [-1, 1]. To be able to process real-time audio streams on low-power mobile devices, all our convolutions are causal.
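The numbers in this description can be checked with a short sketch: the product of the down-sampling factors (2, 4, 5, 8) is 320, which takes 16 kHz to 50 Hz, and a causal dilated convolution needs (kernel size - 1) x dilation samples of left padding. The code below is a pure-Python illustration under stated assumptions, not the PLCNet implementation; `BASE_CHANNELS` and all function names are hypothetical, since the text does not give the initial channel count.

```python
from math import prod

SAMPLE_RATE_HZ = 16_000
STRIDES = (2, 4, 5, 8)   # down-sampling factors from the text
DILATIONS = (1, 3, 9)    # dilation rates of the residual units
KERNEL_SIZE = 7
BASE_CHANNELS = 32       # hypothetical: not stated in the text


def frame_rate(sample_rate, strides):
    """Rate of the bottleneck representation after all strided convolutions."""
    return sample_rate / prod(strides)


def encoder_channels(base, strides):
    """Channel count after each down-sampling step (doubled every time)."""
    return [base * 2 ** (i + 1) for i in range(len(strides))]


def causal_left_padding(kernel_size, dilation):
    """Zeros to prepend so a dilated convolution never reads future samples."""
    return (kernel_size - 1) * dilation


def causal_dilated_conv1d(signal, kernel, dilation=1):
    """Single-channel causal dilated convolution (cross-correlation form).

    Output sample t depends only on input samples at positions <= t.
    """
    pad = causal_left_padding(len(kernel), dilation)
    padded = [0.0] * pad + list(signal)
    return [
        sum(w * padded[t + j * dilation] for j, w in enumerate(kernel))
        for t in range(len(signal))
    ]


if __name__ == "__main__":
    print(frame_rate(SAMPLE_RATE_HZ, STRIDES))
    print(encoder_channels(BASE_CHANNELS, STRIDES))
    print([causal_left_padding(KERNEL_SIZE, d) for d in DILATIONS])
```

Padding only on the left is what makes the layer usable for streaming: changing a later input sample leaves every earlier output sample unchanged.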

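Special notice regarding Document Intelligence
The Document Intelligence promotions, discount information, coupons, promo codes, free-trial links, and similar content provided by this site's Alibaba Cloud navigation all come from official public Alibaba Cloud information and public channels. The accuracy of discount amounts is not guaranteed; the discount amount displayed in real time by Alibaba Cloud prevails. In addition, any promotional links that users visit through this site, promotions they take part in, or purchases they make constitute an independent relationship between the user and Alibaba Cloud, and this site assumes no liability.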
Related navigation

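Industrial Brain is Alibaba Cloud's integrated data intelligence product, combining big data and AI technology to unlock the value of industrial data. By collecting and analyzing data across the full production chain and using intelligent algorithms to optimize production processes and process parameters, it helps you cut costs, raise production efficiency, and accelerate the move to smart manufacturing.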

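Intelligent Outbound Call Robot (New)
The Alibaba Cloud Intelligent Outbound Call Robot automatically launches outbound phone call tasks based on business scenarios. It supports flexible canvas configuration, accurate speech recognition, natural-sounding speech synthesis, and a rich set of open APIs. Through voice dialogue between people and the robot, it collects business results, aggregates and processes the data, and gathers user feedback, helping customers make outbound calling intelligent with little effort. It can reduce the labor cost of call-center outbound calling and improve the efficiency of information screening and feedback.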

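Intelligent Recommendation AIRec (New)
Intelligent Recommendation AIRec is Alibaba Cloud's official personalized recommendation service. Built on Alibaba's big data and AI technology, it provides a rich set of ready-to-use models and industry templates, covering business needs from rapid launch to deep customization, and improves user retention and conversion to drive business growth.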

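Tongyi Large Model (New)
The Tongyi large model is a high-performance, low-cost AI infrastructure built by Alibaba Cloud. Drawing on extensive training data and optimization techniques, it supports efficient and accurate model service calls across all modalities and rapid AI application building, as well as efficient model training. With strong text generation and understanding capabilities, it serves a wide range of fields, offers users cost-effective intelligent solutions, and redefines the standard for developing and deploying AI applications.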

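Intelligent Open Search OpenSearch (New)
Intelligent Open Search OpenSearch is a one-stop, AI-driven platform for developing intelligent search services. It combines RAG, vector retrieval, and industry algorithms, supports conversational and multimodal intelligent search, and helps you quickly build enterprise-grade search services that improve user experience and business conversion.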

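Alibaba Cloud Bailian (New)
The large model service platform Bailian is an enterprise-grade platform for developing large model applications. It provides a wide range of Tongyi large models and flexible intelligent application development tools in one place, helping enterprises put AI applications into production quickly and accelerate business innovation.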

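Cloud Contact Center (New)
Cloud Contact Center is a one-stop intelligent customer service platform built by Alibaba Cloud for enterprises. It integrates all channels, including phone, website, and app, and uses AI capabilities to upgrade digital employees and intelligent assistance, improving service efficiency and customer satisfaction and helping you accelerate business innovation.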

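Wuying Agent Development Kit AgentBay (New)
Wuying AgentBay is an all-scenario AI Agent execution platform from Alibaba Cloud. It provides four environments (browser, cloud computer, code space, and cloud phone), offers elastic scaling within seconds and operations at the scale of thousands of concurrent sessions, and integrates an enterprise-grade secure container solution, supporting the efficient operation of agents in scenarios such as deep research and financial analysis.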
