The working principle and classification of speech recognition chips

A speech recognition chip, also called a speech recognition IC, differs from a traditional speech chip mainly in its ability to recognize speech: it lets a machine understand human speech and perform actions according to spoken commands, such as blinking or moving its mouth in an intelligent doll. In addition, speech recognition chips also offer high-quality, high-compression-rate recording and playback, so they can support man-machine dialogue. The technologies involved in speech recognition chips include signal processing, pattern recognition, probability theory and information theory, the mechanisms of speech production and hearing, and artificial intelligence.

1. Principle of speech recognition chips

The embedded speech recognition system in a speech recognition chip works on the principle of pattern matching. The input speech signal is first preprocessed, which includes anti-aliasing filtering, sampling, and speech enhancement. Next comes feature extraction, which derives one or more groups of parameters that characterize the speech signal from its waveform. The extracted features are then used in two stages. The first is the "learning" or "training" stage, whose task is to build a library of reference patterns (templates). The second is the "recognition" or "test" stage: the feature parameters of the speech under test are compared with each template in the library, a distortion measure between them is computed according to a chosen criterion, and the best-matching template (the one with the smallest distortion) gives the recognition result.

2. Classification of speech recognition chips

By the restriction they place on who may use them, speech recognition chips can be divided into speaker-dependent and speaker-independent chips.

1, Speaker-dependent speech recognition chip: A speaker-dependent chip recognizes the speech of a designated person and does not recognize anyone else. The user's voice is first stored as reference samples that serve as the comparison database, so speaker-dependent recognition requires voice training before use. Generally, recording each command twice as prompted by the machine is enough for the chip to be usable.

2, Speaker-independent speech recognition chip: Speaker-independent recognition is not tied to a specific person; anyone can use it regardless of age or sex, as long as they speak the same language. In the usual application flow, the dozen or so voice interaction commands to be recognized are defined before the product is built, and about 200 voice samples of them are collected. These samples are processed by algorithms on a PC to produce the interactive voice models and the feature entry database, which are then burned into the chip. Machines that use these chips (smart dolls, Tamagotchi-style electronic pets, children's computers) can then interact with their users.
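
The training and recognition stages described in the principle section can be sketched with dynamic time warping (DTW), a classic distortion measure for template matching of isolated words. The source does not say which measure any particular chip uses, so treat this as a minimal sketch under assumptions: each utterance is already a sequence of feature vectors (feature extraction is not shown), the frame distance is Euclidean, and the names dtw_distance, TemplateLibrary, enroll, and recognize are hypothetical. Enrolling two repetitions per command mirrors the speaker-dependent training above; a speaker-independent product would typically use statistical models trained offline instead, which this sketch does not attempt.

```python
import math
from typing import Dict, List, Sequence

Frame = Sequence[float]       # one feature vector (assumed already extracted)
Utterance = Sequence[Frame]   # one spoken word as a sequence of frames


def frame_distance(a: Frame, b: Frame) -> float:
    """Euclidean distance between two feature vectors (assumed local cost)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(x: Utterance, y: Utterance) -> float:
    """Accumulated DTW distortion between two utterances.

    The frames are aligned non-linearly in time, so the same word spoken
    faster or slower can still match its template closely.
    """
    n, m = len(x), len(y)
    inf = float("inf")
    # cost[i][j] = best accumulated cost of aligning x[:i] with y[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(x[i - 1], y[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # only x advances
                                 cost[i][j - 1],      # only y advances
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


class TemplateLibrary:
    """Hypothetical reference-pattern library for a few voice commands."""

    def __init__(self) -> None:
        self.templates: Dict[str, List[Utterance]] = {}

    def enroll(self, word: str, utterance: Utterance) -> None:
        # Training stage: store each recorded repetition as a template.
        self.templates.setdefault(word, []).append(utterance)

    def recognize(self, utterance: Utterance) -> str:
        # Recognition stage: the word whose template has the smallest
        # DTW distortion to the input is the recognition result.
        best_word, best_dist = "", float("inf")
        for word, refs in self.templates.items():
            for ref in refs:
                dist = dtw_distance(utterance, ref)
                if dist < best_dist:
                    best_word, best_dist = word, dist
        return best_word


if __name__ == "__main__":
    lib = TemplateLibrary()
    # Two enrollment repetitions per command, using toy 1-D "features".
    lib.enroll("on", [[1.0], [3.0], [1.0]])
    lib.enroll("on", [[1.0], [3.0], [3.0], [1.0]])
    lib.enroll("off", [[5.0], [0.0], [5.0]])
    lib.enroll("off", [[5.0], [5.0], [0.0], [5.0]])
    # A slower rendition of "on" still matches: prints "on"
    print(lib.recognize([[1.0], [1.0], [3.0], [1.0]]))
```

Searching every stored template is the simplest design and suits a vocabulary of a few dozen commands; it grows linearly with the number of templates.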

Some speaker-independent applications use phoneme-based algorithms. In this mode, recognition works without collecting voice samples from many people, but the drawbacks are a lower recognition rate and less stable recognition performance.
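
To show why a phoneme-based approach needs no per-command voice samples, here is a minimal sketch under stated assumptions: a hypothetical upstream phoneme recognizer (not shown) has already turned the audio into a phoneme sequence, each command is defined only by its phoneme spelling in a small hypothetical lexicon, and the entry with the smallest Levenshtein (edit) distance is chosen. Any errors made by the phoneme recognizer flow straight into this match, which is consistent with the lower and less stable recognition rate noted above.

```python
from typing import Dict, List, Sequence


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        cur = [i]
        for j, pb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete pa
                           cur[j - 1] + 1,             # insert pb
                           prev[j - 1] + (pa != pb)))  # substitute (0 if equal)
        prev = cur
    return prev[-1]


# Hypothetical command lexicon: adding a command only needs its phonemes,
# not recordings from many speakers.
LEXICON: Dict[str, List[str]] = {
    "hello": ["HH", "AH", "L", "OW"],
    "sing": ["S", "IH", "NG"],
    "sleep": ["S", "L", "IY", "P"],
}


def match_command(phonemes: Sequence[str],
                  lexicon: Dict[str, List[str]] = LEXICON) -> str:
    """Return the lexicon word whose phoneme spelling is closest to the input."""
    return min(lexicon, key=lambda word: edit_distance(phonemes, lexicon[word]))


if __name__ == "__main__":
    # One misrecognized phoneme ("EH" instead of "AH") still maps to "hello".
    print(match_command(["HH", "EH", "L", "OW"]))
```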

By the continuity of the speech they accept, speech recognition chips can be divided into discontinuous (isolated-word) and continuous speech recognition chips.

3, Discontinuous speech recognition chip: With discontinuous speech, each word is recognized separately, so the speaker must pause after every word.

4, Continuous speech recognition chip: Continuous speech recognition can handle natural, fluent speech, which makes the interaction more human-like. However, because sounds blend across the boundaries of connected words (coarticulation), good recognition accuracy is difficult to achieve.
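
The mandatory pause in discontinuous recognition is what lets a chip find where each word starts and ends before matching it. A common textbook way to do this is short-time energy endpoint detection; the sketch below assumes a list of PCM samples, fixed non-overlapping frames, and a hypothetical energy threshold and minimum pause length (min_gap, in frames), all chosen for illustration rather than taken from any particular chip.

```python
from typing import List, Optional, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Short-time energy (sum of squares) of consecutive non-overlapping frames."""
    return [sum(s * s for s in samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, frame_len)]


def segment_words(samples: Sequence[float], frame_len: int = 160,
                  threshold: float = 1.0,
                  min_gap: int = 3) -> List[Tuple[int, int]]:
    """Split isolated-word input into (start_frame, end_frame) word segments.

    A frame counts as speech when its energy exceeds `threshold`. A word is
    closed once `min_gap` consecutive silent frames follow its last speech
    frame; shorter pauses stay inside the word. End indices are exclusive.
    """
    segments: List[Tuple[int, int]] = []
    start: Optional[int] = None  # first frame of the word in progress
    last_speech = -1             # index of the most recent speech frame
    for idx, energy in enumerate(frame_energies(samples, frame_len)):
        if energy > threshold:
            if start is None:
                start = idx
            last_speech = idx
        elif start is not None and idx - last_speech >= min_gap:
            segments.append((start, last_speech + 1))
            start = None
    if start is not None:  # input ended while a word was still active
        segments.append((start, last_speech + 1))
    return segments


if __name__ == "__main__":
    # Toy signal: silence, word, pause, shorter louder word, silence.
    signal = [0.0] * 8 + [1.0] * 8 + [0.0] * 12 + [2.0] * 4 + [0.0] * 12
    print(segment_words(signal, frame_len=4))
```

Each returned segment would then be passed on to feature extraction and template matching; requiring the speaker to pause is what keeps this segmentation step this simple.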
