Car audio sound control system based on UniSpeech-SDA80D51

The development of modern electronic technology has enabled more and more in-vehicle electrical appliances to join the ranks of automotive electronics. While improving the performance of automobiles, it also increases the complexity of driving operations of automobiles, which brings unsafe hidden dangers to the driving process. With the improvement of the speech recognition algorithm and the advent of a new generation of dedicated speech processing chips, the voice control replaces the manual control of the vehicle-mounted electrical appliances, thereby reducing the burden on the driver's manual operation and greatly improving the driving safety.

This article refers to the address: http://

At present, the electronic voice control of the body in China is mainly concentrated on the application of the car navigation system, and the application of the voice recognition technology in the body electronics has not been fully exerted. This paper firstly proposes a design scheme of a non-specific human car audio voice control system with a dedicated voice processing chip UniSpeech-SDA80D51 as the core, and realizes the development of the system prototype.

1 car audio voice control system

The system consists of voice acquisition, voice recognition, control drive and car audio. The main functions of the system are: the voice acquisition module is used to collect the voice command signal sent by the driver, and the voice recognition module realizes the A/D conversion of the signal. And performing speech recognition processing on the converted digital signal, and finally outputting the word encoding signal corresponding to the voice command, and the control module performs logic analysis and processing on the received word encoded signal and generates a corresponding control signal to drive the car audio action instead of Manual operation of the driver.

1.1 Speech Recognition Module

The speech recognition module is mainly composed of a UniSpeech-SDA80D51 chip and peripheral circuits.

SDA80D51 is a high-integration SoC special chip newly developed by Infineon of Germany for speech recognition and speech processing applications. Its basic structure is shown in Figure 1.

As can be seen from Figure 1, the SDA80D51 integrates direct dual access fast SRAM, 2-channel ADC and 2-way DAC, multiple communication interfaces and general-purpose GPIO. The working mode of SDA80D51 is M8051 as the main control chip, which mainly completes system configuration and control of SPI, PWM, I2C, GPIO interface and voice data transmission; DSP core OAK is coprocessor, completes speech recognition algorithm, speech codec algorithm Wait for voice processing work.

The non-specific human voice signal is input by the directional pickup, and is subjected to A/D conversion by the data acquisition module inside the SDA80D51, and then processed by the preprocessing of the identification program, the endpoint detection, the feature parameter extraction, the template matching, etc., and the closest in the recognition vocabulary is selected. The number of the entry is used as the recognition result, and the recognition result is output through the GPIO port.

1.2 Control Drive Module

The control drive module is composed of an MCU and an analog switch and a peripheral circuit. The module is mainly used for receiving the voice recognition result, and performing logic analysis and processing on the word coded signal, and generating a corresponding function control signal to drive the sound action through the analog switch circuit. The MCU selects the AT89S51 of the American ATMEL company, integrates the output voltage characteristics of the AT89S51 output I/O signal and the characteristics of the SL1102C1 audio control panel resistive shunt keyboard circuit, and determines the closing and opening action of the control panel buttons of the SL1102C1 using the relay. The schematic diagram of the AT89S51 and relay analog switch circuit is shown in Figure 2.

1.3 audio module

This design is based on the SL1102C1 car stereo. The SL1102C1 is a car stereo designed for mid-range cars. It features MP3 playback, radio and display time. It is currently used in the JAC Tongyue sedan. The front panel of the SL1102C1 has a total of 15 buttons for power on/off, mute, sound, play/pause, and a code switch for adjusting the volume.

The SL1102C1 front panel key control is a partial pressure identification method, and the buttons include short press and long press. AT89S51 output voltage is TTL level, direct drive audio is easy to cause key code misidentification, resulting in system misoperation, so this paper uses the circuit shown in Figure 2, which solves the above problem well. When the AT89S51 receives the speech coded signal, it immediately performs logic analysis and outputs the corresponding control signal to drive the relay to pull the analog button action. The short press and long press function of the button are realized by software.

The analog switch circuit is also suitable for the code switch on the front panel of the SL1102C1, and the code switch has a volume adjustment function. When the switch knob is rotated, the corresponding terminal of the switch outputs a corresponding pulse signal. When the MCU receives the voice command signal for operating the code switch, the drive terminal outputs a pulse signal to simulate the switch knob function.

2 system software design

The system software includes a non-specific person speech recognition module and a logic control module.

2.1 Non-specific person speech recognition module

The non-specific human speech recognition module is based on a hidden Markov model algorithm. The HMM algorithm collects statistics on a large number of voice data, establishes a statistical model speech library that identifies the terms, and then extracts features from the speech to be recognized, matches the model library, and obtains the recognition result by comparing the matching scores, and passes the GPIO port of SDA80D51. Output. The non-specific human speech recognition module is mainly composed of signal preprocessing, feature parameter extraction, model matching and Viterbi algorithm. The block diagram of the module is shown in Figure 3.

2.1.1 Signal preprocessing

The signal preprocessing part mainly performs the sampling and analog/digital conversion functions of the input speech signal. The A/D conversion is implemented by the SDA80D51 embedded with a 12-bit A/D converter, and the sampling frequency is fixed at 8 kHz.

2.1.2 Feature Parameter Extraction

The feature parameter extraction is based on a speech frame, and the feature is extracted by using a frame. First, the speech signal is overlapped and framing, and the previous frame and the latter frame overlap by half (the frame signal overlap is to reflect the correlation between the adjacent two frames of data), the frame length is 25 ms, and the speech feature is extracted once for each frame. .

The MFCC parameters belong to the perceptual frequency domain cepstrum parameters, reflecting the characteristics of the short-term amplitude spectrum of the speech signal. The specific calculation and extraction process of the p-dimensional MFCC parameters is as follows:

(1) Calculate the linear spectrum for each frame s(n:m) by DFFT, and calculate the square of the spectrum mode as the power spectrum;

(2) The power spectrum is obtained through the Mel filter bank to D parameters X(i), where D is the number of triangular filters in the Mel filter bank;

(3) Logarithmically and discrete cosine transforms are performed on X(i). The cosine transform is calculated as follows:

Y(i) in the equation is the output of the i-th Mel filter logarithmic energy, i=1, 2, ..., D.

2.1.3 HMM speech recognition algorithm

The hidden Markov algorithm uses a probabilistic statistical model to describe the speech signal. The HMM model is based on the Markov chain and uses the MarKov chain to simulate the statistical characteristics of the speech signal. The HMM model is a double stochastic process, one of which is the Markov chain, which describes the state transition by (Ï€, A) and the output as a state sequence. The other is a stochastic process, described by B. In the statistical sense, B reflects the state and observation. The correspondence between the outputs is a sequence of observation vector. The state and time parameters in the Markov chain are discrete Markov processes.

The Viterbi algorithm is a frame synchronization dynamic regularization algorithm. Given a sequence of observations and a model, the Viterbi algorithm gives a sequence of states with the largest probability density P(Q, O|Î»). The Viterbi algorithm includes initialization, recursion, termination, path backtracking, and determining the best state sequence.

For speech processing, P(Q, O|Î») takes a large range due to the change of Q, and the maximum value of P(Q, O|Î») accounts for all P(Q, O|Î»). Large components, so you can use the Viterbi algorithm to calculate P(O|Î»).

2.2 Control module

The main function of the control module is: after the AT89S51 queries the voice entry signal, the lookup table obtains the entry code, and according to the code, the corresponding button is long pressed or short pressed, and respectively enters the corresponding subroutine processing. In the subroutine, the I/O control signal corresponding to the output voice command drives the relay to pull the analog button or the code switch action, and resets the I/O port in time. The control module also has full compatibility with manual control. It can also be manually operated while the voice control operation is performed. The manual priority is higher than the voice command, which avoids the conflict between voice control and manual control.

The control module part of the program code is as follows:

Void main()
{... //Initialize
While(1)
{... //Initialize
While(P0 == 0x00) //wait for voice signal
{ WDTRST=0x1E;
WDTRST=0xE1; //WD instruction
YXSY=0;}
YXSY = 1;
If(date != P0)
{ delayms(6); //delay function
Date = P0;
If(date==1 || date==2) //Power on, mute
{ PWR = 0; //Power button pressed
Delayms(200);
PWR = 1;} //The power button is released...
P1 = 0xff;
P2 = 0xff;}}}

3 system test results

The system carried out the non-specific person speech recognition rate and the analog switch action accuracy test on the JAC Tongyue SL1102C1 car audio. Since the voice entry of car audio is 2~4 words, the experimental content of speech recognition rate is 18 for car audio, 12 for 3 words, 12 for 4 words, and 10 for 4 words. 6 people, 4 males and 2 females (mandarin and dialect), the experimental environment is the laboratory environment. In order to improve the recognition rate of the system, the system uses the Olympus ME52 directional microphone to improve the microphone receiving range. The system test results are shown in Table 1.

As can be seen from Table 1, the recognition rate of the system is related to the number of voice command words, the microphone receiving distance, and the speaker dialect. The recognition rates of male and female voices are close.

In the system control circuit experiment, the analog switch action reached a high accuracy rate, and the test result was 98% or more. As long as the control program was running normally, each relay could perform the closed and open analog manual switch operation according to the program.

Realizing the voice control of automotive electrical appliances is the development trend of automotive electrical appliances in the future, and more and more solutions are constantly being proposed and verified. The design proposed in this paper is to use the SDA80D51 chip on the SL1102C1 car audio system to realize the voice recognition and control of the car audio. The prototype obtained by the design has a high recognition rate, stable work, strong scalability, and achieves the expected design goals, and the entire design and implementation method is feasible. Since the speech recognition rate varies with the environment and the speaker, although the HMM algorithm can obtain a high recognition rate in a low-noise environment, the speech recognition system is used when the test speech or the environment contains different degrees of noise pollution. The performance will be reduced. Improving the noise immunity and robustness of the system is one of the keys to the practical application of speech recognition systems.

T16 Series Terminal Blocks

Withstand high voltage up to 750V (IEC/EN standard)

UL 94V-2 or UL 94V-0 flame retardant housing

Anti-falling screws

Optional wire protection

1~12 poles, dividable as requested

Maximum wiring capacity of 16 mm2

30 amp Terminal Blocks,high quality Barrier Terminal Connector,high performance Polypropylene Terminal Block,Polyamide66 Terminal Blocks,BELEKS T16 series connector terminal

Jiangmen Krealux Electrical Appliances Co.,Ltd. , https://www.krealux-online.com