System Workflow



Our system is a single software component that is divided into four sub-components: the Digital Signal Processing component, the Speech Recognition component, the Natural Language Processing component, and the Audio Replacing component. These high-level components work together to achieve the system's final objective, which is to automatically detect and replace objectionable content in audio clips in local languages. This overall process is divided into several sub-processes, which can be identified from the system's main workflow, depicted in Fig. 1.



Figure 1:  Main workflow of the system

a) Conversion of input audio files in other formats into .WAV (Waveform Audio File Format)

- The system processes audio in .WAV format, but the files that users need to process may be in various formats such as .MP3 (Moving Picture Experts Group Layer-3 Audio), .MPA, .WMA, or .MP4. To make the application easier to use, the system accepts a wide range of audio formats and converts them into .WAV. It also accepts .MP4 files, separates the audio portion, and converts it into .WAV before proceeding.
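
As an illustration of this conversion step, here is a minimal sketch that calls the ffmpeg command-line tool from Python. The post does not say which tool the system uses, so ffmpeg, the function names, and the 16 kHz mono output are all assumptions made for the example.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, dst, sample_rate=16000):
    """Build an ffmpeg command that writes mono 16-bit PCM WAV (hypothetical helper)."""
    return [
        "ffmpeg",
        "-y",                      # overwrite the output file if it exists
        "-i", src,                 # input in any format ffmpeg can decode
        "-vn",                     # drop the video stream (relevant for .MP4)
        "-acodec", "pcm_s16le",    # 16-bit little-endian PCM
        "-ac", "1",                # mono
        "-ar", str(sample_rate),   # sample rate in Hz (assumed value)
        dst,
    ]


def convert_to_wav(src, sample_rate=16000):
    """Convert src to a .wav file next to it and return the new path."""
    if Path(src).suffix.lower() == ".wav":
        return src  # already in the target format, nothing to do
    dst = str(Path(src).with_suffix(".wav"))
    subprocess.run(build_ffmpeg_command(src, dst, sample_rate), check=True)
    return dst
```

Calling convert_to_wav on an .MP4 path would write the audio track to a .wav file with the same base name, provided ffmpeg is installed and on the PATH.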

b) Pre-processing of audio files

- Users' audio inputs may be recorded under various environmental conditions and may vary in quality, but accurate results require clean audio. Hence, the input audio is first filtered to remove noise and then amplified.
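
The filter-then-amplify idea can be sketched in a few lines of plain Python. The post does not name the filter or the gain method, so the moving-average filter and peak normalization below are stand-ins chosen for simplicity, and the function names and default values are hypothetical. Samples are assumed to be floats in the range -1.0 to 1.0.

```python
def moving_average(samples, window=5):
    """Smooth samples with a centred moving average (a crude low-pass filter)."""
    half = window // 2
    smoothed = []
    for i in range(len(samples)):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        smoothed.append(sum(samples[lo:hi]) / (hi - lo))
    return smoothed


def normalize_peak(samples, target_peak=0.9):
    """Scale samples so the largest absolute value equals target_peak."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: nothing to amplify
    gain = target_peak / peak
    return [s * gain for s in samples]


def preprocess(samples, window=5, target_peak=0.9):
    """Filter first, then amplify, in the order described above."""
    return normalize_peak(moving_average(samples, window), target_peak)
```

A real system would more likely use a dedicated noise-reduction technique; the sketch only shows the order of the two operations.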

c) Clipping of audio

- The noise-free audio is then clipped: the individual words are separated, and each word is saved as a separate audio file.
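
One common way to separate words is to look for stretches of silence between them. The sketch below assumes that approach (the post does not describe the actual clipping method) and returns the sample ranges of the detected segments. The function name, frame size, and thresholds are hypothetical.

```python
def split_on_silence(samples, frame_size=320, threshold=0.05, min_silence_frames=5):
    """Return (start, end) sample indices of segments separated by silence.

    A frame counts as silent when its mean squared amplitude is at or
    below threshold squared. Samples are assumed to be floats in -1.0..1.0.
    """
    segments = []
    start = None      # start index of the segment currently being built
    silent_run = 0    # consecutive silent frames seen inside that segment
    n_frames = (len(samples) + frame_size - 1) // frame_size
    for f in range(n_frames):
        frame = samples[f * frame_size:(f + 1) * frame_size]
        energy = sum(s * s for s in frame) / len(frame)
        if energy > threshold ** 2:
            if start is None:
                start = f * frame_size
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silence_frames:
                # Close the segment at the end of the last voiced frame.
                segments.append((start, (f - silent_run + 1) * frame_size))
                start = None
                silent_run = 0
    if start is not None:
        # Audio ended while a segment was still open.
        end = min(len(samples), (n_frames - silent_run) * frame_size)
        segments.append((start, end))
    return segments
```

Each returned range could then be written out as its own .WAV file, for example with the standard-library wave module.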

d) Speech to Text Conversion

- Objectionable content detection in our system is mainly text-based. Hence, audio inputs in all three languages (Sinhala, Tamil, and English) are converted into text using the Google Speech Recognizer.
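
For readers who want to try this step, the following sketch uses the third-party SpeechRecognition package for Python, whose recognize_google method sends audio to Google's speech recognizer. Whether the system uses this package is an assumption, as are the function name and the language codes in the table, which are standard BCP-47 tags and should be checked against the recognizer's supported languages.

```python
# Assumed language tags for the three supported languages.
LANGUAGE_CODES = {"sinhala": "si-LK", "tamil": "ta-LK", "english": "en-US"}


def transcribe(wav_path, language):
    """Return the recognized text for one .wav clip, or "" if nothing is recognized."""
    code = LANGUAGE_CODES[language]  # raises KeyError for unsupported languages

    # Imported here so the language table can be used without the package.
    import speech_recognition as sr  # pip install SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    try:
        return recognizer.recognize_google(audio, language=code)
    except sr.UnknownValueError:
        return ""  # the recognizer could not understand the audio
```

Because the clipping step produces one file per word, this function would be called once per clip, which keeps each recognized word tied to its audio file.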

e) Profanity Detection

- Profanity detection is performed on the converted text under three categories (racist, sexist, and cursing) for all three languages: Sinhala, Tamil, and English.
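
The simplest way to picture this step is a per-language word list grouped by category. The sketch below assumes that lexicon-lookup approach, which the post does not confirm; the lexicon entries are neutral placeholders and the function name is hypothetical.

```python
# Placeholder lexicon: language -> category -> set of terms (not real data).
LEXICON = {
    "english": {
        "racist": {"racist_term_1"},
        "sexist": {"sexist_term_1"},
        "cursing": {"curse_term_1", "curse_term_2"},
    },
    "sinhala": {"racist": set(), "sexist": set(), "cursing": set()},
    "tamil": {"racist": set(), "sexist": set(), "cursing": set()},
}


def detect_profanity(words, language, lexicon=LEXICON):
    """Return (index, word, category) for every word found in the lexicon."""
    hits = []
    categories = lexicon.get(language, {})
    for index, word in enumerate(words):
        token = word.strip(".,!?\"'").lower()  # ignore case and edge punctuation
        for category, terms in categories.items():
            if token in terms:
                hits.append((index, token, category))
                break  # report each word under one category only
    return hits
```

The index in each result identifies which word clip was flagged, which is the information the replacement step needs.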

f) Replacement of detected profanity

- The detected objectionable content in all three languages is replaced with a predetermined audio clip by the audio replacement component of the software.
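
To make the replacement step concrete, the sketch below overwrites flagged sample ranges with a generated tone. The tone stands in for the predetermined audio clip, whose content the post does not specify; the function names, the 1 kHz frequency, and the 16 kHz sample rate are assumptions.

```python
import math


def make_beep(length, sample_rate=16000, frequency=1000.0, amplitude=0.5):
    """Generate length samples of a sine tone (stand-in for the replacement clip)."""
    return [
        amplitude * math.sin(2 * math.pi * frequency * n / sample_rate)
        for n in range(length)
    ]


def replace_segments(samples, segments, sample_rate=16000):
    """Return a copy of samples with each (start, end) span overwritten by a tone."""
    output = list(samples)  # leave the caller's audio untouched
    for start, end in segments:
        start = max(0, start)
        end = min(len(output), end)  # clamp so the audio length never changes
        if start < end:
            output[start:end] = make_beep(end - start, sample_rate)
    return output
```

Keeping the replacement the same length as the removed span preserves the timing of the rest of the recording.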

The main workflow of our system is maintained through four interdependent sub-modules: the Digital Signal Processing (DSP) module, the Speech Recognition module, the Natural Language Processing (NLP) module, and the Audio Replacing module. The article that covers these modules can be viewed through this link.
