High-Level Implementation Components of the System



The main workflow of our system is maintained through four interdependent sub-modules, namely the Digital Signal Processing (DSP) module, the Speech Recognition module, the Natural Language Processing (NLP) module, and the Audio Replacing module.

A. DSP Module

A full description of the Digital Signal Processing work and its progress can be viewed through this link.

At a glance, the DSP module accepts the user's input audio, converts it into a .WAV (Waveform Audio) file if it is in another format, and then performs noise reduction. For this, a high-pass filter and a low-pass filter are used, with cut-off frequencies set by considering the fundamental frequency range of the human voice and the results of testing various voice samples against different cut-off values. As its final task, the DSP module amplifies the cleaned audio samples.
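The sketch below illustrates this stage with SciPy and pydub; it is a minimal example, assuming those libraries. The cut-off frequencies (80 Hz and 8,000 Hz) and the gain are illustrative placeholders, not the values the project arrived at through testing.

```python
# Minimal DSP sketch: format conversion, high-pass + low-pass filtering, amplification.
# Cut-offs and gain below are illustrative, not the project's tuned values.
import numpy as np
from pydub import AudioSegment
from scipy.signal import butter, sosfilt


def to_wav(input_path: str, wav_path: str) -> None:
    """Convert any format pydub/ffmpeg understands into a mono .WAV file."""
    AudioSegment.from_file(input_path).set_channels(1).export(wav_path, format="wav")


def denoise_and_amplify(samples: np.ndarray, rate: int,
                        low_cut: float = 80.0, high_cut: float = 8000.0,
                        gain: float = 1.5) -> np.ndarray:
    """High-pass then low-pass filter the signal, then amplify the cleaned samples."""
    hp = butter(4, low_cut, btype="highpass", fs=rate, output="sos")
    lp = butter(4, high_cut, btype="lowpass", fs=rate, output="sos")
    filtered = sosfilt(lp, sosfilt(hp, samples))
    return np.clip(filtered * gain, -1.0, 1.0)  # amplify without exceeding [-1, 1]
```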

B. Speech Recognition Module

Since the objectionable content is classified in textual form, speech recognition is required to convert the audio into text. We used the Google Speech Recognizer to complete this task accurately and efficiently for all three languages: Sinhala, Tamil, and English.
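A minimal sketch of this step using the open-source SpeechRecognition package, which wraps the Google Web Speech API, is shown below. The language codes ("si-LK", "ta-LK", "en-US") are assumptions for Sinhala, Tamil, and English; the project's exact configuration may differ.

```python
# Minimal speech-to-text sketch using the SpeechRecognition package's
# Google recognizer. Language codes are assumptions for the three locales.
import speech_recognition as sr


def transcribe(wav_path: str, language: str = "si-LK") -> str:
    """Return the recognized text for a cleaned .WAV file."""
    recognizer = sr.Recognizer()
    with sr.AudioFile(wav_path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    return recognizer.recognize_google(audio, language=language)
```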

C. NLP Module

To validate the contextually offensive nature of the text, a preliminary filtering model takes the converted sentences as input and performs a binary classification into offensive and non-offensive. The preliminary model combines the Term Frequency–Inverse Document Frequency (TF-IDF) vectorizer with a Support Vector Machine (SVM) classifier, tuned over varying hyperparameters; SVM was chosen for the binary model because it performs well at text classification.
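A minimal sketch of such a preliminary filter, assuming scikit-learn, is given below. The hyperparameters, training data, and label names are illustrative only.

```python
# Minimal binary offensive/non-offensive filter: TF-IDF features + linear SVM.
# Hyperparameters and labels are illustrative placeholders.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

binary_filter = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2))),  # term weighting over uni/bigrams
    ("svm", SVC(kernel="linear", C=1.0)),            # binary decision boundary
])

# binary_filter.fit(train_sentences, train_labels)   # labels: "offensive" / "non-offensive"
# prediction = binary_filter.predict(["converted sentence from the recognizer"])
```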

If a sentence is classified as offensive, secondary filtering is carried out with a separate multi-class text classification model that classifies each word in the sentence into sexist, racist, cursing, and non-offensive categories. The secondary filtering models were selected after comparing Naive Bayes, Decision Tree, and other classification algorithms, each pipelined with either the CountVectorizer or the TF-IDF vectorizer; based on the results of these pipelined comparisons, different classifiers were chosen accordingly.
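The following sketch shows how such a pipelined comparison could be set up in scikit-learn. The candidate classifiers and vectorizers mirror the ones named above; the scoring call and data variables are placeholders.

```python
# Minimal sketch of the pipelined comparison for the multi-class word filter.
# Data variables and scoring are placeholders for the project's own setup.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

candidates = {
    "count+nb": Pipeline([("vec", CountVectorizer()), ("clf", MultinomialNB())]),
    "tfidf+nb": Pipeline([("vec", TfidfVectorizer()), ("clf", MultinomialNB())]),
    "count+dt": Pipeline([("vec", CountVectorizer()), ("clf", DecisionTreeClassifier())]),
    "tfidf+dt": Pipeline([("vec", TfidfVectorizer()), ("clf", DecisionTreeClassifier())]),
}

# for name, pipeline in candidates.items():
#     scores = cross_val_score(pipeline, words, categories, cv=5)
#     print(name, scores.mean())   # categories: sexist / racist / cursing / non-offensive
```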

D. Audio Replacing Module

The process starts with the first identified objectionable word. The component gets the start time of that word and splits the audio into two parts. It then gets the end time of the same word and splits the second part of the audio at that point, storing the tail of the second portion. The same method is applied to every identified objectionable word, so the system ends up with a series of audio clips stored in a folder. Finally, the split clips are merged according to the user's preference, with each objectionable word replaced by a beep, silence, or recorded audio.
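Below is a minimal sketch of this split-and-merge step using pydub. The word timings are assumed to come from the recognizer, and the 1 kHz beep is an illustrative choice rather than the project's actual replacement tone.

```python
# Minimal split-and-merge sketch: replace each flagged (start, end) span,
# given in milliseconds, with a beep or silence, then concatenate the pieces.
from pydub import AudioSegment
from pydub.generators import Sine


def censor(wav_path: str, spans_ms, mode: str = "beep") -> AudioSegment:
    """Rebuild the audio with every flagged span replaced by a beep or silence."""
    audio = AudioSegment.from_wav(wav_path)
    pieces, cursor = [], 0
    for start, end in spans_ms:
        pieces.append(audio[cursor:start])  # clean clip before the flagged word
        if mode == "beep":
            pieces.append(Sine(1000).to_audio_segment(duration=end - start))
        else:
            pieces.append(AudioSegment.silent(duration=end - start))
        cursor = end
    pieces.append(audio[cursor:])           # tail after the last flagged word
    return sum(pieces[1:], pieces[0])       # merge the stored clips in order
```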


