
Special Algorithms

Mel Frequency Cepstral Coefficients (MFCC)

The ultimate goal of using MFCC in our project is to transform the audio signal into vector form, so that the AI model can be trained to predict the words corresponding to a string of phonemes in the audio signal. The key idea is that the sounds produced by a human are filtered by the shape of the vocal tract, including the tongue, teeth and so on. This shape determines what sound comes out. If we can determine the shape accurately, it should give us an accurate representation of the phoneme being produced. These phonemes are then used to identify the words in an automatic speech recognizer. Through this method, most clear audio inputs can be analysed, provided that their noise level is minimal.

Hidden Markov Model (HMM)

For the speech recognition module of the project, HMM is used for the purpose of speech-tagging. When co...
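As a rough illustration of this feature extraction step, the sketch below computes MFCC vectors from a .WAV file using the librosa library. This is only an assumption about tooling: the post does not state which MFCC implementation our project uses, and the file name, sample rate and number of coefficients are illustrative.

    # Minimal MFCC extraction sketch (assumed tooling: librosa).
    import librosa

    def extract_mfcc(path, n_mfcc=13):
        # Load and resample the audio to a common speech-processing rate.
        signal, sr = librosa.load(path, sr=16000)
        # One n_mfcc-dimensional feature vector per analysis frame.
        mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=n_mfcc)
        return mfcc.T  # rows = frames, columns = coefficients

    features = extract_mfcc("sample.wav")
    print(features.shape)  # e.g. (number_of_frames, 13)

Each row of the resulting matrix is the vector form of one short frame of speech, which is the representation a recognition model can be trained on.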

Methodology

Data collection

First, we conducted a problem identification survey via a Google Form. Data collection then began for our project, to create the models that detect offensive content in the Sinhala, Tamil and English languages. Detecting profanity in audio signals for English was convenient, owing to the availability of sample tools and datasets for English. However, since Sinhala and Tamil are regional languages, few technologies support them. Therefore, the project required most of the NLP components to be created from scratch, or alternative approaches to be employed to tackle the issue. Since our project concerns Natural Language Processing, social media posts and comments were chosen as the source of raw data, because data with heavy colloquial language was required. This ensured that profanity could be detected regardless of whether the audio file contains formal or casual speech. A list of offensive keywords wa...
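To make the keyword-list idea concrete, the sketch below shows one simple way such a list could be matched against transcribed text. The word list and function name are hypothetical placeholders, not our project's actual lists, which were compiled from the collected data for each language.

    # Hypothetical keyword lookup for flagging offensive words in a transcript.
    OFFENSIVE_KEYWORDS = {"badword1", "badword2"}  # placeholder entries only

    def flag_offensive_tokens(transcript):
        # Return (position, word) pairs whose word appears in the keyword list.
        flagged = []
        for index, word in enumerate(transcript.lower().split()):
            token = word.strip(".,!?")
            if token in OFFENSIVE_KEYWORDS:
                flagged.append((index, token))
        return flagged

    print(flag_offensive_tokens("this clip contains badword1 twice badword1"))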

System Workflow

Our system is a software component as a whole, which can be further divided into four sub-components, namely the Digital Signal Processing Component, the Speech Recognition Component, the Natural Language Processing Component and the Audio Replacing Component. All these high-level components work together to achieve the final objective of our system, which is to automatically detect and replace objectionable content in audio clips of local languages. This major process can be further divided into several sub-processes, which can be identified through the system's main workflow, depicted in Fig. 1.

Figure 1: Main workflow of the system

a) Conversion of input audio files of other audio formats into .WAV (Windows Wave) format - In order to proceed with our system, the input audio files should be in .WAV format. Yet, the inputs that the users need to process might be in various formats such as .MP3 (Moving Picture Experts Group Layer...
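As an illustration of step (a), the short sketch below converts an input audio file to .WAV. The use of pydub (which relies on ffmpeg) is an assumption made for illustration; the post does not name the conversion tool, and the file names are placeholders.

    # Sketch of converting an input audio file to .WAV (assumed tooling: pydub + ffmpeg).
    from pydub import AudioSegment

    def convert_to_wav(input_path, output_path):
        # pydub infers the source format from the file extension.
        audio = AudioSegment.from_file(input_path)
        audio.export(output_path, format="wav")

    convert_to_wav("input.mp3", "input.wav")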

High-level implementation components of the system

The main workflow of our system is maintained through four inter-dependent sub-modules, namely the Digital Signal Processing (DSP) module, the Speech Recognition module, the Natural Language Processing (NLP) module, and the Audio Replacing module.

A. DSP Module

The article with the full description and progress of the Digital Signal Processing of our system can be viewed through this link. At a glance, the DSP module accepts the user's input audio, converts it into a .WAV (Windows Wave) file if it is in some other audio format, and then performs the noise reduction process. To accomplish these tasks, a High Pass Filter and a Low Pass Filter are used, with their cut-off frequencies set by considering the fundamental human voice frequency range and the results obtained from testing various voice samples with varying cut-off frequencies. After the noise reduction process, the cleaned audio samples are amplified by the DSP module as its fin...
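A minimal sketch of this filtering stage is given below, using SciPy Butterworth filters to combine the high-pass and low-pass steps into a single voice band-pass, followed by a simple gain stage. The 85-3400 Hz cut-offs and the gain value are only indicative of a typical human-voice range; they are not the exact values our project arrived at through testing.

    # Sketch of the DSP stage: band-pass filter the voice range, then amplify.
    # Cut-off frequencies and gain are illustrative, not the tuned project values.
    import numpy as np
    from scipy.io import wavfile
    from scipy.signal import butter, sosfiltfilt

    def bandpass_voice(samples, sample_rate, low_hz=85.0, high_hz=3400.0):
        # 4th-order Butterworth band-pass around the fundamental voice band.
        sos = butter(4, [low_hz, high_hz], btype="bandpass", fs=sample_rate, output="sos")
        # Zero-phase filtering so the speech is not shifted in time.
        return sosfiltfilt(sos, samples)

    sample_rate, samples = wavfile.read("input.wav")
    cleaned = bandpass_voice(samples.astype(np.float64), sample_rate)
    amplified = cleaned * 1.5  # simple gain stage standing in for the amplification step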