Conclusion | Automatic Audio Replacement of Objectionable Content for Sri Lankan Locale






In Sri Lanka, the government temporarily banned a few social media platforms on several occasions between 2018 and 2019 to control communal violence that was being accelerated by hate speech and racist fake news spreading through social media. Yet there is still no proper mechanism to specifically filter and replace objectionable content in audio for the Sri Lankan locale.

To address this issue, we proposed a system for the Sri Lankan locale that automatically detects and replaces objectionable content in audio. To selectively filter out potentially objectionable audio content, the input audio is first preprocessed and converted into text. The filtering mechanism then detects racist, cursing, and sexist content along with its location and timestamps. Finally, the detected objectionable content is seamlessly replaced with a predetermined audio input.
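The detect-and-replace steps described above can be illustrated with a minimal sketch. It is not the system's actual implementation: it assumes the Speech-to-Text stage has already produced word-level timestamps, it substitutes a simple lexicon lookup for the filtering mechanism, and it uses a generated 1 kHz tone as the predetermined replacement audio. The names `Word`, `find_objectionable`, `beep`, and `replace_spans` are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Word:
    """One transcribed word with its position in the audio (assumed STT output)."""
    text: str
    start: float  # seconds
    end: float    # seconds


def find_objectionable(words, lexicon):
    """Return the words whose lowercased text appears in the lexicon.

    Assumption: a plain lexicon lookup stands in for the filtering mechanism.
    """
    return [w for w in words if w.text.lower() in lexicon]


def beep(n_samples, sample_rate, freq=1000.0, amplitude=0.3):
    """Generate n_samples of a sine tone to use as the replacement audio."""
    return [
        amplitude * math.sin(2 * math.pi * freq * i / sample_rate)
        for i in range(n_samples)
    ]


def replace_spans(samples, sample_rate, spans):
    """Return a copy of samples with each (start, end) span overwritten by a tone."""
    out = list(samples)
    for start, end in spans:
        a = max(0, int(round(start * sample_rate)))
        b = min(len(out), int(round(end * sample_rate)))
        if b > a:
            out[a:b] = beep(b - a, sample_rate)
    return out


# Usage: one second of silence at 8 kHz with a placeholder flagged word.
sample_rate = 8000
audio = [0.0] * sample_rate
transcript = [
    Word("hello", 0.0, 0.4),
    Word("BADWORD", 0.4, 0.7),  # placeholder token, not a real lexicon entry
    Word("friend", 0.7, 1.0),
]
flagged = find_objectionable(transcript, {"badword"})
cleaned = replace_spans(audio, sample_rate, [(w.start, w.end) for w in flagged])
```

Because the replacement tone has exactly the same number of samples as the span it overwrites, the output keeps the original duration and the rest of the audio stays aligned with its timestamps.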

The model was tested against a Tamil dataset of 1720 and a Sinhala dataset of 3950. With a testing accuracy of 89% for Sinhala and 77% for Tamil, our proposed system can be seen as a complementary attempt to alleviate the problem of profanity in media in the Sri Lankan context. It is not a perfect model for this purpose, however, and it has a few limitations at present.

The main limitation of our software is that it requires an internet connection to run. However, with customized Speech-to-Text models for the Sri Lankan languages, this requirement could be removed. In addition, collecting and preprocessing the data corpora for Sinhala and Tamil proved demanding, and because the Tamil corpus was insufficient, the model's performance on Tamil was comparatively lower.

As future work, we aim to create a voice cloning model that would allow detected objectionable content to be replaced with a replica of the speaker's voice. We also expect to extend the system to real-time use and to all media types. In addition, we plan to develop this product as a plugin so that it can be integrated into social media platforms to automatically detect and replace objectionable audio content in shared video clips.

Given that human moderators cannot monitor the large number of audio files spread across the country, and given the lack of mechanisms for automatic audio replacement of objectionable content for the Sri Lankan locale, we believe that this attempt represents a viable solution to the identified problem.
