Applying Supervised Machine Learning to the Fight against Online Child Sexual Abuse
Reports from NGOs and anecdotal evidence suggest that child abuse materials (CAMs) share a number of common characteristics: they are mostly produced in indoor settings, the victim's face or genitalia is visible, and there are few visual clues about the abuser(s).
For known CAMs, methods and projects such as Microsoft's PhotoDNA and Interpol's Baseline project already detect and remove them automatically. However, the detection of new CAMs still relies heavily on outdated and inefficient methods such as reports from ordinary users.
By thoroughly coding specific attributes of every known CAM, such as indoor/outdoor setting, placement on a bed/couch/table, and visibility of face/breast/genitalia, supervised machine learning can be used to detect new CAMs. As the number of coded known CAMs increases, the algorithm would become better at determining the features of known CAMs, because of the similarities mentioned above. Since the databases of Interpol and NCMEC hold millions of known CAMs, the need for intervention by human supervisors would decrease over time. Once all known CAMs are included, the algorithm might also help detect new CAMs to some extent. If successful, this would not only speed up victim identification but also markedly reduce the rate at which new CAMs spread across the internet.
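The supervised step described above can be illustrated with a minimal sketch. It assumes each item has already been coded by a human annotator into a vector of binary attributes, and it uses a plain logistic regression as a stand-in for whatever learning algorithm would actually be chosen; the text does not name one. The attribute names, the toy rows, and the helper functions `train` and `score` are hypothetical, and the data are synthetic placeholders rather than values from any real database.

```python
import math

# Hypothetical attribute schema, loosely following the examples in the text.
ATTRIBUTES = ["indoor_setting", "on_bed_or_couch", "face_visible"]


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train(rows, labels, epochs=2000, lr=0.5):
    """Fit logistic-regression weights with batch gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Prediction error for this annotated example.
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += err * xi
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def score(weights, bias, x):
    """Return the model's estimated probability that an item should be flagged."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Synthetic placeholder annotations (1 = attribute present). Not real data.
rows = [[1, 1, 1], [1, 1, 0], [1, 0, 1], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
labels = [1, 1, 1, 0, 0, 0]  # 1 = known item, 0 = unrelated item

weights, bias = train(rows, labels)
flag_for_review = score(weights, bias, [1, 1, 1]) > 0.5
```

In this sketch the output is a score used to prioritise items for human review, which matches the text's expectation that supervisor intervention decreases as more coded examples become available, without removing it entirely.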