Sunday, June 14, 2009

Pornographic Classifier

According to a BBC article, China's new "Green Dam Youth Escort" software, intended to serve as a filter for offensive material (such as pornography), has some classification errors. In particular, it was reported that
"I went on the internet to check out some animal photos. A lovely little naked pig was sent onto the black list. Pitiful little pig!" read one comment.

"I was curious, so I looked up some photos of naked African women. Oh, they were not censored!"

which is amusing to say the least.

The comments do suggest that a key component of their pornographic classifier involves color, specifically the proportion of skin-colored pixels, though this approach apparently has its weaknesses.
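To make that idea concrete, here is a minimal sketch of what a color-only check might look like. This is my own illustration in Python with OpenCV and NumPy, not anything known about Green Dam's actual implementation; the YCrCb skin bounds and the 40% threshold are illustrative guesses.

```python
import cv2
import numpy as np

def skin_pixel_fraction(image_path, threshold=0.4):
    """Flag an image if the fraction of skin-colored pixels exceeds a threshold.

    A crude, color-only classifier of the kind the comments hint at; the
    YCrCb bounds and the 0.4 threshold are illustrative guesses, not values
    taken from any real filtering software.
    """
    img = cv2.imread(image_path)
    if img is None:
        raise ValueError("Could not read image: " + image_path)

    # Convert to YCrCb, where skin tones cluster in a fairly compact Cr/Cb
    # range regardless of brightness.
    ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
    lower = np.array([0, 133, 77], dtype=np.uint8)
    upper = np.array([255, 173, 127], dtype=np.uint8)
    skin_mask = cv2.inRange(ycrcb, lower, upper)

    fraction = np.count_nonzero(skin_mask) / skin_mask.size
    return fraction > threshold, fraction
```

Both reported failures fall naturally out of a scheme like this: a pink pig can push the skin fraction over the threshold, while skin tones that fall outside the fixed color bounds are simply never counted.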

Having some experience with image processing as well as classification, I can suggest a few other possible approaches to constructing a pornographic classifier. Rather than using only color information, it might be advisable to perform object recognition and search for a few key areas of interest, i.e. exposed private parts. A combination of shape (since the outlines of some private parts are quite distinctive) and color information should be able to identify these body areas with some effectiveness; alternatively, standard object recognition techniques could be used, though these tend to be better suited to detecting specific objects than broad categories.
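One way to sketch the shape-plus-color idea is to let skin color propose candidate regions and then score each region's outline against a reference shape via Hu-moment contour matching. Everything here is an assumption for illustration: the caller must supply the reference contour, and the morphology kernel, area cutoff, and 0.3 match threshold are arbitrary.

```python
import cv2
import numpy as np

def shape_and_color_candidates(img, reference_contour, match_threshold=0.3):
    """Find skin-colored regions whose outline resembles a reference shape.

    Skin color proposes candidate regions; Hu-moment contour matching scores
    how closely each region's outline resembles `reference_contour`, which
    the caller provides. All thresholds are illustrative guesses.
    """
    ycrcb = cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)
    skin_mask = cv2.inRange(ycrcb,
                            np.array([0, 133, 77], dtype=np.uint8),
                            np.array([255, 173, 127], dtype=np.uint8))

    # Clean up the mask a little before extracting outlines.
    kernel = np.ones((5, 5), np.uint8)
    skin_mask = cv2.morphologyEx(skin_mask, cv2.MORPH_OPEN, kernel)

    contours, _ = cv2.findContours(skin_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    candidates = []
    for c in contours:
        if cv2.contourArea(c) < 500:  # ignore tiny specks
            continue
        # Lower scores mean the two outlines' Hu moments are more similar.
        score = cv2.matchShapes(c, reference_contour,
                                cv2.CONTOURS_MATCH_I1, 0.0)
        if score < match_threshold:
            candidates.append((score, cv2.boundingRect(c)))
    return candidates
```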

Another approach (simpler but less accurate) may be to use the color information to segment an outline of the human figure, and then count how many distinct colors are contained within that outline. The theory is that many colors indicate the presence of clothing; the advantage is that the classifier is unconcerned with the actual skin color itself. Of course, the issue is determining whether the segmented region is human at all; that is a worthy sub-problem in its own right, though one workaround is to look for a head region, and if none is found, assume the image is not of a person.
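A rough sketch of this second approach follows, assuming the opencv-python package (which ships the Haar cascades). The face detector stands in for the "find a head region" workaround, while the torso box heuristic and the distinct-color cutoff are purely illustrative assumptions.

```python
import cv2
import numpy as np

# Haar cascade face detection stands in for the "find a head region" step;
# the torso-box heuristic and 50-color cutoff below are illustrative guesses.
FACE_CASCADE = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def looks_unclothed(img, max_distinct_colors=50):
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = FACE_CASCADE.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return False  # no head region found: assume the image is not of a person

    # Take the largest detected face and guess a torso box below it.
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])
    torso = img[y + h : min(img.shape[0], y + 4 * h),
                max(0, x - w) : min(img.shape[1], x + 2 * w)]
    if torso.size == 0:
        return False

    # Quantize each channel to 16 levels and count distinct colors in the torso.
    quantized = (torso // 16).reshape(-1, 3)
    distinct = len(np.unique(quantized, axis=0))

    # Few distinct colors suggests bare skin rather than clothing.
    return distinct < max_distinct_colors
```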

Generally, though, the problem of creating a classifier to identify pornographic images is quite difficult. The above ideas may be suitable for images with only one human subject, but the classification task becomes much harder with other complications, such as multiple subjects and non-realistic images (e.g. computer-generated images or drawings). In fact, circumventing such pornographic classifiers may be extremely simple indeed.
