Musings of an Aspiring Polymath: Graphics Classifier

Sunday, June 14, 2009

Pornographic Classifier

According to a BBC article, China's new "Green Dam Youth Escort" software, intended to serve as a filter for offensive material (such as pornography), makes some classification errors. In particular, the article quotes the following:

"I went on the internet to check out some animal photos. A lovely little naked pig was sent onto the black list. Pitiful little pig!" read one comment.
"I was curious, so I looked up some photos of naked African women. Oh, they were not censored!"

This is amusing, to say the least.

The comments do suggest that a key component of their pornographic classifier involves color, specifically the proportion of skin-colored pixels, though this approach apparently has its weaknesses.
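A minimal sketch of what a skin-pixel-proportion filter could look like is given below. It is not Green Dam's actual method, which the article does not describe. The RGB thresholds in `is_skin_tone`, the 0.5 cutoff in `is_flagged`, and the representation of an image as a flat list of (r, g, b) tuples are all assumptions made for illustration.

```python
def is_skin_tone(r, g, b):
    """Simple RGB rule for skin-like colors; the thresholds are assumptions."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_pixel_ratio(pixels):
    """Fraction of pixels passing the skin-tone rule (0.0 for an empty image)."""
    pixels = list(pixels)
    if not pixels:
        return 0.0
    return sum(is_skin_tone(r, g, b) for r, g, b in pixels) / len(pixels)


def is_flagged(pixels, threshold=0.5):
    """Flag an image when the skin-pixel ratio exceeds a hypothetical cutoff."""
    return skin_pixel_ratio(pixels) > threshold


# Two uniform toy "images" (made-up colors, 16 pixels each).
pink = [(240, 180, 170)] * 16
dark_brown = [(60, 40, 30)] * 16
print(skin_pixel_ratio(pink), skin_pixel_ratio(dark_brown))  # 1.0 0.0
```

With these thresholds a uniformly pink image is flagged while a uniformly dark brown one is not, which is the kind of weakness the quoted comments point to.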

Having some experience with image processing and classification, I can think of a few other possible approaches to constructing a pornographic classifier. Rather than using only color information, it might be advisable to perform object recognition to search for a few key areas of interest, i.e. exposed private parts. A combination of shape (since the outlines of some private parts are fairly distinctive) and color information should be able to identify these body areas with some effectiveness. Alternatively, standard object recognition techniques can be used, though these may be better suited to recognizing specific objects than general categories.

Another approach (simpler but less accurate) is to use the color information to generate an outline of the human frame (by segmentation), and then to count how many colors are contained in the frame. The theory is that many colors indicate the presence of clothes; the advantage is that the actual skin color itself does not matter. Of course, the issue is whether the segmented region is human at all. This is a worthy sub-problem in itself, though one workaround is to look for a head region: if there is none, the region is not treated as human.
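The color-counting step of this idea can be sketched as follows, assuming the segmentation has already been done and is supplied as a boolean mask. The function names, the quantization into 8 levels per channel, and the `max_skin_colors` cutoff are hypothetical choices, not something the approach prescribes.

```python
def count_region_colors(pixels, mask, levels=8):
    """Count distinct quantized colors among the pixels where mask is True.

    pixels: flat list of (r, g, b) tuples with 0-255 channels.
    mask:   flat list of booleans of the same length (the segmented frame).
    Each channel is reduced to `levels` bins so that shading and noise
    do not inflate the count.
    """
    if len(pixels) != len(mask):
        raise ValueError("pixels and mask must have the same length")
    step = 256 // levels
    seen = {
        (r // step, g // step, b // step)
        for (r, g, b), inside in zip(pixels, mask)
        if inside
    }
    return len(seen)


def probably_clothed(pixels, mask, max_skin_colors=3):
    """Guess 'clothed' when the region holds more colors than the cutoff."""
    return count_region_colors(pixels, mask) > max_skin_colors


# Toy example: two similar tones and one blue pixel inside the region,
# plus one white background pixel outside it.
pixels = [(200, 150, 130), (205, 152, 133), (20, 40, 200), (250, 250, 250)]
mask = [True, True, True, False]
print(count_region_colors(pixels, mask))  # 2
```

Quantizing before counting is a design choice: without it, nearly every pixel of a photograph would count as its own color.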

Generally, though, creating a classifier to identify pornographic images is quite difficult. The above ideas may be suitable for images with only one human subject, but the task becomes much harder with complications such as multiple subjects and non-realistic images (e.g. computer-generated images or drawings). In fact, circumventing such classifiers may be extremely simple.

Friday, April 25, 2008

Secret Research Project 1: Interesting Facts

Now that the research report has been completed and submitted, I have some time to share some interesting facts about the research.

1. When people ask what I do for my secret research, I say I download images of apples, bananas, irons, scientific calculators, suitcases, crayons, dustbins, pears, plastic bottles, tissue boxes, forks, spoons, laptops, clocks, pillows and blenders.
2. I now possess a large collection of images of apples, bananas, irons, scientific calculators, suitcases, crayons, dustbins, pears, plastic bottles, tissue boxes, forks, spoons, laptops, clocks, pillows and blenders. Much time was spent collecting and downloading these images.
3. The preliminary versions of the classifier achieved a classification rate of 70%. Coincidentally, the training set consisted of 70% natural images and 30% synthetic images.
4. For the uninformed, point 3 was a consequence of the classifier labeling everything as a natural image, hence achieving a 70% performance. This approach is excellent. By increasing the set of natural images to 99%, the performance can be boosted to 99%!
5. Point 4 was a joke.
6. The trusty Microsoft Paint was used to perform many tasks for the project. These tasks include converting GIF, PNG, and BMP images into JPEG format, and creating diagrams for the final report.
7. Two computers were used for the project. The project could be run exclusively on either the desktop or the laptop computer, but using two computers greatly increased the rate of work.
8. The increase in rate of work was not due to being able to run multiple simulations simultaneously. Rather, the simulations were mostly done on one computer (the laptop), while the other (the desktop) was used for net research and report writing. Useful work was done on the desktop while the MATLAB program ran on the laptop.
9. Using two computers was also cool for report writing. All the data and related papers were displayed on the laptop screen, while the desktop ran only Word. Information could be directly read from the laptop and entered into the report without ALT-TABBING and changing windows constantly.
10. I want a dual monitor setup after learning the advantages of point 9.
11. I want a dual core system to be able to run my (hypothetical) dual monitor system properly without lagging. This is also to make running simulations less of a pain.
12. MATLAB should perform some checks on code integrity before running. Many times, MATLAB would return an error after it had run much of the simulation processing. The error was a simple formatting error in the last few lines of the code.
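The 70% result from the list above is easy to reproduce. The sketch below assumes the classifier is scored on a set with the same 70/30 split it was trained on; the function name and the label strings are made up for illustration.

```python
def always_natural_accuracy(labels):
    """Accuracy of a 'classifier' that labels every image as natural."""
    return sum(label == "natural" for label in labels) / len(labels)


# Hypothetical evaluation sets with 70% and 99% natural images.
dataset_70 = ["natural"] * 70 + ["synthetic"] * 30
dataset_99 = ["natural"] * 99 + ["synthetic"] * 1

print(always_natural_accuracy(dataset_70))  # 0.7
print(always_natural_accuracy(dataset_99))  # 0.99
```

The score tracks the class proportions rather than anything the classifier has learned, which is why it is a joke and not a result.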

More information on the image classifier will be released if anyone is interested.

Thursday, April 24, 2008

Secret Research Project 1: Quick Update

I'm just dropping by to release some of the latest results for the graphics classifier.

Classifier	SR	NR	AR	Time Taken
C1	85.44 %	69.86 %	77.65 %	37.4 s
C2	87.67 %	72.69 %	80.14 %	38.2 s
C3	86.95 %	74.51 %	80.73 %	38.5 s
C4	86.00 %	76.21 %	81.11 %	38.4 s

Notes:
SR, NR = recall rates for synthetic and natural images, respectively. AR = average recall rate.
C1 to C4 are classifiers, each employing some set of metrics.
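For reference, the recall rates can be computed as in the sketch below. It assumes AR is the plain (unweighted) mean of SR and NR, which agrees with row C1: (85.44 + 69.86) / 2 = 77.65. The function name and label strings are hypothetical, and both classes are assumed to be present in the data.

```python
def recall_rates(true_labels, predicted_labels):
    """Return (SR, NR, AR) as fractions between 0 and 1."""
    def recall(cls):
        # Predictions made for the images whose true class is `cls`.
        preds = [p for t, p in zip(true_labels, predicted_labels) if t == cls]
        return sum(p == cls for p in preds) / len(preds)

    sr = recall("synthetic")
    nr = recall("natural")
    return sr, nr, (sr + nr) / 2


# Toy example: 3 of 4 synthetic and 2 of 4 natural images are recognized.
truth = ["synthetic"] * 4 + ["natural"] * 4
guess = ["synthetic"] * 3 + ["natural"] + ["natural"] * 2 + ["synthetic"] * 2
print(recall_rates(truth, guess))  # (0.75, 0.5, 0.625)
```

Unlike a single overall classification rate, the average recall rate gives both classes equal weight: a classifier that labels everything as natural scores SR = 0 and NR = 1, so AR = 0.5 regardless of the class proportions.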

I'm going to sleep soon. The only reason I'm awake at this hour is to finish my research report, which is now at 37 pages. I'm still missing the discussion and conclusion chapters, as well as a third of the introduction. Work also has to be done on the formatting of the report.

I'm hoping I can mop up the remaining work by noon ~~tomorrow~~ later, since I still have another secret research report to complete by Friday.

I'll release more information and results on the Synthetic and Natural Image Classifier (that's the official name of the thingy) when I'm extremely free. That would be next Tuesday.

[25/04/08] Errors in the calculations were found and corrected.

Monday, April 14, 2008

Secret Research Project 1: Graphics Classifier

One of the 'secret research projects' I'm working on currently is a graphics classifier. Basically, the aim of the research is to build a classifier capable of differentiating between a graphics image and a realistic image.

To be clearer, a graphics image is an image which is artificial. This type of image is not directly captured from the physical environment. Hence, drawings, paintings, cartoons and clipart are considered to be graphics images.

On the other hand, realistic images are images which are directly captured from the physical environment. In other words, these are photographs of real objects.

A sample of a realistic image and a graphics image is shown below.

Left : Graphics Image
Below: Realistic Image

I'm now using simple metrics to characterize the images. Intuitively, graphics and realistic images should score differently on these metrics, which enables me to classify them. Some metrics that I have adopted, and the assumptions behind them, are:

Saturation Metric: Graphics, especially computer generated ones, tend to have higher saturation values compared to photographs, which are invariably less saturated (more faded/dull) due to natural effects.

Number of Colors Metric: Graphics tend to be composed of a small palette of colors compared to realistic images, which tend to occupy much of the spectrum.
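A rough sketch of how the two metrics above might be computed is shown below. The project itself used MATLAB and its exact definitions are not given here, so the mean HSV saturation, the raw distinct-color count, and the function names are all assumptions. An image is taken to be a flat list of (r, g, b) tuples with 0-255 channels.

```python
import colorsys


def mean_saturation(pixels):
    """Mean HSV saturation (0.0 to 1.0) over all pixels."""
    total = 0.0
    for r, g, b in pixels:
        _, saturation, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        total += saturation
    return total / len(pixels)


def distinct_colors(pixels):
    """Number of different (r, g, b) values that occur in the image."""
    return len(set(pixels))


def image_metrics(pixels):
    """Return (mean saturation, distinct color count) for one image."""
    pixels = [tuple(p) for p in pixels]
    if not pixels:
        raise ValueError("image has no pixels")
    return mean_saturation(pixels), distinct_colors(pixels)


# Toy clipart-like image: three pure red pixels and one white pixel.
clipart = [(255, 0, 0)] * 3 + [(255, 255, 255)]
print(image_metrics(clipart))  # (0.75, 2)
```

Under the stated assumptions, a graphics image would tend toward a high first value and a low second value, and a photograph toward the opposite.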

I've also used a number of other metrics, but I'll share those at a later time. In any case, the effectiveness of the existing metrics is reasonable but not spectacular, achieving only about a 60~70% correct classification rate.

However, by combining the different metrics into a single classifier system (for example, by boosting), I expect the performance of the system to improve.
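To illustrate what combining metrics by boosting can look like, here is a small AdaBoost sketch built from single-metric threshold rules. The post does not say which boosting variant is used, so AdaBoost is an assumption, and the six (saturation, color count) samples are invented toy numbers, not project data. Labels are +1 for graphics and -1 for realistic images.

```python
import math


def stump_predict(sample, j, threshold, polarity):
    """Threshold rule on metric j: +1 if polarity * (value - threshold) > 0."""
    return 1 if polarity * (sample[j] - threshold) > 0 else -1


def train_stump(samples, labels, weights):
    """Return (error, j, threshold, polarity) with the lowest weighted error."""
    best = None
    for j in range(len(samples[0])):
        for threshold in sorted({s[j] for s in samples}):
            for polarity in (1, -1):
                error = sum(
                    w for s, y, w in zip(samples, labels, weights)
                    if stump_predict(s, j, threshold, polarity) != y
                )
                if best is None or error < best[0]:
                    best = (error, j, threshold, polarity)
    return best


def adaboost(samples, labels, rounds=3):
    """Train a weighted vote of threshold rules."""
    n = len(samples)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, j, threshold, polarity = train_stump(samples, labels, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, j, threshold, polarity))
        # Increase the weight of misclassified samples, then renormalize.
        weights = [
            w * math.exp(-alpha * y * stump_predict(s, j, threshold, polarity))
            for s, y, w in zip(samples, labels, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, sample):
    """Sign of the alpha-weighted vote of all rules in the ensemble."""
    score = sum(a * stump_predict(sample, j, t, p) for a, j, t, p in ensemble)
    return 1 if score > 0 else -1


# Invented toy data: (mean saturation, distinct color count) per image.
samples = [(0.9, 50), (0.8, 800), (0.4, 40), (0.7, 500), (0.3, 600), (0.2, 700)]
labels = [1, 1, 1, -1, -1, -1]
ensemble = adaboost(samples, labels, rounds=3)
print([predict(ensemble, s) for s in samples])  # [1, 1, 1, -1, -1, -1]
```

In this toy set no single threshold on either metric gets all six samples right, but the three-rule vote does, which is the effect hoped for when combining the metrics.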

More updates on the secret project will come later.