Musings of an Aspiring Polymath: Evolutionary Algorithms

Sunday, May 17, 2009

Secret Research Project 4: N Queens Problem with GA

Over the last couple of days I had an idea and nothing to do, so I implemented a genetic algorithm search for the N Queens problem. The N Queens problem is a toy problem in chess, where you are given N queens and a N x N chessboard, and where you are supposed to place the queens such that they do not attack each other.

The idea behind the GA search is simple; each individual in the population represents one configuration of the board. Better solutions can breed, thus (hopefully) conserving better positions. The details of the GA I used are as in the standard literature, employing uniform crossover and mutation with no elitism. The fitness function employed is the number of non-mutually-attacking queen pairs.

I tested the performance of the GA search algorithm on the N queen problem for N=4,6,8,10,12. Each problem was run 20 times, resulting in the following results:

N	4	6	8	10	12
Average Generations	2.0	184.5	195.4	838.0	396.2
Median Generations	1	19	38.5	320.5	295.5
Average* Generations	1.6	95.2	75.2	521.9	349.4
Solution Density	128	11664	1.8E6	1.4E7	6.3E8
Efficiency	0.82	1.22	24.3	264	17968

*Average after discarding the 2 highest and lowest extremes.
Solution Density is the number of board configurations divided by number of distinct solutions; hence representing the number of random searches before a solution is found.
Efficiency is the solution density divided by average number of GA searches.

The results of the GA search are not too bad, though I was expecting better performance. However, the scaling seems a bit haphazard; I am unable to explain why N = 12 runs better than N = 10. Another point which is unreflected in this report is that for N = 15 and beyond, this GA search will require considerable time to run. More improvements are necessary.

Lastly, here are some of the solutions found by the GA search. A solution each for the N = 8 and N = 12 cases are shown.

N = 8 Solution

N = 12 Solution

Sunday, November 30, 2008

Secret Reseach Project 3: Authorship Attributor

During the term, I developed a program that was able to attribute a document to its correct author. In other words, given a sample document, the trained program would be able to indicate who wrote it.

The theoretical basis for the authorship attributor is that different authors have different active vocabularies and different word preferences. Hence, this difference can be exploited to differentiate between writings of different authors, as long as enough of their prior work is known.

The program employed genetic algorithms to evolve simple rule-based classifier systems. The final performance on the test set, the Federalist Papers, was generally good, although slightly inferior to existing methods.

P	L	M	C	Average % Correctly Classified	Average No. of Active Rules
100	25	100	5	91.67	4
50	25	100	5	78.33	5
100	10	100	5	88.33	2
100	25	10	5	73.33	11.2
100	25	100	25	83.33	4.6
100	25	10	25	78.33	7.6

Table: Performance of Classifiers for Given GA Parameters

The key weakness of my proposed approach is that is too simple, employing only a voting framework of rules. Furthermore, the individual rules consider only the relative frequencies of pairs of word, which is an approach that is not generalizable. In other words, while the existing approach may be useful for a pairwise classifier, it is likely to be useless for developing a universal attributor capable of differentiating between any number of candidate authors.

Still, in terms of potential, I believe that this particular research project has more potential for improvement than my previous research projects. In particular, the rule combining system can be improved from the voting framework currently used. Furthermore, the rules used can be made more general versions with little modification. Additional criteria could also be included.

I hope to revisit this problem at a later point in time.