Wednesday, July 09, 2008

Good-Enough Random Number Generators from Complex Systems

Given a sufficiently complex system, it might be possible to create a good-enough random number generator.

Of course, the definition of a complex system is lacking. In my usage, a complex system is one where it is extremely difficult to predict the next state of the system, as there are too many variables to consider. Such systems tend to exist in the real world, where there are many chaotic factors.

In particular, I believe that something that is inadvertently created by many users is considerably complex. For example, the number of web users online, or the number of students on campus at any moment, are both figures that are complex and difficult to predict.

One possible way to implement a good-enough random number generator from a complex system is as follows:

Using a number (for example, three) of seed words, google the seed words. Access the first document of the results, and analyze the document. Use the total number of words in the document, modulo by a small base, as the random number.

For repeated number generation, take the fourth, first, and fifth most frequent terms of the document as seed words for the next query, following the same procedure for repeated searches.

While it is possible that the same seed words would appear after some number of searches, the dynamic nature of the web (documents continuously being created and destroyed), the results would vary from search to search. Furthermore, the Google server being used might vary from time to time too.

Such a random number generator would be difficult to analyze and to predict the results of. However, it might be possible to control the results if the adversary were to be able to control the first seed words. For example, a search for a set of hapaxes would lead to a single document result, which could be planted by the adversary. Subsequent results could then be controlled in the same manner.

It would be interesting, though, to analyse the statistical qualities of such a random number generator. If any patterns were discovered, it could be indicative of a sort of pattern in online documents/search engines.

PS: The numbers in the random number generator are actually "nothing up my sleeve" numbers. They are the digits of PI. In base ten, of course.

No comments: