Boing Boing points to a description of an unconventional filing system proposed by Japenese economist Noguchi Yukio:
Algorithms folks should immediately recognize this as the Move To Front heuristic for dynamic list maintenance: After accessing any item in a linked list, move it to the front of the list. The cost of accessing an element of the list is equal to its rank in the list. So if you access the same item twice in a row, the second access has cost 1.
Moreover, assuming your file drawer has a fixed size, the Noguchi Filing System is also the same as the Least Recently Used paging algorithm buried inside every operating system on the planet: When your memory (or filing cabinet) overflows, throw away the page (or file) that you accessed least recently. The cost accessing a page is 0 if the page is in memory, and 1 if it isn't.
These are both classical examples of competitive online algorithms. An online algorithm is an algorithm that responds to a sequence of requests—for example, "Put this file in the drawer", or "Get my passport out of the drawer". The algorithm must handle each request as soon as it arrives. There is some cost associated with the algorithm's behavior—for example, the number of files that we examine. At the end of the day, we will most likely look back and realize that the algorithm made some sub-optimal decisions that led to extra work, but such wasted effort is inevitable. The algorithm can't look into the future; it has to handle each request immediately. Our goal in designing a good online algorithm is to minimize the amount of wasted effort.
This goal can be formalized as follows. Let A(X) denote the cost of online algorithm A on request sequence X, and let OPT(X) be the cost of the best possible offline algorithm, meaning an algorithm that knows the entire request sequence in advance. The competitive ratio of algorithm A is the maximum, over all request sequences X, of the ratio A(X)/OPT(X). If the competitive ratio is R, we call the algorithm R-competitive. We want this ratio to be as small as possible.
My poor departed high school English teachers are spinning in their graves at the ugly phrase "competitive ratio". After all, it's not the ratio that's competitive, but the algorithm. It really should have been called "competitiveness". But once the term was introduced, it stuck. Alas, we theoretical computer scientists aren't known for our L33t gr@mm3r 5ki11z.
In fact, both MTF and LRU are motivating examples in the paper of Danny Sleator and Bob Tarjan [pdf] that first introduced competitive analysis. Sleator and Tarjan proved the following theorems, among others:
- Move to Front is 2-competitive, and this is the best possible competitive ratio of any (deterministic) online list-access algorithm.
- Least Recently Used is M-competitive, where M is the number of pages that fit into memory, and this is the best possible competitive ratio of any (deterministic) online paging algorithm.
So no wonder the Noguchi Filing System is so attractive—in a very real sense, it's the best possible filing system!
Well, almost. We can do slightly better, at least on average, if we introduce some randomness. The following list-access algorithm, called BIT, is 7/4-competitive. Whenever you access a file that you've accessed before, move it to the front of the list. But the first time you access a file, flip a coin, and move the file to the front of the list only if the coin comes up heads.
Even this isn't the best algorithm known; another slightly more complicated algorithm is 8/5-competitive. On the other hand, the best lower bound known for the competitive ratio of randomized list-access algorithms is 1.50084, so there is still lots of room for improvement.
I must have known intuitively about random algorithms for a long time, since my filing system has a lot of randomness to it.
Kidding aside, here's what I do at the office.
1) New piece of paper comes in.
2) I will just drop it into one of the k piles I've got on my desk where k \approx 3.
3) When I search something out, I simply look first if I see it on top of one of the k piles, if not, I start browsing each one of the piles.
4) Once done with the paper, go back to 1.
Intuitively, having k piles is better than having 1. Interestingly, that's what introduces randomness into my sorting.
I'm too lazy this morning to do an analysis of this algorithm, but if someone does, please drop me a line.
Posted by: Daniel Lemire | October 14, 2005 at 09:37 AM
This is a 1993 invention? Hmmm, I think that Noguchi-san might have taken this idea from a programmer. Japanese Word Processors allow you to type in hiragana, but when you hit the space key to signal the end of the word, all the kanji that correspond to the phonetic arrangement you just typed pop up in a little menu. The kanji are numbered, and when you choose one, it pops to the #1 on the list, so the next time you type that combination of syllables, it's your first choice (the default setting are by general language frequency of use). It's not out of the question that a writer who saw this applicaiton of the algorithm appear before him on a daily basis would try to apply it somewhere else. Good observational skills, though.
Posted by: John | October 14, 2005 at 11:59 AM