Select random line from a file in a single pass
I was asked this in an interview a few years back and I though it was kind of neat. The question is how do you select a single line from a file randomly, such that each line has a equal probability of being selected. The naive solution is of course to count the number of lines in the file, and then to select a line and get it from the file. But can you do it in a single pass, without seeking through the file?
The solution is kind of neat, have a buffer to store a “selected” line from the file. Then you read each line in turn, you then randomly swap your “selected” line with the new line with decreasing probability. The probability decreases to compensate for the fact that previous lines are more likely to have been replaced already.
Here’s a C++ solution:
string selected_line; for(size_t linenumber=0;!file.eof();linenumber++) { string s = file.getline(); int64_t r = rand64(); r = r%linenumber; if(r==0) { selected_line = s; } } cout << selected_line << endl;
You can read more about this here.