The NIPS consistency experiment was an amazing, courageous move by the organizers this year to quantify the randomness in the review process. They split the program committee down the middle, effectively forming two independent program committees. Most submitted papers were assigned to a single side, but 10% of submissions (166 papers) were reviewed by both halves of the committee. This let them observe how consistent the two committees were on which papers to accept. For fairness, they ultimately accepted any paper that was accepted by either committee.
The results were revealed this week: of the 166 papers, the two committees disagreed on the fates of 43 of them (25.9%). But this “25%” number is misleading, and most people I’ve talked to have misunderstood it: it actually means that the two committees disagreed more than they agreed on which papers to accept.
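To see why, here is a back-of-the-envelope sketch. The acceptance rate (assumed here to be roughly 22.5%, the conference-wide figure; it is not stated above) fixes how many papers each committee accepted, and each disagreement pairs an accept from one committee with a reject from the other, so if both committees accepted about the same number of papers, each side's accept list contains about half of the 43 disagreements:

```python
# Why 43 disagreements out of 166 means more disagreement than agreement
# on acceptances. Assumption (not stated above): each committee accepted
# roughly 22.5% of its papers, and both accepted about the same number.

n_papers = 166
n_disagree = 43
accept_rate = 0.225  # assumed, approximate

accepts_per_committee = accept_rate * n_papers  # about 37 accepts each

# Each disagreement is an accept on one side and a reject on the other,
# so each committee's accept list holds about half of the disagreements.
one_sided_accepts = n_disagree / 2              # about 21.5 per committee
accepted_by_both = accepts_per_committee - one_sided_accepts  # about 16

print(f"accepted by both committees:   ~{accepted_by_both:.0f}")
print(f"accepted by only one committee: ~{one_sided_accepts:.0f} (each side)")
print(f"share of one committee's accepts rejected by the other: "
      f"{one_sided_accepts / accepts_per_committee:.0%}")
```

Under these assumptions, only about 16 of each committee's roughly 37 accepted papers were also accepted by the other committee, so an accepted paper was more likely than not to be rejected by the parallel committee.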