Context (or why my opinion matters at all): I regularly review for the ML conferences + controls or transportation journals and have won an outstanding reviewer award (maybe more than one? I can't remember). Relative to community norms, I’d say I do a middling amount of reviewing.
In lieu of complaining on Twitter, I figured I’d try to write up my mental model of what I try to accomplish when I review a paper and the standards I have for what should be accepted at any tier of conference. At a basic level, my goal is to accept all papers that are correct, would be useful to at least one other researcher, and are on an appropriate topic for the conference. Note that `correctness` is a loose term and could also include ideas like being ethical, containing open-source code or data, etc. As additional benefits to the authors I want to:
- suggest ways that the writing could be improved since the authors are often too in the weeds to remember what is and isn’t clear
- suggest additional experiments that would strengthen the argument the paper is making. However, if these additional experiments are required to prove correctness of the results rather than simply make the paper better, I think the paper should just be rejected.
Note that the basic goal I set is significantly simpler than what is often set as the standard and makes no mention of novelty. If it’s true and someone is likely to build on it (i.e. it is useful), I think it should be accepted.
This model of reviewing, if made universal, would basically turn conferences into a rubber-stamp check of correctness instead of an attempt to measure the novelty and significance of results. I think that this would be a significantly more important role for conferences to play: a reader can easily scan arXiv abstracts and pick out novel or interesting papers to read, but it’s significantly more work for them to figure out whether a paper appears to be correct (even more true for newer students, who are more easily fooled). Below I address a few obvious criticisms of this model.
Top-tier conferences should have a novelty criterion. Otherwise, what makes them top-tier?
Yes, under this model of reviewing, top-tier conferences are less distinguishable. However, let me propose a different view of what makes things top-tier. What about a top-tier conference being one where you could have high confidence that results were thoroughly vetted, well-written and clear, highly likely to be true, etc.? My guess is that top-quality work is even more strongly correlated with being feasible to check than it is with novelty / significance as perceived by reviewers. Of course, I can’t prove this, but this blog post by Neil D. Lawrence appears to suggest that the correlation between reviewer-perceived quality and future citations is quite weak. If reviewers can barely estimate paper quality, why bother? Just check correctness.
Accepting all papers that are correct and useful to at least one other researcher would blow up the size of the conferences
You and I are clearly not reviewing the same distribution of papers. Okay, but let’s steelman this and say that this doubles the number of accepted papers. If there’s this much good and correct work, then we should either be expanding the size of conferences to make them more inclusive of all the new, good researchers or, as I would prefer, splitting into more small, focused conferences.
Correctness and usefulness to one researcher is too low a bar for a paper. It’ll lead to a deluge of published papers.
I think that if a result is true and useful to at least one other person, then the research community is benefiting. Yes, the number of papers being published right now is insane, but this speaks to a need for better filtering and searching mechanisms rather than a restriction on the number of published papers.