Review Form

Paper Summary

Describe what this paper is about. This should help action editors and area chairs to understand the topic of the work and highlight any possible misunderstandings.

Summary of Strengths

What are the major reasons to publish this paper at a selective *ACL venue? These could include novel and useful methodology, insightful empirical results or theoretical analysis, clear organization of related literature, or any other reason why interested readers of *ACL papers may find the paper useful.

Summary of Weaknesses

What are the concerns that you have about the paper that would cause you to favor prioritizing other high-quality papers that are also under consideration for publication? These could include concerns about correctness of the results or argumentation, limited perceived impact of the methods or findings (note that impact can be significant in both broad and narrow sub-fields), lack of clarity in exposition, or any other reason why interested readers of *ACL papers may gain less from this paper than they would from other papers under consideration. Where possible, please number your concerns so authors may respond to them individually.

Comments/Suggestions/Typos

If you have any comments to the authors about how they may improve their paper, other than addressing the concerns above, please list them here.

Reviewer Confidence

5 = Positive that my evaluation is correct. I read the paper very carefully and am familiar with related work.
4 = Quite sure. I tried to check the important points carefully. It’s unlikely, though conceivable, that I missed something that should affect my ratings.
3 = Pretty sure, but there’s a chance I missed something. Although I have a good feel for this area in general, I did not carefully check the paper’s details, e.g., the math or experimental design.
2 = Willing to defend my evaluation, but it is fairly likely that I missed some details, didn’t understand some central points, or can’t be sure about the novelty of the work.
1 = Not my area, or paper is very hard to understand. My evaluation is just an educated guess.

Soundness

Given that this is a short/long paper, is it sufficiently sound and thorough? Does it clearly state scientific claims and provide adequate support for them? For experimental papers: consider the depth and/or breadth of the research questions investigated, technical soundness of experiments, methodological validity of evaluation. For position papers, surveys: consider whether the current state of the field is adequately represented and main counter-arguments acknowledged. For resource papers: consider whether the data collection methodology, the resulting data, and the differences from existing resources are described in sufficient detail.

5 = Excellent: This study is one of the most thorough I have seen, given its type.
4.5
4 = Strong: This study provides sufficient support for all of its claims. Some extra experiments could be nice, but not essential.
3.5
3 = Acceptable: This study provides sufficient support for its main claims. Some minor points may need extra support or details.
2.5
2 = Poor: Some of the main claims are not sufficiently supported. There are major technical/methodological problems.
1.5
1 = Major Issues: This study is not yet sufficiently thorough to warrant publication or is not relevant to ACL.

Excitement

How exciting is this paper for you? Excitement is subjective, and does not necessarily follow what is popular in the field. We may perceive papers as transformational/innovative/surprising, e.g. because they present conceptual breakthroughs or evidence challenging common assumptions/methods/datasets/metrics. We may be excited about the possible impact of the paper on some community (not necessarily large or our own), e.g. lowering barriers, reducing costs, enabling new applications. We may be excited about papers that are relevant, inspiring, or useful for our own research. These factors may combine in different ways for different reviewers.

5 = Highly Exciting: I would recommend this paper to others and/or attend its presentation in a conference.
4.5
4 = Exciting: I would mention this paper to others and/or make an effort to attend its presentation in a conference.
3.5
3 = Interesting: I might mention some points of this paper to others and/or attend its presentation in a conference if there’s time.
2.5
2 = Potentially Interesting: this paper does not resonate with me, but it might with others in the *ACL community.
1.5
1 = Not Exciting: this paper does not resonate with me, and I don’t think it would with others in the *ACL community (e.g. it is in no way related to computational processing of language).

Overall Assessment

If this paper were committed to an *ACL conference, do you believe it should be accepted? If you recommend conference, Findings, and/or even award consideration, you can still suggest minor revisions (e.g. typos, non-core missing refs, etc.).

Outstanding papers should be either fascinating, controversial, surprising, impressive, or potentially field-changing. Awards will be decided based on the camera-ready version of the paper. ACL award policy: https://www.aclweb.org/adminwiki/index.php/ACL_Conference_Awards_Policy

Main vs Findings papers: the main criteria for Findings are soundness and reproducibility. Conference recommendations may also consider novelty, impact and other factors.

5 = Consider for Award: I think this paper could be considered for an outstanding paper award at an *ACL conference (up to top 2.5% papers).
4.5 = Borderline Award
4 = Conference: I think this paper could be accepted to an *ACL conference.
3.5 = Borderline Conference
3 = Findings: I think this paper could be accepted to the Findings of the ACL.
2.5 = Borderline Findings
2 = Resubmit next cycle: I think this paper needs substantial revisions that can be completed by the next ARR cycle.
1.5 = Resubmit after next cycle: I think this paper needs substantial revisions that cannot be completed by the next ARR cycle.
1 = Do not resubmit: This paper has to be fully redone, or it is not relevant to the *ACL community (e.g. it is in no way related to computational processing of language).

Best paper justification

If your overall assessment for this paper is either ‘Consider for award’ or ‘Borderline award’, please briefly describe why.

Further Questions

Limitations and Societal Impact

Have the authors adequately discussed the limitations and potential positive and negative societal impacts of their work? If not, please include constructive suggestions for improvement. Authors should be rewarded rather than punished for being up front about the limitations of their work and any potential negative societal impact. You are encouraged to think through whether any critical points are missing and provide these as feedback for the authors. Consider, for example, cases of exclusion of user groups, overgeneralization of findings, unfair impacts on traditionally marginalized populations, bias confirmation, under- and overexposure of languages or approaches, and dual use (see Hovy and Spruit, 2016, for examples of those). Consider who benefits from the technology if it is functioning as intended, as well as who might be harmed, and how. Consider the failure modes, and in case of failure, who might be harmed and how.

Ethical Concerns

Please review the ACL code of ethics (https://www.aclweb.org/portal/content/acl-code-ethics) and the ARR checklist submitted by the authors in the submission form. If there are ethical issues with this paper, please describe them and the extent to which they have been acknowledged or addressed by the authors.

Needs Ethics Review

Should this paper be sent for an in-depth ethics review? We have a small ethics committee that can specially review very challenging papers when it comes to ethical issues. If this seems to be such a paper, then please explain why here, and we will try to ensure that it receives a separate review.

Reproducibility

Is there enough information in this paper for a reader to reproduce the main results, use results presented in this paper in future work (e.g., as a baseline), or build upon this work?

5 = They could easily reproduce the results.
4 = They could mostly reproduce the results, but there may be some variation because of sample variance or minor variations in their interpretation of the protocol or method.
3 = They could reproduce the results with some difficulty. The settings of parameters are underspecified or subjectively determined, and/or the training/evaluation data are not widely available.
2 = They would be hard pressed to reproduce the results: The contribution depends on data that are simply not available outside the author’s institution or consortium and/or not enough details are provided.
1 = They would not be able to reproduce the results here no matter how hard they tried.

Datasets

If the authors state (in anonymous fashion) that datasets will be released, how valuable will they be to others?

5 = Enabling: The newly released datasets should affect other people’s choice of research or development projects to undertake.
4 = Useful: I would recommend the new datasets to other researchers or developers for their ongoing work.
3 = Potentially useful: Someone might find the new datasets useful for their work.
2 = Documentary: The new datasets will be useful to study or replicate the reported research, although for other purposes they may have limited interest or limited usability. (Still a positive rating)
1 = No usable datasets submitted.

Software

If the authors state (in anonymous fashion) that their software will be available, how valuable will it be to others?

5 = Enabling: The newly released software should affect other people’s choice of research or development projects to undertake.
4 = Useful: I would recommend the new software to other researchers or developers for their ongoing work.
3 = Potentially useful: Someone might find the new software useful for their work.
2 = Documentary: The new software will be useful to study or replicate the reported research, although for other purposes it may have limited interest or limited usability. (Still a positive rating)
1 = No usable software released.

Knowledge of/educated guess at author identity

5 = From a violation of double-blind-submission rules, I know/can guess at least one author’s name. (Note that non-anonymous preprints are permitted at any time, so this is not a violation.)
4 = From a preprint or allowed workshop paper, I know/can guess at least one author’s name.
3 = From the contents of the submission itself, I know/can guess at least one author’s name.
2 = From social media/a talk/other informal communication, I know/can guess at least one author’s name.
1 = I do not have even an educated guess about author identity.

(Optional) Conjecture as to author identity. Authors will not see your response.