Review Form

Paper Summary

Describe what this paper is about. This should help action editors and area chairs to understand the topic of the work and highlight any possible misunderstandings.

Summary of Strengths

What are the major reasons to publish this paper at a selective *ACL venue? These could include novel and useful methodology, insightful empirical results or theoretical analysis, clear organization of related literature, or any other reason why interested readers of *ACL papers may find the paper useful.

Summary of Weaknesses

What are the concerns that you have about the paper that would cause you to favor prioritizing other high-quality papers that are also under consideration for publication? These could include concerns about correctness of the results or argumentation, limited perceived impact of the methods or findings (note that impact can be significant in both broad and narrow sub-fields), lack of clarity in exposition, or any other reason why interested readers of *ACL papers may gain less from this paper than they would from other papers under consideration. Where possible, please number your concerns so authors may respond to them individually.

Comments/Suggestions/Typos

If you have any comments to the authors about how they may improve their paper, other than addressing the concerns above, please list them here.

Soundness

How sound and thorough is this study? Does the paper clearly state scientific claims and provide adequate support for them? For experimental papers: consider the depth and/or breadth of the research questions investigated, the technical soundness of the experiments, and the methodological validity of the evaluation. For position papers and surveys: consider whether the current state of the field is adequately represented and the main counter-arguments are acknowledged. For resource papers: consider whether the data collection methodology, the resulting data, and the differences from existing resources are described in sufficient detail. Please adjust your baseline to account for the length of the paper.

  • 5 = Excellent: This study is one of the most thorough I have seen, given its type.
  • 4.5
  • 4 = Strong: This study provides sufficient support for all of its claims/arguments. Some extra experiments could be nice, but not essential.
  • 3.5
  • 3 = Acceptable: This study provides sufficient support for its major claims/arguments. Some minor points may need extra support or details.
  • 2.5
  • 2 = Poor: Some of the main claims/arguments are not sufficiently supported. There are major technical/methodological problems.
  • 1.5
  • 1 = Major Issues: This study is not yet sufficiently thorough to warrant publication or is not relevant to ACL.

Overall Assessment

Would you personally like to see this paper presented at an *ACL event that invites submissions on this topic? For example, you may feel that a paper should be presented if its contributions would be useful to its target audience, deepen the understanding of a given topic, or help establish cross-disciplinary connections. Note: Even high-scoring papers can be in need of minor changes (e.g. typos, non-core missing refs, etc.).

  • 5 = Top-Notch: This is one of the best papers I have read recently, of great interest for the (broad or narrow) sub-communities that might build on it
  • 4.5
  • 4 = This paper represents solid work, and is of significant interest for the (broad or narrow) sub-communities that might build on it
  • 3.5
  • 3 = Good: This paper makes a reasonable contribution, and might be of interest for some (broad or narrow) sub-communities, possibly with minor revisions
  • 2.5
  • 2 = Revisions Needed: This paper has some merit, but also significant flaws, and needs work before it would be of interest to the community
  • 1.5
  • 1 = Major Revisions Needed: This paper has significant flaws, and needs substantial work before it would be of interest to the community
  • 0 = This paper is not relevant to the *ACL community (for example, is in no way related to natural language processing)

Reviewer Confidence

  • 5 = Positive that my evaluation is correct. I read the paper very carefully and am familiar with related work.
  • 4 = Quite sure. I tried to check the important points carefully. It’s unlikely, though conceivable, that I missed something that should affect my ratings.
  • 3 = Pretty sure, but there’s a chance I missed something. Although I have a good feel for this area in general, I did not carefully check the paper’s details, e.g., the math or experimental design.
  • 2 = Willing to defend my evaluation, but it is fairly likely that I missed some details, didn’t understand some central points, or can’t be sure about the novelty of the work.
  • 1 = Not my area, or paper is very hard to understand. My evaluation is just an educated guess.

Best Paper

Could the camera-ready version of this paper merit consideration for an ‘outstanding paper’ award (up to 2.5% of accepted papers at *ACL conferences will be recognized in this way)? Outstanding papers should be either fascinating, controversial, surprising, impressive, or potentially field-changing. Awards will be decided based on the camera-ready version of the paper.

  • Yes
  • Maybe
  • No

If the answer is Yes or Maybe, please justify your decision:

Further Questions

Limitations and Societal Impact

Have the authors adequately discussed the limitations and potential positive and negative societal impacts of their work? If not, please include constructive suggestions for improvement. Authors should be rewarded rather than punished for being up front about the limitations of their work and any potential negative societal impact. You are encouraged to think through whether any critical points are missing and provide these as feedback for the authors. Consider, for example, cases of exclusion of user groups, overgeneralization of findings, unfair impacts on traditionally marginalized populations, bias confirmation, under- and overexposure of languages or approaches, and dual use (see Hovy and Spruit, 2016, for examples of those). Consider who benefits from the technology if it is functioning as intended, as well as who might be harmed, and how. Consider the failure modes, and in case of failure, who might be harmed and how.

Ethical Concerns

Please review the ACL code of ethics (https://www.aclweb.org/portal/content/acl-code-ethics) and the ARR checklist submitted by the authors in the submission form. If there are ethical issues with this paper, please describe them and the extent to which they have been acknowledged or addressed by the authors.

Needs Ethics Review

Should this paper be sent for an in-depth ethics review? We have a small ethics committee that can specially review very challenging papers when it comes to ethical issues. If this seems to be such a paper, then please explain why here, and we will try to ensure that it receives a separate review.

Reproducibility

Is there enough information in this paper for a reader to reproduce the main results, use results presented in this paper in future work (e.g., as a baseline), or build upon this work?

  • 5 = They could easily reproduce the results.
  • 4 = They could mostly reproduce the results, but there may be some variation because of sample variance or minor variations in their interpretation of the protocol or method.
  • 3 = They could reproduce the results with some difficulty. The settings of parameters are underspecified or subjectively determined, and/or the training/evaluation data are not widely available.
  • 2 = They would be hard pressed to reproduce the results: The contribution depends on data that are simply not available outside the authors’ institution or consortium and/or not enough details are provided.
  • 1 = They would not be able to reproduce the results here no matter how hard they tried.

Datasets

If the authors state (in anonymous fashion) that datasets will be released, how valuable will they be to others?

  • 5 = Enabling: The newly released datasets should affect other people’s choice of research or development projects to undertake.
  • 4 = Useful: I would recommend the new datasets to other researchers or developers for their ongoing work.
  • 3 = Potentially useful: Someone might find the new datasets useful for their work.
  • 2 = Documentary: The new datasets will be useful to study or replicate the reported research, although for other purposes they may have limited interest or limited usability. (Still a positive rating)
  • 1 = No usable datasets submitted.

Software

If the authors state (in anonymous fashion) that their software will be available, how valuable will it be to others?

  • 5 = Enabling: The newly released software should affect other people’s choice of research or development projects to undertake.
  • 4 = Useful: I would recommend the new software to other researchers or developers for their ongoing work.
  • 3 = Potentially useful: Someone might find the new software useful for their work.
  • 2 = Documentary: The new software will be useful to study or replicate the reported research, although for other purposes it may have limited interest or limited usability. (Still a positive rating)
  • 1 = No usable software released.

Knowledge of/educated guess at author identity

  • 5 = From a violation of double-blind-submission rules, I know/can guess at least one author’s name. (Note that non-anonymous preprints are permitted at any time, so this is not a violation.)
  • 4 = From a preprint or allowed workshop paper, I know/can guess at least one author’s name.
  • 3 = From the contents of the submission itself, I know/can guess at least one author’s name.
  • 2 = From social media/a talk/other informal communication, I know/can guess at least one author’s name.
  • 1 = I do not have even an educated guess about author identity.

(Optional) Conjecture as to author identity. Authors will not see your response.