Response to ACL 2023 Questions

· September 6, 2023

At ACL 2023, there was a plenary session dedicated to updates from ARR and questions from the community (slides). We did not have time to answer all of the questions in the session and not everyone in the community was able to attend the session. To be as open as possible, below we have written answers to all of the questions submitted online and asked in person. In some cases, we have grouped together related questions. We have also created a video explaining the details of ARR and rewritten our author information page (thanks to Nathan Schneider for leading the rewrite!) to provide clarity on the process.

Some questions have simple clear answers. Other questions involve policy decisions where we provide the current answer. We acknowledge that other policies could exist. A survey will be conducted in 2024 to get input from the community on the path forward for these kinds of policies.

ARR Process questions

The system is very convoluted and complicated to understand even for usual ACL members. How do we ensure this doesn’t act as a deterrent for newcomers in the field and hence doesn’t affect diversity?
The process used to be so complex that it excluded some of us who were not as informed/used to it. Can you please make more efforts in simplifying it and explaining it in a SHORT post?
I have just commented on ARR: decision-making is very complicated at ARR. I think it should be made similar to SoftConf

We want to make the process as clear as possible. We have developed new materials to guide authors (video and author page) and we are happy to answer questions. At a high level, the process is the same as submission to a conference, except that after receiving reviews, authors decide whether to have their paper considered for the conference or go through another round of reviews.

Will there be clear cut-offs for when to submit to ARR in order to guarantee reviews back before a conference commitment deadline?
How will timely reviews be guaranteed?
Will the system ensure a max delay to get the reviews to a paper?

Yes, ARR now guarantees reviews are ready within two months after the submission date. Since switching to two-month cycles we have consistently achieved this goal.

After I receive reviews, can I revise and commit at the same time? Or can I only commit after receiving reviews from a previous cycle?
Wouldn’t it be great if one could commit in parallel to conferences with the same commitment deadline, like AACL 2023 and EMNLP 2023?

No, you can choose either to revise or to commit. This is equivalent to standard policies for double-submissions to conferences: while your paper is being considered for acceptance at one conference you cannot simultaneously submit for review to another conference.

Do we submit again or resubmit, i.e., is the link with the initial paper known? Is any trace kept of the reviewing steps of a given paper (number of reviews), and who has access to this history?

Each cycle has a separate submission page. When you make your resubmission you include the link to the previous version. This is provided to reviewers of the current paper. Note, if new reviewers are assigned to the paper they do not have access to the old reviews until after they submit their review (new in August 2023).

The ARR editors and tech/support teams have access to the history of papers and use this to calculate aggregate statistics as shown on the ARR Stats page.

What will the tracks be? Can reviewers choose more than one? If so, will review load be considered overall?

The tracks are listed in the submission form and in our CFP. Reviewers can choose multiple tracks. Their total load is computed across all tracks. Authors choose exactly one track for each submission.

Is there a standard for how much a paper must be modified in order to request that old reviews get hidden?

Old reviews are never completely hidden in the current ARR process. As of August 2023, new reviewers gain access to the old reviews only after they have independently reviewed the current paper; until they submit their own review, they cannot see the old reviews.

ARR Design Questions

The biggest problem with ARR now is that ‘no decision’ is made, which leads to frustration and further delays in publication. How can the system deal with that?
The day-one design flaw was the fact that you can’t get a decision. In a journal or conference you get a decision, whereas no matter how good your ARR scores are, your paper may not be accepted. Is there consideration by the ACL Exec or ARR Board of changing that, so you actually get a decision?

With the current structure of ARR, this depends on when conference PCs decide to set their commitment deadlines. This year, the October, December, and February cycles have commitment deadlines immediately following the cycle, which will provide decisions (e.g., a submission on October 15th that is committed to EACL will get a decision by January 15th). For comparison, in a conference review system without ARR, a typical review+decision cycle is about 3.5 months. The system with ARR can easily simulate older systems: simply commit to the adjoining conference irrespective of the strength of the reviews. ARR actually makes the system strictly more flexible by creating a post-review decision point for authors, giving them the ability to make more informed decisions in the best interest of their paper. Therefore, while this may be ARR’s biggest problem, the system is still strictly better than existing conference review systems.

In the future, assuming ARR continues and the community prefers all cycles to have decisions, there are a range of ways to achieve that:

  • Have fewer cycles, where each cycle is immediately followed by a commitment deadline.
  • Have more than one commitment deadline per conference.
  • Have ARR directly make acceptance decisions for the conferences.

The motivation for ARR was supposed to be to reduce the surge of reviews and to spread them across the year, but given there is a “submission by” deadline for each conference, how is this accomplished?

Spreading reviewer load is one of several motivations for ARR. The hope is that papers that are ready earlier get reviewed in earlier cycles, with the possibility of being posted on arXiv, and are then committed to a conference when a suitable deadline comes along. This does happen, though cycles without an immediate commitment deadline have been smaller. This is understandable: the change to several cycles a year is recent, and our community is still used to working toward conference deadlines. We expect that as ARR continues to offer multiple cycles, the community will slowly adapt to this new way of getting papers reviewed, which in the long run should spread reviewer load more evenly across the year.

Will LREC, COLING, CoNLL, WMT, and other NLP conferences be part of this initiative?
Will all workshops be part of the same process?

ARR is happy to provide reviews to a range of venues. It is up to the PCs of those events to choose whether to participate. Note, however, that we instruct reviewers to assume they are reviewing a submission for ACL. That may not be a good fit for venues with narrower and specialised foci, such as workshops.

Are reviews still sticky, or will authors have the option to suppress previous reviews/restart from zero state?

Reviews are sticky, i.e., when a paper is resubmitted the old reviews are available to reviewers. Note, reviewers and AEs are not sticky. Authors may choose to get new reviewers and/or a new AE when they resubmit. If authors do request new reviewers, the current implementation hides old reviews from the new reviewers until they submit their own review, independently.

Will we ever consider public reviews like for ICLR?
Why not make the reviews open in public domain? Making them public would increase the responsibility of the reviewers as well as the authors.
Why not consider making reviews non-anonymous (signed by the reviewers)? The quality of reviews would improve, and they could be publicly accessible. Some journals operate this way and the results are much better.

There is no plan to add public reviews to ARR at this time or to reveal reviewer identities to the public. These are both possible within OpenReview and could be implemented if the community strongly supports them.

Did you consider making ARR a full TACL-style journal and thus completely decouple acceptance from conference presentation?

ARR was created based on community input to the ACL reviewing committee. For their original proposal and context for the decision process, see this page.

I think there should be at least two non-student reviewers on each paper. What is the motivation for the relaxed restriction in the current policy?

For context, we currently require (technically speaking, strongly encourage the matching algorithm to produce) at least one non-student reviewer per paper. Requiring more non-student reviewers would be difficult to do while also (a) avoiding heavy reviewing loads and (b) getting the best expertise matching we can. It’s also worth noting that many student reviewers are senior students with significant experience.

Will there be new conferences or will the ACL conference deadlines be more spread out throughout the year?

The timing of each conference’s commitment deadlines is up to the PCs of that conference. For 2024, EACL, NAACL, and ACL coordinated their deadlines with each other and ARR, leading to deadlines in October, December, and February respectively. We are not aware of plans for other conferences next year.

PC decisions on ARR papers

Why does the conference still make an accept/reject decision after ARR decides that a paper is good? Doesn’t this create extra work? What will the role of conference PCs in the future be?
Isn’t it unfair that a 4 or 5 paper is then rejected? What are authors supposed to do with this decision?

Separating accept/reject decisions from the review process is how we preserve PCs’ agency in the creation of the technical program. ARR is meant to support the review process only.

In a typical conference, these steps happen:

  1. Reviewers write reviews.
  2. Area Chairs look at the reviews, weigh a range of factors, write a meta-review, and make a recommendation.
  3. Senior Area Chairs look at all of the papers in the track and develop recommendations that take into consideration calibration across ACs (e.g., one AC may tend to be more generous than another).
  4. Program Chairs consider the SAC recommendations, work to calibrate across tracks, and create a final program by making final accept/reject decisions.

ARR only does (1) and part of (2) since Action Editors in ARR provide a score, but not an explicit recommendation.

Note that in a regular conference it is also the case that seemingly “strong papers” get rejected because a score is not the only way to assess the relevance/quality of a paper and PCs have full agency to decide how to create the technical program.

Will a rejection after committing to a conference be justified? This time decisions were made without further notice.
Can you highlight reasons why some ARR papers that scored 5 may have been rejected? What could authors do with those papers?

The commitment process is run by PCs. They could provide a justification to authors, but it was noted in the session that it would involve a significant additional effort on their part. Note that in standard conference reviewing, papers with positive scores are sometimes rejected and no additional comments are provided beyond the meta-review. We hope to improve this part of the process with cooperation from conference PCs.

Governance Questions

Why was ARR chosen as the sole submission system for 2024 conferences when the ACL 2023 statistics show clear preference for softconf?

This question makes the assumption that submission patterns in 2023 show a clear preference. During 2023, the ARR EICs intentionally worked with PCs to schedule the deadlines so that the direct submission deadline was slightly later, which encouraged people to go direct (since it gave more time before submission). This succeeded, leading to smaller cycles in which the ARR team could focus on improving the system.

The decision to use ARR only for EACL, NAACL, and ACL in 2024 was made by the PCs of the respective conferences.

Finally, we need to separate the infrastructure from the review process. Softconf and OpenReview are software tools we use to support the review process. ARR and direct conference submission are two styles of reviewing processes, run using those tools. PCs could choose various combinations of these. For example, EMNLP 2023 direct reviewing is happening on OpenReview.

If you think the survey results are not valid, why not continue in hybrid submission mode and then do a new survey?

We are not sure which survey this is referring to. The 2022 survey indicated strong support for ARR, but also raised a range of issues, which we have worked to address over the last year (see the presentation at ACL 2023). Another survey will be conducted in 2024 to get community input based on the experience of ARR-only submissions at EACL, NAACL, and ACL.

What plans do you have to ensure the community remains informed about all of the steps and all of the changes and questions that arise so that the community feels like it understands the system?

We will be working with the PCs of conferences to share updates on their websites, our blog, and social media. We are also in the process of updating our website (thank you to Nathan Schneider for lots of helpful input and edits!), and have prepared a detailed video with guidance for authors.

Anonymity Period Policy

Can we remove anonymity period restrictions for ARR submissions? NeurIPS does not ban preprints from being uploaded at any point, so long as they are not advertised as under peer review.
What is the point of the anonymity period? You can still find the article if it was submitted more than a month before. What are you trying to achieve?
How does anonymity/arXiv work with ARR? Can one post after receiving reviews? If so, does one have to wait 30 days afterwards before resubmitting? Before committing?
How does the anonymity period work in the new system, i.e., for submitting versions to arXiv?
Is there a plan to change the policy on non-anonymized preprints during ARR? Other similar (non-ACL) conferences allow these to be uploaded to arXiv during the review period.
Are there two anonymity periods or just one, if ARR is going to be used for all ACL conferences?

The anonymity policy is set by the ACL Executive. ARR will follow any changes to the policy.

The simplest way to understand the process is that there is an anonymity period for the ARR cycle and an anonymity period for commitment:

  • For the ARR cycle, you cannot post a paper to arXiv starting one month before the submission deadline and ending at the end of the cycle.
  • For commitment, you cannot post a paper to arXiv starting one month before the commitment deadline and ending once decisions are announced.

Our new video explains this in detail. These two examples may be helpful:

  1. If you submit to the October cycle of ARR, and then commit to NAACL in February, your paper should not be posted to arXiv from September 15th to December 15th (review time + anonymity period) or from January 15th until the conference decision (commitment time). You may post to arXiv in the gap between the two (December 15th - January 15th).
  2. If you submit to the October cycle of ARR, and then commit to EACL in December, your paper should not be posted to arXiv between September 15th and January 15th (this is because the two anonymity periods overlap, creating a single contiguous period).
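
For those who like to see the rules spelled out mechanically, here is a minimal, purely illustrative Python sketch (not an official ARR tool) that encodes the two blackout windows described above. The dates roughly mirror example 1; the commitment deadline, the decision date, and the 30-day approximation of “one month” are assumptions for illustration only.

```python
from datetime import date, timedelta

ONE_MONTH = timedelta(days=30)  # assumption: "one month" approximated as 30 days

def blackout_windows(arr_deadline, cycle_end, commit_deadline, decision_date):
    """Return the two periods during which the paper should not appear on arXiv."""
    review_window = (arr_deadline - ONE_MONTH, cycle_end)         # ARR-cycle rule
    commit_window = (commit_deadline - ONE_MONTH, decision_date)  # commitment rule
    return [review_window, commit_window]

def can_post(day, windows):
    """True if `day` falls outside every blackout window."""
    return all(not (start <= day <= end) for start, end in windows)

# Dates roughly following example 1 (October cycle); the commitment deadline
# and decision date below are placeholders, not official conference dates.
windows = blackout_windows(
    arr_deadline=date(2023, 10, 15),
    cycle_end=date(2023, 12, 15),
    commit_deadline=date(2024, 2, 15),
    decision_date=date(2024, 3, 15),
)
print(can_post(date(2023, 12, 20), windows))  # True: in the gap between the two windows
print(can_post(date(2023, 11, 1), windows))   # False: inside the review-cycle window
```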

ARR also provides the option of anonymous preprints. When you submit to a cycle you can choose to opt-in and have your paper available with no indication of the author identities.

As a follow-up, how are arXiv and Twitter anonymous, regardless of when the posting is done?

Neither arXiv nor Twitter is anonymous. The preprint process provided by ARR is anonymous.

Review Quality

Perhaps use conference acceptance statistics to help measure solidity of individual reviewers?

We are not sure we agree. Whether a reviewer agreed or disagreed with an eventual decision should not be taken as a measure of the quality of their assessment of the paper. A paper may be accepted even when valid points are raised by a reviewer, and a paper may be rejected in spite of some strengths highlighted by a reviewer.

There is no quality information about ARR reviews. The ARR SACs do not take acceptance decisions. So we will have many unexpected rejects a long time down the line. Novelty suffers. Not good.

ARR reviewers are drawn from the same broad ACL community as conference reviewers, so ARR reviews are not expected to be of lower quality. If anything, since reviewing loads may be spread out, at least in leaner cycles, we expect ARR reviews to be of marginally higher quality. As described above, SACs come into the picture at decision-making time, not at reviewing time (in both ARR and conferences), so the process is not much different. Indeed, even in typical conference reviewing some papers get repeatedly rejected and their novelty suffers. We do not anticipate that this phenomenon will increase because of the decoupling of reviewing and decision making.

ARR is contributing, with reviewer and author consent, to the creation of a dataset of reviews. We hope that resources like that and efforts by our team will lead to improvement in review quality over time.

How do we make sure reviewers don’t ignore rebuttals and actively engage with the authors during multiple rounds?

Due to the shortened review cycles in ARR, the author response is meant for correcting factual inaccuracies only. If reviewers hold conflicting opinions about your paper, it may be hard to reconcile them during the author response phase; it may be better to resubmit to the next cycle and ask for a different set of reviewers.

How do we solve the issue of reviewers not doing the work, that is, two-line reviews? How do we force non-students to review way more? Can we force 1 publication = 6 reviews at least?

This is indeed an issue in our community for all reviewing (not just ARR), and ARR does not have a magical solution here. We do agree that creating a centralized reviewing system enables some possibilities that were not there in standalone conference reviewing (e.g., tracking service over time). We have no plans for rules like those proposed in the question. Our current focus is a successful implementation for the EACL, NAACL, and ACL 2024 cycles.

Over 800 papers reported a reviewer in each of R1/R2/R3 during the ACL’23 review cycle (ACL 2023 reviewing report). If that many authors request new reviewers in the ARR cycle, is there a large enough reviewing pool?

Our pool is large enough to accommodate many such requests. Note that a reviewer may be reported for issues on one paper but do a great job on another that is a better fit for their expertise. Very few of the ACL 2023 reviewers were reported on multiple papers. For this reason, reviewers are not completely removed from the pool after a single report and can continue to contribute by reviewing papers other than the one they were reported on.

Should reviewer “quality” over time be monitored?

Yes, this is one of the motivations for ARR, but not our current focus.

Will student and non-student reviews be weighted differently? Should the degree obtained (BS/MS/PhD) be the main determiner, rather than the stage (student, non-student)? What about speciality areas?

We do not instruct action editors to weigh reviews differently based on reviewer expertise.

Various

Recommendation: let meta reviewers make two venue suggestions if they are happy with the paper.

Thanks for this suggestion. The current structure of ARR is that all reviews and decisions are done with the assumption that papers are intended for ACL or similar conferences.

For ACL 2023 there was approximately one month more time for direct submission than for ARR. Will the time from submission to publication increase?

The timing of the 2023 conference deadlines and the ARR deadlines was intentionally set to reduce the load on ARR. In the coming year, when conferences are ARR-only, the time will be shorter (e.g., if you submit on October 15th, and commit to EACL on December 15th, you will receive a decision by January 15th or thereabouts).

What will be the relation with TACL and ARR in the future? Will TACL also move to the ARR platform?
What role does TACL serve in an ARR-only world?

We are not aware of plans to move TACL to the ARR platform. For the foreseeable future, TACL will continue as it is now, both in terms of the review process and appearance at conferences.

I submitted to ARR last year. Revised and resubmitted. Did not request new reviewers but still got THREE new reviewers…what is the point if there is no continuity?

There are two possibilities here. First (more likely), early in 2022 there were issues with the reviewer reassignment process, which have since been resolved. Second (less likely), if all three reviewers indicated they were unavailable, we would not have been able to assign them. Note that when we now ask reviewers for their availability, we give them the option to say that they will have bandwidth to review resubmissions, but not new papers.

How do I know when my reviews are “good enough”?

For context, this question was asked online during the session after a comment that you can go through multiple cycles of ARR until you are happy with your reviews. The decision is subjective and depends on the reviews. In terms of scores, as shown at ACL, papers with a meta-review score of 5 almost always get in, papers with a score of 4 usually get in, and papers with a score of 3 sometimes get in. The reason there is not a clear mapping from numbers to decisions is that the content of the reviews is important - the issues raised by reviewers may vary in significance.

Are there clear guidelines for what counts as a “good review”? Is there numerical guidance per conference for what is likely accept or reject?
In order to evaluate the quality of a paper, we need to know which conference the paper is intended or committed to. Reviews are conditioned on the venue, and ARR decouples this.
As a SAC at NAACL 2022 it was extremely difficult to make decisions because the (meta) reviewers didn’t make recommendations for a specific venue - how should ACs make recs w/o knowing the venue?

We ask all reviewers to treat papers as submissions to ACL or a comparable conference. We have guidelines for reviewers and the numerical scores have descriptions.

What is the evolution on reviewer load? Early sign-ups had the problem of committing to (up to) 6 papers each cycle, AFAIK.

In the October, December, and February cycles, we expect loads to be comparable to regular conference loads. For cycles that do not have a conference commitment deadline immediately following, we expect loads to be low (at most 1-2 papers per cycle).

Much of your focus is on helping authors. How are you working on reviewer burnout?

Every cycle, we ask reviewers what load they are able to take on. This allows them to avoid burnout by adapting based on their own personal situation. Moreover, some of the reviews are for resubmissions, which require significantly less time to review.

How are we going to deal with proceedings compilation, which SoftConf used to provide?

An effort outside of ARR, supported by the ACL Exec, is underway to improve the proceedings compilation process from OpenReview (it is also needed for EMNLP 2023).

Why don’t we do a simple referendum of the NLP community: do you want ARR or not? We could decide based on that, rather than a small group deciding on something that affects the future of many.
Why is the ACL community (as we pay the memberships) not a part of deciding this important part of the research workflow?

ARR has been the focus of two large community surveys, first in 2020 for general feedback on the proposal, then in 2022 for feedback on the first experience with ARR. The second survey included Likert-scale questions to gauge support for ARR, and the community broadly supported it (see the full results here). The survey also raised a range of issues, which we have worked to address in the past year.

In addition, we want to highlight that the decision to go fully ARR for the upcoming EACL, NAACL, and ACL conferences was made by each PC team after discussions with ARR and with input from the ACL Exec and the ARR board. As has been the case at past *CL conferences, PCs have full agency in deciding how to manage their review process, and this time is no exception.

Process question: If I commit a paper to a conference but it gets rejected, am I allowed to try again and commit it to another conference?

Yes.

For ACL, there were six sources of presentations that appeared on the main conference schedule: (1) Direct submissions, (2) ARR, (3) SRW, (4) TACL, (5) CL, (6) Industry track. Which will use ARR now?

Of those, “(1) Direct submission” will not exist in 2024 for EACL, NAACL, and ACL. The others will continue as they are.

Will ARR adopt the soundness/excitement breakdown? How frequently will there be changes/experimentations with new scoring systems?
Will the multidimensional review score approach be taken? If so, which one?

Options for this are being discussed with the EACL, NAACL, and ACL PCs.

Regarding the future, we are currently focused on running the ARR process smoothly, before considering further changes.

Why not do the following: reviewers submit reviews, authors respond to reviewers and, at the same time, revise the paper accordingly. If the paper is accepted, that revision becomes the camera-ready!

We are not currently considering this proposal. Any changes to our review process need to be made in consultation with the ARR board. For major changes, consultation with the community would also be needed.

Does ARR necessarily imply OpenReview? Can we consider doing one change but not the other?

Of these options:

  1. Rolling review process on OpenReview
  2. Non-rolling process on OpenReview
  3. Rolling review process not on OpenReview
  4. Non-rolling process not on OpenReview

1, 2, and 4 can be done, but 3 would be difficult. In the 2022 survey (see here), we described (2) as “Do not keep rolling review, but do have an integrated reviewing system, with improvements / modifications”.

How will we communicate acceptance rates in a way that will be meaningful to those who make hiring and tenure decisions based on elite publication metrics?

Several options were proposed in the original ARR proposal. ACL 2022 reported two values in the proceedings:

“(a) (Number of accepted papers at ACL 2022) / (Number of papers that selected ACL 2022 as the preferred venue in ARR or were committed to ACL 2022). For ACL 2022, for the denominator we consider the 3,378 papers as explained above. Thus, the acceptance rate is 701 / 3,378 = 20.75% for the Main conference.”

“(b) (Number of accepted papers at ACL 2022) / (Number of papers committed to ACL 2022). For the denominator, we had 1,918 papers committed to ACL 2022, and thus, the acceptance rate is 701 / 1,918 = 36.54% for the Main conference.”
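
For clarity, here is a tiny illustrative Python sketch of those two definitions, using the ACL 2022 figures quoted above (the variable names are ours, for illustration only):

```python
# Reproduce the two acceptance-rate definitions (a) and (b) quoted above,
# using the ACL 2022 figures: 701 accepted papers, 3,378 papers that preferred
# or were committed to ACL 2022, and 1,918 papers committed to ACL 2022.
accepted = 701
preferred_or_committed = 3378  # denominator for definition (a)
committed = 1918               # denominator for definition (b)

print(f"(a) {accepted / preferred_or_committed:.2%}")  # 20.75%
print(f"(b) {accepted / committed:.2%}")               # 36.55% (quoted above as 36.54%)
```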

If people resubmit once before the deadline, how is this decreasing the reviewing load?

This question assumes that everyone submits once before the deadline. In reality some authors choose to submit to non-conference cycles. Moreover, for resubmissions, if the authors do not request new reviewers then the review process should be less of a load as the reviewers are already familiar with the paper.

Shorter people have trouble with QR codes etc when they are at the very bottom of the screen

This comment is about the layout of our slides in the plenary session. Thanks for mentioning this, we will keep it in mind for future presentations.

How do you obtain transparency without being open which has the possibility that leads to dissemination of harmful code?

We have an ethics reviewing process, which deals with potential harms of research. If the question means an author purposely providing harmful code while saying something else in the paper, this is hard to handle right now, along with other similar issues like an author purposely fabricating data.

Do you think a four-month round for improvement (two months for review, two for incorporating changes) is sufficiently fast in a fast-paced field like ours? Won’t the work at conferences become more outdated?

Any peer review process needs to give a meaningful amount of time to the peers for providing a rigorous review. Shortening this time will result in a further reduction in review quality and timeliness - issues that our community is already dealing with.

Most conferences have an exclusivity policy. How does that work at ARR?

See our CFP for details on our dual submission policy. The general principle is the same as for other venues: while a paper is under review at ARR it cannot be submitted elsewhere and vice versa.

Will the conferences in the coming year all be using the same review form, the ARR form? What advice would you give to PCs about interpreting the main score in the form?

Yes, they will all use the same form. This is simplest for PCs as it means they can get papers from any cycle and have consistent information in the reviews. We are guiding reviewers to treat the overall score as they would for submissions to ACL and PCs are aware of that approach.
