QEST + FORMATS 2024
🇨🇦 Calgary, Canada · September 9-13, 2024

Artifact Evaluation

Reproducibility of experimental results is crucial to foster an atmosphere of trustworthy, open, and reusable research. To improve and reward reproducibility, QEST+FORMATS 2024 includes a dedicated Artifact Evaluation (AE). An artifact is any additional material (software, data sets, machine-checkable proofs, etc.) that supports the claims made in the paper and, in the ideal case, makes them fully replicable. In the case of a tool, a typical artifact consists of the binary or source code of the tool, its documentation, the input files (e.g., models analyzed or input data) used for the tool evaluation in the paper, and a configuration file or document describing the parameters used to obtain the results.

Submission of an artifact is mandatory for tool papers, and optional – but encouraged – for research papers if it can support the results presented in the paper. Artifacts will be reviewed concurrently with the corresponding papers, and the results of the artifact evaluation will be taken into consideration in the paper review discussion. However, the primary goal of the artifact evaluation is to give positive feedback to authors and to encourage and reward replicable research.

Benefits for Authors: By providing an artifact supporting experimental claims, authors increase the confidence of readers in their contribution. Accepted papers with a successfully evaluated artifact will receive a badge to be included on the paper’s title page. Finally, artifacts that significantly exceed expectations may receive an Outstanding Artifact Award.

Important Dates

All dates are AoE (Anywhere on Earth).

Phases are explained below.

Evaluation Criteria

The goal of this initiative is to encourage research that is openly accessible and remains reproducible in the future (future-proof). The Artifact Evaluation Committee (AEC) will assign a score to the artifact based on the notion of reproducibility as detailed in the ACM badging policy. The AEC will focus on both “Functional” and “Available” aspects by evaluating:

For example, artifacts that need to download third-party material from a private sharing link (e.g., a Dropbox link or a private web page) will not be considered future-proof.

Evaluation Process

The artifact evaluation is single-blind; in particular, this means that the artifact does not need to be anonymized.

The evaluation consists of two phases: the smoke-test phase (Phase I) and the full-review phase (Phase II), which proceed as follows.

Submission

An artifact submission consists of

Submissions shall be created through EasyChair at this link.

The Artifact Itself

In the spirit of reproducibility and future-proofing, some requirements are imposed on the artifact itself. If any of these requirements cannot be met, e.g., due to the use of licensed software that cannot be distributed, please contact the AEC chairs as soon as possible to discuss specific arrangements.

The artifact must contain the following.

  1. A README file, describing in clear and simple steps how to install and use the artifact, and how to replicate the results in the paper.
    • If applicable, the README file should provide a “toy example” to easily check the setup during Phase I.
    • In case network access is required by the artifact, an explanation of when and why it is required should be provided.
  2. A LICENCE file, which at the very least allows the AEC to download and execute the artifact.
  3. The concrete binaries as either a Docker image (preferred) or a VM image, containing everything that is needed to run the artifact (example packaging commands are sketched below, after this list).
    • For Docker: Include the complete image saved with docker save (potentially compressed with, e.g., gzip).
    • For VM: Use VirtualBox and save the VM as an Open Virtual Appliance (OVA) file.

    Including instructions and sources to build the tool is strongly encouraged, but does not replace providing the complete image.
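
For the Docker option, saving and compressing the image could look like the following sketch; the image, VM, and file names are illustrative placeholders, not prescribed names:

    # Export the complete Docker image (name/tag are placeholders) and compress it
    docker save mytool:1.0 | gzip > mytool-artifact.tar.gz

    # Reviewers can restore it with
    gunzip -c mytool-artifact.tar.gz | docker load

    # For the VirtualBox option, export the (placeholder) VM as an OVA file
    VBoxManage export MyToolVM -o mytool-artifact.ova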

The artifact must be made available through archive-quality storage (Zenodo, Figshare, etc.) that provides a citable DOI.

Evaluating the artifact must not incur any financial cost (e.g., by requiring execution on a paid cloud service).

It is recommended (but not required)

In general, it should be as simple as possible for reviewers to confirm reproducibility.

Sources and Reusability

Authors are also encouraged to include all sources, dependencies, and instructions needed to modify and/or (re-)build the artifact (e.g., through tarballs and a Dockerfile). This may, of course, rely on network access. We recommend striving to be as self-contained as possible. In particular, the artifact should contain as many of its dependencies as reasonably possible, and any downloads should only refer to stable sources (e.g., use a standard Debian Docker image as base) and precise versions (e.g., a concrete commit/tag in a GitHub repository or a Docker image version instead of an unspecific “latest”). This maximizes the chances that the tool can still be modified and built several years from now.
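
As a rough illustration, a Dockerfile along these lines pins both the base image and the tool version; all names, tags, and the repository URL below are hypothetical:

    # Pin a standard base image to a concrete release instead of "latest"
    FROM debian:12.5

    # Install build dependencies from the distribution's package manager
    RUN apt-get update && apt-get install -y --no-install-recommends \
        git build-essential ca-certificates

    # Fetch and build a precise tag of the (hypothetical) tool repository
    RUN git clone https://github.com/example/mytool.git /opt/mytool \
     && git -C /opt/mytool checkout v1.0.2 \
     && make -C /opt/mytool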

Badges

As indicated above, papers with a successfully evaluated artifact will place a badge on their title page in the camera-ready version. Note that the paper must clearly indicate the exact artifact version used for the evaluation, i.e., the DOI used for the submission. Of course, the paper may additionally link to the most recent version and/or source code repositories.

Sample LaTeX code to place the badge will be provided.
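
As a rough illustration only (the official sample may differ), placing the badge usually amounts to overlaying an image on the first page, e.g., with the tikz and graphicx packages; the file name artifact-badge is an assumption:

    % Illustrative sketch only, not the official sample code.
    % Requires \usepackage{tikz} and \usepackage{graphicx}, plus two LaTeX runs
    % for the "remember picture" option to resolve page coordinates.
    \begin{tikzpicture}[remember picture, overlay]
      \node[anchor=north east, xshift=-1cm, yshift=-1cm] at (current page.north east)
        {\includegraphics[width=2cm]{artifact-badge}};
    \end{tikzpicture}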

Outstanding Artifact Award

AEC members will nominate artifacts that significantly exceed expectations for an Outstanding Artifact Award. The AEC chairs will consider these nominations and may give no award or several awards. Awardees will receive a certificate during the social event.