King Laboratory Policy on Research Reproducibility
Conscientious scientists strive to ensure that their results are reproducible. Only reproducible results can be subjected to full scrutiny; non-reproducible results are inescapably and quite properly subject to some degree of suspicion. Moreover, reproducibility is its own reward: when scientific results are archived in a fully reproducible format, it becomes easier to unambiguously answer queries arising from publications, and to improve, expand, or extend analytical methods. Efforts made to ensure reproducibility are rewarded by enhancements to research organization and efficiency. Therefore, all research conducted in the King Laboratory, or in conjunction with the King Laboratory or its members, shall be fully reproducible. The following elements of policy are designed to realize this goal.
- Preparation of manuscript-specific research archives. Each manuscript prepared in the lab shall have a corresponding archive containing the data, codes, intermediate results, and text needed to fully and exactly reproduce the manuscript.
- Portability, completeness, and self-containedness of archives. Archived codes must be portable, self-contained, and self-reproducing. In particular, it must be possible to port the archive to a different machine, run the codes, and produce an exact copy of the entire manuscript, in all its detail. It follows from this that all code written for the project, and all source data, will be contained in the archive.
- External dependencies. Dependencies on external software, libraries, etc. must be documented; version numbers and provenance of the software must be noted. The archive should be constructed in such a way that when external dependencies are broken, re-running the codes will generate a clear error message about the broken dependency.
- Stochastic simulations. Results based on pseudorandom stochastic simulations must be reproducible in detail. This is facilitated by fixing the seed of any pseudorandom number generators used.
- Intermediate results. When results depend on lengthy or expensive calculations, the results of those calculations should be stored in an open format, along with the codes that generated them.
- Inclusion of datasets. The archive should, when permissible and practicable, contain copies of any datasets that played a role in the analysis. When there are restrictions on any data, the restricted data can be replaced by a surrogate dataset together with a statement of the precise provenance of the actual data, contact information for the person or organization responsible for access to the data, and a brief explanation of the nature of the restrictions. When dataset size makes inclusion of a copy impractical, precise instructions on how to obtain the exact data needed shall be supplied.
- Distinction between data and results. A strict and scrupulous distinction between data and results shall always be maintained. For this purpose, it shall be sufficient to maintain a copy of the original data, clearly so marked, distinct from any results generated using these data.
- Interactive software. Interactive software that does not support scripting is, by its nature, inimical to reproducibility and should be avoided when possible. When interactive software must be used, it should be used only for the production of text and for stylistic adjustments. No tables containing more than 12 items shall be typeset by hand. A copy of any longer table shall be stored in an open format independent of any interactive software, and all codes used to generate the table's contents shall be provided. Interactive modification of figures shall be limited to purely aesthetic adjustments. Apart from such aesthetic adjustments, it must be possible to reproduce the figure exactly using archived codes and data. In any case, high-resolution (print quality, uncompressed) final versions of all figures so prepared shall be stored in an open format independent of the interactive software.
- Timeliness. All codes necessary for the production of any publication shall be checked for adequacy and appropriately archived before initial submission. Should revisions or subsequent submissions be necessary, the archive will be updated as needed before submission of revisions.
- Exceptions to the policy. When a need for an exception to this policy arises, the matter should be discussed with the lab PI. The goal of the discussion will be the design of an alternative plan to achieve the goal of full reproducibility. Such a plan must be approved by the lab PI before an exception is allowable. Archives produced under such an alternative plan must contain a clearly labeled top-level text file explaining the structure of the archive and detailing the manner in which the results may be reproduced.
- Applicability of this policy. This policy shall apply to all research conducted by any member of the lab. When a lab member is peripherally involved in a project the principals of which are not lab members, the lab member should ensure that their own contributions are reproducible according to this policy, even if non-lab contributors do not adhere to it. Should any lab member be the corresponding author of any publication, this policy shall apply, regardless of where the research has been conducted.
- Amendments to this policy. This policy is a living document. Suggestions for additions and amendments are invited and should be the subject of open discussion in the lab.
- Tools to facilitate preparation of archives. This policy neither prescribes nor proscribes any particular methods or approaches to the construction of research archives. Some tools that have proved useful include version control systems (e.g., Subversion, Git), literate programming systems (e.g., Sweave, knitr), open-source typesetting systems (e.g., LaTeX, Markdown), and GNU Make. Other things being equal, open-source and platform-independent solutions have obvious advantages over proprietary and platform-specific ones. It is hoped that lab members will share novel solutions to aspects of the reproducibility problem.
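
As an illustration only, the following minimal Python sketch shows one way the elements on external dependencies, stochastic simulations, and intermediate results might be realized together. It is not a prescribed method. All names in it (`check_dependencies`, `simulate`, `cached_simulation`, and the JSON layout of the cache file) are hypothetical and chosen for this example; the sketch assumes Python 3.8 or later and uses only the standard library.

```python
import json
import random
from importlib import metadata
from pathlib import Path


def check_dependencies(required):
    """Raise one clear error listing every missing or mismatched package.

    `required` maps package names to the exact versions the archive was
    built with (a hypothetical convention for this sketch).
    """
    problems = []
    for name, wanted in required.items():
        try:
            found = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (need {wanted})")
            continue
        if found != wanted:
            problems.append(f"{name}: found {found}, need {wanted}")
    if problems:
        raise RuntimeError(
            "Broken external dependencies:\n  " + "\n  ".join(problems)
        )


def simulate(seed, n=5):
    """Draw n pseudorandom numbers from a generator with a fixed seed."""
    rng = random.Random(seed)  # private generator, so no global state is involved
    return [rng.random() for _ in range(n)]


def cached_simulation(path, seed, n=5):
    """Reuse stored draws if they match (seed, n); otherwise compute and store.

    The intermediate result is kept as JSON, an open text format, together
    with the parameters that generated it.
    """
    path = Path(path)
    if path.exists():
        stored = json.loads(path.read_text())
        if stored.get("seed") == seed and stored.get("n") == n:
            return stored["draws"]
    draws = simulate(seed, n)
    path.write_text(json.dumps({"seed": seed, "n": n, "draws": draws}, indent=2))
    return draws
```

A driver script in an archive might call `check_dependencies` first, so that a broken dependency stops the run with an explicit message before any results are regenerated, and then call `cached_simulation` so that an expensive computation is repeated only when its stored output is absent or was produced with different parameters.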