Replication: Practical Advice for Authors and Journals
by R. Michael Alvarez, Professor of Political Science, Caltech
Research transparency and replication are important issues today in the quantitative social sciences. Many political science journals, especially those that publish quantitative research, have developed policies regarding the provision of replication materials for the manuscripts they publish, as have journals in other social sciences.
There are many good reasons for journals to develop strong replication policies. As I found during my time as co-editor of Political Analysis, when we helped to lead the development of research transparency and replication policies in political science, the provision of replication materials prior to paper publication helps to ensure that the results reported in the publication are consistent with what the data and code produce.
The replication materials for papers published in journals like Political Analysis are stored in a journal Dataverse, and are used by other scholars, both to learn new techniques and methods, and in their teaching. Occasionally, replication materials are used by scholars when they want to examine the robustness of an empirical claim or method. So strong replication policies are moving quantitative political science into a new era, and helping authors practice better social science.
The development of journal replication policies for quantitative research is an important movement in political science. But in a recent article in PS: Political Science & Politics, Ellen Key surveyed the policies of an array of political science journals and found great variation across the journals in their enforcement of replication policies. The important conclusion from Key’s paper is that, in addition to developing replication policies, journals must also develop mechanisms to enforce those policies.
When we read Key’s paper, Lucas Núñez and I noticed that there was a paper listed by Key, published in Political Analysis, that Key coded as lacking replication materials. We examined Key’s replication data from her PS paper, and found the paper in dispute. This paper in fact did have available replication materials, but they were archived in a format that was not usable for some scholars.
This led Key, Núñez, and me to a discussion of an emerging problem with journal research transparency and replication policies: a lack of guidance for authors about the organization, clarity, and usability of replication materials. The result of this friendly discussion was a collaboration among the three of us, and the first paper from that collaboration was recently published in PS: Political Science & Politics, “Research Replication: Practical Considerations.”
The three of us found that we had accumulated a great deal of knowledge about the quality of available quantitative replication materials: Key, through her research for her original article; Núñez, through his assistance to Jonathan Katz and me in managing Political Analysis’s replication process; and myself, in my recent position as co-editor of Political Analysis.
During these experiences, we all saw many examples of nonexistent or incomplete documentation for replication materials, code or documentation that was unclear and poorly written, and data that was provided in difficult-to-use formats. All of these problems make the use of some replication materials difficult, if not impossible.
So we set out to provide practical guidance for authors as they develop required replication materials. The typical set of replication materials should contain documentation that describes the elements contained in the replication materials, explains how the provided code works, and tells users about specific issues that they might encounter when running the code (for example, non-standard packages that might need to be used).
The documentation and the code should be clearly written. The code, in particular, should contain only the steps necessary to reproduce the figures, tables, and other quantities reported in the paper, and when executed, it should reproduce each table and figure exactly as it appears in the paper. Moreover, materials that include multiple files of code should contain guidance as to the interdependence of the files, if any. Data should be provided in usable and common formats (minimally comma-delimited), and should contain only the data needed to reproduce the results reported in the paper. In particular, authors always need to keep in mind that data provided for replication will typically be made public, and thus they need to make sure that their data has been completely scrubbed of any identifying information about the units of analysis in the data, to comply with privacy regulations.
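As a rough illustration of the data guidance above, the following Python sketch keeps only the variables an analysis needs, refuses to export any column flagged as identifying, and writes the result as a comma-delimited file. It is a minimal sketch, not a procedure taken from our paper: the function names, column names, and sample records are all hypothetical, and dropping direct identifiers is only one part of protecting the privacy of the units of analysis.

```python
import csv
import io


def prepare_replication_rows(rows, analysis_columns, identifying_columns):
    """Keep only the analysis columns; refuse to export an identifying one.

    All names here are hypothetical and chosen for illustration only.
    """
    overlap = set(analysis_columns) & set(identifying_columns)
    if overlap:
        raise ValueError(
            f"identifying columns requested for export: {sorted(overlap)}"
        )
    # Every column not listed in analysis_columns is dropped.
    return [{col: row[col] for col in analysis_columns} for row in rows]


def write_csv(rows, columns, handle):
    """Write rows as comma-delimited text with a header line."""
    writer = csv.DictWriter(handle, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)


# Made-up raw records: two identifying fields, two analysis variables.
raw = [
    {"respondent_name": "A. Example", "zip_code": "00000", "age": 34, "turnout": 1},
    {"respondent_name": "B. Example", "zip_code": "00001", "age": 58, "turnout": 0},
]

analysis_columns = ["age", "turnout"]
identifying_columns = ["respondent_name", "zip_code"]

cleaned = prepare_replication_rows(raw, analysis_columns, identifying_columns)

# An in-memory buffer stands in for the file that would be archived.
buffer = io.StringIO()
write_csv(cleaned, analysis_columns, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

In practice the list of identifying columns would come from the authors' own review of their data, and the output would be written to a file that is described in the accompanying documentation.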
Our guidance is meant for authors and journal editors, in the hope that our paper will spark further discussion about the development of standards and best practices that might be adopted by political science journals for replication materials. With common and accepted standards and best practices in place, authors will have a clear understanding of what is expected when they are asked for replication materials, and they can build the development of replication materials into their research workflow. Journal editors and their staff will have more efficient processes for the examination and confirmation of replication materials. And the scholarly community will have more useful research materials for its work and teaching.
Achieving these goals will require additional effort by our professional organizations, journal editors, and of course, scholars and authors. But as we argue in our papers, we believe that the outcome of these efforts will be better social scientific practices in the future.
R. Michael Alvarez, Ph.D., is a Professor of Political Science at the California Institute of Technology. He is the co-director of the Caltech/MIT Voting Technology Project, and in that capacity has played an important role in both the scientific study of election administration and technology, and the development of public policies to improve the conduct of elections in the United States and other nations. His undergraduate degree in Political Science is from Carleton College, and he obtained his Ph.D. in Political Science from Duke University. He has authored or co-authored five books, including Information and Elections; Hard Choices, Easy Answers: Values, Information, and American Public Opinion; and the recently published New Faces, New Voices: The Hispanic Electorate in America. He has also edited a book on election fraud, and has written scores of academic articles on voting behavior, elections, and voting technologies. At Caltech he is a member of the Social Sciences and Behavioral and Social Neurosciences groups. He is the co-editor of the journal Political Analysis.