Student Work & Testimonials

Examples of Student Work With Soup-to-Nuts Documentation

These recent senior theses are publicly available in the Project TIER Dataverse. Developed and hosted by Harvard's Institute for Quantitative Social Science, Dataverse is a platform designed to store and provide access to files and associated metadata for empirical projects. Along with the full text of each thesis, you will find a complete set of files, constructed according to the specifications of our protocol, that documents the empirical results. With each thesis there is also a ReadMe file that explains how you can use the electronic documentation to replicate all of the data management and analysis required to generate the results reported in the thesis. (You must have access to Stata statistical software, version 10 or higher.)

In addition to Dataverse, the Haverford College Libraries provide access to senior theses using DSpace, the open source repository software jointly developed by MIT and HP. Haverford theses in economics are available on the DSpace platform.

Student Testimonials

Caitlin Gallagher '15

In fall 2014 I took a statistical methods course with Professor Ball and was introduced to Stata. As I had never used Stata before, I inevitably experienced occasional challenges with it. However, by recording commands in a "do-file" rather than entering them directly into Stata, I was able to identify my errors exactly. The ability to retrace my steps and to determine where the issues arose not only facilitated a relatively easy correction of the mistakes, but also helped me to better understand how Stata works. Rather than becoming lost in the tool and spending considerable time searching for errors, I was able to focus on the actual research and data analysis.

In addition to learning Stata, we learned techniques for managing and documenting data and associated files. We were introduced to the Open Science Framework (OSF), a platform that aided the organizational structure of our research projects. My entire team, including my professor and research librarian, was able to easily access my team’s files, which included folders for raw data, written works, imported data, data analysis, and do-files. The combination of sound data documentation techniques and OSF allowed me to better understand Stata and avoid becoming lost in my own work. It also facilitates the replication of my work and its extension by future scholars. I believe it is critical that these techniques be implemented by all scholars conducting empirical research.

Patrick Haneman '12

As I have learned throughout my life, though especially in the past year while working on my thesis and searching for a job, organization is extremely important. Regarding my thesis, it was vital that I be organized because I often had to refer back to steps I had taken months earlier. For instance, at one point during my analysis, I noticed my results for steals and blocks were quite unusual. (Specifically, I was finding evidence suggesting that road teams had significant advantages in both categories.) Since I kept my "raw data" (i.e., data as it appeared when I collected it from an outside source) and "importable data" (i.e., data that I "cleaned up" and categorized before importing into a statistical software program called Stata) organized, I was able to refer back to the raw and importable data and discover that I had incorrectly labeled "home team blocks" as "road team blocks" and vice versa.

Moreover, because the project lasted for months, I maintained do-files to keep track of each and every step I took during my analysis. If I had instead adjusted and analyzed my data interactively (through Stata) without documented do-files, I would have had no chance of recalling each and every nuance of my prior work with the data, and thus I would have been unable to reproduce my results in the long run.

And of course, my do-files themselves had to be organized. At the end of each day of thesis work, I saved updated versions of my two major do-files (titled "Import" and "Analysis"). As Professor Ball continually stressed, saving new do-files each day (such as "Updated Import on Feb 2") would lead to clutter in the short term and questions about what work was the most updated version in the long run.

Overall, while adhering to Professor Ball's replicability structure, I came to appreciate that what people do with data to draw conclusions can and should be transparent not only to themselves but also to anyone interested in reproducing the results down the line.

Giff Brooks '12

I can point to four main benefits that arise from documenting and recording economic (or any other data-centric) analysis in a reproducible fashion, for example in a Stata do-file. Doing so is useful because it is transparent, it is convenient, it offers an opportunity to comment on my own work, and it maintains organization.

First, recording my analysis electronically is beneficial insofar as it increases the transparency, and thus credibility, of my findings. My work can be reproduced by anyone, anywhere. If any of my findings or analysis is incorrect, this vastly improves the chances that my errors will be noticed by a sharp analyst.

Second, keeping electronic files that document my analysis is simply convenient. As a project develops, it is quite easy to open my do-file and pick up where I left off the day before. Compare that to trying to recreate my analysis--and then build on it--day after day. Moreover, if I realize I made a mistake editing or organizing my data (or simply want to try a different tack in my analysis), I can tweak my do-file to rectify my mistake. This is vastly superior to having to start anew with the raw data, as I would have to do absent documentation of my previous work.

Third, the commenting feature included in most analytical software packages is a tool that should not be overlooked. In my experience, interweaving comments (i.e. text unread by the program, but visible to the analyst) with my commands has been immensely helpful. Lines of code that might otherwise look like gobbledygook can be enhanced with informative comments. Furthermore, they can be used to remind me of why I have chosen to run a certain test or create a certain variable, as well as to interpret a given result "in plain English."

Finally, maintaining an electronic record has aided me, especially late in my project, by keeping things organized. One do-file, for example, can contain every single step necessary for an analysis, from pulling in raw data from the internet to outputting the results as neat tables in Microsoft Word. Recording my analysis in such a form ensures that I never forget to run a given test, or mix up the order of my commands, or irreparably harm my data.
