
Reproducibility Issues Hinder Machine Learning Progress

Reproducibility, the property that allows other scientists to reproduce an experiment’s results by following the same procedure, is largely absent from machine learning and artificial intelligence development, and given the scale and scope of the field, that absence is becoming a real problem. One of the biggest pieces of the puzzle is recording and accounting for small changes, such as a GPU driver update applied mid-job or modifications made to the data set by an outside source during a training run. A very large number of factors shape an AI research project’s path from conception to result, and without a way to reproduce all of them, researchers are essentially unable to reproduce one another’s work. That undermines collaboration and building on prior results, two of the basic tenets of publishable scientific research, in a field that stands to gain enormously from both.
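
To make the idea of "recording those factors" concrete, here is a minimal sketch of what an environment snapshot taken at the start of a training run might look like. It is not any particular tool's API: the function name, output file, and the assumption that nvidia-smi and pip are available on the machine are all illustrative.

```python
import json
import platform
import subprocess
from datetime import datetime, timezone


def snapshot_environment(path="run_environment.json"):
    """Record environment details that can silently change between runs."""
    snapshot = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "python_version": platform.python_version(),
        "platform": platform.platform(),
    }
    try:
        # Query the NVIDIA driver version (assumes nvidia-smi is on PATH).
        snapshot["gpu_driver"] = subprocess.check_output(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            text=True,
        ).strip()
    except (OSError, subprocess.CalledProcessError):
        snapshot["gpu_driver"] = "unavailable"
    try:
        # Record installed package versions so library upgrades are visible later.
        snapshot["packages"] = subprocess.check_output(
            ["pip", "freeze"], text=True
        ).splitlines()
    except (OSError, subprocess.CalledProcessError):
        snapshot["packages"] = []
    with open(path, "w") as f:
        json.dump(snapshot, f, indent=2)
    return snapshot
```

Taking a snapshot like this before and after a run does not prevent a mid-job driver update, but it at least leaves a record that one happened, which is the first step toward factoring it in.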

To paint as simple a picture as possible, imagine a data scientist setting up a simple AI program that finds and sorts images of blue jays in nature photos. They write an algorithm that detects relatively large blue shapes, then compares them against the surrounding scene to estimate their size, position, and current activity, and to verify that they really are what the machine thinks they are. The in-development model is trained on a data set on the researcher’s machine, which receives a GPU driver update partway through a run. Later, someone on the network modifies or deletes a few files in the training data set, for one reason or another, also in the middle of a run.
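
One hedged way to at least detect that second kind of drift, silent changes to training files mid-run, is to fingerprint the data set before and after training and compare the two. The helper and the directory name below are hypothetical, a sketch rather than part of any specific framework.

```python
import hashlib
from pathlib import Path


def fingerprint_dataset(data_dir):
    """Return a single hash covering every file's path and contents."""
    digest = hashlib.sha256()
    for file_path in sorted(Path(data_dir).rglob("*")):
        if file_path.is_file():
            digest.update(str(file_path.relative_to(data_dir)).encode())
            digest.update(file_path.read_bytes())
    return digest.hexdigest()


# Example usage: compare fingerprints taken before and after a training run.
before = fingerprint_dataset("nature_photos/")
# ... train the model ...
after = fingerprint_dataset("nature_photos/")
if before != after:
    print("Training data changed mid-run; results may not be reproducible.")
```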

Small changes like these can have an outsized effect on the end product, especially when a machine learning system is left to work largely unsupervised, training itself on vast data sets. Even if the researcher logged every tiny codebase change from start to finish, which is rarely practical, others would still struggle to reproduce the results by following the same procedure with the same algorithm, machine, and data set, because the environment and the data themselves had shifted along the way. These problems apply across most of AI research and represent a growing obstacle that will need to be solved before any real industrywide collaboration can occur. And given the nature of AI research, with its need for vast data sets and extensive training across many machines for ever larger processing tasks, industrywide collaboration across national borders is exactly what the field needs to make its next breakthrough.