Jupyter Notebooks have been widely adopted by many different communities, both in science and industry. They support the creation of literate programming documents that combine code, text, and execution results with visualizations and other rich media. The self-documenting aspects and the ability to reproduce results have been touted as significant benefits of notebooks. At the same time, there has been growing criticism that the way in which notebooks are being used leads to unexpected behavior, encourages poor coding practices, and makes it hard to reproduce their results.

To better understand good and bad practices used in the development of real notebooks, in prior work we studied 1.4 million notebooks from GitHub. We presented a detailed analysis of the characteristics that impact their reproducibility, proposed best practices that can improve reproducibility, and discussed open challenges that require further research and development. In this paper, we extended the analysis in four different ways to validate the hypotheses uncovered in our original study. First, we separated a group of popular notebooks to check whether notebooks that get more attention are of higher quality and more reproducible. Second, we sampled notebooks from the full dataset for an in-depth qualitative analysis of what constitutes the dataset and which features the notebooks have. Third, we conducted a more detailed analysis by isolating library dependencies and testing different execution orders. We report how these factors impact the reproducibility rates. Finally, we mined association rules from the notebooks. We discuss the patterns we discovered, which provide additional insights into notebook reproducibility.

Based on our findings and the best practices we proposed, we designed Julynter, a Jupyter Lab extension that identifies potential issues in notebooks and suggests modifications that improve their reproducibility. We evaluate Julynter with a remote user experiment with the goal of assessing Julynter recommendations and usability.

Jupyter Notebook is the most widely used system for interactive literate programming (Shen 2014). It was designed to make data analysis easier to document, share, and reproduce. The system was released in 2013, and today there are over 9 million notebooks on GitHub (Parente 2020). Jupyter originated from IPython (Pérez and Granger 2007) and, in addition to Python, it supports a variety of programming languages, such as Julia, R, JavaScript, and C. It also allows the interleaving of not only code and text but also different kinds of rich media, including images, videos, and even interactive widgets combining HTML and JavaScript.

Kluyver et al. (2016) advocate the usage of notebooks for publishing reproducible research due to their ability to combine reporting text with the executable research code. However, the format has been increasingly criticized for encouraging bad habits that lead to unexpected behavior and are not conducive to reproducibility (Pomogajko 2015; Grus 2018; Mueller 2018; Pimentel et al. 2019). Among the main criticisms are hidden states, unexpected execution order with fragmented code, and bad practices in naming, versioning, testing, and modularizing code. In addition, the notebook format does not encode library dependencies with pinned versions, making it difficult (and sometimes impossible) to reproduce the notebook. These criticisms reinforce prior work, which has emphasized the negative impact of the lack of Software Engineering best practices in scientific computing software (Wilson et al. 2014), regarding separation of concerns (Hürsch and Lopes 1995), tests (Myers et al. 2004), and maintenance (Horwitz and Reps 1992).

While studies have been carried out to better understand how notebooks are used (Kery et al. 2018), they did not attempt to execute the notebooks and assess characteristics related to reproducibility. Instead, they explored other aspects, including use cases (Kery et al. 2018) and structure (Neglectos 2018; Rule et al. 2018).
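The hidden-state criticism above lends itself to a small illustration. The sketch below simulates two notebook cells that share a single interpreter namespace, the way cells in a live kernel do; the cell contents and variable names are invented for the example:

```python
# Two hypothetical notebook "cells" sharing one interpreter namespace.
cells = {
    1: "threshold = 10",
    2: "result = [x for x in data if x > threshold]",
}
namespace = {"data": [5, 15, 25]}

exec(cells[1], namespace)   # run Cell 1: threshold = 10
exec(cells[2], namespace)   # run Cell 2: result = [15, 25]

# The user edits Cell 1 but forgets to re-run it...
cells[1] = "threshold = 20"
exec(cells[2], namespace)   # Cell 2 still sees the stale threshold = 10
print(namespace["result"])  # prints [15, 25], not the expected [25]
```

Re-running only Cell 2 after editing Cell 1 reproduces the stale-state behavior users hit in real notebooks: the kernel namespace still holds the old `threshold`, so the result no longer matches the code visible on screen.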
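The different execution orders mentioned above can be sketched as follows, assuming the standard `.ipynb` JSON layout (`cells`, `cell_type`, `execution_count`); the function name and the two chosen orders (top-down versus the author's recorded execution-count order) are an illustration, not the exact experimental procedure:

```python
def execution_orders(nb):
    """Return two candidate replay orders for a notebook's code cells:
    the order in which cells appear, and the order the author ran them
    according to their recorded execution counts."""
    code = [c for c in nb["cells"] if c["cell_type"] == "code"]
    top_down = list(range(len(code)))  # cells as they appear in the file
    by_count = sorted(                 # cells as the author executed them
        (i for i, c in enumerate(code) if c.get("execution_count") is not None),
        key=lambda i: code[i]["execution_count"],
    )
    return top_down, by_count
```

Replaying a notebook under both orders and comparing the outcomes exposes cells that only work because of the author's out-of-order execution.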
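As a rough illustration of the association-rule mining mentioned above, the following sketch computes support and confidence for rules over per-notebook feature sets; the feature names, thresholds, and toy data are all invented, and this is not the paper's actual mining pipeline:

```python
from itertools import combinations

# Hypothetical per-notebook feature sets (names are invented).
notebooks = [
    {"has_pinned_deps", "executes_ok", "has_markdown"},
    {"has_pinned_deps", "executes_ok"},
    {"has_markdown"},
    {"has_pinned_deps", "executes_ok", "has_markdown"},
]

def support(itemset):
    """Fraction of notebooks containing every feature in the itemset."""
    return sum(itemset <= nb for nb in notebooks) / len(notebooks)

# A rule "lhs -> rhs" is kept if it is both frequent and confident.
MIN_SUPPORT, MIN_CONFIDENCE = 0.5, 0.9
rules = []
features = set().union(*notebooks)
for a, b in combinations(features, 2):
    for lhs, rhs in ((a, b), (b, a)):
        sup = support({lhs, rhs})
        if sup >= MIN_SUPPORT and sup / support({lhs}) >= MIN_CONFIDENCE:
            rules.append((lhs, rhs))
```

On this toy data the surviving rules link pinned dependencies and successful execution in both directions, which mirrors the kind of reproducibility pattern such mining can surface.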