How has SPI changed in times of agile development? Results from a multi‐method study

The emergence of agile methods and practices has not only changed the development processes but might also have affected how companies conduct software process improvement (SPI). Through a set of complementary studies, we aim to understand how SPI has changed in times of agile software development. Specifically, we aim (a) to identify and characterize the set of publications that connect elements of agility to SPI, (b) to explore to what extent agile methods/practices have been used in the context of SPI, and (c) to understand whether the topics addressed in the literature are relevant and useful for industry professionals. To study these questions, we conducted an in‐depth analysis of the literature identified in a previous mapping study, an interview study, and an analysis of the responses given by industry professionals to SPI‐related questions stemming from an independently conducted survey study. Regarding the first question, we identified 55 publications that focus on both SPI and agility, of which 48 present and discuss how agile methods/practices are used to steer SPI initiatives. Regarding the second question, we found that the two most frequently mentioned agile methods in the context of SPI are Scrum and Extreme Programming (XP), while the most frequently mentioned agile practices are integrate often, test‐first, daily meeting, pair programming, retrospective, on‐site customer, and product backlog. Regarding the third question, we found that a majority of the interviewed and surveyed industry professionals see SPI as a continuous activity. They agree with the agile SPI literature that agile methods/practices play an important role in SPI activities, but the importance they give to specific agile methods/practices does not always coincide with the frequency with which these methods/practices are mentioned in the literature.

distributed software development, SPI initiatives are risky 21 when companies rooted in different cultures have to collaborate. Paulish et al 21 observed that individual SPI measures may be perceived and implemented differently at different locations of one and the same company, and Kuhrmann et al 22 observed that different regions follow different SPI approaches and philosophies.
As companies are seeking ways to make software development more flexible, Agile Software Development has gained momentum. Companies have adopted agile principles quickly, 23,24 and proposals to run SPI projects according to agile principles 25 have been made. As a result, standardized SPI approaches lost ground and were replaced by context-specific and often demand-driven small-scale improvement activities. 5 However, a considerable share of the available publications on SPI in the context of agile focuses on "how to apply agile methods right" rather than investigating how to fit agile methods and practices into the existing process portfolio. Notably, as found by Theocharis et al, 26 the discussion in the past years mainly focused on analyzing advantages of agile methods without putting this analysis into context with regard to how SPI is or should be done. West et al 27 coined the term "Water-Scrum-Fall" and claimed that hybrid software development approaches will become the norm for software development. Studies like those published by Theocharis et al 26 and Vijayasarathy and Butler 28 support this claim. In their 2017 HELENA study, Kuhrmann et al 29,30 confirmed that companies of all sizes and industry sectors combine traditional and agile development approaches. They found that companies are evolving their software processes in an experience-based and situation-specific way, without necessarily implementing full-fledged SPI initiatives.

Problem statement and objective
During the past 20 years, software development processes have adopted many elements from agile methods and practices. That is, processes have become more agile and more hybrid. This raises the question of whether SPI has become more agile and hybrid as well. Do companies today still use standardized SPI frameworks like CMMI? Is SPI conducted in the form of projects with defined start and end dates, or is it a continuously performed activity that adapts to short-term goals and needs? In the latter case, can this still be done with established SPI approaches, or is SPI itself undergoing a change towards adopting an agile flavor, represented by short improvement cycles, adaptive improvement goals, and flexible improvement actions? To answer these questions, it is important to understand what experience exists regarding SPI in the context of agile and hybrid development processes. We aim to study the current state of the art and practice and to better understand how agile methods/practices are used in the context of planning and conducting SPI.

Context
Our research starts from a systematic mapping study 5 that aimed at identifying literature addressing a broad range of SPI dimensions (we refer to this study as the main study). The main study classified 769 publications and found that the field of SPI is shaped by a constant rate of 10 to 12 new SPI models per year. As part of the classification, an analysis of topic clusters was performed, one of which was named "agile methods in SPI." The 73 papers that were identified in this cluster in the main study set the starting point for our current study. How the main study relates to our current study is explained in Section 3.2.

Contribution
In this article, we present an in-depth analysis (according to the requirements of systematic reviews 31 ) of the 73 publications identified in the main study and assigned to the category "agile methods in SPI." Based on this analysis, 55 publications were selected for data extraction. To complement our literature analysis, we designed and conducted an interview study with seven companies in Estonia. As a third data source, we analyzed publicly available survey data from the first stage of the HELENA survey 29,32 with industry professionals to further complement our findings from the literature analysis and the interview study conducted in Estonia. One finding of our research indicates that what is presented and discussed in the literature with regard to SPI and agility only partly reflects industrial practice. Few attempts have been made to study SPI with agile methods and practices; the majority of the publications are concerned with either using agile methods in classic SPI endeavors or discussing how to achieve certain maturity levels through agile methods. Agreement could be found, however, in the way SPI is conducted. The results show that choreographed SPI programs are barely implemented. In literature as well as in practice, continuous and pragmatic in-project SPI seems to be the standard approach to SPI.
of time." In 2015, Kuhrmann and Méndez Fernández 34 reported the results of a survey conducted in Germany indicating that only 19% of the responding companies implemented a continuous improvement program. Most of the companies (approximately 60%) seem to implement SPI programs only sporadically. However, it was found that nonstandard approaches to SPI often transform into case-specific, situation-driven SPI, quite often represented by specialized SPI approaches 5 and implemented and showcased in small project environments.
In 2010, Basili 35 argued that endeavors like SPI should be considered from a more informal exploratory perspective, which should be complemented with empirical studies where applicable. He advocates that software engineering as a discipline is exploratory and evolutionary; a development that can also be observed in the context of agile methods. With the growing attention that agile methods are receiving, 23 SPI has become a project-integrated activity, 36 especially in the course of so-called agile transition projects. 37 Companies have started to adopt an agile attitude quickly, 23,24 and nowadays, companies of all sizes and industry sectors combine different (traditional, agile, and lean) development approaches. 27-30 SPI approaches tend to change over time. Alongside the established SPI models such as CMMI or ISO/IEC 15504, numerous lightweight SPI models for small- and medium-sized companies were proposed, 10-12,38 and even standards, eg, the ISO/IEC 2 20,39 have been developed.
Especially in the context of small- and medium-sized companies, SPI approaches are under review. For example, Helgesson et al 40  Instead, the published literature comprises a variety of reports on the systematic and/or pragmatic implementations of agile transitions. 37,48,49 A link between such transition endeavors and agile maturity models, however, is hard to find.
Nevertheless, companies will increasingly adapt to the agile philosophy and, finally, will find ways to also fulfill certification requirements. 29 Therefore, an understanding of the existing approaches to efficient and effective SPI is needed. Several approaches propose frameworks and models to help implement an agile SPI, ie, SPI programs that are not focused on deploying agile methods and practices but on using agile principles for defining the SPI process as such, thus facilitating shorter improvement cycles, having less upfront planning, being more closely connected to the customers (ie, the software developers), reducing unnecessary documentation, and addressing further so-called SPI success factors. 14,50-52 In our current study, we are concerned with collecting and structuring the different approaches used for making SPI more agile. We do not focus on specific industry sectors or company sizes, but we develop a general picture of how SPI is conducted in times of agility and the transition from traditional development processes to agile/lean software development. The main study 5 on which our current research is based, due to its nature (ie, being only a mapping study), does not provide an in-depth analysis of the identified literature. Therefore, such an in-depth analysis is the first step in our current study.

RESEARCH DESIGN
To conduct this study, we opted for a multi-method research approach. The overall research is based on a systematic mapping study 5 (ie, the main study), which we used as a scoping study to set the scene. An in-depth analysis employing the rules of systematic literature reviews 31 was applied to the literature provided by the main study. In addition, we conducted an interview study with practitioners from Estonia. Complementing the interview study, data from an online survey was used to triangulate and interpret the findings. In the following, we provide details of the research approach and the individual steps carried out in the different phases.

Objective and research questions
The overall objective of our study is to investigate the role of agility in SPI, ie, to what extent agility influences SPI initiatives either as the subject of improvement or as the improvement approach. For this, we defined the three research questions shown in Table 1. The first two research questions were studied using an in-depth, two-staged literature review (Section 3.2.2), which was based on the set of 73 papers already identified in the main study (Section 3.2.1). The third research question was studied through interviews with industry practitioners from Estonia (Section 3.2.3) complemented by an analysis of the publicly available dataset of the first stage of the HELENA survey (Section 3.2.4). Figure 1 gives an overview of the overall research approach. Our study is based on a previously conducted systematic mapping study, 5 which yielded 769 papers. From these 769 papers, 73 (10.5%) were assigned the attribute "Agile/Lean," and these 73 papers formed the input for the study presented in this article. Starting with the 73 papers, three steps were performed to conduct the study. In step 1, a two-staged systematic review was conducted to analyze the input data, clean the result set, and perform the in-depth literature analysis. Based on the findings of this analysis, in step 2, a practitioner interview study in Estonia was conducted. Finally, to improve the data and allow for drawing better conclusions, in step 3, data from the first stage of the HELENA survey was included in the analysis. For each step (including step 0, ie, the main study), we provide details in the following sections.

3.2.1 Step 0: a systematic mapping study on SPI
53 identified in the main study. In this article, we drill deeper into the literature of the topic cluster "Agility" and investigate how agile methods and practices influence SPI as well as how they are used in SPI or as an SPI approach.
Step 0 also includes the preparation of the dataset handed out to the students in step 1. For this, the main study's dataset was taken, and the papers to be analyzed were selected according to their attributes (detailed information about the data collection and analysis procedures implemented in the main study can be found in Kuhrmann et al 5 ). In total, the main study defines 46 attributes in eight categories, which were set during the papers' analysis in the main study. As the purpose of the study at hand is to investigate SPI in the context of agile software development, we selected all those papers that carried the attribute "Agile/Lean" from the category "Dimension: Process-Publication Objective," ie, those papers from the main study that have been identified as dealing with the subject of this study. No further attributes were included in the selection process, since the detailed analysis (and a potential revision of the papers' classification) was included in the students' tasks (see also Section 5.4).
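The attribute-based selection described above can be illustrated with a minimal sketch. The record layout (a dictionary of categories mapped to sets of attribute values), the paper identifiers, and the function name `select_candidates` are assumptions made for illustration only; the main study's actual dataset format is not described here.

```python
# Hypothetical record layout: each paper carries a mapping from an attribute
# category to the set of attribute values assigned in the main study.
CATEGORY = "Dimension: Process-Publication Objective"
ATTRIBUTE = "Agile/Lean"


def select_candidates(papers, category=CATEGORY, attribute=ATTRIBUTE):
    """Return the papers that carry `attribute` within `category`."""
    return [
        paper
        for paper in papers
        if attribute in paper["attributes"].get(category, set())
    ]


# Illustrative, made-up records (not taken from the main study).
main_study = [
    {"id": "P001", "attributes": {CATEGORY: {"Agile/Lean"}}},
    {"id": "P002", "attributes": {CATEGORY: {"Other objective"}}},
    {"id": "P003", "attributes": {CATEGORY: {"Agile/Lean", "Other objective"}}},
    {"id": "P004", "attributes": {}},
]

candidates = select_candidates(main_study)
print([paper["id"] for paper in candidates])
```

Only the single attribute is used as the filter, which mirrors the decision to leave any reclassification to the subsequent in-depth analysis.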

3.2.2 Step 1: in-depth literature analysis
In step 1 (Figure 1), we performed an initial review of the literature identified in the main study. Seven students participated in this initial review (see Table 2). The seven students worked in two groups consisting of four and three students, respectively, as part of a graduate course on the scientific method and empirical software engineering at the University of Southern Denmark in Odense. 54 Both groups received the prepared dataset (Section 3.2.1) and the task to conduct a review following the guidelines by Kitchenham et al 31 and Kuhrmann et al. 55 The two groups did not interact with each other, such that we received two result sets from the in-depth analysis, one from each group. A third (and final) review was conducted by a student during his master's thesis at the University of Tartu. This student received the same material as the Odense student groups and, in addition, had access to the analysis results of the Odense groups.
Because this is a detailed study grounded in a subset of the data generated in the main study (Section 3.2.1), no self-contained, additional literature search and no independent identification of relevant literature had to be performed. The students were tasked to analyze the dataset (73 candidate publications) handed out to them. The in-depth analysis of these candidate publications included (a) checking the dataset for correct classification in the main study (compare the methods applied in our previously published studies 22,53 ), (b) checking the availability of the papers, (c) applying the rigor-relevance model 56 to rate the papers, and (d) extracting information from the finally selected set of publications regarding the agile elements used, as well as how and why SPI is connected with the notion of agile. Also, if possible, further attributes for better characterizing the context of the studies described in the publications were to be collected. Finally, the three in-depth analyses were integrated. Table 2 provides an overview of the outcomes.
To collect the data extracted from the finally selected papers (see Table A1), we stored the data in a spreadsheet that was structured as shown in Figure 2. The figure shows the data structure used for the data extraction.  Shaw 58 ). However, during the analysis of the papers, these attributes were reevaluated. The data extraction sheet also contained the categorization of the articles' classification as illustrated in Figure 3.
The initial set handed out to the students included 73 papers (Section 3.2.2), and the students had the task to reevaluate the selection, ie, whether the papers in the initial dataset had been selected properly. In case a paper had to be excluded from the study, an explanation had to be provided in the field Exclusion reason. Reasons for an exclusion were unavailability of the paper for download or a misclassification in the main study. Exclusions were reviewed by the senior researchers and had to be confirmed before being applied to the dataset.
Since the third review could build upon the two previously conducted reviews, the outcomes of the third review became the basis for further analyses reported in this article. For this, the findings, especially ratings and classifications, were integrated. To check the integrated data, we computed the interrater agreements on the classification according to the rigor-relevance model before determining the final rating. To compute Fleiss' κ, we used the student ratings instead of the team ratings, ie, if Group 1 provided a rating, we expanded it to four students providing the rating. For the dimension rigor, we found κ = 0.4245, and for relevance, we found κ = 0.4314, ie, a moderate agreement among the three reviews on the rigor-relevance rating. Finally, the integrated rigor-relevance rating (Figure 6) was found using the median. Qualitative data such as information regarding agile elements, their use and experiences, and rationale for using agile elements in SPI were extracted and harmonized from all three review groups. The remaining 55 publications (Table A1) in the result set have been analyzed using the contribution classification illustrated in Figure 3. Specifically, we analyzed the publications regarding the actual use of agile methods and practices in the context of SPI. The first category was used to collect all papers that only use SPI approaches to introduce and deploy agile methods, ie, what is considered "classic" SPI. 1,2 The second category comprises all those papers that provide a utilization of agile methods and practices to steer SPI. Specifically, we distinguish contributions of agile SPI frameworks, utilization of agile practices, principles, and values to (re)organize and run SPI initiatives and, finally, the utilization of agile methods and practices to make classic SPI approaches more agile.
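The agreement check on the rigor-relevance ratings described in this step can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis script used in the study: the team labels, the assumption that the third review counts as a single rater, the choice to take the median over the three review-level ratings, and the example scores are all hypothetical.

```python
from collections import Counter
from statistics import median

# Assumed team sizes: two student groups (four and three members) plus the
# single reviewer of the third review. The labels are hypothetical.
TEAM_SIZES = {"group1": 4, "group2": 3, "review3": 1}


def expand_to_raters(team_ratings):
    """Repeat each team-level rating once per team member."""
    return [
        score
        for team, score in team_ratings.items()
        for _ in range(TEAM_SIZES[team])
    ]


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each rated by the same number of raters."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(row) != n_raters for row in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    counts = [Counter(row) for row in ratings]
    categories = sorted({label for row in ratings for label in row})
    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c[k] ** 2 for k in categories) - n_raters)
        / (n_raters * (n_raters - 1))
        for c in counts
    ]
    observed = sum(per_item) / n_items
    # Chance agreement from the overall share of each category.
    shares = [
        sum(c[k] for c in counts) / (n_items * n_raters) for k in categories
    ]
    expected = sum(s * s for s in shares)
    if expected == 1.0:  # only one category was ever used
        return 1.0
    return (observed - expected) / (1.0 - expected)


def integrated_rating(team_ratings):
    """Assumed integration rule: median of the three review-level ratings."""
    return median(team_ratings.values())


# Made-up rigor scores for three papers, one rating per review.
rigor = [
    {"group1": 2, "group2": 2, "review3": 3},
    {"group1": 0, "group2": 1, "review3": 0},
    {"group1": 3, "group2": 3, "review3": 3},
]
print(fleiss_kappa([expand_to_raters(r) for r in rigor]))
print([integrated_rating(r) for r in rigor])
```

Expanding team ratings to individual raters gives larger teams more weight in the agreement statistic, which is a consequence of the expansion step rather than of the kappa formula itself.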

3.2.3 Step 2: practitioner interview in Estonia
In step 2 (Figure 1), we conducted interviews with practitioners in seven companies located in Estonia to confirm the findings of the literature review, ie, to test whether the results obtained from the systematic review are in line with industrial practice. The interview guideline was implemented using Google Forms and consisted of seven main questions, with the last question consisting of 18 subquestions (Table A2). A characterization of the participating companies is provided in Table 9. The interviews were conducted face-to-face by one researcher who either visited the companies or arranged a video conference and filled in the interview form together with the participant.
Overview of the characteristics of the three in-depth analyses (data collection and analysis procedures). Note. All students received 73 preselected candidate publications as a starting point for their analyses.

Step 3: cross-analysis with the HELENA-1 survey
In step 3 (Figure 1), to put the findings into a bigger context, we finally used the publicly available dataset from the first stage of the HELENA survey 29,32 conducted in Europe. To make use of the HELENA data, two researchers analyzed the questionnaire used in the interviews in Estonia (Table A2) and performed a matching of the two datasets.
Specifically, besides the demographic information, the questions from the HELENA survey listed in Table 3 were selected for data analysis.
The question ID (Q-ID) consists of the question number and the question group as provided in Kuhrmann et al. 29 A more detailed introduction to both surveys can be found in Section 4.1.2.

STUDY RESULTS
To present our findings, we first provide a demographic overview to characterize the dataset of the in-depth literature review and the complementing studies in Section 4.1. In Sections 4.2 to 4.4, we provide answers to the research questions. The discussion of the results along with the discussion of the threats to validity is provided in Section 5.

Descriptive statistics
To set the scene, we provide an overview of the descriptive statistics of each of the three studies combined in this article as described Section

Descriptive statistics of the literature analysis
As described in Section 3.2.2, 55 publications have been selected from the input dataset received from the main study. Figure 4 gives an overview of the publication frequency in the final result set. The figure shows that the scientific community started studying agile development approaches in the context of SPI around 2010. Since then, the interest has been continuously increasing, as the two trend lines show. Most of the publications were, as expected, published after the release of the Agile Manifesto. However, as early as 1996, Aoyama 61 described concurrent development processes in the Japanese software industry, which already include practices like dividing software into smaller parts, time-fixed intervals of delivery, close customer relations, and incremental construction of the system. These practices are similar to those described in XP and Scrum, eg, time-boxed sprints, producing shippable output, and on-site customer.
As in the main study, we classified the result set according to the research-type facet (RTF, Wieringa et al 57 ) and the contribution-type facet (CTF, Shaw 58 ). Figure 5 shows the classification as a bubble chart. Most papers (32 papers) report lessons learned, and another 18 papers contribute a framework. Furthermore, 22 papers have been classified as philosophical papers, and another 16 papers are solution proposals. That is, the result set is shaped by framework proposals and papers reporting lessons learned. However, evaluation research is contributed by only nine papers, which suggests that SPI and agile methods in combination still await a deep scientific investigation complementing the rich body of knowledge available.
To better characterize the result set, we also used the rigor-relevance model suggested by Ivarsson and Gorschek 56 to assess the scientific rigor of the papers as well as the papers' relevance. Figure 6 provides the result of the classification according to the rigor-relevance model. The figure shows 30 papers classified as being of high or very high relevance (score of 3 or 4). That is, we consider the final result set appropriate to draw conclusions of practical relevance. Furthermore, more than half of the papers (30 papers) are classified as being of good to very good rigor (score of 2 or higher). That is, also from the scientific perspective, we consider the result set appropriate to draw conclusions.
FIGURE 5 Classification of the papers from the result set according to the research- and the contribution-type facet
FIGURE 6 Classification of the papers from the result set according to the rigor-relevance model

Descriptive statistics of complementing studies with industry professionals
To complement the findings of the in-depth literature analysis, we conducted interviews with experts from companies in Estonia and compared the resulting data with data generated during the first stage of the HELENA 29,32 survey. As we were interested in confirming the findings from the systematic review, the questionnaire used in the interview in Estonia (Table A2) was scoped to the findings, ie, to investigate whether the findings of the literature review match industrial practice and experience (Figure 1). When comparing with the data from the HELENA survey, one should keep in mind that the HELENA project has a different goal. Specifically, the HELENA survey aims to draw a big picture of the use of different (combined) software and system development approaches in general. That is, the HELENA survey provides a more holistic picture of processes, which we use as background information for the Estonian data, allowing us to draw more robust conclusions. In this section, we provide an overview of the different development methods and practices used in industry to set the scene for discussing the role of agility in SPI.

Study population
In a 2-week period in summer 2017, we conducted the interviews with seven companies from Estonia (n = 7). All companies were asked about their use of agile methods and practices and, if applicable, their respective way of conducting SPI. The data from the first stage of the HELENA survey (n = 69) was collected from May to June 2016. Detailed information about the instrument used in the first stage of HELENA can be found in Kuhrmann et al. 29 A summary of the company sizes of the companies included in the data analysis is shown in Figure 7. Except for the class of very large companies, the numbers show an equal distribution of participants from companies of different size. However, as the Estonian dataset comprises only seven interviews (about 10% of the size of the HELENA data), in the following, we focus on presenting absolute numbers first to draw the big picture.

Use of non-agile/agile methods
In the scoped interview in Estonia, companies were asked to state whether they use one or more of the following methods: Scrum, Extreme Programming, Lean Software Development, Crystal, or Kanban. Figure 8 summarizes the use of agile methods in the interviewed Estonian companies and relates the outcomes to the HELENA data (note that the figure presents the different methods following the categorization of the HELENA study 29 and thus includes traditional and agile methods as well as such methods that can be assigned to both categories). From the seven Estonian companies, five reported using practices and other elements of Scrum. Also, five companies stated they are using practices and other
FIGURE 7 Overview of the study population
the findings of the HELENA study that companies use different (agile) methods in combination; even agile methods are used in combination. 29,62 Beyond the data from Estonia, Figure 8 also shows the distribution of methods used as reported by the HELENA survey. Here, a bigger variety can be observed. However, the Estonian "top scorers" Scrum and Kanban are also widely used in Europe.
Use of non-agile/agile practices
Figure 9 shows the results of the analysis of which non-agile/agile practices are used. We use the categorization from HELENA and provide an integrated view (also, we mapped those items from the Estonian interview guideline that have different names than in the HELENA questionnaire). Figure 9 shows that companies use and combine multiple practices to run software development projects. Every company uses automated testing, and six out of seven companies use refactoring and daily meetings. Two interesting findings are the absence of a system metaphor and a product owner. While the absence of a system metaphor was already found and discussed in the context of developing a project management support system for globally distributed agile software projects, 63 the absence of a product owner, notably in the context of SPI initiatives, was an unexpected result, which we discuss later in Section 5.

RQ 1: characterization of the literature putting special focus on agile methods in SPI
The first research question aims to identify and characterize the set of publications that connect elements of agility with SPI. In the main study, 5  approaches, which appears to be the standard way of implementing agile methods. 29 The result set also contains papers that report general improvement activities, ie, activities that primarily address a general improvement approach rather than a specific adjustment of a practice.
Hence, from the bird's-eye perspective, we conclude that agile methods are not closely linked to SPI programs. The result suggests that the majority of the papers either proposes custom approaches to address specific domains or reports experiences collected from implementing an agile method. Using agile methods in SPI programs, or even as a means to steer SPI programs, has not been extensively reported.
Based on Figure 3, we classified the 55 papers from the result set by performing an in-depth inspection of the full text. The result is summarized in Table 4 and shows that seven papers report on ''classic'' SPI approaches to introduce agile methods or practices. As these seven papers 37 that utilize agile concepts to steer SPI projects. To provide a structured approach for presenting the different approaches, we use the categories from Table 4 to structure the rest of this section.

Classic SPI to deploy agile methods
In this category, we collect papers that use standard SPI approaches to introduce/deploy agile elements in the course of SPI initiatives or that provide a general discussion on SPI. In total, seven papers were assigned to this category (Table 5). The papers listed in Table 5

Agile SPI frameworks
In this category, we collect papers that provide method proposals to run SPI projects using agile principles, as, for instance, provided by Salo and Abrahamsson. 25 In total, 12 papers were assigned to this category. The papers from Table 6 either present a new agile SPI framework or discuss general factors to be considered when constructing an agile SPI framework. For instance, Özcan-Top and Demirörs 46 provide a study on agile maturity models, and Kruchten 71 proposes an experience-based context model to guide adoption and adaptation of agile. These exemplarily selected papers as well as other papers 74,75 have in common that they develop "smaller" SPI models to better support small companies and projects. A broader approach that is not focused on agile software development and small companies only is provided by Nikitina and Kajko-Mattsson. 77 Seven of the 12 papers in this category provide a reproducible evaluation, which has been conducted in different domains, eg, telecommunications, information systems, and game engineering. For the five papers classified as "unknown" regarding the evaluation, the research designs provide either no information or obfuscated/vague information only (surveys are also included in this category if they are not conducted as in-company surveys).

Agile elements in SPI
In this category, we collect papers that use agile elements in SPI initiatives. In total, 20 papers were assigned to this category (Table 7). The papers inspected show agile methods and practices used in a variety of SPI-related activities. Also, as Table 7  That is, agile methods and practices are used to improve specific processes of an existing process ecosystem.
From the software project perspective, two papers address the so-called in-project SPI, 88 In this category, we also collect barriers and success factors of adopting agile. In this regard, Korhonen 92 reports experiences from Nokia's agile transformation, and the author provides an evaluation of selected agile methods and practices used in this transformation. In another study   90 provide an analysis of the sustainability of agile in the organization. In two pilot projects, the authors observed that step-by-step adoption of agile processes leads to positive results and that agile practices are appropriate for large and complex projects. In the first pilot project, a combination of release planning, sprint planning meetings, and a test-first approach was considered beneficial. The second pilot project, however, revealed that pair programming was inappropriate for the context and required adjustments for proper use. Eventually, Laanti et al 90 found that the step-wise application of agile practices reduced production cost and cycle times and improved the product quality.

Agilized classic SPI approaches
In this category, we collect papers that use agile elements to enrich established SPI models. In total, 16 papers were assigned to this category (Table 8). These papers use agile methods and practices to improve established SPI models like CMMI and ISO/IEC 15504, modify them in order to improve their applicability to small project environments, or discuss the suitability of selected agile methods and practices for achieving maturity/capability levels.
The latter category is addressed by Selleri Silva et al, 109 Pino et al, 106 Manhart and Schneider, 101 and Salinas et al. 100 Modifying established maturity/improvement models is, however, the main use case in this category. Different author teams [102][103][104][105]111,112 provide solution proposals concerning the integration of CMMI with agile methods. The predominant agile method chosen for the integration is Scrum. A holistic picture of such integration approaches is provided by a systematic review by Selleri Silva et al. 45 The main driver behind such integration approaches is the lack of suitability of the standard models for small- and medium-sized companies, which already resulted in specialized improvement models, eg, ISO/IEC 29110. 39 However, in line with an observation from the main study, 5 such specialized descriptive models are not referred to in the result set, ie, the result set comprises company-/situation-specific approaches only.

RQ 2: agile methods and practices used in the context of software process improvement
The second research question (Table 1) aims to explore to which extent agile methods have been reported in the context of SPI. Since the top-level perspective taken in Section 4.2 does not allow for a satisfactory conclusion, we extended the analysis by breaking down the result set to agile methods and practices. Therefore, we collect all agile methods mentioned in the result set, and based on a predefined set of agile practices, 59 we collect information regarding the practices used. Both steps serve two purposes: (a) improve the perspective on the result set and collect more insights and (b) develop a knowledge set from the literature that can be tested with practitioners.

Agile methods
Of the 55 analyzed publications, five do not explicitly mention a specific agile framework (incl. Lean and Kanban) in the context of SPI but speak about agile approaches in general. The remaining papers name at least one agile method/practice. Figure 10 shows that Scrum and XP are the most frequently mentioned approaches. Approximately 78% of the publications (43 of 55) mention Scrum in the context of SPI and approximately 71% (39 of 55) mention XP, followed by Lean software development with a total count of 14. Figure 11 shows the frequency with which specific agile practices are mentioned in the set of 55 publications. For more concise visualization, the initial list of 35 practices, 59 which was taken as the starting point, was reduced to those 30 practices that were actually named in the result set. Hence, practices such as move people around and when a bug is found tests are created are not shown in Figure 11. Furthermore, some names of the practices listed in Sletholt et al 59 were shortened. For example, the practice all production code is pair programmed was changed to pair programming, team members volunteer for tasks (self-organizing team) to self-organizing teams and so forth. Finally, some terms include practices that have the same meaning but different wording in the different publications. For instance, the practice give the team a dedicated open work space was made more general by calling it co-located team and includes terms used in the publications like sit together and team collaboration. Figure 11 shows that integrate often, daily meetings, test-first, pair programming, and retrospective are the most frequently mentioned agile practices in the context of SPI with at least 20 mentions each.
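The shares reported above follow directly from the mention counts. As a minimal sketch (the names `TOTAL_PUBLICATIONS`, `mentions`, and `share` are illustrative and not part of the study), the percentages can be recomputed from the counts stated in the text:

```python
# Counts taken from the text: 55 analyzed publications,
# 43 mention Scrum, 39 mention XP, 14 mention Lean software development.
TOTAL_PUBLICATIONS = 55
mentions = {"Scrum": 43, "XP": 39, "Lean": 14}


def share(count, total=TOTAL_PUBLICATIONS):
    """Return the share of publications as a whole-number percentage."""
    return round(100 * count / total)


# Map each method to its rounded percentage of the 55 publications.
shares = {method: share(count) for method, count in mentions.items()}

for method, percent in shares.items():
    print(f"{method}: {mentions[method]} of {TOTAL_PUBLICATIONS} (~{percent}%)")
```

The values for Scrum (about 78%) and XP (about 71%) match the text; the share for Lean (about 25%) is derived here from the stated count of 14 and is not reported as a percentage in the study.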

Agile practices
Two agile practices are only mentioned once: create spike solutions to reduce risks and no functionality is added early. Another practice that receives a low count is velocity measuring with two mentions. Selleri Silva et al 109 propose a model for agile quality assurance. They also map their model's process areas to CMMI's process areas, explicitly naming create spike solutions to reduce risks. For conducting SPI within agile project teams, Salo and Abrahamsson 25 propose an iterative improvement process and name velocity measuring as a practice used for estimating tasks for the next iteration. Proposing a framework for describing the maturing process of agile software development, Fontana et al 80 reason that the practice no functionality is added early is difficult to assess and therefore should be assessed by conversation and observation.
FIGURE 10 Explicitly mentioned agile methods in the context of SPI
FIGURE 11 Explicitly mentioned agile practices in the context of SPI

RQ 3: relevance of the findings to industry
The third research question (Table 1) aims at studying how relevant the findings from the in-depth analysis of the literature are to industry; ie, we aim to study whether the topics addressed in literature are relevant to industry. For this, we utilize two independently conducted studies (Section 3). The first study emerges directly from the systematic review. That is, grounded in the findings of the systematic review, a small and scoped interview study was developed and executed with seven companies from Estonia (Section 4.1.2). To triangulate the outcomes of the Estonian interview study and to put the findings into a bigger context, we used the publicly available dataset from the first stage of the HELENA survey, 29,32 which was conducted in Europe. In this section, we provide the findings only. An interpretation and a discussion of the findings is provided in Section 5. Table 9 provides an overview of the Estonian companies interviewed with a particular focus on their respective approaches to conduct SPI. The table shows that the interviewed companies mostly implement SPI in a continuous fashion. In the following, we provide the details.

Results of the interview with Estonian companies
The primary objective of the Estonian interview is to confirm the findings concerning the role of agile methods and practices in SPI. Hence, all participating companies have been asked first what they consider SPI (Table A2, Q1). Only two of the participants provided a ''useful'' statement: ''initiative during software development process is improved'' (participant A) and ''initiative to improve software process to make it more reliable and repeatable'' (participant D). All other participants either provided no answer or stated that they do not know. However, all participants made statements regarding the general approach of improving the software process used as shown in Table 9.
Five of seven companies implement a continuous improvement approach, ie, an (implicit) improvement of the applied development process on demand. For instance, participant D explicates that there is no explicit SPI program established, and SPI is performed continuously. Additionally, we asked the participants for their most recent improvements (Table A2, Q6). Finally, we asked the participants to rate 18 statements (Table A2, Q7). The results are presented in Table 10, which shows the statements covering two areas of interest. The first area is the general perception of the current development approach and SPI. The second area is concerned with studying how certain practices are reflected in the current discussion of the process. Table 10 shows a trend among the interviewed companies: most of the topics raised in the questionnaire are considered important, and only three statements received a neutral or negative rating. Specifically, for the first block of questions, an explicit SPI program is neutrally rated (Q7-6), and also, the happiness regarding the currently implemented development process is neutrally rated (Q7-2). In the second block, Q7-17 received a more negative rating; ie, companies do not tend to add functionality early.
Summarizing the findings presented in Table 10, the ratings confirm the trend observed in Table 9 that the interviewed companies improve their processes in a continuous demand-driven manner rather than following an explicit improvement program (Q7-3 to Q7-5). Furthermore, five companies consider SPI an important topic. Another finding from the interviews is that the companies consider agile methods and practices an important part of their process portfolio (Q7-7) and also think that agility will become even more important in future process adaptations (Q7-9 to Q7-11). These ratings indicate that agile methods and practices have found their place in the companies' process portfolios and thus do/will play an important role in future improvement activities.
The statements Q7-12 to Q7-18 have been derived from the findings of the systematic review reflecting the most/least addressed topics (see also Section 4.3). Specifically, we asked the participants whether or not these practices are used and/or discussed in the companies. The four practices continuous integration, daily meetings, retrospectives, and test-first development are considered very relevant. The pair programming practice received an indifferent rating (three agree, three disagree, and one neutral), even though this practice is among the most frequently discussed ones in literature. Similarly, the practices add functionality early and create spike solutions received an indifferent rating. The difference to pair programming is that these two practices are among the least mentioned ones in literature.

Results from analyzing the HELENA survey data
To harmonize both survey studies, we mapped the results from the Estonian interview guideline to the HELENA categories. Figure 12 shows how companies implement SPI initiatives. The figure shows that the participants of the HELENA survey also prefer the continuous improvement approach. Yet, unlike in the Estonian study, several participants in the HELENA dataset stated that they do not conduct SPI at all. In the following, we analyze the SPI-related questions from the HELENA study. As most of the questions from the HELENA survey have been defined as optional, we provide the respective n for each question.
The HELENA study aimed at confirming a trend originally described by West et al, 27 the so-called Water-Scrum-Fall approach for defining development processes. Hence, in the HELENA study, we asked the participants whether their actual development approach emerged from an explicitly implemented SPI program. Only 19.6% of the participants stated that their development process is an outcome of a planned SPI program, 29 whereas the majority of the processes emerge either from lessons learned and experience or in a situation-specific manner. When explicitly asked if and what kind of improvement program they implement, the largest group of participants (37.7%) reported an in-project continuous SPI program (Table 11). Standardized prescriptive SPI models are used by 15.9% of the participants only, and 31.8% conduct SPI sporadically only or not at all. However, as Table 11 shows, more than 75% of the participants conduct SPI in one way or the other. Table 13 summarizes the reasons why companies implement the standards. Even though customer expectations are no major driver behind implementing an SPI program (Table 12, 8.7%), company/project business (61%) and regulations (73.2%) are major drivers, which is in line with the continuous improvement (adaptation) of the development process (Table 11). Accordingly, companies face numerous challenges, 30 especially when applying agile methods in regulated domains. For companies that implement an SPI program (or conduct at least sporadic SPI initiatives), we asked the participants for the goals of such activities. Table 14 shows that major drivers behind SPI initiatives are improved effectiveness, efficiency, flexibility, product quality, project speed, and improved planability of projects in general. Further improvement goals stated by the participants in the free-text part were increased employee satisfaction (two mentions), reduced employee churn (one mention), and improved ability to handle safety functions in legacy code (two mentions).
In a nutshell, SPI in the context of hybrid development approaches barely follows a standardized descriptive approach. Instead, companies tend to continuously improve their software process on-demand, mainly aiming at improving their business and project flexibility in general.

DISCUSSION
We integrate and discuss findings from the in-depth literature review and the complementing interview and survey studies by first integrating and comparing the three data sources, before discussing the findings from the literature review and the two complementing studies in detail.

Comparison of the three data sources
The findings from analyzing the practitioner interviews presented in Section 4 were combined with the other data sources into an integrated dataset (Table 15). Using the normalized values, we computed the pair-wise correlation between the three data sources in the integrated dataset. The resulting correlations in Table 16 show that the two survey-based data sources have a medium to strong correlation, while the correlations between the two surveys and the methods and practices extracted from the systematic review are very weak.
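The pair-wise comparison described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure used in the study: the text names neither the normalization nor the correlation coefficient, so the sketch assumes scaling by the largest count and Pearson's r. The function names and the example figures are hypothetical and do not reproduce Table 15 or Table 16.

```python
from itertools import combinations
from math import sqrt


def normalize(counts):
    """Scale raw counts to [0, 1] by dividing by the largest count (assumed scheme)."""
    peak = max(counts)
    return [c / peak for c in counts] if peak else [0.0 for _ in counts]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


def pairwise_correlations(sources):
    """Correlate every pair of data sources after normalizing each one."""
    normalized = {name: normalize(values) for name, values in sources.items()}
    return {
        (a, b): pearson(normalized[a], normalized[b])
        for a, b in combinations(normalized, 2)
    }


# Hypothetical counts per practice (same practice order in every list).
example = {
    "literature": [30, 25, 22, 20, 2],
    "interviews": [5, 6, 2, 5, 3],
    "survey": [400, 520, 150, 430, 260],
}

for pair, r in pairwise_correlations(example).items():
    print(pair, round(r, 2))
```

Note that Pearson's r is unaffected by scaling a series with a positive constant, so under these assumptions the normalization mainly serves to make the three sources comparable side by side in one table.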

Finding 1:
The correlations presented in Table 16 suggest that the use of agile methods and practices as reported in the literature does not reflect the actual use of agile methods and practices in practice.

Agility and SPI in the literature
In the literature, agile methods and practices are well represented, and the number of articles dealing with agility from different angles is steadily growing (Section 2 and Section 4.1). A considerable share of publications from the result set is concerned with the agile transformation of whole organizations (Section 4.2), thus suggesting that agile methods and practices are also used to steer process improvement initiatives. Intuitively, agile methods and practices could be considered well-suited instruments in SPI as agility is, among other things, focused on project teams, the for SPI in general could also be seen in the field of agility and SPI; ie, the majority of the approaches presented is grounded in context-or solution-specific approaches, and moreover, five of the articles listed in Table 6 give insufficient information about context and/or evaluation.
In total, 16 articles discuss the integration of standard models like CMMI and ISO/IEC 15504 with agile elements (Section 4.2.4); ie, agile methods and practices are used to ''agilize'' the established SPI and maturity models. A specific focus of such activities is the question of how companies can reach certain maturity levels, eg, CMMI Level 2, by incorporating agile methods, [102][103][104][105]111,112 whereas the favorite method for such an integration is Scrum. However, a strict definition is not available, which is also reflected in the low number of tool proposals as shown in Figure 5. Most of the publications in the result set report lessons learned (32 of 55), as can also be seen in those articles that describe the use of specific agile methods and practices during selected improvement activities (Section 4.2.3). We argue that due to the focus of agile methods and practices, ie, software projects and team interaction, companies and teams prefer a continuous bottom-up approach based on continuous learning 36 to steer improvement activities, since a planned and prescriptive SPI approach is considered to conflict with agile principles. However, we have to acknowledge that even large companies 90,92,102 and companies that develop dependable systems 73,101 are increasingly interested in agile methods, thus looking for options to integrate agile methods and practices into the companies' process toolbox. On the basis of our data, we also argue that Basili's proposal 35 for exploratory and evolutionary approaches was gradually implemented for SPI while deploying the various agile methods and practices.
Regarding the most frequently used agile methods and practices, Scrum and XP dominate the publication body, and these methods are complemented with a multitude of smaller practices. 29 These methods and practices are also continuously researched over time 114 as Figure 13 shows. The figure breaks down Figure 10 and shows how often the different agile methods have been mentioned in the papers analyzed in the systematic review. Figure 13 shows that Scrum and XP are in the spotlight of researchers and practitioners alike. 115 However, not every mention of a method or practice in a publication automatically implies that the method or practice is discussed as an SPI approach, as became obvious when applying the paper categorization (Section 3.2.2, Figure 3). For example, looking more closely at one selected practice, pair programming, we found that 20 publications mention this practice (Figure 11), but only eight papers discuss the practice's actual utilization in SPI. 66,78,79,85,86,88,91,102 Surprisingly limited, however, is the discussion of Lean software development approaches. Only the article by Petersen and Wohlin 72 explicitly utilizes Lean principles for defining a measurement approach. Rodríguez et al 96 remain on a more general level by only recommending the use of Lean principles to improve software development in general.
In this context, assessment and measurement seem to be the focal points when defining agile SPI frameworks, even though none of the agile maturity models was present in the result set. Özcan-Top and Demirörs 46 provide a secondary study on the assessment of agile maturity models, but found that almost all of the inspected models are defined independently of any agile method. Furthermore, all inspected models show deficiencies, notably regarding completeness of the models, correctness and consistency, and the definition of agility levels. We thus argue that the role of such agile maturity models, especially in the light of current industrial practice, should receive a deeper investigation. So far, it remains unknown whether such models (a) contribute to SPI programs and (b) provide significant advantages compared with the established SPI/maturity models.
FIGURE 13 Explicitly mentioned agile methods in the context of SPI by publication year

Finding 2:
Agile SPI as reported in literature is focused on situation-specific solutions. Emphasis is put on reporting lessons learned from applying selected agile methods and practices to specific improvement activities and discussions on how to reach certain maturity levels in the context of using agile methods and practices. The upcoming agile maturity models, so far, show deficiencies and, thus, require further research to evaluate their practicability.

Agility and SPI in practice
To back up the findings of the literature study, we conducted an interview to confirm the findings with practitioners from Estonia and, using the publicly available data from the first stage of the HELENA survey, added a third data source to improve the analysis. Section 4.3 provides an overview of the agile methods and practices most frequently used in agile SPI. For the seven companies in Estonia, the picture drawn from the systematic review is not consistent with the practitioners' day-to-day work (Section 5.1). Scrum and XP are also considered the most relevant agile methods (see also Kuhrmann et al 30 ), and the companies' approach to conduct SPI is also the more pragmatic continuous in-project SPI. 36 Regarding the other methods and practices used (Section 4.1.2), we observed a significant variation in the method and practice use. We argue that this observation results from the Estonian companies' size (mostly micro- to small-sized companies), which forces the companies to focus on actual project work and to implement situation-specific and efficient development approaches. The high number of companies using Kanban (five of seven; Figure 8) could be considered an indication that Kanban is frequently used as its practices are easy to understand and to implement. In contrast, pair programming, especially from the perspective of (very) small companies, has to be considered ''expensive'' and is therefore not that frequently mentioned. In the context of executing process improvement activities, five of seven companies use retrospectives, and using retrospectives as an SPI instrument is among the most frequently mentioned practices in the systematic review as well (similar to the daily meetings practice that was named by six of seven companies).
Compared with the systematic review, the Estonian companies use Kanban more frequently. Similarly to Lean software development, this approach seems not to be extensively researched in the software engineering literature in the context of SPI, which to a certain extent is inconsistent with the generally observed stream of conducting SPI in a more pragmatic fashion. Kanban and Lean principles could be beneficial to drive SPI programs as both methods focus on efficient and effective work, 38 ie, doing what is necessary and measuring the impact/success of the work done. Especially the limit work-in-progress philosophy should be regarded advantageous to SPI as it motivates all stakeholders to stay focused and prioritize work instead of starting all SPI activities at the same time (Armbrust et al 70 ). As a general finding of comparing the systematic review findings with the outcomes of the interview in Estonia, (a) the participating companies agree with the relevance of agile methods and practices in the context of SPI, (b) the companies also opt for the continuous and mostly in-project SPI approach, and (c) the utilization of specific methods and practices does not correspond with the findings of the systematic review (see also Section 5.1, Table 16). Even though the Estonian interview was crafted from the systematic review findings, we see some deviations. In particular, practices that have a direct impact on the project business, eg, retrospectives, daily meetings, and continuous integration, are well accepted and in line with the review findings. However, practices that are more management oriented are not exercised in the interviewed companies, eg, velocity measuring and system metaphor. This, again, contributes to the finding that agile SPI is performed in a pragmatic project-centered manner. 36 Including the HELENA data (Section 4.4.2) in the discussion, this trend becomes even more obvious.
Only 15.9% of the participants stated that they improve their processes in the context of a planned SPI program (Table 11), and more than 75% of the participants conduct SPI either sporadically or as an in-project activity. This is in line with the findings from the Estonian interview (Table 9) and suggests that agile SPI is more concerned with small-scale improvements ''on-the-fly,'' ie, to solve process-related problems immediately. From the interview participants in Estonia in particular, we saw that improvements are mostly concerned with experimenting with and/or deploying new practices. For instance, in response to the question about the latest specific improvements made in the respective companies (Table A2, Q6), one participant named automated testing, and another one referred to the introduction of sprints along with sprint planning and planning games.

Finding 3:
Agile software process improvement as reported by the participants of the two complementing studies is mostly conducted in a continuous in-project style. The methods and practices used in practice deviate from the ones in literature. We argue that this difference emerges from the research settings in which certain methods and practices are deployed/evaluated. That is, without the support of researchers or without external requirements (Table 13), agile SPI is focused on small-scale project-internal improvements, but it is not part of a company-wide SPI program.

Threats to validity
We discuss the different validity threats of our study referring to the major threat classes as described by Wohlin et al. 116 Publication bias refers to the phenomenon where positive results of a study or experiment are more likely to be published than negative. 55,117 As a literature study, the research presented in this article is affected by this particular threat. Specifically, this study suffers from potential incompleteness of the search results and the general publication bias. It implies that the analyzed articles from the literature do not include publications reporting failed experiments and lessons learned from such experiences.
The internal validity could be biased by personal ratings of the researchers (selection bias). To address this risk, we continued and refined our study, 5,22,53 which follows a proven procedure that utilizes different tools and researcher triangulation to support dataset cleaning, study selection, and classification. Specifically, the in-depth review of the dataset was performed by three groups. The first two groups implemented the review independently under the supervision of two researchers implementing the procedures from earlier studies. 22,53,55 Their findings were compared and integrated. The third review was conducted by another student, who had access to the integrated findings of the other two review groups, under the supervision of another researcher. One of the researchers supervising the first two in-depth literature reviews performed the final data integration of all three in-depth reviews, and two other researchers checked the integrated data. All literature reviews were performed following the standard guideline for systematic literature reviews 31 complemented by another guideline 55 on study selection procedures using the full text of the study-relevant papers. The internal validity is also affected by the limited data collection; in particular, no new data were collected, and data analyzed were derived from the main study that serves as an umbrella. Specifically, selecting the studies of interest from the main study only using the attribute ''Agile/Lean'' introduces a threat to validity as it is possible that this selection criterion might exclude potentially relevant publications from the further analyses. For this, the classification of the selected papers was re-evaluated in the three reviews (Section 3.2.2) and led to a reduction of the papers from initially 73 to finally 55 papers. Calling in extra researchers to analyze and/or confirm decisions, notably the exclusion of a paper, increases the internal validity.
The external validity is threatened by missing knowledge about the generalizability of the results. Furthermore, this study ''inherits'' several limitations regarding the external validity by relying on the main study's raw data only. Consequently, this study also inherits the main study's scope thus having certain limitations regarding the generalizability. To increase external validity, new data from industry professionals located in Estonia were collected through an interview study directly developed from the initial findings of the in-depth literature analysis. However, the selection of the external experts might introduce another threat. Of the 10 contacted companies, only seven agreed to participate in the survey. Also, as we used a convenience sampling strategy, 60 we had no control over the participants, and thus, besides the requirement that we asked the companies to name experienced employees, detailed demographic data were not collected. To compensate for this risk, the new data from industry professionals were further complemented by data from an independently conducted survey 29,30,32 with industry professionals in several European countries. A correlation analysis of the interview study results and the survey results shows an acceptable correlation such that we consider the risk of misselected participants acceptable. Furthermore, by calling in extra researchers, revisiting the original categorizations, and conducting extra quality assurance, we improved the external validity of the literature selection. Yet, to further increase the external validity of the results from our combined studies, further independently conducted studies are required to confirm our findings. With the current study, we lay the foundation for such future research.
The construct validity could impact the results obtained from the interview conducted in Estonia. Even though the interview was developed following the guidelines by Linåker et al 118 and validated through internal reviews, still, a different understanding of concepts (wording and terminology) of interviewer and interviewees could have impacted the interviews. Furthermore, using the publicly available HELENA data 32 could affect the construct validity as the HELENA survey was not designed to investigate the role of agility in SPI. Hence, questions and data had to be mapped to the Estonian interview and, finally, to the findings of the systematic review. To increase the construct validity, two of the authors independently performed the data mapping and selection of the two studies. Finally, a team of three researchers collaboratively analyzed and finalized the data source mappings, thus defining the data relevant for the analysis and the methods to apply in the analysis.
Finally, the conclusion validity might be impacted; ie, the conclusions drawn in this study might be biased for the following reasons: Conclusions drawn from the systematic review could be too positive (publication bias) and grounded in incomplete data (selection bias). As no extra publications have been added to the dataset from the main study, further relevant publications are potentially not in the dataset. For the two complementing studies, conclusions drawn are based on the personal opinions of the participants (including prejudices and preferences regarding the survey contents). Also, the number of participants in the practitioner survey conducted in Estonia is very low and, therefore, does not allow for providing a fully generalizable statement. To improve the conclusion validity, we used three data sources and different analysis methods for the analysis.
All conclusions drawn are based on independently performed analyses in the team; that is, analyses and conclusions are double-checked by other researchers not involved in a specific analysis/interpretation step.

CONCLUSION
Based on our findings, we see that SPI has in many cases become more flexible and tailored to specific short-term needs. Comparatively few companies seem to use standard, large-scale SPI approaches like CMMI or ISO/IEC 15504. Also, there seems to be a tendency toward taking advantage of SPI activities built into agile development practices such as retrospectives. Even in highly regulated domains such as software for medical devices, existing SPI frameworks have been tailored and made more flexible.
Our research indicates that the trend towards agile SPI is visible both in the research literature and in surveys and interviews with industry professionals. Differences between these data sources exist regarding the prominence of specific agile development methods and practices in the context of SPI and the focus on implicit and continuous SPI versus explicit and strategic SPI.