Technology Integration in Mathematics Education: Examining the Quality of Meta-Analytic Research

The aim of this study was to evaluate the methodological quality of meta-analyses in mathematics education research. Computerized searches were conducted in ERIC, PsycINFO, ProQuest, and JSTOR through January 2016 for meta-analyses pertinent to technology integration in the mathematics classroom. To identify studies, mathematics subject matter terms were combined with methodology terms such as 'systematic review', 'literature review', and 'meta-analysis'. Retrieved studies were screened, data were extracted, and methodological quality was evaluated using the 11-item Revised Assessment of Multiple Systematic Reviews (R-AMSTAR). The characteristics of meta-analyses associated with methodological reporting quality were identified. Results suggest that, on average, the quality of methodological reporting is good, but several areas of concern remain. Implications for improving research and classroom practice in this area are provided.


INTRODUCTION
Technological tools can enhance the delivery, clarity, and precision of mathematics instruction. However, the use of technological tools in the mathematics classroom must be strategic. The strategic use of technology in the mathematics classroom should be consistent with current research in order to support effective teaching (National Council of Teachers of Mathematics [NCTM], 2008). Based on these trends, research examining the effective use of technology in the mathematics classroom has grown exponentially. To manage and inform this growth, critical syntheses of these results are ideal, and such syntheses should be guided by meta-analytic thinking. As meta-analytic thinking becomes more prevalent in research, it is increasingly important to consider the design and reporting practices implemented. Meta-analysis is a research synthesis tool that uses summaries of effect sizes to generate empirical conclusions from ostensibly similar studies (Young, Ortiz, & Young, 2017). Because meta-analyses combine results across multiple studies, it is imperative that the design quality of included studies is assessed. Meta-analysis involves: (1) summarizing several studies in terms of an effect size, and (2) combining the results to make summative inferences (Cooper, 2016). This process is traditionally completed in three steps: calculate the average effect size, test homogeneity, and detect moderators explaining any heterogeneity (Sanchez-Meca & Martin-Martinez, 1998). Maintaining the fidelity of these processes is essential to the validity and reliability of the meta-analysis results. The overall or mean effect size represents a key element of the meta-analytic review; thus the calculation and reporting of these data must be undertaken with the utmost transparency (Cooper & Hedges, 2009). Researchers recommend that the details of the literature search are rigorous and explicitly reported to avoid biased or erroneous results (Kepes et al., 2012; Rothstein, 2012). It is also recommended that any formulas or software used to complete the effect size calculations be referenced as well, to increase a study's transparency.
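The first of these steps, pooling study effect sizes into a mean effect size, can be sketched as a standard inverse-variance weighted average. The effect sizes and variances below are illustrative values chosen for the example, not data from any study reviewed here:

```python
# Fixed-effect weighted mean effect size: each study's effect size d_i is
# weighted by the inverse of its sampling variance v_i (illustrative values).
effect_sizes = [0.35, 0.52, 0.18, 0.47]      # hypothetical standardized mean differences
variances    = [0.020, 0.045, 0.031, 0.015]  # hypothetical sampling variances

weights = [1 / v for v in variances]
mean_es = sum(w * d for w, d in zip(weights, effect_sizes)) / sum(weights)

# Standard error and 95% confidence interval of the pooled effect
se = (1 / sum(weights)) ** 0.5
ci = (mean_es - 1.96 * se, mean_es + 1.96 * se)
# For these illustrative inputs the pooled effect is roughly 0.39
```

Reporting the pooled estimate alongside its standard error and confidence interval, and naming the software or formulas used, is the kind of transparency recommended above.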
The assessment of homogeneity in meta-analysis is also an important consideration in reporting and methodological quality. Two statistics are typically reported to assess homogeneity in meta-analyses: the homogeneity test Q and the I² index (Borenstein et al., 2009). These statistics should be reported together because they are complementary: the Q test indicates whether heterogeneity is present, while the I² index quantifies its magnitude. Finally, arguably the most important step to increase the quality and rigor of a meta-analysis is to detect moderators and describe their influence on the heterogeneity. The detection of moderators is the key feature of any meta-analytic study, because differences in the strength and direction of effect sizes are identified here. Rosenthal (1991) expounds, "The search for moderators is not only an exciting intellectual enterprise but indeed…it is the very heart of scientific enterprise" (p. 447). Moderators offer conditions for the effects that are theorized, thus informing researchers of the conditions under which the effects under investigation can be reliable (Schmidt & Hunter, 2014). This information is vital to the successful implementation of technology in the mathematics classroom across instructional contexts. Moderators are also important because they identify statistical interactions, which do not imply causation but rather add context to effect size results (Cooper & Patall, 2009). Given the distinctions in the associations moderators identify, they are consistently placed in three categories: (1) methodological variations, (2) theoretical constructs, or (3) study characteristics (DeCoster, 2004). Moderators are recognized for their ability to enhance theory development and increase the general richness of empirical work (Aguinis, Gottfredson, & Wright, 2011). Given the empirical merit of meta-analytic research and the contextualization offered by moderator analysis, examining the reporting and design quality across studies is practically and scientifically necessary.
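The complementary roles of Q and I² are easy to see in a minimal computational sketch. The values below are illustrative, not drawn from the reviewed studies; Q is the weighted sum of squared deviations from the pooled effect, and I² expresses the share of that variation beyond what sampling error alone would produce:

```python
# Cochran's Q and the I^2 index for a set of effect sizes (illustrative values).
effect_sizes = [0.35, 0.52, 0.18, 0.47]
variances    = [0.020, 0.045, 0.031, 0.015]

weights = [1 / v for v in variances]
pooled  = sum(w * d for w, d in zip(weights, effect_sizes)) / sum(weights)

# Q: weighted sum of squared deviations from the pooled effect
q = sum(w * (d - pooled) ** 2 for w, d in zip(weights, effect_sizes))

# I^2: percentage of total variation attributable to true heterogeneity;
# floored at zero when Q falls below its degrees of freedom
df = len(effect_sizes) - 1
i_squared = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
```

Here Q is below its degrees of freedom, so I² is zero: the presence test and the magnitude index together tell the reader there is no detectable heterogeneity, which is exactly why the two statistics should be reported as a pair.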

Problem Statement
Because the methodological rigor employed in meta-analysis continues to increase, it is important to examine the effect of methodological rigor on effect size magnitude (Cooper, 2016; Moher et al., 2015). Due to the high level of evidence meta-analyses are expected to provide, the methodological and reporting quality of these studies is an ongoing concern. In response, the American Psychological Association (APA) issued its Meta-Analysis Reporting Standards (MARS) as a means to share common meta-analytic practices across disciplines and to account for discipline-specific priorities within psychology-related disciplines such as education (APA, 2010). The MARS adheres to the assumption that the methodological qualities of meta-analysis are nuanced by discipline; thus, particular aspects are more critical to one discipline than to another (Kepes, McDaniel, Brannick, & Banks, 2013). The MARS suggests that a common structure be followed while allowing for some flexibility, which inhibits its use as a strict measure of reporting quality. Furthermore, within the psychological and social sciences, common rubrics or measures of meta-analysis quality are underdeveloped. Yet, given the rich history of meta-analytic research in the medical sciences, valid and reliable tools are more abundant there. To address this problem, the current study synthesized ostensibly similar prior meta-analytic research and assessed its quality using a standardized assessment instrument adapted from the medical sciences.
Because valid and reliable tools to assess the quality of meta-analyses are currently lacking in the psychological and social sciences, study quality was assessed with an adapted version of the Revised Assessment of Multiple Systematic Reviews (R-AMSTAR). The R-AMSTAR is an instrument used to assess the methodological quality of systematic reviews and meta-analyses. Because the methodological rigor of prior meta-analyses varies across studies, the R-AMSTAR was used to assess the variation in the quality of included studies. For example, one concern with prior research is the inclusion of unstandardized outcome measures and the possible inflation of overall effect size statistics. Using the R-AMSTAR to assess this and other methodological quality issues is important because the influence of problematic measures can be evaluated through moderator analysis to examine effect sizes across studies and inform future practice.
This critical synthesis was designed to inform research practice and consumption in mathematics education. Thus, this literature survey examined trends in results from meta-analytic research to provide recommendations for future research and instructional praxis. The results of this study summarize the overall quality of meta-analytic study designs across multiple settings and implementations. These results are important because they promote the expansion of meta-analytic thinking in mathematics education research.

RESEARCH METHOD
Comprehensive literature searches for articles written between 1980 and 2015 were conducted to produce the primary list of meta-analyses. Iterative electronic searches were conducted using educational databases identified from previous studies (JSTOR, ERIC, EBSCO, PsycINFO, and ProQuest). A combination of Boolean operators and the keywords (meta-analysis, research synthesis, literature review, literature synthesis, mathematics, achievement, technology, instructional technology, information communication technology, calculators, and computer-assisted instruction) was used to generate an initial pool of studies. Citations from retrieved studies were searched manually to identify any potentially missing meta-analyses. These methods identified 42 potential studies for preliminary review. The study titles and abstracts were read as part of the initial screening process, and then the inclusion criteria presented below were applied.
The following inclusion criteria guided this literature survey:
1. The study examined the effects of digital technology applications, including computer-assisted instruction, integrated tutoring systems, technology-based programs, or technological tools to improve mathematics achievement.
2. The study used meta-analytic methods to calculate mean effect sizes.
3. The study involved students in K-12 or post-secondary mathematics classrooms.
4. The primary dependent measure included quantitative measures of mathematics performance, such as standardized tests, researcher-made tests, or teacher-made tests.
5. Only studies conducted between 1980 and 2015 were included. The year 1980 was identified as a starting point given the significant influx of instructional technology that took place in the 1980s.
Figure 1 presents the complete study inclusion and exclusion process. The initial pool of 42 studies was systematically screened and reduced to a final pool of 18. The majority of studies were removed for lack of effect size reporting or the absence of a digital technology focus. Pertinent data related to the research questions were extracted from the remaining studies.

Figure 1. Study Inclusion Flowchart
The methodological quality of the meta-analyses was assessed with the R-AMSTAR tool, a reliable and valid instrument based on prior confirmatory factor analysis. The R-AMSTAR rating scale consists of 11 items, each scored on 3 to 5 criteria. Scores for each item are based on the inclusion of the related criteria and range from 0 to 4 points, so the highest possible score on the instrument is 44. Sample items include, "Were the methods used to combine the findings of studies appropriate?" and "Was the likelihood of publication bias assessed?". Because scores are derived from individual items on an evaluative rubric, the R-AMSTAR also allows researchers to isolate specific design strengths and weaknesses across studies. Coding reliability was first assessed by comparing the independent coding results obtained by two separate researchers from a random sample of ten studies; the initial inter-rater consistency was 95%, and the researchers met to resolve the remaining inconsistencies in the coding results. After applying the R-AMSTAR, the resulting inter-rater agreement was 86.3% (Cohen's κ = .813). To reconcile any discrepancies in the R-AMSTAR ratings, the two independent coders met to establish a consensus. Scores were then categorized based on the established R-AMSTAR grading scale: D = 0 to 11, C = 12 to 22, B = 23 to 33, and A = 34 to 44. The complete R-AMSTAR instrument is presented in Table 1.
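Chance-corrected agreement statistics such as Cohen's κ can be computed directly from two coders' category assignments. The sketch below uses hypothetical item-level grades, not the actual coding data from this study:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical codes."""
    n = len(rater_a)
    # Observed proportion of exact agreements
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under chance, from each rater's marginal distribution
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical grades from two independent coders (illustrative only)
a = ["A", "B", "B", "C", "A", "B", "C", "C", "A", "B"]
b = ["A", "B", "B", "C", "A", "B", "B", "C", "A", "B"]
kappa = cohens_kappa(a, b)
```

Because κ discounts agreement expected by chance, it is typically lower than the raw percent agreement, which is consistent with the pairing of 86.3% agreement and κ = .813 reported above.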
Descriptive statistics are presented as means and standard deviations (SD) for continuous variables and as frequencies (percentages) for categorical variables. The frequency of each R-AMSTAR item score was calculated. Mean difference effect sizes were calculated for between-study characteristics (publication status, technology intervention, and number of effect sizes: 20 or fewer, fewer than 100, and more than 100). Finally, 95% confidence intervals for the mean R-AMSTAR score for each categorical variable were plotted for comparison across groups.
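The group-level confidence intervals described above can be sketched with a t-based interval around each group mean. The scores, group sizes, and critical values below are hypothetical and chosen only to illustrate the overlap comparison, not the study's actual data:

```python
import statistics

def mean_ci(scores, t_crit):
    """Mean and 95% CI using a supplied t critical value for df = n - 1."""
    n = len(scores)
    m = statistics.mean(scores)
    se = statistics.stdev(scores) / n ** 0.5
    return m, (m - t_crit * se, m + t_crit * se)

# Hypothetical R-AMSTAR scores for two groups (not the study's actual data)
articles      = [26, 31, 22, 35, 24, 28, 20, 27, 30, 19]
dissertations = [22, 25, 20, 23, 18, 24, 21, 26]

# t critical values for a 95% CI: df = 9 -> 2.262, df = 7 -> 2.365
m1, ci1 = mean_ci(articles, 2.262)
m2, ci2 = mean_ci(dissertations, 2.365)

# Substantially overlapping intervals suggest no significant mean difference
overlap = ci1[0] <= ci2[1] and ci2[0] <= ci1[1]
```

Plotting such intervals side by side, as done here for publication status and intervention type, gives a quick visual check of whether group means plausibly differ.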

RESULTS AND DISCUSSION
The final pool comprised 18 meta-analyses of studies conducted between 1986 and 2014, representing 1,193 independent effect sizes. The median year of publication was 2007, and the range for year of publication was 28 years. The sample consisted of 10 published articles and 8 unpublished dissertations. A complete list of study characteristics is presented in Table 2. All studies except one included an overall mean effect size or sufficient data to calculate it. Only one study reported an overall negative effect size. The overall effect sizes ranged from -.11 to 1.02 in magnitude.
R-AMSTAR scores ranged from 18 to 35, or from a grade of C to a grade of A. The mean R-AMSTAR score was 25.5 (SD = 5.50), indicating that the meta-analyses investigating technology integration in mathematics education received an average grade of B. Individual item means were highest for the following three items: "Was an 'a priori' design provided?", "Was a comprehensive literature search performed?", and "Were the methods used to combine the findings of studies appropriate?". These results suggest that the majority of studies examined adhered to the most common methodological quality guidelines. Individual item means were lowest for the following items: "Was the status of publication (i.e., grey literature) used as an inclusion criterion?", "Was the scientific quality of the included studies used appropriately in formulating conclusions?", and "Was conflict of interest stated?". These data indicate that more work is needed to better utilize grey literature in the inclusion and exclusion process. Additionally, researchers must begin to consider the scientific quality of the included studies when interpreting meta-analysis results. The complete categorical item statistics from the R-AMSTAR are provided in Table 3.
Between-groups mean scores were calculated for the categorical variables publication status and technology intervention type. Articles had a mean score of 26.2 (SD = 6.5), while unpublished dissertations had a mean score of 22.4 (SD = 3.2). Plots of 95% confidence intervals indicate that the mean difference between published and unpublished studies was not statistically significant. The plots presented in Figure 2 show that the confidence bands overlap substantially; thus the two groups are not statistically significantly different. Additionally, differences across technology intervention types were assessed. The mean scores, in order from largest to smallest, were: combination 28.3 (SD = 5.5), software 26.12 (SD = 6.3), computer-assisted instruction 21.8 (SD = 2.1), and calculator 19.2 (SD = 1.6). Figure 3 presents the 95% confidence interval plots for technology type. The overlap between confidence bands indicates an absence of statistically significant differences between intervention types. Finally, to assess changes in study quality over time, the bivariate correlation between publication year and R-AMSTAR score was analyzed. The correlational analysis revealed a positive relationship between year of publication and R-AMSTAR score, r = .59, p < .001, indicating that as publication year increased, the quality of studies also increased. Mathematics education researchers and other social scientists should consider ways to improve reporting practices and design quality. The available meta-analyses concerning technology integration in the mathematics classroom are relatively reliable and rigorous. However, to improve study quality and to promote meta-analytic thinking, researchers should consider adopting the Meta-Analysis Reporting Standards (MARS) at a minimum to ensure that basic design and reporting considerations are addressed.
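The bivariate correlation used to assess quality over time is the Pearson product-moment coefficient. A minimal sketch, using hypothetical (year, score) pairs constructed only to show an upward trend like the one reported:

```python
def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical (year, R-AMSTAR score) pairs showing an upward trend
years  = [1990, 1995, 2000, 2004, 2008, 2011, 2014]
scores = [18, 20, 23, 22, 27, 30, 33]
r = pearson_r(years, scores)
```

A positive r near 1 indicates that later publication years tend to coincide with higher R-AMSTAR scores, which is the pattern the study reports (r = .59).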
Mathematics educators should consider using a variety of technological tools in the mathematics classroom. Prior meta-analyses indicate that the effects of technology in the mathematics classroom vary across interventions but are overall positive. Different technologies have specific instructional affordances and constraints. Given this challenge, it is important that teachers consider a variety of tools to address different content and instructional classroom needs.
Based on a summary of almost 30 years of research, this study provides important conclusions related to the reporting and methodological quality of studies in this area. The results of this critique suggest that the majority of the meta-analyses reviewed score relatively well and on average earned a grade of B. This suggests that most studies represent an above-average level of design rigor and transparent reporting. However, several major reporting concerns were also observed across the majority of the studies. These concerns will affect the validity and reliability of reported summary effect sizes; thus they must be considered as the field continues to move forward. Reporting quality did not vary by publication status or technology intervention. This is important because it indicates that the quality of many meta-analytic dissertations is comparable to that of published research. Additionally, because design quality was not influenced by technology intervention, the results across interventions are relatively similar regarding design and reporting quality. Finally, the study substantiates the claim that, as methods and reporting standards improve, researchers are conducting more sound meta-analyses in mathematics education research; this is evidenced by the positive relationship between R-AMSTAR score and year of publication. In conclusion, as the field of mathematics education moves forward, technology will evolve as well; thus researchers and consumers of research must have valid and reliable data to make instructional and classroom decisions. It is my hope that this study informs current and future research towards this end.
Technology integration in mathematics education: Examining the quality of meta-analytic research (Young)

Figure 3. 95% confidence intervals for technology type

Table 1. Adapted Revised Assessment of Multiple Systematic Reviews (R-AMSTAR) items

Table 2. Study characteristics