Literature

Josh’s Literature Review

Fitting Linear Mixed-Effects Models Using lme4 | Journal of Statistical Software (jstatsoft.org)

What is the goal of the paper?

The paper presents the lme4 package for R, which facilitates the fitting of linear mixed-effects models. The authors aim to articulate the package’s capabilities in evaluating the profiled deviance or REML criterion for linear mixed models and to explain the representation and optimization of such models for parameter estimation.

Why is it important?

The significance of this paper lies in its contribution to the field of computational methods for fitting mixed models—an area with many open problems. The lme4 package represents an evolution in this domain, offering more efficient computational tools and a syntax that simplifies the modeling process, especially for models with crossed random effects.

How is it solved? – methods

The package utilizes maximum likelihood or restricted maximum likelihood (REML) estimates for linear mixed-effects model parameters, employing numerical representation and optimization functions within R. The paper delves into the model’s structure, the evaluative steps for the profiled deviance or REML criterion, and the class structure representing such models, highlighting the improvements over previous formulations.

Results/limitations, if any.

The document focuses more on methodology than specific results or limitations. It details the improvement over the nlme package, addressing efficient linear algebra tools and the incorporation of profile likelihood confidence intervals on random-effects parameters. The paper emphasizes the ongoing development of the lme4 package, acknowledging the need for stability and usability for a broad range of applications.

Statistical primer: an introduction to the application of linear mixed-effects models in cardiothoracic surgery outcomes research-a case study using homograft pulmonary valve replacement data - PubMed (nih.gov)

What is the goal of the paper?

The goal of the paper is to provide a detailed introduction to developing and interpreting linear mixed-effects models for repeated measurements in the context of cardiothoracic surgery outcomes research. The paper uses a dataset on patients undergoing surgical pulmonary valve replacement to illustrate the steps of developing such models for clinician researchers.

Why is it important?

This work is important because the emergence of large cardiothoracic surgery datasets, including repeated measurements over time, presents an opportunity to apply advanced modeling of outcomes. Linear mixed-effects models offer a more nuanced understanding of these outcomes compared to traditional methods, which is crucial for enhancing clinical decision-making and patient care.

How is it solved? – methods

The authors used a retrospective dataset containing serial echocardiographic measurements from patients who underwent surgical pulmonary valve replacement at Erasmus MC between 1986 and 2017. The paper discusses the construction of the model, including dealing with missing values, correlated variables, and multicollinearity. It also covers model specification, variable selection, addressing nonlinearity, and interpretation of results. An R script is provided for implementing the model.

Results/limitations, if any.

The paper illustrates the construction of the model, including essential aspects such as the theory of linear mixed-effects models, missing values, collinearity, interaction, nonlinearity, model specification, and results interpretation. It shows that linear mixed-effects models provide a more detailed view of repeated measurements and give more valid estimates compared to linear regression models, especially in the context of cardiothoracic surgery outcomes research. Limitations related to model assumptions, such as linearity and normal distribution of residuals, are addressed through transformations and statistical tests.

Generalized linear mixed models: a practical guide for ecology and evolution - ScienceDirect

What is the goal of the paper?

The goal of this paper is to provide a comprehensive guide on the use and application of Generalized Linear Mixed Models (GLMMs) for ecologists and evolutionary biologists dealing with nonnormal data types, such as counts or proportions, which often do not fit well with classical statistical procedures. The paper aims to clarify the use of GLMMs, given the popularity of these models in recent years.

Why is it important?

The importance of this paper lies in its attempt to introduce GLMMs for biologists, where data often fall outside the scope of methods taught in introductory statistics classes. The paper highlights the limitations of traditional shortcuts like data transformation or ignoring random effects and advocates for GLMMs as a more appropriate statistical approach for nonnormal data with random effects.

How is it solved? – methods

The paper reviews the use and misuse of GLMMs in biology, discusses estimation and inference, and summarizes best-practice data analysis procedures. It emphasizes the need for researchers to match their statistical approaches to their data, rather than forcing data into classical statistical frameworks. The paper discusses various estimation algorithms for fitting GLMMs, including maximum likelihood (ML), pseudo- and penalized quasilikelihood (PQL), Laplace approximations, Gauss-Hermite quadrature (GHQ), and Markov chain Monte Carlo (MCMC) algorithms.

Results/limitations, if any.

While the paper provides a broad overview of GLMM procedures and best practices, it also acknowledges the challenges and controversies in statistical issues such as null hypothesis testing, stepwise regression, and the use of Bayesian statistics. It highlights that GLMMs are powerful tools but can be challenging to use, even for statisticians, due to computational difficulties in estimating parameters, especially for complex models or large numbers of random effects.

Frontiers | Linear mixed-effects models for within-participant psychology experiments: an introductory tutorial and free, graphical user interface (LMMgui) (frontiersin.org)

What is the goal of the paper?

The goal of this paper is to introduce linear mixed-effects models (LMMs) as a versatile tool for analyzing data from within-participant psychology experiments. It seeks to address the limitations of traditional analysis methods like ANOVA in handling complex data structures, such as those involving repeated measures or nested designs. The paper also introduces LMMgui, a free, graphical user interface designed to facilitate the use of LMMs for researchers using R.

Why is it important?

The importance of this work lies in its potential to enhance the analysis of experimental psychology data by providing a more flexible and robust statistical tool that can handle the complexities of within-participant designs, such as pseudoreplication and missing data. By offering a user-friendly interface for LMM analysis, the paper aims to make advanced statistical methods more accessible to researchers, thereby improving the quality and interpretability of psychological research.

How is it solved? – methods

The paper discusses the theoretical foundation of LMMs, explaining how they can accommodate various data structures and assumptions that are commonly encountered in psychology experiments. It contrasts LMMs with traditional repeated-measures ANOVA, highlighting the advantages of LMMs in terms of their flexibility and fewer stringent assumptions. The introduction of LMMgui is a significant methodological contribution, providing a step-by-step guide on how to use this tool to specify and compare different LMMs for data analysis.

Results/limitations, if any.

While the paper primarily serves as a tutorial and does not present results from a specific study, it effectively demonstrates the application of LMMs through hypothetical examples. These examples illustrate how LMMs can be used to analyze data from within-participant designs, accounting for random effects and complex variance-covariance structures. The paper acknowledges the challenges in interpreting LMM results and the potential for increased Type I error rates in certain conditions, emphasizing the need for careful model comparison and validation.

A linear mixed model to estimate COVID-19-induced excess mortality - PubMed (nih.gov)

What is the goal of the paper?

The goal of this paper is to estimate baseline mortality (mortality under non-pandemic conditions) for Belgium and the Netherlands using a linear mixed model (LMM), which can account for both fixed and random effects. If baseline mortality can be modeled, then excess mortality (the measure of the increase in mortality from all causes during a specific time period) can be used to evaluate the impact of COVID-19 on mortality.

Why is it important?

Historically, 5-year weekly averages have been used to determine baseline mortality. However, this excludes year-specific trends in mortality and the effects of historical excess mortality (e.g., past influenza outbreaks or heat waves). Using an LMM is important because it allows for more accurate modeling that accounts for these factors in the form of random effects.
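As a rough illustration of the 5-year weekly average baseline described above, the sketch below averages each calendar week over five historical years and subtracts that baseline from observed counts to obtain excess mortality. The function names and the toy numbers are hypothetical and are not taken from the paper.

```python
from statistics import mean


def weekly_average_baseline(history):
    """Average deaths per calendar week across the years in `history`.

    `history` maps year -> list of weekly death counts (same length per year).
    """
    years = sorted(history)
    n_weeks = len(history[years[0]])
    return [mean([history[y][w] for y in years]) for w in range(n_weeks)]


def excess_mortality(observed, baseline):
    """Observed minus baseline deaths, week by week."""
    return [obs - base for obs, base in zip(observed, baseline)]


# Toy data (made up): 3 weeks for each of 5 historical years.
history = {
    2015: [100, 110, 120],
    2016: [102, 108, 118],
    2017: [98, 112, 122],
    2018: [101, 109, 121],
    2019: [99, 111, 119],
}
baseline = weekly_average_baseline(history)
excess = excess_mortality([130, 111, 115], baseline)
```

Note that this baseline has no term for a year-specific trend, and any historical spike inside the five-year window feeds directly into the average, which is the weakness the paper points to.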

How is it solved? – methods

The paper proposes a general linear mixed model for weekly mortality Y_tj, with week t = 1, …, 52 and year j = 2009, …, 2020.

The model is then adjusted to model the cyclic pattern from year to year via random effects of Fourier terms, and to reduce the influence of historical excess mortality (as mentioned above) by down-weighting the residuals.
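A cyclic yearly pattern of this kind is typically represented with sine and cosine terms of the week index. The following is a minimal sketch of such a Fourier basis, assuming a 52-week period; the helper name and the number of harmonics are hypothetical, and how the paper attaches random effects to these terms is not reproduced here.

```python
import math


def fourier_terms(week, n_harmonics=2, period=52):
    """Return [sin, cos] pairs of 2*pi*k*week/period for k = 1..n_harmonics."""
    terms = []
    for k in range(1, n_harmonics + 1):
        angle = 2 * math.pi * k * week / period
        terms.extend([math.sin(angle), math.cos(angle)])
    return terms


# The basis repeats after one period, so week 1 and week 53 get the same row.
row_week1 = fourier_terms(1)
row_week53 = fourier_terms(53)
```

Each row would serve as a set of covariates for week t; in a mixed model, letting the coefficients of these columns vary by year is one way to allow the seasonal shape to differ from year to year.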

Results/limitations, if any.

Several statistics were used to evaluate the model’s forecasting accuracy, including the likelihood ratio test (LRT) and the root mean square error percentage (RMSE%). The models were fitted to historical mortality data from 2009 through week 10 of 2020. The remaining 42 weeks of 2020 were forecast using the LMM and the 5-year average model and compared with the ground truth data. The models all performed well, and an overall recommendation was made to include the down-weighting procedure for past excess mortality and a serial correlation structure. The LMM fit the mortality data better, and two years were better predicted than with the 5-year weekly average models. Many limitations exist, including differences in the reporting of COVID-19 deaths between Belgium and the Netherlands, and across the world. Additionally, it is unclear whether the added complexity of LMMs provides a significant benefit over 5-year weekly average models in years besides 2014 and 2016.

Jacob’s Literature Review

Multilevel analysis quantifies variation in the experimental effect while optimizing power and preventing false positives

What is the goal of the paper?

The goal of this paper is to show how to handle nested data in neuroscience. Oftentimes, data are collected from the same sample under different experimental conditions. The paper states that this nesting is often overlooked in neuroscience.

Why is it important?

Not only are several assumptions violated, but experimental effects could also be miscalculated, leading to incorrect results being reported.

How is it solved? – methods

The paper conducts two simulation studies to show the importance of using the appropriate statistical method. Design A had clustered data that may have random effects for just the intercept. Design B had clustered data that may have random effects in both the intercept and the experimental effect. Both cases resulted in an increase in false positive rates.
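The inflation of false positives can be illustrated with a small simulation loosely inspired by Design A (random intercepts only). This is a sketch under stated assumptions, not the paper's simulation: the cluster counts, the variances, and the use of a t-test on cluster means as a simple stand-in for a multilevel model are all choices made here for illustration. No true condition effect is simulated, so every rejection is a false positive.

```python
import random

N_CLUSTERS = 10      # clusters per condition (assumed)
N_PER_CLUSTER = 20   # observations per cluster (assumed)
# Two-sided 5% critical values of the t distribution, hard-coded for the
# sizes above: df = 398 for the naive test, df = 18 for the cluster-mean test.
CRIT_NAIVE = 1.966
CRIT_CLUSTER = 2.101


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def t_statistic(a, b):
    """Two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5


def simulate_condition(rng, sd_cluster=1.0, sd_resid=1.0):
    """Return clusters of observations; each cluster shares a random intercept."""
    clusters = []
    for _ in range(N_CLUSTERS):
        intercept = rng.gauss(0, sd_cluster)
        clusters.append(
            [intercept + rng.gauss(0, sd_resid) for _ in range(N_PER_CLUSTER)]
        )
    return clusters


def false_positive_rates(n_sims=1000, seed=1):
    rng = random.Random(seed)
    naive_hits = 0
    cluster_hits = 0
    for _ in range(n_sims):
        g1 = simulate_condition(rng)
        g2 = simulate_condition(rng)
        # Naive analysis: treat all observations as independent.
        flat1 = [x for c in g1 for x in c]
        flat2 = [x for c in g2 for x in c]
        if abs(t_statistic(flat1, flat2)) > CRIT_NAIVE:
            naive_hits += 1
        # Cluster-aware stand-in: compare cluster means instead.
        means1 = [mean(c) for c in g1]
        means2 = [mean(c) for c in g2]
        if abs(t_statistic(means1, means2)) > CRIT_CLUSTER:
            cluster_hits += 1
    return naive_hits / n_sims, cluster_hits / n_sims


naive_rate, cluster_rate = false_positive_rates()
```

With a sizeable between-cluster variance, the naive rate should land far above the nominal 5%, while the cluster-mean test should stay close to it.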

Results/limitations, if any.

The data are simulated and framed only in the context of neuroscience. It would be beneficial to see these results replicated on a real-world dataset.

Multimodel inference in ecology and evolution: challenges and solutions (nih.gov)

What is the goal of the paper?

The goal of the paper is to highlight obstacles that arise when model averaging within an information-theoretic framework, along with potential solutions where they exist.

Why is it important?

A large number of ecologists and biologists are analyzing data with the information-theoretic (IT) framework rather than traditional hypothesis testing. Model averaging becomes increasingly difficult with linear mixed models due to the presence of both fixed and random effects.

How is it solved? – methods

The paper suggests that researchers define appropriate input and predictor variables and handle collinearity with extreme care, especially when dealing with the random effects of LMMs. The paper then goes on to propose strategies for model averaging and for defining top model sets.

Results/limitations, if any.

As the paper mentions, it is NOT an exhaustive survey of the potential problems of applying model averaging under an IT framework. Some problems still exist, such as determining which IT criterion to use when comparing models with random factors.

Linear Mixed Model for Analyzing Longitudinal Data: A Simulation Study of Children Growth Differences (sciencedirect.com)

What is the goal of the paper?

The goal of the paper is to use Linear Mixed Models to analyze multilevel data of developmental growth rates in children. Different covariance structures were modeled within the LMM to capture correlated data through time.

Why is it important?

Growth curves in children are usually represented as a two-level data structure. At level 2 is the individual child and at level 1 is each individual observation. Traditional linear models are not effective due to the non-independence of the data.

How is it solved? – methods

The simulation study found that the unstructured (UN) covariance structure performed the best, although it suffered in efficiency because of its high number of parameters, leaving the heterogeneous first-order autoregressive structure, ARH(1), as a solid alternative. The data were also not naturally collected but simulated.
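To make the trade-off between these covariance structures concrete, the sketch below builds small examples using their standard textbook definitions and counts the parameters each one needs. Compound symmetry (CS) is included for comparison. The function names and the numeric values are hypothetical and are not taken from the paper.

```python
def compound_symmetry(n, sigma2, rho):
    """CS: common variance sigma2 and common correlation rho for every pair."""
    return [
        [sigma2 if i == j else sigma2 * rho for j in range(n)]
        for i in range(n)
    ]


def arh1(sds, rho):
    """ARH(1): occasion-specific standard deviations, correlation rho**|i-j|."""
    n = len(sds)
    return [
        [sds[i] * sds[j] * rho ** abs(i - j) for j in range(n)]
        for i in range(n)
    ]


def n_parameters(structure, n):
    """Number of covariance parameters for n measurement occasions."""
    counts = {
        "CS": 2,                 # one variance, one correlation
        "ARH1": n + 1,           # n variances, one correlation
        "UN": n * (n + 1) // 2,  # every variance and covariance is free
    }
    return counts[structure]


cs = compound_symmetry(3, sigma2=4.0, rho=0.5)
ar = arh1([1.0, 2.0, 3.0], rho=0.5)
```

For six measurement occasions, UN needs 21 parameters while ARH(1) needs 7, which is the efficiency cost the summary above refers to.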

Results/limitations, if any.

A protocol for conducting and presenting results of regression-type analyses

What is the goal of the paper?

The goal of the paper is to streamline analysis by giving the reader a 10-step protocol. It helps fellow researchers select models, justify assumptions, and validate models.

Why is it important?

This paper is well suited to researchers who are new to linear mixed models and want to apply their advantages to a dataset. The protocol is extremely straightforward. Linear mixed models offer more in-depth analysis, and it is important that researchers understand them in order to further the field of ecology.

How is it solved? – methods

The paper has 10 steps with very concrete examples that include sample datasets, visualizations, and results. The reader has something tangible and can directly apply what they learned to another dataset.

Results/limitations, if any.

The paper is limited in that it shows only sample results and no empirical data. It identifies potential pitfalls, such as overdispersion of fitted models, but offers no real solution.

LEVEL (Logical Explanations & Visualizations of Estimates in Linear mixed models): recommendations for reporting multilevel data and analyses

What is the goal of the paper?

Researchers use LMMs to study hierarchical data and often report them under different names, such as mixed-effects models, multilevel data, contextual analysis, and hierarchical studies. There is no standardization across these papers for analyzing hierarchical data, which leads to different aspects being reported. The goal of the paper is to create a standardized process for analyzing multilevel data.

Why is it important?

This is important so studies across different time periods can be compared more easily.

How is it solved? – methods

The paper suggests using LEVEL (Logical Explanations & Visualizations of Estimates in Linear mixed models) as a framework for conducting studies, together with reporting recommendations.

Results/limitations, if any.

Lack of flexibility. Sticking to a framework can inhibit creative analysis since you’re always looking at the framework for guidance.

Estimation and selection in linear mixed models with missing data under compound symmetric structure (nih.gov)

What is the goal of the paper?

Missing values occur all the time in real data. Statisticians and scientists use linear mixed models as a way to circumvent this issue. This paper aims to examine estimation and model selection performance under different rates of missing data. The paper considers two types of missing data: missing at random and missing not at random.

Why is it important?

Given the frequency of missing data it’s important to know the impact it has on the model’s results. It’s also important to be able to

How is it solved? – methods

Missingness is recorded using an indicator-based matrix, and a likelihood-based estimator is then constructed from the probability distribution of the observed data given the model parameters.
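As a small illustration of the indicator matrix idea, the sketch below marks each entry of a subjects-by-occasions table as observed or missing and computes the overall missing rate. The convention used here (1 = observed, 0 = missing, with `None` marking a missing value) and the function names are assumptions for illustration; the paper's own notation may differ.

```python
def missingness_indicator(data):
    """Return R with R[i][j] = 1 if data[i][j] is observed, 0 if it is None."""
    return [[0 if value is None else 1 for value in row] for row in data]


def missing_rate(indicator):
    """Fraction of entries that are missing."""
    total = sum(len(row) for row in indicator)
    observed = sum(sum(row) for row in indicator)
    return 1 - observed / total


# Toy data (made up): 2 subjects, 3 measurement occasions.
data = [
    [1.2, None, 3.4],
    [None, None, 2.0],
]
R = missingness_indicator(data)
rate = missing_rate(R)
```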

Results/limitations, if any.

Model performance is adequate when there is a moderate amount of missingness in the data. However, the paper focuses on the compound symmetric structure, which assumes equal variances and an equal covariance between any given pair of observations.

Syed’s Literature Review

An Introduction to Linear Mixed-Effects Modeling in R

What is the goal of the paper?

The goal of the tutorial is to provide both theoretical understanding and practical guidance on implementing mixed-effects models in R, particularly for researchers with basic statistical knowledge but limited experience in using these models. It aims to address the limitations of traditional statistical methods like repeated measures ANOVAs in analyzing correlated data and to introduce mixed-effects modeling as a more flexible and appropriate approach.

Why is it important?

Understanding mixed-effects modeling is crucial for researchers, especially in fields like experimental psychology where traditional methods may not adequately address the complexities of correlated data. By offering accessible explanations and practical examples, the tutorial aims to empower researchers to effectively analyze their data using mixed-effects models, thereby improving the quality and validity of their research findings.

How is it solved? – methods

The tutorial provides a theoretical introduction to mixed-effects modeling, explaining concepts such as fixed and random effects in simple terms. It contrasts mixed-effects modeling with traditional methods like repeated measures ANOVAs, highlighting the advantages of the former in handling correlated data and various types of response variables. Practical guidance is offered through R code snippets and example data, allowing readers to follow along and implement mixed-effects models in their own research.

Results/limitations, if any.

The tutorial does not present empirical results but rather serves as an educational resource. It effectively communicates the benefits of mixed-effects modeling and provides step-by-step instructions for implementation in R.

A brief introduction to mixed effects modelling and multi-model inference in ecology

What is the goal of the paper?

The paper aims to provide best practices for applying linear mixed effects models (LMMs) to biological data, particularly in ecology and evolutionary studies. It seeks to address the complexities of ecological data and offer guidance on model selection, interpretation, and common pitfalls encountered during modeling.

Why is it important?

With the increasing use of LMMs in biological data analysis, particularly in ecology, establishing best practices is critical for enhancing the robustness of conclusions drawn from ecological and evolutionary studies. Effective application of LMMs can improve the accuracy and reliability of research findings.

How is it solved? – methods

The paper discusses various aspects of applying LMMs to biological data, including model selection, error structure, data transformation, and methods for model selection. It emphasizes the importance of careful consideration and consultation with a statistician, particularly in complex situations.

Results/limitations, if any.

While the paper does not present empirical results, it offers practical solutions and recommendations for researchers working with ecological data. It effectively communicates the advantages and disadvantages of LMMs and provides valuable insights for their application in ecology and evolutionary studies.

Robustness of linear mixed-effects models to violations of distributional assumptions

What is the goal of the paper?

The paper aims to investigate the robustness of linear mixed-effects models (LMMs) in analyzing complex datasets commonly found in ecology and evolution. It evaluates the impact of various violations of distributional assumptions and missing random effect components on model estimates.

Why is it important?

Understanding the robustness of LMMs is crucial for researchers working with complex ecological and evolutionary datasets. Despite potential violations of assumptions and missing components, LMMs are widely used in these fields. Assessing their robustness helps ensure the accuracy and reliability of model estimates, even in challenging scenarios.

How is it solved? – methods

The study evaluates the impact of skewed, bimodal, and heteroscedastic random effect and residual variances, as well as the effects of missing random effect terms and correlated fixed effect predictors on model estimates. It likely employs simulations or analytical approaches to systematically assess the performance of LMMs under various conditions.

Results/limitations, if any.

The results indicate that while violations of assumptions may lead to slight biases and decreased precision in estimates, the overall robustness of LMMs allows for accurate and unbiased estimation of fixed and random effects.

To transform or not to transform: using generalized linear mixed models to analyse reaction time data

What is the goal of the paper?

The paper aims to challenge the common practice of transforming reaction time (RT) data to meet normality assumptions in statistical analyses within cognitive psychology research. It proposes generalized linear mixed-effect models (GLMMs) as a solution to accurately analyze RT data without the need for transformation, thus avoiding potential theoretical implications and misleading conclusions.

Why is it important?

It highlights the discrepancy between analyses of raw RT data and transformed RT data, as demonstrated by Balota et al. (2013), emphasizing the need for a more nuanced approach to analyzing RT data. By advocating for GLMMs, the paper aims to promote proper assessment of individual differences and enhance the testing of cognitive theories.

How is it solved? – methods

The study discusses the theoretical decisions involved in specifying a GLMM and provides reanalysis of datasets from Balota et al. (2013) to illustrate the application of GLMMs in RT data analysis. It emphasizes the importance of analyzing changes in RT distribution at a finer level to capture more accurate measures of group performance and effectively test cognitive theories.

Results/limitations, if any.

The paper suggests that GLMMs offer a more robust approach to analyzing RT data compared to traditional methods like linear mixed-effect models (LMMs) with transformed data. However, it acknowledges the complexities of addressing skewed dependent variables like RT in LMMs and the potential challenges in adopting GLMMs, such as the need for careful model specification.

Analysing disease incidence data from designed experiments by generalized linear mixed models

What is the goal of the paper?

The paper aims to introduce generalized linear mixed models (GLMMs) as a robust method for analyzing disease incidence data from designed experiments, specifically addressing overdispersion issues common in epidemiological research.

Why is it important?

It highlights the inadequacy of traditional methods like ANOVA for such data, underlining the need for alternative approaches like GLMMs to better capture the complexities of disease clustering and aggregation.

How is it solved? – methods

The study presents GLMMs as a versatile tool, capable of accommodating both fixed and random effects, thus offering a more flexible framework for analyzing disease incidence data. It illustrates the application of GLMMs using real-world data from an experiment on downy mildew incidence in grapevines.

Results/limitations, if any.

The analysis using GLMMs reveals significant treatment effects and provides parameter estimates. However, the approach is not without limitations, including assumptions made in modeling and the potential challenge of interpreting results accurately, particularly in complex experimental designs.

Sara’s Literature Review

Model selection in linear mixed effect models

What is the goal of the paper?

The goal of the paper is to improve variable selection and parameter estimation in linear mixed effect models, which are critical for analyzing longitudinal, panel, and cross-sectional data in various scientific domains.

Why is it important?

Improving variable selection and parameter estimation is crucial because it directly impacts the accuracy and reliability of data analysis across scientific fields. Efficiently identifying relevant variables and accurately estimating their effects are essential for drawing valid conclusions from complex data structures.

How is it solved? – methods

The authors introduce a simple, iterative procedure that employs the smoothly clipped absolute deviation (SCAD) penalty function to estimate and select both fixed and random effects in these models. This approach is highlighted for being a consistent variable selection method with some oracle properties, suggesting it can perform almost as well as if the true underlying model were known.
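For reference, the SCAD penalty itself has a simple piecewise form (Fan and Li, 2001). The sketch below evaluates it for a single coefficient, with the commonly used default a = 3.7. It shows only the penalty function; how the paper's iterative procedure applies it to fixed and random effects is not reproduced here, and the function name is hypothetical.

```python
def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty evaluated at a single coefficient `theta`.

    Linear (lasso-like) near zero, quadratic in the middle, and constant
    for large |theta|, so large coefficients are not over-shrunk.
    """
    if lam < 0 or a <= 2:
        raise ValueError("SCAD requires lam >= 0 and a > 2")
    t = abs(theta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


small = scad_penalty(0.5, lam=1.0)
large = scad_penalty(10.0, lam=1.0)
```

The flat region for large coefficients is what distinguishes SCAD from the lasso penalty and is tied to the oracle-type properties mentioned above.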

Results/limitations, if any.

The approach’s effectiveness and efficiency are validated through simulation studies and real data analysis. Nevertheless, the paper also points out limitations, including the method’s dependence on certain conditions for its asymptotic properties to hold and the potential computational challenges encountered with high-dimensional datasets.

Random Effects Structure for Testing Interactions in Linear Mixed-Effects Models

What is the goal of the paper?

The goal is to provide a more accurate method for testing interactions within linear mixed-effects models, critiquing existing guidelines and proposing new ones that emphasize the inclusion of random slopes for the highest-order combination of within-unit factors in interactions.

Why is it important?

This is important because accurately testing interactions in mixed-effects models is crucial for statistical analyses, especially in avoiding high Type I error rates. The paper aims to refine the approach to these models to ensure more reliable results.

How is it solved? – methods

The author employs Monte Carlo simulations to test the proposed guidelines, demonstrating that neglecting critical random slopes can significantly increase the chance of a false rejection of the null hypothesis. Including appropriate random slopes in the model is shown to ensure better performance.

Results/limitations, if any.

The findings highlight that including the correct random slopes in mixed-effects models greatly improves model performance, particularly in accurately testing interactions between categorical variables. However, the paper’s limitations include its focus on interactions between categorical variables and the specific conditions of the simulations used.

Pymer4: Connecting R and Python for Linear Mixed Modeling

What is the goal of the paper?

The goal is to develop Pymer4, a tool that bridges R and Python for linear mixed modeling, addressing the gap in Python for a package as flexible as R’s lme4 for complex data analysis.

Why is it important?

Pymer4 is significant for providing Python users with an accessible, integrated tool for linear mixed modeling, which was previously lacking, enhancing the analytical capabilities within the Python ecosystem.

How is it solved? – methods

Pymer4 leverages the rpy2 library to connect to R’s lme4 package, offering a Pythonic interface for mixed modeling that integrates well with scientific Python tools and simplifies the analysis process.

Results/limitations, if any.

Pymer4 successfully extends lme4’s functionality to Python users, offering features like significance testing and data visualization integration, enhancing multilevel model analysis. The paper focuses on the tool’s capabilities without detailing specific limitations.

A powerful and flexible linear mixed model framework for the analysis of relative quantification RT-PCR data

What is the goal of the paper?

The paper introduces a novel linear mixed model framework for analyzing relative quantification RT-PCR data, aiming to overcome the limitations of existing statistical methods by providing more accurate and flexible analysis tools.

Why is it important?

This framework is crucial for its potential to enhance the statistical power and flexibility in analyzing RT-PCR data, enabling researchers to conduct more reliable and varied analyses of gene expression across different experimental conditions.

How is it solved? – methods

The method involves a sophisticated statistical approach that incorporates both fixed and random effects in a linear mixed model, allowing for a more nuanced analysis of RT-PCR data that accounts for various sources of variability.

Results/limitations, if any.

The framework has been shown to yield more accurate and statistically powerful results compared to traditional methods, facilitating better decision-making in biological research. The paper thoroughly evaluates the model’s performance and discusses its applicability to a wide range of experimental designs.

Using Generalized Linear Mixed Models to Evaluate Inconsistency within a Network Meta-Analysis

What is the goal of the paper?

The goal of the paper is to demonstrate how generalized linear mixed models (GLMMs) can evaluate inconsistency within network meta-analyses, improving upon traditional models by using an arm-based approach for more accurate results.

Why is it important?

The paper addresses the challenge of inconsistency between direct and indirect evidence in network meta-analysis, which can compromise the validity of conclusions, offering a more reliable framework for analysis.

How is it solved? – methods

The authors propose an arm-based GLMM approach, which allows for flexible modeling of different outcome variables and shows improved accuracy over contrast-based methods, especially when event rates are low.

Results/limitations, if any.

The arm-based model provided more accurate evaluations of design inconsistency and treatment effects compared to traditional contrast-based approaches, highlighting its utility in complex analyses involving many treatments and designs.

Generalized linear mixed-model (GLMM) trees: A flexible decision-tree method for multilevel and longitudinal data

What is the goal of the paper?

The goal of the paper is to introduce GLMM trees, a method combining generalized linear mixed models (GLMMs) with decision trees for analyzing multilevel and longitudinal data, offering a novel approach to clinical prediction problems.

Why is it important?

GLMM trees provide an interpretable and flexible method that can handle complex data structures, improving upon traditional models by simplifying the analysis and enhancing the understanding of data patterns.

How is it solved? – methods

The paper employs GLMM trees to analyze a large dataset from UK mental health services, comparing its performance with traditional GLMMs and random forests to demonstrate its predictive accuracy and efficiency.

Results/limitations, if any.

The method achieves similar predictive accuracy to traditional GLMMs and random forests but with fewer variables, showcasing its potential to streamline clinical decision-making.