Title

A Review of 'Big Data' Variable Selection Procedures For Use in Predictive Modeling

Defense Date

5-10-2017

Availability

Immediate Access

Submission Type

thesis

Degree Name

MS

Department

Computational Mathematics

School

McAnulty College and Graduate School of Liberal Arts

Committee Chair

Frank D'Amico

Committee Member

John Kern

Committee Member

Sean Tierney

Keywords

Dimension Reduction; Partial Least Squares; Penalized Regression; Predictive Modeling; Regression; Selective Inference

Abstract

Several problems arise when attempting to use traditional predictive modeling techniques on ‘big data.’ For instance, multiple linear regression models cannot be used on datasets with hundreds of variables. However, several techniques are becoming common tools for selective inference as the need for analyzing big data increases. Forward selection and penalized regression models (such as LASSO, Ridge Regression, and Elastic Net) are simple modifications of multiple linear regression that can provide some guidance on simplifying a model through variable selection. Dimension-reducing techniques, such as Partial Least Squares and Principal Components Analysis, are more complex than regression but can handle highly correlated independent variables. Each of the aforementioned techniques is valuable in predictive modeling if used properly. This paper provides a mathematical introduction to these developments in selective inference. A sample dataset is used to demonstrate modeling and interpretation. Further, the applications to big data, as well as the advantages and disadvantages of each procedure, are discussed.

Format

PDF

Language

English

This document is currently not available here.
