5th Annual International Conference on Industrial Engineering and Operations Management

A massively parallel processing for the Multiple Linear Regression

Moufida Adjout
Publisher: IEOM Society International
Track: Computers and Computing
Abstract

The data generated by traditional business activities has produced data warehouses of up to petabytes in size. The ability to analyze this torrent of data will become the basis of competition and growth for individual firms, through ever-narrower customer segmentation, improved decision-making, and the discovery of valuable insights that would otherwise remain hidden. Processing data at this scale requires high-performance analytical systems running in distributed environments. The sheer size of the data also constrains which algorithms are practical, so standard analytics algorithms need to be adapted to take advantage of cloud computing models, which provide scalability and flexibility.
This work presents an implementation of a parallel version of multiple linear regression. Built on the MapReduce framework, it can extract regression coefficients from large amounts of data.
The parallel processing is based on the QR decomposition and the ordinary least squares method.
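
As a rough illustration of how QR decomposition and ordinary least squares can be combined in a map/reduce style, the sketch below follows the well-known "tall and skinny QR" pattern; it is an assumption about the general approach, not the paper's actual implementation. Each map step factors one row block X_i = Q_i R_i and emits the small pair (R_i, Q_i^T y_i). Each reduce step stacks two such pairs and factors them again, which preserves the least-squares solution. The final coefficients come from back-substitution on R beta = z. All function names (thin_qr, compress, map_block, reduce_pair, fit) are hypothetical. Python's built-in map and functools.reduce stand in for a real MapReduce framework, and every block is assumed to have at least as many rows as columns and full column rank.

```python
from functools import reduce
from math import sqrt


def thin_qr(a):
    """Thin QR of a row-major m x n matrix (m >= n, full column rank),
    computed by modified Gram-Schmidt. Returns (q, r)."""
    m, n = len(a), len(a[0])
    q = [row[:] for row in a]              # work on a copy of the rows
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        r[j][j] = sqrt(sum(q[i][j] ** 2 for i in range(m)))
        for i in range(m):
            q[i][j] /= r[j][j]             # normalize column j
        for k in range(j + 1, n):
            r[j][k] = sum(q[i][j] * q[i][k] for i in range(m))
            for i in range(m):
                q[i][k] -= r[j][k] * q[i][j]   # remove component along column j
    return q, r


def compress(x, y):
    """Replace (x, y) by (R, Q^T y), which has the same least-squares solution."""
    q, r = thin_qr(x)
    z = [sum(q[i][j] * y[i] for i in range(len(y))) for j in range(len(r))]
    return r, z


def map_block(block):
    """Map step: summarize one row block of the data set."""
    x, y = block
    return compress(x, y)


def reduce_pair(left, right):
    """Reduce step: stack two summaries row-wise and factor again."""
    (r1, z1), (r2, z2) = left, right
    return compress(r1 + r2, z1 + z2)


def back_substitute(r, z):
    """Solve the upper-triangular system r * beta = z."""
    n = len(z)
    beta = [0.0] * n
    for i in reversed(range(n)):
        s = sum(r[i][k] * beta[k] for k in range(i + 1, n))
        beta[i] = (z[i] - s) / r[i][i]
    return beta


def fit(blocks):
    """Estimate OLS coefficients from an iterable of (X_block, y_block) pairs."""
    r, z = reduce(reduce_pair, map(map_block, blocks))
    return back_substitute(r, z)


# Toy data generated from y = 1 + 2*x1 + 3*x2, split into two row blocks.
# The first column of each X block is the intercept term.
blocks = [
    ([[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1]], [1, 3, 4, 6]),
    ([[1, 2, 1], [1, 1, 2], [1, 3, 5], [1, 4, 2]], [8, 9, 22, 15]),
]
print(fit(blocks))  # should be close to [1, 2, 3], up to rounding error
```

The appeal of this pattern is that each mapper only ever emits an n x n triangular factor and a length-n vector, so the data moved between stages depends on the number of regressors rather than the number of rows. A production version would more likely use Householder-based QR from a numerical library instead of the Gram-Schmidt routine shown here.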

Published in: 5th Annual International Conference on Industrial Engineering and Operations Management, Dubai, United Arab Emirates

Date of Conference: March 3-5, 2015

ISBN: 978-0-9855497-2-5
ISSN/E-ISSN: 2169-8767