Title Lasso metoda i primjene u visokodimenzionalnoj statistici
Title (english) Lasso method and applications in high-dimensional statistics
Author Bruno Ljubičić
Mentor Hrvoje Planinić (mentor)
Committee member Hrvoje Planinić (committee chair)
Committee member Vanja Wagner (committee member)
Committee member Dijana Ilišević (committee member)
Committee member Miljenko Huzak (committee member)
Granter University of Zagreb Faculty of Science (Department of Mathematics) Zagreb
Defense date and country 2023-09-29, Croatia
Scientific / art field, discipline and subdiscipline NATURAL SCIENCES Mathematics
Abstract In today's data-rich environment, high-dimensional statistics has become a key discipline for understanding complex relationships among variables. As analyses increasingly involve a large number of variables, there is a need for techniques that not only model complex relationships properly but also allow a few of the most important variables to be selected. In this context, the Lasso method (short for "Least Absolute Shrinkage and Selection Operator") stands out as a powerful tool for processing high-dimensional data. The ordinary least squares (OLS) estimator is commonly used to fit a linear model to data. However, if the number of variables is strictly greater than the number of observations, the OLS estimator is not unique and the solutions can differ drastically, which makes the coefficients impossible to interpret. This creates a need for variable selection techniques. The main difference between OLS and the Lasso method is that the Lasso additionally penalizes the absolute values of the coefficients. This is also where most of the difficulty in analyzing the Lasso lies: the absolute value is not differentiable at zero. It turns out, however, that the tools of convex analysis are sufficient for the analysis. The first chapter considers the fundamental properties of the least squares estimator and the generalized inverse of a matrix. The second chapter is devoted to key results in convex analysis and answers the question of existence of lasso solutions. The third chapter introduces the subgradient of a convex function, which is key to finding minima. Finally, the fourth chapter examines in detail the main characteristics of lasso solutions, discusses questions related to the uniqueness of the solution and gives a constructive algorithm for the lasso path.
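The non-uniqueness of OLS when the number of variables exceeds the number of observations can be seen in a few lines. The sketch below assumes NumPy and randomly generated data (not data from the thesis); the helper names `min_norm_least_squares` and `null_space_direction` are hypothetical. It computes the minimum-norm least squares solution X⁺y with the Moore-Penrose inverse and then adds a null-space direction of X, producing a second coefficient vector with the same fitted values.

```python
import numpy as np


def min_norm_least_squares(X, y):
    """Minimum-norm least squares solution X^+ y (Moore-Penrose inverse)."""
    return np.linalg.pinv(X) @ y


def null_space_direction(X, tol=1e-10):
    """Hypothetical helper: unit vector v with X v = 0 (exists when rank(X) < p)."""
    _, s, vt = np.linalg.svd(X)          # full_matrices=True, so vt is p x p
    rank = int(np.sum(s > tol))
    if rank == X.shape[1]:
        raise ValueError("X has full column rank, so the OLS solution is unique")
    # Right singular vectors beyond the rank span the null space of X.
    return vt[rank]


rng = np.random.default_rng(0)
n, p = 5, 8                              # more variables than observations
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)

beta_a = min_norm_least_squares(X, y)
beta_b = beta_a + 3.0 * null_space_direction(X)   # a second OLS solution

print(np.allclose(X @ beta_a, X @ beta_b))         # True: same fitted values
print(np.linalg.norm(beta_a - beta_b))             # but different coefficients
```

Because X⁺y lies in the row space of X, which is orthogonal to the null space, any other solution X⁺y + v with Xv = 0 has squared norm ‖X⁺y‖² + ‖v‖², so X⁺y is the least squares solution of smallest Euclidean norm.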
Abstract (english) This paper focuses on the analysis and construction of the lasso method. The first part of the paper examines the least squares estimator and the issues that arise when the design matrix does not have full rank. It also introduces the Moore-Penrose inverse, which is used throughout the paper. The second part provides an overview of results from convex analysis, along with proofs of the projection theorem and the hyperplane separation theorems. At the end of this section, the existence of lasso solutions is proved using results on recession directions and level sets. The third part concentrates on subgradients as analogues of gradients for nondifferentiable convex functions. Subgradients are a valuable tool for finding the minimum of such functions. At the end of this chapter, the necessary and sufficient conditions that lasso solutions must satisfy are derived. The fourth part explores the properties of lasso solutions. It proves the uniqueness of the fitted values, along with the equality of the ℓ1 norms of all lasso solutions. Key concepts such as the equicorrelation set and the sign vector are introduced, and their uniqueness over all lasso solutions is established. Furthermore, it is shown that, unlike least squares estimators, two lasso solutions cannot have different signs on their common support. It is also established that the lasso solution is unique with probability one if the design matrix is drawn from an absolutely continuous distribution. Moreover, the paper describes the LARS algorithm, which constructs a piecewise linear and continuous lasso path for a fixed design matrix and response vector, and demonstrates useful properties of this solution. An algorithm for determining upper and lower bounds of the coefficients across all lasso solutions is also provided, and the concept of indispensable variables is defined. Finally, the last part of the paper views the lasso solution as a function of the response vector and proves the continuity of the fitted value in this setting. It is also shown that the LARS lasso solution is an affine function in a neighbourhood of almost every response vector. The paper closes with an example of applying the LARS algorithm to the diabetes dataset.
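The subgradient optimality condition and the equicorrelation set mentioned above can be checked numerically. The sketch below is an illustration under stated assumptions, not the thesis's method: it writes the lasso objective as 0.5*||y - X b||^2 + lam*||b||_1 (the 1/2 scaling is an assumption), solves it by cyclic coordinate descent (a different algorithm from the LARS path described in the thesis), verifies that X^T(y - X b) = lam*gamma for a subgradient gamma of ||b||_1, and reads off the equicorrelation set and sign vector. The helpers `lasso_cd`, `kkt_check` and `equicorrelation` and the simulated data are hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Minimiser over x of 0.5*(x - z)**2 + t*|x| (scalar soft-thresholding)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=5000, tol=1e-10):
    """Hypothetical helper: minimise 0.5*||y - X b||^2 + lam*||b||_1 by
    cyclic coordinate descent. Assumes every column of X is nonzero."""
    p = X.shape[1]
    b = np.zeros(p)
    r = y.astype(float).copy()          # current residual y - X b
    col_sq = np.sum(X ** 2, axis=0)     # squared column norms ||X_j||^2
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            # Correlation of X_j with the residual that excludes coordinate j.
            z = X[:, j] @ r + col_sq[j] * old
            b[j] = soft_threshold(z, lam) / col_sq[j]
            if b[j] != old:
                r -= X[:, j] * (b[j] - old)
                max_change = max(max_change, abs(b[j] - old))
        if max_change < tol:
            break
    return b


def kkt_check(X, y, b, lam, tol=1e-6):
    """Check X^T(y - X b) = lam*gamma for some subgradient gamma of ||b||_1."""
    c = X.T @ (y - X @ b)
    active = b != 0
    # On the support the subgradient is sign(b_j); elsewhere it lies in [-1, 1].
    on_support = np.allclose(c[active], lam * np.sign(b[active]), atol=tol)
    off_support = bool(np.all(np.abs(c[~active]) <= lam + tol))
    return on_support and off_support


def equicorrelation(X, y, b, lam, tol=1e-6):
    """Equicorrelation set E = {j : |X_j^T(y - X b)| = lam} and signs on E."""
    c = X.T @ (y - X @ b)
    E = np.flatnonzero(np.abs(np.abs(c) - lam) <= tol)
    return E, np.sign(c[E])


rng = np.random.default_rng(1)
n, p = 30, 10
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.5, 1.0]
y = X @ beta_true + 0.5 * rng.standard_normal(n)
# b = 0 is optimal iff lam >= max_j |X_j^T y|, so this lam gives a nonzero solution.
lam = 0.3 * np.max(np.abs(X.T @ y))

b = lasso_cd(X, y, lam)
E, s = equicorrelation(X, y, b, lam)
print("support:", np.flatnonzero(b))
print("equicorrelation set:", E, "sign vector:", s)
print("KKT satisfied:", kkt_check(X, y, b, lam))
```

For the LARS lasso path itself, scikit-learn provides `sklearn.linear_model.lars_path(X, y, method="lasso")`, and `sklearn.datasets.load_diabetes` loads the classic diabetes data; its penalty scaling may differ from the one assumed here.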
Keywords
Least Absolute Shrinkage and Selection Operator
least squares estimator (OLS)
variable selection
Keywords (english)
Least Absolute Shrinkage and Selection Operator
least squares estimator
variable selection
LARS algorithm
Language Croatian
URN:NBN urn:nbn:hr:217:400047
Study programme Title: Mathematical Statistics Study programme type: university Study level: graduate Academic / professional title: University Master of Mathematics (sveučilišni magistar matematike)
Type of resource Text
File origin Born digital
Access conditions Open access
Terms of use
Created on 2024-02-02 10:59:35