Title Numeričke metode za tekstualnu analizu
Title (english) Numerical methods for text analysis
Author Lucijan Matanović
Mentor Ivana Šain Glibić (mentor)
Committee member Ivana Šain Glibić (committee chair)
Committee member Marko Radulović (committee member)
Committee member Marko Vrdoljak (committee member)
Committee member Eduard Marušić-Paloka (committee member)
Granter University of Zagreb Faculty of Science (Department of Mathematics) Zagreb
Defense date and country 2025-02-26, Croatia
Scientific / art field, discipline and subdiscipline NATURAL SCIENCES Mathematics
Abstract U današnje doba, dostupne su nam velike količine podataka tako da možemo reći da su podaci svuda oko nas. Podaci su postali jedan od glavnih orijentira u mnogim područjima, od znanstvenih istraživanja do svakodnevnog poslovanja. Postoje različite vrste podataka, no svi podaci na kraju imaju zajedničko to da ih je najlakše analizirati ako ih imamo u brojčanom obliku. U ovom radu bavit ćemo se tekstualnim oblikom podataka koji je često nestrukturiran što može otežati izdvajanje korisnih informacija. Naglasak u radu je na razvoju i primjeni numeričkih metoda u kojima tražimo prikladne matrične faktorizacije. Početni korak je pripremiti podatke, odnosno napraviti pretvorbu teksta u matričnu formu jer će nam takav pristup omogućiti lakšu obradu i analizu. U našem slučaju imat ćemo dostupnu kolekciju dokumenata gdje stupcem želimo reprezentirati dokument, a retkom korištene riječi. Rad istražuje primjenu različitih metoda matričnih faktorizacija poput Latent Semantic Indexing (LSI), klasteriranja, nenegativne matrične faktorizacije i LGK bidijagonalizacije. Jedan od ciljeva rada je razviti efikasne tehnike za ekstrakciju i klasifikaciju informacija što može značajno unaprijediti pretraživanje informacija. Rad je podijeljen u tri ključna dijela: najprije obrađujemo osnovne rezultate vezane za rad s matricama i vektorskim prostorima, zatim obrađujemo numeričke metode, te na kraju vršimo usporedbu tih metoda kako bi se procijenila njihova učinkovitost. Sadržaj rada oslanja se na djelo Fundamentals of Algorithms: Matrix Methods in Data Mining and Pattern Recognition, 2019.
Abstract (english) In today's era, we have access to vast amounts of data, so we can say that data is all around us. Data has become one of the main reference points in many fields, from scientific research to everyday business operations. There are different types of data, but ultimately, all data shares the common characteristic that it is easiest to analyze when in numerical form. In this paper, we will focus on the textual form of data, which is often unstructured, making it more difficult to extract useful information. The emphasis of this work is on the development and application of numerical methods in which we seek suitable matrix factorizations. The initial step is to prepare the data, i.e., to transform text into a matrix format, as this approach will enable easier processing and analysis. In our case, we will have a collection of documents where we aim to represent each document as a column and each used word as a row. The paper explores the application of various matrix factorization methods such as Latent Semantic Indexing (LSI), clustering, Non-Negative Matrix Factorization (NMF), and LGK bidiagonalization. One of the objectives of this study is to develop efficient techniques for information extraction and classification, which can significantly enhance information retrieval. The work is divided into three key sections: first, we cover fundamental results related to working with matrices and vector spaces; next, we discuss numerical methods; and finally, we compare these methods to assess their effectiveness. The content of this paper is based on the book Fundamentals of Algorithms: Matrix Methods in Data Mining and Pattern Recognition, 2019.
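The abstract's data-preparation step (documents as columns, words as rows) can be illustrated with a minimal sketch. The toy corpus, the whitespace tokenization, the raw-count weighting and the helper name `cosine_scores` are all assumptions made for illustration; they are not taken from the thesis, which may use different preprocessing and weighting. The sketch only shows the plain vector model, not LSI, NMF or LGK bidiagonalization.

```python
import math

# Hypothetical toy corpus; the thesis's own document collection is not reproduced here.
docs = [
    "matrix factorization for text analysis",
    "singular value decomposition of a matrix",
    "clustering documents by topic",
    "text documents and topic clustering",
]

# Vocabulary: one row per distinct word, in sorted order (assumed tokenization: split on spaces).
vocab = sorted({w for d in docs for w in d.split()})
row = {w: i for i, w in enumerate(vocab)}

# Term-document matrix A as a list of rows: A[i][j] = count of word i in document j.
A = [[0] * len(docs) for _ in vocab]
for j, d in enumerate(docs):
    for w in d.split():
        A[row[w]][j] += 1


def cosine_scores(query):
    """Cosine similarity between a query vector and every document column of A."""
    q = [0] * len(vocab)
    for w in query.split():
        if w in row:  # words outside the vocabulary are ignored
            q[row[w]] += 1
    q_norm = math.sqrt(sum(x * x for x in q))
    scores = []
    for j in range(len(docs)):
        col = [A[i][j] for i in range(len(vocab))]
        dot = sum(a * b for a, b in zip(q, col))
        norm = q_norm * math.sqrt(sum(a * a for a in col))
        scores.append(dot / norm if norm else 0.0)
    return scores


scores = cosine_scores("topic clustering")
best = scores.index(max(scores))  # index of the best-matching document
```

Factorization-based methods such as LSI would replace `A` by a low-rank approximation before the same kind of comparison; that step is left out here to keep the sketch dependency-free.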
Keywords
Tekstualna analiza
Vektorski model
Singularna dekompozicija matrice (SVD)
Klasteriranje
LGK bidijagonalizacija
Nenegativna matrična faktorizacija
Matrične faktorizacije
QR dekompozicija
Keywords (english)
Text Analysis
Vector Model
Singular Value Decomposition (SVD)
Clustering
LGK Bidiagonalization
Non-Negative Matrix Factorization (NMF)
Matrix Factorizations
QR Decomposition
Language Croatian
URN:NBN urn:nbn:hr:217:575251
Study programme Title: Financial and Business Mathematics Study programme type: university Study level: graduate Academic / professional title: University Master of Mathematics (sveučilišni magistar matematike)
Type of resource Text
File origin Born digital
Access conditions Open access
Created on 2025-02-10 16:44:22