Abstract | U ovome radu fokus je bio na metodama kombiniranja klasifikatora. Osnovni algoritmi nisu detaljno opisani već su korišteni kao temelj uz pretpostavku o znanju u području strojnog učenja. Prvo se izvodi nekoliko kombinatora, tj. algoritama koji stvaraju sintezu među izlazima više klasifikatora. Zatim su predstavljeni najčešće korišteni algoritmi u ansamblima, vođeni težnjom za stvaranjem optimalnog ansambla, kroz koncept raznolikosti opisani su ključni elementi uspješnog ansambla. Nakon toga su testirane neke od prikazanih metoda na problemu prepoznavanja govora koristeći poznati benchmark skup podataka Librispeech. Problematika je riješena korištenjem naprednih Transformerskih neuronskih mreža. Demonstrirana je visoka preciznost modela u prepoznavanju govora. Provedena je opsežna analiza pri čemu je korištena većina prezentiranih algoritama, uz primjenu 5 vizualnih reprezentacija audio signala. Rezultati su prikazani grafički i objašnjeni korištenjem mjera raznolikosti. Na kraju rada potvrđena je hipoteza da ansambli kada se pravilno koriste nadilaze standardne modele, bilo da se radi o Linearnoj regresiji, KNN-u ili Transformerskim neuronskim mrežama |
Abstract (english) | In this work, the focus was on methods of combining classifiers. Base algorithms were not described in detail but used as a foundation with the assumption of knowledge in the field of machine learning. Initially, several combiners, i.e., algorithms that create a synthesis among the outputs of multiple estimators, were explored. Then, the most commonly used algorithms in ensembles were presented and driven by the desire to create an optimal ensemble, key elements of a successful ensemble were described through the concept of diversity. Subsequently, some of the presented methods were tested on the problem of speech recognition using the well-known benchmark dataset Librispeech. The problem is addressed using advanced Transformer Neural Networks, demonstrating high precision of the model in speech recognition. An extensive analysis was conducted, employing most of the presented algorithms along with five visual representations of audio signal. Results were graphically displayed and explained using measures of diversity. Ultimately, this work confirms the hypothesis that ensembles, when used correctly, surpasses standard models, whether it be Linear Regression, KNN, or Transformer Neural Network. |