Retweet Prediction Based on Heterogeneous Data Sources: The Combination of Text and Multilayer Network Features

Meštrović, Ana; Petrović, Milan; Beliga, Slobodan

doi:10.3390/app122111216


Scientific paper - Original scientific paper


Applied sciences (Basel), 12 (2022), 21; 11216-11237. https://doi.org/10.3390/app122111216


Institutional repository: Repository of the Faculty of Informatics and Digital Technologies, University of Rijeka

Cite this paper

APA 6th Edition

Meštrović, A., Petrović, M., & Beliga, S. (2022). Retweet Prediction Based on Heterogeneous Data Sources: The Combination of Text and Multilayer Network Features. Applied sciences (Basel), 12(21), 11216-11237. doi:10.3390/app122111216

MLA 8th Edition

Meštrović, Ana, et al. "Retweet Prediction Based on Heterogeneous Data Sources: The Combination of Text and Multilayer Network Features." Applied sciences (Basel), vol. 12, no. 21, 2022, pp. 11216-11237. https://doi.org/10.3390/app122111216

Chicago 17th Edition

Meštrović, Ana, Milan Petrović, and Slobodan Beliga. "Retweet Prediction Based on Heterogeneous Data Sources: The Combination of Text and Multilayer Network Features." Applied sciences (Basel) 12, no. 21 (2022): 11216-11237. https://doi.org/10.3390/app122111216

Harvard

Meštrović, A., Petrović, M. and Beliga, S. (2022) 'Retweet Prediction Based on Heterogeneous Data Sources: The Combination of Text and Multilayer Network Features', Applied sciences (Basel), 12(21), pp. 11216-11237. doi: 10.3390/app122111216

Vancouver

Meštrović A, Petrović M, Beliga S. Retweet Prediction Based on Heterogeneous Data Sources: The Combination of Text and Multilayer Network Features. Applied sciences (Basel) [Internet]. 2022 [cited 2024 Nov 7];12(21):11216-11237. doi: 10.3390/app122111216

IEEE

A. Meštrović, M. Petrović, and S. Beliga, "Retweet Prediction Based on Heterogeneous Data Sources: The Combination of Text and Multilayer Network Features," Applied sciences (Basel), vol. 12, no. 21, pp. 11216-11237, 2022. [Online]. Available: https://urn.nsk.hr/urn:nbn:hr:195:441549. [Accessed: Nov. 7, 2024].

To cite this paper, use this web address: https://urn.nsk.hr/urn:nbn:hr:195:441549

Paper details

Title (English)	Retweet Prediction Based on Heterogeneous Data Sources: The Combination of Text and Multilayer Network Features
Author	Ana Meštrović
Author	Milan Petrović
Author	Slobodan Beliga
Author's institution	University of Rijeka (Faculty of Informatics and Digital Technologies)
Scientific / artistic area, field and branch	SOCIAL SCIENCES; Information and Communication Sciences; Information Systems and Informatology
Abstract (English)	Retweet prediction is an important task in the context of various problems, such as information spreading analysis, automatic fake news detection, and social media monitoring. In this study, we explore retweet prediction based on heterogeneous data sources. To classify a tweet according to its number of retweets, we combine features extracted from a multilayer network and from text. More specifically, we introduce a multilayer framework for representing Twitter as a multilayer network. This formalism captures different user actions and complex relationships, as well as other key properties of communication on Twitter. Next, we select a set of local network measures from each layer and construct a set of multilayer network features. We also adopt a BERT-based language model, namely Cro-CoV-cseBERT, to capture the high-level semantics and structure of tweets as a set of text features. We then train six machine learning (ML) algorithms for the retweet-prediction task: random forest, multilayer perceptron, light gradient boosting machine, category-embedding model, neural oblivious decision ensembles, and an attentive interpretable tabular learning model. We compare the performance of all six algorithms in three different setups: with text features only, with multilayer network features only, and with both feature sets, and we evaluate all setups in terms of standard evaluation measures. For this task, we first prepared an empirical dataset of 199,431 tweets in Croatian posted between 1 January 2020 and 31 May 2021. Our results indicate that the prediction model performs better when multilayer network features are integrated with text features than when only one set of features is used.
Keywords (English)
Language	English
Publication type	Scientific paper - Original scientific paper
Publication status	Published
Peer review type	Peer-reviewed - international peer review
Publication version	Published version of the paper (publisher's PDF)
Journal title	Applied sciences (Basel)
Volume, issue, pages	vol. 12, no. 21, pp. 11216-11237
p-ISSN	2076-3417
DOI	https://doi.org/10.3390/app122111216
URN:NBN	urn:nbn:hr:195:441549
Publication date	2022
Project	Code: IP-CORONA-2020-04-2061; Title (translated from Croatian): Multilayer Framework for Characterizing Information Spreading via Social Media during the COVID-19 Crisis; Acronym: InfoCoV; Principal investigator: Ana Meštrović; Jurisdiction: Croatia; Funder: HRZZ (Croatian Science Foundation); Funding line: IP-CORONA
Document URL	http://bib.irb.hr/1247233
Resource type	Text
Access rights	Open access
Terms of use
Deposit date and time	2023-01-25 13:48:43
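The abstract above describes combining per-layer local network measures with a text representation and comparing three feature setups (text only, network only, both). The following is a minimal Python sketch of that feature-assembly step under stated assumptions, not the authors' implementation: the layer names (`retweet`, `mention`, `reply`), the edge direction (acting user to the user acted upon), and the use of in/out edge counts as the local measures are all hypothetical stand-ins, and the text vector is a placeholder for a Cro-CoV-cseBERT tweet embedding. The helper names `build_multilayer`, `network_features`, and `feature_vector` are likewise hypothetical.

```python
from collections import defaultdict

# Hypothetical layer names: one layer per user action. The paper's own
# layer definitions are not reproduced here.
LAYERS = ("retweet", "mention", "reply")


def build_multilayer(edges):
    """Build a directed multilayer network from (layer, source, target) edges.

    Assumption: an edge points from the acting user to the user acted upon
    (e.g. retweeter -> original author). Returns per-layer in/out edge counts.
    """
    net = {layer: {"in": defaultdict(int), "out": defaultdict(int)} for layer in LAYERS}
    for layer, src, dst in edges:
        net[layer]["out"][src] += 1
        net[layer]["in"][dst] += 1
    return net


def network_features(net, user):
    """Local measures of a tweet's author: one (in, out) edge-count pair per layer.

    Edge counts are an illustrative choice of local measure; a user absent
    from a layer gets 0.0 for that layer (dict.get does not insert keys).
    """
    feats = []
    for layer in LAYERS:
        feats.append(float(net[layer]["in"].get(user, 0)))
        feats.append(float(net[layer]["out"].get(user, 0)))
    return feats


def feature_vector(setup, text_vec, net_vec):
    """Assemble the classifier input for one of the three setups."""
    if setup == "text":
        return list(text_vec)
    if setup == "network":
        return list(net_vec)
    if setup == "both":
        return list(text_vec) + list(net_vec)
    raise ValueError(f"unknown setup: {setup!r}")


# Toy usage with made-up users and a made-up 3-dimensional text vector
# (a real BERT embedding would typically have hundreds of dimensions).
edges = [
    ("retweet", "bob", "alice"),
    ("retweet", "carol", "alice"),
    ("mention", "alice", "bob"),
    ("reply", "carol", "alice"),
]
net = build_multilayer(edges)
net_vec = network_features(net, "alice")   # [2.0, 0.0, 0.0, 1.0, 1.0, 0.0]
text_vec = [0.12, -0.40, 0.33]             # placeholder for a tweet embedding
for setup in ("text", "network", "both"):
    print(setup, len(feature_vector(setup, text_vec, net_vec)))  # 3, 6, 9
```

In a comparison like the one the abstract describes, the vectors for each setup would be passed to the same set of classifiers so the setups are compared on equal terms; training the classifiers themselves is outside this sketch.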
