Jezična i terminološka analiza korpusa računalnih pričaonica

Karabatić, Tereza

doctoral thesis

Split: University of Split, Faculty of Humanities and Social Sciences, 2024. urn:nbn:hr:172:339249

Institutional repository: Repository of Faculty of humanities and social sciences

Cite this document

APA 6th Edition

Karabatić, T. (2024). Jezična i terminološka analiza korpusa računalnih pričaonica (Doctoral thesis). Split: University of Split, Faculty of Humanities and Social Sciences. Retrieved from https://urn.nsk.hr/urn:nbn:hr:172:339249

MLA 8th Edition

Karabatić, Tereza. "Jezična i terminološka analiza korpusa računalnih pričaonica." Doctoral thesis, University of Split, Faculty of Humanities and Social Sciences, 2024. https://urn.nsk.hr/urn:nbn:hr:172:339249

Harvard

Karabatić, T. (2024). 'Jezična i terminološka analiza korpusa računalnih pričaonica', Doctoral thesis, University of Split, Faculty of Humanities and Social Sciences, accessed 24 December 2024, https://urn.nsk.hr/urn:nbn:hr:172:339249

Vancouver

Karabatić T. Jezična i terminološka analiza korpusa računalnih pričaonica [Doctoral thesis]. Split: University of Split, Faculty of Humanities and Social Sciences; 2024 [cited 2024 December 24] Available at: https://urn.nsk.hr/urn:nbn:hr:172:339249

IEEE

T. Karabatić, "Jezična i terminološka analiza korpusa računalnih pričaonica", Doctoral thesis, University of Split, Faculty of Humanities and Social Sciences, Split, 2024. Available at: https://urn.nsk.hr/urn:nbn:hr:172:339249

Cite this item: https://urn.nsk.hr/urn:nbn:hr:172:339249

Metadata

Title	Jezična i terminološka analiza korpusa računalnih pričaonica
Title (english)	Linguistic and terminological analysis of computer forum corpora
Author	Tereza Karabatić
Mentor	Milica Mihaljević (mentor)
Committee member	Danica Škara (committee chair)
Committee member	Tanja Brešan Ančić (committee member) MBZ: 65656506524
Committee member	Dalibor Vrgoč (committee member) MBZ: 51091877526
Granter	University of Split Faculty of Humanities and Social Sciences Split
Defense date and country	2024, Croatia
Scientific / art field, discipline and subdiscipline	HUMANISTIC SCIENCES Interdisciplinary Humanistic Studies
Universal decimal classification (UDC)	81 - Linguistics and languages
Abstract	Disertacija predstavlja postupak sastavljanja i analize korpusa temeljenoga na korisničkim objavama iz mrežnih pričaonica hrvatskoga časopisa Bug. Od 19 kategorija Bugovih pričaonica za istraživanje je odabrano sljedećih devet: Internet i mreže, Hardver, Softver, ICT Pro & Biznis, Razvoj, Igre, Mobiteli, Digitalije i Samogradnja. Korisničke su objave iz odabranih pričaonica dohvaćene uporabom autoričina vlastoručno sastavljenoga programa. Prikupljeno je otprilike 3,9 milijuna korisničkih poruka iz nešto više od 140 000 pričaoničkih tema. Najranije su poruke objavljene početkom 2008., a najnovije krajem 2022. godine. Učitavanjem filtrirane i pripremljene građe u korpusni alat Sketch Engine nastao je dovršeni korpus s otprilike 3,7 milijuna poruka. Istraživanje se sastoji od jezične i terminološke analize. Jezična analiza razmatra promjene na pravopisnoj, morfološkoj, tvorbenoj, sintaktičkoj i leksičkoj razini, dok je terminološka analiza posvećena postanku računalnih naziva preuzimanjem engleskih naziva, prihvaćanju internacionalizama, terminologizaciji i reterminologizaciji. Terminološka analiza također razmatra višečlane nazive, semantičke odnose u korpusu, odnos žargonskih i standardnojezičnih računalnih naziva te uključuje dijakronijsku analizu korpusa. Istraživanje pokazuje da je u posljednjih desetak godina došlo do velike promjene u odnosu na prethodna istraživanja hrvatskih računalnih naziva. Žargonizmi su u korpusu mnogo češći od standardnojezičnih naziva, a gotovo trećina žargonizama pripada području računalnih igara. Istraživanje također ukazuje na to da standardnojezično nazivlje češće ima šire značenje, dok je značenje žargonizama češće uže. U korpusu je pronađen veći broj primjera engleskih riječi nepotpuno prilagođenih hrvatskomu jeziku nego prilagođenica. U usporedbi s prethodnim istraživanjima, mnogi standardnojezični nazivi koje navodi Mihaljević (1990.) nisu potvrđeni u korpusu Bugovih pričaonica, dok je većina žargonizama koje navodi Halonja (2006.) u korpusu potvrđena u značenju koje navodi autor. Razlozi su za to razgovorna priroda korpusa i vrijeme proteklo od prethodne analize. Dijakronijska analiza parova ustaljenih računalnih žargonizama i standardnojezičnih naziva pokazuje da većini u drugoj polovici promatranoga razdoblja opada broj pojavnica u korpusu. Ta pojava odgovara općenitoj težnji u korpusu, a smanjenje broja korisničkih objava u drugoj polovici promatranoga razdoblja onemogućuje provođenje prave dijakronijske analize. Jedan je mogući smjer nastavka istraživanja dopunjavanje postojećega korpusa novom građom i ponovna provedba analize čestoće i dijakronijske analize. Također bi bilo korisno ispitati stavove govornika hrvatskoga jezika o prilagodbi novijih engleskih žargonizama. Još je jedna mogućnost razvoj jezičnoga modela koji bi osim hrvatskih riječi i struktura prepoznavao i engleske, što bi olakšalo pretraživanje hrvatskih riječi u korpusu i pružilo uvid u neologizme u nastanku.
Abstract (english)	The thesis outlines the procedure of assembling and analyzing a corpus based on user forum posts on the website of the Croatian computer magazine Bug. Out of the 19 Bug forum categories available, the following nine were selected for the research: Internet i mreže (Internet and networks), Hardver (Hardware), Softver (Software), ICT Pro & Biznis (ICT Pro & Business), Razvoj (Development), Igre (Games), Mobiteli (Mobile phones), Digitalije (Digital devices), and Samogradnja (DIY). User posts were fetched from the selected fora using a computer program written by the author herself. Approximately 3.9 million user messages were gathered from a little over 140 thousand forum topics. The earliest posts were published in early 2008, and the most recent at the end of 2022. Uploading the filtered and prepared content to the Sketch Engine corpus tool resulted in a finished corpus of approximately 3.7 million posts. The research consists of a linguistic analysis and a term analysis. The linguistic analysis evaluates changes at the levels of spelling, morphology, word formation, syntax, and lexis, whereas the term analysis is dedicated to the creation of computer terms by assimilating English terms, accepting international terms, terminologization, and reterminologization. The term analysis also evaluates multi-word terms, semantic relations within the corpus, and the relationship between jargon and standard computer terms, and it includes a diachronic study of the corpus. The research shows that major changes have occurred over roughly the past decade in comparison to earlier research on Croatian computer terms. Jargon words appear much more frequently in the corpus than standard language terms, and almost a third of jargon terms belong to computer game jargon. The research also indicates that standard terms more often have a broader meaning, while jargon words more often have a narrower meaning. The corpus contains more examples of English words that are only partially adapted to Croatian than of English words that have undergone full adaptation. Compared to previous research, many standard language terms mentioned by Mihaljević (1990) have not been found in the Bug forum corpus, while most jargon terms mentioned by Halonja (2006) are present in the corpus in the same meaning stated by the author. The reasons for that are the informal nature of the corpus and the time elapsed since the publication of the authors' works. The diachronic analysis of pairs of common computer jargon and standard language terms shows that, for most of the terms, the number of corpus tokens decreases in the second half of the observed period. The phenomenon is in accordance with a general trend in the corpus, and the reduction of the number of user posts in the second half of the time period in the analysis precludes a true diachronic analysis. One possible research direction includes extending the existing corpus with new material and conducting the frequency and diachronic analyses again. Surveying the attitudes of Croatian speakers to adapting newer English jargon terms would also be useful. Another possible research direction is developing a language model that would recognize both Croatian and English words and language structures, which would facilitate searching for Croatian words in the corpus and provide insight into emergent neologisms.
Keywords
Keywords (english)
Language	Croatian
URN:NBN	urn:nbn:hr:172:339249
Promotion	2024-05-14
Study programme	Title: Postgraduate doctoral studies in humanities Study programme type: university Study level: postgraduate Academic / professional title: doktor/doktorica znanosti, područje humanističkih znanosti, polje interdisciplinarne humanističke znanosti (Doctor of Science in the area of humanities, field of interdisciplinary humanities)
Type of resource	Text
File origin	Born digital
Access conditions	Open access
Terms of use
Created on	2024-06-27 09:05:28