Abstract | The main goal of this thesis is to describe the intralexical and interlexical structures of the Croatian language, with an emphasis on nominal suffixation. The description of intralexical structures covers the morphological structure of Croatian nouns, while the description of interlexical structures is based on a morphotactic model that shows the word-formation relations among Croatian lexemes and the constraints that affect the possibility of suffixal derivation. For the purposes of this thesis, the most frequent nouns were collected from the two largest Croatian corpora available online: the Croatian National Corpus and hrWaC. From each of these corpora, the 5,000 most frequent nouns were extracted by a simple wordlist search. After duplicate entries were removed and the list was additionally cleaned by hand, 5,536 of the most frequent Croatian nouns remained for morphological and word-formation analysis. The thesis is divided into three parts. In the first part, the principles of morphological and word-formation analysis of Croatian nouns are established within the framework of basic linguistic theory. Within morphological analysis, morph analysis and morpheme analysis are distinguished. Morph analysis is the segmentation of the surface form of a word, while morpheme analysis links the surface morphs, in the deep layer, to the basic morph that serves to represent the morpheme. Word-formation analysis identifies the base word in derivation and the derivational affixes. Then, on the basis of the results of the morphological and word-formation analysis, data are presented on the most frequent Croatian roots in noun formation, with a comparison to the most frequent roots in verb formation, as well as on the most frequent suffixes and their combinations in the morphological structure of nouns. Such data have not previously been available for Croatian. At the end of the first part, existing computational resources at the word-formation level are described, among them CroDeriv, the first publicly available computational resource dealing with Croatian morphology at the word-formation level. 
The structure of the lexicon entry in CroDeriv is elaborated, and it is shown how the results of this thesis enrich the computational representation of Croatian morphology. In the second part, the polysemous structure of 19 nominal suffixes that occur in suffixal combinations in the morphological structure of Croatian nouns is analysed. A model for describing the polysemous structures of Croatian affixes is developed. It is based on the analysis of a large number of derivatives formed with the same suffixes and follows clearly defined procedures that ensure a uniform analysis; its results are 1) applicable in building a morphotactic model for Croatian and 2) suitable for a computational description of Croatian word formation. This is an approach to describing the semantic structure of Croatian suffixes in which the principles of description are clearly established and whose applicability is then tested by analysing a larger number of suffixes. The analysis established which suffixes can express the same semantic categories, that is, which suffixes are near-synonymous in one of their senses. In addition, it established which of the analysed suffixes can combine with one another, showing that only a very small number of the possible suffixal combinations are realised in Croatian. In the third part, the suffixal combinations identified in the morphological structure of Croatian nouns are analysed. Existing approaches to affix ordering in the languages of the world are described, and it is shown that all of them except the cognitive model of binary suffix combinations are inapplicable to Croatian language data. That model is therefore applied to Croatian and shown to be applicable. However, it is also pointed out that the model is not sufficient for a complete description of suffix order in the morphological structure of Croatian nouns and must be supplemented by individual phonological, morphological, syntactic, semantic and etymological principles. In addition, it is shown that the principles often apply hierarchically and that an existing word can influence the choice of an atypical suffix when a new word is formed. 
This first account of the principles that govern affix order in Croatian confirms the main hypothesis of the thesis: affix order in Croatian is not arbitrary, that is, principles can be established which determine that only certain morpheme combinations are realised. |
Abstract (English) | Although Croatian is a morphologically rich language, comprehensive and detailed descriptions of the morphological properties of the Croatian language are scarce, especially when it comes to morphological and word-formation analysis. The main goal of this thesis is to overcome this shortfall by describing intralexical and interlexical structures of Croatian nouns derived via suffixation. In order to achieve this goal, the thesis is divided into three major parts: 1. Morphological and word-formation analysis of Croatian nouns, 2. Semantic description of Croatian nominal suffixes, and 3. The principles of the morphotactic model of the Croatian nominal suffixation. Although models of affix ordering exist for a wide range of languages (cf. Manova and Aronoff 2010), none of these models has been applied to Croatian language data, mainly because morphosemantically analysed lexemes have not been available. The first two parts of the thesis are thus preparatory steps for the morphotactic model in the third part of the thesis. Our starting hypothesis is that affix ordering in Croatian is not arbitrary and that principles governing the possible morpheme combinations can be established.
1. Morphological and word-formation analysis of Croatian nouns
In the first major part of this thesis, we extracted the 5,000 most frequent nouns from each of the two major Croatian corpora: the Croatian National Corpus (Tadić 2009a) and the Croatian Web Corpus hrWaC (Ljubešić and Erjavec 2011). The nouns were obtained via a simple wordlist search, merged, and manually cleaned. The initial set consisted of 5,536 Croatian nouns, both motivated (derived) and non-motivated (base). Only after the nouns in this initial set had been morphologically analysed and their word-formation patterns established were we able to extract the suffixed nouns, which were the main focus of further analysis. However, in order to perform morphological and word-formation analysis, it was first necessary to establish principles of analysis. 
Our model is formulated within the framework of basic linguistic theory (Haspelmath 2009; Dryer 2006; Dixon 1997), a descriptive and non-restrictive theory which enables the description of a wide range of grammatical phenomena. Our approach is a formal, morpheme-based approach. It considers morphology to be a part of grammar, although it allows that some idiosyncratic combinations are stored directly in the lexicon. Moreover, it includes both phonological (e.g. minimal pairs, complementary distribution) and syntactic (e.g. affix ordering) formalisms. Finally, the model presupposes that meaning, especially word-formation meaning, is usually incremental, i.e. compositional. The principles of morphological and word-formation analysis established in this thesis are the first major outcome of our research. We have differentiated between morphological, morph and morpheme analysis on the one hand and word-formation analysis on the other. Morphological analysis is the cover term and includes both morph and morpheme analysis. Morph analysis is the analysis of the surface form of a word, and morpheme analysis connects surface morphs with their basic morphs in the deep layer. We have also emphasised that morphological analysis has to include both lists of morphemes and the rules which determine their combinations as a precondition for building a morphotactic model. Moreover, we have emphasised that it is necessary to distinguish between morphological and word-formation analysis. Morphological analysis enables us to determine the intralexical structure of the word analysed, while word-formation analysis enables the description of interlexical structures within word-formation families. The established principles were applied to the analysis of the initial set of Croatian nouns. The morphological analysis resulted in a list of Croatian nominal morphemes, both lexical and affixal, and of possible suffixal combinations. 
We have demonstrated that only a small number of the possible suffixal combinations actually occur. Moreover, only ca. 20 suffixal combinations occur in the morphological structure of more than 10 derived words. Thus, we have confirmed the first hypothesis: only some of the possible suffixal combinations occur and some of them are more frequent than others. We have also presented the most frequent lexical morphemes and the most productive nominal suffixes. The data on word-formation families enabled an interlexical description of the Croatian lexicon that had not been possible earlier. The second major outcome of this thesis is the computational representation of the morphological and word-formation analysis of Croatian nouns in CroDeriv, the Croatian derivational lexicon. CroDeriv had consisted only of morphologically analysed verbs, and our analysis enabled its further expansion in two directions: 1) we have included another major part of speech, nouns, in the lexicon, and 2) we have expanded the structure of the entry with the word-formation analysis and the affixal senses. These expansions will make CroDeriv a unique morphological resource which offers a thorough morphological description of one of the world's languages. As a final step of the first part of the research, we have extracted the nominal suffixes that occur in the confirmed suffixal combinations for the semantic analysis in the second part of the thesis.
2. Semantic description of Croatian nominal suffixes
The second major part of this thesis consists of the semantic analysis of Croatian nominal suffixes. First, we have shown that there is no coherent theoretical approach to affixal semantics in the contemporary linguistic literature. The most systematic model is the one presented in Bagasheva (2017). However, the principles which govern the determination of affixal senses are not explicitly stated there. Thus, we have presented our own approach to affixal semantics. 
It is based on explicitly formulated principles, follows the regular polysemy approach (Apresjan 1974), and relies on the analysis of numerous nouns derived via the same suffix. We determined 27 semantic categories which can be realised by Croatian nominal suffixes. This approach was immediately applied to a wide range of Croatian nominal suffixes, and their polysemous structures were determined. The semantic analysis of Croatian suffixes confirmed our second hypothesis: suffixes that can combine with a wide range of other suffixes and bases have more complex polysemous structures. The obtained results were used along with the results of the morphological and word-formation analysis in the first part of the thesis to establish the principles governing Croatian nominal morphotactics in the third part of the thesis.
3. The principles of the morphotactic model of the Croatian nominal suffixation
In the first part of this section, we have described the existing models of affix ordering and the general principles governing affix order in the languages of the world. We have shown that only the cognitive model of binary suffix combinations, presented in Manova (2011a), has not been challenged so far. Moreover, it is the only model that was built on Slavic data. Thus, we used this model as the starting point for the morphotactic model of the Croatian nominal suffixation. This model was complemented with language-specific principles discovered in the Croatian data. We have focussed our analysis on the most problematic cases: the suffixal combinations in which there are two nominal suffixes with the same semantic properties and in which it cannot be stated that one of these suffixes is applied by default. For example, the Croatian suffixes -lo and -lica are both nominal suffixes with instrumental meaning and both can follow the verbal thematic suffix, i.e. they can occur in the same suffixal combinations, e.g. sjed-a-lo ʻseatʼ ~ sjed-a-lica ʻseatʼ. 
These examples show that the governing principles of Manova’s model, although functional in Croatian, are not sufficient to describe Croatian morphotactics in full. To determine additional principles, we have thus analysed the semantic categories which can be expressed by several suffixes: ʻagent/professionʼ, ʻlocationʼ, ʻinstrumentʼ and ʻproperty/characteristicʼ. Moreover, we have extended our analysis to include not only binary suffixal combinations but also base-suffix combinations, to gain deeper insight into the intralexical structures of Croatian nouns. On the basis of the data analysed in this thesis, we have determined phonological, syntactic, morphological, semantic and etymological principles governing affix ordering in Croatian. These principles enable some of the possible combinations and restrict others. We have also emphasised that the principles often apply hierarchically and that existing words can block the application of the default suffix, thus resulting in atypical combinations. The examples of mirror-image combinations have additionally confirmed that both the order and the meaning of all morphemes in the morphological structure contribute to the meaning of the derived word. Finally, the principles of affix ordering have confirmed the main hypothesis of this thesis: affix ordering in Croatian is not arbitrary and principles governing the possible morpheme combinations can be established. |
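
The counting of binary suffix combinations described in the abstract can be illustrated with a small sketch. The Python below is not the thesis's implementation. The input format, the function names and the toy segmentations are assumptions made for illustration; only sjed-a-lo and sjed-a-lica are taken from the abstract. It shows how adjacent suffix pairs could be collected from morph-segmented nouns and how combinations attested in more than a given number of words could be selected (the abstract uses a threshold of 10 words; the toy data uses 1).

```python
from collections import Counter

# Hypothetical input: each noun mapped to (root, ordered list of suffixes).
# Only "sjedalo" and "sjedalica" come from the abstract; the remaining
# segmentations are illustrative assumptions.
SEGMENTED = {
    "sjedalo": ("sjed", ["a", "lo"]),
    "sjedalica": ("sjed", ["a", "lica"]),
    "učitelj": ("uč", ["i", "telj"]),
    "učiteljica": ("uč", ["i", "telj", "ica"]),
    "čitatelj": ("čit", ["a", "telj"]),
}


def binary_combinations(suffixes):
    """Return the adjacent (left, right) suffix pairs, in order."""
    return list(zip(suffixes, suffixes[1:]))


def count_combinations(segmented):
    """Count in how many words each binary suffix combination occurs."""
    counts = Counter()
    for _root, suffixes in segmented.values():
        # set() so that a pair is counted once per word, not once per token.
        counts.update(set(binary_combinations(suffixes)))
    return counts


def attested_in_more_than(counts, threshold):
    """Keep only the combinations found in more than `threshold` words."""
    return {pair: n for pair, n in counts.items() if n > threshold}


counts = count_combinations(SEGMENTED)
frequent = attested_in_more_than(counts, 1)
print(frequent)
```

Counting each pair once per word keeps the figure comparable to a statement such as "occurs in the morphological structure of more than 10 derived words", which is about the number of words and not the number of tokens.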