A computational morphological analysis of Italian verbal system

Pascoli, Matteo

Verbal inflection in Italian, as in other Romance languages, is complex. Its complexity derives not just from the number of forms, each coupled with a distinct set of morphosyntactic properties (mood, tense, person, number), but also, and especially, from the variability of those forms. While structuring the verbal lexicon into classes can account for the variability in the ending of the inflected forms (the desinence), it cannot account for the variability in the stem, because too many classes would be needed to classify these phenomena of allomorphy. The traditional approach requires the speaker to memorize, as exceptions, a list of the forms whose stem is not identical to that of other forms of the same paradigm, or in particular to that of the citation form of the lexeme (the infinitive for Italian verbs). In the last twenty years, there has been much interest in studying the paradigmatic distribution of allomorphy, that is, the way in which the variation (the traditional “irregularity”) between forms of a paradigm (not only of verbs, but also of nouns and adjectives) rests on regular patterns. This interest has at least three motivations. The first is purely technical: the desire to pack morphological information as densely as possible in order to build computationally efficient applications that parse, interpret, analyse, translate or produce texts (or speech) without having to go through enormous amounts of redundant data. The second is cognitive: studies of analogical associations, and of how these associations form patterns and schemes, can contribute to our understanding of how the brain works. The third is didactic: language learning can benefit greatly from knowledge of such patterns of association and of how they operate.
The practical goal of this research is to analyse the paradigmatic structure of inflection, that is, to decompose the paradigm into zones where the forms are built on possibly distinct basic stems, and to examine the formal relations (at the phonological level) between these basic stems, studying the chains of predictability that allow us, the speakers, to handle both regular and irregular lexemes. In this work I have carried out an analysis of the Italian verbal system. Adopting a Word and Paradigm point of view, and following researchers who have studied inflectional morphology with a paradigmatic approach, my goal was to build algorithms and programs that calculate the relations between the word forms making up the whole inflection of a sample of Italian verbs. The set of verbs evaluated covers all conjugation models, including highly irregular verbs. The contribution to inflectional morphology can be summarized in the following points:
– The analysis is carried out on phonetic forms, as opposed to orthographic forms. I have thus developed a database that generates the forms for all paradigm cells in phonetic transcription.
– The analysis is fully automated. I have developed all the necessary algorithms in Java, so that after a change in the database (to add lexemes or to correct mistakes), or even a switch to another data set in order to analyse other languages, the whole computation takes a few minutes to run.
– The analysis does not depend on the assumption that inflection happens at the end of the word, that is, by suffixation: the algorithms developed can handle discontinuous inflection (as found in Semitic languages, or partially in German and Greek, for example) with the same principles.
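The last point, comparing forms without assuming suffixation, can be pictured with a small Java sketch. It is only an illustration under stated assumptions, not the algorithm of the thesis: it uses a generic longest-common-subsequence comparison as a stand-in technique, treats each character as one phonetic segment, and the names FormComparison, sharedMaterial and residue are hypothetical. The Arabic pair kataba / kutiba is a familiar Semitic example chosen for the illustration and is not taken from the thesis data.

```java
// Illustrative sketch only: a generic longest-common-subsequence comparison,
// NOT the method of the thesis. All names here are hypothetical.
final class FormComparison {
    private FormComparison() {}

    /**
     * Returns the longest common subsequence of two forms, treating each
     * char as one segment (an assumption made for this sketch). The shared
     * segments need not be contiguous, so no suffixation is presupposed.
     */
    static String sharedMaterial(String a, String b) {
        int n = a.length(), m = b.length();
        // len[i][j] = LCS length of the suffixes a[i..] and b[j..]
        int[][] len = new int[n + 1][m + 1];
        for (int i = n - 1; i >= 0; i--) {
            for (int j = m - 1; j >= 0; j--) {
                if (a.charAt(i) == b.charAt(j)) {
                    len[i][j] = len[i + 1][j + 1] + 1;
                } else {
                    len[i][j] = Math.max(len[i + 1][j], len[i][j + 1]);
                }
            }
        }
        // Walk the table from the start to recover one optimal subsequence.
        StringBuilder out = new StringBuilder();
        int i = 0, j = 0;
        while (i < n && j < m) {
            if (a.charAt(i) == b.charAt(j)) {
                out.append(a.charAt(i));
                i++;
                j++;
            } else if (len[i + 1][j] >= len[i][j + 1]) {
                i++;
            } else {
                j++;
            }
        }
        return out.toString();
    }

    /**
     * Replaces the segments of form that belong to the shared material
     * (matched left to right) with '_' and keeps the remaining segments.
     */
    static String residue(String form, String shared) {
        StringBuilder out = new StringBuilder();
        int k = 0; // next unmatched position in shared
        for (int i = 0; i < form.length(); i++) {
            char c = form.charAt(i);
            if (k < shared.length() && c == shared.charAt(k)) {
                out.append('_');
                k++;
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String shared = sharedMaterial("kataba", "kutiba");
        System.out.println(shared);                    // ktba
        System.out.println(residue("kataba", shared)); // _a_a__
        System.out.println(residue("kutiba", shared)); // _u_i__
        // A suffixing pair is handled by the same code path.
        System.out.println(sharedMaterial("parlo", "parlate")); // parl
    }
}
```

The same two methods treat the suffixing Italian pair and the discontinuous Arabic pair alike, which is the property the last point describes. A real analysis would operate on phonetic segments rather than raw characters and over whole paradigms rather than single pairs.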

2015


Course of study	Linguistics
Publication date	2015
Language	English
Keywords	morfologia; computazionale; paradigmi; distribuzione; italiano; morphology; computational; paradigm; distribution
Supervisor, advisor or tutor	Cotticelli-Kurras, Paola; Montermini, Fabio
Number of pages	157
Collection	Università degli Studi di Verona

Files in this item:

File	Access	License	Size	Format
phd_thesis_mpascoli.pdf	Open access	All rights reserved	3.39 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/112739

The NBN code of this thesis is URN:NBN:IT:UNIVR-112739