Statistical and computational approaches to first language acquisition. Mining a set of French longitudinal corpora (CoLaJE)

Briglia, Andrea

This thesis is based on a French dataset composed of seven longitudinal corpora of child spoken language. Each monthly transcript can be turned into a machine-readable spreadsheet, which is the basis of all the computations performed and of the related graphical visualisations. Hypotheses about phoneme acquisition, phonological acquisition and grammar acquisition have been tested using tools and concepts from descriptive and inferential statistics, regression (chi-squared) and clustering. A complete part-of-speech tagging of around 15,000 sentences is proposed to study the emergence of syntax (from one-word to multi-word utterances). A convolutional neural network trained on the same dataset is proposed, and the accuracy of its predictions is discussed. Finally, the importance of modelling phonetic variation at the syllable level is discussed, since the main limitation of the thesis is that it set aside the coarticulatory differences a given phoneme can show depending on its position in the syllable structure (onset, nucleus, coda).


2021



Faculty/Department: Dipartimento di Scienze cognitive, psicologiche, pedagogiche e degli studi culturali
Programme: Cognitive Sciences
Publication date: 9 March 2021
Language: English
Keywords: first language acquisition; French language; data mining; NLP; phonetics and phonology
Supervisor, Advisor or Tutor: Jérémi Sauvage (joint supervision, cotutelle, with Université Paul-Valéry Montpellier 3, France); MUCCIARDI, Massimo
Co-supervisor, Co-tutor or Coordinator: FALZONE, Alessandra
Collection: Università degli Studi di Messina

Files in this record:

File	Size	Format
Thesis_final_Andrea_Briglia.pdf (open access; licence: all rights reserved)	6.9 MB	Adobe PDF

Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14242/126172

The NBN code of this thesis is URN:NBN:IT:UNIME-126172