Stability and approximation properties of neural ordinary differential equations
DE MARINIS, ARTURO
2025
Abstract
Neural ordinary differential equations (neural ODEs) model the forward pass of a neural network as the evolution of a feature vector \( u(t) \) through a continuous-time dynamical system of the form
\begin{equation}\label{eq:neuralODE}
\dot{u}(t) := \dv{}{t}u(t) = f(u(t), t, \theta), \quad t\in[0,T],
\end{equation}
where \( u(t) \in \mathbb{R}^d \) represents the state and \( f \) is a neural network parameterised by \( \theta \). This formulation generalises residual and recurrent neural networks and has been extensively explored in modern machine learning, achieving notable success across diverse applications with various choices of \( f \), including second-order damped oscillators, state-space models, diffusion-based generative models, and graph neural networks.

A neural ODE naturally leads to discrete architectures through numerical integration. For instance, applying the Euler method to \eqref{eq:neuralODE} over a partition \( 0=t_0< t_1 < \dots < t_N =T \) of \( [0,T] \) with step size \( h = T/N \) yields a residual neural network
\[ u_{k+1} = u_k + h f(u_k, t_k, \theta), \quad k=0,1,\dots,N-1, \]
where \( u_k \approx u(t_k) \) and the number of discretisation points \( N \) determines the depth of the network.

A key advantage of formulating neural networks via ODEs is that their theoretical properties can be studied through dynamical systems analysis, particularly in terms of stability, contractivity and conservation laws. These properties are crucial for understanding the robustness of deep learning models against perturbations of the input data. In this thesis, we focus on designing deep neural networks, obtained as discretisations of neural ODEs, that are inherently robust against data perturbations, including adversarial attacks, that is, carefully crafted input perturbations designed to mislead the network into making incorrect predictions.

To analyse robustness, we consider the sensitivity of the solution of \eqref{eq:neuralODE} to variations in the initial condition. Under suitable Lipschitz assumptions on the vector field \( f \), a neural ODE satisfies the bound
\begin{equation*}
\|u_1(t)-u_2(t)\| \leq C \|u_1(0)-u_2(0)\|, \quad t\in[0,T],
\end{equation*}
for some constant \( C > 0 \), a norm \( \|\cdot\| \) on \( \mathbb{R}^d \), and any two solutions \( u_1(t) \) and \( u_2(t) \) with initial conditions \( u_1(0) \neq u_2(0) \). The constant \( C \) governs the stability behaviour of the neural ODE:
\begin{itemize}
\item if \( C \gg 1 \), small input perturbations may be amplified significantly, leading to instability;
\item if \( C \) is moderate, the neural ODE is stable;
\item if \( C < 1 \), the neural ODE is contractive, meaning that perturbations decay over time.
\end{itemize}

From a numerical ODE point of view, a key challenge is ensuring that numerical discretisations of \eqref{eq:neuralODE} preserve these stability properties. Provided this is the case, designing neural ODE models that maintain stability or contractivity enhances robustness against input perturbations, including adversarial ones. Such stability can be enforced through explicit regularisation in the loss function or through structural constraints on the network parameters. However, enforcing strong stability or contractivity can degrade model accuracy: in classification tasks, requiring contractivity may significantly reduce the fraction of correctly classified test samples.
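As a concrete illustration of the Euler discretisation above and of the role of the constant \( C \), the following minimal Python sketch simulates the resulting residual network and estimates \( C \) empirically from two nearby initial states. The single-tanh-layer vector field and all names (\texttt{f}, \texttt{euler\_forward}) are illustrative assumptions, not taken from the thesis.

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
d, N, T = 4, 100, 1.0        # state dimension, depth, final time
h = T / N                    # Euler step size

# Illustrative vector field f(u, t, theta): a single tanh layer.
# The thesis keeps f generic; this choice is only for demonstration.
W = rng.standard_normal((d, d)) / np.sqrt(d)
b = rng.standard_normal(d)

def f(u, t):
    return np.tanh(W @ u + b)

def euler_forward(u0):
    """Residual network u_{k+1} = u_k + h f(u_k, t_k) from above."""
    u = u0.copy()
    for k in range(N):
        u = u + h * f(u, k * h)
    return u

# Empirical estimate of the stability constant C: propagate two
# nearby initial states and compare output and input distances.
u0 = rng.standard_normal(d)
delta = 1e-6 * rng.standard_normal(d)
C_est = (np.linalg.norm(euler_forward(u0 + delta) - euler_forward(u0))
         / np.linalg.norm(delta))
print(f"estimated C: {C_est:.3f}")
\end{verbatim}

Rescaling \( W \) moves the sketch between the regimes listed above: larger weights increase the Lipschitz constant of \( f \) and typically inflate the estimated \( C \).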
This trade-off is inevitable: highly accurate models tend to be unstable, while strongly stable models may lack expressiveness. An optimal balance must therefore be found, in which \( C > 1 \) remains moderate, ensuring both accuracy and robustness. Building on this insight, we propose a two-level optimisation strategy that tunes the network parameters to balance accuracy and stability. Specifically, we formulate an optimisation problem that seeks the closest (structured) perturbation of the weight matrices enforcing a desired bound on \( C \). This approach allows us to control the Lipschitz constant explicitly while maintaining high predictive performance. We validate our method in numerical experiments on standard classification benchmarks, showing that it improves robustness while preserving competitive accuracy and that it outperforms existing baseline methods for stabilising neural ODEs. Finally, we turn to the approximation theory of neural ODEs for a theoretical understanding of the accuracy-robustness trade-off induced by this structured perturbation of the weight matrices. Intuitively, the approximation error measures how well an unknown target function can be approximated by a neural ODE; we quantify the loss in accuracy by deriving upper and lower bounds on this error.
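To make the idea of a structured stability-enforcing perturbation concrete, here is a minimal Python sketch of one standard instance: replacing a weight matrix by its Frobenius-nearest matrix with spectral norm at most \( L \), obtained by clipping singular values. This is only a hedged stand-in for the two-level optimisation strategy summarised above, not the thesis's actual algorithm.

\begin{verbatim}
import numpy as np

def project_spectral_norm(W, L):
    """Frobenius-nearest matrix to W with spectral norm at most L,
    obtained by clipping the singular values of W at L."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.minimum(s, L)) @ Vt

# Since tanh is 1-Lipschitz, bounding ||W||_2 by L bounds the
# Lipschitz constant of u -> tanh(W u + b) by L, and a Gronwall-type
# estimate then controls the stability constant: C <= exp(L T).
rng = np.random.default_rng(1)
W = rng.standard_normal((4, 4))
W_proj = project_spectral_norm(W, L=0.5)
print(np.linalg.svd(W_proj, compute_uv=False).max())  # at most 0.5
\end{verbatim}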
File: 2025_PhDThesis_DeMarinis.pdf (open access; licence: all rights reserved; size: 1.81 MB; format: Adobe PDF)
Documents in UNITESI are protected by copyright and all rights are reserved, unless otherwise indicated.
https://hdl.handle.net/20.500.14242/306749
URN:NBN:IT:GSSI-306749