Aller au contenu principal

Public Thesis Defense of Maxime ZANELLA - ICTEAM

sst |

sst
3 March 2026 , modifié le 26 February 2026

Inductive and transductive learning with vision-language models in low-shot regimes

Tuesday March 3rd, 2026 - 4pm - Auditorium SUD11 - Place Croix du Sud, 1 - 1348 Louvain-la-Neuve

Vision-Language Models (VLMs) have revolutionized open-vocabulary recognition, yet adapting them to specialized domains under data-scarce conditions remains a significant challenge. This thesis investigates efficient strategies to bridge the gap between inductive fine-tuning and transductive inference, focusing on minimizing supervision and computational costs. We first address inductive few-shot learning with CLIP-LoRA, a parameter-efficient fine-tuning method. By injecting trainable low-rank matrices into the encoder, it outperforms existing approaches while eliminating dataset-specific hyperparameter tuning. For scenarios restricted to single-sample inference, we propose MTA (MeanShift for Test-time Augmentation). Formulated as a robust variation of the MeanShift algorithm, this training-free method refines predictions through iterative mode-seeking and automatic importance weighting of augmented views. Expanding into the transductive paradigm, we introduce TransCLIP. This framework explicitly models the underlying structure of unlabeled test batches by fitting a Gaussian Mixture Model (GMM), constrained by a text-driven Kullback-Leibler (KL) regularization term. This dual objective dynamically aligns visual statistics with textual priors, achieving substantial gains in both zero-shot and few-shot settings. Finally, to address realistic deployment challenges, we propose StatA (Statistical Anchor). We demonstrate that existing transductive methods fail under non-i.i.d. data streams and introduce a regularization technique based on the KL divergence with respect to initial text priors. Collectively, this thesis provides a suite of transparent, low-compute algorithms that, through the use of fixed hyperparameters, ensure operational simplicity and robust performance across challenging and diverse visual recognition benchmarks.

 

Jury members

Prof. Benoît Macq  (UCLouvain) Supervisor

Prof. Saïd Mahmoudi  (UMons) Supervisor

Prof. Christophe Craeye (UCLouvain) Chairperson

Prof. Bernard Gosselin (UMons) Secretary

Prof. Christophe De Vleeschouwer (UCLouvain)

Prof. Pierre Geurts (ULiège)

Prof. Ismail Ben Ayed (ETS Montréal, Canada)

 

Pay attention : the public defense of Maxime ZANELLA will also take place in the form of a videoconference