Roberto Amoroso

Research Engineer @ NVIDIA
ELLIS PhD | AI & Computer Vision


About me

Ciao! I am Roberto Amoroso, a Research Engineer at NVIDIA in Munich, Germany 🇩🇪, where I build Multimodal Video Understanding and Vision-Language Model (VLM) systems for large-scale text–image and text–video retrieval, with a focus on Autonomous Vehicle applications.

My work focuses on retrieval architectures that align language with visual signals at scale, enabling users and systems to find relevant moments, objects, and scenes in large image and video collections.

I genuinely enjoy turning research ideas into practical, high-impact solutions.

I completed my PhD through the ELLIS program and the International Doctorate in ICT at the AImageLab research group of the University of Modena and Reggio Emilia (UNIMORE) 🇮🇹, under the supervision of Prof. Rita Cucchiara and Prof. Lorenzo Baraldi.

During my PhD, I also completed an internship at LMU - Ludwig-Maximilians-Universität of Munich, in Germany 🇩🇪, focusing on Multimodal LLMs for Video Question Answering and on Open-vocabulary Segmentation, under the co-supervision of Prof. Volker Tresp.

I was also a Research Scholar at the Networking Research Group in Saint Louis, USA 🇺🇸, working on Super-resolution techniques applied to Internet traffic matrices.

My primary areas of research are Multimodal Video Understanding and Vision-Language Models for information retrieval, with a focus on MLLM-based text-to-visual retrieval architectures for both images and videos. I have also conducted research on the pre-training and optimization of Transformer-based architectures for image classification, open-vocabulary segmentation, self-supervised learning, the detection of synthetic (deepfake) images, and the development of image watermarking systems.

Feel free to reach out if you have any questions or are simply curious! :)

Interests

Computer Vision
Deep Learning
Machine Learning
Multimodal Video Understanding
Vision-Language Models
Text-to-Visual Retrieval

Education

ELLIS PhD in AI and Computer Vision, 2024
UNIMORE, Italy 🇮🇹 | LMU, Germany 🇩🇪 | NVIDIA, Germany 🇩🇪
MS in Artificial Intelligence, 2020
UNIMORE, Italy 🇮🇹 | AGH, Poland 🇵🇱 | Saint Louis University, USA 🇺🇸
BS in Computer Engineering, 2018
UNIMORE, Italy 🇮🇹

Experience

Research Engineer

NVIDIA

Jan 2024 – Present Munich, Germany 🇩🇪

Research activity focused on the engineering, development, and deployment of Multimodal Video Understanding techniques for Autonomous Vehicles.

Machine Learning Engineering Intern

NVIDIA

Jan 2024 – Oct 2024 Munich, Germany 🇩🇪

Research activity focused on Video-Text Retrieval techniques for Autonomous Vehicles.

PhD Intern

LMU @ Ludwig-Maximilians-Universität of Munich

Jun 2023 – Nov 2023 Munich, Germany 🇩🇪

Research activity focused on the development of novel Multimodal LLMs for Video Question Answering and of Open-vocabulary Image Segmentation techniques, under the co-supervision of Prof. Volker Tresp.

ELLIS PhD Student | International Doctorate in ICT

AImageLab @ University of Modena and Reggio Emilia

Nov 2021 – Oct 2024 Modena, Italy 🇮🇹

Successfully defended my PhD thesis, titled “Multimodal Attentive Deep Learning Architectures for Visual-Semantic Understanding”, cum laude (with highest honors).
The European Laboratory for Learning and Intelligent Systems (ELLIS) supports cutting-edge machine learning research in Europe. ELLIS PhD students (acceptance rate below 5% in 2021) are selected on the basis of academic achievement. My research activity focused on multimodal machine learning, image segmentation, image classification, self-supervised learning, video retrieval, and video question answering.
Ranked 1st among the student candidates for the International Doctorate in ICT.

Research Fellow

AImageLab @ University of Modena and Reggio Emilia

Feb 2021 – Nov 2021 Modena, Italy 🇮🇹

Research activity under the supervision of Prof. Rita Cucchiara and Prof. Lorenzo Baraldi, aimed at the study, analysis, and development of novel Computer Vision and Deep Learning techniques.

Research Engineer

CINI - Consorzio Interuniversitario Nazionale per l’Informatica

Nov 2020 – Jan 2021 Modena, Italy 🇮🇹

Development of a web platform for the management of data concerning the activities of European research centers, as part of the HumanE-AI-NET project, funded by the EU Framework Programme for Research and Innovation Horizon 2020.

Research Scholar

Saint Louis University

Mar 2020 – Sep 2020 St. Louis, USA 🇺🇸

Conducted the research for my MS thesis, which won the Best Poster Award at CoNEXT 2020.

Erasmus+

AGH Akademia Górniczo-Hutnicza

Sep 2019 – Feb 2020 Krakow, Poland 🇵🇱

Completed the following courses: Advanced Python Programming | Computer Vision | Cybersecurity and Cryptography | Programming in Javascript | Mobile App Development

Honors and Awards

[Apr. 2025] PhD cum laude (with highest honors) for my PhD thesis “Multimodal Attentive Deep Learning Architectures for Visual-Semantic Understanding” @ UNIMORE

[Sep. 2024] Outstanding Reviewer Award @ ECCV 2024

[Sep. 2023] Best Project Award @ ELLIS Summer School on Large-Scale AI for the project “Radioactive Watermarks”, in which we investigated the “radioactivity” of watermarked texts, i.e., whether they contaminate other models when used as fine-tuning data. The project was evaluated by Tal Hassner (Meta AI) and Marc’Aurelio Ranzato (DeepMind)

[Jul. 2022] ICVSS 2022 Reading Group Competition Award sponsored by Amazon Web Services (AWS) @ ICVSS 2022 for participation in the reading group led by Prof. Dr. Stefano Soatto (AWS and University of California, Los Angeles)