I'm an Applied Scientist II at Amazon AGI, where I develop large-scale information retrieval systems for RAG and pre-train and fine-tune multimodal vision-language models (VLMs) for multimedia analysis and processing at scale. My work spans text, images, and videos, leveraging LLMs to extract insights from diverse data sources. I hold a PhD in Information and Communication Technology and an MS in Computer Science from the University of Trento, Italy. My research centers on Large Language Models for Natural Language Processing and Information Retrieval, with broad interests in Computer Vision, multimodal learning, and constrained generative models.
Amazon AGI
Amazon AGI
Amazon Alexa AI
Amazon Alexa AI
SpazioDati SRL
University of Trento
University of Trento
University of Trento
Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: ACL 2023
This paper proposes three pre-training objectives designed to mimic the downstream fine-tuning task of contextual Answer Sentence Selection (AS2), and shows that the proposed pre-training approaches (applied to RoBERTa and ELECTRA) improve baseline contextual AS2 accuracy by up to 8% on some datasets.
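As a rough illustration of the downstream task, here is a minimal AS2 inference sketch using a Hugging Face cross-encoder; the checkpoint name, question, and candidate sentences are placeholders, not the paper's models or data.

```python
# Minimal AS2 inference sketch: score each candidate sentence against the
# question with a cross-encoder and keep the highest-scoring one.
# "roberta-base" is illustrative; in practice one would load a fine-tuned AS2 checkpoint.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained("roberta-base", num_labels=2)
model.eval()

question = "When was the University of Trento founded?"
candidates = [
    "The University of Trento was founded in 1962.",
    "Trento is a city in northern Italy.",
    "The university has several departments.",
]

# Encode each (question, candidate) pair jointly, as in standard cross-encoder AS2.
batch = tokenizer([question] * len(candidates), candidates,
                  padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits          # shape: (num_candidates, 2)
    scores = logits.softmax(dim=-1)[:, 1]   # probability of the "relevant" class

best = scores.argmax().item()
print(f"Best answer ({scores[best]:.3f}): {candidates[best]}")
```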
Conference on Empirical Methods in Natural Language Processing: EMNLP 2022
This paper proposes three novel sentence-level transformer pre-training objectives that incorporate paragraph-level semantics within and across documents, improving the performance of transformers on Answer Sentence Selection and reducing the need for large labeled datasets.
North American Chapter of the Association for Computational Linguistics: NAACL 2022
This paper shows that popular pre-trained transformers perform poorly when fine-tuned on multi-candidate inference tasks, and proposes a new pre-training objective that models paragraph-level semantics across multiple input sentences.
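To give a flavour of how self-supervised multi-candidate examples can be built from plain documents, here is an illustrative sketch (not the paper's exact recipe) that labels each candidate by whether it comes from the same paragraph as the anchor sentence.

```python
# Illustrative sketch: build a multi-candidate training example from raw paragraphs,
# labelling candidates by paragraph membership relative to the anchor sentence.
import random

def build_example(paragraphs, num_candidates=4, seed=0):
    """paragraphs: list of lists of sentences. Returns (anchor, candidates, labels)."""
    rng = random.Random(seed)
    pos_par = rng.choice([p for p in paragraphs if len(p) >= 2])
    anchor, positive = rng.sample(pos_par, 2)
    negatives = [s for p in paragraphs if p is not pos_par for s in p]
    candidates = [positive] + rng.sample(negatives, num_candidates - 1)
    labels = [1] + [0] * (num_candidates - 1)          # 1 = same paragraph as anchor
    order = rng.sample(range(num_candidates), num_candidates)  # shuffle jointly
    return anchor, [candidates[i] for i in order], [labels[i] for i in order]

paragraphs = [
    ["Transformers are pre-trained on large corpora.", "Pre-training objectives shape what they learn."],
    ["Trento is a city in northern Italy.", "It hosts a well-known university."],
    ["GANs generate data by playing a minimax game.", "Constraints can be embedded during training."],
]
anchor, cands, labels = build_example(paragraphs)
print(anchor)
for c, y in zip(cands, labels):
    print(y, c)
```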
PhD Thesis: 2023
This thesis proposes self-supervised pre-training tasks that align structurally with their downstream applications, reducing the need for labeled data and achieving state-of-the-art results on various benchmark datasets.
Findings of the Association for Computational Linguistics: EMNLP 2022
This paper shows that eliminating the MASK token and computing the loss over the whole output are essential choices for improving performance, and that ELECTRA benefits heavily from a state-of-the-art hyper-parameter search.
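The sketch below contrasts the two loss choices in isolation, assuming a generic token-level classification head (e.g., predicting whether each token was replaced); shapes, the corruption pattern, and the random logits are illustrative only.

```python
# Contrast: (a) loss restricted to corrupted positions vs. (b) loss over the whole output.
import torch
import torch.nn.functional as F

batch, seq_len, num_classes = 2, 9, 2
logits = torch.randn(batch, seq_len, num_classes)        # per-token model outputs
replaced = torch.zeros(batch, seq_len, dtype=torch.bool)
replaced[:, ::3] = True                                   # pretend every 3rd token was corrupted
labels = replaced.long()                                   # 1 = replaced, 0 = original

# (a) MLM-style: the loss only sees the corrupted positions.
loss_masked_only = F.cross_entropy(logits[replaced], labels[replaced])

# (b) Whole-output loss: every position contributes training signal.
loss_full_output = F.cross_entropy(logits.view(-1, num_classes), labels.view(-1))

print(loss_masked_only.item(), loss_full_output.item())
```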
The Journal of Open Source Software: JOSS 2022
A main problem with reproducing machine learning publications is the variance of metric implementations across papers. A lack of standardization leads to different behavior in mechanisms such as checkpointing, learning rate schedulers, or early stopping, which influences the reported results. For example, a complex metric such as Fréchet inception distance (FID) for synthetic image quality evaluation (Heusel et al., 2017) will differ based on the specific interpolation method used.
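One common way to standardize metric computation is a stateful update/compute/reset interface, so checkpointing, early stopping, and schedulers all consume the same consistently accumulated value. The sketch below uses an illustrative accuracy metric; class and method names are placeholders, not the library's actual API.

```python
# Stateful metric sketch: accumulate across batches, report once per evaluation.
import torch

class Accuracy:
    """Accumulates correct/total across batches; compute() returns the running value."""
    def __init__(self):
        self.reset()

    def reset(self):
        self.correct = 0
        self.total = 0

    def update(self, preds: torch.Tensor, target: torch.Tensor):
        self.correct += (preds == target).sum().item()
        self.total += target.numel()

    def compute(self) -> float:
        return self.correct / max(self.total, 1)

metric = Accuracy()
for _ in range(3):  # pretend these are validation batches
    preds = torch.randint(0, 2, (8,))
    target = torch.randint(0, 2, (8,))
    metric.update(preds, target)
print(f"val/accuracy = {metric.compute():.3f}")
```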
The Thirty-Fourth Annual Conference on Neural Information Processing Systems: NeurIPS 2020
This paper proposes Constrained Adversarial Networks, an extension of GANs in which constraints are embedded into the model during training, enabling the efficient generation of valid structures that are both high-quality and novel.
Book: Compendium of Neurosymbolic Artificial Intelligence
Series: Frontiers in Artificial Intelligence and Applications
This chapter introduces the semantic loss, a training method that incorporates symbolic structural constraints into neural networks to ensure outputs respect underlying dependencies (like valid graph paths). We enhance it with entropy minimization to prefer simpler solutions and demonstrate its versatility by integrating it with both discriminative and generative models, enabling efficient generation of complex structured objects.
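As a toy illustration of the semantic loss idea, the sketch below enumerates the satisfying assignments of an "exactly-one" constraint by brute force (only feasible for tiny constraints) and adds a Bernoulli entropy term to prefer low-entropy solutions; the constraint, weights, and probabilities are illustrative, not the chapter's experiments.

```python
# Semantic loss sketch: -log of the probability mass assigned to valid outputs,
# here for an exactly-one constraint whose satisfying assignments are the one-hot vectors.
import torch

def semantic_loss_exactly_one(probs: torch.Tensor) -> torch.Tensor:
    """probs: (n,) Bernoulli probabilities for n binary variables."""
    n = probs.shape[0]
    total = probs.new_zeros(())
    for i in range(n):
        x = torch.zeros(n)
        x[i] = 1.0  # i-th one-hot assignment (exactly one variable true)
        total = total + torch.prod(probs ** x * (1 - probs) ** (1 - x))
    return -torch.log(total + 1e-12)

def entropy(probs: torch.Tensor) -> torch.Tensor:
    """Bernoulli entropy; minimizing it pushes the model toward simpler (more decisive) solutions."""
    p = probs.clamp(1e-6, 1 - 1e-6)
    return -(p * p.log() + (1 - p) * (1 - p).log()).sum()

probs = torch.tensor([0.7, 0.2, 0.1], requires_grad=True)
loss = semantic_loss_exactly_one(probs) + 0.1 * entropy(probs)
loss.backward()
print(loss.item(), probs.grad)
```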