Luca Di Liello

Via Sommarive 9 · Trento, TN 38123 · luca.diliello@unitn.it

I'm a PhD student in Information and Communication Technology at the University of Trento, Italy, where I also received my MS in Computer Science. I work mainly on Large Language Models (LLMs) for Natural Language Processing (NLP) and Information Retrieval (IR), but I am also interested in Computer Vision (CV) and constrained generative models.

Interests: Natural Language Processing, Pre-Training of Neural Language Models, Question Answering (Answer Sentence Selection), and Generative Models.


Experience

Applied Scientist Intern II

Amazon Alexa

Continued the work of the previous internship on the design, development, and testing of large language models for Answer Sentence Selection.

February 2022 - July 2022

Applied Scientist Intern I

Amazon Alexa

Improved the state of the art for Answer Sentence Selection models.

April 2021 - October 2021

Professor of Informatics

IISS Galileo Galilei Bolzano

Taught students the basics of Assembly, C, and C++, then introduced the concepts needed to implement fast and efficient algorithms.

November 2018 - January 2019

Big Data Analyst Intern

SpazioDati SRL

Analyzed millions of tweets geolocated in Italy to find interesting relations between companies and their employees.

March 2017 - June 2017

Education & Research

University of Trento

PhD Candidate in Information and Communication Technologies
Topics: Natural Language Processing and Information Retrieval
November 2019 - Present

University of Trento

Master's in Data Science
Topics: Big Data, Machine Learning
August 2017 - October 2019

University of Trento

Bachelor's in Computer Science
Topics: Computer Science, Algorithms, Database Management, Web Development
August 2014 - July 2017

IISS Galileo Galilei Bolzano

Expert in Electronics and Telecommunications
Topics: Microelectronics, Telecommunications, Robotics, Low-Level Programming
August 2009 - June 2014

Skills

Programming Languages
Tools
Workflow
  • Deep Learning Models for NLP
  • Generative Models for Computer Vision

Publications

  • Context-aware Transformer Pre-Training for Answer Sentence Selection - 2023

    Answer Sentence Selection (AS2) is a core component for building an accurate Question Answering pipeline. AS2 models rank a set of candidate sentences based on how likely they answer a given question. The state of the art in AS2 exploits pre-trained transformers by transferring them on large annotated datasets, while using local contextual information around the candidate sentence. In this paper, we propose three pre-training objectives designed to mimic the downstream fine-tuning task of contextual AS2. This allows for specializing LMs when fine-tuning for contextual AS2. Our experiments on three public and two large-scale industrial datasets show that our pre-training approaches (applied to RoBERTa and ELECTRA) can improve baseline contextual AS2 accuracy by up to 8% on some datasets.

    PDF

  • Pre-training Transformer Models with Sentence-Level Objectives for Answer Sentence Selection - 2022

    An important task for designing QA systems is answer sentence selection (AS2): selecting the sentence containing (or constituting) the answer to a question from a set of retrieved relevant documents. In this paper, we propose three novel sentence-level transformer pre-training objectives that incorporate paragraph-level semantics within and across documents, to improve the performance of transformers for AS2, and mitigate the requirement of large labeled datasets. Our experiments on three public and one industrial AS2 datasets demonstrate the empirical superiority of our pre-trained transformers over baseline models such as RoBERTa and ELECTRA for AS2.

    PDF

  • Effective Pre-Training Objectives for Transformer-based Autoencoders - 2022

    In this paper, we study trade-offs between efficiency, cost and accuracy when pre-training Transformer encoders with different pre-training objectives. For this purpose, we analyze features of common objectives and combine them to create new effective pre-training approaches. Specifically, we designed light token generators based on a straightforward statistical approach, which can replace ELECTRA's computationally heavy generators, thus greatly reducing cost. Our experiments also show that (i) there are more efficient alternatives to BERT's MLM, and (ii) it is possible to efficiently pre-train Transformer-based models using lighter generators without a significant drop in performance.

    PDF

  • TorchMetrics - Measuring Reproducibility in PyTorch - 2022

    A main problem with reproducing machine learning publications is the variance of metric implementations across papers. A lack of standardization leads to different behavior in mechanisms such as checkpointing, learning rate schedulers or early stopping, that will influence the reported results. For example, a complex metric such as Fréchet inception distance (FID) for synthetic image quality evaluation (Heusel et al., 2017) will differ based on the specific interpolation method used.

    PDF

  • Paragraph-based Transformer Pre-training for Multi-Sentence Inference - 2022

    Inference tasks such as answer sentence selection (AS2) or fact verification are typically solved by fine-tuning transformer-based models as individual sentence-pair classifiers. Recent studies show that these tasks benefit from modeling dependencies across multiple candidate sentences jointly. In this paper, we first show that popular pre-trained transformers perform poorly when used for fine-tuning on multi-candidate inference tasks. We then propose a new pre-training objective that models the paragraph-level semantics across multiple input sentences. Our evaluation on three AS2 and one fact verification datasets demonstrates the superiority of our pre-training technique over the traditional ones for transformers used as joint models for multi-candidate inference tasks, as well as when used as cross-encoders for sentence-pair formulations of these tasks.

    PDF

  • Efficient pre-training objectives for Transformers - 2021

    Transformer-based neural networks have heavily impacted the field of natural language processing, outperforming most previous state-of-the-art models. However, well-known models such as BERT, RoBERTa, and GPT-2 require a huge compute budget to create high-quality contextualised representations. In this paper, we study several efficient pre-training objectives for Transformer-based models. By testing these objectives on different tasks, we determine which of the ELECTRA model's new features is the most relevant: (i) Transformer pre-training can be improved when the input is not altered with artificial symbols, e.g., masked tokens; and (ii) loss functions computed using the whole output reduce training time. (iii) Additionally, we study efficient models composed of two blocks: a discriminator and a simple generator (inspired by the ELECTRA architecture). Our generator is based on a much simpler statistical approach, which minimally increases the computational cost. Our experiments show that it is possible to efficiently train BERT-like models using a discriminative approach as in ELECTRA but without a complex generator. Finally, we show that ELECTRA largely benefits from a deep hyper-parameter search.

    PDF

  • Language Transfer for Identifying Diagnostic Paragraphs in Clinical Notes - 2021

    This paper aims at uncovering the structure of clinical documents, in particular, identifying paragraphs describing “diagnosis” or “procedures”. We present transformer-based architectures for approaching this task in a monolingual setting (English), exploring a weak supervision scheme. We further extend our contribution to a cross-lingual scenario, mitigating the need for expensive manual data annotation and taxonomy engineering for Italian.

    PDF

  • Cross-Language Transformer Adaptation for Frequently Asked Questions - 2020

    Transfer learning has been proven to be effective, especially when data for the target domain/task is scarce. Sometimes data for a similar task is only available in another language because it may be very specific. In this paper, we explore the use of machine-translated data to transfer models on a related domain. Specifically, we transfer models from the question duplication task (QDT) to similar FAQ selection tasks. The source domain is the well-known English Quora dataset, while the target domain is a collection of small Italian datasets for real case scenarios consisting of FAQ groups retrieved by pivoting on common answers. Our results show great improvements in the zero-shot learning setting and modest improvements using the standard transfer approach for direct in-domain adaptation.

    PDF

  • Efficient Generation of Structured Objects with Constrained Adversarial Networks - 2020

    Generative Adversarial Networks (GANs) struggle to generate structured objects like molecules and game maps. The issue is that structured objects must satisfy hard requirements (e.g., molecules must be chemically valid) that are difficult to acquire from examples alone. As a remedy, we propose Constrained Adversarial Networks (CANs), an extension of GANs in which the constraints are embedded into the model during training. This is achieved by penalizing the generator proportionally to the mass it allocates to invalid structures. In contrast to other generative models, CANs support efficient inference of valid structures (with high probability) and allow turning the learned constraints on and off at inference time. CANs handle arbitrary logical constraints and leverage knowledge compilation techniques to efficiently evaluate the disagreement between the model and the constraints. Our setup is further extended to hybrid logical-neural constraints for capturing very complex constraints, like graph reachability. An extensive empirical analysis shows that CANs efficiently generate valid structures that are both high-quality and novel.

    PDF
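
A common thread in the AS2 publications above is scoring (question, candidate sentence) pairs with a transformer and ranking the candidates by that score. The sketch below is only a rough illustration of this setup using Hugging Face Transformers, not the code from any of these papers: the roberta-base checkpoint, the example question, and the candidate sentences are placeholders, and a real AS2 system would use a model fine-tuned (and possibly further pre-trained) for the task.

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    # Placeholder checkpoint: in practice this would be a model fine-tuned for AS2.
    model_name = "roberta-base"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)
    model.eval()

    question = "Who wrote the Divine Comedy?"
    candidates = [
        "Dante Alighieri wrote the Divine Comedy in the early 14th century.",
        "The poem is divided into Inferno, Purgatorio and Paradiso.",
        "Florence is the capital of the Tuscany region of Italy.",
    ]

    # Encode each (question, candidate) pair and score it with the classification head.
    inputs = tokenizer([question] * len(candidates), candidates,
                       padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        scores = model(**inputs).logits.softmax(dim=-1)[:, 1]

    # Rank candidates by the probability of the "correct answer" class.
    ranking = scores.argsort(descending=True)
    best_answer = candidates[ranking[0].item()]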
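
Related to the TorchMetrics entry above, here is a minimal usage sketch of its Fréchet Inception Distance implementation, assuming torchmetrics is installed with its image extras; the random uint8 tensors stand in for batches of real and generated images and are placeholders for illustration only.

    import torch
    from torchmetrics.image.fid import FrechetInceptionDistance

    # FID backed by InceptionV3 features; feature=64 keeps this example light
    # (2048 is the usual choice when reporting results).
    fid = FrechetInceptionDistance(feature=64)

    # Placeholder image batches: with default settings the metric expects
    # uint8 tensors of shape (N, 3, H, W).
    real_images = torch.randint(0, 200, (100, 3, 299, 299), dtype=torch.uint8)
    fake_images = torch.randint(100, 255, (100, 3, 299, 299), dtype=torch.uint8)

    fid.update(real_images, real=True)
    fid.update(fake_images, real=False)
    print(fid.compute())  # scalar tensor with the FID between the two sets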

Interests

Apart from being a researcher, I spend most of my free time outdoors or practicing my favourite sport, kickboxing. During the summer I try to play volleyball as much as I can, while in the winter I'm lucky to live close to many ski facilities. When forced indoors, I love reading and watching TV series like Rick & Morty, Dr. House, and Game of Thrones.