Analyzing AI Evaluation Benchmarks Through Information Retrieval and Network Science

Many analyses have been performed on Information Retrieval (IR) evaluation benchmarks. Benchmarking also plays a central role in evaluating the capabilities of Large Language Models (LLMs). In this paper, we apply an IR approach to LLM evaluation. …

Large Language Models as Assessors: On the Impact of Relevance Scales

Traditionally, relevance judgments have relied on human annotators, but recent advances in Large Language Models (LLMs) have prompted growing interest in their use as a proxy for relevance judgments. In this setting, a key yet underexplored factor is …

AIDME: A Scalable, Interpretable Framework for AI-Aided Scoping Reviews

Scientific publishing is expanding rapidly across disciplines, making it increasingly difficult for researchers to organize, filter, and synthesize the literature. Systematic reviews address this challenge through structured analysis, but the early …

Efficiency and Effectiveness of LLM-Based Summarization of Evidence in Crowdsourced Fact-Checking

Assessing the truthfulness of information is a critical task in fact-checking, and is typically performed using binary or coarse ordinal scales (2-6 levels), though fine-grained scales (e.g., 100 levels) have also been explored. Magnitude Estimation …

PILs of Knowledge: A Synthetic Benchmark for Evaluating Question Answering Systems in Healthcare

Patient Information Leaflets (PILs) provide essential information about medication usage, side effects, precautions, and interactions, making them a valuable resource for Question Answering (QA) systems in healthcare. However, no dedicated benchmark …

The Magnitude of Truth: On Using Magnitude Estimation for Truthfulness Assessment

Evaluating the truthfulness of online content is critical for combating misinformation. This study examines the efficiency and effectiveness of crowdsourced truthfulness assessments through a comparative analysis of two approaches: one involving …

Large Language Models for Combinatorial Optimization: A Systematic Review

This systematic review explores the application of Large Language Models (LLMs) in Combinatorial Optimization (CO). We report our findings using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. We conduct a …

In Crowd Veritas: Leveraging Human Intelligence To Fight Misinformation

The spread of online misinformation poses serious threats to democratic societies. Traditionally, expert fact-checkers verify the truthfulness of information through investigative processes. However, the volume and immediacy of online content present …

Agent-Based Healthcare Chatbots for Regional System Services: A Case Study in the Friuli-Venezia Giulia Region

The scholarly publishing process relies on peer review to uphold the quality of scientific knowledge. However, challenges such as increasing submission volumes and potential malicious behavior undermine its effectiveness. In this study, we evaluate …

Search Trajectory Networks Applied to a Real-World Parallel Batch Scheduling Problem

We investigate solution methods for the Oven Scheduling Problem (OSP), a parallel batch scheduling optimization problem in semiconductor manufacturing, using Search Trajectory Networks (STNs). STNs are a recently introduced tool to analyze and …