Using Large Language Model-Oriented Retrieval to Interpret Public Botanical Measurement Rules from the Iris Dataset

Carlos Moreno; Haruto Nakamura; Yichen Zhang


Published: 2025-12-21

Carlos Moreno

University of the West of England

Haruto Nakamura

University of the West of England

Yichen Zhang

University of the West of England

Abstract

Rule-based decision processes are common in scientific and operational settings, but they can be difficult to retrieve accurately when users describe a case in natural language. This study evaluates an LLM-oriented retrieval framework, Iris-RAG, for retrieving encoded botanical identification procedures built on the public UCI Iris dataset. The method follows the design of graph-encoded retrieval studies: decision procedures are written as directed rule graphs and converted into semantic and structural embeddings; a natural-language case description is then used to retrieve the matching procedure, and retrieval is evaluated by comparing predicted nodes and edges with ground-truth graphs. The experiment used all 150 public Iris records with five deterministic train-test splits. Iris-RAG achieved a classification accuracy of 0.911, node accuracy of 0.881, edge accuracy of 0.846, and mean reciprocal rank (MRR) of 0.956, outperforming graph-only retrieval and improving edge preservation over keyword and embedding baselines. The results indicate that lightweight graph-aware retrieval can preserve executable rule structure before an LLM produces a final recommendation.
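The evaluation described in the abstract compares predicted nodes and edges with ground-truth rule graphs and reports MRR. The following is a minimal sketch of how such metrics could be computed. It is not the authors' implementation: the overlap-based definition of node and edge accuracy, the function names, and the toy rule graph are all assumptions made for illustration, and only the mean reciprocal rank follows its standard definition.

```python
from typing import Hashable, Iterable, Sequence


def overlap_accuracy(predicted: Iterable[Hashable], truth: Iterable[Hashable]) -> float:
    """Fraction of ground-truth items also present in the prediction.

    Assumed definition: the abstract does not state how node and edge
    accuracy are computed.
    """
    truth_set = set(truth)
    if not truth_set:
        return 1.0  # nothing to recover, so count it as fully matched
    return len(truth_set & set(predicted)) / len(truth_set)


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]], targets: Sequence[str]) -> float:
    """Average of 1/rank of the correct item per query (0 when it is absent)."""
    if not targets:
        return 0.0
    total = 0.0
    for ranked, target in zip(rankings, targets):
        for position, candidate in enumerate(ranked, start=1):
            if candidate == target:
                total += 1.0 / position
                break
    return total / len(targets)


# Hypothetical rule graph: nodes are measurement checks or class labels,
# edges are directed (source, destination) pairs.
truth_nodes = ["petal_length", "petal_width", "versicolor"]
pred_nodes = ["petal_length", "petal_width", "virginica"]
truth_edges = [("petal_length", "petal_width"), ("petal_width", "versicolor")]
pred_edges = [("petal_length", "petal_width"), ("petal_width", "virginica")]

node_acc = overlap_accuracy(pred_nodes, truth_nodes)
edge_acc = overlap_accuracy(pred_edges, truth_edges)

# Two hypothetical queries whose correct graph is "g1": ranked first, then second.
mrr = mean_reciprocal_rank([["g1", "g2"], ["g2", "g1"]], ["g1", "g1"])
```

In this toy case a single wrong leaf costs one node but also the edge leading to it, which illustrates why edge accuracy is typically the stricter of the two measures.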

Issue

Vol. 1 No. 2 (2025)

Section

Articles
