Using Large Language Model-Oriented Retrieval to Interpret Public Botanical Measurement Rules from the Iris Dataset

Carlos Moreno
Haruto Nakamura
Yichen Zhang

Abstract

Rule-based decision processes are common in scientific and operational settings, but the applicable rule can be hard to retrieve accurately when users describe a case in natural language. This study evaluates Iris-RAG, an LLM-oriented retrieval framework, on the task of retrieving encoded botanical identification procedures from the public UCI Iris dataset. The method follows the design of graph-encoded retrieval studies: decision procedures are written as directed rule graphs, converted into semantic and structural embeddings, retrieved in response to a natural-language case description, and evaluated by comparing predicted nodes and edges against ground-truth graphs. The experiment used all 150 public Iris records across five deterministic train-test splits. Iris-RAG achieved a classification accuracy of 0.911, node accuracy of 0.881, edge accuracy of 0.846, and mean reciprocal rank (MRR) of 0.956, outperforming graph-only retrieval and improving edge preservation over keyword and embedding baselines. These results indicate that lightweight graph-aware retrieval can preserve executable rule structure before an LLM produces a final recommendation.
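
The graph-level metrics named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so the definitions here are assumptions: node and edge accuracy are taken as the fraction of ground-truth nodes or edges recovered in the retrieved graph, and MRR is the standard mean reciprocal rank of the correct rule graph in each ranked result list. The function names, node labels, and graph identifiers are hypothetical and are not taken from Iris-RAG.

```python
from typing import Hashable, Sequence, Set


def set_accuracy(predicted: Set[Hashable], truth: Set[Hashable]) -> float:
    """Assumed definition: share of ground-truth items found in the prediction."""
    if not truth:
        # An empty ground truth is matched only by an empty prediction.
        return 1.0 if not predicted else 0.0
    return len(predicted & truth) / len(truth)


def mean_reciprocal_rank(rankings: Sequence[Sequence[str]],
                         targets: Sequence[str]) -> float:
    """Standard MRR: average of 1/rank of the correct item, 0 if it is absent."""
    if not targets:
        return 0.0
    total = 0.0
    for ranked, target in zip(rankings, targets):
        if target in ranked:
            total += 1.0 / (ranked.index(target) + 1)
    return total / len(targets)


# Hypothetical ground-truth rule graph for one case (labels are illustrative).
truth_nodes = {"check_petal_length", "check_petal_width",
               "setosa", "versicolor", "virginica"}
truth_edges = {("check_petal_length", "setosa"),
               ("check_petal_length", "check_petal_width"),
               ("check_petal_width", "versicolor"),
               ("check_petal_width", "virginica")}

# Hypothetical retrieved graph that misses one leaf and its incoming edge.
pred_nodes = truth_nodes - {"virginica"}
pred_edges = truth_edges - {("check_petal_width", "virginica")}

node_acc = set_accuracy(pred_nodes, truth_nodes)   # 4 of 5 nodes recovered
edge_acc = set_accuracy(pred_edges, truth_edges)   # 3 of 4 edges recovered

# Two hypothetical queries: the correct graph is ranked first, then second.
mrr = mean_reciprocal_rank([["g1", "g2"], ["g2", "g1"]], ["g1", "g1"])

print(node_acc, edge_acc, mrr)
```

Under these assumed definitions, an edge is counted only when both its source and its target match in the right direction, which would explain why edge accuracy tends to fall below node accuracy, as it does in the reported figures.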
