Capital funding is a critical component of school operations, yet little information is publicly available on how states define and implement their capital funding policies. Moreover, no existing study systematically categorizes states by these policies to enable comparative analysis. This study addresses this gap by using large language models (LLMs) to extract and synthesize state-level capital funding policies from websites, reports, and educational publications. The result is the National Capital Funding Mechanisms and Expenditures (NCFME) dataset, a novel, unified dataset that captures detailed funding mechanisms, allocation criteria, and legal statutes.
Using the NCFME dataset, we propose an approach to categorize states with similar capital funding policies. Unlike traditional clustering techniques that rely on numerical data, our method employs LLMs to assess funding similarity based on shared qualitative and quantitative characteristics. We validated the method on expert-labelled operational funding data, which served both to assess the LLM’s accuracy and to guide prompt development. The resulting categorization enables a more nuanced understanding of policy alignment across states.
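As a rough illustration of similarity-based grouping, the following minimal sketch merges states into categories whenever a pairwise judgment marks them as similar. It is not the method used in this study: the policy records, field names, and the `judge_similar` function are hypothetical, and `judge_similar` is a deterministic stand-in for what would, in practice, be an LLM prompt comparing two policy descriptions.

```python
from itertools import combinations

# Hypothetical policy records; fields and values are illustrative only
# and are not drawn from the NCFME dataset.
POLICIES = {
    "State A": {"mechanism": "matching grant", "criteria": "district wealth"},
    "State B": {"mechanism": "matching grant", "criteria": "district wealth"},
    "State C": {"mechanism": "bond guarantee", "criteria": "enrollment growth"},
    "State D": {"mechanism": "bond guarantee", "criteria": "enrollment growth"},
    "State E": {"mechanism": "no state aid", "criteria": "none"},
}


def judge_similar(a, b):
    """Stand-in for an LLM similarity judgment (assumption).

    A real version would place both policy descriptions in a prompt and
    parse a yes/no answer; here two policies count as similar when they
    share the same funding mechanism.
    """
    return a["mechanism"] == b["mechanism"]


def group_states(policies, judge):
    """Group states by merging every pair the judge marks as similar."""
    parent = {state: state for state in policies}

    def find(state):
        # Follow parent links to the group representative, compressing the path.
        while parent[state] != state:
            parent[state] = parent[parent[state]]
            state = parent[state]
        return state

    for s, t in combinations(policies, 2):
        if judge(policies[s], policies[t]):
            parent[find(s)] = find(t)

    groups = {}
    for state in policies:
        groups.setdefault(find(state), []).append(state)
    return sorted(sorted(members) for members in groups.values())


groups = group_states(POLICIES, judge_similar)
print(groups)
```

Because groups are formed by transitive merging, a single spurious "similar" judgment can join two otherwise distinct categories. This is one reason that checking the judgments against expert labels, as described above, matters before trusting the categories.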
The goal of this work is to conduct a comprehensive analysis of state-level capital funding policies, using LLMs to extract and categorize funding mechanisms. The analysis focuses on identifying states with similar funding policies and assessing which state characteristics influence funding structure. Key aspects include LLM-based text extraction and prompt engineering, identification of patterns in resource allocation, and predictive modeling of policy classifications. This work advances the understanding of capital funding mechanisms and introduces a methodology for analyzing complex policy documents, providing a foundation for future research.