entitymatch is a Python package for semantic entity matching. It links entity records across messy datasets using sentence-transformer embeddings, two-tier geographic blocking, and optional LLM-based validation. It is designed for researchers and analysts who need to match organizations, firms, or institutions across administrative datasets where names are inconsistent, misspelled, or abbreviated. Semantic similarity matching goes beyond exact string comparison, and configurable geographic blocking reduces the search space and improves precision at scale.
Install via pip:
pip install entitymatch
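To make the idea of blocking plus similarity scoring concrete, here is a minimal, self-contained sketch in plain Python. It does not use entitymatch's API, which is not shown here. Several things are assumptions made for illustration: the two tiers are taken to be "same city and state" followed by "same state", character-trigram counts stand in for sentence-transformer embeddings, the 0.5 threshold is arbitrary, and the names trigram_vector, cosine, and best_match are hypothetical. LLM validation is left out entirely.

```python
import math
from collections import Counter


def trigram_vector(name: str) -> Counter:
    """Toy stand-in for an embedding: counts of character trigrams."""
    padded = f"  {name.lower().strip()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(record, candidates, threshold=0.5):
    """Return (candidate, score, tier) for the best match, or None.

    Assumed tiers: first compare only within the same city and state;
    if nothing clears the threshold, widen the block to the same state.
    Candidates outside the state are never compared.
    """
    vec = trigram_vector(record["name"])
    tiers = [
        ("city", lambda c: c["state"] == record["state"] and c["city"] == record["city"]),
        ("state", lambda c: c["state"] == record["state"]),
    ]
    for tier_name, in_block in tiers:
        scored = [
            (cosine(vec, trigram_vector(c["name"])), c)
            for c in candidates
            if in_block(c)
        ]
        if scored:
            score, candidate = max(scored, key=lambda pair: pair[0])
            if score >= threshold:
                return candidate, score, tier_name
    return None


record = {"name": "Acme Manufacturing", "city": "Phoenix", "state": "AZ"}
candidates = [
    {"name": "Acme Manufacturing", "city": "Austin", "state": "TX"},  # exact name, wrong state
    {"name": "Acme Manufacturng", "city": "Tempe", "state": "AZ"},    # misspelled, same state
    {"name": "Desert Bakery", "city": "Phoenix", "state": "AZ"},      # same city, unrelated name
]

match = best_match(record, candidates)
if match is not None:
    candidate, score, tier = match
    print(candidate["name"], candidate["city"], tier, round(score, 3))
```

In this toy data the city tier contains only an unrelated name, so the search widens to the state tier and picks the misspelled Tempe record. The Texas record is never scored even though its name is identical, which is the search-space reduction that blocking provides.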
The capstone course Dr. Howell teaches, PAF 516 | Community Analytics (Course Website), trains students to build fully open-source community analytics dashboards for any location in the United States. Students construct their own composite indices for economic hardship, housing vulnerability, environmental risk, and other policy-relevant dimensions, integrating census data, spatial analysis, and interactive visualizations into stakeholder-ready tools deployed via GitHub Pages. The interactive dashboard below demonstrates the final product.