A Domain Knowledge-Enhanced Large Vision-Language Model for Construction Site Safety Monitoring

15 pages • Published: August 28, 2025

Abstract

To address industry-wide and policy-driven requirements for construction site safety monitoring, this paper develops a virtual assistant agent based on a large vision-language model (VLM), integrated into an on-site surveillance camera system for real-time identification and alerting of unsafe worker behaviors. First, we designed a semi-automatic image-text labeling pipeline that employs in-context learning to improve data annotation efficiency. Then, we established a two-stage curriculum learning paradigm to deeply embed construction domain knowledge into the VLM, which is then integrated into a real-time video analytics engine for safety compliance inspection and interactive visual question answering. The system has been deployed on a real construction site, with around 90% accuracy in identifying violations of work-at-height safety regulations.

Keyphrases: construction site safety monitoring, data efficient fine tuning strategy, domain tailored large vision language model, multi modal safety compliance checking, virtual construction safety assistant

In: Jack Cheng and Yu Yantao (editors). Proceedings of The Sixth International Conference on Civil and Building Engineering Informatics, vol 22, pages 894-908.