BIMgent: Towards Autonomous Building Modeling via Computer-use Agents

ICML 2025 Workshop on Computer Use Agents
1Chair of Computing in Civil and Building Engineering, Technical University of Munich 2TUM Georg Nemetschek Institute

We present BIMgent, the first LLM-powered agentic framework to explore autonomous building modeling through computer control.

BIMgent generates BIM models in Vectorworks via GUI operations.

User's Input

Modeling Process

Result

Generate a one-storey office building with a large open workspace occupying most of the floor area. The layout should also include two enclosed meeting rooms, a manager’s office, a small pantry, and two restrooms.

task1 GIF

Create a two-storey rectangular residential building designed for a single family. The layout must include a living room, kitchen, bathroom, master bedroom, and one additional bedroom.

task3 GIF

Generate a one-storey octagonal building based on the hand-drawn sketch:

Prompt for task6
task6 GIF

Generate a one-floor building based on the image:

Prompt for task12
task12 GIF

Generate a building model based on a hand-drawn octagon floorplan, modifying the interior layout to include four rooms instead of three:

Prompt for task16
task16 GIF

Abstract

Existing computer-use agents primarily focus on general-purpose desktop automation tasks, with limited exploration of their application in highly specialized domains. In particular, the 3D building modeling process in the Architecture, Engineering, and Construction (AEC) sector involves open-ended design tasks and complex interaction patterns within Building Information Modeling (BIM) authoring software, a setting that current studies have yet to address thoroughly.

In this study, we propose BIMgent, an agentic framework powered by multimodal large language models (LLMs), designed to enable autonomous building model authoring via graphical user interface (GUI) operations. BIMgent automates the architectural building modeling process, including multimodal input for conceptual design, planning of software-specific workflows, and efficient execution of the authoring GUI actions.

We evaluate BIMgent on real-world building modeling tasks, including both text-based conceptual design generation and reconstruction from existing building designs. The design quality achieved by BIMgent was found to be reasonable, and its operations achieved a 32% success rate, whereas all baseline models failed to complete the tasks (0% success rate). These results demonstrate that BIMgent effectively reduces manual workload while preserving design intent, highlighting its potential for practical deployment in real-world architectural modeling scenarios.

Execution trajectories

User Input

Floorplan Design

Layer Creation

External Walls Creation

Slab Creation

Internal Walls Creation

Windows Creation

Doors Creation

Roof Creation

Method

Workflow diagram of the BIMgent framework, showing the Design, Action-Planning, and Execution layers.

Overview of the BIMgent framework. Given the multimodal design requirements provided by the user, the Design Layer first transforms them into a refined floorplan and extracts the necessary semantic and geometric information to guide the modeling process. Based on the interpreted design information and domain knowledge, the Action Planning Layer hierarchically organizes the modeling procedure and decomposes it into detailed substeps, guided by the official software documentation. These substeps are then executed through specialized action workflows in the Execution Layer, each equipped with verification mechanisms. Execution trajectories are stored in a memory module, which supports both self-reflection and cooperation among different parts of the framework.
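The three-layer pipeline described above can be sketched in code. The following is a minimal, hypothetical Python sketch, not the actual implementation: the class names, the substep labels, and the dictionary-based design spec are all illustrative assumptions, and the GUI action workflows are reduced to a placeholder.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    """Stores execution trajectories shared across the framework layers."""
    trajectory: list = field(default_factory=list)

    def log(self, step: str, result: str) -> None:
        self.trajectory.append((step, result))

class DesignLayer:
    """Turns multimodal requirements into a refined floorplan spec (stubbed)."""
    def interpret(self, requirements: dict) -> dict:
        return {
            "rooms": requirements.get("rooms", []),
            "storeys": requirements.get("storeys", 1),
        }

class ActionPlanningLayer:
    """Hierarchically decomposes the modeling procedure into substeps."""
    def plan(self, design: dict) -> list:
        steps = ["create_layers"]
        steps += [f"create_walls_storey_{i + 1}" for i in range(design["storeys"])]
        steps += ["create_slabs", "create_openings", "create_roof"]
        return steps

class ExecutionLayer:
    """Runs each substep via a GUI action workflow with verification (stubbed)."""
    def execute(self, step: str, memory: Memory) -> str:
        result = "ok"  # placeholder for GUI grounding, actions, and verification
        memory.log(step, result)
        return result

def run_bimgent(requirements: dict) -> list:
    memory = Memory()
    design = DesignLayer().interpret(requirements)
    for step in ActionPlanningLayer().plan(design):
        ExecutionLayer().execute(step, memory)
    return memory.trajectory
```

For a hypothetical two-storey request, the sketch yields one layer-creation step, one wall-creation step per storey, and then slab, opening, and roof steps, each logged to the shared memory module.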

Experiments

Mini Building Benchmark

We present the Mini Building Benchmark, consisting of 25 BIM authoring tasks. These tasks cover five input scenarios:

  1. Conceptual text descriptions of design intent, detailing the building type, room program, and one- to three-storey height.
  2. Hand-drawn sketch floor plans illustrating both regular and irregular shapes, with the specified number of floors.
  3. Unmodified floor-plan images randomly selected from the CubiCasa5K dataset.
  4. The same hand-drawn floor plans with explicit modification requests (e.g. “add a room”).
  5. The same CubiCasa5K floor plans with explicit modification instructions.
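The five scenarios above could be represented with a simple task schema. This is an illustrative sketch only: the `BenchmarkTask` dataclass, its field names, and the scenario labels are assumptions, not the benchmark's actual data format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BenchmarkTask:
    """One hypothetical entry in a Mini-Building-style benchmark."""
    task_id: str
    scenario: str  # "text", "sketch", "cubicasa", "sketch_modified", or "cubicasa_modified"
    prompt: str
    image_path: Optional[str] = None      # sketch or floor-plan image, if any
    modification: Optional[str] = None    # explicit modification request, if any

# Example tasks, one per input type (contents abbreviated):
tasks = [
    BenchmarkTask("task1", "text",
                  "Generate a one-storey office building with a large open workspace ..."),
    BenchmarkTask("task6", "sketch",
                  "Generate a one-storey octagonal building based on the hand-drawn sketch",
                  image_path="sketch6.png"),
    BenchmarkTask("task16", "sketch_modified",
                  "Generate a building model based on a hand-drawn octagon floorplan",
                  image_path="sketch16.png",
                  modification="four rooms instead of three"),
]
```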
Task distribution across the five input scenarios in the Mini Building Benchmark.
Illustration of the six design-evaluation criteria

Design Evaluation

We conducted a manual review to evaluate the generated floorplans against six design criteria, comparing our full design layer with two baselines:

  1. SVG floorplans generated by Claude 3.7
  2. Our design layer without the floorplan-interpretation module

The results indicate that our method is a promising direction: it consistently scored above 3/5 on all criteria and outperformed both baselines. However, we acknowledge that the overall design quality is not yet sufficient for professional use, and a significant gap remains before the designs meet a standard satisfactory to architects.

Operation Evaluation

Comparison of design-criterion scores across methods

On the Mini Building Benchmark, BIMgent achieves a 32% end-to-end success rate. Because overall success depends on many intermediate steps, we augmented the 25 complete modeling tasks with subtasks targeting key architectural components:

  • Design layers — 41 tasks
  • Walls — 82 tasks
  • Slabs — 41 tasks
  • Openings — 82 tasks
  • Roofs — 25 tasks

A subtask is successful only if all required elements are created and their parameters are correctly configured. BIMgent excels at component-level subtasks, achieving 86.58% success on walls and 92.68% on openings. The strongest baseline (Claude 3.7) failed every end-to-end task, owing to the heavy planning demands and extensive GUI operations required, and performed poorly on all subtasks.
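The all-or-nothing success criterion can be made precise with a small check. This is a hypothetical sketch: the element names and parameter dictionaries are illustrative, not the evaluation harness used in the paper.

```python
def subtask_success(required: dict, created: dict) -> bool:
    """A subtask succeeds only if every required element was created
    and each of its parameters matches the specification exactly."""
    for name, spec in required.items():
        actual = created.get(name)
        if actual is None:
            return False  # a required element is missing entirely
        if any(actual.get(key) != value for key, value in spec.items()):
            return False  # an element exists but is misconfigured
    return True
```

Under this criterion, a single missing wall or one wrong parameter (e.g. an incorrect height) fails the whole subtask, which is why component-level success rates are a meaningful measure of reliability.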

Our framework also attains 46.34% on layer tasks and 60% on roof tasks (vs. 0% for the baseline), and significantly outperforms the baseline on walls, slabs, and openings. These results highlight BIMgent's potential to automate tedious modeling operations in professional BIM authoring software.

Conclusion

In this study, we explore an important yet under-explored question: Is it possible to achieve automated modeling in professional design software using GUI agent technology? Although current performance limitations prevent immediate deployment of the proposed approach, this framework lays a foundation for developing domain-specific computer-use agents in high-stakes fields like AEC. The contributions of this study can be summarized as follows:

  • Compared to existing GUI agents, BIMgent demonstrates the ability to handle open-ended design tasks, bridging the gap in applying GUI agents to professional building design software.
  • BIMgent addresses the limitations of domain-specific task handling by integrating software documentation into the planning process through a retrieval-augmented strategy.
  • BIMgent significantly boosts performance by combining dynamic GUI grounding, reflective feedback, and hierarchical planning, thereby overcoming the challenges of complex GUIs and the hundreds of steps required for building modeling tasks.
  • By evaluating two stages of the framework, we show that BIMgent can complete the entire modeling process autonomously, particularly excelling in the most labor-intensive parts of the modeling workflow.
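The retrieval-augmented planning strategy mentioned above can be illustrated with a toy retriever. This is a deliberately simplified sketch under stated assumptions: real systems would use embedding-based similarity over the official software documentation, whereas here word overlap stands in as a stand-in scoring function, and the snippet texts are invented.

```python
def retrieve_docs(query: str, docs: list, k: int = 2) -> list:
    """Rank documentation snippets by word overlap with the current substep
    (a toy stand-in for embedding-based retrieval)."""
    query_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: -len(query_words & set(d.lower().split())))
    return scored[:k]

# Hypothetical documentation snippets:
docs = [
    "Creating walls with the Wall tool",
    "Inserting doors and windows into walls",
    "Roof creation from floorplan polygons",
]
```

The planner would prepend the top-ranked snippets to the prompt for each substep, grounding the generated GUI actions in software-specific instructions rather than the model's prior alone.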

Future work will focus on extending the framework to other design software to test its generalizability, and on optimizing the agent's step count and execution time to further reduce manual effort. Fine-tuning open-source models may also improve adaptability and performance. Additionally, we plan to develop automated evaluation methods and introduce a dedicated benchmark for more consistent and scalable assessment.

BibTeX

@article{deng2025bimgent,
      title={BIMgent: Towards Autonomous Building Modeling via Computer-use Agents},
      author={Deng, Zihan and Du, Changyu and Nousias, Stavros and Borrmann, Andr{\'e}},
      journal={ICML 2025 Workshop on Computer Use Agents},
      year={2025}
    }