Building veribim-kg: Weaving IFC, Knowledge Graphs, and LLMs for Smarter Construction Compliance

Vasil Nedev

As developers, we often see industries buried in data but starved for insight. The construction sector is a prime example. It's an industry grappling with the collision of legacy processes—spreadsheets, siloed documents, and dense regulations—with modern digital twins like BIM (Building Information Modelling).

veribim-kg is a proposed architectural blueprint to tackle this. The idea is a Knowledge Layer that answers complex compliance questions by unifying three technologies: IFC.js for the 3D model, Neo4j for the relationships, and LLMs for the interface.

Let's break down the stack.

1. IFC.js: Parsing the Digital Twin with Open Standards

In construction, the universal language for BIM data is Industry Foundation Classes (IFC), an open, neutral format. Rather than being locked into a specific vendor's API (like Revit), we can parse any IFC file to build our data universe.

This is where IFC.js from That Open Company comes in. It's an open-source JavaScript library, usable in the browser and in Node.js, that allows us to:

  • Parse and Traverse IFC Hierarchies: We can extract not just geometric data, but more importantly, the rich semantic information: IfcWall, IfcDoor, their properties, and their spatial relationships (IfcRelContainedInSpatialStructure).
  • Build a Web-Based Viewer: It provides the components to create a 3D viewer for visualisation and interaction, making the model the central navigational element of the application.
  • Access the Source of Truth: IFC is the canonical data model. By building on it, we ensure our system is interoperable and not tied to a single design software.
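To make "semantic information" concrete, here is a minimal sketch of what sits inside an IFC file. It is a toy text scan written in Python for illustration only: it is not the IFC.js API, the sample lines and GlobalIds are made up, and the `scan` helper is hypothetical. A real pipeline would rely on a proper parser rather than regular expressions.

```python
import re

# Made-up lines in IFC's STEP text encoding (IDs and names are illustrative).
SAMPLE_IFC = """\
#50=IFCBUILDINGSTOREY('0aStoreyGuid0000000001',#5,'Level 1',$,$,#60,$,$,.ELEMENT.,0.);
#101=IFCWALL('0aWallGuid000000000001',#5,'Wall-001',$,$,#90,#95,$,$);
#102=IFCDOOR('0aDoorGuid000000000001',#5,'Door-001',$,$,#91,#96,$,2.1,0.9,$,$,$);
#200=IFCRELCONTAINEDINSPATIALSTRUCTURE('0aRelGuid0000000000001',#5,$,$,(#101,#102),#50);
"""

# Captures: STEP id, entity type, GlobalId, and Name (None when the file has "$").
ENTITY_RE = re.compile(
    r"^#(\d+)\s*=\s*(IFC[A-Z0-9]+)\('([^']*)',[^,]*,(?:'([^']*)'|\$)"
)
# Captures the list of contained elements and the spatial structure holding them.
CONTAINS_RE = re.compile(r"\(((?:#\d+,?)+)\),#(\d+)\);$")


def scan(text):
    """Return (entities by STEP id, element id -> containing structure id)."""
    entities = {}
    contained_in = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = ENTITY_RE.match(line)
        if not match:
            continue
        step_id, ifc_type, global_id, name = match.groups()
        if ifc_type == "IFCRELCONTAINEDINSPATIALSTRUCTURE":
            rel = CONTAINS_RE.search(line)
            if rel:
                for ref in rel.group(1).split(","):
                    contained_in[int(ref.lstrip("#"))] = int(rel.group(2))
        else:
            entities[int(step_id)] = {
                "type": ifc_type,
                "globalId": global_id,
                "name": name,
            }
    return entities, contained_in


entities, contained_in = scan(SAMPLE_IFC)
```

Here the wall and the door both resolve to the storey with STEP id 50, which is the kind of spatial relationship the graph later stores as an edge.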

In veribim-kg, IFC.js acts as the data ingestion engine. It processes the IFC file to create a stream of structured entities (walls, doors, beams) and their basic properties, which are then fed into the heart of the system: the Knowledge Graph.
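As a rough sketch of that hand-off, the snippet below turns one parsed entity into a parameterised Cypher MERGE statement. It is written in Python purely for illustration; the `ParsedEntity` record, the `to_merge` helper, and the `globalId` property name are assumptions, not part of veribim-kg or IFC.js.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, Tuple

# The label is interpolated into the query text, so restrict it to IFC-style names.
LABEL_RE = re.compile(r"^Ifc[A-Za-z0-9]+$")


@dataclass
class ParsedEntity:
    """Hypothetical record emitted by the ingestion step."""
    global_id: str
    ifc_type: str  # e.g. "IfcWall"
    name: str = ""
    properties: Dict[str, str] = field(default_factory=dict)


def to_merge(entity: ParsedEntity) -> Tuple[str, dict]:
    """Build an idempotent upsert: one node per GlobalId, values passed as parameters."""
    if not LABEL_RE.match(entity.ifc_type):
        raise ValueError(f"unexpected IFC type: {entity.ifc_type!r}")
    query = (
        f"MERGE (n:{entity.ifc_type} {{globalId: $globalId}}) "
        "SET n.name = $name, n += $props"
    )
    params = {
        "globalId": entity.global_id,
        "name": entity.name,
        "props": entity.properties,
    }
    return query, params


wall = ParsedEntity(
    "0aWallGuid000000000001", "IfcWall", "Wall-001", {"FireRating": "REI 60"}
)
query, params = to_merge(wall)
```

Using MERGE keyed on the GlobalId means re-ingesting a revised IFC file would update existing nodes instead of duplicating them, which is one reasonable design choice among several.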

2. Neo4j: Weaving the Web of Compliance with a Knowledge Graph

A list of building elements and a mountain of documents are useless without context. The real magic lies in the relationships. This is the core of the veribim-kg concept: a Neo4j-powered Knowledge Graph.

We use Neo4j to create a dynamic, connected map of the entire project ecosystem. The parsed IFC entities become nodes, but so do other critical pieces of information:

  • Document Nodes: Entities like Regulation, British Standard, or Client Brief.
  • Requirement and Guidance Nodes: Elements like Requirement, Guidance, or Information.
  • Evidence Nodes: Records like Certificate, Photograph, or Inspection Report.

The power is in the connections, expressed as Cypher relationships:

// Link a design element to a regulation
(:IfcWall)-[:MUST_COMPLY_WITH]->(:BuildingRegulation {code: "Part B"})

// Connect a requirement to the inspection plan that verifies it
(:ClientRequirement)-[:VERIFIED_BY]->(:InspectionTestPlan)

// Attach photographic evidence to the constructed element
(:IfcSlab)-[:HAS_EVIDENCE]->(:Photograph {url: "...", date: "..."})

This graph becomes the project's compliance knowledge base, answering multi-hop traversal questions that would be cumbersome to express as SQL joins across normalised tables.
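One such question is "which elements must comply with a regulation but have no evidence attached?". The sketch below answers it over a tiny in-memory stand-in for the graph, using the relationship types shown above. The `TinyGraph` class, the node ids, and the `globalId` property in the Cypher string are illustrative assumptions; in the real system the query would run against Neo4j.

```python
from collections import defaultdict


class TinyGraph:
    """In-memory stand-in for the knowledge graph (illustration only)."""

    def __init__(self):
        self.labels = {}                # node id -> label
        self.edges = defaultdict(set)   # (source id, relationship type) -> target ids

    def add_node(self, node_id, label):
        self.labels[node_id] = label

    def relate(self, source, rel_type, target):
        self.edges[(source, rel_type)].add(target)

    def targets(self, source, rel_type):
        return self.edges.get((source, rel_type), set())


def missing_evidence(graph, regulation_id):
    """Elements bound to the regulation that have no HAS_EVIDENCE edge."""
    return sorted(
        node
        for node in graph.labels
        if regulation_id in graph.targets(node, "MUST_COMPLY_WITH")
        and not graph.targets(node, "HAS_EVIDENCE")
    )


# The same question as a Cypher query (property names are assumptions).
EQUIVALENT_CYPHER = """
MATCH (e)-[:MUST_COMPLY_WITH]->(:BuildingRegulation {code: $code})
WHERE NOT (e)-[:HAS_EVIDENCE]->()
RETURN e.globalId
"""

g = TinyGraph()
g.add_node("part-b", "BuildingRegulation")
g.add_node("wall-1", "IfcWall")
g.add_node("wall-2", "IfcWall")
g.add_node("photo-1", "Photograph")
g.relate("wall-1", "MUST_COMPLY_WITH", "part-b")
g.relate("wall-2", "MUST_COMPLY_WITH", "part-b")
g.relate("wall-1", "HAS_EVIDENCE", "photo-1")

gaps = missing_evidence(g, "part-b")
```

In this example only wall-2 is reported, since wall-1 already has a photograph attached.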

3. LLM + RAG: The Natural Language Interface to the Graph

With the Knowledge Graph built, we have a rich, connected dataset. But how do we make it accessible to non-technical project managers and engineers? We use a Large Language Model (LLM) with a Retrieval-Augmented Generation (RAG) pattern.

The key here is that we don't fine-tune the LLM on specific project data. Instead, we use it as a supremely capable reasoning engine and natural language interface. Here's the RAG workflow for veribim-kg:

  1. Query Reception: A user asks a natural language question: "What evidence do we have for the fire rating of this wall?"
  2. Intent & Entity Recognition (LLM): The LLM parses the user's intent and identifies key entities (fire rating, wall). It converts this into a structured query or a keyword search.
  3. Graph Retrieval (Cypher): The system uses this structured intent to query the Neo4j graph. It might find the specific IfcWall node, traverse to the relevant Regulation node for fire safety, and then retrieve the Evidence nodes attached to that wall.
  4. Context Augmentation: The results from the graph—specific regulation clauses, wall properties, and links to photographic evidence—are packaged into a context window.
  5. Informed Generation (LLM): This context is fed back to the LLM with the original question, instructing it to generate a coherent, natural language answer based solely on the provided context. This reduces the risk of hallucination and keeps every claim traceable to project data.
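Steps 2 and 3 of the workflow above can be sketched as follows. The LLM call is replaced by a keyword stand-in so the example runs on its own, and every name here (`Intent`, `extract_intent`, `build_query`, the topic mapping, the `globalId` and `url` properties) is hypothetical. The sketch also assumes one particular design: the model only fills parameters for a fixed query template instead of writing free-form Cypher.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Intent:
    element_id: str        # the wall the user has selected in the viewer
    regulation_code: str   # the regulation the question is about


# Illustrative mapping from a topic keyword to a regulation code.
TOPIC_TO_REGULATION = {"fire": "Part B"}


def extract_intent(question: str, selected_element_id: str) -> Optional[Intent]:
    """Stand-in for the LLM in step 2: map the question to a structured intent."""
    lowered = question.lower()
    for topic, code in TOPIC_TO_REGULATION.items():
        if topic in lowered:
            return Intent(selected_element_id, code)
    return None


# Fixed template for step 3; user input only ever reaches it as parameters.
EVIDENCE_QUERY = (
    "MATCH (w:IfcWall {globalId: $elementId})"
    "-[:MUST_COMPLY_WITH]->(r:BuildingRegulation {code: $code}) "
    "OPTIONAL MATCH (w)-[:HAS_EVIDENCE]->(ev) "
    "RETURN w.globalId AS wall, r.code AS regulation, collect(ev.url) AS evidence"
)


def build_query(intent: Intent):
    """Return the Cypher text and its parameters for the graph retrieval step."""
    params = {"elementId": intent.element_id, "code": intent.regulation_code}
    return EVIDENCE_QUERY, params


question = "What evidence do we have for the fire rating of this wall?"
intent = extract_intent(question, "wall-1")
cypher, params = build_query(intent)
```

Keeping the Cypher as a template limits what a misread question can do to the database, at the cost of needing one template per supported question type.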

This RAG pipeline effectively gives the LLM an up-to-date, queryable memory of your specific project's compliance state, stored within Neo4j.
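Steps 4 and 5 come down to packaging the retrieved rows into a prompt that tells the model to stay within them. A minimal sketch, assuming hypothetical row keys (`wall`, `regulation`, `evidence`) and leaving out the actual LLM call, might look like this:

```python
def format_context(rows):
    """Render graph rows as numbered lines the model can cite."""
    lines = []
    for index, row in enumerate(rows, start=1):
        evidence = ", ".join(row["evidence"]) or "none recorded"
        lines.append(
            f"[{index}] wall={row['wall']} "
            f"regulation={row['regulation']} evidence={evidence}"
        )
    return "\n".join(lines)


def build_prompt(question, rows):
    """Combine the instruction, the retrieved context, and the user's question."""
    context = format_context(rows) if rows else "(no matching records)"
    return (
        "Answer using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


rows = [{"wall": "wall-1", "regulation": "Part B", "evidence": ["photo-1.jpg"]}]
prompt = build_prompt("What evidence do we have for the fire rating of this wall?", rows)
```

Numbering the context lines makes it possible to ask the model to cite them, which helps a reviewer trace each statement in the answer back to a graph record.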

Architectural Overview: The Data Flow

The conceptual architecture of veribim-kg looks like this:

[IFC File] -> [IFC.js Parser (Node.js)] -> [Neo4j Knowledge Graph]
                                                              |
[User Query] -> [LLM Orchestrator] -> [Cypher Query] -> [Graph Retrieval]
                                                              |
[Final Answer] <- [LLM Synthesis] <- [Context Augmentation]

Conclusion: A Call for Collaboration

veribim-kg is currently a functional concept and a work-in-progress. It represents a compelling use case for a modern tech stack solving a deeply entrenched industrial problem. It leverages IFC.js for open standard data access, Neo4j for contextual intelligence, and LLMs for human-centric interaction.

I'm actively developing this idea and believe it has the potential to provide a fundamentally new tool for construction quality management. If you're a developer fascinated by graph technologies, the AEC (Architecture, Engineering, and Construction) industry, or building complex RAG systems, I'd love to connect and explore this space together.

The code is in its early stages, but the vision is clear. Let's build it on GitHub.