Understanding Knowledge Graphs

Graphs store information as nodes (things) and edges (connections between things). Simple idea, but it changes what questions you can ask your data.

Where graphs shine

A best practice for heat pump adoption references a measure: building insulation. That measure has a prerequisite: an energy audit. It also depends on a funding program that expires in 2027.

These connections exist, but relational databases struggle to model them. SQL was designed in the 1970s for tabular data, not for inheritance or deeply interconnected structures. Every hop along a chain of relationships costs another join, so multi-hop queries get expensive, and consistency across those relationships often has to be enforced by hand. Plain text loses the connections entirely.

The difference isn't just storage; it's what you can ask. Search gives you documents that might be relevant. A knowledge graph gives you answers:

What are the prerequisites for heat pump adoption?
Which measures depend on building insulation?
What funding programs expire this quarter?
Which projects are affected if program X ends?
What's missing before we can start measure Y?
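To make this concrete, here is a minimal Python sketch of the heat pump example stored as an edge list and queried with a breadth-first traversal. The edge labels ("references", "requires", "depends_on"), the node names, and the "expires" attribute are hypothetical choices for illustration, not the API of any particular graph database.

```python
from collections import defaultdict, deque

# Hypothetical edge list mirroring the heat pump example above.
EDGES = [
    ("heat pump adoption", "references", "building insulation"),
    ("building insulation", "requires", "energy audit"),
    ("building insulation", "depends_on", "program X"),
]

# Hypothetical node attributes: the funding program and its expiry year.
ATTRS = {"program X": {"type": "funding_program", "expires": 2027}}

# Index edges in both directions so the same traversal can answer
# "what does A rely on?" and "what relies on A?".
out_edges = defaultdict(list)
in_edges = defaultdict(list)
for subj, label, obj in EDGES:
    out_edges[subj].append((label, obj))
    in_edges[obj].append((label, subj))

DEPENDENCY_LABELS = {"references", "requires", "depends_on"}


def reachable(start, edges, labels=DEPENDENCY_LABELS):
    """Breadth-first search following only edges whose label is in `labels`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for label, nxt in edges[node]:
            if label in labels and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


# What does heat pump adoption (transitively) rely on?
prerequisites = reachable("heat pump adoption", out_edges)

# What is affected if program X ends?
affected = reachable("program X", in_edges)

# Which funding programs expire in 2027?
expiring = [n for n, a in ATTRS.items()
            if a["type"] == "funding_program" and a["expires"] == 2027]

print(sorted(prerequisites))  # ['building insulation', 'energy audit', 'program X']
print(sorted(affected))       # ['building insulation', 'heat pump adoption']
print(expiring)               # ['program X']
```

The prerequisite and impact questions are the same traversal run in opposite directions over the same edges; no new join has to be written for each additional hop.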

Some domains benefit so evidently from graphs that the question is not if but when:

Drug discovery, where molecules, targets, pathways, and side effects are all interconnected
Content management, where articles refer to concepts, people, and processes
Process knowledge, where best practices, guidelines, and how-tos contain implicit logic

The misconception

Most data treated as unstructured comes from structured sources:

Research papers have methodology, results, references
Reports have key figures, time periods, entities
Analytics have dimensions, metrics, segments
Interviews have questions, answers, speakers

We pour structured information into text, store it in documents, and then try to get the structure back out with Retrieval-Augmented Generation (RAG). When LLMs hallucinate, it's often because they lack the structured context they need to reason accurately.

Structured capture addresses the root cause, not just the symptom.

The spectrum

Not all information can be structured meaningfully, and sometimes the effort is disproportionate to the benefit.

For example, constructing a terrace as a sustainable land management practice involves information about the slope, orientation, and length, among other construction details. You could model each detail as a separate attribute, or as an entity linked to a step in the construction process. But if terrace construction is just one of 200 concepts, a description stored in a string attribute of that step does the job. If, in contrast, your database contains only terrace construction descriptions, or is meant to drive robotics, the level of detail should increase.

The granularity depends on requirements: What needs to be queryable? What only needs to be readable?

Granularity on demand

Full structure: entities, relations, attributes, and constraints. Queryable and machine-readable.

Terrace
→ slope: 15°
→ orientation: South-West
→ length: 12 m

Description as attribute: free text stored as an attribute of an entity. Readable and contextual.

Terrace
→ description: "The terrace is built at a 15° slope, oriented south-west, approx. 12 m long. Particularly suited for orchards due to optimal sun exposure."
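The two ends of this spectrum can be sketched in Python as two hypothetical record types for the terrace example. The class names, field names, and the second terrace ("terrace B") are invented for illustration. What matters is which questions each shape can answer: the structured version supports a numeric filter on slope, while the described version only supports text matching.

```python
from dataclasses import dataclass


@dataclass
class StructuredTerrace:
    """Full structure: every detail is a typed, queryable attribute."""
    name: str
    slope_deg: float
    orientation: str
    length_m: float


@dataclass
class DescribedTerrace:
    """Description as attribute: the details live in one free-text field."""
    name: str
    description: str


structured = [
    StructuredTerrace("terrace A", 15.0, "South-West", 12.0),
    StructuredTerrace("terrace B", 8.0, "North", 30.0),  # hypothetical second terrace
]

described = [
    DescribedTerrace(
        "terrace A",
        "The terrace is built at a 15° slope, oriented south-west, approx. 12m long. "
        "Particularly suited for orchards due to optimal sun exposure.",
    ),
]

# Structured: a precise numeric query is a one-liner.
steep = [t.name for t in structured if t.slope_deg > 10]

# Described: only keyword matching is available; "slope > 10°" would
# require parsing the text first.
south_west = [t.name for t in described if "south-west" in t.description.lower()]

print(steep)       # ['terrace A']
print(south_west)  # ['terrace A']
```

If slope is never filtered on, the second shape is cheaper to capture and still readable; if it is, only the first shape answers the question reliably.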

Modeling knowledge with hypergraphs

Simple triples (subject, predicate, object) do not do justice to reality.

Measure X was implemented by Organization Y in Project Z with Budget W and led to Result V.

Traditional graph databases still operate on low-level primitives: nodes and edges. A relationship is always binary: it connects two things, nothing more. But reality is messier. A project involves a team, a budget, a deadline, and a client. A measure was implemented by an organization, in a project, with funding, and led to a result. Modeling this with binary edges requires artificial intermediate nodes (a technique called reification), and the structure you wanted to capture gets buried in workarounds.
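A minimal Python sketch of reification, using the sentence about Measure X as the fact to encode. The intermediate node name "implementation_1" and the predicate names are hypothetical. Because every triple is binary, the five-way fact is spread across five triples hanging off an artificial node, and even a simple question needs an extra hop through that node.

```python
# One n-ary fact, encoded as binary (subject, predicate, object) triples.
# "implementation_1" is an artificial node that exists only to tie them together.
TRIPLES = [
    ("implementation_1", "measure", "Measure X"),
    ("implementation_1", "implemented_by", "Organization Y"),
    ("implementation_1", "project", "Project Z"),
    ("implementation_1", "budget", "Budget W"),
    ("implementation_1", "result", "Result V"),
]


def subjects(triples, predicate, obj):
    """All subjects linked to `obj` via `predicate`."""
    return {s for s, p, o in triples if p == predicate and o == obj}


def implementers(triples, measure, project):
    """Who implemented `measure` in `project`?"""
    # Step 1: find the artificial nodes tied to both the measure and the project.
    hubs = subjects(triples, "measure", measure) & subjects(triples, "project", project)
    # Step 2: hop from those nodes to the organization.
    return {o for s, p, o in triples if s in hubs and p == "implemented_by"}


print(implementers(TRIPLES, "Measure X", "Project Z"))  # {'Organization Y'}
```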

Hypergraphs solve this. Relations can connect any number of participants directly. They can carry their own attributes. And they can play roles in other relations: a contract isn't just a link between parties; it can itself be referenced by an amendment, a dispute, or a renewal. No normalization. No joins. No artificial nodes. Just the structure as it actually exists.
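For comparison, a minimal Python sketch of the hypergraph view, assuming a hypothetical Relation type (not any particular database's API): a relation maps role names to participants, carries its own attributes, and can itself be a participant in another relation. "Client Q" and the attribute values are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(eq=False)  # identity equality: a relation only equals itself
class Relation:
    kind: str
    roles: Dict[str, Any]  # role name -> participant (an entity name or another Relation)
    attrs: Dict[str, Any] = field(default_factory=dict)


# The n-ary fact from the text as a single relation; no artificial node needed.
implementation = Relation("implementation", {
    "measure": "Measure X",
    "implementer": "Organization Y",
    "project": "Project Z",
    "budget": "Budget W",
    "result": "Result V",
})

# A relation with its own attributes, referenced by another relation.
contract = Relation("contract", {"party": "Organization Y", "counterparty": "Client Q"},
                    {"status": "active"})
amendment = Relation("amendment", {"amends": contract}, {"note": "extends term"})

RELATIONS: List[Relation] = [implementation, contract, amendment]


def involving(relations, participant):
    """Every relation in which `participant` plays any role."""
    return [r for r in relations if participant in r.roles.values()]


print([r.kind for r in involving(RELATIONS, "Organization Y")])  # ['implementation', 'contract']
print([r.kind for r in involving(RELATIONS, contract)])          # ['amendment']
```

Here "which relations involve Organization Y?" is a single scan over role values, and the amendment points at the contract itself rather than at a stand-in node.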