[*]

## 1. Introduction

Graph databases have moved to the forefront of trendy technologies. There are a lot of mature companies with graph database technologies and a lot of new players seem to be arriving on the scene almost daily and for good reason; graphs are a more natural way to represent data. They excel at modeling and managing the connections between data elements and this opens up new possibilities for what we can accomplish with our data.

In this blog, we will introduce the concept of a weighted graph and weighted graph queries in the InfiniteGraph database and show how InfiniteGraph offers several unique advantages for performing weighted graph queries.

### 1.1 What Is a Graph?

A graph is made up of nodes, sometimes called vertices, and edges. The nodes of a graph typically represent a person, place, or thing and are drawn using a circle. Edges represent relationships between nodes and are drawn as a line between the two nodes.

* Figure 1: A Graph*

### 1.2 What Is InfiniteGraph?

InfiniteGraph is a massively scalable, distributable, graph database. InfiniteGraph is a schema-based graph database. InfiniteGraph has its own query language called “DO”.

**2. Simple Paths**

Consider the graph shown in Figure 2.

*Figure 2: A Simple Graph*

In this graph, all we have are nodes with labels and edges that connect the nodes. If this graph represented people and their social connections to other people, you could figure out how two people are connected. You could figure out that A and C are a separated by minimum of two degrees, and this graph is perfectly fine for this kind of question. However, if this graph represented towns and the roads that connect them, this graph can be used to find routes between towns, but it doesn’t provide enough information to find the best route since it doesn’t provide information on how far apart the towns are. This graph is insufficient for finding the shortest route from one town to another. For this, and many other kinds of problems, we need weighted graphs.

**3. ****Weighted Graphs**

A weighted graph is a graph where a weight value is associated with each edge. The weight values, called edge-weights, typically serves as some kind of cost factor that is incurred when traversing the edge. In our map example above, the cost factor might be the distance associated with each road represented by the edge. Consider the following graph in Figure 3:

*Figure 3: A Weighted Graph*

Here, the edge weight on the edge from A to B represents the distance from A to B and has a value of 4. By summing the edge weights for a particular route, we will know the total distance for that route. To go from A to B to C, we add 4 + 5 and get a total distance of 9. We can compare the total distances for the different routes between a given starting point and endpoint to determine which route is the shortest.

In many cases, the edge-weight will simply be the value of an attribute on the edge data item. In our case, we would have a Road data item with a distance attribute.

In other cases, the edge-weight might be more complicated. It can be useful to be able to build a calculated edge-weight based on a variety of edge attributes or other data features. Consider our map example again. What if the shortest path by distance had considerably lower speed limits or considerably more stoplights? If we cared about travel time instead of travel distance, we might want to compute an edge-weight based on some function of the distance, speed limits, and the number of stoplights along each edge. This graph is shown in Figure 4.

*Figure 4: Weighted Graph, Distance, Speed Limit, and Stoplights*

This is still a weighted graph, but now the definition of the edge-weight is much more complex. How do we deal with this? In many graph databases, the edge-weight must be a single attribute value on the edge data item. If the edge-weight is something more complex, like in Figure 4, we’d have to write and execute an update statement to sweep across the entire graph, perform the required computation on all of the edges, and then store the result as an attribute on each edge. Then, we’d be back in the situation where we could run our weighted graph query using that new attribute as the weight. This is terribly inefficient.

**4. Introducing the User-Defined Weight Calculator**

What if there was another way? What if, instead of having to sweep across our entire dataset updating the edges with a newly computed value, we could define an operation that could be used to compute the edge-weights on the fly at query time? And, what if the operation only needed to be executed for those edges that were actually touched by the navigational query? That would save an enormous amount of computing time. We would no longer have to compute the edge-weight on edges that were going to be trimmed during path processing.

This is exactly what InfiniteGraph does. InfiniteGraph’s DO query language allows the user to define a weight calculator operator and then use it in navigational path queries.

Let’s start with a simple example, using Figure 3, and find the shortest path from A to H.

First, we need to define our weight calculator as follows:

```
CREATE WEIGHT CALCULATOR routeDistance {
minimum: 0,
default: 0,
edges: {
()-[r:Road]->(): r.distance
}
};
```

The weight calculator allows the user to specify a minimum weight, a default weight, and a set of edge patterns and their associated edge-weight functions. When a navigational query is executed using a weight calculator, the weight calculator is invoked only on the edges along candidate paths.

In this example, there is one pattern, `()-[r:Road]->()`

, in the edges block that matches all Road edges and the edge-weight function states that the weight of edges of type Road, represented by the variable “r”, shall be defined as r.distance, i.e., the value of the distance attribute on the Road edge.

The edges block within the weight calculator can be used to define several different edge-weight calculators based on the type of the “from” node, “to” node, edge type, or values on them. If our schema contained definitions for Path, Road, and Highway, we might define a route distance calculator as follows with different multipliers based on the type of the edge:

```
CREATE WEIGHT CALCULATOR routeDistance {
minimum: 0,
default: 0,
edges: {
()-[p:Path]->() : p.cost * 2.0,
()-[r:Road]->() : r.cost * 1.0,
()-[h:Highway]->(): h.cost * 0.5
}
};
```

We can now use the route distance weight calculator to execute a query to find the shortest path. The query is as follows:

```
MATCH m = LIGHTEST routeDistance
((:Town {name == ‘TownA’})-[*]->(:Town {name == ‘TownH’}))
RETURN m;
```

Here, we want to find the path from **TownA **to **TownH **through any number of edges (“-[*]->”) and we want the path with the lightest weight based on the **route distance** weight calculator.

When we execute the query, we get the following results:

```
{
_Projection
{
m:WALK
{
length:2, <—- The number of edges in the path.
weight:11.0000, <—- The weight of the lightest path.
edges:
{
EDGE
{
weight:3.00000,
from:3-3-1-4, <—- TownA
fromClass:’Town’,
attribute:’to’,
edgeData:3-3-1-35,
edgeDataClass:’Road’,
to:3-3-1-10, <—- TownD
toClass:’Town’
},
EDGE
{
weight:8.00000,
from:3-3-1-10, <—- TownD
fromClass:’Town’,
attribute:’to’,
edgeData:3-3-1-41,
edgeDataClass:’Road’,
to:3-3-1-18, <—- TownH
toClass:’Town’
}
}
}
}
}
```

The following image shows the object IDs of the nodes in the database:

*Figure 5: Weighted Graph Showing Object IDs*

**5. ****A More Complex Example**

How complex can a weight calculator be? The answer to that question is *very.*

Graphs are great for keeping track of social networks including who emailed who or who called who and when. In this example, we will build a model of telephone calls between friends and then demonstrate the real power of the weight calculator.

### 5.1 Our Data Model

The thing about building a graph of interactions like emails, text messages, or telephone calls is that you don’t want to create a new edge between two people for every interaction. What you want to do is to create an edge that represents the fact that two people have interacted and then adds a detailed record to that edge to document each interaction.

The graph representation looks like this:

*Figure 6: CallDetails on a Call*

In DO, the schema definition looks like this:

```
UPDATE SCHEMA {
CREATE CLASS Phone {
number : STRING,
callsIn : List {
Element: Reference {
EdgeClass: Call,
EdgeAttribute: from
},
CollectionTypeName: TreeListOfReferences
},
callsOut : List {
Element: Reference {
EdgeClass: Call,
EdgeAttribute: to
},
CollectionTypeName: TreeListOfReferences
}
}
CREATE CLASS CallDetail {
start : DateTime,
end : DateTime,
duration : REAL { Storage: B32 }
}
CREATE CLASS Call {
from : Reference { referenced: Phone, Inverse: callsOut },
to : Reference { referenced: Phone, Inverse: callsIn },
callDetails : List {
Element: Reference {
referenced: CallDetail
},
CollectionTypeName: TreeListOfReferences
}
}
};
```

Note that in the Call class, there is an attribute, call details; that is a list of references to CallDetail objects.

The graph of our data looks like this:

*Figure 7: A Call Graph with CallDetails*

In our example, there are three Call Detail objects hanging off of each edge. The AB edge has three CallDetail objects in the call details list and those three CallDetail objects have durations of 18.5, 22.3, and 26.0.

Let’s define a weight calculator that computes the call volume for each edge as the sum of the CallDetail durations:

```
CREATE WEIGHT CALCULATOR callVolume {
minimum: 0,
default: 0,
edges: {
()-[c:Call]->(): SUM(c.callDetails.duration)
}
};
```

In this example, “c” is the current Call edge object. The weight of the edge becomes the sum of the CallDetail duration values in the call details list in “c”. For the AB edge, this would be 18.5 + 22.3 + 26.0 = 66.8.

We can then use our call volume weight calculator to find the path from A to C with the lightest call volume with the following query:

```
MATCH m = LIGHTEST callVolume
((:Phone {number == ‘555-1111’})
-[*..10]->
(:Phone {number == ‘555-3333’}))
RETURN m;
```

The resulting path with its calculated weight is shown below:

```
{
_Projection
{
m:WALK
{
length:2, <— The length of the lightest path.
weight:31.0000, <— The weight of the lightest path.
edges:
{
EDGE
{
weight:16.2000,
from:3-3-1-4, <— Phone A (555-1111)
fromClass:’Phone’,
attribute:’callsOut’,
edgeData:3-3-1-32,
edgeDataClass:’Call’,
to:3-3-1-10, <— Phone D (555-4444)
toClass:’Phone’
},
EDGE
{
weight:14.8000,
from:3-3-1-10, <— Phone D (555-4444)
fromClass:’Phone’,
attribute:’callsOut’,
edgeData:3-3-1-40,
edgeDataClass:’Call’,
to:3-3-1-8, <— Phone C (555-3333)
toClass:’Phone’
}
}
}
}
}
```

As you can see, the lightest path based on the call volume weight calculator is the path from A to D to C, with a length of 2 and weight of 31.0000.

This is just the beginning. The function definitions inside the edge-weight calculator can use any of the available DO operators to build up a complex operations to calculate the edge-weight. And, it is all done without updating the data in the graph.

**6. ****Advantages**

There are many advantages to the InfiniteGraph user-defined weight calculator:

- The edge-weight can be taken directly from the value of an edge attribute.
- The edge-weight can be computed from one or more edge attributes.
- The edge-weight can be computed from a wide range of data associated with the edge as was shown in the CallDetail example.
- Weight calculators are easy to define and modify and support tying out different scenarios quickly.
- The cost of executing the weight calculator is only incurred on those edges that are actually examined by the navigational query.
- InfiniteGraph filters paths on the fly, not after all potential paths have been found. This can lead to a significant performance advantage over other graph database products.

**7. ****Conclusion**

Nearly all graph databases have some support for weighted queries. However, most graph databases use implementations that limit the weight of an edge to the value of an attribute in the edge data. InfiniteGraph’s weight calculator provides a powerful yet simple and effective way to define an operator that can be used to compute both simple and complex edge weights on the fly.

[*]

[*]Source link