Intro to Graph Databases: More than just GraphQL

Developer with a passion for elegant solutions and design.

Nowadays, you’ll see GraphQL used in a lot of production and high-load projects, and for good reason. The technology provides a simple and elegant method of implementing APIs.

I’m sure there are a lot of articles giving a great overview of GraphQL APIs and comparing them to REST/SOAP and/or RPC (continue the list with your personal favorite). But, here, we’re talking about graph databases, so let’s start with the basics.

When the conversation lands on graph databases, most people say:

Graph databases? Yeah, we use GraphQL.

To some degree they’re correct, but let’s dig deeper to understand what GraphQL and graph databases have in common and how they differ.


Before we go any further, let’s set up a foundation. What’s a graph? In general, it’s simple: a graph treats the relationships between entities (elements, nodes) as being just as important as the entities themselves.

Simple graph

Graphs can be designed in different shapes and forms. There are standards which you might like to follow: some allow you to put properties inside of an entity (e.g., property graphs), others insist on making every property an entity and connecting them with relationships (e.g., RDF or other semantic graphs).

Property graph vs RDF

Graph databases allow you to store data in such representations.
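To make the difference between the two styles a bit more concrete, here is a minimal Python sketch. The hotel data, the dictionary layout, and the `to_triples` helper are all made up for illustration. The triples are only RDF-like: real RDF would use IRIs and a proper vocabulary.

```python
# Hypothetical hotel data, used only to contrast the two modelling styles.

# Property-graph style: properties live inside the nodes themselves.
property_graph = {
    "nodes": {
        "room701": {"label": "Room", "name": "701", "beds": 2},
        "hotel1": {"label": "Hotel", "name": "Grand"},
    },
    "edges": [
        {"from": "hotel1", "to": "room701", "type": "HAS_ROOM"},
    ],
}


def to_triples(graph):
    """Flatten the property graph into (subject, predicate, object) triples.

    In the triple style every property becomes its own statement,
    exactly like a relationship between two nodes.
    """
    triples = []
    for node_id, props in graph["nodes"].items():
        for key, value in props.items():
            triples.append((node_id, key, value))
    for edge in graph["edges"]:
        triples.append((edge["from"], edge["type"], edge["to"]))
    return triples


triples = to_triples(property_graph)
for triple in triples:
    print(triple)
```

The same facts are stored either way; what changes is whether a property is an attribute of a node or a statement of its own.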

Relational databases – we’ve been doing it for ages

I can hear a valid question: what’s the difference from classic SQL? We’ve been doing relationships there for ages and they seem to work fine. We have primary and foreign keys, and even triggers, which enforce these relations between elements in different tables. In case we need many-to-many relationships (a well-known case), just create a join/pivot/many-to-many table and you’re fine.

Relational many-to-many schema

Well, yes and no. It’s true that SQL allows us to represent relationships, but the representation is very artificial, maybe even imaginary. All you do is say that this column in a table is used to find a row in another table. The word “find” here carries the main idea: there is no direct link or connection. The PropertyID of an entity is used to locate it in another table. If the table has good indexing, it’s very fast, but the idea stays the same. It’s almost as simple as:

Find me all ROWs in ROOMs table where id is 1

In short, we can say that there are no real connections—the database just gives us a direction to potentially find elements.
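Here is a small sketch of that “find by ID” idea using Python’s built-in `sqlite3` module. The table names, columns, and sample rows are assumptions made up for this example; they are not the schema from the picture above.

```python
import sqlite3

# In-memory database with a hypothetical many-to-many schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE rooms (id INTEGER PRIMARY KEY, name TEXT);
    -- The join table stores pairs of ids, not links to actual rows.
    CREATE TABLE bookings (
        customer_id INTEGER REFERENCES customers(id),
        room_id INTEGER REFERENCES rooms(id)
    );
    INSERT INTO customers VALUES (1, 'Alice'), (2, 'Bob');
    INSERT INTO rooms VALUES (1, '701'), (2, '702');
    INSERT INTO bookings VALUES (1, 1), (2, 1), (2, 2);
""")


def guests_of_room(room_name):
    """Look up guests by matching ids across three tables."""
    rows = conn.execute(
        """
        SELECT c.name
        FROM customers c
        JOIN bookings b ON b.customer_id = c.id
        JOIN rooms r ON r.id = b.room_id
        WHERE r.name = ?
        ORDER BY c.name
        """,
        (room_name,),
    ).fetchall()
    return [name for (name,) in rows]


guests = guests_of_room("701")
print(guests)
```

Every hop from a room to its guests is another search by ID, which is exactly the “direction to potentially find elements” described above.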

This comes with interesting issues and limitations, especially when we start to use ORM (Object Relational Mapping) frameworks to abstract away the database and operate on entities and the relations between them. Just when you get very comfortable with a tidy, concise codebase, it hits hard with performance issues (a reminder that joins are going crazy under the hood).


Now back to GraphQL. As we now know, it’s a graph query language, but we use it with non-graph databases such as relational (SQL) ones, which doesn’t quite click. The following schema shows a common way to implement GraphQL in many projects:

Common GraphQL schema

As you can see, the query is converted from GraphQL to SQL, then the database is queried. This is a very popular use case. It’s incredible that GraphQL allows us to use graph queries on clients without switching to a graph database in our stack. This massively lowers the entry threshold for the technology and allows us to gradually implement GraphQL in our production projects.

Yet working with relational databases this way could result in performance issues and limitations. We simply don’t use the power of graphs (more on this a bit later).

Graph GraphQL schema

Now, using GraphQL with graph databases is a much more natural experience. Data is requested and processed in the graph way. In fact, it’s possible to configure the API gateway to allow requests to be directly routed to the database without interference with business logic (e.g., public API).

Graph databases

Hooray! It’s only 3.7k characters into the article and we’re already talking about graph databases. The main idea of graph databases is that relationships (edges) are just as important as entities (nodes). It doesn’t sound like a big difference, but boy is it ever.

It’s simple: there are real relationships in place. This provides an interesting performance effect. When one object has a “connection” to another, you can think about it as a pointer in C. It’s fast, reliable, and you get directly to the element you need.
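The pointer analogy can be sketched in a few lines of Python. This is only an in-memory illustration with made-up names (`Node`, `connect`, the hotel data); it says nothing about how a particular graph database stores data on disk.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    name: str
    # Direct references to neighbouring nodes, grouped by relationship type.
    links: dict = field(default_factory=dict)

    def connect(self, rel_type, other):
        """Store a reference to the other node itself, not its id."""
        self.links.setdefault(rel_type, []).append(other)


hotel = Node("Hotel", "Grand")
room = Node("Room", "701")
guest = Node("Customer", "Alice")

hotel.connect("HAS_ROOM", room)
room.connect("BOOKED_BY", guest)

# Two hops by following references; no table is searched for a matching id.
booked_by = hotel.links["HAS_ROOM"][0].links["BOOKED_BY"][0]
print(booked_by.name)
```

Compare this with the SQL case: there, each hop means taking an ID and searching for it somewhere else.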

In SQL we’d often cheat to achieve a similar effect by copying the necessary information into several tables so we wouldn’t need to join them. The price is data duplication and denormalization.

In fact, sometimes relationships are way more important than properties of entities. Think about social networks or page ranks.

Graph thinking

As you already noticed, graph database design is quite different from relational. I would argue it’s way more natural. Just think about it.

Whether you need to discuss a complex system with your peers or you’re simply trying to make sense of one, there’s a great chance you’d start by drawing the entities of the system and the relationships between them. It could be on a piece of paper, a whiteboard, or even in a fancy UML suite. Guess what you’re doing? Constructing a graph.

Notice how you never think about foreign keys and join tables, unless there is a case of professional deformation or a database design session? That’s the beauty of graphs.

Use cases

Sounds great, let’s use it everywhere! Of course, graphs are not a silver bullet, but they do give you advantages in suitable situations. What are those situations? Let’s have a look:

• Highly connected data – when relationships are extremely important. Think of a knowledge graph.

• Fraud detection – searching for hidden connections is a great use case, and hopping from connection to connection is a very cheap operation in a graph.

• AI / ML – a neural network is a graph itself, and it makes perfect sense to use a graph database for interesting model calculations and research.

• Recommendations engine – in essence it’s a very similar use case to AI, but it’s a very specific type of project with constant adaptation, so I decided to list it separately.

• Network/Operations mapping – finding bottlenecks and optimizing networks. Interesting logistics problems fit very well into graphs.

Let’s have a look at an example!

Here we have a very simplified version of a hotel database setup. We have a chain, hotel, rooms, and customers. Nice and clear, right?

MATCH (r:Room {name: "701"})
RETURN r
// SELECT * FROM Rooms
// WHERE name = '701'
Simple Cypher query and SQL-like translation

*Psst…* For the sake of this article, schema and queries focus on simplicity.

Even at this step, I’d like to point out one thing: notice that I don’t need any technical knowledge to understand relationships between entities. Just double-click a node to expand or hide its connections.

Open a huge dataset on a tablet and your kid would be playing with it the whole day, opening, closing, and moving nodes around.
Lifehack of the day

Now let’s go deeper. We have a reservation in our hotel and for some reason the final price is lower than we expected. What could be the reason?

We see that the room was booked for three people and paid by credit card. Let’s go see if we can find out more.

Aha! The payment card belongs to someone working at the hotel and it seems he gave a discount to his friends. It depends on the circumstances if this is good or bad, but at least we know a possible reason. This was a simple case of fraud detection.

Well, this was a lot of clicking, but that’s how it goes. Fraud detection departments consist of thousands of people clicking through every data point to find something that looks strange… Sadly, at some companies this wouldn’t even read as a joke.

Of course, we need to act more intelligently, so let’s have a look at another example. Imagine our guest is worried about getting COVID. Could we find out if any of his friends are infected and take some helpful steps?

MATCH paths = (u:Customer {email: ""})-[*..4]-(i:Infection {name: "COVID-19"})
RETURN paths

This query finds paths of at most 4 hops (the depth of the search) that connect our guest to a COVID-19 infection.
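If you’re curious what a hop limit means mechanically, here is a rough Python sketch of a depth-limited breadth-first search. The graph data and the `shortest_path_within` helper are made up for this example. Unlike the Cypher query, which returns every matching path, this sketch returns only one shortest path, and it is not how Neo4j implements traversal.

```python
from collections import deque

# Hypothetical undirected graph: (node, node) pairs, relationship types ignored.
edges = [
    ("guest", "friend"),
    ("friend", "restaurant"),
    ("restaurant", "diner"),
    ("diner", "partner"),
    ("partner", "covid"),
]

adjacency = {}
for a, b in edges:
    adjacency.setdefault(a, set()).add(b)
    adjacency.setdefault(b, set()).add(a)


def shortest_path_within(start, target, max_hops):
    """Return a shortest path of at most max_hops edges, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        if len(path) - 1 == max_hops:
            continue  # hop budget used up, do not expand further
        for neighbour in sorted(adjacency.get(node, ())):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


print(shortest_path_within("guest", "covid", 4))
print(shortest_path_within("guest", "covid", 7))
```

In this toy graph the infection is 5 hops away, so a limit of 4 finds nothing while a limit of 7 does. Note that the search never asks how the nodes are connected, only whether they are.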

Found something! But a closer look reveals that the person was infected several months ago, so it’s not relevant anymore.

Let’s refine our query to look at a more specific timeframe (past two weeks) and increase the depth to 7. Please note, we don’t describe how people should be connected to one another.

MATCH paths = (:Customer {email: ""})-[*..7]-(op),
      (op)-[ir:INFECTED_WITH]->(i:Infection {name: "COVID-19"})
WHERE datetime(ir.created_at) > datetime('2020-09-07')
  AND datetime(ir.created_at) < datetime('2020-09-24')
RETURN paths, ir, i

OK, we got something. Our guest is staying with a friend who went to a restaurant with another person, and that person’s partner was recently confirmed as a coronavirus case. Wow… It’s a deep connection, which is good for our friend as the chance of infection is lower, but we can still advise caution to our other visitors and do an extra deep cleaning of the guest’s room to lower the risk for others.

Now that’s the power of graphs. Please note, the query was very vague, but it worked quite nicely.

Graph query languages

Being a different database type, graph databases have their own languages. When I first started with graph databases, this was the most interesting part for me, because it’s not only new languages but a completely different way of interacting with data from what I was used to with relational databases.

So far you have seen queries written in Cypher and running on Neo4j. But there are many more to choose from.

Let’s have a quick look at some of the most common ones:

• Cypher – (used in this article. Created and used with Neo4j)

• SPARQL – (RDF query language by W3C)

• Gremlin – (part of Apache TinkerPop; supported by Azure Cosmos DB, among others)

• GraphQL – (Yes. Beloved client query language can also be used natively – Dgraph)

• GQL – (a standard graph query language developed by ISO/IEC)

Please go ahead and check them out and play with the one you feel is more to your taste. All of them have something to offer.

Relational vs. graph and myths

For those of you who scrolled to this section to see the juicy performance benchmarks and response time comparison charts for terabytes of data, please read the whole article to see that there is no need to compare apples to oranges. Both technologies are suitable in specific scenarios.

…No? Still need charts and numbers? OK, let’s get it over with.

Write Performance

I’ve heard the claim that graph databases are slow on writes many times, so I decided to test it. Let’s create a very simple test (I wouldn’t call it a benchmark). Generate and write 100k user records containing very generic properties (email, password, DOB, firstName, lastName, isActive, createdUtc, updatedUtc).
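If you’d like to try something similar yourself, the relational half of such a test could look roughly like the sketch below. Everything here is an assumption for illustration: it uses Python’s built-in `sqlite3` with an in-memory database as a stand-in, and the `make_user` and `write_users` helpers are made up. It is not the setup behind the numbers in this article, and timings will vary from machine to machine.

```python
import sqlite3
import time
import uuid
from datetime import date, datetime, timezone


def make_user(i):
    """Build one generic user record with the fields listed above."""
    now = datetime.now(timezone.utc).isoformat()
    return (
        f"user{i}@example.com",        # email
        uuid.uuid4().hex,              # password (random placeholder)
        date(1990, 1, 1).isoformat(),  # DOB
        f"First{i}",                   # firstName
        f"Last{i}",                    # lastName
        i % 2 == 0,                    # isActive
        now,                           # createdUtc
        now,                           # updatedUtc
    )


def write_users(count):
    """Insert `count` users in one transaction; return (rows stored, seconds)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE users (email TEXT, password TEXT, dob TEXT, "
        "firstName TEXT, lastName TEXT, isActive INTEGER, "
        "createdUtc TEXT, updatedUtc TEXT)"
    )
    started = time.perf_counter()
    with conn:  # commits the transaction on success
        conn.executemany(
            "INSERT INTO users VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
            (make_user(i) for i in range(count)),
        )
    elapsed = time.perf_counter() - started
    (stored,) = conn.execute("SELECT COUNT(*) FROM users").fetchone()
    conn.close()
    return stored, elapsed


stored, elapsed = write_users(100_000)
print(f"wrote {stored} rows in {elapsed:.2f}s")
```

To compare, you would write the same records to the graph database of your choice through its own driver and time that run the same way.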

So, case closed. Graph databases are 10x faster than SQL. Post this screenshot on Twitter, close the article, and collect angry comments. *evil laughing*

…For those few who are still here, of course, this proves absolutely nothing, and you can optimize both databases to your specific case. But it surely proves that this simple statement “Graph databases are slow on write” is misleading. I would strongly recommend checking your use case on a small proof of concept project before going all-in with either technology.

Don’t go deeper than 5 relations / joins

This issue pops up when using GraphQL with a SQL database. Joins are extremely expensive operations for a relational database, hence lower performance.

Graph databases, on the other hand, traverse relationships in an extremely efficient way. As you’ve seen above, they can even hop through wildcard relationships to find an answer.
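To see why depth hurts on the relational side, here is a small Python sketch that builds the SQL needed to follow a relationship a given number of times. The `knows` table, its columns, and both helper functions are assumptions made up for this example. Each extra level of depth adds one more self-join.

```python
import sqlite3


def hops_query(depth):
    """Build SQL that follows the `knows` table exactly `depth` times."""
    joins = "\n".join(
        f"JOIN knows k{i} ON k{i}.src = k{i - 1}.dst"
        for i in range(2, depth + 1)
    )
    return (
        f"SELECT DISTINCT k{depth}.dst FROM knows k1\n"
        f"{joins}\n"
        "WHERE k1.src = ?"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE knows (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO knows VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")],
)


def reachable_in(depth, start):
    """Return the nodes exactly `depth` hops away from `start`."""
    rows = conn.execute(hops_query(depth), (start,)).fetchall()
    return sorted(dst for (dst,) in rows)


print(hops_query(5))
print(reachable_in(3, "a"))
```

A 5-level query already needs 4 joins on the same table, and an open-ended depth like the wildcard hops in the Cypher examples cannot be written this way at all without knowing the depth up front.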

Pros, cons, and a rule of thumb

Here is a simple set of suggestions that can help you decide which technology might be a better fit for your project.

To Graph or not to Graph

…I guess it’s up to you, but at least give it a shot. I hope you enjoyed this introduction to graph databases. There’s only one thing I’d like you to get from this article:

Graphs are fun. Try it.
—This article

Find the best shape or form for yourself and play with it one evening. I promise, it’ll be an interesting experience. Maybe you’ll become another node connected to graph databases.

Watch me talk about this topic on Reactive Online Meetup:
