Recently, developers at Mews started approaching us with observations like these:
- Why don’t we rebuild indexes more often when their fragmentation is so high?
- We were trying to optimize a query and couldn’t do much. But we think rebuilding the indexes would really help as it would fix the fragmentation.
This is a very interesting topic and I felt we have not provided a satisfactory answer to our developers yet. This is my attempt to clarify the subject of index fragmentation in SQL Server.
Defrag all the things
Most of us find pleasure in seeing that things around us are in order and well organized. Do you remember the Disk defragmentation tool in ancient versions of Windows?
It was incredibly satisfying to watch the process go through the disheveled file layout and produce long stretches of blue. It is not surprising then that when we see a fragmented database index, our first reaction is “That’s why my queries are slow. Rebuilding it will surely help!” But will it?
There are two types of fragmentation, and they are not born equal.
The first type is external fragmentation, which means that the (8 kB) data pages the index consists of are not in order. The second type is internal fragmentation, which means the pages are not fully packed with your data. Therefore, when discussing fragmentation issues, make sure you are referring to the same phenomenon.
Is external fragmentation a problem?
Should we care that the index pages are not in order? Let’s go case by case, starting with the most typical ones, down to the less frequent ones:
- Reading a page from RAM: the physical page order obviously does not matter here.
- Seeking from disk: we are reading a single page from the disk and it does not matter that its logical neighbors are someplace else.
- Scanning from disk: the page order does matter, mostly by reducing the efficiency of read-aheads.
As a reminder, index scan means reading all (or many of) the leaf-level pages of an index rather than seeking to a particular page. Looking at the most problematic aspect of external fragmentation, how many index scans is your database doing compared to seeks? In Mews, it is just 0.17 %, and considering we have a pretty typical OLTP workload, I believe we are not an outlier in this regard. There are valid cases where index scans are to be used, but often they are a sign of insufficient or poor indexing or unoptimized queries.
Is internal fragmentation a problem?
Internal fragmentation means there is free space inside the data pages rather than having them fully packed with row data. It makes your indexes larger and many operations slower by making the SQL Server access more pages than necessary. Because the smallest unit of data to be cached is an 8 kB data page, not a single row, it also means wasting space in the buffer pool, the SQL Server’s cache for data and index pages. If your data pages are, on average, only 75% full, you are effectively throwing away 25% of your server’s memory.
Should you be fixing index fragmentation?
Ideally, our indexes would have their data pages full and in order. And on a read-only database, that might be an achievable goal. However, this is simply not possible once you start inserting, updating, and deleting data.
Pages will have to be split to accommodate the new rows or ones which grow in size, leaving free space. When deleting data, the removed rows will also introduce free space. Both of these mechanisms means growing internal fragmentation. New pages, on the other hand, often cannot be physically adjacent to their logical neighbors, increasing the external fragmentation.
It is therefore not surprising that the two types of fragmentation often go hand in hand, as the underlying logical operation of splitting a page is a common cause of them both. So why not rebuild the indexes every now and then, bringing them back to their defragmented, ideal state?
There are a couple of points I would consider before establishing an index-rebuilding routine:
- Rebuilding the index is a big strain on the database. Even though it is possible to be done online in recent versions of the SQL Server, minimizing locking, it still generates a lot of writes to the transaction log, slowing down backups, replication, etc. Consider if having the indexes “perfect” for a while is worth these negative consequences.
- If external fragmentation is a big problem for your database (how do you measure it, by the way?), there are probably more pressing issues you should address. Why are your queries touching the disk so much? Throwing more RAM at the server or fixing the worst queries in terms of logical reads done should give you more bang for your buck.
- External fragmentation is becoming less of a problem in an era when most of us are storing the databases on SSDs.
- To decrease internal fragmentation, consider compressing your indexes, allowing more rows to fit in a page.
- If you rebuild the indexes with a 100 % fill factor, you are increasing the expected amount of page splits to happen once updates or inserts come. For certain access patterns, it makes sense to set the fill factor to a lower value, leaving free space for new or updated rows.
- On the other hand, be careful with setting the fill factor lower than 100 % and only do it for indexes where all or most of the data pages have a high chance of being updated or inserted into.
- Most likely, it is the case that some of your indexes need to be defragmented (perhaps with a custom fill factor) while others are best left alone. It is important to know the access patterns of your workload.
Standa’s life is more than just index fragmentation…
The best way to find out about everything is working with him and his colleagues every day. Are you up for it?
For some reason, index fragmentation and how to deal with it seems to be a contentious topic in the SQL Server community. From what I have gathered so far, the industry standard used to be to regard index fragmentation as a pretty serious drag to performance and rebuilding indexes was not contested much as the recommended solution.
However, in the recent era, there are many who preach a more reserved stance to this issue. For example, Grant Fitchey in his SQL Server 2017 Query Performance Tuning concludes that “I lean heavily towards the “stop defragmenting your indexes” camp.”
I think our best bet is to have good observability into our databases, understand the principles of what fragmentation actually means, and decide based on data and experiments. And if this article left you with more questions than answers, that is because there really is no one-size-fits-all solution to this problem.
How are you dealing with fragmentation in your databases?