Why Defragmentation is Needed

There is a fierce debate in some Windows circles about whether defragmentation is required. Some argue that the NTFS file system doesn't really fragment files, or that the occasional fragmented file is not a problem. Others argue that on a sufficiently large hard drive fragmentation is insignificant. And, of course, there is the often-confusing advice that flash drives should not be defragmented. Linux users, for their part, seldom experience fragmented files.

Why defrag? The goal is efficiency. If you are defragmenting your computer for any other reason, then you've missed the point. We defrag a drive to make it more efficient. Fewer fragments mean less unnecessary work for the drive, which means faster access whenever a file is read or written. Since hard drives are slow compared to RAM, an efficient drive reduces waiting times, which means the computer can get the job done faster.

How File Fragmentation Occurs

Consider a typical computer with several processes running in the background, each of which writes entries to its own log file. If the files reside in different parts of the disk, they can usually grow without running into one another. Eventually, though, one of the files will find that the next available block of disk space is already in use by another file. At that point the operating system allocates another block elsewhere on the disk, a "non-contiguous" block, i.e. one that is not adjacent to the others. The file is then said to be fragmented. Any file that grows over time risks being split into many non-contiguous pieces.
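
To make the mechanism concrete, here is a toy simulation in C (the block count, the two files, and the simple "hand out the next free block" allocator are all made-up assumptions, not how NTFS actually works): two log files grow at the same time, their blocks end up interleaved, and each file finishes with many fragments.

    #include <stdio.h>

    #define BLOCKS 16                  /* toy disk: 16 allocatable blocks */
    #define FILES  2                   /* two log files growing together  */

    int main(void)
    {
        int owner[BLOCKS];             /* which file each block belongs to */
        int b, f;

        /* The two files grow at the same time, and the allocator always
           hands out the next free block, so their blocks interleave.    */
        for (b = 0; b < BLOCKS; b++)
            owner[b] = b % FILES;

        /* Count the fragments (runs of adjacent blocks) per file.       */
        for (f = 0; f < FILES; f++) {
            int fragments = 0, prev = -2;
            printf("file %d owns blocks:", f);
            for (b = 0; b < BLOCKS; b++) {
                if (owner[b] != f)
                    continue;
                printf(" %d", b);
                if (b != prev + 1)
                    fragments++;       /* a gap starts a new fragment     */
                prev = b;
            }
            printf("  -> %d fragments\n", fragments);
        }
        return 0;
    }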

On a hard drive, fragmented files carry an additional performance cost. If software opens the file and reads it from start to finish, the hard drive mechanism introduces several delays:

  1. The seek time to move the hard drive "head" to the start of the file;
  2. The time taken to read the first block of data;
  3. The time to move the hard drive "head" to the location of the second block;
  4. The time taken to read the second block of data; etc.

If the blocks of data are physically close to one another on the drive, the time taken to move the head to the correct position for each subsequent block (step 3) is so small as to be insignificant. But whenever a non-contiguous block occurs, the seek time is usually much longer. Reading a file with several hundred fragments will therefore take longer than reading a file of the same length whose blocks are all contiguous. So there is a mechanical performance cost associated with a fragmented file.
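
Some back-of-the-envelope arithmetic shows how quickly this adds up. The figures in the snippet below (10 ms seek, 4 ms rotational latency, 100 MB/s transfer, a 100 MB file) are illustrative assumptions rather than measurements of any particular drive; the point is simply that each extra fragment costs roughly one seek plus one rotational latency.

    #include <stdio.h>

    int main(void)
    {
        /* Illustrative drive characteristics - assumptions, not specs. */
        const double seek_ms     = 10.0;   /* average head seek          */
        const double latency_ms  = 4.0;    /* average rotational latency */
        const double rate_mb_s   = 100.0;  /* sustained transfer rate    */
        const double file_mb     = 100.0;  /* size of the file we read   */

        const double transfer_ms = file_mb / rate_mb_s * 1000.0;
        int fragments;

        /* Every extra fragment costs roughly one seek plus one
           rotational latency before the data starts flowing again.     */
        for (fragments = 1; fragments <= 400; fragments *= 20) {
            double total_ms = transfer_ms + fragments * (seek_ms + latency_ms);
            printf("%3d fragment(s): about %.0f ms to read %.0f MB\n",
                   fragments, total_ms, file_mb);
        }
        return 0;
    }

With those assumptions a contiguous read takes about one second, while the same file in 400 fragments takes more than six.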

Hard drives, being mechanical devices, also use more power, get hotter, and wear out faster when they are jumping around from location to location, either as a result of fragmented files or any other kind of intensive hard drive usage. This is bad for the life of the drive and the power consumption of the computer.

Now consider a solid state drive, such as a USB memory stick. Each file block is stored in a region of flash memory, so the "read time" is much the same for any block anywhere on the drive: there are no mechanical parts involved, so there is no variation in the "seek time" from one block to the next. At the other extreme, a tape drive can have extremely long seek times, especially if the tape needs to be rewound to get to the next requested block.

File Positioning

In addition to the level of fragmentation of a file, there is also the question of where the file sits on the hard drive. It is easy enough to measure the performance of a hard drive and note that the "inside" of the drive has slower read times than the "outside": the outer tracks are longer, so more data passes under the head on each revolution. When the drive is formatted, this usually means that the lower-numbered clusters can be read faster than the higher-numbered ones.

In an ideal world we would therefore want to put all our most important files near the outside of the disk, so they are read faster. But there are other factors. In order to open a file in Windows, the system first has to read through the directory, find the file's entry and its Master File Table (MFT) record, and then read that MFT record to get the file's placement information, before eventually venturing out onto the disk to read the file itself.

If the directory and MFT are in different areas of the drive, the hard drive has to do some extra travelling to load the required information. If the directory and the MFT can be "cached" in RAM, then the RAM copy can be consulted without using the drive at all. So the placement of the directories and MFT has to be determined along with the placement of the files.
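
As a rough sketch of how a utility gets at this placement information on NTFS, the C program below asks the file system for a file's extent list via the FSCTL_GET_RETRIEVAL_POINTERS control code. Each extent is one contiguous run of clusters, so the extent count is effectively the fragment count, and the starting LCNs show where on the volume the pieces sit. (Error handling is minimal, the fixed 64KB buffer is an arbitrary choice, and very small "resident" files that live entirely inside the MFT will simply report an error.)

    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        HANDLE hFile;
        STARTING_VCN_INPUT_BUFFER in;
        ULONGLONG outbuf[8 * 1024];    /* 64 KB, aligned for the structure */
        RETRIEVAL_POINTERS_BUFFER *rp = (RETRIEVAL_POINTERS_BUFFER *)outbuf;
        DWORD bytes, i;

        if (argc < 2) {
            fprintf(stderr, "usage: list_extents <file>\n");
            return 1;
        }

        hFile = CreateFileA(argv[1], FILE_READ_ATTRIBUTES,
                            FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                            OPEN_EXISTING, 0, NULL);
        if (hFile == INVALID_HANDLE_VALUE) {
            fprintf(stderr, "cannot open %s (error %lu)\n",
                    argv[1], GetLastError());
            return 1;
        }

        in.StartingVcn.QuadPart = 0;   /* start from the beginning of the file */

        /* Ask the file system for the extent list: each extent maps a run
           of virtual clusters (VCNs) to a starting logical cluster (LCN),
           i.e. an actual position on the volume.                          */
        if (DeviceIoControl(hFile, FSCTL_GET_RETRIEVAL_POINTERS,
                            &in, sizeof(in), rp, sizeof(outbuf), &bytes, NULL)) {
            printf("%lu extent(s)\n", rp->ExtentCount);
            for (i = 0; i < rp->ExtentCount; i++)
                printf("  extent %lu starts at LCN %lld\n",
                       i, rp->Extents[i].Lcn.QuadPart);
        } else {
            fprintf(stderr, "FSCTL_GET_RETRIEVAL_POINTERS failed (error %lu)\n",
                    GetLastError());
        }

        CloseHandle(hFile);
        return 0;
    }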

Another consideration is the location of recently modified files relative to the remaining free space on the disk. The ideal would be to have the file you want to edit sitting immediately next to the drive's free space, so that as the file grows it expands into that space and very little fragmentation occurs, or any fragments that do form end up close to the rest of the file. The bigger the free space, the lower the chance of bad fragmentation occurring.

Defragmenting Files

There are times when it is best to just leave a file alone, and not worry that it is fragmented. Other times it is better to do something about it, for the sake of PC performance and hard drive life. How can you tell? Consider a large SQL database file, say 15GB in size. The file is fragmented into two pieces, separated by a tiny file that uses only one block. Given that the two pieces are close together and the mechanical performance cost is so low, it isn't worthwhile rearranging 15GB of data to eliminate one fragment. If the database is in constant use, the extra hard drive load caused by defragmenting it may be far worse than the minor hiccup of one extra seek.

Each defrag utility has a different approach to the problem. Some just move the file into the nearest open space big enough to contain the whole file. Others start by trying to work out the ideal placement of all the files, and then move files around until that ideal layout is achieved.
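
The first strategy amounts to a "first fit" search for a gap. The sketch below runs that search over a purely hypothetical in-memory free-space bitmap; a real utility would obtain the actual bitmap from the file system (on Windows, via FSCTL_GET_VOLUME_BITMAP) and work in clusters.

    #include <stdio.h>

    /* Return the first cluster of a free run at least `needed` clusters
       long, or -1 if no such gap exists. `bitmap[i]` is non-zero when
       cluster i is in use (a hypothetical in-memory copy of the volume
       bitmap).                                                          */
    static int first_fit(const int *bitmap, int clusters, int needed)
    {
        int start = -1, run = 0, i;

        for (i = 0; i < clusters; i++) {
            if (bitmap[i]) {            /* cluster in use: reset the run */
                start = -1;
                run = 0;
            } else {
                if (start < 0)
                    start = i;          /* a new free run begins here    */
                if (++run >= needed)
                    return start;       /* big enough - take it          */
            }
        }
        return -1;
    }

    int main(void)
    {
        /* 1 = in use, 0 = free: a toy 16-cluster volume.                */
        int bitmap[16] = { 1,1,0,0,1,0,0,0,0,1,1,0,0,0,0,0 };

        printf("gap for a 4-cluster file starts at cluster %d\n",
               first_fit(bitmap, 16, 4));
        return 0;
    }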

In Windows 95 and 98 there were no "hooks" into the operating system that allowed a program to request that a file be moved from location A to location B. This meant that a defrag program had to take over control of the drive and manipulate the files directly, which was dangerous and slow. DOS programs like Norton Speed Disk and the Microsoft Defrag utility (which Microsoft licensed from Norton) had to have full access to the entire drive.

Starting with Windows NT, Microsoft included API calls that allowed defrag utility programmers to ask the operating system to move a file (or part of a file) from point A to point B, with the operating system itself handling issues such as what to do if point B was already in use. This meant that the defrag program could run while other programs and processes were active, and that there was no data loss if the request failed, because Windows still had the original data at point A.
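
The request itself boils down to a single DeviceIoControl call with the FSCTL_MOVE_FILE control code, as sketched below. The file path, destination cluster (LCN) and cluster count are placeholders; a real defrag tool derives them from the file's retrieval pointers and the volume bitmap, and the whole thing requires administrator rights.

    #include <windows.h>
    #include <winioctl.h>
    #include <stdio.h>

    int main(void)
    {
        HANDLE hVolume, hFile;
        MOVE_FILE_DATA mv;
        DWORD bytes;

        /* The volume handle receives the request ...                    */
        hVolume = CreateFileA("\\\\.\\C:", GENERIC_READ | GENERIC_WRITE,
                              FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                              OPEN_EXISTING, 0, NULL);
        /* ... and the file handle identifies what is being moved.
           (The path here is a placeholder for illustration only.)       */
        hFile = CreateFileA("C:\\example\\somefile.dat", GENERIC_READ,
                            FILE_SHARE_READ | FILE_SHARE_WRITE, NULL,
                            OPEN_EXISTING, 0, NULL);
        if (hVolume == INVALID_HANDLE_VALUE || hFile == INVALID_HANDLE_VALUE) {
            fprintf(stderr, "open failed (error %lu)\n", GetLastError());
            return 1;
        }

        mv.FileHandle           = hFile;
        mv.StartingVcn.QuadPart = 0;      /* first cluster of the file   */
        mv.StartingLcn.QuadPart = 123456; /* placeholder destination     */
        mv.ClusterCount         = 64;     /* how many clusters to move   */

        /* Windows performs the move itself and fails cleanly (leaving
           the data where it was) if the destination is already in use.  */
        if (!DeviceIoControl(hVolume, FSCTL_MOVE_FILE, &mv, sizeof(mv),
                             NULL, 0, &bytes, NULL))
            fprintf(stderr, "FSCTL_MOVE_FILE failed (error %lu)\n",
                    GetLastError());

        CloseHandle(hFile);
        CloseHandle(hVolume);
        return 0;
    }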

Boot-Time or Offline Defragmentation

Some files are in use by the operating system from the moment it loads. An obvious example is the Windows page file ("pagefile.sys"). Since it is used heavily by the operating system, particularly when available RAM is low, it is best kept as a single contiguous file. This led to the idea of defragmenting such files as early as possible during the boot-up stage, before Windows starts using them. Another method is to boot from a different device, such as a CD or USB memory stick, and perform maintenance on the drive while the operating system runs from elsewhere. This is known as "offline" defragmentation, because the normal operating system is not in use.

Both boot-time and offline defrag programs work on the assumption that not being able to use the computer for a while is acceptable if the computer works better once the defrag is completed. This approach also requires more discipline, since the defrag has to be run manually on a regular basis.

Another approach is to boot from a CD so that none of the files on the hard drive are in use. You can build a BartPE boot CD and then run JkDefrag or Contig from it, which works fine provided BartPE supports your computer's hardware. A riskier approach is to use a different operating system entirely. The problem with this is that the normal (safer) Windows API calls are not available, so you are relying on the software to correctly handle the (foreign) NTFS or FAT32 file system without corrupting any files. Make sure you have a complete hard drive backup before trying this.

Background Defragmentation

Another approach to file fragmentation is to "chip away" at the problem as a background or scheduled task. This "set it and forget it" approach means that the user doesn't have to run the defrag program manually. The difficulty is determining a convenient time to run the defrag. If the PC is idle at a particular time every week, such as during a weekly management meeting, that is the obvious time to choose. Another option is to leave the computer running all night and do the defrag in the early hours of the morning, but this uses power, which is expensive.

A constant defrag process can interfere with normal file operations, resulting in sluggish performance, which is exactly the opposite of what defragmentation is supposed to achieve. It can also generate much greater hard drive activity, reducing the lifespan of the drive, which is counter-productive. One obvious solution is to run the defrag as a screen saver, because in many cases that is when the computer is idle. But the software needs to be clever enough to check how busy the hard drive is: if the computer is grinding through a long task, even though the keyboard and mouse are inactive, it is not a good time to do a defrag.
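
One small ingredient of that cleverness is checking how long the user has been idle. The sketch below does just that with the Win32 GetLastInputInfo call; the ten-minute threshold is an arbitrary assumption, and note that it says nothing about how busy the disk itself is, which a real tool would also have to check.

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        LASTINPUTINFO lii;
        DWORD idle_ms;

        lii.cbSize = sizeof(lii);
        if (!GetLastInputInfo(&lii)) {
            fprintf(stderr, "GetLastInputInfo failed (error %lu)\n",
                    GetLastError());
            return 1;
        }

        /* Milliseconds since the last keyboard or mouse input.          */
        idle_ms = GetTickCount() - lii.dwTime;

        if (idle_ms > 10u * 60u * 1000u)        /* arbitrary: 10 minutes */
            printf("user idle for %lu ms - a defrag pass could start\n",
                   idle_ms);
        else
            printf("last input was %lu ms ago - leave the drive alone\n",
                   idle_ms);

        return 0;
    }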

By default Windows Vista and Win7 have a scheduled task set to run their own defrag program once a week. Ideally this should be done after all the Windows Updates have been installed.

Optimisation Techniques

There are twice as many optimisation techniques as there are programmers. Some techniques concentrate purely on removing file fragmentation. Others try to place the files in a more optimal position, based on one or more of a dozen factors. Some also try to allocate free space in a way that would be helpful to the user or the operating system. Some try to be as thorough as possible, while others adopt a more pragmatic approach, defragmenting files where this could improve performance.

There is another debate over what constitutes improved performance. One ideal would be to analyse which files are used during the Windows boot-up procedure, and to move them to a region of the disk where they would be stored in the order in which they are loaded, to minimise seek times, even from one file to another. Another ideal would be to figure out which files are used most often, and group them together. Another approach adopts a policy of leaving free space next to the MFT, so that if it has to grow, it can do so contiguously by using this free space.

Drive Utilisation

As a drive fills up with data, the effects of fragmentation become more obvious, and general system performance also slows down. Newer versions of Windows change the colour of the drive's capacity bar when space is getting low, typically at around 90% full. Some commercial defrag programs become ineffective once the drive is about 75% full, or when the largest contiguous free space is smaller than the largest file to be defragmented. Some programs try to minimise the amount of file movement needed by working out what to do in a single pass; others take a multi-pass approach.
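
A cautious tool (or user) can check how full the volume is before starting. The snippet below reports the free-space percentage of drive C: using GetDiskFreeSpaceEx; the 25% threshold simply echoes the rough figure above and is not a hard rule.

    #include <windows.h>
    #include <stdio.h>

    int main(void)
    {
        ULARGE_INTEGER freeToCaller, total, totalFree;
        double free_pct;

        if (!GetDiskFreeSpaceExA("C:\\", &freeToCaller, &total, &totalFree)) {
            fprintf(stderr, "GetDiskFreeSpaceEx failed (error %lu)\n",
                    GetLastError());
            return 1;
        }

        /* Percentage of the volume that is still free.                  */
        free_pct = 100.0 * (double)totalFree.QuadPart
                         / (double)total.QuadPart;

        printf("C: is %.1f%% free\n", free_pct);
        if (free_pct < 25.0)       /* rough threshold, echoing the text  */
            printf("less than 25%% free - a full defrag may struggle\n");

        return 0;
    }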

What about RAID drives?

Defragmentation of RAID drives is warranted if the files are not contiguous and are causing performance problems. Bear in mind that the data in a RAID 5 array is striped across all the member drives, so a badly fragmented file can make every drive in the array work harder. Defrag utilities work at the logical file system level, not the physical level, so when you defragment a RAID volume you also trigger a lot of extra hardware work because of the parity and redundancy calculations involved. A more pragmatic approach is called for, and in fact this applies to server hard drives in general.

What about USB Hard Drives?

The USB interface to the hard drive slows down the data transfer rates to a crawl, so you shouldn't bother too much about trying to improve USB drive performance by defragmentation, especially if the drive is being used for archives or backups. A light defrag pass and optimising the directories is all you are likely to need. This will improve the performance from dreadful to awful, and may prolong the life of the drive a bit. Just make sure you don't let the drive overheat, or you can kiss your data goodbye.

See Also

See the article on "Before You Defrag Your PC" for advice on speeding up your PC and reducing risk.


