Linux Defragmentation Issues

Like all operating systems, Linux does have fragmented files. Unlike Windows users, however, most Linux users never notice the fragmentation, or care about it. Fragmentation only starts to have a noticeable effect once the hard drive is about 90% full. Because the last 5% is reserved for the super-user (i.e. root) by default on ext2/ext3/ext4, the problem of a full hard drive usually takes precedence over the performance issues caused by fragmented files.
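You can check these figures on your own system. Below is a minimal Python sketch, assuming a Linux/Unix machine where `os.statvfs` is available. It computes "percent full" roughly the way `df` does, counting only the space ordinary users can fill, and also reports the share of blocks reserved for root. The function names and the 90% default threshold (taken from the paragraph above) are illustrative.

```python
import os
from dataclasses import dataclass


@dataclass
class Usage:
    percent_used: float      # as seen by ordinary users, like df's "Use%"
    percent_reserved: float  # share of all blocks held back for root


def usage_from_counts(blocks: int, free: int, avail: int) -> Usage:
    """Compute usage from statvfs-style block counts.

    blocks: total data blocks (f_blocks)
    free:   free blocks, including root's reserve (f_bfree)
    avail:  free blocks usable by non-root users (f_bavail)
    """
    used = blocks - free
    usable = used + avail  # capacity an ordinary user can actually fill
    percent_used = 100.0 * used / usable if usable else 0.0
    percent_reserved = 100.0 * (free - avail) / blocks if blocks else 0.0
    return Usage(percent_used, percent_reserved)


def check_mount(path: str = "/", threshold: float = 90.0) -> Usage:
    """Report usage for the filesystem holding `path`, warning past `threshold`."""
    st = os.statvfs(path)
    usage = usage_from_counts(st.f_blocks, st.f_bfree, st.f_bavail)
    if usage.percent_used >= threshold:
        print(f"{path}: {usage.percent_used:.1f}% used; fragmentation may start to bite")
    return usage


if __name__ == "__main__":
    u = check_mount("/")
    print(f"used {u.percent_used:.1f}%, reserved for root {u.percent_reserved:.1f}%")
```

For example, a filesystem with 1000 blocks, 300 free and 250 available to ordinary users reports 5% reserved for root.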

Dominic Humphries provides a simple, non-technical explanation of why some filesystems suffer more from fragmentation than others. Here are slightly edited versions of two of his brilliant articles:

Linux File Organisation

Rather than stumble through lots of dry technical explanations, I'm going to assume that an ASCII picture is worth a thousand words. Here, therefore, is the picture I shall be using to explain the whole thing:

   a b c d e f g h i j k l m n o p q r s t u v w x y z 
 
a  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
b  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
c  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
d  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
e  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
f  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
g  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
h  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
i  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
j  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
k  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
l  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
m  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
n  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
o  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
p  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
q  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
r  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
s  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
t  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
u  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
v  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
w  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
x  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
y  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
z  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

This is a representation of a (very small) hard drive, as yet completely empty: hence all the zeros. The letters a-z along the top and left side of the grid locate each individual byte of data, column first and then row: the top left is aa, the top right is za, and the bottom left is az. You get the idea, I'm sure.

We shall begin with a simple filesystem of a sort that most users are familiar with: one that needs defragmenting occasionally. Such filesystems, which include FAT, remain important to both Windows and Linux users: if only because of USB flash drives, FAT is still widely used. Unfortunately, it suffers badly from fragmentation.

We add a file to our filesystem, and our hard drive now looks like this:

   a b c d e f g h i j k l m n o p q r s t u v w x y z 
 
a  T O C h e l l o . t x t a e l e 0 0 0 0 0 0 0 0 0 0
b  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
c  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
d  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 T O C 
e  H e l l o , _ w o r l d 0 0 0 0 0 0 0 0 0 0 0 0 0 0
f  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

(Empty rows g-z omitted for clarity)

To explain what you see: the first four rows of the disk are given over to a "Table of Contents", or TOC. The TOC stores the location of every file on the filesystem. In the above example, it contains one file, named "hello.txt", and says that the contents of this file are to be found between ae and le. Looking at these locations, we see that the file contents are "Hello, world".
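To make the coordinate scheme and the TOC concrete, here is a toy Python model of the diagram above. It models this article's 26 x 26 picture, not how FAT really stores directory entries: the `Disk` class, the column-then-row addressing, and the dictionary standing in for the TOC are all illustrative assumptions. As in the diagram, `_` stands for a space.

```python
import string

LETTERS = string.ascii_lowercase  # 'a'..'z' label both columns and rows
WIDTH = len(LETTERS)              # 26 bytes per row


def offset(coord: str) -> int:
    """Convert a coordinate like 'le' (column l, row e) to a byte offset."""
    col, row = coord[0], coord[1]
    return LETTERS.index(row) * WIDTH + LETTERS.index(col)


class Disk:
    """A toy 676-byte disk with a dictionary standing in for the TOC."""

    def __init__(self) -> None:
        self.data = bytearray(b"0" * WIDTH * WIDTH)
        self.toc: dict[str, tuple[str, str]] = {}  # name -> (start, end)

    def write(self, name: str, start: str, contents: bytes) -> None:
        first = offset(start)
        last = first + len(contents) - 1
        self.data[first:last + 1] = contents
        # Turn the last offset back into a column-then-row coordinate.
        end = LETTERS[last % WIDTH] + LETTERS[last // WIDTH]
        self.toc[name] = (start, end)

    def read(self, name: str) -> bytes:
        start, end = self.toc[name]
        return bytes(self.data[offset(start):offset(end) + 1])


disk = Disk()
disk.write("hello.txt", "ae", b"Hello,_world")
print(disk.toc["hello.txt"])   # ('ae', 'le')
print(disk.read("hello.txt"))  # b'Hello,_world'
```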

So far so good? Now let's add another file:

   a b c d e f g h i j k l m n o p q r s t u v w x y z 
 
a  T O C h e l l o . t x t a e l e b y e . t x t m e z 
b  e 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
c  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
d  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 T O C 
e  H e l l o , _ w o r l d G o o d b y e , _ w o r l d 
f  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

As you can see, the second file has been added immediately after the first one. The idea is that if all your files are kept together, accessing them will be quicker and easier: the slowest part of using a hard drive is moving the read/write head, so the less it has to move, the quicker your read/write times will be.

The problem this causes becomes clear when we edit our first file. Let's say we want to add some exclamation marks so our "Hello" seems more enthusiastic. There's no room for them directly after the file: the "bye.txt" file is in the way. We now have only two options, and neither is ideal:

  1. We delete the file from its original position, and tack the new, bigger file onto the end of the second file - lots of reading and writing involved
  2. We fragment the file, so that it exists in two places but there are no empty spaces - quick to do, but will slow down all subsequent file accesses.

To illustrate, here is approach one:

   a b c d e f g h i j k l m n o p q r s t u v w x y z 
 
a  T O C h e l l o . t x t a f n f b y e . t x t m e z 
b  e 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
c  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
d  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 T O C 
e  0 0 0 0 0 0 0 0 0 0 0 0 G o o d b y e , _ w o r l d 
f  H e l l o , _ w o r l d ! ! 0 0 0 0 0 0 0 0 0 0 0 0

And here is approach two:

   a b c d e f g h i j k l m n o p q r s t u v w x y z 
 
a  T O C h e l l o . t x t a e l e a f b f b y e . t x 
b  t m e z e 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
c  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
d  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 T O C 
e  H e l l o , _ w o r l d G o o d b y e , _ w o r l d 
f  ! ! 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

Approach two is why such filesystems need defragging regularly. All files are placed right next to each other, so any time a file is enlarged, it fragments. And if a file is reduced, it leaves a gap. Soon the hard drive becomes a mass of fragments and gaps, and performance starts to suffer.
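The packing behaviour behind approach two can be simulated in a few lines. This Python sketch is a deliberately simplified model: it allocates whole blocks, always takes the lowest-numbered free block, and tracks each file as a list of `[start, length]` extents. It is not FAT's actual cluster-chain logic, and `PackedFS` is a hypothetical name. Growing a file with a neighbour right behind it forces a second extent.

```python
class PackedFS:
    """Toy allocator that always uses the lowest-numbered free block."""

    def __init__(self, size: int) -> None:
        self.used = [False] * size
        self.files: dict[str, list[list[int]]] = {}  # name -> [[start, length], ...]

    def _first_free(self) -> int:
        return self.used.index(False)  # raises ValueError when the disk is full

    def _claim(self, name: str, count: int) -> None:
        """Claim `count` blocks one at a time, lowest free block first."""
        extents = self.files.setdefault(name, [])
        for _ in range(count):
            block = self._first_free()
            self.used[block] = True
            last = extents[-1] if extents else None
            if last and last[0] + last[1] == block:
                last[1] += 1                 # contiguous: extend the current extent
            else:
                extents.append([block, 1])   # not contiguous: start a new fragment

    def create(self, name: str, size: int) -> None:
        self._claim(name, size)

    def grow(self, name: str, extra: int) -> None:
        self._claim(name, extra)


fs = PackedFS(64)
fs.create("hello.txt", 12)
fs.create("bye.txt", 15)
fs.grow("hello.txt", 2)        # add the two exclamation marks
print(fs.files["hello.txt"])   # [[0, 12], [27, 2]]
print(fs.files["bye.txt"])     # [[12, 15]]
```

The two extents for "hello.txt" correspond to the `ae`-`le` and `af`-`bf` entries in the approach-two TOC above.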

Let's see what happens when we use a different philosophy. The first type of filesystem is ideal if you have a single user accessing files in more-or-less the order they were created, one after the other, with very few edits. Linux, however, was always intended as a multi-user system: it was guaranteed that you would have more than one user trying to access more than one file at the same time. So a different approach to storing files is needed. When we create "hello.txt" on a more Linux-focused filesystem, it looks like this:

   a b c d e f g h i j k l m n o p q r s t u v w x y z 
 
a  T O C h e l l o . t x t h n s n 0 0 0 0 0 0 0 0 0 0
b  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
c  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
d  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 T O C 
e  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
f  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
g  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
h  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
i  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
j  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
k  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
l  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
m  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
n  0 0 0 0 0 0 0 H e l l o , _ w o r l d 0 0 0 0 0 0 0
o  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
p  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
q  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
r  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
s  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
t  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
u  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
v  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
w  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
x  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
y  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
z  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

And then when another file is added:

   a b c d e f g h i j k l m n o p q r s t u v w x y z 
 
a  T O C h e l l o . t x t h n s n b y e . t x t d u q 
b  u 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
c  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
d  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 T O C 
e  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
f  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
g  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
h  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
i  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
j  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
k  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
l  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
m  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
n  0 0 0 0 0 0 0 H e l l o , _ w o r l d 0 0 0 0 0 0 0
o  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
p  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
q  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
r  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
s  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
t  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
u  0 0 0 G o o d b y e , _ w o r l d 0 0 0 0 0 0 0 0 0
v  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
w  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
x  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
y  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
z  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

The cleverness of this approach is that the disk's read/write head can sit in the middle, and most files will, on average, be fairly nearby: that's how averages work, after all.

Now, when we add our exclamation marks to this filesystem, observe how much trouble it causes:

   a b c d e f g h i j k l m n o p q r s t u v w x y z 
 
a  T O C h e l l o . t x t h n u n b y e . t x t d u q 
b  u 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
c  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
d  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 T O C 
e  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
f  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
g  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
h  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
i  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
j  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
k  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
l  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
m  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
n  0 0 0 0 0 0 0 H e l l o , _ w o r l d ! ! 0 0 0 0 0
o  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
p  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
q  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
r  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
s  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
t  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
u  0 0 0 G o o d b y e , _ w o r l d 0 0 0 0 0 0 0 0 0
v  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
w  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
x  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
y  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
z  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

That's right: Absolutely none.

The first filesystem tries to put all files as close to the start of the hard drive as it can, so it constantly fragments files when they grow and there's no free space right after them.

The second scatters files all over the disk, so there's plenty of free space around each file if its size changes. It can also rearrange files on the fly, since it has plenty of empty space to shuffle them around in. Defragging the first type of filesystem is a far more intensive process and not really practical to run during normal use.

Fragmentation thus only becomes an issue on the latter type of system when a disk is so full that there just aren't any gaps a large file can be put into without splitting it up. So long as the disk is less than about 80% full, this is unlikely to happen.
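Here is the same kind of toy simulation for the second philosophy. As a stand-in assumption, this Python sketch starts each new file in the middle of the largest free gap and lets a file grow in place whenever the next block is free. Real ext2/3/4 allocators use block groups and preallocation heuristics that are considerably more involved, and `SpreadFS` is a hypothetical name. A file only fragments when there is no room behind it.

```python
class SpreadFS:
    """Toy allocator that starts each new file in the middle of the largest free gap."""

    def __init__(self, size: int) -> None:
        self.used = [False] * size
        self.files: dict[str, list[list[int]]] = {}  # name -> [[start, length], ...]

    def _largest_gap(self) -> tuple[int, int]:
        """Return (start, length) of the longest run of free blocks."""
        best_start, best_len, run_start = 0, 0, None
        for i, taken in enumerate(self.used + [True]):  # sentinel closes the final run
            if not taken and run_start is None:
                run_start = i
            elif taken and run_start is not None:
                if i - run_start > best_len:
                    best_start, best_len = run_start, i - run_start
                run_start = None
        return best_start, best_len

    def _take(self, name: str, block: int) -> None:
        self.used[block] = True
        extents = self.files.setdefault(name, [])
        if extents and extents[-1][0] + extents[-1][1] == block:
            extents[-1][1] += 1          # contiguous with the file's last extent
        else:
            extents.append([block, 1])   # a new fragment

    def create(self, name: str, size: int) -> None:
        gap_start, gap_len = self._largest_gap()
        if gap_len < size:
            raise OSError("no gap is large enough; the file would have to be split")
        first = gap_start + (gap_len - size) // 2  # leave free space on both sides
        for block in range(first, first + size):
            self._take(name, block)

    def grow(self, name: str, extra: int) -> None:
        for _ in range(extra):
            start, length = self.files[name][-1]
            following = start + length
            if following < len(self.used) and not self.used[following]:
                self._take(name, following)      # room after the file: grow in place
            else:
                gap_start, gap_len = self._largest_gap()
                if gap_len == 0:
                    raise OSError("disk full")
                self._take(name, gap_start)      # no room: the file fragments


fs = SpreadFS(64)
fs.create("hello.txt", 12)
fs.create("bye.txt", 15)
fs.grow("hello.txt", 2)        # the two exclamation marks
print(fs.files["hello.txt"])   # [[26, 14]]
print(fs.files["bye.txt"])     # [[5, 15]]
```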

It is also worth knowing that, due to the nature of hard drive geometry, fragmentation may still be present even when an OS says a drive is completely defragmented: a typical hard drive actually has multiple disks, AKA platters, inside it.

Let's say that our example hard drive is actually on two platters, with aa to zm being the first and an to zz the second:

   a b c d e f g h i j k l m n o p q r s t u v w x y z 
 
a  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
b  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
c  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
d  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
e  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
f  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
g  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
h  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
i  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
j  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
k  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
l  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
m  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 
   a b c d e f g h i j k l m n o p q r s t u v w x y z 
 
n  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
o  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
p  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
q  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
r  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
s  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
t  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
u  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
v  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
w  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
x  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
y  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
z  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

The following file would be considered non-fragmented, because it runs straight from row m to row n. But this ignores the fact that the read/write head will have to move from the very end of one platter to the very beginning of the next in order to read this file.

   a b c d e f g h i j k l m n o p q r s t u v w x y z 
 
a  T O C h e l l o . t x t r m e n 0 0 0 0 0 0 0 0 0 0
b  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
c  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
d  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
e  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
f  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
g  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
h  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
i  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
j  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
k  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
l  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
m  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 H e l l o , _ w o 
 
   a b c d e f g h i j k l m n o p q r s t u v w x y z 
 
n  r l d ! ! 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
o  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
p  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
q  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
r  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
s  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
t  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
u  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
v  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
w  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
x  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
y  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
z  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

I hope this has helped you to understand why some filesystems can suffer badly from fragmentation, whilst others barely suffer at all; and why no defragging software came with your Linux installation.

Fighting Fragmentation

I still occasionally see people who are adamant that, because fsck tells them some of their files are non-contiguous (fragmented), this is a problem and they want a solution. So you have some files that are fragmented, and this is obviously slowing your machine down. Right?
Wrong.

This is the first thing you need to get your head around if you're out to keep your hard drive performance as high as possible: a file being fragmented is not necessarily a cause of slowdown.

For starters, consider a movie file: say, three hundred megabytes of file to be read. If that file is split into three pieces spread all over your hard drive, will it slow anything down?

No, because your computer doesn't read the entire file before it starts playing it. You can easily demonstrate this by putting a movie onto a USB flash drive, starting playback, and then yanking the drive out: playback soon stops, because the rest of the file was never read.

Since your computer only reads the start of the file to begin with, it doesn't matter in the slightest that the file is fragmented: so long as the hard drive can read those 300MB in under half an hour (and if it can't, throw it away or donate it to a museum), the fact that the file is fragmented is of no concern whatsoever.
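In practice, a media player reads a file roughly like the Python sketch below. It pulls the file in one chunk at a time and hands each chunk on as soon as it arrives, so only the piece currently needed has to come off the disk. The 64 KiB chunk size and the `play_chunk` callback are illustrative assumptions, not any particular player's behaviour.

```python
from typing import Callable, Iterator


def stream_chunks(path: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield a file's contents piece by piece instead of reading it all at once."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk


def play(path: str, play_chunk: Callable[[bytes], None]) -> int:
    """Pass each chunk to a (hypothetical) decoder as soon as it is read."""
    total = 0
    for chunk in stream_chunks(path):
        play_chunk(chunk)      # playback can start after the very first chunk
        total += len(chunk)
    return total
```

Where those chunks physically sit on the disk only matters if fetching the next one takes longer than playing the current one.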

Your computer has hundreds if not thousands of similar files. As well as multimedia files that you WANT to take several minutes to access, you have all kinds of small files whose access times are made irrelevant by the slowness of the application that reads them. Think about double-clicking a 100KB file to edit in OpenOffice.org: however long it takes to open the file, it's irrelevant compared with how damn long it takes to get OOo loaded from a cold start.

You might like it
The next thing you need to bear in mind is that a file being all in one place doesn't necessarily mean that it'll get read faster than a file that's scattered around a bit.

Some people are adamant, having watched Windows defrag a FAT/NTFS partition, that all the files should be crammed together at the start of the disk, unfragmented. This, they argue, cuts down on the slowest part of reading from a hard drive: moving the head.

Except it doesn't.

Cramming everything together makes sense in certain applications: a floppy disk or a read-only CD/DVD, for example. These are places where one file at a time is read from a single disk, and where packing the files tightly together won't mean that a single edit instantly re-fragments things.

However, this is the 21st century. Your hard disk is not a single disk; it's a hard drive with multiple discs (AKA platters) inside it, and the days when you would only be reading or writing one file at a time are long, long gone.

It is perfectly plausible that, at any single instant, my PC might be:

  • Updating the system log
  • Updating one or more IM chat logs
  • Reading an MP3/Ogg/Movie file
  • Downloading email from a server
  • Updating the web browser cache
  • Updating the file system's journal
  • Doing lots of other stuff

All of this could quite feasibly happen at the same time: it probably happens a hundred times a day, in fact. And every single one of these requires a file to be accessed on the hard drive.

Now, your hard drive's head can only be in one place at a time. So your system does clever things: holding writes in memory for a while, reading data in the order it lies on the drive rather than the order it was requested, and so on.
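The benefit of reordering requests is easy to put in numbers. The Python sketch below compares total head travel when a queue of block requests is served in arrival order and when it is served in one upward sweep sorted by position, coming back afterwards for anything behind the head. This simple "elevator" ordering is an illustrative stand-in. Real Linux I/O schedulers and the drive's own command queuing are more sophisticated, and the request positions are made up.

```python
def travel(start: int, order: list[int]) -> int:
    """Total distance the head moves visiting blocks in the given order."""
    distance, head = 0, start
    for block in order:
        distance += abs(block - head)
        head = block
    return distance


def fifo(start: int, requests: list[int]) -> int:
    """Serve requests strictly in the order they arrived."""
    return travel(start, requests)


def elevator(start: int, requests: list[int]) -> int:
    """Sweep upward from the head, then come back for anything below it."""
    above = sorted(r for r in requests if r >= start)
    below = sorted((r for r in requests if r < start), reverse=True)
    return travel(start, above + below)


queue = [40, 5, 30, 10]
print(fifo(0, queue))      # 120
print(elevator(0, queue))  # 40
```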

So the chances that your hard drive has nothing to do other than read a fragmented file are pretty low: it's fitting that one file into a queue of reads and writes that it's already busy with.

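Imagine this scenario: your computer wants to read three files, A, B, and C. In the diagrams below, the disk is divided into six groups of eight blocks, numbered 01 to 06, with the blocks in each group labelled a to h. Here's a disk where the files are non-fragmented: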

   01       02       03       04       05       06
abcdefgh abcdefgh abcdefgh abcdefgh abcdefgh abcdefgh
000AAAAA A0000000 0BBBBBB0 00000000 00CCCCCC 00000000

And here's one where they ARE fragmented:

   01       02       03       04       05       06
abcdefgh abcdefgh abcdefgh abcdefgh abcdefgh abcdefgh
000AA000 0BBB00CC 0AA00BBB 000CCC00 AA00000C 00000000

Assuming your multi-tasking hard drive wants to read all three of these files, which layout will let it finish the job sooner?

Answer: it makes no difference, because the head still needs to sweep from '01 d' to '05 h' in one go, whether the files are fragmented or not.

In fact, the fragmented files might well be faster: the drive only has to read the first two groups of blocks (01 and 02) to get the first portion of each file. That might be enough for the applications accessing these files to begin working with all three of them at this point, whereas with the non-fragmented layout they would only be working with one.
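The two diagrams can be checked mechanically. In this Python sketch, the layout strings are copied from the diagrams above, and the spaces between groups are removed before indexing. It finds where each file's first block sits and how far the head must sweep overall. Like the article, it treats the disk as one straight line of blocks.

```python
NON_FRAGMENTED = "000AAAAA A0000000 0BBBBBB0 00000000 00CCCCCC 00000000"
FRAGMENTED     = "000AA000 0BBB00CC 0AA00BBB 000CCC00 AA00000C 00000000"


def label(index: int) -> str:
    """Turn a 0-based block index back into the diagram's 'group letter' form."""
    return f"{index // 8 + 1:02d} {'abcdefgh'[index % 8]}"


def analyse(layout: str) -> tuple[dict[str, str], str, str]:
    """Return where A, B and C first appear, plus the start and end of the sweep."""
    blocks = layout.replace(" ", "")
    first_seen = {f: label(blocks.index(f)) for f in "ABC"}
    data = [i for i, b in enumerate(blocks) if b != "0"]
    return first_seen, label(data[0]), label(data[-1])


for name, layout in [("non-fragmented", NON_FRAGMENTED), ("fragmented", FRAGMENTED)]:
    first_seen, sweep_from, sweep_to = analyse(layout)
    print(name, first_seen, "sweep:", sweep_from, "->", sweep_to)
```

Both layouts need the same sweep from 01 d to 05 h. In the fragmented layout, however, the first pieces of B and C turn up at 02 b and 02 g, rather than at 03 b and 05 c.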

In this (highly simplified as usual) example, you gain a performance increase by scattering your files around the drive. Fragmentation is not necessarily a performance-killer.

But even so...
Okay, so even Linux's clever filesystems can't always keep you completely clear of performance-degrading fragmentation. The average user won't suffer from it, but certain types of file usage - particularly heavy P2P usage - can result in files scattered all over your drive.

How to keep this from causing problems? Carve up your hard drive!

Logically speaking, that is: Partitions are your friend!

To simplify again: the main cause of fragmented files is large files that get written to a lot. The worst offenders are P2P-downloaded files, as these get downloaded in huge numbers of individual chunks. But frequently edited documents, such as word-processing files, spreadsheets and image files, can all start out small and get big and problematic.

So, the first and simplest thing to do: Have a separate /home partition.

System files mostly just sit there being read. You don't make frequent updates to them: your package manager or installation disk writes them to disk, and they remain unchanged until the next upgrade. You want to keep these nice, tidy, system-critical files away from your messy, frequently-written-to personal files.

Your system will not slow down due to large numbers of fragmented files if none of the system files are fragmented, and a roomy, dedicated root partition will go a long way towards ensuring this.

But if your /home partition gets badly organised, it could still slow you down: a pristine Firefox could still be slowed down by having to read a hideously scattered user profile. So safeguard your /home as much as possible too, by creating another partition for fragmentation-prone files. P2P files, 'living' documents, images you're going to edit: dump them all in here.

This needn't be a significant hardship: You can have this partition mounted within your home directory if you like. So long as it keeps your own config files and the like away from the fragmentation-prone files, it'll help.

Backups
So partitioning can cut down on the influence fragmented files can have. But it doesn't actually stop the files being fragmented, does it?

These days, hard drives are cheap. Certainly they cost less than losing all your data. It makes a lot of sense to buy a second hard drive to back up your files to: it's far quicker than burning files to DVDs, and gives you more space to write to as well.

In fact, you've got so much space, you could even set up a script to do this:

  1. Back up the contents of your fragmentation-prone partition
  2. Verify that the files have been properly backed up (MD5 or whatever)
  3. Erase the original, heavily-fragmented files
  4. Copy the files from your backup disk to the original partition

As simple as that, you have your fragmented files both backed up and defragmented. And it's actually quicker and better to defrag like this. Writing all your files in one go to a blank partition is far quicker than shuffling bits of them all over the place, trying to fit them around each other. You're also not cramming them all together in one place like Windows does, so they have "room to grow" in future, which again makes them less prone to fragmenting: you're working with your filesystem's built-in algorithms instead of against them.
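The four steps above could be scripted along these lines. This is a minimal Python sketch, not a tested tool, and it is destructive: step 3 deletes your originals, so try it on throwaway data first. It assumes the partition holds only regular files, directories and symlinks. It does not preserve ownership or hard links, and it treats `lost+found` like any other directory. `backup_defrag` and its paths are hypothetical names. Whether the copied-back files actually come out contiguous is up to the filesystem's allocator.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def md5sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """MD5 hex digest of one file, read in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_digests(root: Path) -> dict[str, str]:
    """Map every regular file under root (by relative path) to its MD5 digest."""
    return {
        str(p.relative_to(root)): md5sum(p)
        for p in sorted(root.rglob("*"))
        if p.is_file() and not p.is_symlink()
    }


def backup_defrag(partition: Path, backup: Path) -> None:
    part, bak = partition.resolve(), backup.resolve()
    if bak == part or part in bak.parents:
        raise ValueError("the backup must live outside the partition being rewritten")

    # 1. Back up the partition's contents (fails if `backup` already exists).
    shutil.copytree(partition, backup, symlinks=True)

    # 2. Verify the copy before touching anything.
    if tree_digests(partition) != tree_digests(backup):
        raise RuntimeError("backup verification failed; originals left untouched")

    # 3. Erase the originals, keeping the top-level directory (the mount point).
    for entry in list(partition.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()

    # 4. Copy everything back in one go, letting the filesystem lay it out afresh.
    shutil.copytree(backup, partition, symlinks=True, dirs_exist_ok=True)
```

Step 2 compares digests of the whole tree before anything is deleted, so a failed or partial copy stops the script with the originals intact.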

A sensible partitioning strategy and occasional backup-defrags will keep your data secure and structured far better than one big partition with everything haphazardly dumped in it.

Don't look for a defrag utility to hide a poorly-thought-out hard drive arrangement. Invest some effort into organising your data and you won't know or care if there's a defragmentation tool available.

Defragmentation Tool

Jack Wallen's article provides another approach. It mentions a script that lets the user check the fragmentation level, and then mentions the defrag utility by Con Kolivas. The conclusion is the same: there is a defrag utility available, but the need to use it may not be that high. Con Kolivas provides his own stats for his "compile" directory. If defragmenting speeds up the compile process by more than the time the defrag itself takes, that may be a worthwhile reason to run it.
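The article's own script isn't reproduced here. As an assumption, the following Python sketch takes a similar measurement using `filefrag` from e2fsprogs, which prints one line per file such as `movie.avi: 3 extents found`. The sketch assumes `filefrag` is installed; it may need root privileges on some systems. The helper names are hypothetical.

```python
import re
import subprocess

# filefrag reports lines such as "movie.avi: 3 extents found" or "notes.txt: 1 extent found".
EXTENTS = re.compile(r"^(?P<path>.*): (?P<count>\d+) extents? found")


def parse_filefrag(line: str) -> tuple[str, int]:
    """Split one line of filefrag output into (path, extent count)."""
    match = EXTENTS.match(line.strip())
    if match is None:
        raise ValueError(f"unexpected filefrag output: {line!r}")
    return match["path"], int(match["count"])


def extent_counts(paths: list[str]) -> dict[str, int]:
    """Run filefrag on the given files and map each one to its extent count."""
    out = subprocess.run(["filefrag", *paths], capture_output=True, text=True, check=True)
    return dict(parse_filefrag(line) for line in out.stdout.splitlines() if line.strip())


def fragmented(counts: dict[str, int]) -> list[str]:
    """Files stored in more than one piece."""
    return sorted(path for path, n in counts.items() if n > 1)
```

As the rest of this article argues, a file with several extents is not automatically a problem, so treat the result as a measurement, not a to-do list.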

What both of these authors are saying is that a bit of planning and measurement can reduce the need to defragment your Linux system, and that planning is the best thing you can do for your hard drive.

If you are using a Samba share on a Linux server to store files, you can safely ignore any fragmentation problems on the server. Ideally, the Samba data should be on a separate partition so that if/when that partition gets full, the server doesn't choke and fall over. You'd do the same on a Windows server, right?

Article Sources

Many thanks to the following articles and authors:

Personal note: I am not a bigot when it comes to personal computers and operating systems. I have used several, including HP 1000 minicomputers (RTE-6/VM), MS-DOS 1.1 onwards, Windows 1.0 onwards, Macintosh Finder OS (68000 based) and Mac OS X, and 3 different versions of Linux. I use Windows daily, mainly because I am a Microsoft Access programmer. Sadly, Access is only available on Windows, so I'm stuck. I have an Ubuntu 9.10 boot CD that I use to test all my web sites. After the way Apple treated me last time, I don't think I'll be buying Apple products anytime soon. The same applies to HP and IBM.


