Defrag Part 1

I've done a few algorithmic or generative pieces here using Processing (not the AI kind, the real kind) but very few simulations if any.

Back in the day when we used to have spinning hard drives and, after a period of time, they became "fragmented" by the use of the Windows file system, one of periodic admin tasks was to launch and run the "defrag" utility. It always felt satisfying to strart from a slow, bloated machine, run this utility, see all the blocks drop into place and you ended with a somewhat faster experience as the OS had less work to do hunting for bits of files all over creation.

Recently I was reminded of this little tool and tried a quick simulation which just set up some random coloured blocks and gradually replaced with them with nice clean green blocks. This worked pretty well but it wasn't as satisfying. I decided to come up with a more realistic simulation. This is the first part of that and only goes into the fragmentation part of the sketch.

Files

The heart of the file system, even if I'm not using too much of it at the early stages, is the files themselves and what I have chosen to call file "chunks" - bits of file which are going to be distributed around the file system but go together to make up a whole file.


class File:

    def __init__(self, id, starting_index):
        self.id = id
        self.chunks = []
        size = int(random(1, 10))

        for i in range(size):
            chunk = FileChunk(id, starting_index + i)
            self.chunks.append(chunk)


    def size(self):
        return len(self.chunks)


    def parts(self):
        return self.chunks


    def draw(self):
        print(self.id)

So to begin with, the file decides how many pieces it is going to be divided into and creates those as an array of chunks. Each file has an id so that we can track which chunks belong to which file eventually so we can decide if a file is contiguous on disk. Each chunk gets that parent file id and an index so that for however many chunks we have we can identify them all individually. Finally, some methods to return the size of the file (in chunks) and the parts of the file themselves.

I added a draw function just to write out to the console at the moment to help with debugging so I know each file is different and has the correct number and ids of each chunk.

        
class FileChunk:

    def __init__(self, file_id, index):
        self.file_id = file_id
        self.index = index


    def draw(self):
        print(self.file_id, self.index)

Similarly, the chunks keep the id of their parent file and their index. The draw function again is there for debugging purposes.

Sectors

Physically on disk we are tracking disk sectors in this simple view (not a total reflection of reality but close enough for our needs).


class DiskSector:
    
    def __init__(self, id):
        self.id = id
        self.chunk = None


    def unallocated(self):
        return self.chunk is None


    def allocate(self, part):
        self.chunk = part


    def deallocate(self):
        self.chunk = None


    def draw(self, x, y):
        
        if self.chunk is not None:
            fill(255, 0, 0)
        else:
            fill(240, 240, 240)           
            
        rect(x, y, sector_width, sector_height)
        textAlign(CENTER, CENTER)
        
        id = str(self.id)
        
        if self.chunk is not None:
            id = str(self.chunk.index)
            
        fill(50)
        text(id, x + (sector_width / 2), y + (sector_height /2 ))

Sectors have their own id and a reference to a chunk of file if one is eventually allocated to them. We have methods to check if a sector is unallocated (no file) and to allocate and deallocate a file to/from that sector. Here the drawing code is a little more sophisticated. We draw a different colour depending on whether or not there is a file associated with that sector. We also show either the sector id or the file chunk index in each box.

Does this make me look FAT?

Again in a simplistic fashion, I created a file allocation table to track which files are associated with which sectors of the disk.


class FileAllocationTable:
    
    def __init__(self, length):

        self.sectors = []
        
        for i in range(length):
            self.sectors.append(DiskSector(i))

            
    def unallocated_sectors(self):
        return [ i for i, x in enumerate(self.sectors) if x.unallocated()]


    def draw(self):
        
        border = 5
        x = border
        y = border
        
        # layout across and down the screen
        for i in range(len(self.sectors)):

            sector = self.sectors[i]
            sector.draw(x, y)
            x += sector_width + 1
            
            # move down the screen?
            if x >= (width - sector_width):
                x = border
                y += sector_height + 1


    def scatter(self, files):
        
        for file in files:    
            for part in file.parts():
                index = self.pick_random_index_from(self.unallocated_sectors())
                sector = self.sectors[index]
                sector.allocate(part)

                
    def pick_random_index_from(self, indices):
        randomIndex = int(random(len(indices)))
        return indices[randomIndex]

The FAT creates a set of disk sectors according to how large it is and knows about which sectors are allocated and which are not. The draw code attempts to draw the sectors in order and space them out across and down the screen so the first sector is in the top left and the last sector is somewhere towards the bottom of the screen.

The scatter method takes the existing files array and iterates through the files and their parts to distribute them randomly around the disk. This simulation of file fragementation should be good enough for us to begin with even if it's slightly too random than a real file system would be.

Glue

Finally, the setup and draw code that puts all the bits together, creating the files, creating the FAT and initiating the fragmentation routine.


def setup():
    
    global fat, files
    
    size(1000, 800)

    # random large number for size of disk
    fat = FileAllocationTable(800)

    # actual random number of files
    files = []
    number_of_files = int(random(20, 80))
    
    index = 0
    for i in range(number_of_files):
        file = File(i, index)
        files.append(file)
        index += file.size()
            
    fat.scatter(files)


def draw():  
    fat.draw()

More on the defragmentation process next.