Bavi_H's Blog

File systems: a brief description


by Bavi_H, 6/12/2008

This post is overflow from a Wikipedia Reference desk discussion on fragmentation.

Summary: The operating system knows how to find all the clusters of a file by using the directory entry to find the first cluster, and the file allocation table to find all the rest. The file allocation table of a particular disk is always a fixed size (it has an entry for every cluster whether it's used or available), and is really a special reserved part of the file system, not a part of any particular file.


When a disk is formatted, a file system is created. Most of the disk is devoted to file data, and is divided into clusters. Another part of the disk is reserved for the file allocation table, whose size is fixed and based on the number of clusters on the disk. There is usually also a special space set aside for the root directory.
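To make the "fixed size" point concrete, here is a rough arithmetic sketch. The disk size, cluster size, and 2-byte entry width are numbers I made up for illustration, not values from any particular disk or file system.

```python
def fat_size(data_bytes, cluster_bytes, entry_bytes=2):
    """Return (cluster_count, table_bytes) for a hypothetical disk.

    entry_bytes is an assumed width for one table entry.
    """
    cluster_count = data_bytes // cluster_bytes  # only whole clusters count
    return cluster_count, cluster_count * entry_bytes


# A made-up 32 MiB data area divided into 4 KiB clusters.
clusters, table_bytes = fat_size(32 * 1024 * 1024, 4096)
print(clusters, table_bytes)  # 8192 16384
```

The table size depends only on how many clusters exist, so it doesn't grow or shrink as files are created and deleted.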

The file allocation table is just a large list with one entry for each cluster. One special value marks an unused cluster, and another marks a used cluster that is the last cluster in a file. Any other value means the cluster is used but is not the last cluster in its file, and the value is the number of the next cluster in that file.
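Here is a minimal sketch of such a table as a Python list. The FREE and END markers are values I picked for this toy example; real file systems use their own reserved values.

```python
FREE = -1  # hypothetical marker: cluster is unused
END = -2   # hypothetical marker: cluster is used and is the last one in a file

# Index = cluster number, value = that cluster's table entry.
# Cluster 0 holds a one-cluster file; another file occupies 1 -> 3 -> 4.
fat = [END, 3, FREE, 4, END, FREE]


def describe(fat, cluster):
    """Explain in words what one table entry means."""
    entry = fat[cluster]
    if entry == FREE:
        return "unused"
    if entry == END:
        return "used, last cluster of its file"
    return f"used, next cluster is {entry}"


for number in range(len(fat)):
    print(number, describe(fat, number))
```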

Directories are really special kinds of files that contain lists of entries. Each entry holds a file name along with other information, including the file's first cluster number and the size of the file in bytes.

Starting with the file information from a directory entry, you can find the first cluster of a file and its size in bytes. Each time you read in a cluster, you can look at the file allocation table to see if the cluster is the last cluster in the file or, if there are more clusters, where the next cluster is. Once you reach the last cluster in the file, you can use the file size to know exactly how many bytes are part of the file, and how many bytes are left over unused at the end of the cluster.
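The reading procedure can be sketched like this. It is a toy model, not real driver code: the DirEntry class, the 4-byte cluster size, and the FREE and END markers are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass

CLUSTER_SIZE = 4    # tiny clusters so the example is easy to follow
FREE, END = -1, -2  # hypothetical marker values


@dataclass
class DirEntry:
    name: str
    first_cluster: int
    size: int  # file size in bytes


def read_file(entry, fat, clusters):
    """Follow the chain from the first cluster and return the file's bytes."""
    data = bytearray()
    cluster = entry.first_cluster
    while True:
        if fat[cluster] == FREE:
            raise ValueError(f"chain runs into unused cluster {cluster}")
        data += clusters[cluster]
        if fat[cluster] == END:
            break
        cluster = fat[cluster]  # the table entry is the next cluster number
    return bytes(data[:entry.size])  # drop the unused tail of the last cluster


fat = [FREE, 3, FREE, 4, END, FREE]
clusters = [b"\0" * 4, b"Hell", b"\0" * 4, b"o, F", b"AT\0\0", b"\0" * 4]
entry = DirEntry("HELLO.TXT", first_cluster=1, size=10)

print(read_file(entry, fat, clusters))  # b'Hello, FAT'
```

The file occupies three clusters (12 bytes of space), but the size in the directory entry says only 10 bytes belong to it, so the last 2 bytes of the final cluster are left over.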

The operating system knows how to find all the clusters of a file by using the directory entry to find the first cluster, and the file allocation table to find all the rest. The file allocation table is always a fixed size (it has an entry for every cluster), and is really a part of the file system, not a part of the file.

When a disk is first formatted, the data section is mostly empty. As files are created, the clusters are usually just used in order. But as files are deleted or changed, available and used clusters become more intermixed, and newer files may have clusters scattered about the data section of the disk.
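Here is a small simulation of how that intermixing can happen. The "always take the lowest-numbered free clusters" policy is a simplifying assumption on my part; real operating systems may choose clusters differently.

```python
FREE, END = -1, -2  # hypothetical marker values


def allocate(fat, count):
    """Claim the first `count` free clusters (count >= 1) and link them."""
    free = [i for i, entry in enumerate(fat) if entry == FREE][:count]
    if len(free) < count:
        raise OSError("not enough free clusters")
    for here, following in zip(free, free[1:]):
        fat[here] = following
    fat[free[-1]] = END
    return free


def delete(fat, first_cluster):
    """Walk a file's chain and mark every cluster in it as unused."""
    cluster = first_cluster
    while cluster != END:
        following = fat[cluster]
        fat[cluster] = FREE
        cluster = following


fat = [FREE] * 8
a = allocate(fat, 2)  # gets clusters 0, 1
b = allocate(fat, 2)  # gets clusters 2, 3
c = allocate(fat, 2)  # gets clusters 4, 5
delete(fat, b[0])     # leaves a two-cluster hole at 2, 3
d = allocate(fat, 4)  # fills the hole, then continues at 6, 7
print(d)              # [2, 3, 6, 7]
```

The last file ends up split around the file at clusters 4 and 5, even though every file was written in one piece.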

A fragmented file is one with its clusters not in order next to each other. A defragmented file is one with its clusters in order next to each other. Either way, the file uses the same number of clusters (the same amount of space is used up on the disk).
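A quick sketch of that comparison, using the same kind of toy table. The FREE and END markers and the two example layouts are made up for illustration.

```python
FREE, END = -1, -2  # hypothetical marker values


def chain(fat, first_cluster):
    """Return the list of cluster numbers that make up one file."""
    found = []
    cluster = first_cluster
    while cluster != END:
        found.append(cluster)
        cluster = fat[cluster]
    return found


def is_fragmented(fat, first_cluster):
    """True if any cluster is not immediately after the previous one."""
    c = chain(fat, first_cluster)
    return any(b != a + 1 for a, b in zip(c, c[1:]))


# The same four-cluster file, laid out two different ways.
scattered = [FREE, FREE, 3, 6, FREE, FREE, 7, END]   # clusters 2, 3, 6, 7
contiguous = [FREE, FREE, 3, 4, 5, END, FREE, FREE]  # clusters 2, 3, 4, 5

print(is_fragmented(scattered, 2), len(chain(scattered, 2)))    # True 4
print(is_fragmented(contiguous, 2), len(chain(contiguous, 2)))  # False 4
```

In this model, both layouts use four clusters; only the positions differ.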

Update

6/15/2008 -- My main argument was that a file will use the same amount of disk space whether it's fragmented or defragmented. However, another user, Kainaw, suggested that modern file systems do not use file allocation tables, and that the methods they do use would cause fragmented files to use more disk space. Until I find time to investigate further, I'm uncertain whether fragmented files use more disk space in modern file systems.

Example file systems

Here are some simple file systems that I have come across.

VMS Flashrom from Dreamcast Programming
mc.pp.se/dc/vms/flashmem.html
The file system of a Dreamcast Visual Memory card. This file system has no sub-directories.

Yamaha PSR-225 Bulk Dump Format - Song data decoded
rnhart.net/articles/bulk-dump.htm#song
In this memory dump of my piano, a simple file system stores the user songs. The first section and the section I called "beginning blocks" is like a root directory. (There are no sub-directories in this file system.) The section I called "next blocks" is like a file allocation table. The "block data" is where the main data of the files is stored.

Link

fragmentation
en.wikipedia.org/wiki/Wikipedia:Reference_desk/Archives/Computing/2008_June_11#fragmentation
The Wikipedia Reference Desk question where the discussion started.