Design Inode Layer
Inode has 2 related meanings:
1. On-disk data structure containing a file’s size and list of data block numbers.
2. In-memory node, which contains a copy of the on-disk inode as well as extra info needed by kernel.
On-disk inode
inode
The on-disk inodes are packed into a contiguous area of disk. Every one is same size. struct dinode
type
could be file, directory, device, or 0 (means on-disk node is empty). nlink
counts the number of directory entries that refer to this node.
In-memory inode
inode
struct inode
The kernel stores an inode in memory only if there are C pointers referring to that node.
The ref
field counts the number of C pointers referring to the in-memory inode. Kernel discards the inode from memory if reference count drops to 0.
The iget
and iput
functions acquire and release pointers to an inode, modifying the reference count.
Locking in inode
(4 kinds)
inode
(4 kinds)Big lock on in-memory inodes array
icache.lock
protects the invariant that an inode is present in the cache at most once.
Lock on each in-memory inode
Each in-memory inode has a lock field containing a sleep lock, which ensures exclusive access to inode’s important data.
Inode reference count
If a ref
on inode is > 0, system will maintain the inode in the cache, and the cache entry will not be re-used.
Number of links
Each inode contains link
field (stored in disk, copied in memory), counts the number of directory entries that refer to a file. Xv6 won't free an inode if its link count is greater than 0.
Separate reference count and locking
Separating acquisition of inode pointers from locking helps avoid deadlock. Multiple processes can hold a C pointer to an inode returned by iget, but only one process can lock the inode at a time.
iget
only increments reference count. ilock
acquires the inode’s lock.
Why we have in-memory cache of inode?
Main job is to providing synchronizing access by multiple process. Only one process can access the inode at a time. Second is caching. If an inode is frequently used, the buffer cache might keep it at the beginning of its chain if it isn't kept by inode cache.
The inode cache is write-through. Any code modifies the cached inode must immediately write it to disk (iupdate
).
Implement Inodes
Allocate a new inode
Similar to balloc
for allocating a disk block. It loops over the inode structures on disk, one at a time, locking for one that is marked free. When finds one, it claims by writing the new type
. Return an entry from the in-memory inode cache. (Calling iget
)
Get inode from cache
iget
looks thought the in-memory inode cache for an active entry. Bump the ref count and return. Or claim a new empty one.
Release a reference count on in-memory inode
Drop the reference count for an in-memory inode. If no more reference, and number of directory links is 0, free the inode on disk.
Locking when freeing inode
When nlink
is 0, the inode is not trace-able anymore, so not other process can use path to locate this inode. We do not need to worry just before truncating, some other process grab and add more content to it.
Every FS system calls can write to disk
iput
can write to disk, when inode should be freed. So even a read
could write to disk. We must wrap ALL system calls into transactions if they use the file system.
Handle orphan inodes
There is a challenging interaction between iput()
and crashes.
iput()
doesn’t truncate a file immediately when the link count for the file drops to zero (it tries to acquire the lock first), because some process might still hold a reference to the inode in memory: A process might still be reading and writing to the file, because it successfully opened it.
If crash happened in the meantime (before fd
is closed), the file will be marked allocated on disk still but no directory entry will point to it.
How to clean those un-linked inode?
Solution 1: On recovery, scans the whole file system, for files that are marked as allocated, but no directory entry pointing to them. Free them.
Solution 2: Track those 'link count drops to 0 but still has reference count' files on disk. On recovery, scan the list, and free them.
Last updated
Was this helpful?