📂
build a OS
  • Learn OS with me
  • OS Interfaces
    • OS interfaces
    • I/O and File descriptors
    • Process and Memory
    • Pipes
    • File
  • OS Organization
    • OS Organization
    • Challenge yourself
  • Memory Management
    • XV6 Virtual Memory
    • Page Table
      • Part 1: How to translate address
      • Part 2: Create an Address Space
      • Part 3: How Page Table is used
      • Part 4: Page Fault and Swap
      • Part 5: How to operate on page tables
    • xv6 buddy allocator
      • How to display physical memory
    • Memory Management Walk Through
  • Traps and Interrupts
    • Trap Home Page
      • 系统调用的核心原理
    • What is trapframe
    • What is trampoline
    • Traps from kernel space
    • How fork() works
    • How system calls get into/out of the kernel
    • How exec() works
  • Scheduling
    • XV6 CPU Scheduling
    • How unix pipes work?
    • How does wait(), exit(), kill() work?
  • File System
    • Overview and Disk Layout
    • Buffer Cache
    • Design Inode Layer
    • Inode Content
    • Block Allocator
    • Design a log system for crash recovery
    • Directory Layer
    • Path names
    • File Descriptor Layer
    • FS System Calls
    • XV6 VS Real World
    • Make Xv6 File disk management system
    • Write FS simulator in python
    • How Redirect Shell command works
  • Concurrency
    • Spinlock
    • How linux select work
    • Hardware Support Locking
    • Exercise: Implement atomic counter
    • Locking in Xv6
    • Concurrency in Xv6
    • Exercise: Socket Programming with Event loop
  • Labs
    • Lab 1 Xv6 and Unix utilities
    • Lab 2 Shell
    • Lab 3 Buddy Allocator
    • Lab 4 Lazy
    • Lab 5 Copy-on-Write Fork for xv6
    • Lab 6 RISC-V assembly
    • Lab 6 Uthread: switching between threads
    • Lab 6 Alarm
    • Lab 7 Lock
    • Lab 8 File System: Large Files
    • Lab 8 File System: Symbolic links
    • Lab 9 mmap
    • Lab 10 Networking Part 1
    • Lab 10 Networking Part 2
  • Hardware, Device, Assembly
    • RISC-V assembly
    • Assembly: Access and Store information in Memory
    • Start xv6 and the first process
    • Why first user process loads another program?
    • What does kernel.ld do in XV6?
    • XV6 Device Driver
Powered by GitBook
On this page
  • Crash in the file system can lead to inconsistent state.
  • Log Header
  • Group Commit
  • Log Disk Space
  • Data structure
  • Pattern for using the log in system call
  • begin_op
  • ilock
  • iunlock
  • end_op
  • Fun Part

Was this helpful?

  1. File System

Design a log system for crash recovery

Crash in the file system can lead to inconsistent state.

For example: Depending on the order of disk writes, crash may leave an inode with a reference to a content block that is marked free. Or leave an allocated but not yet referenced content block.

Log Header

The header block contains an array of sector numbers, one for each of the logged blocks, and the count of log blocks. The count is either 0, indicating there is no transaction. Or non-zero, indicating the log contains a complete committed transactions with a number of logged blocks.

XV6 writes the header block when a txn commits, but not before. So it is atomic.

Group Commit

Committing several txns together. Logging system only commits when no fs system calls ongoing.

Log Disk Space

XV6 dedicates a fixed amount of space on disk to hold log. The total number of blocks written by the system calls must fit in the space. 1. No single system call can be allowed to write more distinct blocks than allowed. 2. A system call must wait until there is enough space in log then begins.

Special Case: write

XV6’s write system call breaks up large writes into multiple smaller writes that fit in the log.

Data structure

Code in log.c

// Contents of the header block, used for both the on-disk header block
// and to keep track in memory of logged block# before commit.
struct logheader {
  int n; //xiaying: number of current added block.
  int block[LOGSIZE];
};

struct log {
  struct spinlock lock;
  int start;
  int size; //xiaying: total allowed capacity.
  int outstanding; // how many FS sys calls are executing.
  int committing;  // in commit(), please wait.
  int dev;
  struct logheader lh;
};
struct log log[NDISK];

static void recover_from_log(int);
static void commit(int);

Pattern for using the log in system call

begin_op();
ilock(f->ip);
r = writei(f->ip, …);
iunlock(f->ip);
end_op();

begin_op

Wait and sleep if the log is currently committing. Sleep when log is reaching MAX allowed size. Otherwise, increment the outstanding system call counter in log metadata, and continue.

// called at the start of each FS system call.
void
begin_op(int dev)
{
  acquire(&log[dev].lock);
  while(1){
    if(log[dev].committing){
      sleep(&log, &log[dev].lock);
    } else if(log[dev].lh.n + (log[dev].outstanding+1)*MAXOPBLOCKS > LOGSIZE){
      // this op might exhaust log space; wait for commit.
      sleep(&log, &log[dev].lock);
    } else {
      log[dev].outstanding += 1;
      release(&log[dev].lock);
      break;
    }
  }
}

ilock

Lock the given inode. Reads the inode from disk if necessary.

void
ilock(struct inode *ip)
{
  struct buf *bp;
  struct dinode *dip;

  if(ip == 0 || ip->ref < 1)
    panic("ilock");

  acquiresleep(&ip->lock);

  if(ip->valid == 0){
    bp = bread(ip->dev, IBLOCK(ip->inum, sb));
    dip = (struct dinode*)bp->data + ip->inum%IPB;
    ip->type = dip->type;
    ip->major = dip->major;
    ip->minor = dip->minor;
    ip->nlink = dip->nlink;
    ip->size = dip->size;
    memmove(ip->addrs, dip->addrs, sizeof(ip->addrs));
    brelse(bp);
    ip->valid = 1;
    if(ip->type == 0)
      panic("ilock: no type");
  }
}

This part is very important, since the inode we have in-memory might not load the latest content from disk!

iunlock

Unlock the given inode by releasing the sleep lock.

void
iunlock(struct inode *ip)
{
  if(ip == 0 || !holdingsleep(&ip->lock) || ip->ref < 1)
    panic("iunlock");

  releasesleep(&ip->lock);
}

end_op

Called at the end of FS system call. Commits if this was the last outstanding operation.

void
end_op(int dev)
{
  int do_commit = 0;

  acquire(&log[dev].lock);
  log[dev].outstanding -= 1;
  if(log[dev].committing)
    panic(“log[dev].committing”);
  if(log[dev].outstanding == 0){
    do_commit = 1;
    log[dev].committing = 1;
  } else {
    // begin_op() may be waiting for log space,
    // and decrementing log[dev].outstanding has decreased
    // the amount of reserved space.
    wakeup(&log);
  }
  release(&log[dev].lock);

  if(do_commit){
    // call commit w/o holding locks, since not allowed
    // to sleep with locks.
    commit(dev);
    acquire(&log[dev].lock);
    log[dev].committing = 0;
    wakeup(&log);
    release(&log[dev].lock);
  }
}

Fun Part

B) Logging is similar to DB’s logging and transaction design.

PreviousBlock AllocatorNextDirectory Layer

Last updated 5 years ago

Was this helpful?

A) Linux ext2 -> ext3 is using the logging system. See

journaling the linux ext2fs filesystem