I/O and File descriptors

A fd is a small integer representing a kernel-managed object that a process may read/write. A process obtains a fd by opening a file, directory, device, pipe, or socket. In unix, a file has different types that what it currently represents.

Every process has a private space of file descriptors starting at zero. Xv6 kernel uses fd as an index into a per-process table.

By convention, fd 1(standard input), fd 2(standard output), fd3 (standard error). xv6 shell ensure the 3 fds are open when starting up.

read and write system calls advance the file by offset (how much it reads or writes), so next time, the read/write starts from the new offset.

One interesting aspect is program like cat does not know whether it is reading from file, console, socket, or pipe. It does not know whether it is printing to a console, or a file. The use of fds and the conventions allows some simple implementation of user programs.

close releases fd, making it free for reuse. A newly allocated fd is always the lowest-numbered unused descriptor of the current process. Unix shell utilizes this feature with fork, to implement smart I/O redirection, and implement pipe.

I/O redirection

fork copies parent’s file descriptor tables, with its memory. The system call exec replaces the calling process’ memory but preserves its file table.

Example how command cat < input.txt works

char *argv[2];
argv[0] = “cat”;
argv[1] = 0;
if(fork() == 0) {
  close(0);
  open(“input.txt”, O_RDONLY);
  exec(“cat”, argv);
}

After the child closes file descriptor 0, open is guaranteed to use that file descriptor for the newly opened input.txt: 0 will be the smallest available file descriptor. Cat then executes with file descriptor 0 (standard input) referring to input.txt

fork copies fd table, but file offset is shared.

The underlying file offset is shared between parent and child.

if(fork() == 0) {
  write(1, “hello “, 6);
  exit(0);
} else {
  wait(0);
  write(1, “world\en”, 6);
}

dup system call is similar. Duplicates an existing file descriptor, returning a new fd that refers to the same I/O object. Both file descriptors share the an offset.

fd = dup(1);
write(1, “hello “, 6);
write(fd, “world\en”, 6);

Do you know why the file offset is shared?

fd is basically an index pointing to an actual file object. The file is shared between parent, and child. After fork or dup, the reference count of the file is incremented.

Fun Fun Command

ls existing-file non-existing-file > tmp1 2>&1 The 2>&1 tells the shell to give the command a file descriptor 2 that is a duplicate of descriptor 1. Both the name of the existing file and the error message for the non-existing file will show up in the file tmp1.

Summary

File descriptors are a powerful abstraction, because they hide the details of what they are connected to: a process writing to file descriptor 1 may be writing to a file, to a device like the console, or to a pipe.

Last updated