This blog post is about how to use C (UNIX source code) to re-implement the command line ls
utility for UNIX systems. I enjoy hacking and find Operating Systems interesting so I wanted to re-implement ls
without using #include <sys/stat.h>
.
sys/stat.h
is overpowered in my opinion, and takes the fun out of using ls
within C
source code. I wanted to develop a version of ls
from first principles to absorb OS concepts like file i-node numbers. I also wanted to provide a simple solution rather than convoluted examples which are hard to understand.
The command we want to use should have the syntax:
$ ./ls .
.
on the command line is a hidden file that is present in UNIX filesystems that represents the directory path of the current folder. In this example, we will be listing all the files within a folder, including the meta files .
and ..
for simplicity. ..
in UNIX represents the file path of the parent directory.
The first thing we need to do is ensure our main
function only accepts a single filepath argument.
#include <stdio.h>
int main(int argc, char *argv[]) {
if (argc != 2) {
printf("Please enter 1 argument for the directory to list.\n");
return -1;
}
}
The above code checks the argument count (argc
) equals 2 and exits the program with a -1 return code if not.
You can see that to enable printf
we have #include <stdio.h>
.
We will be using #include <dirent.h>
for accessing the dirent
type, opendir
and readdir
commands. For accessing the files in the directory, we need to open the directory using opendir
and specify the folder path as a string.
#include <dirent.h>
#include <stdio.h>
int main(int argc, char *argv[]) {
if (argc != 2) {
printf("Please enter 1 argument for the directory to list.\n");
return -1;
}
DIR *directory = opendir(argv[1]);
if (!directory) {
printf("The directory doesn't exist or is unreadable.\n");
return -1;
}
}
We have also added error handling to ensure the directory was successfully opened. !directory
is a really simple NULL
check. NULL
is equivalent to 0
in C, and 0
resolves to the BOOL
value NO
. if (!directory)
thus means if directory is not NULL
.
#include <dirent.h>
#include <stdio.h>
int main(int argc, char *argv[]) {
if (argc != 2) {
printf("Please enter 1 argument for the directory to list.\n");
return -1;
}
DIR *directory = opendir(argv[1]);
if (!directory) {
printf("The directory doesn't exist or is unreadable.\n");
return -1;
}
printf("Printing out directory contents of %s:\n\n", argv[1]);
struct dirent *direntry = readdir(directory);
while (direntry != NULL) {
printf("Filename: %s\ninode number: %llu\n\n", direntry -> d_name, direntry -> d_ino);
direntry = readdir(directory);
}
}
We now complete the implementation with recursive readdir
. direntry
is only guaranteed to have 2 members as part of the POSIX standard (Source): d_name
and d_ino
. These correspond to the file name and file inode number respectively.
inodes are pointers to disk memory where the files are actually stored.
Files are stored as byte blocks e.g. 1024 bytes, addressed using data block pointers. The inode stores the memory addresses of these data blocks for fast access in non-hierarchical file systems.
The inode numbers have different formats depending on the filesystem. They could include 6 direct pointers, 1 singly indirect block and 1 doubly indirect block, for example. These chained index blocks are essentially triply or doubly linked lists and exponentially increase the maximum file size for an Operating System. Small files can be addressed using direct pointers only.
The advantages for inodes are faster look-up times for files due to smaller search radii, caching, and close proximity to data. We can store the inodes close to the files on disk and this lookup is further sped up in custom-built SoCs.
gcc readdir.c -o ls
./ls .
Fingers crossed...
Printing out directory contents of .:
Filename: .
inode number: 1921047
Filename: ..
inode number: 1915157
Filename: readdir.c~
inode number: 14415621
Filename: .DS_Store
inode number: 9054655
Filename: ls.dSYM
inode number: 14476348
Filename: readdir.c
inode number: 14429305
Filename: ls
inode number: 14487301
I've adapted the code for getting additional file information from the man pages for stat. Resave the file as readdir_stat.c
to get the new stat features:
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#define DEST_SIZE 100
int print_stat(const char *directory_path, char *entry_name) {
struct stat sb;
char filepath[DEST_SIZE];
strcpy(filepath,directory_path);
strcat(filepath,(const char *) "/");
strcat(filepath,entry_name);
strcat(filepath,(const char *) "\0");
if (stat(filepath, &sb) == -1) {
printf("stat failed");
return -1;
}
printf("File type: ");
switch (sb.st_mode & S_IFMT) {
case S_IFBLK: printf("block device\n"); break;
case S_IFCHR: printf("character device\n"); break;
case S_IFDIR: printf("directory\n"); break;
case S_IFIFO: printf("FIFO/pipe\n"); break;
case S_IFLNK: printf("symlink\n"); break;
case S_IFREG: printf("regular file\n"); break;
case S_IFSOCK: printf("socket\n"); break;
default: printf("unknown?\n"); break;
}
printf("Mode: %lo (octal)\n", (unsigned long) sb.st_mode);
printf("Link count: %ld\n", (long) sb.st_nlink);
printf("Ownership: UID=%ld GID=%ld\n", (long) sb.st_uid,
(long) sb.st_gid);
printf("Preferred I/O block size: %ld bytes\n", (long) sb.st_blksize);
printf("File size: %lld bytes\n", (long long) sb.st_size);
printf("Blocks allocated: %lld\n", (long long) sb.st_blocks);
printf("Last status change: %s", ctime(&sb.st_ctime));
printf("Last file access: %s", ctime(&sb.st_atime));
printf("Last file modification: %s\n", ctime(&sb.st_mtime));
return 0;
}
We can use the helper function in main
after some tweaks for error handling. I like using helper functions because they make code clearer and easy to partition.
int main(int argc, char *argv[]) {
if (argc != 2) {
printf("Please enter 1 argument for the directory to list.\n");
return -1;
}
DIR *directory = opendir(argv[1]);
if (!directory) {
printf("The directory doesn't exist or is unreadable.\n");
return -1;
}
printf("Printing out directory contents of %s:\n\n", argv[1]);
struct dirent *direntry = readdir(directory);
struct stat sb;
while (direntry != NULL) {
printf("Filename: %s\ninode number: %llu\n", direntry -> d_name, direntry -> d_ino);
int code = print_stat(argv[1], direntry -> d_name);
if (code != 0) {
return code;
}
direntry = readdir(directory);
}
return 0;
}
Note here how I have used the -g
flag to get debugging symbols. We can debug the program with LLDB if we want to using this flag. I used this to fix the dreaded [1] 82115 segmentation fault ./ls_stat .
error message. This is a SIGSEGV Signal i.e Segmentation Fault that happens when we access memory out of the bounds of our program.
$ gcc -g readdir_stat.c -o ls_stat
$ ./ls_stat .
# LLDB
$ lldb ./ls_stat .
Using our new ls_stat
function:
Printing out directory contents of .:
Filename: .
inode number: 12944739708
File type: directory
Mode: 40755 (octal)
Link count: 13
Ownership: UID=501 GID=20
Preferred I/O block size: 4096 bytes
File size: 416 bytes
Blocks allocated: 0
Last status change: Thu Aug 26 14:28:10 2021
Last file access: Thu Aug 26 14:37:34 2021
Last file modification: Thu Aug 26 14:28:07 2021
Filename: ..
inode number: 12966193218
File type: directory
Mode: 40755 (octal)
Link count: 9
Ownership: UID=501 GID=20
Preferred I/O block size: 4096 bytes
File size: 288 bytes
Blocks allocated: 0
Last status change: Thu Aug 26 14:28:08 2021
Last file access: Thu Aug 26 14:28:07 2021
Last file modification: Thu Aug 26 14:28:07 2021
Filename: ls_stat
inode number: 13080755973
File type: regular file
Mode: 100755 (octal)
Link count: 1
Ownership: UID=501 GID=20
Preferred I/O block size: 4096 bytes
File size: 50312 bytes
Blocks allocated: 104
Last status change: Thu Aug 26 14:28:10 2021
Last file access: Thu Aug 26 14:28:15 2021
Last file modification: Thu Aug 26 14:28:10 2021
Filename: ls_stat.dSYM
inode number: 13080755326
File type: directory
Mode: 40755 (octal)
Link count: 3
Ownership: UID=501 GID=20
Preferred I/O block size: 4096 bytes
File size: 96 bytes
Blocks allocated: 0
Last status change: Thu Aug 26 14:22:42 2021
Last file access: Thu Aug 26 14:23:55 2021
Last file modification: Thu Aug 26 14:22:42 2021
Filename: readdir.c~
inode number: 13080351695
File type: regular file
Mode: 100644 (octal)
Link count: 1
Ownership: UID=501 GID=20
Preferred I/O block size: 4096 bytes
File size: 606 bytes
Blocks allocated: 8
Last status change: Thu Aug 26 13:39:43 2021
Last file access: Tue Aug 24 14:03:13 2021
Last file modification: Sun Aug 22 18:49:25 2021
Filename: readdir_stat.c~
inode number: 13080749917
File type: regular file
Mode: 100644 (octal)
Link count: 1
Ownership: UID=501 GID=20
Preferred I/O block size: 4096 bytes
File size: 654 bytes
Blocks allocated: 8
Last status change: Thu Aug 26 13:47:58 2021
Last file access: Thu Aug 26 13:44:50 2021
Last file modification: Thu Aug 26 13:44:46 2021
Filename: readdir.c
inode number: 13080746629
File type: regular file
Mode: 100644 (octal)
Link count: 1
Ownership: UID=501 GID=20
Preferred I/O block size: 4096 bytes
File size: 628 bytes
Blocks allocated: 8
Last status change: Thu Aug 26 13:39:45 2021
Last file access: Thu Aug 26 13:39:46 2021
Last file modification: Thu Aug 26 13:39:43 2021
Filename: readdir_stat.c
inode number: 13080750376
File type: regular file
Mode: 100644 (octal)
Link count: 1
Ownership: UID=501 GID=20
Preferred I/O block size: 4096 bytes
File size: 2342 bytes
Blocks allocated: 8
Last status change: Thu Aug 26 14:28:03 2021
Last file access: Thu Aug 26 14:28:05 2021
Last file modification: Thu Aug 26 14:28:03 2021
Filename: ls
inode number: 13080725018
File type: regular file
Mode: 100755 (octal)
Link count: 1
Ownership: UID=501 GID=20
Preferred I/O block size: 4096 bytes
File size: 50047 bytes
Blocks allocated: 104
Last status change: Thu Aug 26 11:48:25 2021
Last file access: Thu Aug 26 11:48:25 2021
Last file modification: Sun Aug 22 19:07:33 2021
Also, you can see we now have .dSYM
files in our directory which are debugging symbols. This is because we used the -g
flag earlier.
Looks like a lot more than we'd need in day to day coding, but we can now use this whenever we want to instead of ls
. Feel free to customise this because you probably won't need anything too different to the pre-baked ls
for most things (it's very well documented and designed).
I hope you enjoyed playing around with UNIX source code. I found the project fun because I learnt something useful outside of iOS development.
PS: I implemented ls
using readdir_r
(recursive ➰) and this also performed the same. I abandoned this code snippet because the iterative solution was simpler to reason about.