Files and the Linux File System

The Linux file system is a classic tree with a root node /. This is commonly called the file system root or the root directory, and should not be confused with the /root/ directory, the home directory of the root user.

As in any tree, nodes can contain children. Here, every directory, including the root directory, can contain children which may be either files or directories. Note that some files may be "special" files.

Current working directory

Linux (like many other systems) tracks the current working directory, the directory from which relative paths are computed. This is very important when running commands with relative paths, commands that take paths as optional or required arguments, or opening files from scripts or other executable programs. For example, the Python call open("hello.txt") will open the file hello.txt in the current working directory.

You may configure your shell to display the current working directory as part of the command prompt, as shown below. You can see my current working directory is set to ~/Documents/tech/books/linux-for-djangonauts/.

Current working directory shown in the command prompt

Absolute and relative paths

Absolute paths start with the root node /. Here are some examples.

/
/etc/passwd
/usr/bin/ls
/home/username/Documents/hello.txt

As long as the target node exists, absolute paths will be valid regardless of the current working directory.

Relative paths are relative to the current working directory. Here are some examples.

hello.txt
Documents/hello.txt
./hello.txt
../Pictures/dog.jpg

A relative path is only meaningful in combination with the current working directory. If the current working directory changes, the same relative path will point to a different location, so you may need to adapt it.
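As a minimal sketch of the difference, the following Python snippet resolves the same relative path from two different working directories. The helper name resolve_from is made up for this example, and the two temporary directories simply stand in for any two directories on your system.

```python
import os
import tempfile
from pathlib import Path


def resolve_from(cwd: str, relative: str) -> Path:
    """Return the absolute path that `relative` names when `cwd` is the working directory.

    resolve_from is a hypothetical helper written for this illustration.
    """
    previous = os.getcwd()
    os.chdir(cwd)
    try:
        # Path.absolute() prepends the current working directory to a relative path.
        return Path(relative).absolute()
    finally:
        os.chdir(previous)  # always restore the original working directory


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
        # The same relative path names two different files.
        print(resolve_from(first, "hello.txt"))
        print(resolve_from(second, "hello.txt"))
    # Absolute paths do not depend on the working directory at all.
    print(Path("/etc/passwd").is_absolute(), Path("hello.txt").is_absolute())
```

The two printed paths end in hello.txt but live in different directories, which is why scripts that open relative paths can behave differently depending on where they are started.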

Current and parent directories

You may be wondering why some relative directories in the list above start with ./ or ../. The . refers to the current directory, while .. refers to the parent directory. For security reasons, the current working directory is typically not available in the $PATH variable, which defines a list of directories to search for executable commands. This means that if you have a file you would like to execute in a non-standard directory, and the current working directory is set to the directory containing the file, e.g. myprogram, you cannot simply run myprogram, but will usually need to prefix it with the current directory, as in ./myprogram. Running commands and environment variables will be covered in a later chapter.
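If you are curious whether your own $PATH includes the current directory, a rough check could look like the sketch below. The function name searches_current_directory is an assumption made for this example. It treats both . and an empty entry as "the current directory", which is how POSIX shells interpret them.

```python
import os
import shutil


def searches_current_directory(path_value: str) -> bool:
    """Return True if a $PATH-style string includes the current directory.

    Hypothetical helper for illustration: an entry of "." or an empty entry
    both mean "search the current working directory".
    """
    return any(entry in ("", ".") for entry in path_value.split(os.pathsep))


if __name__ == "__main__":
    current_path = os.environ.get("PATH")
    if current_path is not None:
        print("current directory in $PATH:", searches_current_directory(current_path))
    # shutil.which performs the same kind of lookup the shell does for a command name.
    print("ls resolves to:", shutil.which("ls"))
```

On most systems the first line should report False, which is exactly why ./myprogram needs the explicit ./ prefix.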

The home directory

The home directory is the directory that contains the user's personal files. The home directory is typically /home/username, except for the special case of the root user whose home directory is /root.

The home directory is often referred to as ~ or ~/, which the shell expands to the current user's home directory. If your username is username and your home directory is /home/username, then ~/ will expand to /home/username/.

Things you could keep in your home directory include:

Photos
Documents
Videos
Downloads
Software projects
Configuration files (commonly found under ~/.config/ or a subdirectory)

Where to save your work

Choosing where to save your work is very much a question of personal preference. I personally use ~/projects/ with a subdirectory for each project I work on, but here are some alternatives:

~/work/
~/projects/django/, along with other subdirectories for different languages or frameworks.
~/src/

Throughout this book, I will be using ~/projects/ with a subdirectory for the project name, e.g. ~/projects/foo/.

Hidden files and directories

By default, commands like ls and tree, as well as file browsers, hide files and directories whose names start with a dot (.). This can be useful to keep your directories cleaner.

For example, you can name your virtual environment .env or .venv to hide it from view.

$ python -m venv .venv
$ source .venv/bin/activate
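Hidden files are a naming convention rather than a special file type, so programs have to check the name themselves. As a small sketch, the hypothetical helper split_hidden below separates the entries of a directory into visible and hidden names, much like ls does when deciding what to show.

```python
import tempfile
from pathlib import Path


def split_hidden(directory: Path) -> tuple[list[str], list[str]]:
    """Return (visible, hidden) entry names for a directory.

    split_hidden is a made-up helper; "hidden" here only means that the
    name starts with a dot.
    """
    visible, hidden = [], []
    for entry in directory.iterdir():
        if entry.name.startswith("."):
            hidden.append(entry.name)
        else:
            visible.append(entry.name)
    return sorted(visible), sorted(hidden)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)
        (project / ".venv").mkdir()       # hidden: name starts with a dot
        (project / "manage.py").touch()   # visible
        print(split_hidden(project))
```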

Commonly used directories

Most Linux distros use fairly common directory layouts. Note that based on your distro, your actual directory layout may be different, but some of the most common directories include:

/bin which contains executable programs.
/boot which contains boot loader files.
/dev which contains device files, such as those representing hardware devices.
/etc which contains configuration files.
/home which contains the users' home directories.
/lib which contains shared libraries.
/media which contains removable media.
/mnt which contains mount points.
/opt which contains optionally installed software. It's a good place to install programs not installed via a package manager.
/proc which contains process information.
/root which contains the root user's home directory.
/run which contains runtime files, such as UNIX sockets and PID files.
/sbin which contains system binaries.
/srv which contains data for services provided by the system, e.g. files served by a web server.
/sys which contains kernel and device information exposed by the kernel.
/tmp which contains temporary files. These files may be deleted at any time, especially upon rebooting.
/usr which contains user programs. Many distros install packages in this directory and subdirectories.

Phew! That's a lot of directories.

Notes

Many of these directories contain subdirectories. For example, /usr/include/ contains header files while /etc/nginx/ may contain configuration files for the nginx web server.
A few of these directories are kept mostly for compatibility on modern distros. /bin/ and /sbin/ are now commonly symbolic links into /usr/, and on some distros /sbin/ and /usr/sbin/ also link to /usr/bin/, where executables are actually installed.
Most distros implement the Filesystem Hierarchy Standard, which is maintained by the Linux Foundation. It's a pretty long technical document, but you will find it linked below.
The Cross-Desktop Group also maintains the XDG Base Directory Specification which defines where certain files, e.g. user and program data and configuration, should be located.
You can also find more information in the file-hierarchy and hier man pages.

File permissions

File (and directory) permissions are used to control access to files and directories. Every file or directory has a numeric owner and group ID, and three sets of permissions: user (owner) permissions, group permissions, and other permissions. Each set is further subdivided into three different permissions: read, write, and execute.

File permissions may be represented by a string, or an octal (numeric with base 8) value. For example, calling the list command with the long flag (ls -l) will list the permissions of each list entry as a string. The string is generally 10 characters long, starting with one character for the file type, followed by three groups of three permission characters. For example, a user alice may see the following output:

$ ls -l /home/
drwx------  2 alice alice 4096 Jan 1 00:00 alice/

The drwx------ permission string can be split up as follows. The first character (d) indicates that the entry alice/ is a directory. Its absolute path is /home/alice/, as we listed the contents of the /home/ directory.

The following three characters (rwx) indicate that the directory is readable, writable, and executable by the owner. We can also see that based on the third field in the listing (alice), the file is owned by the user alice.

The next three characters (---) indicate the group permissions. A dash (-) indicates that a permission is denied, so in this case, members of the group alice (as shown by the fourth field in the listing) may not read, write, or execute the directory. Note that while the user alice does not get permissions from their group membership, they are granted permissions from their ownership, so in this case they have all permissions.

Finally, the last three characters (---) indicate the other permissions. In this case, other means that the user is not alice and they are not a member of the group alice. They likewise may not read, write, or execute the directory.

As previously mentioned, permissions may also be represented by an octal value. The value is three digits long, with each digit between zero and seven. Each digit is a sum of permissions: the first digit indicates the user permissions, the second digit indicates the group permissions, and the last digit indicates the other permissions. The possible values are 1 for the executable bit, 2 for the writeable bit, and 4 for the readable bit.

In our last example, the octal permission value would be 700, as the owner has all permissions (read (4), write (2), and execute (1)), while the group and others have no permissions. A file may be read-only (4), readable and writeable (6), or readable, writeable and executable (7). It can also be readable and executable (5), but it is uncommon for a file to be executable or writeable, but not readable.

Note: as we will see later, octal permissions are actually four digits long. When omitted, the first digit is implicitly set to zero.
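The arithmetic described above can be sketched in a few lines of Python. The function to_octal is a hypothetical helper written for this illustration. It takes the nine permission characters (without the leading file type character) and sums 4, 2 and 1 for each group of three.

```python
def to_octal(permission_string: str) -> str:
    """Convert nine permission characters such as 'rwxr-x---' to digits such as '750'.

    to_octal is a made-up helper; it only handles the plain r, w, x and - characters.
    """
    if len(permission_string) != 9:
        raise ValueError("expected nine characters, e.g. 'rwxr-xr-x'")
    values = {"r": 4, "w": 2, "x": 1, "-": 0}
    digits = []
    for start in (0, 3, 6):  # user, group, other
        group = permission_string[start:start + 3]
        digits.append(str(sum(values[char] for char in group)))
    return "".join(digits)


if __name__ == "__main__":
    for example in ("rwx------", "rw-r--r--", "rwxr-xr-x"):
        print(example, "->", to_octal(example))
```

For the drwx------ listing above, dropping the leading d and converting rwx------ gives 700, matching the value worked out by hand.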

What executable means

The executable bit works differently for files and directories. If a directory is not executable, you cannot enter it, and you may be unable to read the attributes of its entries or create new entries. For example:

$ cd /tmp/ # change to the temporary directory to keep our file system clean
$ mkdir test1 # create a directory
$ touch test1/something # create a file in our new directory
$ chmod 644 test1 # remove the executable bit from the test1 directory
$ ls -alh test1/ # list the contents of test1
ls: cannot access 'test1/.': Permission denied
ls: cannot access 'test1/..': Permission denied
ls: cannot access 'test1/something': Permission denied
total 0
d????????? ? ? ? ?            ? ./
d????????? ? ? ? ?            ? ../
-????????? ? ? ? ?            ? something

Note that anything after a # in the commands listed above is a comment, and ignored by the shell. As you can see, ls shows very limited information. It can list the contents of the test1 directory, but it cannot list the owner and group, size, permissions, or modification time. While this may seem confusing, simply remember that directories should be executable and you should be golden.

For a file, the executable bit must be enabled to allow running the file as a program. There are two commonly used kinds of executable files: ELF (Executable and Linkable Format) files and script files. ELF files are binaries produced by compiling code written in languages such as C and Go. Script files are written in a scripting language like bash, fish, or Python, and executed by a shell or language interpreter. Package managers typically install files with the proper permissions, and commands like django-admin also typically create new projects with the required permissions (e.g. making the manage.py file executable), but you may need to enable the executable bit by yourself on files you create, e.g. when writing a script file.

As an example, let's look at the ls command.

$ file /usr/bin/ls
/usr/bin/ls: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2
$ ls -lh /usr/bin/ls
-rwxr-xr-x 1 root root 135K Oct  6 08:15 /usr/bin/ls*

The file command shows us it is a binary in the ELF format, while the ls command shows us that it is executable by any user. We also ran the ls command, which we could not do if it were not executable.

$ cp /usr/bin/ls /tmp/ # copy the ls command to /tmp/
$ chmod 644 /tmp/ls # remove the executable bit
$ /tmp/ls # try running the command
bash: /tmp/ls: Permission denied
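The reverse operation, enabling the executable bit, can also be done from Python. The sketch below uses a hypothetical helper called make_executable that adds the execute bit for user, group and others while leaving the other permission bits alone, roughly what chmod a+x does.

```python
import os
import stat
import tempfile


def make_executable(path: str) -> int:
    """Add the execute bit for user, group and others; return the new permission bits.

    make_executable is a made-up helper for this example.
    """
    current = stat.S_IMODE(os.stat(path).st_mode)  # keep only the permission bits
    os.chmod(path, current | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "script.py")
        with open(script, "w") as f:
            f.write('print("hello")\n')
        os.chmod(script, 0o644)                # start without the executable bit
        print(oct(make_executable(script)))    # 644 with all execute bits added
```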

The whole shebang

While ELF files can be run natively, script files require some kind of interpreter. While you may be used to file associations deciding which program should open a specific file, the POSIX specifications used by Linux make things... complicated. I will not be going into details at this time, but assume that if you save a Python file foo.py in the current working directory, make it executable, and run it as ./foo.py, the shell will likely use a shell execution environment, and use a program such as sh as an interpreter. This is unlikely to interpret Python code correctly, and will probably throw a bunch of errors. As such, you should run it as python foo.py, or use a shebang inside your script to specify which interpreter to use.

What is a shebang, you might wonder? It is a special first line that tells the system which interpreter should run your script.

It must be the first line of your script.
It must start with the characters #!.
It must be followed by the path to the interpreter.

For example, you can use the following to ensure that a script is executed by the Python interpreter rather than the default shell interpreter:

#!/usr/bin/python
print("Hello from python")

It can also be useful to run Python with extra arguments. For example, you might want to use the -i flag to run the script in interactive mode while testing a script:

#!/usr/bin/python -i
print("Hello from Python")

This will print Hello from Python, and then enter the Python REPL shell. While this example is silly, it may be useful if you import Django models, load data from a database, etc. Any declared variables and imports will then be available in the Python shell. This is particularly helpful when debugging a function or when you want to initialize some data.

You can also use the /usr/bin/env command to look up the interpreter based on the $PATH environment variable. For example, the following would allow you to run the system Python interpreter by default, or the interpreter inside your active virtual environment:

#!/usr/bin/env python
print("Hello from python")

Note: You may need to use the -S flag to tell env to split its arguments when passing multiple arguments in the shebang, e.g.:

#!/usr/bin/env -S python -i
print("Hello from Python")

Otherwise, env may try to execute an executable called python -i rather than python with the -i argument, because everything after the interpreter path may be passed along as a single argument. See the linked Stack Overflow question for more information.
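To make the argument splitting issue more concrete, here is a rough approximation of how a shebang line is taken apart on Linux. The function parse_shebang is a hypothetical helper, not something the system provides. It is a simplified sketch, assuming that everything after the interpreter path is handed over as one single argument.

```python
def parse_shebang(first_line: str):
    """Split a shebang line into (interpreter, argument), or return None.

    parse_shebang is a made-up helper that approximates the Linux behaviour:
    the text after the interpreter path is kept as ONE argument, not split
    on spaces.
    """
    if not first_line.startswith("#!"):
        return None
    parts = first_line[2:].strip().split(None, 1)
    if not parts:
        return None
    interpreter = parts[0]
    argument = parts[1] if len(parts) > 1 else ""
    return interpreter, argument


if __name__ == "__main__":
    print(parse_shebang("#!/usr/bin/python -i\n"))
    print(parse_shebang("#!/usr/bin/env python -i\n"))     # env receives "python -i" as one name
    print(parse_shebang("#!/usr/bin/env -S python -i\n"))  # -S asks env to split the rest itself
    print(parse_shebang('print("no shebang here")\n'))
```

In the second case env is handed the single string python -i, which is why the -S flag is needed to have it split into a command and its arguments.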

Setuid, setgid, and the sticky bit

As we briefly covered before, octal permissions are four digits long, though the leading zero is often omitted. As such, 0644 is the same as 644.

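The first digit may also be a sum of the values 4 for the setuid bit (shown as s in the user permissions of the string), 2 for the setgid bit (shown as s in the group permissions), and 1 for the sticky bit (shown as t in the other permissions). If the corresponding executable bit is not set, these are shown as uppercase S or T instead.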

When set on an executable file, the setuid and setgid bits tell the kernel to run the file using the user ID of the file's owner and the group ID of the file's group, respectively. For example, we can test this with the Python REPL. First, copy the python executable with the command cp -iv /usr/bin/python3.11 ~/pybits (make sure to adjust the path accordingly for your Python version). Then change the owner and group by running sudo chown 5000:2000 ~/pybits. This will set the uid to 5000, and gid to 2000. Note that these IDs are likely to be unused on your system, but this does not matter. Finally, let's enable both the setuid and setgid bits with sudo chmod 6755 ~/pybits. Once done, let's take a look at the new permissions.

$ ls -lh ~/pybits
-rwsr-sr-x 1 5000 2000 14K Jan  3 19:30 ~/pybits*

As we can see, rather than x for executable, the user and group permissions now show s instead, and the uid and group are set to 5000 and 2000, respectively. As there are no corresponding entries in the user and group databases, the IDs are shown numerically. If we run the ~/pybits command, we can now revisit the os module as covered in Users and Permissions.

>>> import os
>>> print(os.getuid(), os.geteuid())
1000 5000
>>> print(os.getgid(), os.getegid())
1000 2000

As we can see, our effective user and group IDs have been changed.

Note that this may lead to security issues, so the setuid and setgid bits are typically ignored on executable scripts, but they can be useful to grant certain extra permissions. The sudo command usually has the setuid bit set to elevate privileges and allow running commands as the superuser.

The sticky bit is commonly set on directories so that only the file owner, directory owner, or superuser can delete files inside the directory. For example, this is very useful for the /tmp/ directory, which is world-writeable by default: the sticky bit ensures that user eve cannot delete files owned by user alice.
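If you want to check how a four-digit octal value will look in a listing, Python's stat.filemode function can render it for you. The wrapper mode_string below is a hypothetical helper for this example. It simply adds the file type bits that stat.filemode expects before converting.

```python
import stat


def mode_string(permissions: int, directory: bool = False) -> str:
    """Render four-digit octal permissions the way ls -l would show them.

    mode_string is a made-up helper; stat.filemode needs the file type bits,
    so a regular file or a directory type is added first.
    """
    file_type = stat.S_IFDIR if directory else stat.S_IFREG
    return stat.filemode(file_type | permissions)


if __name__ == "__main__":
    print(mode_string(0o6755))                  # setuid and setgid on an executable file
    print(mode_string(0o4644))                  # setuid without the executable bit
    print(mode_string(0o1777, directory=True))  # sticky, world-writeable directory
    print(mode_string(0o0644))                  # no special bits set
```

The first line matches the -rwsr-sr-x string from the pybits listing, and the third is the form commonly seen on /tmp/.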

File attributes

Some file systems further allow you to set file attributes and even extended attributes. This is beyond the scope of this book, but as a quick example, a file that is writeable by some user but has the a attribute set may be opened for writing in append mode, but not in write mode. See the linked chattr and xattr man pages at the end of this chapter.

Mounting file systems

Linux supports a variety of different types of file systems. The most common ones are in the ext family (ext2, ext3, ext4), but btrfs and xfs are other popular options. If you are looking to format a drive or a partition for usage with Linux and are unsure which file system to use, ext4 is always a safe choice.

Linux can also mount many file systems designed for use with other operating systems, but as they may lack features like proper file permissions, they should not be used to install Linux.

Finally, some file systems may require extra steps prior to mounting. For example, a file system encrypted via dm-crypt must be unlocked before it can be mounted.

To mount a file system, you can use the mount command, but you can also use the /etc/fstab file to declare which file systems should be mounted automatically. To mount a file system, you must provide at least a source and destination. The source will typically be a device, such as /dev/sda1, while the destination will be a directory. It is common to use subdirectories of the /mnt/ directory, but you may also mount a drive in /home/ if you would like to share your files across different Linux installs, /var/ to dedicate an entire drive or partition to variable files if you require extra storage for certain applications, etc.

Note: not every mount needs a block device as its source. While file systems typically need to be mounted as superuser, you can also use Filesystem in Userspace (FUSE) to mount certain types of file systems as a regular user. When using e.g. sshfs to mount a remote location onto a different machine, you can mount any directory rather than a block device such as a partition. tmpfs mounts a virtual file system in the computer's memory.
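From a program's point of view, mounted file systems simply appear as directories in the tree. As a small sketch, the hypothetical helper nearest_mount_point below walks up from a path until it reaches a directory that Python's os.path.ismount reports as a mount point. The results depend entirely on how your own system is set up.

```python
import os


def nearest_mount_point(path: str) -> str:
    """Return the closest ancestor of `path` (or the path itself) that is a mount point.

    nearest_mount_point is a made-up helper for this example. The loop always
    terminates because the root directory / counts as a mount point.
    """
    current = os.path.realpath(path)
    while not os.path.ismount(current):
        current = os.path.dirname(current)
    return current


if __name__ == "__main__":
    for candidate in ("/", os.path.expanduser("~"), "/tmp"):
        print(candidate, "is on the file system mounted at", nearest_mount_point(candidate))
```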

For more information, see the File systems and fstab pages on the Arch Wiki linked below.

File links

Links are references to files (or directories) on the file system. When you create a new file, a link is automatically created to it. When you create a new directory, two links are automatically created to it. One is the directory's name in its parent directory, the other is the special . entry inside the new directory, as mentioned earlier in this chapter.

Those links are called hard links because they point to a concrete file (or more specifically to an inode, which is beyond the scope of this book). There is another type of link, called a symbolic link (aka symlink), that simply points to another name on a file system. Symlinks are one of the "special" file types mentioned previously.

Let's explore links in this section by creating some temporary files and directories as well as some extra links. Both symbolic and hard links can be created using the ln command.

$ mkdir /tmp/testlinks/
$ cd /tmp/testlinks/
$ touch file1 # create an empty file
$ mkdir dir1 # create a directory
$ ln file1 file2 # add a new hard link to file1
$ dd if=/dev/urandom of=randomfile bs=1M count=10 # create a file containing 10 MB of random data
$ ln randomfile randomfile2
$ ln randomfile randomfile3
$ ln -s randomfile randomfile4 # create a symbolic link

Note that you may not create new hard links to directories.

$ ln dir1 dir2
ln: dir1: hard link not allowed for directory

A few things worthy of note happened after we ran these commands. Let's have a look at the file listing.

$ ls -lh
total 30M
drwxr-xr-x 2 username username  40 Jan  4 01:30 dir1/
-rw-r--r-- 2 username username   0 Jan  4 01:29 file1
-rw-r--r-- 2 username username   0 Jan  4 01:29 file2
-rw-r--r-- 3 username username 10M Jan  4 01:30 randomfile
-rw-r--r-- 3 username username 10M Jan  4 01:30 randomfile2
-rw-r--r-- 3 username username 10M Jan  4 01:30 randomfile3
lrwxrwxrwx 1 username username  10 Jan  4 01:30 randomfile4 -> randomfile

First, we see our directory says it contains 30 MB worth of data, but if we run du -h, we see it only takes up 10 MB. That's because we have three hard links to the same file, originally created under the name randomfile. There is nothing particularly special about the name randomfile, the original link to our file that we created using the dd command. If we delete it, the data will still persist on the disk as we have more hard links to it under randomfile2 and randomfile3. However, randomfile4 will now be a broken symbolic link, pointing to a name that is no longer there. A file will only be really deleted from the disk once it has zero hard links pointing to its inode.

Also, note that file1 and file2 are hard links to the same file, while randomfile, randomfile2 and randomfile3 are hard links to another file. They have the same file sizes, content, and permissions. Changing the permission of one link will also apply to other hard links. Note that as a symbolic link, randomfile4 is only 10 bytes, the length of the string it points to.

Finally, note that randomfile4 shows l in the file type field in the permissions list, indicating that it is a symbolic link. Depending on your OS settings, it may also show an arrow showing the destination of the link, as in my example.

Hard links can only be created to files on the same file system, while symbolic links can be created to any file or directory on any file system, even if the linked path does not exist. (Whether a link to a destination that does not exist is useful is a different question.)
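The same experiment can be repeated from Python with os.link and os.symlink. The sketch below uses a hypothetical function link_demo that creates one hard link and one symbolic link in a directory you pass in, deletes the original name, and reports what happened. The file names mirror the ones used above, but the file is much smaller.

```python
import os
import tempfile
from pathlib import Path


def link_demo(directory: Path) -> dict:
    """Create a hard link and a symlink, delete the original name, report the outcome.

    link_demo is a made-up helper for this illustration.
    """
    original = directory / "randomfile"
    hard = directory / "randomfile2"
    soft = directory / "randomfile4"

    original.write_bytes(os.urandom(1024))
    os.link(original, hard)          # like: ln randomfile randomfile2
    os.symlink("randomfile", soft)   # like: ln -s randomfile randomfile4

    same_inode = original.stat().st_ino == hard.stat().st_ino
    link_count = original.stat().st_nlink

    original.unlink()                # remove the original name only

    return {
        "same_inode": same_inode,
        "link_count": link_count,
        "hard_link_still_readable": len(hard.read_bytes()) == 1024,
        # is_symlink() inspects the link itself, exists() follows it to the target
        "symlink_broken": soft.is_symlink() and not soft.exists(),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(link_demo(Path(tmp)))
```

Both hard links share one inode, so the data survives the removal of the original name, while the symbolic link only stored the name randomfile and is left dangling.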

Symlinks and nginx

Symbolic links were traditionally used to enable and disable site configurations in nginx. It was pretty common practice to have the directories /etc/nginx/sites-available/ containing actual configuration files, and /etc/nginx/sites-enabled/ containing symbolic links to those files. According to the Nginx Cookbook, this practice is deprecated.

The STDs

There are three special files (that are likely symlinks) used for input and output operations. They are /dev/stdin, /dev/stdout and /dev/stderr, for standard input, standard output and standard error, respectively.

The standard input file is normally the terminal keyboard, and the standard output and standard error files are normally the terminal screen. When you run a Python program, the input() function reads from stdin, while print() writes to stdout. Exceptions are typically written to stderr. This is useful in separating the output stream from the error stream.
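As a minimal sketch of keeping the two output streams separate, the hypothetical function report below writes regular messages to stdout and error messages to stderr using the file argument of print().

```python
import sys


def report(message: str, error: bool = False) -> None:
    """Print to stderr when `error` is true, otherwise to stdout.

    report is a made-up helper for this example.
    """
    stream = sys.stderr if error else sys.stdout
    print(message, file=stream)


if __name__ == "__main__":
    report("saved 3 records")                  # goes to stdout
    report("could not open file", error=True)  # goes to stderr
```

If you saved this as a script and ran it with its output redirected, e.g. python script.py > out.txt, only the first message would end up in out.txt, while the error would still appear on the terminal. The file name script.py is just a placeholder.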

Other special files

There are several other special files that can be used for different operations, not all of which will be covered in this book. /dev/null discards anything written to it, which is useful for suppressing output.

>>> with open("/dev/null", "w") as f:
...     f.write("hello") # writes into the void
...
5

/dev/zero can be used to read bytes with the value 0, while /dev/random can be used to read random bytes.

>>> with open("/dev/zero", "rb") as f:
...     f.read(10)
...
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> with open("/dev/random", "rb") as f:
...     f.read(20) # your values should be different
...
b"S'4_6\xb5E\x8b\xcb\xaa\x8b{\xc0F8\x91\x9d\xa2\x8bw"

The file system and Python

The two most common modules to work with file paths in Python are os.path and pathlib, with pathlib being recommended for most modern applications. Here are a few quick examples using Python's pathlib module:

>>> from pathlib import Path
>>> p = Path("/tmp/testlinks/")
>>> p.exists()
True
>>> p.is_dir()
True
>>> for child in p.iterdir(): print(child)
...
/tmp/testlinks/file1
/tmp/testlinks/dir1
/tmp/testlinks/file2
/tmp/testlinks/randomfile2
/tmp/testlinks/randomfile3
/tmp/testlinks/randomfile4
>>> p2 = Path("~")
>>> p2.expanduser()
PosixPath('/home/username')
>>> p2.exists() # Note that ~ does not exist as a real path
False
>>> p2.expanduser().exists()
True

We can also use this module to look at a file's type and permissions:

>>> p3 = Path("/tmp/testlinks/randomfile4")
>>> p3.is_symlink()
True
>>> oct(p3.lstat().st_mode)
'0o120777'

Note that we need to convert the st_mode attribute to octal format, and that we use lstat rather than stat, as this is a symbolic link and stat would follow the link to its target. The last three digits are the file's permissions as previously described.
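Rather than reading the octal digits by eye, the standard library's stat module can take the value apart. The function decode_mode is a hypothetical helper for this sketch. It splits an st_mode value into its file type and permission parts and renders the familiar ls -l string.

```python
import stat


def decode_mode(st_mode: int) -> dict:
    """Split an st_mode value into file type bits, permission bits and a mode string.

    decode_mode is a made-up helper for this example.
    """
    return {
        "type_bits": oct(stat.S_IFMT(st_mode)),     # file type part of the mode
        "permissions": oct(stat.S_IMODE(st_mode)),  # permission bits, including special bits
        "is_symlink": stat.S_ISLNK(st_mode),
        "as_string": stat.filemode(st_mode),
    }


if __name__ == "__main__":
    # 0o120777 is the value shown for the symbolic link in the example above.
    print(decode_mode(0o120777))
```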

The file system and Django

Django uses the pathlib module to generate the BASE_DIR variable in settings.py. Older versions used os.path, so you may want to replace that with pathlib if your Python version is modern enough. Note: pathlib was added in Python 3.4, so there's really no reason to not use it.

# settings.py
from pathlib import Path
BASE_DIR = Path(__file__).resolve().parent.parent

The BASE_DIR variable is then used to set necessary paths, such as for the SQLite database file, template dirs, static and media files, etc. You can think of BASE_DIR as the directory containing the manage.py file.
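To see why this expression does not depend on the current working directory, here is a small sketch that applies the same resolve().parent.parent chain to an arbitrary settings file path. The helper names base_dir_for and project_file, as well as the config/settings.py layout, are assumptions made for this illustration and are not part of Django.

```python
import tempfile
from pathlib import Path


def base_dir_for(settings_file) -> Path:
    """Return the directory two levels above a settings file, like BASE_DIR.

    base_dir_for is a made-up helper mirroring Path(__file__).resolve().parent.parent.
    """
    return Path(settings_file).resolve().parent.parent


def project_file(settings_file, *parts: str) -> Path:
    """Build a path below the base directory, e.g. project_file(f, "templates")."""
    return base_dir_for(settings_file).joinpath(*parts)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical layout: <project>/config/settings.py
        settings = Path(tmp) / "config" / "settings.py"
        settings.parent.mkdir()
        settings.touch()
        print(base_dir_for(settings))
        print(project_file(settings, "templates"))
```

Because the result is derived from the location of the settings file itself, it stays the same no matter which directory the process was started from.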

For example, to add the ./templates/ directory to the TEMPLATES list in settings.py, add it to the "DIRS" list:

# settings.py
TEMPLATES = [
    {
        "BACKEND": "django.template.backends.django.DjangoTemplates",
        "DIRS": [BASE_DIR / "templates"],
        ...
    },
]

As it may be difficult to know the exact current working directory in Django, you should rely on the BASE_DIR setting as much as possible when working with files directly. For example, if you want to access a file named foo/llms/model.dat (relative to the BASE_DIR) from a file foo/llms/views.py, you could access it as such:

# views.py
from django.conf import settings
BASE_DIR = settings.BASE_DIR

with open(BASE_DIR / "foo" / "llms" / "model.dat") as f:
    ...

We will cover where to store static and media files in later chapters.