Files and the Linux File System
The Linux file system is a classic tree with a root node /. This is commonly called the file system root or the root directory, and should not be confused with the /root/ directory, the home directory of the root user.
As in any tree, nodes can have children: a directory, including the root directory, can contain files and other directories. Note that some files may be "special" files.
Current working directory
Linux (and many other systems) tracks the current working directory, the directory from which relative paths are computed. This is very important when running commands with relative paths, commands that take paths as optional or required arguments, or opening files from scripts or other executable programs. For example, the Python call open("hello.txt")
will open the file hello.txt
in the current working directory.
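As a minimal sketch of this behaviour, the snippet below resolves the same relative name from two different working directories. The helper name resolve_from and the use of temporary directories are assumptions made purely for illustration.

```python
import os
import tempfile
from pathlib import Path


def resolve_from(directory, relative_name):
    """Return the absolute path that relative_name refers to when
    directory is the current working directory."""
    previous = Path.cwd()
    os.chdir(directory)
    try:
        # resolve() interprets a relative path against the current working directory
        return Path(relative_name).resolve()
    finally:
        os.chdir(previous)  # always restore the original working directory


with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    a = resolve_from(first, "hello.txt")
    b = resolve_from(second, "hello.txt")
    print(a)
    print(b)
    print(a == b)  # the same relative name points at two different locations
```

The same applies to open("hello.txt"): which file gets opened depends entirely on where the program was started from.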
You may configure your shell to display the current working directory as part of the command prompt, as shown below. You can see my current working directory is set to ~/Documents/tech/books/linux-for-djangonauts/
.
Absolute and relative paths
Absolute paths start with the root node /
. Here are some examples.
/
/etc/passwd
/usr/bin/ls
/home/username/Documents/hello.txt
As long as the target node exists, absolute paths will be valid regardless of the current working directory.
Relative paths are relative to the current working directory. Here are some examples.
hello.txt
Documents/hello.txt
./hello.txt
../Pictures/dog.jpg
Relative paths depend on the current working directory. If the current working directory changes, the same relative path refers to a different location, so it must be adjusted to keep pointing to the same file.
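The difference can be sketched in Python with pathlib. The function to_absolute is a hypothetical helper, and the normalisation is purely lexical: it does not look at the disk, so it may give a different answer than the kernel would when symbolic links are involved.

```python
import posixpath
from pathlib import PurePosixPath


def to_absolute(path, cwd):
    """Interpret path relative to cwd unless it is already absolute."""
    path = PurePosixPath(path)
    if not path.is_absolute():
        path = PurePosixPath(cwd) / path
    # normpath collapses "." and ".." components without touching the file system
    return PurePosixPath(posixpath.normpath(path))


cwd = "/home/username/Documents"
for example in ["/etc/passwd", "hello.txt", "./hello.txt", "../Pictures/dog.jpg"]:
    print(example, "->", to_absolute(example, cwd))
```

Only the absolute path /etc/passwd comes out unchanged; every other result depends on the value of cwd.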
Current and parent directories
You may be wondering why some relative paths in the list above start with ./
or ../
. The .
refers to the current directory, while ..
refers to the parent directory. For security reasons, the current working directory is typically not available in the $PATH
variable, which defines a list of directories to search for executable commands. This means that if you have a file you would like to execute in a non-standard directory, and the current working directory is set to the directory containing the file, e.g. myprogram
, you cannot simply run myprogram
, but will usually need to prefix it with the current directory, as in ./myprogram
. Running commands and environment variables will be covered in a later chapter.
The home directory
The home directory is the directory that contains the user's personal files. The home directory is typically /home/username
, except for the special case of the root user whose home directory is /root
.
The home directory is often referred to as ~
or ~/
, which the shell expands to the current user's home directory. If your username is username
and your home directory is /home/username
, then ~/
will expand to /home/username/
.
Things you might keep in your home directory include:
- Photos
- Documents
- Videos
- Downloads
- Software projects
- Configuration files (commonly found under
~/.config/
or a subdirectory)
Where to save your work
Choosing where to save your work is very much a question of personal preference. I personally use ~/projects/
with a subdirectory for each project I work on, but here are some alternatives:
- ~/work/
- ~/projects/django/, along with other subdirectories for different languages or frameworks.
- ~/src/
Throughout this book, I will be using ~/projects/
with a subdirectory for the project name, e.g. ~/projects/foo/
.
Hidden files and directories
By default, commands like ls
and tree
as well as file browsers hide files starting with a .
. This can be useful to keep your directories cleaner.
For example, you can name your virtual environment .env
or .venv
to hide it from view.
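Nothing else marks a file as hidden: it is purely a naming convention. As a rough sketch, the hypothetical helper split_hidden below separates entries the way ls does by default, using a temporary directory with made-up file names.

```python
import tempfile
from pathlib import Path


def split_hidden(directory):
    """Return (visible, hidden) lists of entry names in directory."""
    visible, hidden = [], []
    for entry in Path(directory).iterdir():
        # A leading dot is the only thing that makes an entry "hidden"
        (hidden if entry.name.startswith(".") else visible).append(entry.name)
    return sorted(visible), sorted(hidden)


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "manage.py").touch()
    (Path(tmp) / ".venv").mkdir()
    (Path(tmp) / ".gitignore").touch()
    visible, hidden = split_hidden(tmp)
    print(visible)
    print(hidden)
```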
Commonly used directories
Most Linux distros use a similar directory layout. Your actual layout may differ depending on your distro, but some of the most common directories include:
- /bin, which contains executable programs.
- /boot, which contains boot loader files.
- /dev, which contains device files.
- /etc, which contains configuration files.
- /home, which contains the users' home directories.
- /lib, which contains shared libraries.
- /media, which contains mount points for removable media.
- /mnt, which contains temporary mount points.
- /opt, which contains optionally installed software. It's a good place to install programs not installed via a package manager.
- /proc, which contains process information.
- /root, which is the root user's home directory.
- /run, which contains runtime files, such as UNIX sockets and PID files.
- /sbin, which contains system binaries.
- /srv, which contains data served by the system, such as websites or software repositories.
- /sys, which contains kernel and device information.
- /tmp, which contains temporary files. These files may be deleted at any time, especially upon rebooting.
- /usr, which contains user programs. Many distros install packages in this directory and subdirectories.
Phew! That's a lot of directories.
Notes
- Many of these directories contain subdirectories. For example, /usr/include/ contains header files while /etc/nginx/ may contain configuration files for the nginx web server.
- A few of these directories are commonly deprecated on modern distros. /bin/, /sbin/, and /usr/sbin/ now commonly link to /usr/bin/ where executables are actually installed.
- Most distros implement the Filesystem Hierarchy Standard, which is maintained by the Linux Foundation. It's a pretty long technical document, but you will find it linked below.
- The Cross-Desktop Group also maintains the XDG Base Directory Specification which defines where certain files, e.g. user and program data and configuration, should be located.
- You can also find more information in the file-hierarchy and hier man pages.
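To give an idea of what the XDG Base Directory Specification means in practice, here is a minimal sketch of looking up a per-user configuration directory. It covers only the XDG_CONFIG_HOME rule (fall back to ~/.config when the variable is unset or empty). The function user_config_dir and the application name myapp are hypothetical.

```python
import os
from pathlib import Path


def user_config_dir(app_name, environ=None):
    """Return the directory where app_name should keep its user configuration."""
    environ = os.environ if environ is None else environ
    base = environ.get("XDG_CONFIG_HOME")
    if not base:
        # Unset or empty: the specification defaults to $HOME/.config
        home = environ.get("HOME") or str(Path.home())
        base = os.path.join(home, ".config")
    return Path(base) / app_name


print(user_config_dir("myapp", {"HOME": "/home/username"}))
print(user_config_dir("myapp", {"HOME": "/home/username", "XDG_CONFIG_HOME": "/tmp/cfg"}))
```

Passing the environment in as a dictionary keeps the function easy to test. Calling it without the second argument uses the real environment.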
File permissions
File (and directory) permissions are used to control access to files and directories. Every file or directory has a numeric owner and group ID, and three sets of permissions: user (owner) permissions, group permissions, and other permissions. Each set is further subdivided into three different permissions: read, write, and execute.
File permissions may be represented by a string, or an octal (numeric with base 8) value. For example, calling the list command with the long flag (ls -l
) will list the permissions of each list entry as a string. The string is generally 10 characters long, starting with the file type, followed by three groups of permissions. For example, a user alice
may see the following output:
The drwx------
permission string can be split up as follows. The first character (d
) indicates that the entry alice/
is a directory. Its absolute path is /home/alice/
, as we listed the contents of the /home/
directory.
The following three characters (rwx
) indicate that the directory is readable, writable, and executable by the owner. We can also see that based on the third field in the listing (alice
), the file is owned by the user alice
.
The next three characters (---
) indicate the group permissions. A dash (-
) indicates that a permission is denied, so in this case, members of the group alice
(as shown by the fourth field in the listing) may not read, write, or execute the directory. Note that while the user alice
does not get permissions from their group membership, they are granted permissions from their ownership, so in this case they have all permissions.
Finally, the last three characters (---
) indicate the other permissions. In this case, other means that the user is not alice
and they are not a member of the group alice
. They likewise may not read, write, or execute the directory.
As previously mentioned, permissions may also be represented by an octal value. The value is three digits long, with each digit between zero and seven. Each digit is a sum of permissions: the first digit indicates the user permissions, the second the group permissions, and the last the other permissions. The possible values are 1 for the executable bit, 2 for the writeable bit, and 4 for the readable bit.
In our last example, the octal permission value would be 700
, as the owner has all permissions (read (4), write (2), and execute (1)), while the group and others have no permissions. A file may be read-only (4), readable and writeable (6), or readable, writeable and executable (7). It can also be readable and executable (5), but it is uncommon for a file to be executable or writeable without also being readable.
Note: as we will see later, octal permissions are actually four digits long. When omitted, the first digit is implicitly set to zero.
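The arithmetic can be sketched in a few lines of Python. The function octal_from_string is a hypothetical helper that only understands the plain r, w, x and - characters. It takes the nine permission characters without the leading file type, and it does not handle the special bits covered later in this chapter.

```python
def octal_from_string(permissions):
    """Convert nine permission characters such as 'rwxr-xr-x' to digits such as '755'."""
    if len(permissions) != 9:
        raise ValueError("expected nine characters, e.g. 'rwxr-xr-x'")
    values = {"r": 4, "w": 2, "x": 1, "-": 0}
    digits = []
    for start in (0, 3, 6):  # user, group, other
        group = permissions[start:start + 3]
        # Each digit is the sum of the permissions granted in that group
        digits.append(str(sum(values[char] for char in group)))
    return "".join(digits)


print(octal_from_string("rwx------"))  # 700
print(octal_from_string("rw-r--r--"))  # 644
print(octal_from_string("rwxr-xr-x"))  # 755
```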
What executable means
The executable bit works differently for files and directories. If a directory is not executable, you may be unable to read its descendants' attributes, or create new entries. For example:
$ cd /tmp/ # change to the temporary directory to keep our file system clean
$ mkdir test1 # create a directory
$ touch test1/something # create a file in our new directory
$ chmod 644 test1 # remove the executable bit from the test1 directory
$ ls -alh test1/ # list the contents of test1
ls: cannot access 'test1/.': Permission denied
ls: cannot access 'test1/..': Permission denied
ls: cannot access 'test1/something': Permission denied
total 0
d????????? ? ? ? ? ? ./
d????????? ? ? ? ? ? ../
-????????? ? ? ? ? ? something
Note that anything after a # in the commands listed above is a comment, and is ignored by the shell. As you can see, ls shows very limited information. It can list the names of the entries in the test1 directory, but it cannot show their owner and group, size, permissions, or modification time. While this may seem confusing, simply remember that directories should be executable and you should be golden.
For a file, the executable bit must be enabled to allow running the file as a program. There are two commonly used kinds of executable files: ELF (Executable and Linkable Format) files and script files. ELF files are binaries compiled from languages such as C and Go. Script files are written in a scripting language like bash, fish, or Python, and executed by a shell or language interpreter. Package managers typically install files with the proper permissions, and commands like django-admin
also typically create new projects with the required permissions (e.g. making the manage.py
file executable), but you may need to enable the executable bit by yourself on files you create, e.g. when writing a script file.
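On the command line this is usually done with chmod. As a sketch of the same operation from Python, the hypothetical helper make_executable below adds the execute bit for user, group, and others, which is roughly what chmod a+x does. The file name myscript.py is made up for the example.

```python
import stat
import tempfile
from pathlib import Path


def make_executable(path):
    """Add the execute bit for user, group, and others; return the new permissions."""
    path = Path(path)
    mode = stat.S_IMODE(path.stat().st_mode)  # keep only the permission bits
    path.chmod(mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
    return stat.S_IMODE(path.stat().st_mode)


with tempfile.TemporaryDirectory() as tmp:
    script = Path(tmp) / "myscript.py"
    script.write_text("print('hello')\n")
    script.chmod(0o644)  # start from a typical non-executable mode
    new_mode = make_executable(script)
    print(oct(new_mode))
```

Starting from 644, adding 1 to each digit gives 755.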
As an example, let's look at the ls
command.
$ file /usr/bin/ls
/usr/bin/ls: ELF 64-bit LSB pie executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2
$ ls -lh /usr/bin/ls
-rwxr-xr-x 1 root root 135K Oct 6 08:15 /usr/bin/ls*
The file
command shows us it is a binary in the ELF format, while the ls
command shows us that it is executable by any user. We also ran the ls
command, which we could not do if it were not executable.
$ cp /usr/bin/ls /tmp/ # copy the ls command to /tmp/
$ chmod 644 /tmp/ls # remove the executable bit
$ /tmp/ls # try running the command
bash: /tmp/ls: Permission denied
The whole shebang
While ELF files can be run natively, script files require some kind of interpreter. While you may be used to file associations deciding which program should open a specific file, the POSIX specifications used by Linux make things... complicated. I will not be going into details at this time, but assume that if you save a Python file foo.py
in the current working directory, make it executable, and run it as ./foo.py
, the shell will likely use a shell execution environment, and use a program such as sh
as an interpreter. This is unlikely to interpret Python code correctly, and will probably throw a bunch of errors. As such, you should run it as python foo.py
, or use a shebang inside your script to specify which interpreter to use.
What is a shebang, you might wonder?
- It must be at the very beginning of your script.
- It must start with the #! characters.
- It must be followed by the path to the interpreter.
For example, you can use the following to ensure that a script is executed by the Python interpreter rather than the default shell interpreter:
It can also be useful to run Python with extra arguments. For example, you might want to use the -i
flag to run the script in interactive mode while testing a script:
This will print Hello from Python
, and then enter the Python REPL shell. While this example is silly, it may be useful if you import Django models, load data from a database, etc. Any declared variables and imports will then be available in the Python shell. This is particularly helpful when debugging a function or you want to initialize some data.
You can also use the /usr/bin/env command to pick the binary to execute based on the $PATH environment variable. For example, the following would allow you to run the system Python interpreter by default, or the interpreter inside your active virtual environment:
Note: You may need to use the -S flag of env to split arguments when passing multiple arguments in the shebang, e.g.:
Otherwise, env may try to run an executable called python -i rather than python with the -i argument. See the linked Stack Overflow question for more information.
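To see why the splitting matters, here is a rough approximation of how Linux reads a shebang line: the interpreter path comes first, and everything after it is passed along as one single argument. The function parse_shebang is a hypothetical illustration, not the kernel's actual code.

```python
def parse_shebang(first_line):
    """Approximate how Linux splits a shebang line: the interpreter path,
    then at most one argument made of everything else on the line."""
    if not first_line.startswith("#!"):
        return None  # not a shebang
    parts = first_line[2:].strip().split(None, 1)
    if not parts:
        return None  # "#!" with nothing after it
    # Anything after the interpreter is kept together, spaces included
    return [parts[0]] + [part.strip() for part in parts[1:]]


for line in [
    "#!/usr/bin/python3",
    "#!/usr/bin/env python",
    "#!/usr/bin/env python -i",
    "#!/usr/bin/env -S python -i",
]:
    print(line, "->", parse_shebang(line))
```

In the third case env receives the single argument "python -i" and looks for a program with that exact name. In the fourth, env receives "-S python -i" and does the splitting itself.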
Setuid, setgid, and the sticky bit
As we briefly covered before, octal permissions are four digits long, though the leading zero is often omitted. As such, 0644
is the same as 644
.
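The first digit may also be a sum of the values 4 for the setuid bit (shown as s in place of the user's x in strings), 2 for the setgid bit (shown as s in place of the group's x), and 1 for the sticky bit (shown as t in place of the others' x). An uppercase S or T means the bit is set without the corresponding execute bit.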
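Python's stat module can render a numeric mode as a permission string, which is a convenient way to check how these bits are displayed. The three modes below are made-up examples rather than values read from real files.

```python
import stat

examples = {
    "setuid and setgid executable": stat.S_IFREG | 0o6755,
    "setuid without the execute bit": stat.S_IFREG | 0o4644,
    "sticky world-writeable directory": stat.S_IFDIR | 0o1777,
}

for label, mode in examples.items():
    # filemode() builds the same 10-character string that ls -l prints
    print(f"{stat.filemode(mode)}  {oct(stat.S_IMODE(mode))}  {label}")
```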
When set on an executable file, the setuid and setgid bits tell the kernel to run the file using the user ID of the file's owner or the group ID of the file's group, respectively. For example, we can test this with the Python REPL. First, copy the python
executable with the command cp -iv /usr/bin/python3.11 ~/pybits
(make sure to adjust the path accordingly for your Python version). Then change the owner and group by running sudo chown 5000:2000 ~/pybits
. This will set the uid
to 5000
, and gid
to 2000
. Note that these IDs are likely to be unused on your system, but this does not matter. Finally, let's enable both the setuid
and setgid
bits with sudo chmod 6755 ~/pybits
. Once done, let's take a look at the new permissions.
As we can see, rather than x
for executable, the user and group permissions now show s
instead, and the uid
and group
are set to 5000
and 2000
, respectively. As there are no corresponding entries in the user and group databases, the IDs are shown numerically. If we run the ~/pybits
command, we can now revisit the os
module as covered in Users and Permissions.
>>> import os
>>> print (os.getuid(), os.geteuid())
1000 5000
>>> print (os.getgid(), os.getegid())
1000 2000
As we can see, our effective user and group IDs have been changed.
Note that this may lead to security issues, so the setuid and setgid bits are typically ignored on executable scripts, but they can be useful to grant certain extra permissions. The sudo
command usually has the setuid
bit set to elevate privileges and allow running commands as the superuser.
The sticky
bit is commonly set on directories so that only the file owner, directory owner, or superuser can delete files inside the directory. For example, this is very useful for the /tmp/
directory which is world-writeable by default in ensuring that user eve
cannot delete files owned by user alice
.
File attributes
Some file systems further allow you to set file attributes and even extended attributes. This is beyond the scope of this book, but as a quick example, a file that is writeable by some user but has the a
attribute set may be opened for writing in append mode, but not in write mode. See the linked chattr
and xattr
man pages at the end of this chapter.
Mounting file systems
Linux supports a variety of different types of file systems. The most common ones are in the ext family (ext2
, ext3
, ext4
), but btrfs
and xfs
are other popular options. If you are looking to format a drive or a partition for usage with Linux and are unsure which file system to use, ext4
is always a safe choice.
Linux can also mount many file systems designed for use with other operating systems, but as they may lack features like proper file permissions, they should not be used to install Linux.
Finally, some file systems may require extra steps prior to mounting. For example, a file system encrypted via dm-crypt
must be unlocked before it can be mounted.
To mount a file system, you can use the mount
command, but you can also use the /etc/fstab
file to declare which file systems should be mounted automatically. To mount a file system, you must provide at least a source and destination. The source will typically be a device, such as /dev/sda1
, while the destination will be a directory. It is common to use subdirectories of the /mnt/
directory, but you may also mount a drive in /home/
if you would like to share your files across different Linux installs, /var/
to dedicate an entire drive or partition to variable files if you require extra storage for certain applications, etc.
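Each line of /etc/fstab lists a source, a mount point, a file system type, and mount options, optionally followed by two numeric fields for dump and fsck ordering. As a sketch of that layout, the following parses a made-up fstab. The devices, options, and the names FstabEntry and parse_fstab are assumptions for illustration only.

```python
from typing import NamedTuple


class FstabEntry(NamedTuple):
    source: str
    mount_point: str
    fs_type: str
    options: list
    dump: int
    fsck_order: int


def parse_fstab(text):
    """Parse fstab-style text into a list of FstabEntry tuples."""
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        if len(fields) < 4:
            continue  # not a complete entry
        fields += ["0"] * (6 - len(fields))  # the last two fields default to 0
        source, mount_point, fs_type, options, dump, fsck_order = fields[:6]
        entries.append(
            FstabEntry(source, mount_point, fs_type, options.split(","), int(dump), int(fsck_order))
        )
    return entries


SAMPLE = """
# <source>   <mount point>  <type>  <options>          <dump>  <pass>
/dev/sda1    /              ext4    defaults,noatime   0       1
/dev/sdb1    /home          ext4    defaults           0       2
tmpfs        /tmp           tmpfs   defaults,size=2G
"""

for entry in parse_fstab(SAMPLE):
    print(entry.source, "->", entry.mount_point, entry.fs_type, entry.options)
```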
Note: not every mount needs to be backed by a block device. While file systems typically need to be mounted as superuser, you can also use Filesystem in Userspace (FUSE) to mount certain types of file systems as a regular user. When using e.g. sshfs
to mount a remote location onto a different machine, you can mount any directory rather than a block device such as a partition. tmpfs
mounts a virtual file system in the computer's memory.
For more information, see the File systems and fstab pages on the Arch Wiki linked below.
File links
Links are references to files (or directories) on the file system. When you create a new file, a link is automatically created to it. When you create a new directory, two links are automatically created to it. One is the directory's name inside its parent directory, the other is the special .
entry as mentioned earlier in this chapter.
Those links are called hard links because they point to a concrete file (or more specifically to an inode, which is beyond the scope of this book). There is another type of link, called a symbolic link (aka symlink), that simply points to another name on a file system. Symlinks are one of the "special" file types mentioned previously.
Let's explore links in this section by creating some temporary files and directories as well as some extra links. Both symbolic and hard links can be created using the ln
command.
$ mkdir /tmp/testlinks/
$ cd /tmp/testlinks/
$ touch file1 # create an empty file
$ mkdir dir1 # create a directory
$ ln file1 file2 # add a new hard link to file1
$ dd if=/dev/urandom of=randomfile bs=1M count=10 # create a file containing 10 MB of random data
$ ln randomfile randomfile2
$ ln randomfile randomfile3
$ ln -s randomfile randomfile4 # create a symbolic link
Note that you may not create new hard links to directories.
A few things worthy of note happened after we ran these commands. Let's have a look at the file listing.
$ ls -lh
total 30M
drwxr-xr-x 2 username username 40 Jan 4 01:30 dir1/
-rw-r--r-- 2 username username 0 Jan 4 01:29 file1
-rw-r--r-- 2 username username 0 Jan 4 01:29 file2
-rw-r--r-- 3 username username 10M Jan 4 01:30 randomfile
-rw-r--r-- 3 username username 10M Jan 4 01:30 randomfile2
-rw-r--r-- 3 username username 10M Jan 4 01:30 randomfile3
lrwxrwxrwx 1 username username 10 Jan 4 01:30 randomfile4 -> randomfile
First, we see our directory says it contains 30 MB worth of data, but if we run du -h
, we see it only takes up 10 MB. That's because we have three hard links to the same file, originally created under the name randomfile
. There is nothing particularly special about the file randomfile
, the original link to our file that we created using the dd
command. If we delete it, the data will still persist on the disk as we have more hard links to it under randomfile2
and randomfile3
, however randomfile4
will now be a broken symbolic link, pointing to a file that is no longer there. A file will only be really deleted from the disk once it has zero hard links pointing to its inode.
Also, note that file1
and file2
are hard links to the same file, while randomfile
, randomfile2
and randomfile3
are hard links to another file. They have the same file sizes, content, and permissions. Changing the permission of one link will also apply to other hard links. Note that as a symbolic link, randomfile4
is only 10 bytes, the length of the string it points to.
Finally, note that randomfile4
shows l
in the file type field in the permissions list, indicating that it is a symbolic link. Depending on your OS settings, it may also show an arrow showing the destination of the link, as in my example.
Hard links can only be created to files on the same file system, while symbolic links can be created to any file or directory on any file system, even if the linked path does not exist. (Whether a link to a destination that does not exist is useful is a different question.)
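The same experiment can be sketched from Python using os.link and Path.symlink_to. The snippet works in a temporary directory, reuses the file names from the shell session above for familiarity, and uses a much smaller file.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "randomfile"
    original.write_bytes(os.urandom(1024))

    hard = Path(tmp) / "randomfile2"
    soft = Path(tmp) / "randomfile4"
    os.link(original, hard)        # like: ln randomfile randomfile2
    soft.symlink_to("randomfile")  # like: ln -s randomfile randomfile4

    # Hard links share an inode; the link count is stored on that inode
    same_inode = original.stat().st_ino == hard.stat().st_ino
    link_count = original.stat().st_nlink
    # lstat() describes the symlink itself rather than its target
    symlink_size = soft.lstat().st_size

    original.unlink()  # remove one name; the data is still reachable
    hard_survives = hard.exists()
    symlink_broken = soft.is_symlink() and not soft.exists()

print(same_inode, link_count, symlink_size, hard_survives, symlink_broken)
```

After the original name is removed, the hard link still reaches the data, while the symbolic link is left pointing at a name that no longer exists.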
Symlinks and nginx
Symbolic links were traditionally used to enable and disable site configurations in nginx. It was pretty common practice to have the directories /etc/nginx/sites-available/
containing actual configuration files, and /etc/nginx/sites-enabled/
containing symbolic links to those files. According to the Nginx Cookbook, this practice is deprecated.
The STDs
There are three special files (that are likely symlinks) used for input and output operations. They are /dev/stdin
, /dev/stdout
and /dev/stderr
, for standard input, standard output and standard error, respectively.
The standard input file is normally the terminal keyboard, and the standard output and standard error files are normally the terminal screen. When you run a Python program, the input()
function reads from stdin
, while print()
writes to stdout
. Exceptions are typically written to stderr
. This is useful in separating the output stream from the error stream.
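As a small sketch of keeping the two streams apart, the hypothetical function report below sends regular messages to stdout and diagnostics to stderr. The streams are captured with contextlib so the separation is visible in the output.

```python
import contextlib
import io
import sys


def report(message, error=False):
    """Write normal output to stdout and diagnostics to stderr."""
    stream = sys.stderr if error else sys.stdout
    print(message, file=stream)


captured_out, captured_err = io.StringIO(), io.StringIO()
with contextlib.redirect_stdout(captured_out), contextlib.redirect_stderr(captured_err):
    report("3 records imported")
    report("warning: 1 record skipped", error=True)

print("stdout got:", repr(captured_out.getvalue()))
print("stderr got:", repr(captured_err.getvalue()))
```

On the command line the same separation lets you redirect one stream to a file while the other still shows up in the terminal.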
Other special files
There are several other special files that can be used for different operations, not all of which will be covered in this book. /dev/null
discards anything written to it, which makes it useful for suppressing output.
/dev/zero
can be used to read bytes with the value 0
, while /dev/random
can be used to read random bytes.
>>> with open("/dev/zero", "rb") as f:
... f.read(10)
...
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
>>> with open("/dev/random", "rb") as f:
... f.read(20) # your values should be different
...
b"S'4_6\xb5E\x8b\xcb\xaa\x8b{\xc0F8\x91\x9d\xa2\x8bw"
The file system and Python
The two most common modules to work with file paths in Python are os.path
and pathlib
, with pathlib
being recommended for most modern applications. Here are a few quick examples using Python's pathlib
module:
>>> from pathlib import Path
>>> p = Path("/tmp/testlinks/")
>>> p.exists()
True
>>> p.is_dir()
True
>>> for _ in p.iterdir(): print(_)
...
/tmp/testlinks/file1
/tmp/testlinks/dir1
/tmp/testlinks/file2
/tmp/testlinks/randomfile2
/tmp/testlinks/randomfile3
/tmp/testlinks/randomfile4
>>> p2 = Path("~")
>>> p2.expanduser()
PosixPath('/home/username')
>>> p2.exists() # Note that ~ does not exist as a real path
False
>>> p2.expanduser().exists()
True
We can also use this module to look at a file's type and permissions:
>>> p3 = Path("/tmp/testlinks/randomfile4")
>>> p3.is_symlink()
True
>>> oct(p3.lstat().st_mode)
'0o120777'
Note that we need to convert the st_mode attribute to octal format, and that we use lstat rather than stat, because stat would follow the symbolic link and describe its target instead. The last three digits are the file's permissions as previously described.
The file system and Django
Django uses the pathlib
module to generate the BASE_DIR
variable in settings.py
. Older versions used os.path
, so you may want to replace that with pathlib
if your Python version is modern enough. Note: pathlib
was added in Python 3.4, so there's really no reason to not use it.
The BASE_DIR
variable is then used to set necessary paths, such as for the SQLite database file, template dirs, static and media files, etc. You can think of BASE_DIR
as the directory containing the manage.py
file.
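In the settings.py generated by recent Django versions, BASE_DIR is computed as Path(__file__).resolve().parent.parent. The sketch below walks through the same steps using a hypothetical project location, so it can be run outside of a Django project.

```python
from pathlib import PurePosixPath

# Hypothetical location of a project's settings module
settings_file = PurePosixPath("/home/username/projects/foo/foo/settings.py")

# Two .parent steps: settings.py -> the inner foo/ package -> the project directory
BASE_DIR = settings_file.parent.parent

print(BASE_DIR)
print(BASE_DIR / "manage.py")
print(BASE_DIR / "db.sqlite3")
```

Because BASE_DIR is derived from the location of settings.py rather than from the current working directory, paths built from it stay valid no matter where the server process is started.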
For example, to add the ./templates/
directory to the TEMPLATES
list in settings.py
, add it to the "DIRS"
list:
# settings.py
TEMPLATES = [
{
"BACKEND": "django.template.backends.django.DjangoTemplates",
"DIRS": [BASE_DIR / "templates"],
...
},
]
As it may be difficult to know the exact current working directory in Django, you should rely on the BASE_DIR
setting as much as possible when working with files directly. For example, if you want to access a file named foo/llms/model.dat
(relative to the BASE_DIR
) from a file foo/llms/views.py
, you could access it as such:
# views.py
from django.conf import settings
BASE_DIR = settings.BASE_DIR
with open(BASE_DIR / "foo" / "llms" / "model.dat") as f:
...
We will cover where to store static and media files in later chapters.
Links
- Filesystem Hierarchy Standard 3.0 specification
- XDG Base Directory on the Arch Wiki
- file-hierarchy(7) man page
- hier(7) man page
- chattr(1) man page
- xattr(7) man page
- Executable and Linkable Format on Wikipedia
- Shebang (Unix) on Wikipedia
- Question about the shebang not splitting arguments on Stack Overflow
- File permissions and attributes on the Arch Wiki
- File Systems on the Arch Wiki
- fstab on the Arch Wiki
- FUSE on the Arch Wiki
- Inode on Wikipedia
- The pathlib module in the Python docs