Day One: Bash and the Command Line
This lesson is adapted from The Unix Shell lesson of Software Carpentry.
- Traversing the Filesystem
- Where Are You? pwd, cd, ls
- Anatomy of a Bash Command: Command, Options, Arguments
- File and Folder Manipulation: mkdir, nano, mv, cp, rm
- Wildcards: *, ?
- Processing Files with wc and cat
- Writing Output to Files: > and >>
- Filtering Output with sort, head, and tail
- Today’s Bash Command List
- Learn More About Bash
Traversing the Filesystem
The part of the operating system responsible for managing files and directories is called the filesystem. It organizes our data into files, which hold information, and directories (also called folders), which hold files or other directories.
A screenshot of a folder as it appears in the Finder GUI.
You have probably interacted with your device’s filesystem through graphical user interface (GUI) applications such as Finder (macOS) or File Explorer (Windows).
A screenshot of the same folder contents as they appear from the command line.
To work with high-performance computing systems like Talapas, you will need to interact with files and folders through a shell, a text-based program that is also called the command line, the command prompt, or the terminal. The shell we will use is Bash, which is also the name of the language you use to give it commands.
Today’s Bash commands are used to create, inspect, rename, and delete files and directories. To start exploring them, switch to the Git Bash or Terminal application that you opened during the workshop setup.
Where Are You? pwd, cd, ls
For today’s lesson, we will focus on running Bash commands locally on your own device. In our next lesson, we will run these commands (and a few new ones) on a remote cluster.
A Note on Example Commands
I have tested the commands in this workshop on a Windows computer with Git Bash installed. This means that my output paths typically begin with /c/Users/Erin rather than /Users/Erin.
Entering Bash Commands
Bash commands can be intimidatingly terse for new programmers. However, if you start with the grammar, you will gradually commit more commands to memory with practice.
The simplest form of Bash’s grammar is a single command or program name followed by the Enter key. Today we will focus on commands that inspect and control the underlying filesystem.
Where You Are: pwd
Our first Bash command is pwd, which stands for print working directory.
pwd
Because terminal applications typically open to your home directory by default, the expected output is the absolute path of your home directory, which will vary based on your operating system and your account name.
/c/Users/Erin
This command prints the absolute path of the current working directory. Any relative file or folder path you pass as an argument is interpreted relative to this directory.
Home Directory Structure
Each of you will have a different set of files and folders inside your home directory, which would make interactive exercises hard to follow. Instead, we’re going to practice Bash commands from inside the talapas-bash folder you extracted from talapas-bash.zip.
You should have a talapas-bash folder inside your home directory, for example /Users/[YOURNAME]/talapas-bash/ (macOS), /home/[YOURNAME]/talapas-bash/ (Linux), or /c/Users/[YOURNAME]/talapas-bash/ (Git Bash), with a set of practice subfolders and files.
Please ask for assistance at this point if you need help moving and extracting the talapas-bash.zip file.
Changing the Current Working Directory: cd
Change your working directory to talapas-bash using the cd command. cd takes a single argument: the directory to move to. You can use either an absolute or a relative path; as always, a relative path is interpreted relative to the current working directory.
cd talapas-bash
Confirm that you are inside the talapas-bash directory by running the pwd command again.
pwd
/c/Users/Erin/talapas-bash
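If you want to check for yourself that a relative path and an absolute path can lead to the same place, here is a minimal sketch. It assumes you save it to a file and run it with bash (for example, bash paths-demo.sh) so its cd commands do not move your own prompt. It works in a throwaway directory created by mktemp -d, and the practice folder is a hypothetical stand-in for talapas-bash.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so no real files are touched.
# "practice" is a hypothetical stand-in for talapas-bash.
tmp=$(mktemp -d)
trap 'cd / && rm -rf "$tmp"' EXIT
mkdir "$tmp/practice"

cd "$tmp"              # start in the parent of practice
cd practice            # relative path: looked up inside the current directory
relative_result=$(pwd)

cd /                   # move somewhere unrelated
cd "$tmp/practice"     # absolute path: starts at the root, so it works from anywhere
absolute_result=$(pwd)

echo "after relative cd: $relative_result"
echo "after absolute cd: $absolute_result"
```

Both lines print the same directory; only the relative version depends on where you were standing when you ran cd.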
Listing “Stuff” With ls
This raises the question: what files and folders are here?
Inspect the contents of the current directory using ls, which is short for list.
ls
books/ exercise-data/ scripts/
ls is also the first command that we will use with options.
Options are letters, signaled by a hyphen, that are passed to a command to change its behavior.
Type ls -l, using the - (hyphen) key. Options come after the command they modify, with no space between the hyphen and the letter representing the option.
ls -l
drwxr-xr-x 1 Erin 197121 0 Feb 3 22:48 books/
drwxr-xr-x 1 Erin 197121 0 Feb 3 22:48 exercise-data/
drwxr-xr-x 1 Erin 197121 0 Feb 3 22:48 scripts/
The -l option enables “long-listing” format. We’ll come back to what each column in this new listing format means later when discussing permissions.
We should still be in the talapas-bash directory, which we can check using:
pwd
/c/Users/Erin/talapas-bash
Next, we’ll move to the talapas-bash/exercise-data directory and see what it contains.
Auto-Complete with Bash
For lengthy file paths, the shell can fill in file and folder names for you through auto-completion. This is a great feature for traversing long or complicated paths! Type cd exer and press Tab, and the shell will automatically complete the name to exercise-data.
cd exercise-data
ls -l
drwxr-xr-x 1 Erin 197121 0 Feb 3 22:48 alkanes/
drwxr-xr-x 1 Erin 197121 0 Feb 3 22:48 animal-counts/
drwxr-xr-x 1 Erin 197121 0 Feb 3 22:48 creatures/
drwxr-xr-x 1 Erin 197121 0 Feb 3 22:49 mice/
-rw-r--r-- 1 Erin 197121 18 Feb 3 22:48 numbers.txt
drwxr-xr-x 1 Erin 197121 0 Feb 3 22:48 writing/
You can enable multiple options simultaneously by putting both letters after the hyphen.
Here, the -h option stands for “human-readable” and prints file sizes with unit suffixes such as K, M, and G (kilobytes, megabytes, gigabytes) rather than as raw byte counts.
ls -lh
drwxr-xr-x 1 Erin 197121 0 Feb 3 22:48 alkanes/
drwxr-xr-x 1 Erin 197121 0 Feb 3 22:48 animal-counts/
drwxr-xr-x 1 Erin 197121 0 Feb 3 22:48 creatures/
drwxr-xr-x 1 Erin 197121 0 Feb 3 22:49 mice/
-rw-r--r-- 1 Erin 197121 18B Feb 3 22:48 numbers.txt
drwxr-xr-x 1 Erin 197121 0 Feb 3 22:48 writing/
Order among options does not matter when enabling multiple options.
ls -hl
drwxr-xr-x 1 Erin 197121 0 Feb 3 22:48 alkanes/
drwxr-xr-x 1 Erin 197121 0 Feb 3 22:48 animal-counts/
drwxr-xr-x 1 Erin 197121 0 Feb 3 22:48 creatures/
drwxr-xr-x 1 Erin 197121 0 Feb 3 22:49 mice/
-rw-r--r-- 1 Erin 197121 18B Feb 3 22:48 numbers.txt
drwxr-xr-x 1 Erin 197121 0 Feb 3 22:48 writing/
Options are case-sensitive, however. For example, let’s compare ls -F and ls -f.
ls -F
alkanes/ animal-counts/ creatures/ mice/ numbers.txt writing/
The -F option tells ls to add a marker to file and directory names to indicate what they are:
- a trailing / indicates a directory
- @ indicates a link
- * indicates an executable
This can be extremely informative. (Windows: You may already see these markings by default in Git Bash.)
The -f option, however, tells ls to skip sorting, remove any color-coding, and show all hidden entries, including the special entries . and ..:
ls -f
. .. alkanes animal-counts creatures mice numbers.txt writing
If you try to use an option that a command does not support, the command will print an error message similar to:
ls -j
ls: invalid option -- 'j'
Try 'ls --help' for more information.
Anatomy of a Bash Command: Command, Options, Arguments
Let’s formalize some terminology by looking at an example Bash command:
ls -F /
In this example, ls is the command, run with the option -F (the type-indicator flag) and the argument / (the root directory).
Options come in two forms: short options, which start with a single dash (-), and long options, which start with two dashes (--), such as --help.
A command can be called with more than one option and more than one argument, but a command doesn’t always require an argument or an option. For example, the pwd command does not take any arguments.
You might sometimes see options being referred to as switches or flags, especially for options that take no argument.
Each part of the command is separated by spaces. If you omit the space between ls and -F, the shell will look for a command called ls-F, which doesn’t exist.
When ls is used without an argument, it displays the files and folders in the current working directory.
When it is used with a directory as an argument, it displays the files and folders in that directory.
In this case, ls -F / displays the files and folders in the root directory /.
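To see the three parts working together, the sketch below runs one command (ls) with one option (-F) and two arguments. It is meant to be saved and run with bash; the data and results folders it creates inside a mktemp -d scratch directory are hypothetical. The last lines show what happens when the space between ls and -F is left out.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'cd / && rm -rf "$tmp"' EXIT
cd "$tmp"
mkdir data results             # two hypothetical folders to use as arguments
touch data/a.txt results/b.txt

# command: ls    option: -F    arguments: data results
listing=$(ls -F data results)
echo "$listing"                # one labeled section per argument

# With no space, the shell looks for a command literally named "ls-F".
ls-F data 2>/dev/null
status=$?
echo "exit status: $status"    # bash uses 127 for "command not found"
```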
Quiz
What happens if we try to cd exercise-data from here (we are still inside exercise-data)? Why?
Answer
The command fails with the following error, because we supplied a relative path and there is no exercise-data directory inside exercise-data:
bash: cd: exercise-data: No such file or directory
File and Folder Manipulation: mkdir, nano, mv, cp, rm
Special Characters: ~, .., and .
So far, our relative paths have only let cd move down into subdirectories of the current directory.
However, the shell has a shortcut for moving up one directory level without typing an absolute path from the root. It works as follows:
cd ..
.. is a special directory name that means the parent of the current directory. Sure enough, if we run pwd after running cd .., we’re back in talapas-bash:
pwd
/c/Users/Erin/talapas-bash
The special directory .. doesn’t usually show up when we run ls. If we want to display it, we can add the -a option to ls -F:
ls -Fa
./ ../ books/ exercise-data/ scripts/
It also displays another special directory, ., which points to the current working directory. It may seem redundant to have a folder refer to itself, but it will become useful as we learn more commands.
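Here is a small sketch of hidden entries and the two special directories. It is meant to be saved and run with bash; it creates a scratch writing folder with mktemp -d, and the file names visible.txt and .hidden.txt are hypothetical. Any name that starts with a dot is hidden from a plain ls.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'cd / && rm -rf "$tmp"' EXIT
mkdir "$tmp/writing"
cd "$tmp/writing"
touch visible.txt .hidden.txt   # a leading dot makes a name hidden

plain=$(ls)       # skips hidden names, including . and ..
all=$(ls -a)      # -a lists everything, including . and ..
echo "ls:"
echo "$plain"
echo "ls -a:"
echo "$all"

cd .              # . is the current directory, so this goes nowhere
here=$(pwd)
cd ..             # .. is the parent directory
parent=$(pwd)
echo "here:   $here"
echo "parent: $parent"
```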
Quiz
Suppose your current working directory is writing (talapas-bash/exercise-data/writing). How could you use .. and ls to see the contents of the mice directory? Hint: the mice directory path is talapas-bash/exercise-data/mice.
Answer
Use ls ../mice to look for a directory called mice inside exercise-data, the parent of writing.
ls ../mice
Animals.txt Tasks.txt citation.txt README.md Visit.txt
Finally, let’s discuss a third special character: ~. This is shorthand for the absolute path of your home directory, and it works regardless of your operating system.
cd ~
ls
Applications Documents Library Music Public
Desktop Downloads Movies Pictures
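Like the wildcards covered later in this lesson, ~ is expanded by the shell itself before the command runs. This minimal sketch makes that visible with echo, which simply prints its arguments. It is meant to be saved and run with bash, and it assumes your HOME variable is set, as it normally is.

```shell
#!/usr/bin/env bash
# echo prints its arguments, so it shows what the shell passed after expansion.
expanded=$(echo ~)
echo "~ expands to: $expanded"
echo "HOME is:      $HOME"

# Inside quotes, ~ is left alone and passed through literally.
quoted=$(echo "~")
echo "quoted: $quoted"

cd ~              # same as: cd "$HOME"
echo "now in: $(pwd)"
```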
Quiz
From your home directory, how would you navigate to talapas-bash using a relative path? How about using an absolute path that starts with ~?
Answer
cd talapas-bash and cd ~/talapas-bash, respectively.
More Complex Traversals
We don’t have to traverse through the filesystem one folder at a time!
When using auto-complete, if you press Tab and nothing is filled in, there is more than one possible completion for what you have typed so far. For example, type cd exercise-data/ and press Tab: because exercise-data/ contains multiple files and subfolders, the shell can’t decide which path to fill in. Press Tab twice to see all the possibilities.
Now, type a single w for writing and press the Tab key a final time. Then, run the full command.
cd exercise-data/writing/
Creating Directories with mkdir
Inside writing, let’s create a new empty directory called thesis using the command mkdir thesis.
mkdir thesis
mkdir means ‘make directory’. Since thesis is a relative path, the new directory is created in the current working directory:
ls -F
haiku.txt LittleWomen.txt thesis/
Since we’ve just created the thesis directory, there’s nothing in it yet. We can check this by passing thesis as an argument to ls.
ls -F thesis
Note that mkdir is not limited to creating single directories one at a time. The -p option allows mkdir to create nested subdirectories, along with any missing parent directories, in a single operation. This command creates a project directory inside exercise-data (the parent of writing) with two subfolders: data and results.
mkdir -p ../project/data ../project/results
The -R option to the ls command lists all nested subdirectories within a directory.
Let’s use ls -FR to recursively list the new directory hierarchy we just created in the project directory:
ls -FR ../project
../project/:
data/ results/
../project/data:
../project/results:
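The difference -p makes is easiest to see side by side. This sketch is meant to be saved and run with bash; it works in a mktemp -d scratch directory, so the project folder it builds is a throwaway copy rather than the one in exercise-data.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'cd / && rm -rf "$tmp"' EXIT
cd "$tmp"

# Without -p, mkdir fails because the parent folder "project" does not exist yet.
if mkdir project/data 2>/dev/null; then
  echo "created project/data"
else
  echo "mkdir without -p failed: project/ does not exist"
fi

# With -p, any missing parent directories are created along the way.
mkdir -p project/data project/results
ls -FR project

# -p also stays quiet if the directories already exist.
mkdir -p project/data && echo "second mkdir -p succeeded"
```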
A Beginner-Friendly Text Editor: nano
Let’s change our working directory to thesis using cd, then run a text editor called Nano to create a file called draft.txt:
cd thesis
nano draft.txt
Let’s type in a few lines of text.
Once we’re happy with our text, we can press Ctrl+O (press the Ctrl or Control key and, while holding it down, press the O key) to write our data to disk. Press Return to confirm the file name draft.txt.
Once our file is saved, we can use Ctrl+X to quit the editor and return to the shell.
nano doesn’t leave any output on the screen after it exits, but ls now shows that we have created a file called draft.txt:
ls
draft.txt
Renaming and Moving with mv
Return to the talapas-bash/exercise-data/writing directory using .. (we are inside thesis, so .. refers to its parent, writing):
cd ..
In our thesis directory we have a file draft.txt. Let’s change the file’s name using mv, which is short for ‘move’:
mv thesis/draft.txt thesis/wisdom.txt
The first argument tells mv the source file or folder, while the second is the destination. In this case, we’re moving thesis/draft.txt to thesis/wisdom.txt, which has the same effect as renaming the file. Sure enough, ls shows us that thesis now contains one file called wisdom.txt:
ls thesis
wisdom.txt
One must be careful when specifying the target file name, since mv will silently overwrite any existing file with the same name. For example, if we move LittleWomen.txt to haiku.txt, we will be left with one file named haiku.txt that contains the text of the novel Little Women, and the original haiku will be gone.
mv LittleWomen.txt haiku.txt
ls -F
haiku.txt thesis/
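Here is a minimal sketch of that silent overwrite, using tiny stand-in files in a mktemp -d scratch directory (the one-line contents are hypothetical, not the real novel or haiku). It uses echo with the > redirect, covered later in this lesson, to write a line of text into each file, and it is meant to be saved and run with bash.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'cd / && rm -rf "$tmp"' EXIT
cd "$tmp"
echo "text of the novel" > LittleWomen.txt   # hypothetical stand-in contents
echo "text of the haiku" > haiku.txt

mv LittleWomen.txt haiku.txt   # no warning: the old haiku.txt is replaced
ls                             # prints: haiku.txt
cat haiku.txt                  # prints: text of the novel
```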
Looking at Long Files with less
Speaking of long text files, let’s practice viewing long files (such as logs or large source files) from the command line with a pager called less.
less haiku.txt
Use the arrow keys to scroll through the text and the Q key to exit less.
By default, mv will not ask for confirmation before overwriting files. However, an additional option, mv -i (or mv --interactive with the GNU version of mv used by Git Bash and Linux), will cause mv to request such confirmation.
Note that mv also works on directories.
Let’s move wisdom.txt into the current working directory, writing.
We use mv once again, but this time we’ll use just the name of a directory as the second argument to tell mv that we want to keep the filename but put the file somewhere new.
In this case, the directory name we use is the special directory name . that we mentioned earlier.
mv thesis/wisdom.txt .
This moves wisdom.txt from thesis to the current working directory. ls now shows us that thesis is empty:
ls thesis
Alternatively, we can confirm that wisdom.txt is no longer present in the thesis directory by trying to list it with ls.
This is a helpful debugging strategy for path resolution errors both locally and on Talapas.
ls thesis/wisdom.txt
ls: cannot access 'thesis/wisdom.txt': No such file or directory
We can also use this to see that wisdom.txt is now present in our current directory:
ls wisdom.txt
wisdom.txt
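Here is a compact sketch of moving a file using . and then a directory name as the destination. It is meant to be saved and run with bash; it builds its own thesis folder inside a mktemp -d directory, and the file contents are hypothetical.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'cd / && rm -rf "$tmp"' EXIT
cd "$tmp"
mkdir thesis
echo "hypothetical wise words" > thesis/wisdom.txt

mv thesis/wisdom.txt .   # "." is the current directory: same name, new folder
ls thesis                # prints nothing, thesis/ is now empty
ls wisdom.txt            # prints: wisdom.txt

mv wisdom.txt thesis/    # any directory name works as a destination
ls thesis                # prints: wisdom.txt
```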
Copying files and directories: cp
The cp command is similar to mv, except it copies files and folders instead of moving or renaming them.
cp wisdom.txt thesis/quotations.txt
ls wisdom.txt thesis/quotations.txt
wisdom.txt thesis/quotations.txt
We can also copy a directory and its contents by using the recursive option -r:
cp -r thesis thesis_backup
We can check the result by listing the contents of both the thesis and thesis_backup directories. The contents are identical.
ls thesis thesis_backup
thesis:
quotations.txt
thesis_backup:
quotations.txt
It is important to include the -r flag when copying a directory, so that cp recursively copies everything inside it (all subfolders and files). If you try to copy a directory without this option, cp skips it with a message that -r was not specified:
cp thesis thesis_backup
cp: -r not specified; omitting directory 'thesis'
The recursive -r flag is an extremely common option. Keep an eye out for it in other Bash commands that manipulate or traverse folders.
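The sketch below reproduces the -r behavior on a scratch copy. It is meant to be saved and run with bash; the thesis folder and its quotation live in a mktemp -d directory and are hypothetical, and the exact wording of cp’s skip message differs between systems, so the script prints its own message instead.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'cd / && rm -rf "$tmp"' EXIT
cd "$tmp"
mkdir thesis
echo "a hypothetical quotation" > thesis/quotations.txt

# Without -r, cp refuses to copy a directory and returns a nonzero exit status.
if ! cp thesis thesis_backup 2>/dev/null; then
  echo "cp without -r skipped the directory"
fi

cp -r thesis thesis_backup   # copies the folder and everything inside it
ls thesis thesis_backup
```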
Creating Empty Files: touch
To create an empty file, use the touch command followed by one or more file paths.
Let’s make an empty directory and put a few placeholder (empty) files inside.
mkdir journal
touch journal/day1.txt journal/day2.txt
To show that these files are empty (size 0), we can use the ls -lh command, which shows the directory contents in a long-listing format with human-readable sizes.
ls -lh journal
-rw-r--r-- 1 Erin 197121 0 Feb 3 00:17 day1.txt
-rw-r--r-- 1 Erin 197121 0 Feb 3 00:17 day2.txt
In this case, the file size 0 appears in the 5th column.
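A quick way to double-check that touch really made empty files is the shell test [ -s file ], which is true only for a file that exists and is larger than zero bytes. This sketch is meant to be saved and run with bash and uses a scratch journal folder inside a mktemp -d directory. It also uses echo with > (covered later today) to give one file some text, showing that touch leaves an existing file’s contents alone and only updates its timestamp.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'cd / && rm -rf "$tmp"' EXIT
cd "$tmp"
mkdir journal
touch journal/day1.txt journal/day2.txt
ls -l journal

# [ -s file ] is true only if the file exists and is not empty.
if [ ! -s journal/day1.txt ]; then
  echo "day1.txt is empty"
fi

# On a file that already exists, touch updates the timestamp, not the contents.
echo "first entry" > journal/day2.txt
touch journal/day2.txt
cat journal/day2.txt     # prints: first entry
```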
Removing Files and Directories with rm
Back in the talapas-bash/exercise-data/writing directory, let’s tidy up by removing the wisdom.txt file we moved there earlier.
The command we’ll use for this is rm, short for remove:
rm wisdom.txt
Check the file is gone with ls
:
ls wisdom.txt
ls: cannot access 'wisdom.txt': No such file or directory
If we try to remove the thesis directory using rm thesis, we get an error message:
rm thesis
rm: cannot remove 'thesis': Is a directory
This happens because rm by default only works on files, not directories.
rm can remove a directory and all its contents if we use the recursive option -r, and it will do so without any confirmation prompts:
rm -r thesis
Given that there is no way to retrieve files deleted using the shell, rm -r should be used with great caution. Consider adding the interactive option, as in rm -r -i, which asks for confirmation before removing each file and directory it visits.
Deleting from the Command Line is Forever
The Unix shell doesn’t have a trash bin that we can recover deleted files from.
On Talapas, your only hope in scenarios like this is to restore a version of the file or folder from a system backup. As researchers working in a shared computing environment, it is your responsibility not to use rm on project files and folders unless it is safe to remove them.
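Because deletion is permanent, it is safest to practice on throwaway files. This minimal sketch is meant to be saved and run with bash; it recreates a hypothetical wisdom.txt file and thesis folder inside a mktemp -d scratch directory and then removes them the same way as above.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'cd / && rm -rf "$tmp"' EXIT
cd "$tmp"
mkdir thesis
touch wisdom.txt thesis/draft.txt

rm wisdom.txt                                  # removes a single file
ls wisdom.txt 2>/dev/null || echo "wisdom.txt is gone"

rm thesis 2>/dev/null || echo "plain rm refuses to remove a directory"
rm -r thesis                                   # removes thesis/ and its contents, no prompt
ls                                             # prints nothing: the scratch folder is empty
```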
Wildcards: *, ?
* is a wildcard, which represents zero or more other characters. Let’s consider the talapas-bash/exercise-data/alkanes directory.
cd ~/talapas-bash/exercise-data/alkanes
ls -F
cubane.pdb ethane.pdb explosive/ methane.pdb octane.pdb pentane.pdb propane.pdb
*.pdb represents ethane.pdb, propane.pdb, and every other file whose name ends with ‘.pdb’. Let’s test this by using it with ls.
ls *.pdb
cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb
On the other hand, p*.pdb only represents pentane.pdb and propane.pdb, because the ‘p’ at the front can only represent filenames that begin with the letter ‘p’.
ls p*.pdb
pentane.pdb propane.pdb
? is also a wildcard, but it represents exactly one character. So ?ethane.pdb can only represent methane.pdb, whereas *ethane.pdb represents both ethane.pdb and methane.pdb.
ls ?ethane.pdb
methane.pdb
Wildcards can be used in combination with each other. For example, ???ane.pdb indicates three characters followed by ane.pdb, giving cubane.pdb, ethane.pdb, and octane.pdb.
When the shell sees a wildcard, it expands the wildcard into a list of matching filenames before running the command.
As an exception, if a wildcard expression does not match any file, Bash passes the expression to the command unchanged. For example, typing ls *.pdf in the alkanes directory (which contains only files with names ending in .pdb) makes ls look for a file literally named *.pdf, which results in an error message:
ls *.pdf
ls: cannot access '*.pdf': No such file or directory
Commands like wc and ls never see the wildcards themselves, only the list of matching file names that the shell generates from them. It is the shell itself that expands the wildcards.
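One way to watch the shell expand a wildcard is to hand the pattern to echo, which just prints whatever arguments it receives. This sketch is meant to be saved and run with bash; it creates empty stand-ins for four of the alkanes files in a mktemp -d directory, so only the names matter.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'cd / && rm -rf "$tmp"' EXIT
cd "$tmp"
touch cubane.pdb ethane.pdb methane.pdb propane.pdb   # empty stand-in files

all=$(echo *.pdb)            # cubane.pdb ethane.pdb methane.pdb propane.pdb
starts_p=$(echo p*.pdb)      # propane.pdb
one_char=$(echo ?ethane.pdb) # methane.pdb
no_match=$(echo *.pdf)       # no match, so the pattern itself is passed on: *.pdf

echo "$all"
echo "$starts_p"
echo "$one_char"
echo "$no_match"
```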
Processing Files with wc and cat
Let’s look more closely at the alkanes folder, which contains six files describing some simple organic molecules.
The .pdb extension indicates that these files are in Protein Data Bank format, a simple text format that specifies the type and position of each atom in the molecule.
ls
cubane.pdb methane.pdb pentane.pdb
ethane.pdb octane.pdb propane.pdb
Concatenating with cat
Let’s print methane.pdb to the terminal with cat (short for concatenate) to inspect its contents.
cat methane.pdb
COMPND METHANE
AUTHOR DAVE WOODCOCK 95 12 18
ATOM 1 C 1 0.257 -0.363 0.000 1.00 0.00
ATOM 2 H 1 0.257 0.727 0.000 1.00 0.00
ATOM 3 H 1 0.771 -0.727 0.890 1.00 0.00
ATOM 4 H 1 0.771 -0.727 -0.890 1.00 0.00
ATOM 5 H 1 -0.771 -0.727 0.000 1.00 0.00
TER 6 1
END
Unlike nano, cat prints the contents of a file directly to the terminal and does not give us the opportunity to edit it. Commands like cat are not well suited to long files; for those, a pager such as less is more convenient.
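The name cat comes from concatenate: given several files, it prints them one after another. Here is a minimal sketch, meant to be saved and run with bash, with two hypothetical one-line files in a mktemp -d directory (created with printf and the > redirect, which we cover later today):

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'cd / && rm -rf "$tmp"' EXIT
cd "$tmp"
printf 'COMPND METHANE\n' > first.pdb    # hypothetical one-line files
printf 'COMPND ETHANE\n' > second.pdb

cat first.pdb second.pdb                 # prints first.pdb, then second.pdb
combined=$(cat first.pdb second.pdb)     # capture the same output in a variable
```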
To look at a numeric summary of a text file, we can use the versatile wc, or word count, command.
wc cubane.pdb
20 156 1158 cubane.pdb
wc, or “word count”, counts the lines, words, and bytes in each file and displays them in that order from left to right. For plain ASCII text files like these, the byte count is the same as the character count.
If we run the command wc *.pdb, the * in *.pdb matches zero or more characters, so the shell turns *.pdb into a list of all .pdb files in the current directory.
wc *.pdb
20 156 1158 cubane.pdb
12 84 622 ethane.pdb
9 57 422 methane.pdb
30 246 1828 octane.pdb
21 165 1226 pentane.pdb
15 111 825 propane.pdb
107 819 6081 total
Note that wc *.pdb also prints a final total line, which sums each column over all the files.
If we run wc -l instead of just wc, the output shows only the number of lines per file:
wc -l *.pdb
20 cubane.pdb
12 ethane.pdb
9 methane.pdb
30 octane.pdb
21 pentane.pdb
15 propane.pdb
107 total
The -m and -w options can also be used with the wc command to show only the number of characters or the number of words, respectively.
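To check what each count means, it helps to run wc on a file small enough to count by hand. This sketch is meant to be saved and run with bash; it writes a hypothetical two-line sample.txt into a mktemp -d directory with printf and >. Some systems pad the numbers with extra spaces, so your columns may not line up exactly.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'cd / && rm -rf "$tmp"' EXIT
cd "$tmp"
# Two lines, three words, 14 characters (each line ends in a newline character).
printf 'one two\nthree\n' > sample.txt

wc sample.txt       # lines, words, bytes (same as characters here), then the name
wc -l sample.txt    # lines only: 2
wc -w sample.txt    # words only: 3
wc -m sample.txt    # characters only: 14
```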
Writing Output to Files: > and >>
Which of these Protein Data Bank files contains the fewest lines? It’s an easy question to answer when there are only six files, but what if there were 6000? Our first step toward a solution is to run the command:
wc -l *.pdb > lengths.txt
The greater than symbol, >, tells the shell to redirect the command’s output to a file instead of printing it to the screen.
This command prints nothing to the screen, because everything that wc would have printed has gone into the file lengths.txt instead. If the file doesn’t exist before the command runs, the shell creates it. If the file already exists, it is silently overwritten, so use redirection with care.
ls lengths.txt confirms that the file exists:
ls lengths.txt
lengths.txt
We can now send the content of lengths.txt to the screen using cat lengths.txt.
cat lengths.txt
20 cubane.pdb
12 ethane.pdb
9 methane.pdb
30 octane.pdb
21 pentane.pdb
15 propane.pdb
107 total
We’ll continue to use cat in this lesson, for convenience and consistency, but it has the disadvantage that it always dumps the whole file onto your screen.
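The section title also mentions >>, which appends to a file instead of overwriting it. Here is a minimal sketch of the difference, meant to be saved and run with bash; log.txt is a hypothetical file in a mktemp -d scratch directory.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'cd / && rm -rf "$tmp"' EXIT
cd "$tmp"

echo "first run" > log.txt     # > creates log.txt (or empties it if it exists)
echo "second run" > log.txt    # so this replaces "first run"
cat log.txt                    # prints: second run

echo "third run" >> log.txt    # >> adds to the end of the file instead
cat log.txt                    # prints: second run, then third run on the next line
```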
Filtering Output with sort, head, and tail
Next we’ll use the sort command to sort the contents of the lengths.txt file. First, let’s see how sort behaves on a simpler example.
The file talapas-bash/exercise-data/numbers.txt contains the following lines:
10
2
19
22
6
If we run sort on this file, the output is:
10
19
2
22
6
If we run sort -n on the same file, we get this instead:
2
6
10
19
22
This is because the -n option specifies a numerical rather than an alphabetical (character-by-character) sort.
The sort command does not change its input files; it prints their lines in sorted order to the screen. Let’s sort lengths.txt numerically:
sort -n lengths.txt
9 methane.pdb
12 ethane.pdb
15 propane.pdb
20 cubane.pdb
21 pentane.pdb
30 octane.pdb
107 total
We can put the sorted list of lines in another temporary file called sorted-lengths.txt by putting > sorted-lengths.txt after the command, just as we used > lengths.txt to put the output of wc into lengths.txt.
Once we’ve done that, we can run another command called head to get the first line of sorted-lengths.txt:
sort -n lengths.txt > sorted-lengths.txt
head -n 1 sorted-lengths.txt
9 methane.pdb
This tells us that methane.pdb is the shortest of the files, with only 9 lines.
Using -n 1 with head tells it that we only want the first line of the file; -n 20 would get the first 20, and so on.
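Putting sort and head together, and adding tail (the counterpart of head, which prints the last lines of a file), here is a sketch you can save and run with bash. It recreates the five values from numbers.txt in a mktemp -d scratch directory.

```shell
#!/usr/bin/env bash
tmp=$(mktemp -d)
trap 'cd / && rm -rf "$tmp"' EXIT
cd "$tmp"
printf '10\n2\n19\n22\n6\n' > numbers.txt   # same values as exercise-data/numbers.txt

sort numbers.txt                  # character by character: 10, 19, 2, 22, 6
sort -n numbers.txt > sorted.txt  # numerically: 2, 6, 10, 19, 22
head -n 1 sorted.txt              # first line, the smallest value: 2
tail -n 1 sorted.txt              # last line, the largest value: 22
head -n 2 sorted.txt              # first two lines: 2 and 6
```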
Waiting for Input…
What happens if a command is supposed to process a file, but we don’t give it a filename? For example, what if we type wc -l but don’t type anything after the command?
wc -l
Since we didn’t supply any filenames as arguments, wc reads from standard input: it waits for us to type the data itself at the keyboard.
If you make this kind of mistake, you can escape out of this state by holding down the control key (Ctrl) and pressing the letter C once: Ctrl+C. Then release both keys.
Ctrl+C can also be used to exit running programs, so this is an essential skill for interacting with the command line.
Getting Help with Linux Commands: man and --help
Commands like ls have so many options that even the most experienced users don’t have them all memorized. As you encounter new commands, use the GNU manual and references like Stack Overflow to guide you in choosing their options and arguments.
The manual pages for most commands can be accessed as follows:
macOS, Linux
man wc
Use the arrow keys to scroll through the text and the Q-key to exit the manual.
Windows
wc --help
The --help flag prints a summary of the command’s usage and options directly to the terminal instead. Scroll upwards with your scroll bar to see the beginning of the output.
Clearing the Screen: clear
As we wrap up for the day, let’s clear the text that has been printed to the terminal with the clear command. This will not affect your command history or any of your files.
Today’s Bash Command List
command | description | example usage |
---|---|---|
pwd | print working directory | pwd |
ls | list stuff (files, folders) | ls -lha |
cd [directory] | change directory | cd ~/Pictures |
mkdir [directory_name(s)] | make directory | mkdir my_new_dir |
rm [file(s)] | remove files (permanently) | rm a.txt b.txt |
nano [filename] | create or open file at filename | nano draft.txt |
less [filename] | open a paged reader for filename | less bigDoc.md |
touch [filename] | create an empty file at filename | touch empty.txt |
mv [old] [new] | move (or rename) files and folders | mv water.txt wine.txt |
cp [old] [new] | copy files or folders to a new location | cp old.txt backup/ |
wc [filename] | prints line, word, and char counts | wc list.txt |
cat [filename] | prints the contents of a file to the screen | cat list.txt |
command > [filename] | redirects the output of a command to file | ls > files.txt |
sort [filename] | prints the lines of a file in sorted order | sort -n rows.csv |
head -n [# lines] [file] | print the first # lines of a file or files | head -n 10 long-novel.txt |
clear | clears the terminal screen (not the history) | clear |
Learn More About Bash
- The Linux command line for beginners, Ubuntu Tutorials: a short Bash tutorial designed for Ubuntu users with a summary of the historical context for UNIX and Linux. You can install Ubuntu for free and follow along in the GUI with VirtualBox.