Debian / Ubuntu System Administration Roadmap
This is intended to be a brief summary of the basic knowledge and skills needed in order to maintain the NebulOS cluster at UC Riverside, starting at a low level. More generally, this is a concise roadmap (or outline) for Ubuntu Linux system administration. In this roadmap, I will point you to resources that can be used to learn more. The purpose of this is not to explain everything in detail; it will merely get you started. Note that some of the things mentioned here are specific to Ubuntu or Debian-based operating systems, so this isn't exactly a general Linux system administration roadmap.
The Basics
You will need to have a good understanding of...
-
the layout of the GNU / Linux file system: Filesystem Hierarchy Standard.
file permissions and attributes: Wikipedia, Arch, StackExchange
the BASH shell (and other shells, like Zsh, if you desire).
the Unix philosophy: Wikipedia article, Catb
You will need to learn to solve your own problems with and without the assistance of Google. This will involve reading manual pages (man pages), info pages, help files, and system logs. For example, this command shows you the manual for Bash:
[bash gutter="false"] $ man bash[/bash]
The following command lists all of the built-in commands (builtins) in Bash, along with whether each one is enabled:
[bash gutter="false"] $ enable -a[/bash]
To find out more about any of the commands that are listed by this command, you can read more documentation using the help command. For instance, to find out more about the wait command:
[bash gutter="false"] $ help wait[/bash]
You should also learn to write non-trivial scripts in Bash, with conditional blocks, loops, and functions.
Resources for learning BASH: Book 1, Book 2
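As a rough illustration of the kind of script meant here, the following minimal sketch combines a function, a conditional block, and a loop. The function names classify and report and the example paths are made up for this illustration.

```bash
#!/usr/bin/env bash
# Classify a path as a directory, a regular file, or something else.
classify() {
    local path="$1"
    if [ -d "$path" ]; then
        echo "directory"
    elif [ -f "$path" ]; then
        echo "file"
    else
        echo "other"
    fi
}

# Loop over all arguments and print one line per path.
report() {
    local path
    for path in "$@"; do
        printf '%s: %s\n' "$path" "$(classify "$path")"
    done
}

report / /dev/null /nonexistent-example-path
```

Quoting every expansion ("$path", "$@") keeps the script correct for paths that contain spaces.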
You should also understand the concept of an inode. In particular, you should know the difference between copying a file and moving a file. You should understand why moving a 5 GB file from one directory to another directory on the same disk happens almost immediately, while copying the file takes a few seconds. You should also understand the difference between soft and hard links (also known as symbolic links and ordinary links, produced by ln -s and ln, respectively).
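One way to explore these ideas is to watch inode numbers directly. The sketch below assumes GNU stat (as shipped with Ubuntu) and works in a throwaway directory; all file names are arbitrary.

```bash
# Work in a throwaway directory so nothing on the system is touched.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

echo "hello" > original.txt
ln original.txt hard.txt        # hard link: a second name for the same inode
ln -s original.txt soft.txt     # symbolic link: a small file that stores a path

# Columns: inode number, hard-link count, name.
stat -c '%i %h %n' original.txt hard.txt soft.txt

# Renaming within one filesystem keeps the inode; only the directory entry changes.
inode_before=$(stat -c '%i' original.txt)
mv original.txt renamed.txt
inode_after=$(stat -c '%i' renamed.txt)

# Copying allocates a new inode and duplicates the data.
cp renamed.txt copy.txt
inode_copy=$(stat -c '%i' copy.txt)
echo "before=$inode_before after=$inode_after copy=$inode_copy"

cat hard.txt                    # the data is still reachable through the hard link
[ -e soft.txt ] || echo "soft.txt is now dangling"
```

The hard link keeps working after the rename because it refers to the inode, while the symbolic link breaks because it refers to the old name.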
Input and Output Redirection and Piping
-
Learn about Unix pipes (particularly the usage of the | operator in Bash)
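Learn to use >, >>, 2>, >&, 2>&1, 1>&2, &>, &>>, and >| for output redirection in Bash. (The forms >>&, >!, and >&! come from csh-style shells and are also accepted by Zsh, but not by Bash.)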
Learn to use < for input redirection
Learn about named pipes (FIFOs, created with the mkfifo command)
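The operators above can be tried out safely in a temporary directory. This is only a small sketch; the file names are arbitrary, and the ls command is pointed at a path that is assumed not to exist so that it produces an error message.

```bash
work=$(mktemp -d)

# > truncates the target, >> appends to it.
echo "first"  >  "$work/out.log"
echo "second" >> "$work/out.log"

# 2>&1 sends stderr to wherever stdout currently points (here, all.log).
# ls fails on purpose; "|| true" keeps the failure from stopping a strict script.
ls /nonexistent-example-path > "$work/all.log" 2>&1 || true

# < feeds a file to stdin; | connects one command's stdout to the next one's stdin.
line_count=$(wc -l < "$work/out.log")
upper=$(tr 'a-z' 'A-Z' < "$work/out.log" | head -n 1)

# A named pipe: the writer blocks until a reader opens the FIFO,
# so the writer is started in the background.
mkfifo "$work/pipe"
echo "through the fifo" > "$work/pipe" &
fifo_line=$(cat "$work/pipe")
wait

echo "$line_count lines, first line in upper case: $upper, via FIFO: $fifo_line"
```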
Viewing and Changing Permissions and Attributes
Learn about these:
-
chown
chmod
chattr
lsattr
ls -l
stat
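A short sketch of chmod, ls -l, and stat working together, assuming GNU stat. It uses a scratch file with an arbitrary name; chown and chattr are left out because they generally require root.

```bash
perm_dir=$(mktemp -d)
touch "$perm_dir/script.sh"

# Numeric form: owner read/write, group read, others nothing.
chmod 640 "$perm_dir/script.sh"
mode_numeric=$(stat -c '%a' "$perm_dir/script.sh")

# Symbolic form: add execute for the owner, remove read for the group.
chmod u+x,g-r "$perm_dir/script.sh"
mode_symbolic=$(stat -c '%A' "$perm_dir/script.sh")

ls -l "$perm_dir/script.sh"
echo "$mode_numeric -> $mode_symbolic"
```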
Installing, Updating, and Removing Software
Since we are using Ubuntu, which is a Debian derivative, become familiar with...
-
apt-get
apt-cache (In particular, apt-cache depends and apt-cache rdepends)
dpkg
dpkg-reconfigure
synaptic (Synaptic Package Manager)
aptitude
apt
Investigating dependencies:
[bash gutter="false"] $ aptitude why package_name
$ apt-cache rdepends package_name --installed --recurse
[/bash]
Listing, Creating, Deleting, and Modifying Users and Groups
-
users
w
useradd, adduser
deluser, userdel
usermod
groups
groupadd, addgroup
groupdel, delgroup
groupmod
It's also nice to know how user information (including passwords) is stored.
Changing Password: passwd
Perform actions as another user (switch user): su
Obtaining Root Privileges / Becoming Root
-
su
sudo
sudo -i
Regular Expressions
Learn regular expressions (Regex) so that you can use grep, gawk, sed, and other utilities effectively.
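To give a feel for how the same regular expression ideas carry across tools, here is a small sketch run against made-up sample data (the names and addresses are hypothetical). It calls awk rather than gawk so that it runs with whichever awk implementation is installed.

```bash
# Hypothetical sample data standing in for a log file.
sample='alice 192.168.0.12 login ok
bob 10.0.0.7 login failed
carol 192.168.0.40 login failed'

# grep -E: keep lines whose address starts with 192.168.0.
matches=$(printf '%s\n' "$sample" | grep -E '^[a-z]+ 192\.168\.0\.[0-9]+ ')

# sed -E: capture the last octet and mask the rest of the address.
masked=$(printf '%s\n' "$sample" | sed -E 's/([0-9]+\.){3}([0-9]+)/x.x.x.\2/')

# awk: print the first field of every line that ends in "failed".
failed_users=$(printf '%s\n' "$sample" | awk '/failed$/ { print $1 }')

printf '%s\n' "$matches" "$masked" "$failed_users"
```

Note the escaped dots (\.): an unescaped dot matches any character, which is a common source of overly broad matches.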
Searching The System
locate (and the associated updatedb command)
find (a very versatile tool; there are many nice tutorials online)
apropos (search for phrases in all of the man pages installed on the system)
whereis (search for files associated with a program)
Searching a stream, a file, or a group of files:
-
command | grep [options] 'search term'
grep [options] 'search term' file
grep [options] 'search term' dir/*/filename-pattern*
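The find and grep patterns above can be practiced on a small scratch tree. The directory layout and the search term "needle" below are invented for the example.

```bash
search_root=$(mktemp -d)
mkdir -p "$search_root/a/b"
echo "needle here"  > "$search_root/a/notes.txt"
echo "nothing"      > "$search_root/a/b/other.txt"
echo "needle again" > "$search_root/a/b/data.log"

# find by name pattern, restricted to regular files.
txt_files=$(find "$search_root" -type f -name '*.txt')

# grep -r searches a whole tree; -l lists only the names of matching files.
needle_files=$(grep -rl 'needle' "$search_root")

# find + grep: search only inside files whose names match a pattern.
needle_txt=$(find "$search_root" -type f -name '*.txt' -exec grep -l 'needle' {} +)

printf '%s\n' "$txt_files" "$needle_files" "$needle_txt"
```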
Find and Examine Hard Drives / Block Devices and Their Contents
-
lsblk
blkid (as root)
parted -l (as root)
df
du
Format a Hard Drive or Partition, Edit Partitions, Fix Filesystem Problems
Since we primarily use EXT4 and Btrfs filesystems:
-
mkfs.btrfs
mkfs.ext4
parted
btrfs-convert
btrfs [sub-command]
fsck
btrfs restore -iv /dev/(disk) /recovered/data/
lvm (and the many tools associated with lvm)
Mounting and Unmounting Drives & Partitions
-
mount
umount
You may need to run lsof before unmounting to see if any programs are using the mounted device. You should definitely learn about the many mount options available.
Also learn about the format of /etc/fstab.
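As a reminder of the six-field layout of /etc/fstab, the sketch below writes a made-up sample to a temporary file and picks out a few fields with awk. The UUID, device name, and mount points are placeholders, not values from any real system.

```bash
fstab_sample=$(mktemp)
cat > "$fstab_sample" <<'EOF'
# <device>           <mount point>  <type>  <options>          <dump>  <pass>
UUID=EXAMPLE-UUID-1  /              ext4    errors=remount-ro  0       1
/dev/sdb1            /data          btrfs   defaults,noatime   0       0
EOF

# Print the mount point, filesystem type, and options of each entry.
awk '!/^#/ && NF == 6 { print $2, $3, $4 }' "$fstab_sample"

# A one-off mount corresponding to the second entry would be (as root):
#   mount -t btrfs -o noatime /dev/sdb1 /data
# Before unmounting, lsof can show what still has files open there:
#   lsof /data
```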
A Few Important Configuration & Info Files
-
/etc/grub.d/*
/etc/fstab
/etc/mtab
/etc/hosts
/etc/hostname
/etc/network/interfaces
/etc/sysctl.conf
/etc/sudoers (only edit with visudo)
/etc/sudoers.d/*
/etc/passwd
/etc/shadow
/etc/group
/etc/gshadow
/etc/crontab
/etc/os-release
/etc/bash.bashrc
/etc/init/*
/etc/init.d/*
/home/username/.bashrc
/etc/modules
Some Special Files (Devices) and Directories
-
/dev/null
/dev/random
/dev/urandom
/dev/zero
/run/shm (a RAM disk)
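The behavior of some of these devices is easy to check from the shell. A minimal sketch, using only standard utilities:

```bash
# /dev/null discards everything written to it and reads as empty.
echo "discarded" > /dev/null
null_bytes=$(wc -c < /dev/null)

# /dev/zero yields as many zero bytes as you ask for.
zero_bytes=$(head -c 1024 /dev/zero | wc -c)

# /dev/urandom yields pseudo-random bytes; od renders 8 of them as hex.
random_hex=$(head -c 8 /dev/urandom | od -An -tx1 | tr -d ' \n')

echo "null=$null_bytes zero=$zero_bytes random=$random_hex"
```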
System Logs
Most logs are stored in /var/log/. The following are particularly useful when diagnosing problems:
-
/var/log/dmesg
/var/log/syslog
/var/log/boot.log
/var/log/auth.log
/var/log/dpkg.log
Managing Processes
-
top
htop
pgrep
ps
pkill
kill
killall
Child Process Management / Multiprocessing in Bash
To spawn a new process in the background, place an ampersand after the command:
[bash gutter="false"] $ command &[/bash]
Also refer to documentation for (i.e., learn about) the following:
-
fg
bg
wait
nohup
disown
(You should have learned about these when you learned about advanced Bash scripting)
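The following sketch shows one common pattern: start several background jobs, remember their process IDs, and wait for each one so that failures can be counted. The worker function and its arguments are hypothetical stand-ins for real work.

```bash
# Hypothetical worker: sleeps briefly, then writes its result to a file.
worker() {
    sleep "$1"
    echo "job $2 done" > "$3/job-$2.out"
}

out_dir=$(mktemp -d)
pids=""

# Launch three workers in the background; $! is the PID of the most recent one.
for n in 1 2 3; do
    worker 0.2 "$n" "$out_dir" &
    pids="$pids $!"
done

# "wait PID" returns that child's exit status, so failures can be counted.
failures=0
for pid in $pids; do
    wait "$pid" || failures=$((failures + 1))
done

echo "finished with $failures failure(s)"
ls "$out_dir"
```

Waiting on each PID individually, rather than calling wait with no arguments, is what makes the per-job exit status available.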
Configuring the Bootloader
Since Ubuntu uses GRUB (the Grand Unified Bootloader), you should know a little bit about GRUB. You can learn from some Google searches and from the info page for grub-mkconfig, which is a command that you may need to use to regenerate your GRUB configuration. Also look at the files in /etc/grub.d/.
Note: in practice, it’s easier to use update-grub, which calls grub-mkconfig.
Scheduling Tasks
Be aware of cron, at, and associated tools. You may need to add, remove, or modify cron jobs, for instance.
-
crontab
cron
at
atq
atrm
batch
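For orientation, a crontab entry consists of five schedule fields (minute, hour, day of month, month, day of week) followed by the command. The sketch below writes two made-up entries to a temporary file and prints their schedule fields; the script paths are hypothetical, and nothing is installed into a real crontab.

```bash
cron_file=$(mktemp)
cat > "$cron_file" <<'EOF'
# m  h  dom mon dow  command
30   2  *   *   *    /usr/local/bin/nightly-backup.sh
*/15 *  *   *   1-5  /usr/local/bin/check-disks.sh
EOF

# Print the schedule fields of each non-comment entry as a quick sanity check.
awk '!/^#/ && NF >= 6 { print $1, $2, $3, $4, $5 }' "$cron_file"

# To install the file as the current user's crontab (this replaces the existing one):
#   crontab "$cron_file"
# To list what is currently installed:
#   crontab -l
```

The first entry would run every day at 02:30; the second would run every 15 minutes, Monday through Friday.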
System Init
Some versions of Ubuntu use Upstart for system initialization, while the latest versions use Systemd. In either case, you can begin learning about the init system using:
[bash gutter="false"] $ man init[/bash]
You can learn more by reading tutorials online.
In any case, the restart and shut-down commands are:
[bash gutter="false"] $ sudo reboot[/bash]
and
[bash gutter="false"] $ sudo halt[/bash]
and you can find initialization settings here:
-
/etc/init/*
/etc/init.d/*
SSH
Become very familiar with secure shell (SSH), and be aware that an HPN-patched version of OpenSSH is installed (https://www.psc.edu/index.php/hpn-ssh). SSH has many features; try to become at least aware of most of them. The tools ssh-keygen and ssh-copy-id are also quite important.
When using HPN-SSH, it is recommended that arcfour encryption be used and that TCP Receive Buffer Polling is set to enabled. To do this, use the following:
[bash gutter="false"] $ ssh -c arcfour -o TCPRcvBufPoll=yes username@remote_server[/bash]
When the connection is slow and/or the data being transferred is highly compressible, it is beneficial to enable compression with the -C flag:
[bash gutter="false"] $ ssh -c arcfour -o TCPRcvBufPoll=yes -C username@remote_server[/bash]
Byobu / Screen
When working remotely, you should use Byobu or GNU Screen to simplify things and also to prevent your session from dying if your network connection is interrupted. Byobu is a layer on top of GNU Screen, which makes Screen easier to use (http://byobu.co/).
Rsync
Become familiar with the power of rsync. Note that, if you ever need to use rsync to copy the contents of an HDFS DataNode’s block data, you need to copy everything including hard links, using rsync’s -H (or --hard-links) flag. Also be aware of the -e flag, which sets the remote shell command, so that rsync can run over SSH with specific SSH options.
Copying data from one drive to another, preserving hard links, ACLs, and extended attributes:
[bash gutter="false"] $ sudo rsync -acvHAX old-drive/ new-drive/[/bash]
Network Configuration
Become familiar with the file /etc/network/interfaces and the utilities for managing network interfaces:
-
ifconfig
ifup
ifdown
ifquery
Become familiar with the configuration tools for network traffic filtering / routing:
-
iptables
iptables-save
iptables-restore
ip
route
An iptables tutorial, with specific info about Network Address Translation (NAT)
How to set up a gateway: Connection Sharing
DHCP
Setting up a DHCP server, using dnsmasq: http://blogging.dragon.org.uk/
Domain Name System (DNS) Nameserver configuration
Be familiar with DNS concepts and BIND9. Here’s a little tutorial.
NFS
Learn how to set up an NFS server and how to mount an NFS directory.
Network monitoring / network exploration
-
nethogs
netstat (Useful: sudo netstat -anlp | grep -w "192.168.0.1")
lsof -i
iftop
iptraf
nmap
A more comprehensive list: http://www.binarytides.com/linux-commands-monitor-network/
Also interesting: http://cacti.net/index.php and http://www.ntop.org/
Other Monitoring / info-gathering tools
-
iotop
iostat
lsof
memstat
free
Additionally, search the /usr/bin/ directory for all programs ending in 'stat'
[bash gutter="false"] $ ls /usr/bin/*stat[/bash]
and search the repositories for programs ending in 'stat'
[bash gutter="false"] $ apt-cache search --names-only '.*stat$'[/bash]
Exploring the hardware
-
lshw
dmidecode
lscpu
lsusb
lspci
sensors (from the lm-sensors package)
To install and use the sensors tool:
[bash gutter="false"] $ sudo apt-get install lm-sensors
$ sudo sensors-detect
$ sudo service kmod start
$ sensors
[/bash]
Benchmarking
phoronix-test-suite (comprehensive, but bulky and time-consuming)
sysbench (multiple benchmarks)
mbw (for testing memory bandwidth)
fio (Flexible I/O tester)
ping (network latency)
iperf (for testing network throughput)
For iperf, on the server you run:
[bash gutter="false"] $ iperf -s[/bash]
and on the client, you run:
[bash gutter="false"] $ iperf -c address_of_server[/bash]
Stress-testing / Burn-in
-
stress
badblocks
For example, to do a burn-in on the hard drive /dev/sdg (note that the -w write test destroys all data on the drive):
[bash gutter="false"] $ sudo badblocks -b 4096 -sw /dev/sdg[/bash]
Then check:
[bash gutter="false"] $ sudo smartctl --xall /dev/sdg[/bash]
where smartctl comes from the S.M.A.R.T. tools (smartmontools)
[bash gutter="false"] $ sudo apt-get install smartmontools[/bash]
Automation / Orchestration with Ansible
Ansible allows you to efficiently manage large groups of machines. You can easily modify system settings, copy data among machines, and install, remove, update, and configure software.
I use Ansible’s ad hoc commands very frequently: http://docs.ansible.com/ansible/intro_adhoc.html
The real power comes in using playbooks: http://docs.ansible.com/ansible/playbooks.html
Compiling and Inspecting Libraries
To list functions / symbols in static libraries:
[bash gutter="false"] $ nm -g -C library_file.a[/bash]
To list symbols in shared libraries:
[bash gutter="false"] $ readelf -sW library_file.so[/bash]
To list symbols in objects:
[bash gutter="false"] $ objdump -t library.o[/bash]
To create a shared library (.so shared object file), which has position-independent code:
[bash gutter="false"] $ gcc -shared -fPIC -o library_file.so obj1.o obj2.o[/bash]
To find what libraries a program uses:
[bash gutter="false"] $ ldd /path/to/binary[/bash]
Example:
[bash gutter="false"] $ ldd $(which rsync)[/bash]
shows the libraries used by rsync because $(which rsync) expands to "/usr/bin/rsync"
The ltrace utility is also quite useful. It is similar to strace, but it reports library calls instead of system calls.
HDFS
You can learn to manage HDFS by reading the huge amount of Hadoop documentation online as well as by reading through the HDFS source. The following commands were not very well-documented when I started using HDFS.
To check the entire filesystem:
[bash gutter="false"] $ hdfs fsck /[/bash]
To check the files in a specific sub-directory or a specific file and show info about the blocks:
[bash gutter="false"] $ hdfs fsck /path/to/data -files -blocks -locations[/bash]
Delete corrupt files (use -move instead of -delete to move them to /lost+found):
[bash gutter="false"] $ hdfs fsck / -delete[/bash]
To manually reset the replication factor to 3:
[bash gutter="false"] $ hdfs dfs -setrep -w 3 -R /path/to/files[/bash]
Mesos
For Mesos configuration information, refer to http://mesos.apache.org/ (web searches will also lead to nice documentation from the company Mesosphere).
Deb Packages
Debian Packages (.deb files) are somewhat less interesting than they used to be, now that Snappy is available, with its associated snap packages, but here are a few basic things that may come in handy:
-
auto-apt
checkinstall
debc
dpkg-deb
dpkg-depcheck
debdiff
lintian (check that the deb conforms to standards)
debchange (modify changelog)
To extract the contents of a .deb file (including its control files) into a directory:
[bash gutter="false"] $ dpkg-deb -R filename.deb directory[/bash]
After editing the contents, you can re-build the archive by doing:
[bash gutter="false"] $ dpkg-deb -b directory[/bash]
To identify dependencies, use
[bash gutter="false"] $ dpkg-depcheck -m command[/bash]
Kernel Modules
To list, load, unload kernel modules, and investigate modules:
-
lsmod
insmod
modprobe
rmmod
depmod
modinfo
The kmod program handles all of these tasks:
[bash gutter="false"] $ kmod --help[/bash]
There are also tutorials online for building a custom kernel from source and for writing your own kernel modules.
Useful Books
Introduction to the Command Line Second Edition
CompTIA Linux+ Complete Study Guide
Beej's Guide to Network Programming Using Internet Sockets
The Linux Kernel Module Programming Guide
Finally, a book about an extra-super-cool tool that might come in handy some day: