Monthly Archives: February 2008

Breaking packages

I’ve used the term “breaking packages” a few times. As I said, I maintain my Linux boxes without a package manager. So, how did I get these Linux boxes?

My main Linux computer has over half a million files in its filesystem, and over 3000 separate executables. Where did they all come from? You need some way to start out; your computer isn’t going to do much without a kernel, a shell, and a compiler.

In 1994, I installed Slackware on a 486-based computer. This computer had about 180 MB of hard drive space (nowadays that wouldn’t even hold half of the kernel source tree) and 16 MB of RAM. At that time, Slackware didn’t really have a package manager. It had packages, which were just compressed tar files of compiled binaries, grouped by function. If you weren’t interested in networking, you didn’t download the networking file. If you weren’t interested in LaTeX, you didn’t download that file. Because of this very coarse granularity, there were only a few dozen “packages”. Functions like “upgrade”, “install”, and “find package owning file” weren’t present. An upgrade was the same as an install: you extracted the new package into the filesystem, and it would probably replace the old one. To find out which package provided a certain file, you could look in per-package lists of files.
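To illustrate, a package of that era was essentially just a tarball extracted at the filesystem root. Here’s a toy version of the “install” step, using scratch directories instead of a real root (all names and paths are made up for the demonstration):

```shell
#!/bin/sh
# Build a toy "package" and install it the Slackware-1994 way:
# extract the tarball at the target root.  An "upgrade" was just
# extracting a newer tarball over top of the old files.
set -e
pkgdir=$(mktemp -d)    # where we assemble the fake package
root=$(mktemp -d)      # stands in for the real filesystem root

# A fake package containing one "binary".
mkdir -p "$pkgdir/usr/bin"
echo 'echo hello' > "$pkgdir/usr/bin/hello"
(cd "$pkgdir" && tar czf "$pkgdir.tgz" usr)

# "Install": untar the package at the root.
(cd "$root" && tar xzf "$pkgdir.tgz")
ls "$root/usr/bin"     # -> hello
```

The per-package file list mentioned above was just the output of `tar tzf` on the same tarball.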

So, I never really had a package manager on that system. When I needed new programs, I downloaded the source code, compiled it, and installed it. When I moved to a new system, I brought backup images or a live hard drive to the new computer; I didn’t start with a blank hard drive, I started with the hard drive from the old computer I was replacing. Over the years, I have replaced every executable that was installed in 1994 (I know this because all of the files installed then were in a.out format, and I have only ELF binaries on my computer now).
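The a.out-versus-ELF check is easy to repeat: every ELF executable begins with a fixed four-byte magic number, 0x7f followed by the letters “ELF”, while a.out binaries start differently. A minimal check on a Linux box (`file(1)` would report the format more verbosely):

```shell
# Dump the first four bytes of a known executable.  On a modern Linux
# system this prints the ELF magic: 177 (octal for 0x7f), E, L, F.
head -c 4 "$(command -v ls)" | od -An -c
```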

Sometimes, though, I’ve started with a computer that had a distribution installed on it. At a previous job, my laptop came with Mandrake Linux installed on it. I tried to keep the distribution alive for a while, but eventually got impatient with the package management system and broke the packages.

So, if you give me a new Linux computer and tell me it’s mine to modify, a good first step for me is to kill the package manager. On an RPM-based system, that’s generally achieved by recursively deleting the directory /var/lib/rpm. After that, the rpm command will stop working, and I have the finer control and more difficult task of managing the box myself.

Breaking packages on the MythTV box

As I mentioned earlier, I have a MythTV computer, installed from packages, but I’ve broken some of the packages. Here are some of the issues that I had with the packages, and how I solved them.

Two of the package manager drawbacks I’ve mentioned previously appear here: the one-size-fits-all approach to software packaging, and the failure to receive timely updates.

The MythTV box is on old hardware. Because it has hardware assistance for both MPEG encoding and decoding, I didn’t need a new computer with a fast CPU. The fact that this is old hardware, with a 7-year-old BIOS, may be why I had problems, but I found it easier to break the packages than to try to solve the problems under the constraints of the package system.

First, the MythTV box controls an infra-red LED attached to its serial port, allowing it to change the channels on a digital cable box. This requires the use of the LIRC package, and the lirc_serial kernel module. Well, at the time I set this up, the lirc_serial module was having problems with the SMP kernel. The system would generate an oops quite regularly when it wanted to change channels. Looking at the oops logs, I could see that there were problems specifically with SMP. My MythTV box has only one CPU, so I didn’t need an SMP kernel, but because some users will have SMP computers, the KnoppMyth distribution ships with an SMP kernel. I tried to find a non-SMP kernel for the system, without success. So, the easiest way to fix the problem was just to download a recent kernel source tree from kernel.org, copy the configuration file from the Knoppix kernel, and reconfigure it as non-SMP. The spontaneous reboots stopped occurring. The package manager still believes that it knows what kernel is running on the computer, but that isn’t what is really installed.
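The rebuild amounted to a stock kernel.org build that reuses the packaged configuration. A sketch of the steps, with the kernel version as a placeholder and the config filename guessed from convention (the real names will differ):

```shell
# Fetch a recent kernel and reuse the distribution's configuration,
# then switch off SMP.  Version number and paths are illustrative.
cd /usr/src
wget https://www.kernel.org/pub/linux/kernel/v2.6/linux-2.6.24.tar.bz2
tar xjf linux-2.6.24.tar.bz2
cd linux-2.6.24

cp /boot/config-$(uname -r) .config   # start from the packaged kernel's config
make oldconfig                        # answer prompts for options new to this kernel
make menuconfig                       # turn off "Symmetric multi-processing support"

make                                  # build the kernel and modules
make modules_install install          # install modules, kernel image, and System.map
```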

When I installed the MythTV box, the software was still a bit immature, and a stability fix in the form of version 0.20 came out several months later. I waited a few weeks with no update to the distribution, and no word of when an update might become available. Eventually, I grew impatient and downloaded the source code of 0.20 myself, recompiled it on the MythTV box, and installed it over top of the existing programs.
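Installing a source build over a packaged one is the usual configure-and-make dance, pointed at the same prefix the package used. A sketch, with the tarball URL and prefix as assumptions rather than the exact commands I ran:

```shell
# Build MythTV 0.20 from source and install it over the packaged copy.
# URL and prefix are illustrative; the key point is matching --prefix
# to wherever the package manager originally put the files.
wget http://www.mythtv.org/mc/mythtv-0.20.tar.bz2
tar xjf mythtv-0.20.tar.bz2
cd mythtv-0.20
./configure --prefix=/usr
make
make install        # overwrites the package-owned binaries in place
```

After this, of course, the package database still lists the old version — one more way the box’s packages are “broken”.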

There was one other impact of the one-size-fits-all approach that caused difficulties with the MythTV box. I was regularly recording a television show between 6:00AM and 6:30AM. A few minutes before the end of the show, the recording would have problems: the audio would break up, and the video would jump. It appeared that the program was losing frames of data, either because it was losing interrupts, or because it couldn’t get the data to the disk quickly enough.

Because it happened at about the same time every day, I suspected a cron job. I got a root shell on the box, and asked for the list of all root-owned cron jobs with the command “crontab -l”. This reported that there were no root-owned cron jobs. I mistrusted this result, and did more investigation. As I mentioned in the first post, distribution packagers often break up a configuration file into a set of separate files. They did that with cron jobs, which means that the command-line tool that ought to tell you all about root-owned cron jobs didn’t report the full set of such processes. A bit of digging around in /etc showed that the slocate database update was being run at that time. This process scans the entire disk, making a list of the files on it. While probably useful in a general context, it’s an unnecessary operation on an appliance box that isn’t changing, particularly when it generates so much bus traffic that the primary function of the box is degraded.

My solution was to change the /etc/crontab file (which is, itself, not viewed by “crontab -l”) so that a cron job would be skipped if there were any users (reported by the ‘fuser’ command) of either of the two video input devices, /dev/video0 and /dev/video1.
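The resulting /etc/crontab entry might look something like this (the time and the updatedb path are illustrative; the fuser guard is the real point):

```shell
# /etc/crontab fragment (illustrative).  fuser -s exits 0 when any of
# the named files is in use, so the || skips the disk scan whenever a
# recording has either capture device open.
25 6 * * *  root  fuser -s /dev/video0 /dev/video1 || /usr/bin/updatedb
```

Note that the system crontab, unlike a per-user one, carries a user field (“root” here) between the schedule and the command.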

My hardware environment

I have two computers that I use for my work. One is an x86 laptop, a ThinkPad T42 with a built-in ATI video controller (Mobility Radeon 9600). The other is a quad-core x86_64 box with an NVidia card (GeForce 6600). My work involves a lot of scientific computation, sometimes multi-threaded, and I need hardware-accelerated 3D rendering to analyze the results. So, I’m running on two architectures, with two different video cards.

The laptop is fairly standard, so I won’t discuss it further. My big box has the following hardware:

  • Intel DP35DP motherboard
  • Intel Core2 Quad CPU, Q6600, 2.4GHz per core
  • 4 GB RAM
  • Two 160 GB SATA disks
  • One 500 GB SATA disk
  • Two 120 GB EIDE disks

I’ll discuss later why I have so many hard drives.

Because I sit next to this box all day, I’ve put a lot of effort into making it quiet. My laptop makes more noise than the big box.

Why not use a distribution and a package manager?

I have a few Linux computers, but they do not use a package manager. They’re not “redhat” computers, or “debian”, or “ubuntu”. Once, 13 years ago, they were Slackware. Briefly. I administer these boxes manually, for lack of a better word.

Maintaining a Linux computer manually is a fair amount of work. Installing new software is not always trivial, and sometimes things break in subtle ways that may take some effort to debug. I plan to start recording my adventures here, in part so that I can come back and see what I did the next time I upgrade something and it misbehaves in a familiar manner. Because I do things manually, I tend to run into problems that the majority of Linux users don’t experience. I often have to look on the web for answers to questions, so I hope my experiences can help out other people who, for whatever reason, come across one of these unusual problems.

What do I have against distributions and package managers? Nothing, really. They are very useful. I do have one computer that was installed from packages, a MythTV computer that I installed from a KnoppMyth CD. This is a good example of a place where package managers are useful. The computer is an appliance that I set up once, and then don’t ever modify. It’s not exposed to the Internet, and it isn’t going to change much. I don’t need to install new software on it, because it’s a dedicated single-purpose machine that already does what I want it to do. And yet, I’ve “broken the packages” on the box. There are files ostensibly under control of the knoppix package manager that I have replaced with recompiled binaries, and which I am maintaining myself now. I’ll talk about that in a later post.

Here are some of the things that I think are good and useful about distributions and package managers (note that there are some exceptions to these rules, but most package managers supply at least some of these benefits):

  • They supply the entire filesystem in compiled form, allowing a new computer to be set up and running in under an hour with reasonable defaults, usually after asking just a handful of questions.
  • They usually are associated with a good setup tool that can configure the software correctly for the hardware attached to your computer.
  • They have a good, general-purpose kernel with modules ready to handle many situations.
  • They keep track of dependencies to help to ensure that interdependent packages are correctly installed, so that the user doesn’t end up with an installed package that fails to work correctly.
  • They provide a single location for access to updates and security fixes. A user can simply ask the package manager to do an “update to latest packages”, and expect that they have all of the updates provided by the distribution.
  • If you have a dozen new computers to set up, possibly even on different architectures, it’s not a very big job with the correct installation media available.
  • Probably most importantly, distributions and package managers provide an easy way for people to administer their Linux computer without having to become Linux experts. The computer is a tool used to perform other activities, and a distribution lets the person work with the tool, instead of spending a lot of time maintaining the tool.

So, why don’t I use package managers? There are a few drawbacks to the use of package managers, and for me, they outweigh the benefits. Other people will have different priorities. I would never suggest to a newcomer to Linux that they should be going distribution-free. A person who maintains a large collection of computers on dissimilar hardware might also be poorly served by breaking the distributions (though I have actually done exactly that).

What don’t I like about package managers and distributions? Well, here’s a collection of drawbacks:

  • It isn’t always clear what your computer is doing. There may be packages or services installed that you don’t want, doing things you don’t understand. Somewhere in the 200 packages that were installed when you set up the computer, you may have wound up with, say, an FTP daemon you didn’t ask to have. When you’re installing software manually, you’re more likely to install only the things you really need.
  • Distributions tend to ship with older code. Distributors have to freeze their versions and do extensive testing, and by the time the packages are shipped there may have been improvements, bugfixes, or security fixes that didn’t make it into the base media.
  • Bugfixes and security fixes can be delayed as you wait for the distributor to build updated packages. While most Linux distributors get security fixes out within a small number of days, there is still some delay between the time a fix is produced and the time that updated packages are available.
  • Distributions are set up to be good for the general case, but there will be times when they do the wrong thing for a particular special use.
  • Package installers are generally forbidden from interacting with the user, otherwise a new install would be a tedious exercise in configuring every package as it came along. Consequently, packages are usually dropped in with some default configuration.
  • Many programs come with multiple compile-time configuration options. A media player may have support for multiple codecs, output devices, companion devices, and so on. A distribution will usually turn on as many of these options as possible. Some of these options might not be of interest to a specific user, but that user is still forced to install other packages holding libraries he or she doesn’t expect to use. These dependent libraries increase the interconnectedness of the packages, which can make what would be a simple upgrade of one package into a huge transaction that touches a dozen other packages and the kernel.
  • Because the package model works best when each file is owned by exactly one package, even when a file controls the behaviour of multiple packages, distributions tend, when possible, to break such files into per-package fragments collected in some other place. This can make it hard to figure out exactly what a specific application is doing.
  • Distributions and package managers don’t insulate the user in all cases. Some users with unusual requirements may still end up having to install software by hand and figure out how to tie the new software into the system correctly, and sometimes the package management system makes such efforts more difficult.
  • Most importantly, for me, a package manager hides too much of what is happening. You don’t have to learn how to configure a program, you don’t know what files it’s installing, it’s a bit too much of a black box for my tastes.

Given all this, I’ve decided that I prefer not to use package managers. Consequently, I’ve been manually maintaining my Linux computers for over 13 years now.