Tag Archives: hardware

Unreachable server

While I was away on vacation, the server hosting this blog became unresponsive, twice.

The first failure was a kernel oops shortly after mounting my backup disc:

kernel BUG at mm/slab.c:3109!

invalid opcode: 0000 [#1] PREEMPT SMP

It was several days before somebody with a key could get to the computer and restart it.  A week later, shortly before I returned from vacation, the machine became unreachable again.  This time, when I came into the house, I could hear a continuous audible alarm from the UPS.  I reset that, and the machine came up and worked normally.

So, two apparently unrelated problems knocked the machine offline when I wasn’t around to handle them.  I checked the SMART logs for the backup disc and found no sign of a hardware issue, and the UPS logs are entirely empty of anything incriminating.

If I figure out the causes of either of these failures, I’ll update this post.

Update #1: 2014-06-20

The UPS triggered its alarm again this morning, and the server lost power.  The UPS is an APC Back-UPS ES 750, and a continuous tone indicates that the battery is missing or has failed.  I’ve replaced the battery, and hope that the issue does not repeat.

Curious SMART failure

My main computer has 5 physical hard drives installed.  There’s my primary drive, and the live backup that is synchronized to it every night.  There are two large drives concatenated to make a backup volume for the MythTV box.  And there’s a separate drive that holds work-related code and files.

I have had many hard drive failures over the years; fortunately, my backup routine has prevented me from losing anything important.

In light of the number of hard drives in the machine, I recently decided to start using SMART monitoring software on the main drive, its backup, and the work drive.  A daemon running on Linux periodically performs short and long tests of the hard drives.  Every Sunday, a report is generated and emailed to me to notify me of any developing issues with the hard drives.  The assumption is that, before outright failure, hard drives are likely to show degradation that manifests in these tests, allowing the user to prepare for the imminent loss of the drive.

In my latest Sunday report, all three drives reported normal conditions.  No signs of impending failure were noted.  Later that afternoon, the SMART daemon started issuing non-routine email messages indicating that it was unable to perform SMART queries on two drives, the work drive and my main backup drive.

It’s unreasonable to expect that two drives failed without warning, simultaneously, mere hours after getting a clean bill of health from SMART, so something unusual must have been happening.  Both the work drive and the backup drive were delivering SATA errors on any activity, they couldn’t be mounted, and files could not be read from them.  The emails indicated that the work drive failed first, and the primary backup drive about thirty minutes later.

My first thought was that the SATA bus was confused.  Maybe a BIOS error, a kernel bug, the infamous “cosmic ray”, or something else.  With the SATA bus unreliable, all activities, including SMART lookups, could fail.  So the first thing to try was a power cycling of the box.  After that, both drives were again mountable, and files could be read.

Less than an hour later, the SMART daemon started sending its messages again.  Work drive failure, followed half an hour later by backup drive failure.  The new theory was that the work drive was failing in a curious way.  When it received SMART queries, issued periodically by the daemon, the drive responded in such a way as to confuse the SATA bus or the kernel module responsible for handling it.

So, I bought a new hard drive to replace the failing work drive.  I modified my startup system to avoid activating the SMART daemon while I ran a new backup.  The work drive isn’t fully backed up every night, as many of the files are mirrored on other work computers, but just to be sure, I did a full backup of that drive.  This turned out to be straightforward: without the SMART daemon issuing queries, the drive and the SATA bus remained stable throughout the process.  Once the backup was completed, I powered down again and replaced the drive.  I then set up the encryption on the new drive, formatted it, and recovered from the backup.

I’ve reactivated the SMART daemon, and the system has remained stable.  This leads me to believe that I had an unexpected SMART failure, where the only visible problem with the drive was that SMART queries messed up the state, leading to the inability to use not only that drive, but another unrelated drive in the computer.

Converting DVDs for viewing on a tablet, while inlining captions

Previously, I described how to convert HDTV videos for my EEE Pad Transformer.  Now, I’ll go over something a bit more difficult.

My wife and I have some DVDs of Bollywood films that we enjoy watching.  Aaja Nachle, Om Shanti Om, 3 Idiots, Billu, among others.  These films are mostly in Hindi, but there are English subtitles available.  As we don’t understand Hindi, we watch the movies with the subtitles.  The Android media viewer that comes with the tablet doesn’t have a way to select subtitles from an alternate video stream.

Now, I wanted to make files of these movies that I could watch on the Android tablet.  As noted in the previous article, the resulting files have to be H.264 Baseline profile, and under 2GB in size.

Here’s how I did this.  Note that this procedure required no less than 70 GB of free disk space to hold a large intermediate file, as I wanted to avoid artefacts introduced by running through multiple codecs, so I used a lossless intermediate state.

First of all, I used the MythTV option to rip a perfect copy of the DVD.  That gave me a file, say 3IDIOTS.vob.

Next, I used mencoder to inline the captions directly into the video stream:

mencoder -ovc lavc -lavcopts vcodec=ljpeg:aspect=16/9 \
    -vobsubid 0 -oac lavc -lavcopts acodec=flac \
    -o 3idiots 3IDIOTS.vob

The output file, 3idiots, was, as noted, huge.  It consisted of a lossless jpeg video stream, with the subtitle 0 track overlaid on the video stream itself.

Next, the file had to be converted to H.264 Baseline.  In this case, I decided, rather than setting a qmax, that I would set a bitrate.  That way I could be certain ahead of time what the final size of the file would be, though at the cost of increased transcoding time.  To get a fixed bitrate, it is necessary to run ffmpeg in two passes, once to collect statistics, and the second time to generate the file itself.  Here’s how this is run:

ffmpeg -pass 1 -i 3idiots -vcodec libx264 -vpre fast \
    -vpre baseline -b 1400k -acodec libfaac -ab 64k \
    -ac 2 -ar 44100 -threads 3 \
    -deinterlace -y junkfile.mp4
ffmpeg -pass 2 -i 3idiots -vcodec libx264 -vpre fast \
    -vpre baseline -b 1400k -acodec libfaac -ab 64k \
    -ac 2 -ar 44100 -threads 3 \
    -deinterlace 3idiots.mp4

The “junkfile.mp4” file can be deleted.  The H.264 file, 3idiots.mp4, came in at 1.8 GB, and was of quite acceptable quality to view on the tablet.
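The relationship between bitrate, running time, and file size can be checked with a little arithmetic.  The following is only a rough sketch: it ignores container overhead, the 165-minute running time is a hypothetical figure chosen for illustration, and it assumes that "2 GB" means 2×10^9 bytes.  The function names are my own and are not part of ffmpeg.

```python
def estimated_size_bytes(duration_s, video_kbps, audio_kbps):
    """Approximate output size in bytes, ignoring container overhead."""
    total_bits = (video_kbps + audio_kbps) * 1000 * duration_s
    return total_bits / 8


def max_video_kbps(duration_s, size_limit_bytes, audio_kbps):
    """Largest video bitrate (kbit/s) that keeps the file under the size limit."""
    total_kbps = size_limit_bytes * 8 / duration_s / 1000
    return total_kbps - audio_kbps


# Hypothetical 165-minute film, with the 1400k video and 64k audio bitrates
# used in the ffmpeg commands.
duration = 165 * 60
size = estimated_size_bytes(duration, video_kbps=1400, audio_kbps=64)
ceiling = max_video_kbps(duration, size_limit_bytes=2e9, audio_kbps=64)

print(f"estimated size: {size / 1e9:.2f} GB")
print(f"video bitrate ceiling for a 2 GB file: {ceiling:.0f} kbit/s")
```

For a film of that length the estimate lands close to the 1.8 GB observed, which suggests that 1400 kbit/s leaves a modest margin under the size limit.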

Converting HDTV videos for viewing on a tablet

I have an Android-based tablet computer, the EEE Pad Transformer.  My MythTV computer can record digital over-the-air broadcasts in high definition now that I have put an HDHomerun on my network.  So, it would be nice to be able to transfer some HDTV programs to the Android computer to watch them there while traveling.  The HDTV shows are 1080i, encoded as mpeg2 video, at a bitrate of close to 16000 kbits/sec.

So, what are our constraints?  The Android computer is not powerful enough to play videos without hardware assist, and that hardware assist is only available when viewing H.264 videos encoded with the baseline profile.  It doesn’t work on main profile H.264 videos.  Also, the Micro-SD card that I plug into the tablet must be formatted as VFAT; it isn’t recognized when I reformat it to any of the more modern Linux filesystems, so our files are going to have to be under 2GB in size.  Finally, the Android screen is only 1280×800, so there’s no point copying a 1920×1080 file there.  The machine would have to reduce the resolution anyway, so we might as well do it before we copy the file to the card.
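To see what resolution makes sense for the tablet, here is a small sketch that scales a frame to fit inside the screen while preserving its aspect ratio.  It assumes the usual 1920×1080 frame for a 1080i broadcast and treats pixels as square; the function name is my own invention.

```python
def fit_within(src_w, src_h, max_w, max_h):
    """Scale (src_w, src_h) down to fit inside (max_w, max_h), keeping the
    aspect ratio.  Never scales up.  Results are rounded down to even
    numbers, since 4:2:0 video needs even dimensions."""
    if src_w <= max_w and src_h <= max_h:
        w, h = src_w, src_h
    elif src_h * max_w <= max_h * src_w:
        # Width is the limiting dimension.
        w, h = max_w, src_h * max_w // src_w
    else:
        # Height is the limiting dimension.
        w, h = src_w * max_h // src_h, max_h
    return w - w % 2, h - h % 2


print(fit_within(1920, 1080, 1280, 800))  # (1280, 720)
```

Under those assumptions the answer is 1280×720, which is the size ffmpeg calls "hd720".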

So, a 1 hour show, recorded on the MythTV box, is about 8 GB and in the wrong format.  We convert it in two steps.  First, cut out any commercials and transcode it at high quality.  For network broadcast television that chops off about 25% of the file size, and you probably didn’t want to watch the commercials while sitting on the train/airplane anyway.

Next, it has to be transcoded to H.264 Baseline.  This can be done with ffmpeg:

ffmpeg -i PROGRAM.mpg -vcodec libx264 -vpre fast \
-vpre baseline -s hd720 -qmax 30 -acodec libfaac \
-ab 128k -ac 2 -threads 4 -ar 44100 -deinterlace \
PROGRAM.mp4

This takes the HDTV .mpg file from mythtv, “PROGRAM.mpg”, and converts it.  We use the libx264 video codec, fast settings, baseline profile, formatted for a high definition 720 line screen.  “qmax” caps the quantizer, which sets a limit on quality loss; I usually use a value between 25 and 30.  We use the FAAC audio codec at 128kbits/sec, deinterlace the result, and write it to “PROGRAM.mp4”.

The resulting file, about 45 minutes of air time, is about 600 MB in size.
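As a sanity check on those numbers, the average bitrate of a file follows directly from its size and running time.  This sketch uses the approximate figures quoted in this article (8 GB for an hour of recording, 600 MB for 45 minutes of output) and assumes decimal gigabytes and megabytes; the function name is my own.

```python
def average_kbps(size_bytes, duration_s):
    """Average total bitrate in kbit/s for a file of the given size and length."""
    return size_bytes * 8 / duration_s / 1000


recording = average_kbps(8e9, 60 * 60)      # one hour as recorded by MythTV
transcoded = average_kbps(600e6, 45 * 60)   # 45 minutes after conversion

print(f"recording:  {recording:.0f} kbit/s")
print(f"transcoded: {transcoded:.0f} kbit/s")
print(f"reduction:  {recording / transcoded:.1f}x")
```

The transcoded file works out to a little under 1800 kbit/s for video and audio combined, roughly a tenth of the broadcast rate.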

A Followup On Cryptographic Mounts, The Bad News

Previously, I discussed cryptographic mounts to hold sensitive data. It’s worth pointing out an article that is making the rounds today by 9 authors from Princeton, in which the researchers describe an attack on cryptographic techniques, including the one I’ve described.

The technique relies on the fact that modern memory can retain its information for several minutes after the computer stops sending it refresh signals. What this means is that a person with physical access to the computer can pull the power connector from the computer and then remove the memory chips, insert them in another computer, and read the cryptographic keys out of the memory. I don’t know of a good way to avoid this attack. If the cryptographic volumes are mounted when the computer falls into the hands of the attacker, the data will be, in theory, recoverable.

So, what can be done to prevent the key from being resident in the computer’s memory at the instant that the attacker unplugs it? The key has to be available to the operating system so that it can read and write that data in normal operation. Sure, you could get specially modified hardware that deliberately overwrites the main memory from batteries when the power connector is removed, but maybe there’s a way to store 128 bits somewhere other than in main memory?

A cache line on a modern CPU is 64 bytes, big enough to hold four 128-bit keys. Could the operating system subvert the hardware’s L1 caching mechanism sufficiently to pin a value in the cache and remove it from L2 and main memory? This attack won’t recover data from the L1 cache, so if that’s the only place the key is kept, maybe that would be enough. You sacrifice a cache line, but maybe it’s worth it?

How about the TLB? That’s another part of the CPU that holds data, and that one is explicitly designed to interact with the operating system. Could we find a way to store 128 bits in parts of the TLB, and then deliberately avoid overwriting them? Can the operating system read those numbers back out of the TLB?

Are there any registers that could be used? Probably not on 32-bit systems, where there aren’t many registers. On 64-bit systems you’d probably have to use a special-purpose compiler to avoid these registers being touched by a context switch, and to avoid them being saved to memory when an interrupt handler runs.

What if you have fifteen keys, all of 128 bits? Well, I believe we could handle that if we had 256 bits of volatile storage space. The first 128 bits of volatile space hold an XOR key that decodes all of the fifteen keys. The second 128 bits of volatile space hold the decoded key in active use.
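The bookkeeping for that scheme might look like the sketch below.  It only illustrates the XOR arithmetic: Python has no way to keep a value out of main memory, so the "volatile" slots here are ordinary variables, and all of the names are hypothetical.

```python
import secrets

KEY_BYTES = 16  # 128 bits


def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def mask_keys(keys, mask):
    """Produce the masked copies that would be allowed to live in main memory."""
    return [xor_bytes(key, mask) for key in keys]


def unmask_key(masked_keys, index, mask):
    """Recover one working key; the result belongs in the second volatile slot."""
    return xor_bytes(masked_keys[index], mask)


# First 128 bits of volatile storage: the XOR key.
mask = secrets.token_bytes(KEY_BYTES)

# Fifteen volume keys, stored in main memory only in masked form.
keys = [secrets.token_bytes(KEY_BYTES) for _ in range(15)]
masked = mask_keys(keys, mask)

# Second 128 bits of volatile storage: the key currently in use.
active = unmask_key(masked, 3, mask)
print(active == keys[3])
```

One caveat with a single shared XOR key: XORing any two masked keys gives the XOR of the two real keys, so an attacker who somehow learns one real key can recover all of the others from the masked copies in memory.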

Those are my thoughts, anyway.