RightScale Blog

Cloud Management Blog

Recompiling Kernel Modules for EC2 Instances

Some time ago, I discovered that the kernel version Amazon uses for its current infrastructure (Linux 2.6.16) contains a bug in the LVM modules. This was a bummer to see, since we use LVM snapshots to get sub-second database backups. The bug was only triggered under specific load conditions, but when we’re talking database backups nobody likes getting kernel panics at the most inappropriate times. The interesting thing is that this particular bug has been fixed for a while now in newer kernel versions, but we (i.e., all EC2 users) cannot benefit from these kernel sub-release patches since we depend on the kernel version that Amazon installs in all instances.

After some research it became clear that in order to successfully use our fast snapshotting facilities in EC2 (or, for that matter, for anybody to use LVM-related tools on EC2), patching the lvm kernel modules became a requirement.

The first thing to do was to find out the exact version of the kernel that Amazon is installing, the sub-release version of the kernel that contains the required patch, and the version of Xen that Amazon uses to patch their kernel. At the time of writing, Amazon’s kernel is based on a vanilla 2.6.16 kernel, patched with an unknown version of Xen (at least I couldn’t really find what version it was or perhaps it’s customized by Amazon). It turns out that the fix for the LVM bug I was triggering was applied at 2.6.16.12. Therefore the task was to recompile the kernel modules for a Xen-patched kernel of version >= 2.6.16.12. There doesn’t seem to be much information at all out there on how to do it, so at first I feared this might be an ugly or esoteric process, but fortunately it turns out to be quite simple!

The next paragraphs describe the rationale and steps on how to recompile kernel modules that are ready to be used for EC2 instances.

Preparing the Sources and Compiler Setup

The first thing to know is that kernel modules must be compiled with the same gcc version as the kernel they will run on. Since Amazon originally compiled the kernel, we need to determine which gcc version it used. Luckily, that is a simple task since this information is saved in the compiled modules. We can find out by issuing the following command on an unmodified running EC2 instance:

[root@ src]# modinfo dm_mod
filename:       /lib/modules/2.6.16-xenU/kernel/drivers/md/dm-mod.ko
license:        GPL
author:         Joe Thornber < dm-devel@redhat.com >
description:    device-mapper driver
depends:
vermagic:       2.6.16-xenU SMP 686 gcc-4.0
parm:           major:The major number of the device mapper (uint)

It turns out that the kernel (and modules) were compiled with gcc-4.0 for the 686 architecture. Now, we must bring up an instance that has that version of gcc installed. In my case, I think I booted Amazon’s developer image (ami-26b6534f) but you can pick any other that comes with gcc 4.0.
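
The same check can be scripted. The following is a minimal sketch, not something from the original setup: the helper names `vermagic_gcc` and `major_minor` are made up for illustration, and it assumes `modinfo -F vermagic` and `gcc -dumpversion` are available on the build instance. It compares the compiler recorded in a module's vermagic string with the local gcc.

```shell
#!/bin/bash
# Extract the gcc version from a vermagic string such as
# "2.6.16-xenU SMP 686 gcc-4.0" (prints nothing if no gcc- tag is present).
vermagic_gcc() {
    printf '%s\n' "$1" | sed -n 's/.*gcc-\([0-9][0-9.]*\).*/\1/p'
}

# Reduce a full compiler version such as "4.0.2" to "4.0".
major_minor() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Only query the system when both tools are present.
if command -v modinfo >/dev/null 2>&1 && command -v gcc >/dev/null 2>&1; then
    wanted=$(vermagic_gcc "$(modinfo -F vermagic dm_mod 2>/dev/null)")
    have=$(major_minor "$(gcc -dumpversion)")
    if [ -n "$wanted" ] && [ "$wanted" = "$have" ]; then
        echo "gcc $have matches the modules (gcc-$wanted)"
    else
        echo "mismatch: modules want gcc-${wanted:-unknown}, local gcc is $have"
    fi
fi
```

Running this before compiling should catch a wrong compiler early, rather than at module load time.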

Once the instance with the right compiler is up, we need to copy the kernel sources and patches. The Amazon kernel sources (patched with Xen) can be found at http://s3.amazonaws.com/ec2-downloads/linux-2.6.16-ec2.tgz and patches for a given sub-release version can be found at http://www.kernel.org/pub/linux/kernel/v2.6/.

Our latest CentOS RightImages already provide an untarred copy of the Amazon kernel sources in /usr/src/linux-2.6.16-xenU, so there’s no need to download it. The only thing I needed to download was the latest existing Linux patch for this series, which happened to be 2.6.16.53. Once we have these files on the EC2 instance, we are ready to configure and patch the kernel, and then recompile the modules.

Configuring, Patching, and Compiling the New Kernel and Modules

To configure the kernel, we can use the built-in config facility of the running Amazon kernel. For that, simply uncompress the original Amazon sources and construct the “.config” file from the instance’s /proc filesystem. For example:

cd /usr/src/linux-2.6.16-xenU/
gunzip < /proc/config.gz > .config
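
Since only loadable modules can be replaced, it may be worth confirming that the driver you care about is configured as a module in the extracted file. This is a small sketch under assumptions: `config_value` is a hypothetical helper, and `CONFIG_BLK_DEV_DM` is the standard kernel option for the device mapper (the post itself does not name it).

```shell
#!/bin/bash
# Print the value (y, m, ...) of one CONFIG_ symbol from a kernel .config,
# or nothing if the symbol is not set.
config_value() {
    sed -n "s/^$2=\(.*\)\$/\1/p" "$1"
}

# The source directory is the one used in the post.
cfg=/usr/src/linux-2.6.16-xenU/.config
if [ -f "$cfg" ]; then
    case "$(config_value "$cfg" CONFIG_BLK_DEV_DM)" in
        m) echo "device mapper is a loadable module; a rebuilt module can replace it" ;;
        y) echo "device mapper is built into the kernel; new modules will not help" ;;
        *) echo "device mapper is not enabled in this config" ;;
    esac
fi
```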

Then apply the latest kernel patch on top of that. The tricky part here is that we’ll be applying a patch prepared for the vanilla kernel on top of a Xen-modified version, so it will likely produce conflicts when applied as is. While the patch I applied didn’t result in any conflict I couldn’t easily resolve, this might not always be the case. If you know what you are doing and the extent of the code you want to fix (or upgrade), you should just patch the affected files (usually only modules) and forget about any core kernel fixes. Remember that any kernel upgrades/fixes outside a loadable module won’t be visible anyway, since Amazon will always replace the kernel of an instance before booting. For example, applying a complete patch to the Amazon kernel will look something like:

bzip2 -d /tmp/patch-2.6.16.53.bz2
cd /usr/src/linux-2.6.16-xenU/
patch  -p1 < /tmp/patch-2.6.16.53
find . -name '*.rej'
./arch/x86_64/ia32/Makefile.rej
./arch/i386/kernel/vm86.c.rej
./net/core/skbuff.c.rej
./Makefile.rej
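
To judge whether patching only the affected files is practical, it helps to see which files the patch touches in the area you care about. The sketch below is an illustration only: `patch_targets` is a hypothetical helper, and `drivers/md/` is assumed to be the directory of interest because that is where dm-mod.ko lives. It lists the files named in the patch's `+++` headers after dropping the first path component, which is what `patch -p1` does.

```shell
#!/bin/bash
# List the files a unified diff would modify, with the first path component
# removed (as patch -p1 does), keeping only paths that start with a prefix.
# Heuristic: it trusts every line beginning with "+++ " to be a file header.
patch_targets() {
    awk -v prefix="$2" '
        /^\+\+\+ / {
            path = $2
            sub("^[^/]*/", "", path)
            if (substr(path, 1, length(prefix)) == prefix) print path
        }' "$1"
}

# /tmp/patch-2.6.16.53 is the uncompressed patch from the post.
if [ -f /tmp/patch-2.6.16.53 ]; then
    echo "Files under drivers/md/ touched by the patch:"
    patch_targets /tmp/patch-2.6.16.53 drivers/md/
fi
```

GNU patch also accepts `--dry-run`, which reports the hunks that would fail without modifying the source tree.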

Once you’ve resolved the conflicts we’re ready to compile and install:

make 
make modules_install

If something broke, go back to the conflicts and fix whatever is broken. Once it all compiles, you should have the brand new modules installed in the “/lib/modules/2.6.16-xenU” directory! At that point you can take them for a spin and see if they can be loaded correctly. In the case of lvm, we can try to unload the existing ‘dm’ modules first (if any was loaded) and then load our new ones. If they load correctly we’ll have brand new, bug-fixed kernel modules at work for us.
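
If a module refuses to load, the kernel log usually says why. As a rough sketch (the helper name `module_load_errors` is made up here), the filter below looks for the two failure messages that readers report in the comments on this post: a compiler mismatch ("version magic") and a symbol version mismatch ("disagrees about version of symbol").

```shell
#!/bin/bash
# Filter kernel log text on stdin for module version problems:
# a vermagic mismatch or a symbol version (CRC) mismatch.
module_load_errors() {
    grep -E "version magic|disagrees about version of symbol"
}

# Typical use after trying the new modules (needs root on the instance):
#   modprobe dm_mod; dmesg | module_load_errors
if command -v dmesg >/dev/null 2>&1; then
    dmesg 2>/dev/null | module_load_errors || echo "no module version errors logged"
fi
```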

Packaging the New Modules to Use for Any Future EC2 Instance

The next step is to package these newly compiled modules properly so we can use them in any EC2 instance we wish. In my case, I used our RightScript infrastructure, which allowed me to upgrade any of the templates that use LVM tools within minutes. All I had to do was package the kernel modules in a .tgz file and attach it to a new boot RightScript. This boot RightScript installs (i.e., replaces) the modules upon boot, removes any pre-loaded ‘dm’ modules, and loads up the newly installed ones. Here is the complete script:

#!/bin/bash -e
# Copyright (c) 2007 by RightScale Inc., all rights reserved

# First upgrade the kernel modules with some lvm fixes
# Try to unload the dm modules if any are loaded (hopefully none will be in use)
echo "Unloading DM modules:"
for m in $(grep '^dm_' /proc/modules | cut -d' ' -f1); do echo "$m"; modprobe -q -r "$m"; done

echo "Installing new/custom kernel modules..."
(cd /lib/modules/ && tar xzf "$ATTACH_DIR/modules-2.6.16.53-xenU.tgz")
echo "Loading the device mapper driver..."
modprobe dm_mod

If you are not familiar with our scripts, any attachment uploaded to the web site is automatically sent to the booting instance, and the ATTACH_DIR environment variable is set to that temporary directory so that the RightScripts can locate it. In this case, only a single tgz file containing the modules was attached. Now that I have this RightScript (I called it “upgrade LVM kernel modules”) I can seamlessly patch all my server templates by adding it to the list of boot scripts. Voila! Without any other changes, I ensured that the next time any of these templates is instantiated it will use the latest kernel modules and all the nice enhancements and bug fixes that come with them. My database backups are a lot happier now without kernel panics!
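
The post does not show how the tarball itself was built. One plausible way, sketched here under assumptions (the `package_modules` helper is hypothetical; the archive name and paths are the ones used above), is to archive the version directory relative to /lib/modules so that the boot script's `tar xzf` from /lib/modules/ puts everything back in place.

```shell
#!/bin/bash
# Pack one kernel version's module tree so that extracting the archive from
# /lib/modules (as the boot script does) recreates <version>/...
# Arguments: parent directory, kernel version directory name, output file.
package_modules() {
    tar czf "$3" -C "$1" "$2"
}

# Names taken from the post; run on the build instance after modules_install.
if [ -d /lib/modules/2.6.16-xenU ]; then
    package_modules /lib/modules 2.6.16-xenU /tmp/modules-2.6.16.53-xenU.tgz
    tar tzf /tmp/modules-2.6.16.53-xenU.tgz | head -n 5
fi
```

Archiving relative to the parent directory keeps absolute paths out of the tarball, so the same file can be unpacked on any instance.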


Archived Comments

keving

Firstly, thanks for collecting all this and writing it all down. However, I can’t find gcc 4.0 on ami-9a9e7bf3:

[root@domU-12-31-36-00-29-81:] gcc --version
gcc (GCC) 4.1.1 20070105 (Red Hat 4.1.1-52)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
[root@domU-12-31-36-00-29-81:] gcc<tab>
gcc gcc34 gccmakedep
[root@domU-12-31-36-00-29-81:~] gcc

Also, shouldn’t ‘gunzip < /proc/config.gz .config’ be ‘gunzip < /proc/config.gz > .config’? Thanks for your help. -k
blanquer
Kevin, I’ve corrected the entry. You’re right, the image I had listed comes with 4.1 and not 4.0. I guess my memory was wrong and I booted another of our RightImages. But anyway, I’ve updated the entry to point to the basic Amazon developer image, which does come with 4.0. Also, the redirection for the .config file was a formatting typo. I corrected it. Thanks for the feedback! Josep
sevmax
I have trouble compiling the new kernel. When I run “make” it gives me this error:

  CC      arch/i386/kernel/cpu/amd.o
arch/i386/kernel/cpu/amd.c: In function ‘init_amd’:
arch/i386/kernel/cpu/amd.c:211: error: ‘X86_FEATURE_FXSAVE_LEAK’ undeclared (first use in this function)
arch/i386/kernel/cpu/amd.c:211: error: (Each undeclared identifier is reported only once
arch/i386/kernel/cpu/amd.c:211: error: for each function it appears in.)
make[2]: *** [arch/i386/kernel/cpu/amd.o] Error 1
make[1]: *** [arch/i386/kernel/cpu] Error 2
make: *** [arch/i386/kernel] Error 2

How can I correct this error? Thanx.
blanquer
Sevmax, if you want to recompile the whole patched kernel you’ll need to resolve the conflicts manually. In this particular case, ‘X86_FEATURE_FXSAVE_LEAK’ is a new processor feature that doesn’t seem to be compatible with the version of Xen (i.e., this feature doesn’t exist in the Xen patch that is applied to the kernel). To fix it, just comment out or remove the lines that set the bit. In my patched version these are lines 210-211 of ‘arch/i386/kernel/cpu/amd.c’. They look like:

        if (c->x86 >= 6)
                set_bit(X86_FEATURE_FXSAVE_LEAK, c->x86_capability);

Good luck, Josep M.
sevmax
blanquer, thanx for the answer. But when I deleted these lines, I got new errors:

arch/i386/kernel/vm86.c: In function ‘do_sys_vm86’:
arch/i386/kernel/vm86.c:318: error: ‘eax’ undeclared (first use in this function)
arch/i386/kernel/vm86.c:318: error: (Each undeclared identifier is reported only once
arch/i386/kernel/vm86.c:318: error: for each function it appears in.)
arch/i386/kernel/vm86.c:318: error: invalid lvalue in asm output 0
make[1]: *** [arch/i386/kernel/vm86.o] Error 1
make: *** [arch/i386/kernel] Error 2

When I looked at the file vm86.c, I saw that the function do_sys_vm86() is used several times, i.e., this function cannot be removed. Thanks for your attention. sevmax

I can’t compile the kernel on my system with this manual. On your CentOS 5 and Amazon Developer AMIs I have trouble with arch/i386/kernel/vm86.c. If you have succeeded in compiling the kernel, please help me compile it.
Thanks for your help, sevmax
sevmax
Without patching the kernel I have installed the modules, and the necessary modules load. But after a reboot the modules were not loaded. Maybe I can run a script like the “upgrade LVM kernel modules” script to load nbd correctly? =)
blanquer
OK, I thought that the conflict resolution of the patch was almost trivial, but since there are questions about it I’ll detail the necessary changes, so it is on the record. Here’s the list of changes to resolve the conflicts and have a successful compilation (this is based only on the versions I mention on the blog):

1) For “./arch/x86_64/ia32/Makefile”: add (-Wa,-32) to the FLAGS in lines 31 and 32.
2) For “./arch/i386/kernel/vm86.c”: add a “long eax;” line after the “#endif” in line 261.
3) “./net/core/skbuff.c” is a conflict but it doesn’t need to be modified (the Xen patch already fixed/changed it).
4) “./Makefile” (I believe this was only the version name or something trivial like that).
5) “./arch/i386/kernel/cpu/amd.c”: there is a new processor feature that is not compatible with the code patched by Xen. Comment out lines 210 and 211: if (c->x86 >= 6) set_bit(X86_FEATURE_FXSAVE_LEAK, c->x86_capability);

That’s all it really takes (or took) for the 2.6.16.53 patch. A couple more things, sevmax:

1- The new modules will not be loaded upon booting a new instance until you copy them and load them yourself (which might require unloading the old ones first).
2- If you haven’t been able to resolve these small compilation problems by yourself, I would strongly suggest reconsidering whether you really need to recompile kernel modules yourself. Although it looks like any other “config->make->install” type of operation, it is actually a little more serious than that. If this is just a learning process for you, then by all means hack away and experiment! Let us know if you’ve successfully completed the process. Good luck! Josep M.
sevmax
Hello Josep M. aka blanquer! Thank you very much for the help. I have compiled and installed the kernel modules on Fedora Core 6. But now I have another “small” trouble:

# modprobe nbd
FATAL: Error inserting nbd (/lib/modules/2.6.16-xenU/kernel/drivers/block/nbd.ko): Invalid module format

In the instance log: “nbd: version magic ‘2.6.16-xenU SMP 686 gcc-4.1’ should be ‘2.6.16-xenU SMP 686 gcc-4.0’”

So I must replace gcc 4.1 with gcc 4.0. I can’t find how to replace gcc via yum. Must I remove gcc 4.1 via yum and install gcc 4.0 from sources? Thanks!
joka
Are you still able to use xfs or reiserfs when using the patched modules? I get the following errors:

reiserfs: disagrees about version of symbol is_bad_inode
reiserfs: Unknown symbol is_bad_inode
reiserfs: disagrees about version of symbol make_bad_inode
reiserfs: Unknown symbol make_bad_inode

or

exportfs: disagrees about version of symbol is_bad_inode
exportfs: Unknown symbol is_bad_inode
xfs: Unknown symbol find_exported_dentry
xfs: disagrees about version of symbol is_bad_inode
xfs: Unknown symbol is_bad_inode
xfs: disagrees about version of symbol make_bad_inode
xfs: Unknown symbol make_bad_inode

I’m hoping I just messed something up along the way.

 
