Kayako V3 to V4 Importing

Kayako has a new version of their ticket system available. We have a customer that uses this system extensively. The database format for the new version of Kayako isn’t compatible with the older database schema, so you have to run an import script that processes the old database information and imports it into the new database for V4.

The problems begin when you start to deal with large installations of Kayako. This one customer has over 4.2 million rows of information in their current Kayako installation (based on what the import script reports). Kayako’s import script is said to have issues with memory leaks and memory utilization on large imports, so they have included a command line option to help work around this.

./index.php /Base/Import/Version3/<limit>
Here <limit> is the number of data loops to be performed on each run. Overall, not a bad approach to work around some limitations. I personally would have gone a different route and spent some serious time on memory optimization and cleanup in the import scripting, but that is just me.

My problem with this process is how the import script actually works. Each time you run the import command the system prompts you for the database information of the previous installation.

====================
Version3 Import
====================
Database Host: localhost
Database Name: database
Database Port (enter for default port):
Database Socket (enter for default socket):
Database Username: username
Database Password: password

To my knowledge, there is currently no way to pass this information via the command line or a settings file. It must be manually entered each and every time you run the import script. You can imagine that for 4.2 million records, this gets rather tiresome after the second or third run.

So… I wrote a script that uses expect to get around it:

#!/usr/bin/expect

set dbhost "localhost"
set dbuser "user"
set dbpass "password"
set dbname "database"

spawn /path-to-the-new-kayako-install/console/index.php /Base/Import/Version3/50

expect "Database Host:"
send "$dbhost\r"

expect "Database Name:"
send "$dbname\r"

expect "Database Port (enter for default port):"
send "\r"

expect "Database Socket (enter for default socket):"
send "\r"

expect "Database Username:"
send "$dbuser\r"

expect "Database Password:"
send "$dbpass\r"

interact

This passes the values to the spawned script as they are requested, and then lets the import run until it is complete. The interact line was the part I needed some trial and error to find.
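To use it, save the script somewhere (I’ll call it kayako-import.exp here; the name is just an example), make it executable, and run it in place of calling the importer by hand, rerunning it as many times as the import needs:

chmod +x kayako-import.exp
./kayako-import.exp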

So there you go… use that to work around the dumb requirement of having to type that information in every time.

Anti Anti Debugging Tricks

Housekeeping: Well… It’s been a while since I have posted. I had been meaning to post some things, but Google turned off their FTP service in Blogger, making their service really useless to me, and so this blog languished while I finally got around to sorting things out and moving it to proper blog software. I hope to be providing more updates now, but I won’t fool myself too much.

On with the show….

I recently came across a binary application on a Linux server that had been coded to prevent people from snooping into what it actually did. If you ran the application under strace or gdb it would detect this and stop running. It would throw an error similar to:

“Debugging detected… goodbye!”

and the application would simply exit. Now, I understand a programmer’s desire to protect their code; however, if you are going to be running an application on my server, I should at least know what it is actually doing. Apparently this is known as “anti debugging” and is designed to prevent reverse engineering of an application. Not being one to turn down a challenge… I accepted. Below I will outline some very simple processes that can be used to circumvent some of the more basic checks.

There are apparently a few different methodologies involved in anti debugging, and it seems that our application used a couple of checks. I will show you some of what the application was doing and the very simple workaround for each.

When you run the binary under a debugging tool such as strace or gdb, the parent process id of the application changes: instead of the shell, the parent is now the debugger. Apparently the application was checking for this by using a ptrace call to attach to its parent process id (getppid). Here is how I worked around this issue.

We override the getppid() call with our own. Create a very simple C source file that defines our replacement getppid().

#include <unistd.h>

/* Return the session id of the current process instead of the real parent pid. */
pid_t getppid( void ) { return getsid( getpid() ); }

What this does is return the session id of the current process any time getppid() is called. So now we need to compile this into a shared library that we can load before we execute the application we want to debug.

gcc -shared -fPIC -o fakegetppid.so fakegetppid.c

This compiles the C file into a shared library that we can then preload. Here is how we tell something like strace to use it instead of the normal getppid call.

strace -fxi -E LD_PRELOAD=./fakegetppid.so /path/to/the/main/application

If there are no other anti-debugging tricks in place, the application will execute as normal.

What if you have an anti-debugging trick that isn’t so simple to thwart? What if the application makes multiple calls to a system call that you need to catch a specific instance of? Well there is a way to accomplish that as well.

#define _GNU_SOURCE 1

#include <stdio.h>
#include <dlfcn.h>

/* Pointer to the real ptrace(), resolved lazily the first time we are called. */
static long (*next_ptrace)( long a, long b, long c, long d ) = NULL;

long ptrace( long a, long b, long c, long d )
{
    if( next_ptrace == NULL ) {
        next_ptrace = (long (*)( long, long, long, long )) dlsym( RTLD_NEXT, "ptrace" );
    }

    if( a == 16 ) { /* PTRACE_ATTACH */
        fprintf( stderr, "PTRACE_ATTACH called with pid %ld\n", b );
    }

    /* Hand everything through to the real ptrace(). */
    return next_ptrace( a, b, c, d );
}

This will allow you to specifically capture PTRACE_ATTACH calls and do whatever you want with them, while passing all other calls through to the real ptrace. That’s pretty darn powerful, and you build and load it exactly the same way as the getppid override above.
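For completeness, building and loading it mirrors the getppid example above; the source file name fakeptrace.c is just my own choice:

gcc -shared -fPIC -o fakeptrace.so fakeptrace.c -ldl
strace -fxi -E LD_PRELOAD=./fakeptrace.so /path/to/the/main/application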

I hope this information is useful to some people out there. I know it is a little complex, but it just goes to show that it is possible to debug an application, even if it doesn’t appear to be at first.

Good luck and happy hunting.

How To Recover Data From a Badly Corrupted Drive

Recently I had a customer’s hard drive on a dedicated server become so badly corrupted that we couldn’t access the information on it at all. Any attempt to fsck the drive or even just mount it produced weird errors and vague messages (for example, trying to mount it would just report “not found”).

We basically determined that the partition table, as well as the inode holding the root directory (/), had been corrupted. We were able to run a fsck on it initially, and were prompted to repair those entries as well as multiply-linked inodes and other errors. After letting fsck finish, the drive was completely unreadable. No partition table, nothing… We told the customer that the drive had corrupted and that they would have to restore their site from backups. Well guess what they said? "What backups?" Ugh… So…

Moving on… we installed a new drive and attached the old drive as a slave so we could attempt to recover the data for them. We installed the OS, set up the control panel, put up a “Site crashed… we are working on restoring it.” page, and moved forward.

Here is what we did and the results we had and some notes about the process along the way.

1) Do not panic! The first thing you have to remember when dealing with a corrupted drive is to stay calm. The information is still there, and with enough effort it can be recovered, provided the drive still functions at some level.

2) Make a backup image. This will help you out a great deal when things go wrong. I recommend using dd to just mirror the data to a new drive or some place with enough storage to hold the drive data.

3) Take your time. This one is hard to follow because normally the site is down and the customer is absolutely panic-stricken and calling you every 15 minutes asking about the status. Working with large sets of data takes time, so be patient and wait for the various processes to complete. They always take a while, and cutting corners here to save time will only lead to misery if you screw things up.

Okay… Now that we have the ground rules laid out, here is how we restored all of their data save one database table, and even that we eventually managed to recover; I will show you how we did that as well.

Step 1: Take the drive to another machine if you can, or a safe place to work on it. We had another machine with a large enough hard drive to hold the data on the drive and made a disk image of it.

dd if=/dev/sdb bs=1k conv=sync,noerror | gzip -c > /path/to/disk.img.gz

This will save you a potential future headache, but it takes a long time for any fair amount of data. Wait it out… it is worth it in the long run.

Step 2: Determine how badly the drive is corrupted. In our case an “fdisk -l /dev/sdb” didn’t show any partition tables, so we had to recover that first before we could start getting at the data.

The application we used to recover the partition table is TestDisk. It is written by a gentleman named Christophe Grenier, and to say that it is awesome is a complete understatement. You will be using this software for all the steps that follow; that is how useful it is.

Okay… so download TestDisk from the CGSecurity site and extract it. Make sure to grab the version for the OS you are using. We used the Linux 2.6 kernel version since the drive is attached to a 2.6 kernel machine that we are using for recovery purposes.

Step 3: Recover the partition table. When you first run TestDisk it presents you with a list of available drives. Select the drive (in our case it was /dev/sdb) and the partition table type (we used Intel/PC since that is what we have). Select “Analyse” and the software will do a quick scan of the drive looking for partitions. In our case it found the boot partition almost immediately; the other partition was not found on the first pass, so we needed to run a “Deeper Search”, which did find it.

Step 4: Once you have the partitions listed, you will want to write them to the drive if possible. This will help you in the next step. After we wrote the partition table we exited TestDisk and did step 5. If you just have a Linux-based partition instead of an LVM partition, then you can skip ahead to step 6.

This is where things get a little tricky. In our case the partition was actually an LVM physical volume partition, meaning it holds additional information for LVM volume groups and logical volumes inside it. TestDisk won’t allow you to read files directly from the LVM partition, which makes sense since there aren’t technically any files inside that. What you need to do is get your OS to import and activate the volume groups and logical volumes so that TestDisk can see them and use them.

Step 5: Import and activate your LVM volume groups and logical volumes. First you need to find your LVM settings. So run:

lvm vgdisplay

This will show you the volume groups for the drive. Then you run:

lvm vgchange -ay

This will activate the volume groups and logical volumes so that the OS can use them.
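If the old drive’s volume group doesn’t show up right away, it can help to rescan first. Roughly, the whole sequence looks like this (group and volume names will vary):

lvm pvscan          # find LVM physical volumes on the attached drives
lvm vgscan          # find the volume groups they belong to
lvm vgchange -ay    # activate every volume group and its logical volumes
ls /dev/mapper/     # the activated logical volumes should now appear here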

Step 6: Start TestDisk again, and this time you should see the logical volume or partition with your files on it. Select it and this time (if you are using LVM) you should select “None” as the partition type, since LVM doesn’t have partition information. Select Analyse and you will see the partitions hopefully.

Step 7: Move to the partition that contains your files and press the letter P. This will give you a directory listing of the files as they are found on the partition/logical volume. From here it is just a matter of navigating to the correct location, finding the files you need to recover, and pressing “c” to copy them off onto your working drive.

The customer’s drive was a mess. Basically all the directories and files were linked under “lost+found” with names like #123456789. We had to hunt and peck through the drive structure to find the information that we needed. We eventually found the mysql directory and the web space directory and copied those to the working drive (/dev/sda) on our recovery machine.

I have to say that we were pretty pleased with the results. Out of all the files we recovered (and there were thousands of images and other files) we only had one file that was damaged. Unfortunately it was a MySQL FRM file, which contains the schema information for the associated table. Without this information MySQL can’t read the data from the MYD file.

Recovering from a damaged FRM file

The FRM file doesn’t often get modified, so the likelihood of it being damaged is small, but in our case the damaged drive did corrupt this file. Here is how we recovered the data.

If you have a copy of the original schema, and I mean an exact copy with correct field lengths/types, etc. then you are in luck and the restore process is very simple.

Step 1: Make backups of your files. Copy the MYD file out of the database’s folder (/var/lib/mysql/databasename) to someplace safe. You are going to need this later.

Step 2: Delete the other files that make up the table (remove the corrupted FRM file, the MYI file, and the MYD file you just copied). For example, if the table is named foobar in database example, then you will have files named foobar.MYD, foobar.MYI, and foobar.frm in a folder named example in your MySQL data directory.

Step 3: Recreate the table schema. Generally this involves logging into the MySQL command line interface and typing in the CREATE TABLE SQL query to make the table. This will recreate the table in MySQL, but the table will be empty.
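As a purely hypothetical example (your real CREATE TABLE must match the original schema exactly, down to field types and lengths), recreating a small MyISAM table from the shell might look like this:

mysql example -e "CREATE TABLE foobar (
    id INT NOT NULL AUTO_INCREMENT,
    name VARCHAR(255),
    PRIMARY KEY (id)
) ENGINE=MyISAM;"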

Step 4: Copy your data file back to your database folder. Take the backup you made of the MYD file in step 1 and copy it over the file that is now in the database folder. Using our example again, you would copy the file foobar.MYD over the file that exists in the example folder in the MySQL data folder.

Step 5: Restart MySQL. Not really necessary, but can’t hurt either.

Step 6: You will need to repair the table to rebuild the indexes and make sure everything is working correctly.
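Sticking with the same example names, the repair can be run straight from the shell:

mysql example -e "REPAIR TABLE foobar;"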

And that is it!

Good luck, and I hope you never need this information.

Turn Your PXE Enabled Network Card Into an iSCSI HBA

Update: gPXE has been forked into a new project called iPXE. iPXE is being actively developed by the same team that worked on gPXE and has seen many new code changes, while gPXE has remained relatively static (look at the respective changelogs for confirmation). The iPXE project has the same features as before, with active bug fixes and features being added all the time. Please be sure to double check the commands referenced in this article, as they might have changed in name from gPXE to iPXE.

You can turn that PXE enabled network card into an iSCSI enabled HBA for free. Save yourself a couple of bucks on an iSCSI HBA, and boot your server/workstation diskless via iSCSI.

Here is how you turn your card into an HBA.

gPXE/iPXE is a PXE-compatible bootloader that provides some great functionality, including AoE (ATA over Ethernet), HTTP (loading boot scripts and boot images over HTTP), and the one we are most interested in, iSCSI, which allows us to boot from an iSCSI target.

We start off with a working PXE environment: a DHCP server (to provide IP and PXE settings) and a TFTP server (to provide the PXE files we need to load). Now, in order to get PXE to load the gPXE/iPXE firmware we need to do what is called “chainloading”. This means that our network card does its standard PXE boot up, and once that loads, we load the gPXE/iPXE loader and use it for the rest of the boot process.

Here is how we do that:

In our DHCPd server we need to add some specific settings that let us detect whether the DHCP request is coming from plain PXE or from gPXE.

/etc/dhcpd.conf:

allow booting;
allow bootp;
default-lease-time 600;
max-lease-time 7200;
authoritative;
option space gpxe;
option gpxe-encap-opts code 175 = encapsulate gpxe;
option gpxe.bus-id code 177 = string;
ddns-update-style ad-hoc;
subnet 192.168.2.0 netmask 255.255.255.0 {
    use-host-decl-names on;
    range 192.168.2.20 192.168.2.200;
    option subnet-mask 255.255.255.0;
    option broadcast-address 192.168.2.255;
    default-lease-time 1800;
    max-lease-time 86400;
    option domain-name-servers 192.168.1.10;
    next-server 192.168.2.1;
    if not exists gpxe.bus-id {
        filename "undionly.kpxe";
    } else {
        filename "http://192.168.2.1/default/install.gpxe";
    }
}

The important lines are:

option space gpxe;
option gpxe-encap-opts code 175 = encapsulate gpxe;
option gpxe.bus-id code 177 = string;

and

if not exists gpxe.bus-id {
    filename "undionly.kpxe";
} else {
    filename "http://192.168.2.1/default/install.gpxe";
}

This conditional statement allows us to load either the gPXE chainloader, when we are called from a standard PXE request (the if not exists gpxe.bus-id branch), or a gPXE-compatible script, when we are called from gPXE.

We are currently using this setup to handle new server OS installations, hence the install.gpxe file.

The contents of that file are rather simple.

install.gpxe:

#!gpxe
kernel http://192.168.2.1/default/centos5 askmethod
initrd http://192.168.2.1/default/centos5.img
boot

This loads the CentOS 5 PXE installation image and initrd to handle OS installation on the server.

Once the server has its OS installed, we then need to add the server’s MAC address to the DHCPd server so that it will chain load gPXE and then load the server’s root disk via iSCSI.

Here is how we accomplish that:

/etc/dhcpd.conf (added in the subnet 192.168.2.0 section above):

host server01 {
    hardware ethernet 00:xx:xx:xx:xx:xx;
    fixed-address 192.168.2.21;
    if not exists gpxe.bus-id {
        filename "undionly.kpxe";
    } else {
        filename "";
        option root-path "iscsi:192.168.2.1::::iqn.2001-04.com.server:server01.vg00.lun0";
    }
}

This, again, chainloads the gPXE/iPXE chainloader from PXE, and on the next DHCP request, from gPXE/iPXE itself, we provide the iSCSI target to load the root disk for the server. This brings up the normal GRUB screen and the system boots as normal.
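As an aside, if you would rather keep this logic in a script instead of DHCP options, gPXE/iPXE also provides a sanboot command that takes the same iSCSI root-path syntax; something along these lines (a sketch I have not run in this exact setup) should be equivalent:

#!gpxe
sanboot iscsi:192.168.2.1::::iqn.2001-04.com.server:server01.vg00.lun0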

And that is how you turn your PXE enabled network card into an iSCSI HBA.

Gotchas:
I had originally wanted to use gPXE/iSCSI to host the root drive for a Xen based Dom0, however I have discovered that the Xen hypervisor does not support this feature. I have done some searching on the internet and it seems that the problem lies with Xen’s hypervisor kernel and its inability to read the iBFT (iSCSI Boot Firmware Table). gPXE does support and populate the iBFT, however the Xen hypervisor kernel only recognizes the iBFT from certain iSCSI HBAs (listed in their HCL).

Installing CentOS 5 as a DomU with a Debian Dom0

There isn’t a whole lot of information about how to set up CentOS as a DomU under a Debian 4.0 based Dom0 while still maintaining the use of pygrub to boot the CentOS kernels. This howto will give you a general overview of the steps to take without having to use an incomplete CentOS image. This is not a copy and paste sort of howto, but rather a higher level walkthrough, plus a couple of fixes to make it all work correctly.

A couple of assumptions I am making here:

  1. You have a working Xen install already under Debian.
  2. You can edit files using vi or a comparable editor.
  3. You understand, at least at a basic level, how Xen and LVM can work together.
  4. You are comfortable compiling your own applications using make, etc.

Here is what you need to do to get started:

Step 1:

Download the kernel image and ram disk for CentOS and put them some place you can access them on the Dom0.

In my case, I put them in /usr/local/src/xen/ (vmlinuz and initrd.img respectively). I downloaded these files from a CentOS mirror. The files you are after are located in the centos/5.1/os/i386/images/xen/ directory, as these have the Xen code compiled into the kernel so that you can boot the DomU in paravirtualization mode.

Step 2:

Create a Xen DomU configuration file that points to these files for the boot kernel.

I edited the two lines:

kernel = "/usr/local/src/xen/vmlinuz"
ramdisk = "/usr/local/src/xen/initrd.img"

This tells Xen to use these kernels on boot up.

Step 3:

Modify your DomU config to point to your disks:

disk = [ 'phy:/dev/xen01/centos5-disk,xvda,w', 'phy:/dev/xen01/centos5-swap,sda1,w' ]

It is important to note that you must export the drives from the Dom0 as xvda, otherwise the CentOS installer will not be able to detect them properly and you will have no target drive to install to.

We will also want to modify the default reboot behavior; as you will see later, this is important:

on_reboot = 'destroy'
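Putting steps 2 and 3 together, the install-time DomU config ends up looking something like this (the memory size, name, and vif line are just example values on my part; the disk line is the one from above):

kernel = "/usr/local/src/xen/vmlinuz"
ramdisk = "/usr/local/src/xen/initrd.img"
memory = 256
name = "centos5"
vif = [ '' ]
disk = [ 'phy:/dev/xen01/centos5-disk,xvda,w', 'phy:/dev/xen01/centos5-swap,sda1,w' ]
on_reboot = 'destroy'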

Step 4:

Go ahead and boot up the Xen DomU using xm create -c <your config file>

Install CentOS as a normal network installation (point it at an FTP or HTTP mirror and let it install normally).

Step 5:

Once the CentOS installation is completed, the DomU will attempt to reboot itself. This is why we set on_reboot to destroy instead of the default of restart. We now need to edit the configuration to boot up via pygrub instead:

bootloader = "/usr/lib/xen-3.0.3-1/bin/pygrub"

Step 6:

Here is where things get a little tricky. The pygrub application is missing a library that it needs in order to boot up CentOS based kernels. We must build this ourselves.

Download the xen-3.0.3 source (newer sources do not build this file, so I used this version specifically; I don’t know if other versions will work, but I know for a fact that xen-3.2.0 does not).

wget http://bits.xensource.com/oss-xen/release/3.0.3-0/src.tgz/xen-3.0.3_0-src.tgz

Untar the file and cd into the directory xen-3.0.3_0-src

Then:

cd tools/pygrub

Then you need to run make. Pay attention to the errors; you might need to install additional development libraries on your Dom0 (e2fslibs-dev comes to mind).

Step 7:

Once your build has successfully completed, you will need to copy the files to your local xen installation.

cd build/lib.linux-i686-2.4/grub/fsys/ext2
mkdir -p /usr/lib/xen-3.0.3-1/lib/python/grub/fsys/ext2
cp * /usr/lib/xen-3.0.3-1/lib/python/grub/fsys/ext2/

Step 8:

Boot your DomU using:

xm create -c <your config file>

Finished:

You should now have a working Xen DomU under Dom0 without having to resort to broken CentOS images.

CentOS 4.4 and Asus P5GC-MX Motherboard

I recently had to install CentOS 4.4 on the Asus P5GC-MX motherboard. The board works very well with the installer as long as you aren’t doing a network install, since the network drivers are not available.

Here is some information about the board’s network controller that was almost impossible to track down. The board uses a network chipset called the Attansic L2.

The good news is that you can compile the drivers from source against the current kernel.

I have included a copy of the driver for you to download if you need it. I searched through many mailing list postings to locate these files, and I can confirm that they will compile against the kernel in CentOS 4.4.

l2-linux-driver-new.rar

In order to compile the kernel module you will need to install the kernel source for your current kernel version (yum install kernel-devel, plus kernel-smp-devel if you need it). You will also need to create a symbolic link from that source tree to /usr/src/linux, as the driver’s build looks in this location for the current kernel.

Then you need to cd to the src directory in the archive and run make to compile the module.

Once that completes, you can run insmod to load the module into the running kernel.

After you have done that, you will need to create a module alias between the module and eth0 in /etc/modprobe.conf.
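For reference, here is roughly what the whole sequence looks like; the kernel version, archive directory, and module name are assumptions on my part (I’m guessing the module builds as l2.ko; check the make output for the actual name):

yum install kernel-devel
ln -s /usr/src/kernels/2.6.9-42.EL-i686 /usr/src/linux   # symlink the headers where the driver expects them
cd l2-linux-driver-new/src
make                                                     # build the module against the running kernel
insmod l2.ko                                             # load it for the current session
echo "alias eth0 l2" >> /etc/modprobe.conf               # tie the module to eth0 so it loads on boot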

It is a little bit of a pain to get working, but it can be done.

Good luck and I hope that the source files come in handy to somebody out there.

CentOS 4.4 and New nForce Chipsets

CentOS is a great OS, and we use it for all our cPanel installs. It is getting a little old, but currently CentOS 5 isn’t supported by cPanel, so we must continue to use CentOS 4.4 for these installs.

The latest batch of motherboards we got in use the MCP61 (nForce 430 chipset). Luckily the SATA controller is supported by the sata_nv kernel module that comes with CentOS 4.4, so there is no need to upgrade that. However, the network interface of this chipset is not recognized by the forcedeth driver (the reverse-engineered nForce network driver).

The solution to fix this problem is to compile the latest forcedeth.ko (kernel module). Here is how you do it.

1) Install CentOS and be sure to install gcc and kernel-devel for your kernel.
2) Download the latest forcedeth drivers from nVidia. You can get them from here: http://www.nvidia.com/object/linux_nforce_1.21.html
3) Extract the files from the zip file.
4) Change to the directory that contains the forcedeth.c source code. (./NV_Linux_DRV_PKG_v1.21/RHEL4_U4/source)
5) Create a Makefile that contains:
obj-m := forcedeth.o
6) Now compile the module with the following command:
make -C /usr/src/kernels/2.6.9-42.0.10.EL-i686/ SUBDIRS=$PWD modules

Please note that your path might differ as you might be using a different version of the kernel.

7) When this completes you will have a new forcedeth.ko file in the current directory. Move this file into the modules directory:

cp forcedeth.ko /lib/modules/2.6.9-42.0.10.EL/kernel/drivers/net/

Again, your path might differ based on the version of the kernel you are running.
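Since the module was copied into the kernel’s module tree by hand, it is also worth refreshing the module dependency map so that modprobe can find it:

depmod -a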

8) Add an entry to alias the kernel module to your network interface in /etc/modprobe.conf
alias eth0 forcedeth

I threw a reboot at the machine just in case, but you can also do:

modprobe forcedeth

and your network card should now appear in:

ifconfig -a

And there you go…

How To Find and Remove Windows Rootkits

First of all I want to wish everyone a very Happy New Year. It has been a while since I made my last posting and I want to apologize for that.

Now let’s get down to business…

So recently we discovered that a Windows 2003 server had been exploited via an apparently well-known 0-day exploit in MailEnable’s SMTP service. This has since been corrected by the MailEnable developers (you can read about that on their site).

At first there was some doubt as to whether or not there was a hacker on the server. Our first clue was the abnormal amount of traffic the server was doing. Typically this server moved about 100-200KBps; we knew something was up when it started moving 2MBps. Upon inspection of the server we couldn’t see anything out of the ordinary, however we did notice that taskman.exe (Task Manager) was running at 100% CPU utilization whenever we looked at it. This threw up all sorts of red flags, and we knew that we had a hacker on the server and we needed to find out what they were doing.

I have to say that Event Viewer is your friend. You must look at it every once in a while to make sure you know what is going on. Even with a hacker on the server and a rootkit installed to hide his activity, he still wasn’t able to hide some log entries in Event Viewer. After looking in Event Viewer we noticed several references to VMWare. After asking around, we determined that none of the legitimate administrators had installed VMWare, and we knew that this must be the hacker.

Here is how we found him, and how we removed him.

We could see that there were some hidden directories on the server that we couldn’t access through the normal Explorer interface, so we knew we were dealing with a rootkit loaded at boot. (You can see file accesses using the file system and disk tools from Windows Sysinternals.)

We installed HiJackFree from a-squared. This piece of software is pretty powerful and does some deep inspection of the registry to find services and applications that are not normally visible in the Control Panel services listing. We sorted the services and looked for anything set to start at boot up that wasn’t signed. HiJackFree displays the company that signed each driver/service, and any service that was set to run but wasn’t signed went on our hit list to remove.

With a list of services to disable, we installed the Windows Recovery Console and rebooted the server into the recovery console. We disabled the services that we identified as problems and rebooted the system normally.

At this point we could see the directories that were hidden from us earlier. We discovered that the hacker had installed VMWare. Because we wanted to see what the hacker was actually doing with the VMWare installation, we used Virtual Disk Driver to mount the VMWare disk images to see what they were doing. Turns out they were downloading Pokemon episodes. Ugh… All that hassle and it wasn’t even anything good. 🙂

So we removed the rootkit from the server, removed the VMWare installation, and patched the MailEnable install, and the server has been cruising along ever since.

We hope that this description of what we did will help you find and remove Windows rootkits on your servers.

Find Out What Your DNS Server is Doing

What is my DNS server responding to?

We have been in the process of moving from an old server to a newer server. The process is straightforward: we move the sites over to the new server and then update their zone records to point at the new server (each zone has a low TTL, or Time To Live, to make this transition smoother). Overall everything has gone smoothly with little interruption in the service of each site.

Finally, once everything was moved over, we updated the nameserver records to point at the new server, so now everything should be running off the new server’s DNS. We were ready to turn off the old server, but noticed that named (bind) was still handing out DNS responses (based on its activity in top). We thought we had everything updated so that this server shouldn’t be used at all.

So we had to find out what DNS requests were still hitting the old server and why we missed those. Here is what we did to find out.

Edit your named.conf (ours was in /etc).

Add the following section if you do not already have a section called logging {}.

logging {
    channel query_logging {
        syslog daemon;
        severity debug 9;
    };
    category queries {
        query_logging;
    };
};

What this does is record every DNS query named serves up in named’s default syslog destination (generally /var/log/messages). This will help you see what domains are being requested from your server.
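With query logging enabled, watching the requests roll in is as simple as tailing the syslog file (assuming queries land in /var/log/messages as mentioned above):

tail -f /var/log/messages | grep named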

We determined what DNS queries were coming in, and based on the whois information found out that there were some very old nameserver records pointing at the server’s IP. Without the logging change above, we could have lost 3 or 4 long-time customers’ DNS information when the old server was turned off. As it is, we have now updated those nameserver records to point at the new nameservers, and will need to keep the old server up and running for at least another 48 hours (the amount of time the old nameserver delegation records can remain cached). Saved us a black eye for sure.

What else is my DNS server handing out?

Additionally, you might want to look at the log information and determine if anybody is using your server for recursive lookups too.

What is DNS recursion?

Well, recursion itself isn’t bad, and it is actually a vital part of DNS. Recursion means that if you request a DNS lookup against a DNS server and that server isn’t authoritative for that domain (it doesn’t have a zone for that domain), it will resolve the name on your behalf by querying other servers.

Why is it bad to allow recursion?

Until recently DNS recursion wasn’t really a bad thing, but hackers have figured out that it is possible to “amplify” or magnify their DDoS (Distributed Denial of Service) attacks using spoofed UDP based DNS requests. (UDP makes it extremely easy to spoof the originating IP address of the request.) The hackers send a spoofed UDP request for a domain with a large number of records to a DNS server that allows recursive lookups. Since the initial UDP request is relatively small and the response (because it has so many records in it) is very large, hackers can amplify the amount of data they can throw at a target using recursive third party DNS servers.

How do I turn off recursion in named/bind?

To turn off recursive lookups from unauthorized sources you can add the following ACL to your named.conf:

acl recursion { 127.0.0.1; 1.2.3.0/24; };

And then in your options do:

options {
    allow-recursion { "recursion"; };
};

The first line creates an ACL (Access Control List) to let named (bind) know who is allowed to do recursive lookups against the server. The IPs should be listed in CIDR notation, each followed by a semicolon. Include any IP address that uses this server for legitimate DNS lookup purposes.

The second section should already exist in your named.conf, and you just want to add the allow-recursion line to that section. This will apply the ACL to your server. Then you just need to restart named, and you are good to go.
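Depending on your distribution, reloading the configuration looks something like one of these:

rndc reconfig
# or
service named restart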

So that is why you should know exactly what your DNS server is doing.

How To Install NetBSD as a DomU in Xen 3.0

Ever since the first time I heard about Xen and its ability to run any OS side by side on the same server, I have had the urge to run a BSD based OS alongside a Linux OS. Today I successfully achieved my goal, and this is how I did it.

First some background on the server itself. The server is a Dell PowerEdge 1750 with dual Xeon processors, 3GB of RAM, and 500GB of RAID storage. The server is running the Xen 3.0.2 hypervisor kernel (the main kernel that handles the paravirtualization of the hardware). The Dom0 system is running Debian 3.1 with some patches to the kernel to work with the LSI based RAID 5 card in the server. Each virtual OS installed on the server is given its own partition and is managed using LVM in Dom0.

The vast majority of information about NetBSD running under Xen as a DomU seems to be either Xen 2.0 specific, or assumes you are running NetBSD as the Dom0. Unfortunately, the Xen 2.0 information is not going to work on a Xen 3.0 machine, and moreover our Dom0 is Debian, so we needed to come up with our own process.

Here is how I did it, and what sort of problems I encountered.

The entire process is pretty easy, but finding the actual information can be tough, and finding the files you need can be even tougher. Here is a rough overview of the process…

1) Set up your partition that will hold NetBSD. We are using a LVM partition named vg00-netbsd.
2) Set up the xen domU config file.
3) Boot the netbsd install kernel for Xen 3.0.
4) Follow the sysinst steps like you normally would to install NetBSD. I had to use an FTP based installation, because I could never get the CDROM to work correctly.
5) Complete the install and shutdown NetBSD.
6) Edit the domU config file and change the kernel from the install kernel to the normal NetBSD kernel.
7) Boot NetBSD DomU and enjoy.

So here are the specifics.

Step 1: You need to download the NetBSD Xen 3.0 kernels (install and normal) and put them some place on your Dom0. I put mine in /boot on the server, because it sort of made sense to me, but they can be almost anywhere. You can download the DomU kernels from NetBSD’s FTP servers in the daily build areas. The kernels for Xen 3.0 are not in the release versions of NetBSD, so you have to go find them. I would post links to them, but most likely they would go stale over time. Go to ftp://ftp.netbsd.org/pub/NetBSD-daily/ and navigate through to either the NetBSD 3.1 tree or the NetBSD 4.0 tree. You are looking for the directory i386/binary/kernel/; in that directory you will find the two kernels you need. The install kernel is called netbsd-INSTALL_XEN3_DOMU.gz and the normal kernel is named netbsd-XEN3_DOMU.gz. Download both of those kernels as you will need them later.

Step 2: Once you have downloaded your kernels you will need to create a xen config file for your NetBSD DomU. Here is an example of the one I used:

kernel = "/boot/netbsd-INSTALL_XEN3_DOMU.gz"
memory = 128
name = "netbsd"
vif = [ '' ]
disk = [ 'phy:/dev/mapper/vg00-netbsd,0x01,w' ]
root = "/dev/wd0d"

You will need to change the disk = line to match where you are installing NetBSD on your server. Save that file in your Xen config directory (ours was /etc/xen/).

Step 3: We are ready to boot NetBSD for the first time. To boot NetBSD we run the command:

xm create -c netbsd

“netbsd” is the name of the DomU config file we created in step 2, so change that to match what you used in that step.

A couple of times we noticed that Xen didn’t attach us to the console of the booting NetBSD DomU, so you may need to connect to it manually. To do so, run the following:

xm list

Which will print out a list of running Xen instances like this:

Name     ID  Mem(MiB)  VCPUs  State    Time(s)
debian    0      1374      4  r-----    2354.5
plesk    10      1024      1  -b----     161.5
netbsd   50       128      1  -b----       1.2
qmail     8       128      1  -b----     953.4

We will need to know the ID of the instance we want to attach to. In the example above this is 50. Then we attach to the console of that DomU by typing:

xm console 50

To break out of the console at any time, simply press CTRL+].

Step 4: Once you are in the console you should see the sysinst installer. You can follow the prompts and install NetBSD like you normally would. One problem I did encounter was that, for whatever reason, the installer would stop talking to the FTP server due to some sort of DNS lookup failure. It did this no matter which kernel I tried. I eventually resorted to using the mirror’s IP address instead, and the installation worked perfectly.

Step 5: Once the install is completed, break out of the server and shut it down via the command:

xm shutdown 50

Again replace 50 with the ID of the DomU of your NetBSD install.

Step 6: Edit the DomU file and change the kernel line to point to your normal NetBSD kernel. Your DomU config file should look something like this now:

kernel = "/boot/netbsd-XEN3_DOMU.gz"
memory = 128
name = "netbsd"
vif = [ '' ]
disk = [ 'phy:/dev/mapper/vg00-netbsd,0x01,w' ]
root = "/dev/wd0d"

Step 7: Reboot your NetBSD DomU via the command:

xm create -c netbsd

Enjoy your NetBSD running under a Debian/Linux Dom0 in Xen 3.0!

Gotchas:

Having used the NetBSD system only briefly, I noticed that there is something “funky” with the networking: over a sustained ping the network interface starts to drop packets, every other packet it seems. Modifying the vif = line in the DomU config to read:

vif = [ 'bridge=xenbr0' ]

seems to have cleared up the issue. This line bridges the ethernet interface inside the DomU to the xenbr0 interface in the Dom0, and the packet loss has not returned since.

And there you have it! NetBSD running under a Linux Dom0 on top of Xen 3.0. The world just got a whole lot smaller.