Posts by The Geek

How To Recover Data From a Badly Corrupted Drive

Recently I had a customer’s hard drive on a dedicated server corrupt so badly that we couldn’t access the information on it at all. Any attempt to fsck the drive or even just mount it produced weird errors and vague messages (trying to mount it, for example, would just report “not found”).

We basically determined that the inode that held the partition tables, as well as the inode that held the root directory (/), were corrupted. We were able to run an fsck on it initially, and were prompted to repair those entries along with multiply-linked inodes and other errors. After letting fsck finish, the drive was completely unreadable. No partition tables, nothing… We told the customer that the drive had corrupted and that they would have to restore their site from backups. Well, guess what they said? “What backups?” Ugh… So…

Moving on… we installed a new drive and attached the old drive as a slave to attempt to recover the data for them. We installed the OS, set up the control panel, put up a page saying “Site crashed… we are working on restoring it.”, and moved forward.

Here is what we did and the results we had and some notes about the process along the way.

1) Do not panic! The first thing you have to remember when dealing with a corrupted drive is: don’t panic! The information is still there and, with enough effort, can be recovered provided the drive still functions at some level.

2) Make a backup image. This will help you out a great deal when things go wrong. I recommend using dd to just mirror the data to a new drive or some place with enough storage to hold the drive data.

3) Take your time. This one is hard to follow because normally the site is down and the customer is absolutely panic-stricken and calling you every 15 minutes asking about the status. Working with large sets of data takes time, so be patient and wait for the various processes to complete… They always take a while, and cutting corners here to save time will only lead to misery if you screw things up.

Okay… Now that we have the ground rules laid out, here is how we restored all of their data except one database table; in the end we managed to recover that as well, and I will show you how we did it.

Step 1: Take the drive to another machine if you can, or to a safe place where you can work on it. We had another machine with a hard drive large enough to hold the data, so we made a disk image of the old drive.

dd if=/dev/sdb bs=1k conv=sync,noerror | gzip -c > /path/to/disk.img.gz

This will save you a potential future headache, but it takes a long time for any fair amount of data. Wait it out… it is worth it in the long run.
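Should you ever need to roll the image back onto a drive, the same tools work in reverse; a quick sketch (the device and path are examples):

gunzip -c /path/to/disk.img.gz | dd of=/dev/sdb bs=1k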

Step 2: Determine how badly the drive is corrupted. In our case an “fdisk -l /dev/sdb” didn’t show any partition tables, so we had to recover that first before we could start getting at the data.

The application we used to recover the partition table is TestDisk. It is written by a gentleman named Christophe Grenier, and to say that it is awesome is a complete understatement. You will be using this software for all of the steps that follow; that is how useful it is.

Okay… so download TestDisk from the link above and extract it. Make sure to grab the version for the OS you are using. We used the Linux 2.6 kernel version, since the drive is attached to a 2.6 kernel machine that we are using for recovery purposes.

Step 3: Recover the partition table. When you first run TestDisk it presents a list of the available drives. Select the drive (in our case /dev/sdb) and the partition table type (we used Intel/PC, since that is what we have). Select “Analyse” and the software will scan the drive looking for partitions. It does a quick search first; in our case that found the boot partition almost immediately, but the other partition was not found on the first pass, so we needed to do a “Deeper Search”. The deeper search found the other partition.

Step 4: Once you have the partitions listed, you will want to write them to the drive if possible. This will help you in the next step. After we wrote the partition table we exited TestDisk and did step 5. If you just have a plain Linux partition instead of an LVM partition, you can skip ahead to step 6.

This is where things get a little tricky. In our case the partition was actually an LVM physical volume, meaning it holds additional information for LVM volume groups and logical volumes inside it. TestDisk won’t let you read files directly from the LVM partition, which makes sense since there technically aren’t any files directly inside it. What you need to do is get your OS to import and activate the volume groups and logical volumes so that TestDisk can see and use them.

Step 5: Import and activate your LVM volume groups and logical volumes. First you need to find your LVM settings. So run:

lvm vgdisplay

This will show you the volume groups for the drive. Then you run:

lvm vgchange -ay

This will activate the volume groups and logical volumes so that the OS can use them.
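If vgdisplay comes up empty, the OS may simply not have scanned the newly attached drive for LVM volumes yet. A short sequence that usually sorts this out (shown here as a sketch; output and device names will differ on your system):

lvm pvscan          # look for LVM physical volumes on the attached drives
lvm vgscan          # rebuild the list of volume groups
lvm vgchange -ay    # activate every volume group that was found
ls /dev/mapper/     # the logical volumes should now appear here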

Step 6: Start TestDisk again, and this time you should see the logical volume or partition with your files on it. Select it, and this time (if you are using LVM) select “None” as the partition type, since LVM doesn’t have partition information of its own. Select Analyse and, with luck, you will see the partitions.

Step 7: Highlight the partition that contains your files and press the letter p. This gives you a directory listing of the files as they exist in the partition/logical volume. From here it is just a matter of navigating to the correct location, finding the files you need to recover, and pressing “c” to copy them off onto your working drive.

The customer’s drive was a mess. Basically all the directories and files were linked under “lost+found” with names like #123456789. We had to hunt and peck through the drive structure to find the information that we needed. We eventually found the mysql directory and the web space directory and copied those to the working drive (/dev/sda) on our recovery machine.

I have to say that we were pretty pleased with the results. Out of all the files we recovered (and there were thousands of images and other files) only one file was damaged. Unfortunately it was a MySQL FRM file, which contains the schema information for the associated table. Without that information MySQL can’t read the data from the MYD file.

Recovering from a damaged FRM file

The FRM file doesn’t often get modified, so the likelihood of it being damaged is small, but in our case the damaged drive did corrupt this file. Here is how we recovered the data.

If you have a copy of the original schema, and I mean an exact copy with correct field lengths/types, etc. then you are in luck and the restore process is very simple.

Step 1: Make backups of your files. Copy the MYD file out of the database’s folder (/var/lib/mysql/databasename) to someplace safe. You are going to need it later.

Step 2: Delete the other files that make up the table (remove the corrupted FRM file, the MYI file, and the MYD file you just copied). For example, if the table is named foobar in database example, then you will have files named foobar.MYD, foobar.MYI, and foobar.frm in a folder named example in your MySQL data directory.
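Using the hypothetical foobar table in the example database, steps 1 and 2 might look like this (paths assume a default /var/lib/mysql data directory):

cp /var/lib/mysql/example/foobar.MYD /root/foobar.MYD.bak
rm /var/lib/mysql/example/foobar.frm /var/lib/mysql/example/foobar.MYI /var/lib/mysql/example/foobar.MYD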

Step 3: Recreate the table schema. Generally this involves logging into the MySQL command line interface and typing in the CREATE TABLE sql queries to make the table. This will recreate the table in MySQL, but the table will be empty.
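For illustration only, if the original foobar table had held an id and a name column, recreating it would look something like this; the columns here are made up, and your CREATE TABLE must match the original schema exactly:

CREATE TABLE foobar (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name VARCHAR(255) NOT NULL
) ENGINE=MyISAM;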

Step 4: Copy your data file back to your database folder. Take the backup you made of the MYD file in step 1 and copy it over the file that is now in the database folder. Using our example again, you would copy the file foobar.MYD over the file that exists in the example folder in the MySQL data folder.
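Continuing the example, copy the saved data file back over the freshly created (empty) one and make sure MySQL still owns it:

cp /root/foobar.MYD.bak /var/lib/mysql/example/foobar.MYD
chown mysql:mysql /var/lib/mysql/example/foobar.MYD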

Step 5: Restart MySQL. Not really necessary, but can’t hurt either.

Step 6: Repair the table to make sure your indexes are correct and everything is working properly.
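From the MySQL command line a plain REPAIR TABLE is normally enough to rebuild the index against the restored data:

REPAIR TABLE foobar;

and if the plain repair is rejected, the USE_FRM variant tells MySQL to rebuild using the .frm you just recreated:

REPAIR TABLE foobar USE_FRM;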

And that is it!

Good luck, and I hope you never need this information.

Turn Your PXE Enabled Network Card Into an iSCSI HBA

Update: gPXE has been forked into a new project called iPXE. iPXE is actively developed by the same team that worked on gPXE, and they have made many new code changes while gPXE has remained relatively static (look at the respective changelogs for confirmation). The iPXE project has the same features as before, with bug fixes and new features being added all the time. Please be sure to double check the commands referenced in this article, as they might have changed in name from gPXE to iPXE.

You can turn that PXE enabled network card into an iSCSI enabled HBA for free. Save yourself a couple of bucks on an iSCSI HBA, and boot your server/workstation diskless via iSCSI.

Here is how you turn your card into an HBA.

gPXE/iPXE is a PXE-compatible bootloader that provides some great functionality, including AoE (ATA over Ethernet), HTTP (loading boot scripts and boot images over HTTP), and the one we are most interested in: iSCSI, which allows us to boot from an iSCSI target.

We start off with a working PXE environment: a DHCP server (to provide IP and PXE settings) and a TFTP server (to provide the PXE files we need to load). In order to get PXE to load the gPXE/iPXE firmware we need to do what is called “chainloading”. This means that our network card does its standard PXE boot, and once that loads, we load the gPXE/iPXE loader and use it for the rest of the boot process.
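The chainloader binary itself has to live in your TFTP root so the card’s PXE stack can fetch it. As a rough sketch (the download URL and TFTP path are assumptions; the iPXE project publishes prebuilt images, and gPXE images could be built via rom-o-matic, so adjust for whichever you use):

wget http://boot.ipxe.org/undionly.kpxe -O /var/lib/tftpboot/undionly.kpxe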

Here is how we do that:

In our DHCPd server we need to add some specific settings so we can detect whether the DHCP request is coming from plain PXE or from gPXE.

/etc/dhcpd.conf:

allow booting;
allow bootp;
default-lease-time 600;
max-lease-time 7200;
authoritative;
option space gpxe;
option gpxe-encap-opts code 175 = encapsulate gpxe;
option gpxe.bus-id code 177 = string;
ddns-update-style ad-hoc;
subnet 192.168.2.0 netmask 255.255.255.0 {
    use-host-decl-names on;
    range 192.168.2.20 192.168.2.200;
    option subnet-mask 255.255.255.0;
    option broadcast-address 192.168.2.255;
    default-lease-time 1800;
    max-lease-time 86400;
    option domain-name-servers 192.168.1.10;
    next-server 192.168.2.1;
    if not exists gpxe.bus-id {
        filename "undionly.kpxe";
    } else {
        filename "http://192.168.2.1/default/install.gpxe";
    }
}

The important lines are:

option space gpxe;
option gpxe-encap-opts code 175 = encapsulate gpxe;
option gpxe.bus-id code 177 = string;

and

if not exists gpxe.bus-id {
    filename "undionly.kpxe";
} else {
    filename "http://192.168.2.1/default/install.gpxe";
}

This conditional statement lets us load either the gPXE chainloader when called from a standard PXE request (the if not exists gpxe.bus-id branch) or a gPXE-compatible script when called from gPXE.

We are currently using this setup to handle new server OS installations, hence the install.gpxe file.

The contents of that file are rather simple.

install.gpxe:

#!gpxe
kernel http://192.168.2.1/default/centos5 askmethod
initrd http://192.168.2.1/default/centos5.img
boot

This loads the CentOS 5 PXE installation image and initrd to handle OS installation on the server.

Once the server has its OS installed, we then need to add the server’s MAC address to the DHCPd server so that it will chain load gPXE and then load the server’s root disk via iSCSI.

Here is how we accomplish that:

/etc/dhcpd.conf (added in the subnet 192.168.2.0 section above):

host server01 {
    hardware ethernet 00:xx:xx:xx:xx:xx;
    fixed-address 192.168.2.21;
    if not exists gpxe.bus-id {
        filename "undionly.kpxe";
    } else {
        filename "";
        option root-path "iscsi:192.168.2.1::::iqn.2001-04.com.server:server01.vg00.lun0";
    }
}

This, again, chainloads the gPXE/iPXE chainloader from PXE, and on the next DHCP request from gPXE/iPXE we provide the iSCSI target to use as the root disk for the server. This brings up the normal GRUB screen and the system boots as normal.

And that is how you turn your PXE enabled network card into an iSCSI HBA.

Gotchas:
I had originally wanted to use gPXE/iSCSI to host the root drive for a Xen based Dom0, however I have discovered that the Xen hypervisor does not support this. From some searching on the internet it seems that the problem lies with Xen’s hypervisor kernel and its inability to read the iBFT (iSCSI Boot Firmware Table). gPXE does support and populate the iBFT, however the Xen hypervisor kernel only recognizes the iBFT from certain iSCSI HBAs (listed in their HCL).

Installing CentOS 5 as a DomU with a Debian Dom0

There isn’t a whole lot of information about how to set up CentOS as a DomU under a Debian 4.0 based Dom0 while still maintaining the use of pygrub to boot the CentOS kernels. This howto will give you a general overview of the steps to take without having to resort to an incomplete CentOS image. This is not going to be a copy and paste sort of howto, but rather a higher level description, plus a couple of fixes to make it all work correctly.

A couple of assumptions I am making here:

  1. You have a working Xen install already under Debian
  2. You can edit files using vi or a comparable editor.
  3. You understand how Xen and LVM can work together at least at some basic level
  4. You are comfortable compiling your own applications using make, etc…

Here is what you need to do to get started:

Step 1:

Download the kernel image and ram disk for CentOS and put them some place you can access them on the Dom0.

In my case, I put them in /usr/local/src/xen/ (vmlinuz and initrd.img respectively). I downloaded these files from a CentOS mirror. The files you are after are located in the centos/5.1/os/i386/images/xen/ directory, as these kernels have the Xen support compiled in so that you can boot the DomU in paravirtualized mode.
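As a rough example of grabbing the two files (the mirror hostname here is an assumption; substitute whichever CentOS mirror you actually use):

cd /usr/local/src/xen/
wget http://mirror.centos.org/centos/5.1/os/i386/images/xen/vmlinuz
wget http://mirror.centos.org/centos/5.1/os/i386/images/xen/initrd.img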

Step 2:

Create a Xen DomU configuration file that points to these files for the boot kernel.

I edited the two lines:

kernel = "/usr/local/src/xen/vmlinuz"
ramdisk = "/usr/local/src/xen/initrd.img"

This tells Xen to use these kernels on boot up.

Step 3:

Modify your DomU config to point to your disks:

disk = [ 'phy:/dev/xen01/centos5-disk,xvda,w', 'phy:/dev/xen01/centos5-swap,sda1,w' ]

It is important to note that you must export the drives from the Dom0 as xvda, otherwise the CentOS installer will not be able to detect them properly and you will have no target drive to install to.

We will also want to modify the default reboot behavior; as you will see later, this is important:

on_reboot = 'destroy'

Step 4:

Go ahead and boot up the Xen DomU using xm create -c followed by the name of the DomU config file you created.

Install CentOS as a normal network installation (point it at an FTP or HTTP mirror and let it install normally).

Step 5:

Once the CentOS installation is completed, the DomU will attempt to reboot itself. This is why we set on_reboot to destroy instead of the default of restart. We now need to edit the configuration to boot via pygrub instead:

bootloader = "/usr/lib/xen-3.0.3-1/bin/pygrub"

Step 6:

Here is where things get a little tricky. The pygrub application is missing a library that it needs in order to boot up CentOS based kernels. We must build this ourselves.

Download the xen-3.0.3 source (the newer sources do not build this file, so I used this version specifically; I don’t know if others will work). I know for a fact that xen-3.2.0 does not work.

wget http://bits.xensource.com/oss-xen/release/3.0.3-0/src.tgz/xen-3.0.3_0-src.tgz

Untar the file and cd into the directory xen-3.0.3_0-src

Then:

cd tools/pygrub

Then you need to run make. Pay attention to the errors, you might need to install additional libraries if you don’t have them on your Dom0. (e2fslibs-dev comes to mind).

Step 7:

Once your build has successfully completed, you will need to copy the files to your local xen installation.

cd build/lib.linux-i686-2.4/grub/fsys/ext2
mkdir /usr/lib/xen-3.0.3-1/lib/python/grub/fsys/ext2
cp * /usr/lib/xen-3.0.3-1/lib/python/grub/fsys/ext2/

Step 8:

Boot your DomU using:

xm create -c

Finished:

You should now have a working CentOS DomU under your Debian Dom0 without having to resort to broken CentOS images.

CentOS 4.4 and Asus P5GC-MX Motherboard

I recently had to install CentOS 4.4 on the Asus P5GC-MX motherboard. The board works very well with the installer as long as you aren’t doing a network install, since the network drivers are not available.

Here is some information about the board’s network controller that was almost impossible to track down. The board uses a network chipset called the Attansic L2.

The good news is that you can compile the driver from its source files against the current kernel.

I have included a copy of the driver for you to download if you need it. I searched through many mailing list postings to locate these files, and I can confirm that they will compile against the kernel in CentOS 4.4.

l2-linux-driver-new.rar

In order to compile the kernel module you will need to install the kernel source for your current kernel version (yum install kernel-devel, and kernel-smp-devel if you need it). You will also need to create a symbolic link from the kernel source to /usr/src/linux, as the module build looks in this location for the current kernel’s headers.

Then you need to cd to the src directory in the archive and run make to compile the module.

Once completed you can run insmod to install the module into the kernel.

After you have done that, you will need to create a module alias between the module and eth0 in /etc/modprobe.conf (a combined sketch of the whole procedure follows below).
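Pulled together, the procedure looks roughly like this. The extraction path, the kernel-devel directory, and the module name are assumptions; use whatever .ko file the build actually produces:

yum install kernel-devel
ln -s /usr/src/kernels/2.6.9-42.EL-i686 /usr/src/linux    # adjust to match your kernel version
cd /usr/local/src/l2-linux-driver/src                     # wherever you extracted the archive
make
insmod ./l2.ko                                            # use the module name the build produced
echo "alias eth0 l2" >> /etc/modprobe.conf                # same module name here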

It is a little bit of a pain to get working, but it can be done.

Good luck and I hope that the source files come in handy to somebody out there.

CentOS 4.4 and New nForce Chipsets

CentOS is a great OS, and we use it for all our cPanel installs. It is getting a little old, but currently CentOS 5 isn’t supported by cPanel, so we must continue to use CentOS 4.4 for these installs.

The latest batch of motherboards we got in use the MCP61 (nForce 430 chipset). Luckily the SATA controller is supported by the sata_nv kernel module that ships with CentOS 4.4, so there is no need to upgrade that. However, the network interface of this chipset is not recognized by the forcedeth driver (the reverse engineered nForce network driver).

The solution to fix this problem is to compile the latest forcedeth.ko (kernel module). Here is how you do it.

1) Install CentOS and be sure to install gcc and kernel-devel for your kernel.
2) Download the latest forcedeth drivers from nVidia. You can get them from here: http://www.nvidia.com/object/linux_nforce_1.21.html
3) Extract the files from the zip file.
4) Change to the directory that contains the forcedeth.c source code. (./NV_Linux_DRV_PKG_v1.21/RHEL4_U4/source)
5) Create a Makefile that contains:
obj-m := forcedeth.o
6) Now compile the module with the following command:
make -C /usr/src/kernels/2.6.9-42.0.10.EL-i686/ SUBDIRS=$PWD modules

Please note that your path might differ as you might be using a different version of the kernel.

7) When this completes you will have a new forcedeth.ko file in the current directory. Move this file into your modules directory:

cp forcedeth.ko /lib/modules/2.6.9-42.0.10.EL/kernel/drivers/net/

Again, your path might differ based on the version of the kernel you are running.
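After copying the module into place, it is also worth rebuilding the module dependency list so that modprobe can find the new file:

depmod -a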

8) Add an entry to alias the kernel module to your network interface in /etc/modprobe.conf
alias eth0 forcedeth

I threw a reboot at the machine just in case, but you can also do:

modprobe forcedeth

and your network card should now appear in:

ifconfig -a

And there you go…

How To Find and Remove Windows Rootkits

First of all I want to wish everyone a very Happy New Year. It has been a while since I made my last posting and I want to apologize for that.

Now let’s get down to business…

So recently we discovered that a Windows 2003 server had been exploited via an apparently well known 0day exploit in MailEnable’s SMTP service. This has since been corrected by the MailEnable developers (you can read about that here).

At first there was some doubt as to whether or not there was a hacker on the server. Our first clue was the abnormal amount of traffic the server was doing. On a typical day this server moved about 100-200KBps; we knew something was up when it started moving 2MBps. Upon inspection of the server we couldn’t see anything out of the ordinary, however we did notice that taskman.exe (Task Manager) was running at 100% CPU utilization whenever we looked at it. This threw up all sorts of red flags, and we knew that we had a hacker on the server and needed to find out what they were doing.

I have to say that Event Viewer is your friend. You must look at it every once in a while to make sure you know what is going on. Even with a hacker on the server and a rootkit installed to hide his activity, he still wasn’t able to hide some log entries in Event Viewer. Looking through Event Viewer we noticed several references to VMWare. After asking around, we determined that none of the legitimate administrators had installed VMWare, so we knew this must be the hacker.

Here is how we found him, and how we removed him.

We could see that there were some hidden directories on the server that we couldn’t access through the normal Explorer interface, so we knew we were dealing with an on boot rootkit. (You can see file accesses using the file system and disk tools from Windows SysInternals Tools.)

We installed HiJackFree from a-squared. This piece of software is pretty powerful and does some deep inspection of the registry to find services and applications that are not normally visible in the Control Panel services listing. We sorted the services, looked for those set to start at boot, and looked for anything that wasn’t signed. HiJackFree displays the company that signed each driver/service, and any service that was set to run but wasn’t signed went on our hit list to remove.

With a list of services to disable, we installed the Windows Recovery Console and rebooted the server into the recovery console. We disabled the services that we identified as problems and rebooted the system normally.

At this point we could see the directories that had been hidden from us earlier. We discovered that the hacker had installed VMWare. Because we wanted to see what the hacker was actually doing with the VMWare installation, we used Virtual Disk Driver to mount the VMWare disk images and take a look. Turns out they were downloading Pokemon episodes. Ugh… All that hassle and it wasn’t even anything good. 🙂

So we removed the rootkit from the server, removed the VMWare installation, patched the MailEnable install, and the server has been cruising along ever since.

We hope that this description of what we did will help you find and remove Windows rootkits on your servers.

Find Out What Your DNS Server is Doing

What is my DNS server responding to?

We have been in the process of moving from an old server to a newer one. The process is straightforward: we move the sites over to the new server and then update their zone records to point at it (the zones have a low TTL, or Time To Live, to make the transition smoother). Overall everything has gone smoothly, with little interruption in service for each site.

Finally, once everything was moved over, we updated the nameserver records to point at the new server, so now everything should be running off the new server’s DNS. We were ready to turn off the old server, but noticed that named (BIND) was still handing out DNS responses (based on its activity in top). We thought we had updated everything so that this server shouldn’t be used at all.

So we had to find out what DNS requests were still hitting the old server and why we missed those. Here is what we did to find out.

Edit your named.conf (ours was in /etc).

Add the following section if you do not already have a section called logging {}.

logging {
    channel query_logging {
        syslog daemon;
        severity debug 9;
    };
    category queries {
        query_logging;
    };
};

What this does is record any DNS query named serves up to the default syslog facility for named (generally /var/log/messages). This will help you see which domains are being requested from your server.
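Reload named so the logging change takes effect, then watch the log. Something along these lines should work (the init script name and log path may differ on your distribution):

service named restart
tail -f /var/log/messages | grep named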

We determined what DNS queries were still coming in, and based on the whois information found that there were some very old nameserver records pointing at the server’s IP. Without the logging change above, we could have lost 3 or 4 long-time customers’ DNS information when the old server was turned off. As it is, we have updated those nameserver records to point at the new nameservers, and will need to keep the old server up and running for at least another 48 hours (the amount of time a root nameserver record is cached). Saved us a black eye for sure.

What else is my DNS server handing out?

Additionally, you might want to look at the log information and determine if anybody is using your server for recursive lookups too.

What is DNS recursion?

Well, recursion itself isn’t bad; it is actually a vital part of DNS. Recursion means that if you request a DNS lookup against a DNS server, and that server isn’t authoritative for the domain (it doesn’t have a zone for it), it must pass the DNS request along to another server.

Why is it bad to allow recursion?

Until recently DNS recursion wasn’t really a bad thing, but hackers have figured out that it is possible to “amplify” or magnify their DDoS (Distributed Denial of Service) attacks using spoofed UDP based DNS requests. (With UDP it is extremely easy to spoof the originating IP address of a request.) The hackers send a spoofed UDP request for a domain with a large number of records to a DNS server that allows recursive lookups. Since the initial UDP request is relatively small, and the response (because it has so many records in it) is very large, hackers can amplify the amount of data they can send at a target using recursive third party DNS servers.

How do I turn off recursion in named/bind?

To turn off recursive lookups from unauthorized sources you can add the following ACL to your named.conf:

acl recursion { 127.0.0.1; 1.2.3.4/24; };

And then in your options do:

options {
    allow-recursion { "recursion"; };
};

The first line creates an ACL (Access Control List) to let named (BIND) know who is allowed to do recursive lookups against the server. The IPs should be listed in CIDR notation, with each entry followed by a semicolon. Include any IP address that uses this server for legitimate DNS lookups.

The second section should already exist in your named.conf; you just want to add the allow-recursion line to it. This applies the ACL to your server. Then you just need to restart named, and you are good to go.
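To confirm the ACL is doing its job, try a recursive lookup from an address that is not in the ACL; a quick sketch using dig (substitute your own server name and any domain you are not authoritative for):

dig @ns1.yourserver.com www.google.com

A locked-down server should come back with status REFUSED (or at least refuse to recurse) rather than returning a full answer.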

So that is why you should know exactly what your DNS server is doing.

How To Install NetBSD as a DomU in Xen 3.0

Ever since the first time I heard about Xen and its ability to run any OS side by side on the same server I have had the urge to run a BSD based OS alongside a Linux OS. Today I have successfully achieved my goal, and this is how I did it.

First, some background on the server itself. The server is a Dell PowerEdge 1750 with dual Xeon processors, 3GB of RAM, and 500GB of RAID storage. The server is running the Xen 3.0.2 hypervisor kernel (the main kernel that handles the virtualization of the hardware). The Dom0 system is running Debian 3.1 with some patches to the kernel to work with the LSI based RAID 5 card in the server. Each virtual OS installed on the server is given its own partition and is managed using LVM in Dom0.

The vast majority of information about running NetBSD under Xen as a DomU seems to be either Xen 2.0 specific, or assumes you are running NetBSD as the Dom0. Unfortunately, the Xen 2.0 information is not going to work on a Xen 3.0 machine, and moreover our Dom0 is Debian, so we needed to come up with our own process.

Here is how I did it, and what sort of problems I encountered.

The entire process is pretty easy, but finding the actual information can be tough, and finding the files you need can be even tougher. Here is a rough overview of the process…

1) Set up your partition that will hold NetBSD. We are using an LVM partition named vg00-netbsd.
2) Set up the xen domU config file.
3) Boot the netbsd install kernel for Xen 3.0.
4) Follow the sysinstall steps like you normally do to install NetBSD. I had to use an FTP based installation, because I could never get the CDROM to work correctly.
5) Complete the install and shutdown NetBSD.
6) Edit the domU config file and change the kernel from the install kernel to the normal NetBSD kernel.
7) Boot NetBSD DomU and enjoy.

So here are the specifics.

Step 1: You need to download the NetBSD Xen 3.0 kernels (install and normal) and put them somewhere on your Dom0. I put mine in /boot on the server, because it sort of made sense to me, but they can go almost anywhere. You can download the DomU kernels from NetBSD’s FTP servers in the daily build areas. The kernels for Xen 3.0 are not in the release versions of NetBSD, so you have to go find them. I would post links to them, but most likely they would go stale over time. Go to ftp://ftp.netbsd.org/pub/NetBSD-daily/ and navigate through to either the NetBSD 3.1 tree or the NetBSD 4.0 tree. You are looking for the directory i386/binary/kernel/; in that directory you will find the two kernels you need. The install kernel is called netbsd-INSTALL_XEN3_DOMU.gz and the normal kernel is named netbsd-XEN3_DOMU.gz. Download both of those kernels, as you will need them later.

Step 2: Once you have downloaded your kernels you will need to create a Xen config file for your NetBSD DomU. Here is an example of the one I used:

kernel = "/boot/netbsd-INSTALL_XEN3_DOMU.gz"
memory = 128
name = "netbsd"
vif = [ '' ]
disk = [ 'phy:/dev/mapper/vg00-netbsd,0x01,w' ]
root = "/dev/wd0d"

You will need to change the disk = line to match where you are installing NetBSD on your server. Save the file in your Xen config directory (ours was /etc/xen/).

Step 3: We are ready to boot NetBSD for the first time. To boot NetBSD we run the command:

xm create -c netbsd

“netbsd” is the name of the DomU config file we created in step 2, so change that to match what you used in that step.

A couple of times we noticed that Xen didn’t attach us to the console of the booting NetBSD DomU, so you may need to connect to it manually. To do so do the following:

xm list

Which will print out a list of running Xen instances like this:

Name                 ID  Mem(MiB)  VCPUs  State   Time(s)
debian                0      1374      4  r-----   2354.5
plesk                10      1024      1  -b----    161.5
netbsd               50       128      1  -b----      1.2
qmail                 8       128      1  -b----    953.4

We will need to know the ID of the instance we want to attach to. In the example above this is 50. Then we attach to the console of that DomU by typing:

xm console 50

To break out of the console at any time, simply press CTRL+].

Step 4: Once you are in the console you should see the sysinstall application. You can follow the prompts and install NetBSD like you normally would. One problem I did encounter was that, for whatever reason, the installer would stop talking to the FTP server due to some sort of DNS lookup failure. It did this no matter which kernel I tried. I eventually resorted to using the mirror’s IP address instead, and the installation worked perfectly.

Step 5: Once the install is completed, break out of the console and shut the DomU down via the command:

xm shutdown 50

Again replace 50 with the ID of the DomU of your NetBSD install.

Step 6: Edit the DomU config file and change the kernel line to point to your normal NetBSD kernel. Your DomU config file should look something like this now:

kernel = "/boot/netbsd-XEN3_DOMU.gz"
memory = 128
name = "netbsd"
vif = [ '' ]
disk = [ 'phy:/dev/mapper/vg00-netbsd,0x01,w' ]
root = "/dev/wd0d"

Step 7: Reboot your NetBSD DomU via the command:

xm create -c netbsd

Enjoy your NetBSD running under a Debian/Linux Dom0 in Xen 3.0!

Gotchas:

Having used the NetBSD system only briefly, I have noticed that there is something “funky” with the networking and the way it behaves. Over a sustained ping the network interface starts to drop packets, every other packet it seems. Modifying the vif = line in the DomU config to read:

vif = [ 'bridge=xenbr0' ]

seems to have cleared up the issue. This line bridges the ethernet interface inside the DomU to the xenbr0 interface in the Dom0, and the problem has not come back to date.

And there you have it! NetBSD running under a Linux Dom0 on top of Xen 3.0. The world just got a whole lot smaller.

Why You Shouldn’t Use Fedora Core For Production Servers

I am going to vent a little here.

I don’t understand why web hosting companies (and dedicated server providers, for that matter) continue to provide Fedora Core based servers to customers. It just doesn’t make sense. Fedora Core was never intended to be used in a production environment, and why people keep using it in those environments baffles me.

We had a server that was running Fedora Core 4 and cPanel and had been in production for almost a year. It was always updated and properly maintained… that is, until RedHat dropped updates for it (which they do every 6 months, as that is the development cycle for Fedora). Dropping updates isn’t that big of a deal, as the Fedora Legacy Project continues to provide updates after RedHat has dropped them.

Here is what happened with this server. Fedora Core 4 was dropped from RedHat support and was being taken over by Fedora Legacy. No big deal; this has happened a bunch of times in the past with previous Fedora Core versions. The problem enters when you throw in a local kernel exploit that is discovered for the /proc filesystem. We attempted to upgrade the kernel to prevent this problem, but because FC4 was in a state of transition between the two providers, the updates weren’t available yet. As such this server was put into the “holding for update” queue.

In that time the server was hacked and needed to be wiped and reinstalled with another OS that won’t cause this particular problem (CentOS 4.3).

What do we recommend to customers instead of Fedora Core?

CentOS: a RedHat Enterprise Linux based OS. They grab the source RPM files from RedHat, remove the images that reference RedHat and replace them with CentOS based images, and make other minor modifications, but generally this is the same code that is used to run RedHat Enterprise Linux. Because it is based on RHEL it has a good development cycle as well as excellent support for new drivers. Does it have its problems? Yes; any problem that exists in RHEL will also exist in CentOS (various spin_lock kernel panic issues have been documented).

We recommend the use of CentOS for any application that does not support our number one OS of choice.

Debian (stable): I admit it, I am a big fan of Debian. It works great and I am very familiar with it. It handles in-place version upgrades (moving from woody to sarge was done with minimal effort using apt-get dist-upgrade). It has an easy to use package manager with a large selection of applications, as well as an active developer community. My biggest problem with Debian is that the kernel gets out of date very quickly. Unlike CentOS, which backports drivers from more current kernel revisions (for instance 2.6.16) into the standardized kernel for the OS (2.6.9 for CentOS 4.3), Debian does not do that. So it is often a chore to get Debian to run (or even install) on newer chipsets.

Overall, Debian is the winner for us. We install it any time we have the option (even for Virtual Private Servers) because it just works. It has its faults and is by no means perfect, but we are willing to overlook those as they are minor in comparison to the benefits it provides in ease of management and administration.

So to recap:

  • Don’t use Fedora Core for a production server; it will only cause you pain and suffering in the long run.
  • Anyone who suggests you use Fedora Core in a production environment has not completely analyzed the potential ramifications of that decision.
  • There are many viable alternatives that are actively developed and maintained as well as providing up to date and current device driver support and security updates.

In conclusion:

Don’t use Fedora Core on a production machine.

Steps to Tighten PHP Security

Recently I have had to deal with some insecure PHP scripts on a couple of servers, and they have caused us some serious problems and cost us a good deal of time to recover from.

I am going to list some of the important steps you can take to protect your server and your site from insecure PHP applications. This is by no means a complete list, but it should help prevent most of the basic attacks (“script kiddie” attacks, where the attacker doesn’t have a great deal of skill and is depending on a preset scenario).

Most of these solutions will assume you have some level of control (root) over the server.

  • Turn off allow_url_fopen: This modification will prevent URLs such as http://www.domain.com/somescript.php and ftp://www.domain.com/somescript.php from being used in include(), include_once(), require(), require_once(), as well as fopen(). This will prevent a hacker from including malicious code remotely, and this single change stops the vast majority of PHP hack attempts.

    The downside of this modification is that some legitimate applications that use fopen() to open a remote web page might break. You can encourage users to use the PHP curl functions instead, as they accomplish the same results.

    Edit your php.ini and change allow_url_fopen = On to allow_url_fopen = Off and restart your web server.

  • Use open_basedir: open_basedir allows you to dictate which paths PHP is allowed to access for a given site. Generally we set this to the path of the website (/var/www/html), /tmp (for writing session data as well as for file uploads), and the path to the PEAR repository.

    The good thing about open_basedir is that it is default deny, meaning if you don’t specify the path in the open_basedir it is blocked.

    The bad part about open_basedir is that it can sometimes block access to legitimate applications needed by your PHP applications (Gallery is one that comes to mind, as it makes use of some external applications in /usr/bin, so you must open access to that location).

    Generally, you should use this directive for every website you host as it allows you to control which directories PHP can access and you can make sure that those directories have the correct permissions to prevent potential exploitation.

  • Mount /tmp noexec, nosuid and nodev: This one is particularly useful as it tells the operating system not to run any applications from a given area of the drive. I can tell you personally that this one has saved me numerous times. When used in combination with the open_basedir directive you can effectively limit what the hackers have access to, and what they are able to run.

    If you have a separate /tmp partition on your hard drive you can edit the /etc/fstab file and change the options for the /tmp line.

    /dev/hda3 /tmp ext3 noexec,nosuid,nodev 0 0

    If you don’t have a separate /tmp partition you can create one as a loopback filesystem and mount that as noexec. I will make a separate post on how to do this later.

  • Mount your web space noexec: This might not be possible, but it is a great option, just like mounting /tmp noexec. The reason for this is to prevent binary applications from being uploaded and executed in your web space. Like I said, this might not always be possible, but it is an excellent way to prevent local root exploit binaries (where a binary exploits a problem in the kernel to gain escalated privileges on the server; the /proc race in the 2.6 kernel is a recent example of this type of exploit).

  • Make sure your permissions are correct: You should not allow world write permissions on any files, other than those required (configuration files that are written by the web server, etc). This is very important as it will save you a great deal of trouble in the event that a hacker does gain access to your site, as they won’t be able to overwrite your files. Additionally, you should only allow write access to specific folders where needed.

  • Prevent PHP from parsing scripts in world-writable directories: This will prevent hackers from uploading malicious scripts and running them from your site. This is a great way to tighten security, and it is pretty easy to manage: just set “php_admin_flag engine off” for any directory that can be written to by the web server user (see the sketch after this list). This one will save you more hassle than you can ever imagine.

  • Install mod_security: Mod Security is basically a web server firewall, allowing you to block specific types of requests, as well as inspect data as it is sent to the server. My only problem with mod_security is that it isn’t default deny, which means you must identify what is blocked, rather than identifying what is allowed and rejecting everything else. Still it has its uses and you should have it installed in order to react to problems that do not have patches available yet.
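As a rough illustration of how the open_basedir and “engine off” items above fit together, here is a hedged Apache virtual-host sketch. The site name, paths, and the uploads directory are made-up examples rather than a drop-in configuration, and the PEAR path varies by distribution; allow_url_fopen stays off globally in php.ini as described in the first item.

<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot /var/www/html/example.com

    # default deny: PHP may only touch the site, /tmp (sessions/uploads) and PEAR
    php_admin_value open_basedir "/var/www/html/example.com:/tmp:/usr/share/pear"

    # never parse PHP inside the world-writable uploads directory
    <Directory "/var/www/html/example.com/uploads">
        php_admin_flag engine off
    </Directory>
</VirtualHost>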

We hope that this list of items will help protect your server from hackers and allow you to have some peace of mind that your sites are as secure as you can make them.