Building a basic x86 kernel part 5 - The bootloader

When a PC is powered on, the first thing that happens is the Power-On Self Test (POST). This is carried out by the BIOS (Basic Input/Output System) and during this process, system memory (RAM) is identified, other vital components are validated and the integrity of the BIOS code itself is checked. If these checks are successful, the BIOS code is executed and the hardware is initialized.

Once the hardware is initialized, the BIOS goes through the available storage devices to find one that is “bootable”. A storage device is bootable if the last two bytes of its first sector (offsets 510 and 511) have the values 0x55 and 0xAA (the MBR boot signature). When the BIOS finds one, it loads that first sector (called the Master Boot Record) to memory address 0x7C00, sets the instruction pointer of the CPU to that address and lets it execute that code. This code is called the bootloader and is the first thing we need to write in order to load our actual kernel.
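To make the bootable check concrete, here is a small Python sketch of the test described above. It is not how any particular BIOS implements it; the function name is_bootable is made up for this illustration and it only looks at the two signature bytes.

```python
SECTOR_SIZE = 512  # size of the first sector (the MBR) in bytes


def is_bootable(first_sector: bytes) -> bool:
    """Return True if the sector ends with the boot signature 0x55, 0xAA."""
    if len(first_sector) < SECTOR_SIZE:
        return False
    # The signature lives in the last two bytes of the sector.
    return first_sector[510] == 0x55 and first_sector[511] == 0xAA


# A blank sector is not bootable; one with the signature in place is.
blank = bytes(SECTOR_SIZE)
signed = bytes(510) + bytes([0x55, 0xAA])
print(is_bootable(blank))   # False
print(is_bootable(signed))  # True
```

You could feed it the first 512 bytes of any disk image to see whether a BIOS would consider it bootable.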

Apart from hardware initialization, the BIOS also provides an environment to interact with system devices during the boot stage. Each device has a “service” available that can be used to control it or check its status. Each service has its own interrupt number and that number must be used in conjunction with the “int” instruction (like we did with “int 0x80” for our system call previously) to make it do something. Registers are again used for parameters to these operations.

For a complete list of the routines offered by the BIOS, Ralf Brown’s Interrupt List is a well-known reference.

At startup, and thus in our bootloader stage, we are working in real mode (see part 2).

Testing and debugging

The sections below will be about code, but let’s see how we can test and debug our code first, so you can experiment with it. In the previous sections we already learned that we use NASM to create our flat binary and that we can hook up gdb to debug and inspect our code, but how does all this wire up when debugging your bootloader or kernel?

We use QEMU as our emulator to execute our bootloader. An emulator, like a real PC, needs something to boot from, so we need to tell it what device to boot from. Let’s create a virtual floppy!

dd if=/dev/zero of=disk.img bs=512 count=2880

This creates a 1.44MB (512 * 2880) disk.img with all zeroes. We can use this file to write our bootloader to (and our kernel later) and tell QEMU to boot from this disk. You will create the bootloader.o itself further down this chapter, but to write it to the virtual disk we just created, use this:

dd conv=notrunc if=bootloader.o of=disk.img bs=512 count=1 seek=0

This copies the raw bytes of bootloader.o to the first sector (seek=0, count=1) of 512 bytes (bs). The ‘notrunc’ flag means dd won’t truncate disk.img, so the file stays 1.44MB in size, with everything after the first sector still filled with zeroes.
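If the dd options feel a bit magical, the following Python sketch mimics the two commands on an in-memory image instead of a file. The helper write_sector and the stand-in boot sector are assumptions made for this illustration; they are not part of dd or of our build.

```python
SECTOR_SIZE = 512    # bs=512
SECTOR_COUNT = 2880  # count=2880 for a 1.44MB floppy


def write_sector(image: bytearray, data: bytes, seek: int = 0) -> None:
    """Overwrite one sector of the image in place, like dd with conv=notrunc."""
    chunk = data[:SECTOR_SIZE]               # count=1: copy at most one block
    start = seek * SECTOR_SIZE               # seek=N skips N blocks of the output
    image[start:start + len(chunk)] = chunk  # same-length slice: size unchanged


image = bytearray(SECTOR_SIZE * SECTOR_COUNT)  # like dd if=/dev/zero of=disk.img
boot = bytes([0xFA, 0xF4]) + bytes(508) + bytes([0x55, 0xAA])  # stand-in sector
write_sector(image, boot)

print(len(image))            # 1474560
print(image[:2].hex())       # faf4
print(image[510:512].hex())  # 55aa
```

The image keeps its full size after the write, which is exactly what ‘notrunc’ guarantees for the real file.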

Once you have the disk.img prepared, you can run QEMU using this command:

qemu-system-i386 -machine q35 -fda disk.img -gdb tcp::26000 -S

This will launch QEMU emulating the Q35 chipset, with disk.img as its first floppy drive, and lets gdb connect to it on port 26000. The -S flag keeps the CPU stopped at startup, so nothing runs until we tell gdb to continue.

To do so, from a second console, launch gdb and type the following:

(gdb) target remote localhost:26000

This will connect gdb remotely to your QEMU instance so you can step through your code. Gdb offers layouts and two of them come in handy while debugging: asm and reg, which show assembly code and register values respectively. You can enable them using these commands:

(gdb) layout asm
(gdb) layout reg

As you probably remember, the bootloader starts at 0x7c00, so to start debugging, you can set a breakpoint at that address and then continue execution as follows:

(gdb) b *0x7c00
(gdb) c

In your assembly window you should see the first line of your program and using “ni” you can step through each instruction.

A minimal bootloader

Below you’ll find the code for a minimal bootloader that shows some relevant wiring for later. I would recommend playing around with this code yourself to get familiar with it. For instance, see the difference between using ORG and not using ORG, using BITS 16 and not using BITS 16. Assemble and hexdump the results to see the difference in generated code.

; bootloader.asm

org 0x7c00
bits 16

cli

hlt
	
times 510 - ($-$$) db 0
dw 0xAA55

Basically, this bootloader does nothing but halt the system, but it shows some interesting bits that we will dive into a bit deeper.

org 0x7c00

When creating an ELF binary, the assembler does not have to know where the code will end up in memory: the addresses of labels are recorded as relocations, and the linker (and later the loader) fill in the final values. For a flat binary, required for a bootloader, this is not the case. There is no such metadata, so every address is fixed at assembly time. By default, NASM uses 0x0 as its memory origin and calculates all label addresses from there. Because our bootloader gets loaded into memory at address 0x7c00 by the BIOS, we need to be explicit about the memory origin so the memory locations we declare are relative to this origin instead of 0x0. This is done using the “org 0x7c00” directive.

Without declaring any data locations, you won’t notice any difference in addressing, so when playing around with this instruction, make sure you declare a data location and reference it somewhere.
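The effect of ORG boils down to a single addition, sketched here in Python. The function label_address and the offset 0x11 are hypothetical and only serve the illustration; the real offset of a label depends on the code you put in front of it.

```python
def label_address(origin: int, offset: int) -> int:
    """Address assigned to a label that sits `offset` bytes into a flat binary."""
    return origin + offset


MSG_OFFSET = 0x11  # hypothetical: a data label 17 bytes into the binary

print(hex(label_address(0x0000, MSG_OFFSET)))  # 0x11   (no org, default origin)
print(hex(label_address(0x7C00, MSG_OFFSET)))  # 0x7c11 (org 0x7c00)
```

Only the second value points at the place where the BIOS actually put our bytes, which is why a reference to a data label breaks without ORG.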

bits 16

This line forces NASM into 16 bit mode, which is actually the default for flat binaries, but it is good to be explicit about it and leave no room for assumption. We can still access 32 bit registers in this mode, but the operand size override prefix 0x66 will be added to those instructions.

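Again, because 16-bit mode is already the default for a flat binary, you won’t notice any difference in binary output when you remove this line. To see an effect, add an instruction that uses a register (for instance “mov ax, 1”) and compare the output of “bits 16” with that of “bits 32”.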

cli

The “cli” instruction clears the interrupt flag (i.e. sets it to 0), meaning maskable hardware interrupts will be ignored in our bootloader.

hlt

This line halts the CPU. Because we disabled interrupts with “cli”, no maskable interrupt will wake it up again. In a real bootloader you would replace this line with more interesting code.

times 510 - ($-$$) db 0
dw 0xAA55

As said before, our bootloader gets loaded as the first sector of 512 bytes into memory and as such, our code should be padded to be exactly 512 bytes in size. This is achieved by the “times” prefix, which repeats “db 0” a given number of times. The expression $-$$ is the current location ($) minus the section start ($$, which in flat binary mode is the start of the program), in other words the number of bytes we have emitted so far. Subtracting that from 510 gives the number of zero bytes needed to fill the sector up to offset 510.

It’s 510 instead of 512 because the very last line sets the two-byte boot signature 0xAA55 (remember that this is stored little endian, so in the file it becomes 0x55 followed by 0xAA, which is the MBR boot signature).
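The padding arithmetic and the byte order of the signature are easy to check outside of NASM. This Python sketch rebuilds the sector of the minimal bootloader by hand; it assumes the code consists of only cli (opcode 0xFA) and hlt (opcode 0xF4).

```python
import struct

code = bytes([0xFA, 0xF4])             # cli, hlt
padding = bytes(510 - len(code))       # times 510 - ($-$$) db 0
signature = struct.pack("<H", 0xAA55)  # dw 0xAA55, stored little endian

sector = code + padding + signature
print(len(padding))     # 508
print(len(sector))      # 512
print(signature.hex())  # 55aa
```

Comparing this with a hexdump of your assembled bootloader is a quick sanity check.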

Let’s print something

The fact our minimal bootloader works is hard to prove, because we see nothing happening. Let’s spice it up a little bit and print a welcome message using a BIOS subroutine to print a character:

org 0x7c00
bits 16

cli
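cld

xor ax, ax	; the BIOS does not guarantee DS = 0, so set it ourselves
mov ds, ax	; lodsb reads from DS:SI and org 0x7c00 assumes segment 0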

mov si, msg
mov ah, 0x0e

print:
 	lodsb

	or al, al
	jz halt
	int 0x10
	jmp print

halt:
	hlt

msg db "Welcome to the bootloader!", 0xa, 0xd, 0x0

times 510 - ($-$$) db 0
dw 0xAA55

Let’s only focus on the changes we made now:

cld

Alongside “cli”, we now also added “cld” which clears the direction flag. This means that any string operations we do will increment SI (as opposed to decrement when the direction flag is 1).

mov si, msg
mov ah, 0x0e

This prepares two registers that are used in the loop later on. It makes SI point to the beginning of our message and selects the “write character” function for int 0x10 by placing 0x0e into AH.

print:
 	lodsb

	or al, al
	jz halt
	int 0x10
	jmp print

This is the actual loop that prints out all the characters in msg. “lodsb” loads the byte that SI points to into AL and then increments SI to make it point at the next character of the string. The “or al, al” leaves AL unchanged but sets the ZERO flag if AL is 0. In other words, if AL currently contains the null terminating character 0x0 (the end of the string), it will jump to “halt” because the ZERO flag is set. Otherwise, it calls interrupt 10h to print the character in AL and makes another iteration in the loop.
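If you want to follow the loop without stepping through it in gdb, here is a rough Python model of it. The function print_loop is made up for this illustration: it treats memory as a plain byte string and collects the characters instead of calling the BIOS.

```python
def print_loop(memory: bytes, si: int) -> str:
    """Mimic the lodsb / or / jz / int 0x10 loop over a null-terminated string."""
    shown = []
    while True:
        al = memory[si]        # lodsb: load the byte SI points to into AL ...
        si += 1                # ... then advance SI (direction flag is clear)
        if al == 0:            # or al, al sets the zero flag when AL is 0
            break              # jz halt
        shown.append(chr(al))  # int 0x10 with AH=0x0e: output the character in AL
    return "".join(shown)      # jmp print is the `while True` above


msg = b"Welcome to the bootloader!\x0a\x0d\x00"
print(repr(print_loop(msg, 0)))  # 'Welcome to the bootloader!\n\r'
```

Note that the terminating zero itself is never printed; it only ends the loop.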

halt:
	hlt

This is the location we jump to if we hit the null terminating character in our loop. It simply halts the system.

msg db "Welcome to the bootloader!", 0xa, 0xd, 0x0

This declares the welcome message that the loop above prints. A line feed (0xa), a carriage return (0xd) and the null terminating character (0x0) are appended to the string.

You may be wondering why we don’t declare a .data section here (or a .text for that matter). This again has to do with the fact that we’re creating a flat binary, in which these concepts don’t really exist. In fact, if you declared “msg” somewhere halfway through your executable code, it would simply get executed (as in, the characters in the string would be treated as instructions). To avoid this, it is best practice to declare data after your code (but before the padding). There are ways around this, but this is a simple and effective approach.

Now, it’s finally time for our actual goal: the kernel!