The NULL kernel

In this article, we’ll take our previous work getting GRUB2 on a QEMU disk and actually use it to boot code that we’ve written.

Don’t get too excited. Mostly we’re going to focus on getting a working Makefile and coaxing GCC into creating an ELF binary without being Linux specific.

The Code

The kernel we’re going to run is extremely simple and pointless, but it will form the basis for our future experiments and get our codebase kicked off.

void main (void)
{
    while(1) {}
}

Nothing to see here, just a loop to keep the CPU in place.

Compiling

Now, how do we get this to work? Well, if you’re interested, you can compile this as a (pointless) Linux program with the standard gcc main.c -o kernel but that will generate a binary with a lot of stuff in it that we don’t want, and can’t have even if we did. Looking at the output of objdump -d kernel (a tool that will come in handy later) you can see a lot of symbols and sections from glibc stuff. From the end, for example:

...
0000000000400550 <__libc_csu_fini>:
  400550:       f3 c3                   repz retq 
  400552:       90                      nop
  400553:       90                      nop

Disassembly of section .fini:

0000000000400554 <_fini>:
  400554:       48 83 ec 08             sub    $0x8,%rsp
  400558:       48 83 c4 08             add    $0x8,%rsp
  40055c:       c3                      retq

These ELF sections are from libc which this binary has been implicitly linked against. These sections allow GCC to insert things like constructors and destructors into your code, let it interact with the operating system to do things like argv and other magic that is 100% irrelevant to our kernel.

No, we need to find some flags to GCC that let us ignore everything else and just compile what’s written in our source. No libraries, nothing. A brief look through the GCC manpage leads us to:

 -ffreestanding
           Assert that compilation takes place in a freestanding environment.
           This implies -fno-builtin. A freestanding environment is one in
           which the standard library may not exist, and program startup may
           not necessarily be at "main". The most obvious example is an OS
           kernel. This is equivalent to -fno-hosted.

 -nostdlib
           Do not use the standard system startup files or libraries when
           linking.  No startup files and only the libraries you specify will
           be passed to the linker, options specifying linkage of the system
           libraries, such as "-static-libgcc" or "-shared-libgcc", will be
           ignored.  The compiler may generate calls to "memcmp", "memset",
           "memcpy" and "memmove".  These entries are usually resolved by
           entries in libc.  These entry points should be supplied through
           some other mechanism when this option is specified.

           One of the standard libraries bypassed by -nostdlib and
           -nodefaultlibs is libgcc.a, a library of internal subroutines
           which GCC uses to overcome shortcomings of particular machines, or
           special needs for some languages.

           In most cases, you need libgcc.a even when you want to avoid other
           standard libraries.  In other words, when you specify -nostdlib or
           -nodefaultlibs you should usually specify -lgcc as well.  This
           ensures that you have no unresolved references to internal GCC
           library subroutines.  (For example, __main, used to ensure C++
           constructors will be called.)

These look like good suspects. -nostdlib is the real workhorse option, stripping the glibc cruft from our binary. -ffreestanding is less important, but it will suppress GCC complaining about our main function being non-standard at least.
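The man page excerpt above warns that GCC may still emit calls to memset, memcpy and friends even with -nostdlib. The NULL kernel is too small to trigger that, but as a rough sketch of what "supplied through some other mechanism" could look like, here are byte-wise versions. The k_ names are hypothetical and chosen so the code can be tried in an ordinary hosted program; a real kernel would have to export them under the standard names.

```c
#include <stddef.h> /* size_t; still available in a freestanding build */

/* Hypothetical helper: fill n bytes at dst with value (memset contract). */
void *k_memset(void *dst, int value, size_t n)
{
    unsigned char *d = dst;

    while (n--)
        *d++ = (unsigned char)value;
    return dst;
}

/* Hypothetical helper: copy n bytes between non-overlapping regions
 * (memcpy contract). */
void *k_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    while (n--)
        *d++ = *s++;
    return dst;
}
```

Both return dst, like their libc counterparts, so they can stand in wherever the compiler expects the usual behaviour.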

jack@sagan:$ gcc -o kernel -nostdlib -ffreestanding main.c
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000400144

We’ll handle that warning later. For now, let’s see what code it put out with objdump -d:

jack@sagan:$ objdump -d kernel
./kernel:     file format elf64-x86-64


Disassembly of section .text:

0000000000400144 <main>:
  400144:       55                      push   %rbp
  400145:       48 89 e5                mov    %rsp,%rbp
  400148:       eb fe                   jmp    400148 <main+0x4>

Excellent. Much more concise and understandable. main() is just setting up an empty stack frame and looping in place infinitely.

Linking

We have a number of problems with our current ELF output. The first, as ld told us above, is that it can’t find an entry symbol, so it guessed. The second is that the link address ld chose is an arbitrary default that doesn’t suit a kernel. And the third is that, if you use objdump -D (capital D) to dump all of the sections of the file, we still have two extraneous sections, .eh_frame and .comment, that are wasting space.

All three of these problems can be solved with a linker script, which will tell the linker, ld:

  1. What address the code should be linked at.
  2. What address the code should be loaded at.
  3. What symbol is the entry symbol.
  4. What sections should be kept and which discarded.

Let’s take a look at the linker script:


OUTPUT_FORMAT("elf64-x86-64")
ENTRY(main)
SECTIONS
{
    .text 0xFFFFFFFF80100000 : AT(0x100000)
    {
        *(.text)
    }
    .data :
    {
        *(.data)
    }
    .bss :
    {
        *(.bss)
    }
    /DISCARD/ :
    {
        *(.comment)
        *(.eh_frame)
    }
}

This script keeps the relevant sections (.text, which is code; .data, which is initialized data; and .bss, which is basically uninitialized data) by grouping them together. It discards the extra GCC sections (.comment and .eh_frame) by placing them in ld’s special “/DISCARD/” section. It also sets the output format to 64-bit x86 ELF, which is correct for our kernel to be loaded by GRUB, and sets the entry point to main().

Most importantly, it sets the link address for code to 0xFFFFFFFF80100000, but loads the code at physical address 0x100000 with the AT directive. If we omit this AT directive, GRUB will attempt to load to 0xFFFFFFFF80100000 physical, and unless you’ve got 16 million terabytes of memory in your VM it will complain about being out of memory and subsequently fail.
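The gap between the link address and the load address is a constant, so converting between the two is simple arithmetic. A minimal sketch, assuming the offset implied by the linker script above (0xFFFFFFFF80100000 minus 0x100000); the macro and function names are made up for illustration:

```c
#include <stdint.h>

/* Assumed constant: link address minus load address from linker.ld. */
#define KERNEL_VIRT_OFFSET 0xFFFFFFFF80000000ULL

/* Translate an address the kernel was linked at into where it is loaded. */
uint64_t virt_to_phys(uint64_t virt)
{
    return virt - KERNEL_VIRT_OFFSET;
}

/* Translate a load (physical) address back into its link address. */
uint64_t phys_to_virt(uint64_t phys)
{
    return phys + KERNEL_VIRT_OFFSET;
}
```

With these, virt_to_phys(0xFFFFFFFF80100000) gives 0x100000, the address named in the AT directive.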


Why are we linking at 0xFFFFFFFF80100000?

First, let’s just note that the 64-bit architecture only supports 48-bit virtual addresses, and the top 16 bits must be a sign extension of the 48th bit (bit 47). Because of this sign extension, there’s a massive hole of unaddressable memory between 0x00007FFFFFFFFFFF and 0xFFFF800000000000. We take advantage of this hole by using it to separate user addresses (0 to 128 TB) from kernel addresses (the top 128 TB, just below 16 exabytes). This gives both halves (user and kernel) plenty of space.
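The sign-extension rule can be written down as a small check. This is only a sketch, assuming the 48-bit addressing described above; the function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* An address is usable ("canonical") when bits 63..47 are all equal,
 * i.e. the top 16 bits are copies of bit 47. */
bool is_canonical(uint64_t addr)
{
    uint64_t top = addr >> 47; /* the 17 bits 63..47 */

    return top == 0 || top == 0x1FFFF;
}

/* The kernel half is the canonical range with the top bit set. */
bool is_kernel_half(uint64_t addr)
{
    return is_canonical(addr) && (addr >> 63) != 0;
}
```

Under that assumption, 0x00007FFFFFFFFFFF and 0xFFFF800000000000 are the two edges of the hole: both pass, and everything strictly between them fails.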

However, there is one more wrinkle. When object files are linked together, the linker has to fill in ‘relocations’: places where the compiler left room for an address it didn’t know yet. Consider loading a pointer like int *bar = &foo. Syntactically and logically that is sound. However, as part of optimizing for the 99% (non-kernel) use case, GCC assumes that your code is going to be linked at addresses between 0 and 2G, so it treats &foo as a four-byte value. When ld discovers at link time that the address is actually eight bytes (a 64-bit address), it throws an error complaining that the relocation has been truncated (i.e. the top four bytes would be discarded if this program were run).

GCC’s 0 to 2G assumption can be controlled with the -mcmodel flag. By default, it’s set to “small” (code in 0-2G), but there are also “large” (makes no assumption about addresses but generates more inefficient assembly by assuming all pointers and jumps are going to be anywhere in the 64 bit range), “medium” (a compromise between small and large) and, most importantly, “kernel” which was added so that the Linux kernel could have the assembly efficiency of “small” with the desired virtual address separation. The downside is that “kernel” assumes the code is in -2G to MAX addresses or 0xFFFFFFFF80000000+. So, to take advantage of this compromise between address restrictions and assembly efficiency, we link at 0xFFFFFFFF80100000 and specify -mcmodel=kernel on the GCC command line.
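To make the address ranges concrete, here is a rough sketch of the two assumptions as predicates, plus the property they share: in both ranges the address can be rebuilt from its low 32 bits by sign extension, which is why four bytes are enough. The function names are invented for this example and are not part of GCC or ld.

```c
#include <stdbool.h>
#include <stdint.h>

/* "small": everything lives in the low 2G. */
bool fits_small_model(uint64_t addr)
{
    return addr < 0x80000000ULL;
}

/* "kernel": everything lives in the top 2G (-2G to MAX). */
bool fits_kernel_model(uint64_t addr)
{
    return addr >= 0xFFFFFFFF80000000ULL;
}

/* True when keeping only the low 32 bits and sign-extending them
 * back to 64 bits reproduces the original address. */
bool survives_sign_extension(uint64_t addr)
{
    uint64_t low = addr & 0xFFFFFFFFULL;
    uint64_t extended = (low & 0x80000000ULL)
                            ? (low | 0xFFFFFFFF00000000ULL)
                            : low;

    return extended == addr;
}
```

Our link address 0xFFFFFFFF80100000 fails the small check but passes the kernel one, while an address such as 0xFFFF800000100000 passes neither and would need the full eight bytes.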


To use this linker script, we split the compilation process into two parts. First, the compilation of the C into object (.o) files. Then, the linking of the object files into an ELF binary with the linker script.

jack@sagan:$ gcc -nostdlib -ffreestanding -mcmodel=kernel -c main.c
jack@sagan:$ ld -T linker.ld -o kernel main.o

This now yields kernel, a 64-bit ELF file linked at 0xFFFFFFFF80100000 and ready to be loaded at 0x100000.

Unfortunately, on x86-64 hosts, this also generates an executable that’s positively massive (1 or 2M) compared to the amount of code we have. This is no good because it’s a waste of space and, worse, it pushes the actual sections of our code out of the 8k that GRUB is going to search for a magic header.

On x86-64 we can solve this by giving the -n flag to ld which tells it to not align the program sections at a huge offset.

The following produces a kernel under 1k on x86-64.

jack@sagan:$ ld -T linker.ld -n -o kernel main.o

GRUB Magic

If you tried to load the kernel at this point, GRUB would complain that the binary is missing a signature and you wouldn’t get any farther.

GRUB expects to find a known “Multiboot Header”. Section 3.1 of the Multiboot Specification describes what must be embedded into the binary for GRUB to recognize the ELF file as bootable.

We’ll be using more of the GRUB features when we want to take advantage of some of the values that it can give us (denoted in flags) but for right now we just want to make GRUB happy so we can load our kernel.

In short, just to boot, we need three 32-bit values:

type   offset  field
u32    0x0     magic value (0x1BADB002)
u32    0x4     flags (we'll set to 0x0 for now)
u32    0x8     checksum (magic + flags + checksum must equal 0 in 32-bit arithmetic)

And, just to make it easy on GRUB, the signature has to show up in the first 8192 bytes (8k) of the binary. Considering that our code is three instructions (six bytes) long, not counting the ELF header, we could place it anywhere, but let’s take advantage of our linker script to place the GRUB magic immediately after the ELF header.
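The checksum rule is plain 32-bit wraparound arithmetic. A minimal sketch of computing and checking it follows; the function names are hypothetical, and only the magic value and the sum-to-zero rule come from the description above:

```c
#include <stdbool.h>
#include <stdint.h>

#define MULTIBOOT_MAGIC 0x1BADB002u

/* The checksum is whatever makes magic + flags + checksum wrap to 0. */
uint32_t multiboot_checksum(uint32_t flags)
{
    return (uint32_t)0 - (MULTIBOOT_MAGIC + flags);
}

/* Check the three required fields the way the table describes them. */
bool multiboot_fields_valid(uint32_t magic, uint32_t flags, uint32_t checksum)
{
    return magic == MULTIBOOT_MAGIC &&
           (uint32_t)(magic + flags + checksum) == 0;
}
```

For flags of 0x0 this gives a checksum of 0xE4524FFE, since 0x1BADB002 + 0xE4524FFE is exactly 2^32.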

Specifying the GRUB Signature

Using the above information and some basic information about default types on 64-bit (i.e. that unsigned int is 32-bit) we can easily create a struct to contain the information.

struct grub_signature {
    unsigned int magic;
    unsigned int flags;
    unsigned int checksum;
};

#define GRUB_MAGIC 0x1BADB002
#define GRUB_FLAGS 0x0
#define GRUB_CHECKSUM (-1 * (GRUB_MAGIC + GRUB_FLAGS))

struct grub_signature gs =
    { GRUB_MAGIC, GRUB_FLAGS, GRUB_CHECKSUM };

But now we have to ensure that the signature shows up in the first 8k of the file so GRUB can find it.

Considering the kernel is less than 1k, that’s already done and this kernel will boot. But eventually the kernel will be far larger than 8k, so we can’t rely on it.
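To see why placement matters, here is a sketch of the kind of search a Multiboot loader has to do. This is not GRUB’s actual code; it only assumes the rules from the Multiboot Specification that the header sits entirely within the first 8192 bytes and is 32-bit aligned. All names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

#define MULTIBOOT_MAGIC       0x1BADB002u
#define MULTIBOOT_SEARCH      8192u /* only this much of the file is scanned */
#define MULTIBOOT_HEADER_SIZE 12u   /* magic + flags + checksum */

/* Read a little-endian 32-bit value regardless of host byte order. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Return the file offset of the first valid header, or -1 if none
 * is found inside the search window. */
long find_multiboot_header(const uint8_t *image, size_t len)
{
    size_t limit = len < MULTIBOOT_SEARCH ? len : MULTIBOOT_SEARCH;

    for (size_t off = 0; off + MULTIBOOT_HEADER_SIZE <= limit; off += 4) {
        uint32_t magic = read_le32(image + off);
        uint32_t flags = read_le32(image + off + 4);
        uint32_t checksum = read_le32(image + off + 8);

        if (magic == MULTIBOOT_MAGIC &&
            (uint32_t)(magic + flags + checksum) == 0)
            return (long)off;
    }
    return -1;
}
```

A header that starts at offset 8192 or later is simply never seen, which is the failure we want to rule out before the kernel grows.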

The easy way to accomplish this is to split the GRUB signature into a separate file (grub.c) and make sure that that file’s object code (grub.o) is the first file linked into the kernel by making sure it’s the first object argument to ld. However, that seems too fragile since it’s based on the build system that we haven’t even touched yet.

In my opinion, we need to enforce that the GRUB signature is the first thing in the binary. To that end, let’s add a new section to the linker script and tell GCC to put our grub_signature struct gs into it.

First, the modifications to linker.ld:


...
SECTIONS
{
    .grub_sig 0xFFFFFFFF80100000 : AT(0x100000)
    {
        *(.grub_sig)
    }
    .text :
    {
        *(.text)
    }
...

The grub_sig section is now the very first thing in our binary after the ELF header.

Now, let’s use GCC’s __attribute__ directive to put the signature in that section by changing the definition of gs

struct grub_signature gs __attribute__ ((section (".grub_sig"))) =
        { GRUB_MAGIC, GRUB_FLAGS, GRUB_CHECKSUM };

Great. After a recompile, we can look again at the output of objdump -D and make sure that worked:


jack@sagan:$ objdump -D kernel

kernel:     file format elf64-x86-64

Disassembly of section .grub_sig:

ffffffff80100000 <gs>:
ffffffff80100000:       02 b0 ad 1b 00 00       add    0x1bad(%rax),%dh
ffffffff80100006:       00 00                   add    %al,(%rax)
ffffffff80100008:       fe 4f 52                decb   0x52(%rdi)
ffffffff8010000b:       e4                      .byte 0xe4

Disassembly of section .text:

ffffffff8010000c <main>:
ffffffff8010000c:       55                      push   %rbp
ffffffff8010000d:       48 89 e5                mov    %rsp,%rbp
ffffffff80100010:       eb fe                   jmp    ffffffff80100010 <main+0x4>

Looks correct: the .grub_sig section is ahead of .text as the first thing in the binary after the ELF header.

Booting

Now, all that’s left is to give it a try. Copy your kernel onto the first partition of your disk (instructions on mounting from the disk image here).

jack@sagan:$ sudo mount loop0 /mnt/os_boot
jack@sagan:$ sudo cp kernel /mnt/os_boot/
jack@sagan:$ sync

After the sync completes (it should be nearly instant unless you’ve got a bunch of other IO going), you can fire up QEMU.

jack@sagan:$ qemu -hda disk.img -m 1024

Which will quickly drop you at the GRUB prompt.

grub> multiboot (hd0,msdos1)/kernel
grub> boot

And if no errors are printed, the kernel is running.

Double Checking

I wouldn’t be much of a hacker if I thought that no output and no confirmation means everything is okay. Let’s check and make sure that everything looks good.

If this was a real machine, we’d be in a hurry to get output to the screen, or flashing LEDs, or we’d be breaking out hardware debuggers to analyze the chip state in the worst case. Fortunately, using QEMU, you can use GDB on your kernel like any other piece of software. We’ll get into more detail later, but for now let’s just see if the machine is looping.

First, make sure you have GDB installed. QEMU won’t complain if you don’t.

Second, (re)start QEMU with the -s option that tells QEMU to start a gdbserver for your system on TCP port 1234. If you wanted to use breakpoints or walk through GRUB you could also pass it -S which will keep the CPU from starting until you’ve engaged GDB and issued a ‘continue’.

jack@sagan:$ qemu -hda disk.img -m 1024 -s

Now simply fire up GDB from another terminal and give it a remote target:

jack@sagan:~ $ gdb
...
(gdb) target remote tcp::1234
Remote debugging using tcp::1234
0x00008376 in ?? ()
(gdb) c
Continuing.

[ Booted with GRUB to get into our code ]

^C
Program received signal SIGINT, Interrupt.
0x00100010 in ?? ()
(gdb) info registers
eax            0x2badb001       732803073
ecx            0x0      0
edx            0x0      0
ebx            0x10000  65536
esp            0x7fefc  0x7fefc
ebp            0x7fefc  0x7fefc
esi            0x0      0
edi            0x0      0
eip            0x100010 0x100010
eflags         0x200002 [ ID ]
cs             0x10     16
ss             0x18     24
ds             0x18     24
es             0x18     24
fs             0x18     24
gs             0x18     24

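This output confirms that we’ve booted our kernel. The dead giveaway is that the current instruction (listed when I hit ^C, and also in the eip register) matches the load address of our jmp instruction: 0x100010 is the physical counterpart of the link address 0xffffffff80100010 that objdump showed. As a bonus, eax holds 0x2badb001, one less than the 0x2BADB002 magic that a Multiboot loader passes to the kernel in eax. The CPU is still in 32-bit mode at this point, where the 0x48 byte at the start of our mov decodes as dec %eax.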

Packaging it Up

It’s extremely tedious to have to hand compile this over and over again. I’ve included my source in my git repo with main.c, linker.ld, and a Makefile.

You can browse the ‘the-null-kernel’ tag here.

Alternatively you can clone the git repo with the files and history:

jack@sagan:$ git clone http://codezen.org/src/viridis.git

2 thoughts on “The NULL kernel

  1. Thank you for your contribution, this really simplifies the simple operating system development! It is much more sensible to use GRUB, and I also made a simple grub.cfg, leaving it for future newbies like me:

    menuentry "My Operating System" {
    multiboot (hd0,msdos1)/kernel
    boot
    }

    Do not forget to change hd0,msdos1 to wherever your kernel is. Write it to a file named grub.cfg under the grub directory and voila!
