Page Allocation and Basic VGA

In this article, we’ll build on our previous work to tackle one of the core duties of a kernel: memory management.

Concepts

Page Allocation

In our last article, we jumped through all of the hoops to get our kernel into a 64-bit C environment. Now we can start looking toward writing drivers, and the most important driver to get started on is the one for the device we’ve already been using: memory.

Last time we wrangled the concept of paging just enough to get into the kernel, and I’ve since reimplemented most of that basic setup in C.

But just handling paging doesn’t mean we’re done with memory. Now, we need to be able to write systems that allocate and free memory without caring about the underlying mechanisms and, most importantly, without giving out memory that’s reserved or already used.

That’s where the page allocator comes in. It’s the only allocator that’s directly aware of physical memory, and it works at the coarse granularity given to us by the hardware: the page allocator knows which sections of memory are reserved, which are already used, and which can still be handed out.

This is not the general-purpose, finer-grained allocator (the one that would implement kmalloc), but that system will eventually be one of the primary users of the page allocator.

The Buddy Allocator

The main concept of this article is the “buddy allocator”, a specific type of allocator designed to let us track large amounts of memory with comparatively little overhead. This is the approach used by the Linux kernel, whose implementation can be found in mm/page_alloc.c of the Linux source, and Viridis structures its version similarly.

The idea of a buddy allocator is that you have multiple linked lists of memory blocks, each easily searched, and each list only contains blocks of a certain size. In Linux and Viridis, the top size is 8M and, since a 4k page is the smallest unit we allocate, 4k is the bottom size, which means we have 12 linked lists:

/*  0 - 4k
 *  1 - 8k
 *  2 - 16k
 *  3 - 32k
 *  4 - 64k
 *  5 - 128k
 *  6 - 256k
 *  7 - 512k
 *  8 - 1M
 *  9 - 2M
 * 10 - 4M
 * 11 - 8M
 */

#define MAX_ORDER 11

struct page_block *free_pages[MAX_ORDER + 1] = { 0 };
struct page_block *used_pages[MAX_ORDER + 1] = { 0 };
struct page_block *unused_page_blocks = NULL;

Each list holds entries of a struct that looks like this:

struct page_block {
    u64 address;
    struct page_block *next;
};

An 8M top block size might not seem like a lot, but it’s bigger than most allocations are going to be, and a 1GB block of memory divides into only 128 8M blocks, so it’s a good compromise between usefulness and resource usage.

When we set up the buddy allocator, we divide the given memory into these chunks, but for simplicity’s sake, let’s say we’re booting in an embedded environment and 8MB is actually our total amount of memory. Easy: that’s represented by one big 8MB chunk, with an address of 0 and no next. All of the other lists are empty.

Then, someone comes along and uses the page allocator to grab a single 4k page.

That 8M block is then divided up. 8M gets split into 4M and 4M. We still can’t get 4k out of that, so one of those 4M chunks gets split, and so on, until that 8M is split into 4M, 2M, 1M, 512k, 256k, 128k, 64k, 32k, 16k, 8k, 4k, 4k. One of those 4k chunks is moved to the allocated list of the right size (used_pages[0]) before getting handed off to the requester.

Now, the cool part. Those two 4k blocks we created are “buddies” because they’re contiguous and their addresses differ by exactly one bit: the bit that represents the size of the block. These two blocks are 4k in size, 4k in bytes is 0x1000, and that single bit in 0x1000 is the bit that will differ between them. 0x2000 and 0x3000 are buddies (they differ only by the 0x1000 bit). 0x3000 and 0x4000 are not buddies even though they’re contiguous, because their addresses differ by more than one bit (the 0x4000, 0x2000, and 0x1000 bits are all different).

This might seem a bit confusing, but it’s actually extremely easy to calculate an address’s buddy of a certain size using XOR.

4k Buddies
0x2000 ^ 0x1000 = 0x3000
0x3000 ^ 0x1000 = 0x2000

Not 4k Buddies
0x3000 ^ 0x1000 = 0x2000
0x4000 ^ 0x1000 = 0x5000
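
In code, that calculation is a one-liner. Here’s a minimal sketch, assuming PAGE_SHIFT is 12 and “order” is the list index from the table above (the helper name is just for illustration):

static inline u64 buddy_of(u64 address, u32 order)
{
    /* Flip the single bit that corresponds to this block size */
    return address ^ (1UL << (order + PAGE_SHIFT));
}

So buddy_of(0x2000, 0) is 0x3000 and buddy_of(0x3000, 0) is 0x2000, while buddy_of(0x4000, 0) is 0x5000, which is why 0x3000 and 0x4000 aren’t buddies even though they’re adjacent.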

The reason this is useful is that when we free pages, we know the address, and we can find the size (since we have to look up the freed address in the used lists), so we can calculate the allocation’s buddy. If that buddy is also free, we can merge the two blocks into one bigger block.

So if our free lists look like 4M, 2M, 1M, 512k, 256k, 128k, 64k, 32k, 16k, 8k, 4k and that 4k page’s buddy gets freed, it will merge back into an 8k block, whose buddy is also free so it merges into a 16k block, whose buddy is free… and so on, until you have your pristine 8M block back.

Advantages of the Buddy Allocator

This system has a lot of advantages, especially if you plan on having heavy usage.

  1. It’s really fast to search for memory to allocate. Finding a block of free memory (or realizing you don’t have enough memory to satisfy the request) takes at most MAX_ORDER comparisons, so it runs in constant time no matter how much or how little memory you’re tracking, whereas a simple linked list or a bitmap would require an exhaustive linear search. Don’t underestimate this feature: eventually we may be tracking and fragmenting terabytes of memory, and allocating in linear time instead of constant time would add significant overhead, especially in the failure case.
  2. It’s a great compromise on memory usage. A bitmap would be the most memory-efficient fully functional allocator: tracking 1GB of memory with a bit per 4k page takes a constant 262,144 bits, or 32,768 bytes, so 32k of overhead per gigabyte. The buddy allocator, under the absolute worst case fragmentation (i.e. all of memory allocated in 4k chunks), costs 4MB of overhead per gigabyte (128 times as much). Best case, however, with 1GB totally unallocated, it costs only 2k of overhead (128 8MB chunks at 16 bytes apiece), or 1/16th the overhead of the bitmap. In short, with real use cases (which will not approach maximum fragmentation unless we’re stupid about it) we get roughly comparable memory usage with a much faster data structure; the arithmetic is sketched just after this list. A simple singly-linked list would have similar memory usage (or even slightly better if you didn’t restrict block sizes to convenient powers of 2).
  3. It’s dead simple. This advantage can’t be overstated in a fundamental kernel data structure. Any CS 101 student is familiar with how to manipulate singly-linked lists. The data structure used to represent free memory is the same one used to represent used memory (a failing of a bitmap). There’s no awkward bit masking and manipulation or weird SSE optimization as there would be for a bitmap, and the buddy system means the logic to merge blocks is exceedingly simple – checking for the existence of a single known address at a single known block size, rather than having to search for essentially any adjacent free block.
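
To make point 2 concrete, here’s a quick sanity check of those numbers, assuming 4k pages and the 16-byte page_block struct from above (a throwaway hosted program, not kernel code):

#include <stdio.h>

int main(void)
{
    unsigned long gig = 1UL << 30;
    unsigned long pages = gig >> 12;                         /* 262,144 4k pages */

    printf("bitmap:       %lu bytes\n", pages / 8);          /* 32k */
    printf("buddy, worst: %lu bytes\n", pages * 16);         /* 4M  */
    printf("buddy, best:  %lu bytes\n", (gig >> 23) * 16);   /* 128 8M blocks -> 2k */

    return 0;
}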

Quirks of My Implementation

Now, I’ve made some choices that compromise some of the above, but they’re not inherent to the algorithm. Crucially, both of these quirks could be revisited at a future time and changed without altering the base algorithm we use.

  1. I sort the lists. While true constant time allocation is possible with this algorithm, I’ve decided to keep the blocks sorted by address (a sketch of the sorted insert follows this list). This turns the split / re-sort operation into a linear one, because you have to search for the correct place to patch in the block instead of just prepending it. Strictly speaking this makes allocation linear as well, but in practice it’s linear over a much smaller subset of blocks, and it’s still constant time when blocks of the correct size already exist. On the other hand, sorting makes searching take less time, so freeing pages and looking for blocks to merge gets a small boost. The real reason I sort, however, is that it makes it easy to test and to examine the data in a debugger to verify that the allocator’s memory map looks right.
  2. I assume worst case fragmentation. Obviously, the kernel needs this data structure to be positively bulletproof, and to make that true while retaining simplicity, I’ve made the setup code reserve enough memory to hold a page_block struct for every single page in the system. That means we always pay the worst case of 4M of overhead per 1G of memory, roughly 0.4% of the total, but it also means we never need to complicate the page allocator by making it call itself.
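
Since quirk 1 leans on keeping the lists sorted, here’s roughly what the sorted insert looks like. This is a sketch of what __page_list_add does in spirit, not the exact repo code:

static void __page_list_add(struct page_block **head, struct page_block *new)
{
    struct page_block *cur, *prev = NULL;

    /* Walk to the first entry with a larger address */
    for (cur = *head; cur; prev = cur, cur = cur->next)
        if (cur->address > new->address)
            break;

    /* Splice in, either mid-list or at the head */
    new->next = cur;
    if (prev)
        prev->next = new;
    else
        *head = new;
}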

Virtual Address Allocator

The buddy allocator isn’t the only allocator the kernel needs, even though it forms the backbone of the page allocator.

Remember from our previous work that, with paging enabled, there’s a difference between a physical memory address and the virtual address used to access it. The page allocator controls how physical memory is handed out, but we still need to allocate the virtual addresses so we don’t overwrite someone else’s mapping.

The good news is that we already know the size of the virtual address space we’re dividing up. It’s architecture specific, but constant (in our case we’ll just assume it’s a full 64 bits), so we don’t really need to care about what is “free”, only what we have already allocated. We also get a totally fresh virtual address space for every context (process / thread) we create, so we don’t have to worry too much about performance (the number of virtual allocation ranges per context is likely to be small), but obviously we can’t waste too much memory if we’re potentially going to have thousands of copies of it.

The bad news is that we’re going to want to know a lot more about these address allocations. We’re going to want to know not just the address and size, but also what is expected to be there if the kernel loaded it (like which file’s data it holds), or which driver requested the mapping. This is useful not only for debugging, but also later when we look at something like swap memory or implementing an mmap() style syscall.

Fortunately, for now, we can afford to effectively stub this out. Our implementation is a simple linked list that merges adjacent allocations, but it doesn’t do anything smart with the name and, like our page allocator, it assumes a constant amount of pre-allocated overhead – in this case one page, which will probably be enough until we start forking processes and threads, at which point we’ll need to revamp the virtual address allocator to be smarter anyway.

Early Page Allocator Implementation

The code for the page allocator is browsable in git. I won’t cover the entirety of it (it’s almost 1000 lines with comments), but I will highlight a few of the pieces.

Also note that I wrote some very basic mapping functions that essentially implement in C what we did in assembly to get to long mode: take a physical address and a virtual address, walk the page tables, fill in the entry, and reset CR3. I don’t think these need to be covered directly, except to say that the “early” variants expect to be pointed at the beginning of some number of blank pages and update that pointer as necessary, whereas the standard variants call out to the page allocator to get pages when needed.
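
To give a feel for what those early variants do, here’s a rough sketch of an early mapper, assuming 4k pages, 4-level paging, and page tables that are reachable at their physical addresses (true for our early identity-mapped setup). The helper and its name are illustrative; the real map_page_early is in the repo:

#define PTE_P   0x1UL   /* present */
#define PTE_W   0x2UL   /* writable */

/* If the entry at index is empty, consume one pre-zeroed page from
 * *blank_pages to back it, then return the table it points to. */
static u64 *early_table(u64 *table, u32 index, u64 *blank_pages)
{
    if (!(table[index] & PTE_P)) {
        table[index] = *blank_pages | PTE_P | PTE_W;
        *blank_pages += PAGE_SIZE;
    }

    return (u64 *) (table[index] & ~0xFFFUL);
}

void map_page_early_sketch(u64 virtual, u64 physical, u64 *blank_pages)
{
    u64 cr3;
    u64 *pml4, *pdpt, *pd, *pt;

    asm volatile("mov %%cr3, %0" : "=r" (cr3));

    pml4 = (u64 *) (cr3 & ~0xFFFUL);
    pdpt = early_table(pml4, (virtual >> 39) & 0x1FF, blank_pages);
    pd   = early_table(pdpt, (virtual >> 30) & 0x1FF, blank_pages);
    pt   = early_table(pd,   (virtual >> 21) & 0x1FF, blank_pages);

    pt[(virtual >> 12) & 0x1FF] = physical | PTE_P | PTE_W;

    /* Reload CR3 to flush the stale translation */
    asm volatile("mov %0, %%cr3" : : "r" (cr3));
}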

The Chicken and the Egg

The trickiest part of initializing the page allocator is getting it to account not only for the memory present in the system, but also for everything you’ve already done, as well as its own overhead, all before the allocator itself is set up.

There are three sources for this information.

For the system memory, we get a pretty good memory map from GRUB, which was passed to main() in EDI.

For the kernel memory, we still have the linker trick (kernel_size) to know the size of the binary, but we also passed ESI from head.asm which contained the computed end of the structures we “allocated” after the binary just to get into long mode.

The final piece is how much overhead the page allocator needs to allow for itself, which is a function of how much system memory there is. As I mentioned in the concepts section, I’ve chosen to just prepare for maximum fragmentation to make the page allocator code itself bulletproof (i.e. the page allocator may tell you it has no memory left, but it will never run out of memory itself).

First, we’ll figure out exactly how much memory there is, which gives us a size for how much memory we need to reserve to set up the page allocator.

    max_phys_address = 0;
    for (i = 0; i < info->mmap_length; i += cur->size + 4) {
        cur = (struct grub_mmap *) (((long) mmap) + i);
        end = cur->base + cur->length;

        if (end > max_phys_address)
            max_phys_address = end;

        total_mem += cur->length;
    }

    /* max_page_blocks describes how many page_block structs we'd have if the
     * entirety of physical memory was allocated in 4k chunks (maximum
     * fragmentation).
     */

    max_pages = max_phys_address >> PAGE_SHIFT;

    /* page_block_pages describes how many pages we need to reserve if we are
     * going to store max_page_blocks.
     */

    page_block_pages = ((max_pages * sizeof(struct page_block)) >> PAGE_SHIFT) + 1;

You can see here that I’m traversing the GRUB-provided memory map to find both the total amount of memory and the largest physical address. We use the largest address instead of the total memory to compute the maximum overhead because we will have to mark reserved or non-existent chunks of memory as “allocated” even though they can’t be handed out.

After we know exactly how many pages we’ll need for the allocator in the worst case, we traverse the GRUB memory map again to find a piece of usable memory big enough to hold all of those pages. We have to be careful to avoid space that our kernel is already using, as well as to account for how many pages it will take to actually map the pages the page allocator needs (sheesh!).

    alloc_address = MAXLONG;
    stack_size = S_PAGES << PAGE_SHIFT;

    for (i = 0; i < info->mmap_length; i += cur->size + 4) {
        cur = (struct grub_mmap *) (((long) mmap) + i);

        /* We don't care about reserved memory */

        if (cur->type != 1)
            continue;

        /* If this is the block our binary is in, then adjust its
         * base and length to avoid the binary and the page structures
         * we allocated immediately afterward.
         */

        /* Assume that GRUB put us in a contiguous block of memory
         * that started before or at KERNEL_START and ended after
         * our page structures. If this isn't true then we shouldn't
         * have gotten this far.
         */

        if (cur->base <= KERNEL_START && ((cur->base + cur->length) >= end_structures)) {
            base = end_structures;
            length = cur->length - (end_structures - KERNEL_START);
        }
        else if (cur->base <= STACK_PAGES_PHYS &&
                ((cur->base + cur->length) >= (STACK_PAGES_PHYS + stack_size))) {
            base = STACK_PAGES_PHYS + stack_size;
            length = cur->length - stack_size;
        }
        else {
            base = cur->base;
            length = cur->length;
        }

        /* Round to page size, adjust length accordingly */
        length -= PAGE_ALIGN_OFF(base);
        base = PAGE_ALIGN(base);

        /* If we allocated at this address, find out how many page structure
         * pages we'd need to add to get it all allocated*/

        overhead = __overhead_pages(base, page_block_pages);
        alloc_size = ((page_block_pages + overhead) * PAGE_SIZE);

        /* Too short, skip to the next block*/
        if (length < alloc_size)
            continue;

        /* We're page-aligned, non clobbering and large enough, done! */
        alloc_address = base;
        break;
    }

    /* We couldn't find a large enough block, we're fucked. */

    if(alloc_address == MAXLONG)
        return -1;

    /* big_alloc is now a physical pointer to a contiguous piece of
     * memory that can hold enough page_block structs to control all of our
     * pages at maximum fragmentation, as well as the page structures needed to
     * map them
     *
     * We also know we need `overhead` pages of structures (at most).
     */

Pretty straightforward. One note: the __overhead_pages function (line 44 of the source) accounts for exactly how many pages of page table structures we’d need in order to map a number of pages at a certain address, based on how many page structure boundaries the range crosses. Mapping 2 pages, for example, could take between 3 and 6 overhead pages for the tables depending on the address at which those two pages are mapped, assuming the page table is completely empty except for the PML4 page we have to already have.
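
Under those assumptions, the count is just how many page tables, page directories, and PDPTs the range touches. The real __overhead_pages is in the repo; a hedged sketch of the same calculation looks like this:

/* How many distinct regions of size (1 << shift) does [start, end] span? */
static u64 __span(u64 start, u64 end, u32 shift)
{
    return (end >> shift) - (start >> shift) + 1;
}

u64 overhead_pages_sketch(u64 virtual, u64 pages)
{
    u64 end;

    if (!pages)
        return 0;

    end = virtual + (pages << PAGE_SHIFT) - 1;

    return __span(virtual, end, 21)      /* page tables, one per 2M   */
         + __span(virtual, end, 30)      /* page dirs,   one per 1G   */
         + __span(virtual, end, 39);     /* PDPTs,       one per 512G */
}

Mapping two pages inside a single 2M region gives 1 + 1 + 1 = 3; two pages straddling a 512G boundary (and therefore also 1G and 2M boundaries) gives 2 + 2 + 2 = 6.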

So at the end of this bit of code, we either screwed up and returned -1, or we have an address that should be able to fit all of our page allocator overhead into it.

Then we move on: we point free_pages_address at our memory block, which begins with the page table overhead, and we point unused_page_blocks at the memory after that overhead, the memory we’ll actually be using. __init_unused_page_blocks then initializes that memory as one giant singly linked list of page_block structures for the page allocator to use.

    unused_page_blocks = (struct page_block *) (alloc_address + (overhead << PAGE_SHIFT));

    free_pages_address = alloc_address;

    for (i = 0; i < page_block_pages; i++) {
        physical = (u64) unused_page_blocks + (i * PAGE_SIZE);
        virtual = (VIRT_BASE | physical);
        map_page_early(virtual, physical, &free_pages_address);
        __clear_page((u64 *) virtual);
    }

    /* Convert to a virtual pointer */
    unused_page_blocks = (struct page_block *) (VIRT_BASE | ((long) unused_page_blocks));

    __init_unused_page_blocks(max_pages);

#ifdef TEST_PAGE_ALLOC
    test_page_alloc();
#endif
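
__init_unused_page_blocks isn’t quoted above, but given that description, a minimal version is just a loop that threads the structs together (a sketch, not the repo code):

static void __init_unused_page_blocks(u64 max_pages)
{
    u64 i;

    /* One page_block struct was reserved per physical page; chain them
     * all into one big list for the pop/push helpers to draw from. */
    for (i = 0; i < (max_pages - 1); i++)
        unused_page_blocks[i].next = &unused_page_blocks[i + 1];

    unused_page_blocks[max_pages - 1].next = NULL;
}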

Since the page allocator only needs unused_page_blocks populated in order to function (in addition to the built-in variables, of course), we can now tell it what our memory map looks like.

    /* Now put that reserved memory to use */
    __init_free_region(0, max_pages);

    /* Now iterate once more on the memmap and alloc all the unavailable pages */

    for (i = 0; i < info->mmap_length; i += cur->size + 4) {
        cur = (struct grub_mmap *) (((long) mmap) + i);

        if (cur->type == 1)
            continue;

        /* Partial pages are useless to us, so -expand- bad sections by
         * page aligning base and rounding the length up to a multiple
         * of page size
         */

        base = PAGE_ALIGN(cur->base);
        length = cur->length + PAGE_ALIGN_OFF(cur->base);
        if(!PAGE_ALIGNED(length))
            length = PAGE_ALIGN(length) + PAGE_SIZE;

        /* Mark the physical addresses as allocated */

        __reserve_region(base, length >> PAGE_SHIFT);
    }

    /* Finally, mark our own pages as allocated */

    kernel_size = (end_structures - KERNEL_START);
    stack_size = (S_PAGES << PAGE_SHIFT);

    __reserve_region((u64) info_phys, 1);
    __reserve_region(info->mmap_addr, 1);
    __reserve_region(STACK_PAGES_PHYS, S_PAGES);
    __reserve_region(KERNEL_START, kernel_size >> PAGE_SHIFT);
    __reserve_region(alloc_address, alloc_size >> PAGE_SHIFT);

We start by assuming all of system memory from 0 to max_pages is free. Then we iterate over the GRUB memory map a third and final time, reserving each of the unusable sections. Finally, we reserve the structures GRUB gave us, our stack, the entire space of memory used by the kernel including the early paging structures (up to end_structures), and the memory we just set aside for the page allocator itself.

I won’t spend too much time quoting code, but the two functions that are integral here follow.

Page Allocator Init Functions

/* __init_free_region will place a number of page_block structs on free_pages
 * without doing any form of checking, and without removing from used_pages.
 */

static void __init_free_region(u64 address, u64 pages)
{
    struct page_block *new;
    u32 order, size;

    for(order = MAX_ORDER; pages && order >= 0; order--) {

        // size in pages
        size = 1 << order;

        while(pages >= size) {
            new = __pop_unused_page_block();

            new->address = address;
            new->next = NULL;
            __page_list_add(&free_pages[order], new);

            address += (size << PAGE_SHIFT);
            pages -= size;
        }
    }
}

Here you can see that we’re using the unused-block linked list like a stack: we just pop one off the front when we need it and push one back when we discard (elsewhere, on free). Note that this is not how we actually free memory; this path does no checking or merging and is only used during init, when it’s assumed that you’re not relying on those features.
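
Those pop and push helpers aren’t quoted in the article; a minimal version consistent with how they’re used here would be:

static struct page_block *__pop_unused_page_block(void)
{
    struct page_block *ret = unused_page_blocks;

    /* Take the head of the unused stack */
    unused_page_blocks = ret->next;
    ret->next = NULL;

    return ret;
}

static void __push_unused_page_block(struct page_block *block)
{
    /* Return a discarded block to the head of the unused stack */
    block->next = unused_page_blocks;
    unused_page_blocks = block;
}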

void __reserve_region(u64 address, u64 pages)
{
    struct page_block *cur = NULL;
    struct page_block *prev;
    u64 last_pages;

    if(PAGE_ALIGN_OFF(address))
        pages++;

    address = PAGE_ALIGN(address);

    int order, size_pages, size, num_free;

    while (pages > 0) {
        last_pages = pages;
        for(order = MAX_ORDER; order >= 0; order--) {
            size_pages = 1 << order;
            size = size_pages << PAGE_SHIFT;

            /* First, find the free block that contains address. */

            prev = NULL;
            for (cur = free_pages[order]; cur; prev = cur, cur = cur->next) {
                if (cur->address <= address && (cur->address + size) > address)
                    break;
                if (cur->address > address) {
                    cur = NULL;
                    break;
                }
            }

            /* Didn't find relevant chunk */
            if(!cur)
                continue;

            /* Okay, we've found the block, let's break it up */

            /* First, patch it out of the current order list */

            __setprev(&free_pages[order], prev, cur->next);
            cur->next = NULL;

            /* If our address is somewhere inside the block, free
             * all of the memory before it and shrink sizes.
             */

            if (address != cur->address) {
                num_free = (address - cur->address) >> PAGE_SHIFT;
                __init_free_region(cur->address, num_free);

                size_pages -= num_free;
                size = (size_pages << PAGE_SHIFT);
            }

            /* If the remaining block is bigger or equal to what we're freeing,
             * then just discard this page_block and start again with the
             * remaining free block */

            if ( pages >= size_pages) {
                __push_unused_page_block(cur);
                pages -= size_pages;
                address += size;
                break;
            }

            /* Otherwise, we know we have pages at the end to free as well */

            __init_free_region(address + (pages << PAGE_SHIFT), size_pages - pages);

            /* Finally, discard cur */

            __push_unused_page_block(cur);
            return;
        }

        /* If we searched everywhere for this address but didn't find it,
         * then go ahead and advance. This can happen when you have overlapping
         * reserves. For example, trying to reserve GRUB info that's already
         * been reserved in the mmap.
         */

        if (last_pages == pages) {
            pages--;
            address += PAGE_SIZE;
        }
    }
}

This provides a better example of how the order lists work. We search each free block list from 8M down to 4k and, when we find a block that contains the address we’re trying to reserve, we split that block and then advance to work on a smaller region.
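
One helper worth calling out is __setprev, which shows up throughout this code. It isn’t quoted here, but its only job is to patch an element out of (or into) a singly linked list by updating whatever pointed at it; a minimal version would be:

static void __setprev(struct page_block **head, struct page_block *prev,
                      struct page_block *next)
{
    /* Whatever pointed at the removed element (a previous node or the
     * list head itself) now points at next */
    if (prev)
        prev->next = next;
    else
        *head = next;
}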

Page Allocator Use Functions

At this stage we have all of the system’s physical memory tracked, which is the hard part. Here are the actual core allocate and free functions; note that these are still entirely based on physical addresses.

void __free_phys(struct page_block *phys, u32 order)
{
	struct page_block *cur, *prev;
	u64 buddy_address;
	u32 merge;

    phys->next = NULL;

	if (free_pages[order] == NULL) {
		free_pages[order] = phys;
		return;
	}

	prev = NULL;

	/* For orders under MAX_ORDER, attempt to find buddies, and merge them
	 * up as far as possible. */

	for (order = order; order < MAX_ORDER; order++) {
		buddy_address = (phys->address ^ (1 << (order + PAGE_SHIFT)));
		merge = 0;

		/* Start from the head of this order's list */
		prev = NULL;

		for (cur = free_pages[order]; cur; prev = cur, cur = cur->next) {

			if (cur->address > phys->address) {

				/* Found a buddy that's bigger than us, so we have the valid
				 * address, patch cur out of the current order, break to try
				 * and "free" phys on the next higher order */

				if (cur->address == buddy_address) {
					__setprev(&free_pages[order], prev, cur->next);
					cur->next = NULL;

					__push_unused_page_block(cur);
					merge = 1;
					break;

				/* Not a buddy, just patch phys into the order. */
				} else {
					__setprev(&free_pages[order], prev, phys);
					phys->next = cur;
					return;
				}
			}

			/* If address is smaller, we only care if it's a buddy, and
			 * if it is, it has the valid address*/

			else if(cur->address == buddy_address) {
				__setprev(&free_pages[order], prev, cur->next);
				phys->address = cur->address;
				__push_unused_page_block(cur);
				merge = 1;
				break;
			}
		}

		/* If merge is set, we need to advance to the next highest order
		 * to attempt to integrate phys into it.
		 */
		if(merge)
			continue;

		/* If merge is unset, it means that we didn't find a buddy, but we
		 * made it to the end of the order list without finding a greater
		 * page block address, so just tack it on.
		 */
		if (cur)
			cur->next = phys;
		else
			free_pages[order] = phys;
		return;
	}

	/* If we get here, it means that we're at MAX_ORDER, no merging
	 * is possible, so just insert it with __page_list_add
	 */

	__page_list_add(&free_pages[order], phys);
}

/* Get the physical address of a block of (1 << order) pages
 *
 * returns 0 if order was insane, or if no block of that size
 * was available
 */

u64 page_alloc_phys(u32 order)
{
	struct page_block *new;
	u64 ret, end, left_over_pages, pages;
	int i,j;

	if (order > MAX_ORDER)
		return 0;

	for (i = order; i <= MAX_ORDER; i++) {
		if (free_pages[i] == NULL)
			continue;

		new = free_pages[i];
		ret = new->address;
		free_pages[i] = new->next;

		__page_list_add(&used_pages[order], new);

		/* If it's just the right size, we're done */

		if (i == order)
			return ret;

		/* Otherwise, we need to split the block up */

		/* Example, 1M order, being fulfilled by an 8M block The 8M block needs
		 * to be broken in 1M, 1M, 2M, 4M blocks.
		 */

		left_over_pages = (1 << i) - (1 << order);
		end = ret + (1 << (i + PAGE_SHIFT));

		for(j = i - 1; j >= 0; j--) {
			pages = (1 << j);
			while (left_over_pages >= pages) {
				new = __pop_unused_page_block();
				new->address = end - (pages << PAGE_SHIFT);
				end = new->address;
				__free_phys(new, j);

				left_over_pages -= pages;
			}
		}

		return ret;
	}

	return 0;
}

/* Counterpart to page_alloc_phys, unlike __free_phys, it will search for its
 * record in used_pages first */

u64 page_alloc_free_phys(u64 phys)
{
	struct page_block *cur, *prev;
	int i;

	for (i = 0; i <= MAX_ORDER; i++) {
		prev = NULL;
		for (cur = used_pages[i]; cur; prev = cur, cur = cur->next) {
			if (cur->address == phys) {
				__setprev(&used_pages[i], prev, cur->next);
				__free_phys(cur, i);
				return 0;
			} else if (cur->address > phys)
				break;
		}
	}

	return -EINVAL;
}

Simple Virtual Address Allocator

As I mentioned in the concepts section, the virtual address allocator is a much simpler affair, but it’s required for the page allocator to really be useful: we want consumer functions to be able to call page_alloc(order) and get back a valid pointer of the right size without having to manually specify a virtual address.

NOTE: this allocator is extremely basic and, at this point, is only used to properly mark parts of the virtual address space as taken. The name can be ignored and discarded, and there is a restrictive limit on the number of virtual blocks that can be used. These caveats are only acceptable in the current state, where virtual address allocation is just a hurdle for the page allocator; later this code will have to be improved.

struct virtual_block {
    u64 address;
    u64 size;
    char *name;
    struct virtual_block *next;
};

static struct virtual_block *virtual_alloc = NULL;
static struct virtual_block *spare_blocks = NULL;
static int alloc_order = 0;
u32 used_vblocks = 0;

Similar to the page block structure, this is a singly linked list structure.

void vsalloc_init(void)
{
    /* Grab a page to work with */

    virtual_alloc = page_alloc_to_virtual(VSALLOC_HEAP_VIRTUAL, alloc_order);

#ifdef TEST_VSALLOC
    test_alloc_vspace();
#endif

    __alloc_vspace(&kernel_virtual_space, "vsalloc", VSALLOC_HEAP_VIRTUAL, (1 << (alloc_order + PAGE_SHIFT)));
}

Initialization begins by getting a page from the early allocator through a helper function (essentially just page_alloc_phys plus mapping to the given address). For the moment, this 4k chunk of space is assumed to be enough to hold all of our virtual blocks, and the pop function will hang if we ever go beyond that, as a reminder that growing the pool still needs to be implemented (a problem that was sidestepped in the page allocator by pre-allocating everything).
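
pop_virtual_block and push_virtual_block aren’t quoted, but given the variables declared above (spare_blocks, used_vblocks, and the block storage page at virtual_alloc), a plausible sketch is the following; the real versions are in the repo:

static struct virtual_block *pop_virtual_block(void)
{
    struct virtual_block *ret;

    /* Prefer blocks that were previously freed back to us */
    if (spare_blocks) {
        ret = spare_blocks;
        spare_blocks = ret->next;
        return ret;
    }

    /* Hang loudly if the single page of blocks is exhausted; growing
     * the pool is deliberately unimplemented for now. */
    if (used_vblocks >= ((1 << (alloc_order + PAGE_SHIFT)) / sizeof(struct virtual_block)))
        while (1) {}

    return &virtual_alloc[used_vblocks++];
}

static void push_virtual_block(struct virtual_block *block)
{
    block->next = spare_blocks;
    spare_blocks = block;
}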

NOTE: The only space that we pre-map into the allocator is what we’ve allocated from the VSALLOC_HEAP_VIRTUAL ourselves. This is because when we eventually use vsalloc we give a hint address that functions as a minimum value. We probably should reserve a massive chunk for the kernel just to be safe, but at this point that’s unnecessary.

The interesting part of the virtual address allocator is the following:

void __alloc_vspace(struct virtual_block **head, char *name, u64 virtual, u64 size)
{
	struct virtual_block *cur, *prev, *new;
	u64 end, cur_end;

	size = PAGE_ALIGN_UP(size + PAGE_ALIGN_OFF(virtual));
	virtual = PAGE_ALIGN(virtual);

	end = virtual + size;

	prev = NULL;
	for (cur = *head; cur; prev = cur, cur = cur->next) {
		cur_end = (cur->address + cur->size);

		/* If our virtual is in this block or adjacent, expand it */

		if (cur->address <= virtual) {
			/* cur is adjacent left */

			/* Start in block */
			if (virtual <= cur_end) {
				/* Overhangs right */
				if (end > cur_end) {
					cur->size += (end - cur_end);

					/* Since we're adjusting this size, check if we're now
					 * adjacent to the next one */

					if (cur->next && cur->next->address <= end) {
						cur->size += cur->next->size - (end - cur->next->address);

						/* pointer reuse, new is actually old =P */
						new = cur->next;
						cur->next = new->next;
						push_virtual_block(new);
					}
				}

				return;
			}

		/* cur is adjacent right */

		/* Ends in block */

		} else if (cur->address <= end) {
			cur->size += size - (end - cur->address);
			cur->address = virtual;
			return;
		}

		/* Is not connected, but this is virtual's place in the list */
		else if (cur->address > virtual) {
			break;
		}
	}

	new = pop_virtual_block();
	new->name = name;
	new->address = virtual;
	new->size = size;

	if (prev)
		prev->next = new;
	else
		*head = new;
	new->next = cur;
}

int __free_vspace(struct virtual_block **head, u64 virtual, u64 size)
{
	struct virtual_block *cur, *prev, *new;
	u64 end, cur_end;

	end = virtual + size;

	prev = NULL;
	for (cur = *head; cur; prev = cur, cur = cur->next) {
		cur_end = cur->address + cur->size;

		if (cur->address > virtual)
			return -ENOENT;

		if(cur->address == virtual) {
			if(size < cur->size) {
				/* Shrink */
				cur->address = virtual + size;
				cur->size = cur->size - size;
			}
			else {
				/* Remove */
				if (prev)
					prev->next = cur->next;
				else
					*head = cur->next;
				push_virtual_block(cur);
			}
			break;
		}
		else if (cur_end > virtual) {
			if (end >= cur_end)
				cur->size -= (cur_end - virtual);
			else {
				/* Shrink cur, add a new block for the remaining shard */
				cur->size = (virtual - cur->address);

				new = pop_virtual_block();
				new->address = end;
				new->size = (cur_end - end);
				new->name = cur->name;

				new->next = cur->next;
				cur->next = new;
			}
			break;
		}
	}

	return 0;
}

As you can see, these functions keep the virtual block list in order and, on allocate, attempt to merge blocks that are adjacent or overlapping, then split them apart when you free a subset of a block. As I mentioned above, name is currently ignored when merging blocks, which is bad behavior, but again, this is intended to be just enough to get the page allocator running.

Final Page Allocator

Now that we (nominally) have the ability to get a physical page and a virtual address, we can combine these two into our final product.

void *page_alloc(u32 order)
{
    void *virtual;
    void *ret;

    if (order > MAX_ORDER)
        return NULL;

    virtual = vsalloc("page heap", PAGE_HEAP_VIRTUAL, (1 << (order + PAGE_SHIFT)));

    if (virtual == NULL)
        return NULL;
    
    ret = page_alloc_to_virtual((u64) virtual, order);

    if (ret == NULL) {
        vfree((u64) virtual, (1 << (order + PAGE_SHIFT)));
        return NULL;
    }

    return ret;
}
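
As a quick illustration, a consumer would use it like so (a hypothetical caller, not code from the repo):

/* Grab a 16k (order 2) block, already mapped and ready to use, and
 * bail if the allocator couldn't satisfy the request. */
void example_consumer(void)
{
    u32 *buf = page_alloc(2);

    if (buf == NULL)
        return;

    buf[0] = 0xC0FFEE;
}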

Basic VGA Support

It might seem like an off-the-wall tangent to go from page allocation to display support, but x86 machines have a primitive frame buffer exposed at a known physical address (0xB8000): a window of 80 columns and 25 rows, each cell with one byte for a character and one byte for a foreground / background color. That region has already been excluded from the page allocator by being part of a reserved block in the GRUB memory map, but now that we have the ability to allocate virtual address space and map pages reliably, it’s trivial to start putting characters on the screen.


char *vga_mem = NULL;
u8 vga_x = 0;
u8 vga_y = 0;

u8 vga_max_x = 80;
u8 vga_max_y = 25;

void memcpy(void *dest, void *src, u32 size)
{
    int i;
    for (i = 0; i < size; i++)
        ((char *)dest)[i] = ((char *)src)[i];
}

static inline void __vga_clear_coord(u8 y, u8 x)
{
    vga_mem[((y * vga_max_x) + x) * 2] = ' ';
    vga_mem[(((y * vga_max_x) + x) * 2) + 1] = 0xf;
}

static inline void __vga_clear_line(u8 y)
{
    int i;
    for (i = 0; i < vga_max_x; i++)
        __vga_clear_coord(y, i);
}

void vga_putchar(char c)
{
    if ((c == '\n') || (vga_x >= vga_max_x))
    {
        vga_x = 0;
        vga_y++;
    }

    if (vga_y >= vga_max_y) {
        vga_y--;
        memcpy(vga_mem, &vga_mem[vga_max_x * 2], (vga_max_x * vga_y * 2));
        __vga_clear_line(vga_y);
    }

    /* Newlines only move the cursor, they shouldn't be drawn */
    if (c == '\n')
        return;

    vga_mem[((vga_max_x * vga_y) + vga_x) * 2] = c;
    vga_x++;
}

void vga_clear(void)
{
    int i;
    for(i = 0; i < vga_max_y; i++)
        __vga_clear_line(i);
}

void vga_init(void)
{
    vga_mem = vsalloc("VGA mem", VGA_MEM_VIRTUAL, PAGE_SIZE);
    map_page((u64) vga_mem, 0xB8000);
    vga_clear();
}

Nothing too mystifying. I’ve also implemented a basic version of vsnprintf() using GCC’s built-in var args (stdarg.h) and some CS 101 buffer manipulation, which gives us a serviceable printk on top of a more general console layer to abstract through.
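
Neither of those is quoted here, but the shape of it is roughly this (a sketch; the buffer size is arbitrary, vsnprintf is the hand-rolled version mentioned above, and output goes straight to the VGA driver rather than through the console layer):

#include <stdarg.h>

void printk(const char *fmt, ...)
{
    char buf[256];
    va_list args;
    int i;

    /* Format into a small stack buffer... */
    va_start(args, fmt);
    vsnprintf(buf, sizeof(buf), fmt, args);
    va_end(args);

    /* ...then push it out a character at a time */
    for (i = 0; buf[i]; i++)
        vga_putchar(buf[i]);
}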

I feel the need to mention this here because I don’t want to spend an entire article on it, but in the next article I’ll likely be using console output to debug.

The Code

The tag for this article is ‘page-alloc-and-vga’. You can browse it here.

Next Time

Next time, we’re going to look at handling interrupts in C, and using the xapic / x2apic timer (which will include doing CPUID for detection, MSR, and MMIO functions). If I’m feeling ambitious we might actually start scheduling kernel threads too.
