x86-based operating system built from scratch for learning purposes
- KISS (Keep It Simple, Stupid) philosophy
- Bootloader is assumed to be GRUB; the kernel has a GRUB-specific Multiboot header
- Higher-half kernel: the kernel sets itself up to run from the higher half, in the 3GB (0xC0000000) region
- Multitasking
  - Basic round-robin scheduler; all tasks have the same priority
  - Timer interrupt forces a context switch
- User mode / kernel mode distinction
  - GDT has been set up with appropriate DPLs
  - Kernel code runs in ring 0, and user code in ring 3
  - System calls go through the int 64 instruction, whose IDT gate descriptor (DPL 3, so ring 3 code may invoke it) provides the required privilege escalation
- Initramfs
  - Standard cpio archive format for the ramfs (no gzip compression)
  - User space applications in standard ELF format
- Fork support
  - Clones the parent process; no copy-on-write (COW) support
- Exec support
  - Overwrites the address space with the new program
To build the operating system (assuming gcc and nasm are installed):
make
To run under QEMU:
make qemu
To run under Bochs (assuming bochs and bochs-sdl are installed):
make run
To run under QEMU with debugging support (GDB):
make qemu_gdb
(Then attach GDB; the required commands are already provided in the .gdbinit file in the top-level directory.)
The BIOS starts the processor in 16-bit real mode, and GRUB switches it to 32-bit protected mode. The kernel ELF image carries a Multiboot header as per the GRUB (Multiboot) specification; if GRUB finds this header within the first 8192 bytes of the image, it loads the ELF at 0x100000 (1MB), since lower memory belongs to the BIOS and to other hardware/IO mappings such as VGA. Boot code executes from the kernel's _start entry point and (note that BSS is already zero-initialized by GRUB):
- Sets up two 4MB (page-size-extended, requires CR4.PSE) page directory entries to map the kernel image at 0x0-0x400000 and 0xC0000000-0xC0400000 respectively.
- The former is an identity map, needed because EIP is still in the lower memory range right after paging is enabled; the latter serves as the higher-half mapping.
- Sets the stack pointer to the higher-half range (it was previously set by GRUB below 1MB).
- Once paging is enabled, jumps to an address in the higher half, so execution continues from there.
A simple first-fit memory allocator allocates memory from kernel space; on free, it also coalesces adjacent free blocks.
For user space, malloc/free are provided, which internally use the sbrk system call to grow the system break when required. The kernel sets up the required page tables and raises the system break limit accordingly.
The virtual address space is divided into kernel-space and user-space mappings. Kernel-space mappings remain constant and are part of every process's address space; they are only linked, not cloned, because changes made by one process in kernel space must be visible to other processes as well.
User-space mappings depend on the exec call, and every process has its own kernel stack as well as its own user stack.
During a context switch, the CR3 register is loaded with the page directory base address of the process being switched in, which also flushes the (non-global) entries of the TLB (Translation Lookaside Buffer).
User space applications are stored in standard ELF format in the initramfs, a cpio-format archive. During exec, the kernel finds and parses the ELF image and sets up its page tables accordingly. The init process only starts the shell and then hangs there forever.
The scheduler runs on behalf of the currently executing process, mainly in two cases:
- On a timer IRQ, the running process is swapped out.
- When a process relinquishes the CPU because it blocks on something or yields.
Kernel-mode code/data segments are separate from user-mode code/data segments, with privilege levels programmed accordingly. When returning from an interrupt or exception with iret, the hardware pops EIP, CS (code segment) and EFLAGS from the stack and, because the privilege level changes, also ESP and SS (stack segment), thereby returning to the lower privilege level.
We program this stack frame accordingly when creating a new task:
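/* Task will start in CPL = 3, i.e. user mode */
task->irqf->cs = (SEG_UCODE << 3) | DPL_USER;     /* user code selector, RPL 3 */
task->irqf->ds = (SEG_UDATA << 3) | DPL_USER;     /* user data selector, RPL 3 */
task->irqf->eflags = 0x200;                       /* IF set: interrupts enabled */
task->irqf->ss = (SEG_UDATA << 3) | DPL_USER;     /* user stack selector, popped by iret */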
x86 hardware has built-in support for task switching, which includes switching from the user stack to the kernel stack along with other segmentation state. For this, every task needs to be set up with its own TSS, which gets modified during context switching. We do not use hardware task switching but do the same in software. In any case, we need to set up at least one TSS holding a valid kernel-mode SS (stack segment) and ESP (stack pointer) for the current task, because the CPU loads these (the SS0 and ESP0 fields) whenever an interrupt or system call moves execution from ring 3 to ring 0.
Feel free to fork and send a merge request.