A deep dive into QEMU: memory regions
In this post we'll take a look at the high-level memory organisation in QEMU: memory regions (MRs).
We won't cover address spaces, because we usually manipulate memory regions directly. For more details, have a look at docs/devel/memory (recommended) and the code (include/exec/memory.h).
A high-level, external presentation of the memory organisation is also available online. You will also find very interesting internal documentation at docs/devel/loads-stores, which enumerates the QEMU APIs available for accessing memory.
When you want to play with memory regions in QEMU, you can either:
- get a direct pointer to the host buffer backing your VM memory region
- implement read/write callback functions to intercept every access (usually IO memory)
- use QEMU's cpu_physical_memory_rw() to safely access the region (see the sketch below)
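As a teaser, here is a minimal sketch of options 1 and 3, assuming you already hold a pointer to a RAM-backed MemoryRegion (the function and variable names are made up for illustration):

static void peek_guest_ram(MemoryRegion *ram_mr, hwaddr paddr)
{
    /* option 1: direct pointer to the host buffer backing the RAM region */
    uint8_t *host = memory_region_get_ram_ptr(ram_mr);
    uint8_t first_byte = host[0];

    /* option 3: safe, bounds-checked access to a guest physical address */
    uint8_t buf[4];
    cpu_physical_memory_rw(paddr, buf, sizeof(buf), false /* is_write */);

    (void)first_byte;
}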
In the blog post dedicated to the TCG, we will see exactly how translated instructions access VM memory and how we can intercept accesses at that level.
Looking at the memory tree (abbreviated)
Below is the tree of available memory regions once a SiFive HiFive Unleashed board is ready. As you can see, memory regions can contain other memory regions (called subregions). This is a clean way to organise memory. Each memory region has its own properties and is attached to a kind of view called an address space.
$ qemu-system-riscv64 -M sifive_u -s -S -nographic # then press Ctrl-A C to enter the monitor
QEMU 10.0.2 monitor - type 'help' for more information
(qemu) info mtree
address-space: cpu-memory-0
address-space: cpu-memory-1
address-space: dma
address-space: memory
0000000000000000-ffffffffffffffff (prio 0, i/o): system
0000000000001000-000000000000ffff (prio 0, rom): riscv.sifive.u.mrom
0000000002000000-0000000002003fff (prio 0, i/o): riscv.aclint.swi
0000000002004000-000000000200bfff (prio 0, i/o): riscv.aclint.mtimer
0000000002010000-0000000002010fff (prio -1000, i/o): riscv.sifive.u.l2cc
0000000003000000-00000000030fffff (prio 0, i/o): sifive.pdma
0000000008000000-0000000009ffffff (prio 0, ram): riscv.sifive.u.l2lim
000000000c000000-000000000fffffff (prio 0, i/o): riscv.sifive.plic
0000000010000000-0000000010000fff (prio 0, i/o): riscv.sifive.u.prci
0000000010010000-000000001001001f (prio 0, i/o): riscv.sifive.uart
0000000010011000-000000001001101f (prio 0, i/o): riscv.sifive.uart
0000000010020000-00000000100200ff (prio 0, i/o): sifive-pwm
0000000010021000-00000000100210ff (prio 0, i/o): sifive-pwm
0000000010040000-0000000010040fff (prio 0, i/o): sifive.spi
0000000010050000-0000000010050fff (prio 0, i/o): sifive.spi
0000000010060000-00000000100600ff (prio 0, i/o): sifive_soc.gpio
0000000010070000-0000000010070fff (prio 0, i/o): riscv.sifive.u.otp
0000000010090000-00000000100907ff (prio 0, i/o): enet
00000000100a0000-00000000100a0fff (prio -1000, i/o): riscv.sifive.u.gem-mgmt
00000000100b0000-00000000100bffff (prio -1000, i/o): riscv.sifive.u.dmc
0000000020000000-000000002fffffff (prio 0, ram): riscv.sifive.u.flash0
0000000080000000-0000000087ffffff (prio 0, ram): riscv.sifive.u.ram
address-space: I/O
0000000000000000-000000000000ffff (prio 0, i/o): io

Default memory regions and address spaces are created by QEMU itself. The most important is the system memory region, created by memory_map_init() from cpu_exec_init_all(). It can be seen as the top-level region, and subregions are usually added to it.
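Note the prio column in the tree above: when subregions overlap, the one with the highest priority wins. Overlapping mappings are declared with memory_region_add_subregion_overlap(); here is a minimal sketch with made-up region names and addresses:

/* both regions claim 0x10000000; dev_a_mr shadows dev_b_mr since 1 > 0 */
memory_region_add_subregion_overlap(system_memory, 0x10000000, dev_a_mr, 1);
memory_region_add_subregion_overlap(system_memory, 0x10000000, dev_b_mr, 0);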
Allocating system memory
This might be one of the most desired things when creating a new machine: getting RAM and loading a firmware. The right function to invoke is memory_region_add_subregion(). Let's look at another board implementation, for instance the MIPS Malta:
void mips_malta_init(MachineState *machine)
{
    ...
    MemoryRegion *system_memory = get_system_memory();

    /* register RAM at high address where it is undisturbed by IO */
    memory_region_add_subregion(system_memory, 0x80000000, machine->ram);
    ...
}

The machine RAM region (machine->ram, pre-allocated by the machine framework) is directly added as a subregion of the system memory region. From that point on, guest physical accesses in the range 0x80000000 to 0x80000000 + machine->ram_size land in the RAM.
The QEMU memory API lets you create memory regions backed by file descriptors, already allocated host buffers, or callbacks, as we will see for IOs.
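For instance, here is a hedged sketch of two of these variants (the region names and sizes are illustrative):

MemoryRegion *ram = g_new(MemoryRegion, 1);
/* RAM allocated by QEMU itself */
memory_region_init_ram(ram, NULL, "demo.ram", 16 * MiB, &error_fatal);

/* RAM backed by a host buffer you already own */
void *host_buf = g_malloc0(16 * MiB);
MemoryRegion *ram_ptr = g_new(MemoryRegion, 1);
memory_region_init_ram_ptr(ram_ptr, NULL, "demo.ram-ptr", 16 * MiB, host_buf);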
IO memory regions
Getting back to our simple MIPS board example (the following code is simplified):
void mips_malta_init(MachineState *machine)
{
    ...
    MemoryRegion *system_memory = get_system_memory();
    MemoryRegion *iomem = g_new(MemoryRegion, 1);

    memory_region_init_io(iomem, NULL, &malta_fpga_ops, s, "malta-fpga", 0x100000);
    memory_region_add_subregion(system_memory, FPGA_ADDRESS, iomem);
}

A new memory region, iomem, is created with memory_region_init_io() and added as a subregion of the system memory as well. This region is not RAM but IO, and it takes a special MemoryRegionOps argument.
static const MemoryRegionOps malta_fpga_ops = {
    .read = malta_fpga_read,
    .write = malta_fpga_write,
    .endianness = DEVICE_NATIVE_ENDIAN,
};
static uint64_t malta_fpga_read(void *opaque, hwaddr addr,
                                unsigned size)
{
    MaltaFPGAState *s = opaque;
    uint32_t val = 0;
    uint32_t saddr;

    saddr = (addr & 0xfffff);

    switch (saddr) {
    /* SWITCH register */
    case 0x00200:
        val = 0x00000000;
        break;
    ...
    }

    return val;
}

static void malta_fpga_write(void *opaque, hwaddr addr,
                             uint64_t val, unsigned size)
{
    MaltaFPGAState *s = opaque;
    uint32_t saddr;

    saddr = (addr & 0xfffff);

    switch (saddr) {
    case 0x00500:
        if (val == 0x42) {
            qemu_system_reset_request(SHUTDOWN_CAUSE_GUEST_RESET);
        }
        break;
    ...
    }
}

IO memory regions expose device memory. They usually need special interpretation of read/write accesses to simulate the expected device behavior, and implementing the MemoryRegionOps callbacks is how you provide these device operations.
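As a side note, MemoryRegionOps can also constrain how the region may be accessed. A hedged sketch with a hypothetical device (this is not part of the Malta code; demo_dev_read/demo_dev_write are made-up callbacks):

static const MemoryRegionOps demo_dev_ops = {
    .read = demo_dev_read,
    .write = demo_dev_write,
    .endianness = DEVICE_LITTLE_ENDIAN,
    .valid = {
        .min_access_size = 4, /* reject 1- and 2-byte guest accesses */
        .max_access_size = 4,
    },
};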
In the previous example, the iomem region is mapped from FPGA_ADDRESS, which is 0x1f000000ULL, to FPGA_ADDRESS + 0x100000. Whenever the VM accesses this memory range, the read or write callback is invoked. The addr argument is an offset from the beginning of the memory region.
So doing something like writeq(0x42, 0x1f000500) ends up in malta_fpga_write(), which then triggers a system reset.
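From the guest point of view, this boils down to a plain MMIO store. A bare-metal sketch (using the physical address directly and a 32-bit store for simplicity; a real MIPS guest would go through a kernel segment mapping):

#define FPGA_BASE 0x1f000000UL

static inline void mmio_write32(unsigned long addr, uint32_t val)
{
    *(volatile uint32_t *)addr = val;
}

void reset_board(void)
{
    mmio_write32(FPGA_BASE + 0x500, 0x42); /* lands in malta_fpga_write() */
}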
Initializing memory for the CLabPU machine
The CLabPU has a DRAM memory region which needs to be initialized. Add the following code to clabpu.c:
static const MemMapEntry clabpu_memmap[] = {
    [CLABPU_MROM]  = { 0x1000,     0xf000 },
    [CLABPU_HTIF]  = { 0x1000000,  0x1000 },
    [CLABPU_CLINT] = { 0x2000000,  0x10000 },
    [CLABPU_DRAM]  = { 0x80000000, 0x0 },
};

static void clabpu_init_mem(CLabPUState *clabpu, MachineState *machine)
{
    const MemMapEntry *memmap = clabpu_memmap;
    MemoryRegion *system_memory = get_system_memory();
    MemoryRegion *mask_rom = g_new(MemoryRegion, 1);

    /* register system main memory */
    memory_region_add_subregion(system_memory, memmap[CLABPU_DRAM].base,
                                machine->ram);

    /* boot rom */
    memory_region_init_rom(mask_rom, NULL, "riscv.clabpu.mrom",
                           memmap[CLABPU_MROM].size, &error_fatal);
    memory_region_add_subregion(system_memory, memmap[CLABPU_MROM].base,
                                mask_rom);
}
void clabpu_machine_init(ObjectClass *oc, void *data)
{
    MachineClass *mc = MACHINE_CLASS(oc);
    ...
    /* add this line, else no RAM memory region will be created for the machine */
    mc->default_ram_id = "clabpu.ram";
    ...
}

Add the required enums in include/hw/riscv/clabpu.h:
enum {
    CLABPU_MROM,
    CLABPU_HTIF,
    CLABPU_CLINT,
    CLABPU_DRAM,
};

Don't forget to call clabpu_init_mem() from the clabpu_init() function, along the lines of the sketch below.
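A minimal sketch, assuming clabpu_init() is your machine init callback and CLABPU_MACHINE() is your QOM cast macro (adjust both to your actual implementation):

static void clabpu_init(MachineState *machine)
{
    CLabPUState *clabpu = CLABPU_MACHINE(machine);
    ...
    clabpu_init_mem(clabpu, machine);
    ...
}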
You can validate your implementation by running the following command and inspecting the memory regions created for the CLabPU machine, including the DRAM and the mask ROM:
qemu-system-riscv64 -M clabpu -s -S -nographic # then press Ctrl-A C to enter the monitor
qemu-system-riscv64: warning: disabling zfa extension for hart 0x0000000000000000 because privilege spec version does not match
QEMU 10.0.2 monitor - type 'help' for more information
(qemu) info mtree
address-space: cpu-memory-0
address-space: memory
0000000000000000-ffffffffffffffff (prio 0, i/o): system
0000000000001000-000000000000ffff (prio 0, rom): riscv.clabpu.mrom
0000000080000000-00000000ffffffff (prio 0, ram): clabpu.ram
address-space: I/O
0000000000000000-000000000000ffff (prio 0, i/o): io