The next memory allocation function that we’ll show you is vmalloc, which allocates a contiguous memory region in the virtual address space. Although the pages are not necessarily consecutive in physical memory (each page is retrieved with a separate call to __get_free_page), the kernel sees them as a contiguous range of addresses. vmalloc returns 0 (the NULL address) if an error occurs; otherwise, it returns a pointer to a linear memory area of size at least size.
The prototypes of the function and its relatives (ioremap, which is not strictly an allocation function, will be discussed shortly) are as follows:
    #include <linux/vmalloc.h>

    void *vmalloc(unsigned long size);
    void vfree(void *addr);
    void *ioremap(unsigned long offset, unsigned long size);
    void iounmap(void *addr);
It’s worth stressing that memory addresses returned by kmalloc and get_free_pages are also virtual addresses. Their actual value is still massaged by the MMU (memory management unit, usually part of the CPU) before it is used to address physical memory.[30] vmalloc is not different in how it uses the hardware, but rather in how the kernel performs the allocation task.
The (virtual) address range used by kmalloc and
get_free_pages features a one-to-one mapping to
physical memory, possibly shifted by a constant
PAGE_OFFSET
value; the functions don’t need to
modify the page tables for that address range. The address range used
by vmalloc and ioremap, on
the other hand, is completely synthetic, and each allocation builds
the (virtual) memory area by suitably setting up the page tables.
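The one-to-one mapping described above amounts to adding or subtracting a constant. The following user-space sketch illustrates only that arithmetic. The value 0xC0000000 for PAGE_OFFSET is an assumption (a common choice on 32-bit x86), and the helper names are hypothetical; kernel code would use the kernel's own __pa and __va macros instead.

```c
#include <assert.h>

/* ASSUMPTION: a typical 32-bit x86 value; the real constant is per-architecture. */
#define SKETCH_PAGE_OFFSET 0xC0000000UL

/* Hypothetical helper: direct-mapped virtual address -> physical address. */
unsigned long sketch_virt_to_phys(unsigned long vaddr)
{
    /* Only meaningful for addresses inside the direct-mapped range. */
    assert(vaddr >= SKETCH_PAGE_OFFSET);
    return vaddr - SKETCH_PAGE_OFFSET;
}

/* Hypothetical helper: physical address -> direct-mapped virtual address. */
unsigned long sketch_phys_to_virt(unsigned long paddr)
{
    return paddr + SKETCH_PAGE_OFFSET;
}
```

Under that assumption, the virtual address 0xc262b000 corresponds to physical address 0x0262b000. No such arithmetic applies to an address returned by vmalloc, because each of its pages is mapped individually through the page tables.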
This difference can be perceived by comparing the pointers returned by the allocation functions. On some platforms (for example, the x86), addresses returned by vmalloc lie just beyond the addresses that kmalloc returns. On other platforms (for example, MIPS and IA-64), they belong to a completely different address range. Addresses available for vmalloc are in the range from VMALLOC_START to VMALLOC_END. Both symbols are defined in <asm/pgtable.h>.
Addresses allocated by vmalloc can’t be used outside of the microprocessor, because they make sense only on top of the processor’s MMU. When a driver needs a real physical address (such as a DMA address, used by peripheral hardware to drive the system’s bus), you can’t easily use vmalloc. The right time to call vmalloc is when you are allocating memory for a large sequential buffer that exists only in software. It’s important to note that vmalloc has more overhead than __get_free_pages because it must both retrieve the memory and build the page tables. Therefore, it doesn’t make sense to call vmalloc to allocate just one page.
An example of a function that uses vmalloc is the
create_module system call, which uses
vmalloc to get space for the module being
created. Code and data of the module are later copied to the
allocated space using copy_from_user, after
insmod has relocated the code. In this
way, the module appears to be loaded into contiguous memory. You can
verify, by looking in /proc/ksyms, that kernel
symbols exported by modules lie in a different memory range than
symbols exported by the kernel proper.
Memory allocated with vmalloc is released by vfree, in the same way that kfree releases memory allocated by kmalloc.
Like vmalloc, ioremap builds new page tables; unlike vmalloc, however, it doesn’t actually allocate any memory. The return value of ioremap is a special virtual address that can be used to access the specified physical address range; the virtual address obtained is eventually released by calling iounmap. Note that the return value from ioremap cannot be safely dereferenced on all platforms; instead, functions like readb should be used. See Section 8.4.1 in Chapter 8 for the details.
ioremap is most useful for mapping the (physical) address of a PCI buffer to (virtual) kernel space. For example, it can be used to access the frame buffer of a PCI video device; such buffers are usually mapped at high physical addresses, outside of the address range for which the kernel builds page tables at boot time. PCI issues are explained in more detail in Section 15.1 in Chapter 15.
It’s worth noting that for the sake of portability, you should not directly access addresses returned by ioremap as if they were pointers to memory. Rather, you should always use readb and the other I/O functions introduced in Section 8.4, in Chapter 8. This requirement applies because some platforms, such as the Alpha, are unable to directly map PCI memory regions to the processor address space because of differences between PCI specs and Alpha processors in how data is transferred.
There is almost no limit to how much memory vmalloc can allocate and ioremap can make accessible, although vmalloc refuses to allocate more memory than the amount of physical RAM, in order to detect common errors or typos made by programmers. You should remember, however, that requesting too much memory with vmalloc leads to the same problems as it does with kmalloc.
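The sanity check mentioned above can be pictured as a comparison between the number of pages requested and the number of pages of RAM in the system. The sketch below is a user-space illustration, not the kernel's code: the function name, the 4 KB page size, and the rejection of zero-length requests are all assumptions made for the example.

```c
#include <assert.h>

/* ASSUMPTION: 4 KB pages; the real PAGE_SIZE is per-architecture. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Hypothetical check: return 1 if a request of `size` bytes fits within
 * `ram_pages` pages of physical RAM, 0 if it should be refused.
 */
int sketch_vmalloc_size_ok(unsigned long size, unsigned long ram_pages)
{
    unsigned long pages;

    assert(ram_pages > 0);
    /* Round up to whole pages without risking overflow on huge sizes. */
    pages = size / SKETCH_PAGE_SIZE + (size % SKETCH_PAGE_SIZE != 0);
    return size != 0 && pages <= ram_pages;
}
```

On a machine with 64 MB of RAM (16384 pages under the 4 KB assumption), a request for exactly 64 MB passes this check, while a request one byte larger is refused.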
Both ioremap and vmalloc are page oriented (they work by modifying the page tables); thus the relocated or allocated size is rounded up to the nearest page boundary. In addition, the implementation of ioremap found in Linux 2.0 won’t even consider remapping a physical address that doesn’t start at a page boundary. Newer kernels allow that by “rounding down” the address to be remapped and by returning an offset into the first remapped page.
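The rounding described above is plain bit arithmetic on the page size. The following user-space sketch shows one way to compute it; the 4 KB page size and all helper names are assumptions chosen for illustration and are not kernel interfaces.

```c
#include <assert.h>

/* ASSUMPTION: 4 KB pages; the real PAGE_SIZE is per-architecture. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Round a byte count up to a whole number of pages. */
unsigned long sketch_page_round_up(unsigned long size)
{
    /* Guard against wraparound when adding PAGE_SIZE - 1. */
    assert(size <= ~0UL - (SKETCH_PAGE_SIZE - 1));
    return (size + SKETCH_PAGE_SIZE - 1) & SKETCH_PAGE_MASK;
}

/* "Round down" an address to the start of the page that contains it. */
unsigned long sketch_page_base(unsigned long addr)
{
    return addr & SKETCH_PAGE_MASK;
}

/* Offset of an address within its page. */
unsigned long sketch_page_offset(unsigned long addr)
{
    return addr & ~SKETCH_PAGE_MASK;
}
```

For instance, a request of 4097 bytes rounds up to 8192 bytes (two pages), and the unaligned address 0xE0001234 splits into a page base of 0xE0001000 and an offset of 0x234 into the first remapped page.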
One minor drawback of vmalloc is that it can’t be
used at interrupt time because internally it uses
kmalloc(GFP_KERNEL)
to acquire storage for the page
tables, and thus could sleep. This shouldn’t be a problem—if
the use of __get_free_page isn’t good
enough for an interrupt handler, then the software design needs some
cleaning up.
Sample code using vmalloc is provided in the scullv module. Like scullp, this module is a stripped-down version of scull that uses a different allocation function to obtain space for the device to store data.
The module allocates memory 16 pages at a time. The allocation is
done in large chunks to achieve better performance than
scullp and to show something that takes too
long with other allocation techniques to be feasible. Allocating more
than one page with __get_free_pages is
failure prone, and even when it succeeds, it can be slow. As we saw
earlier, vmalloc is faster than other functions
in allocating several pages, but somewhat slower when retrieving a
single page, because of the overhead of page-table
building. scullv is designed like scullp. The order value specifies
the “order” of each allocation and defaults to 4. The only
difference between scullv and
scullp is in allocation management. These
lines use vmalloc to obtain new memory:
    /* Allocate a quantum using virtual addresses */
    if (!dptr->data[s_pos]) {
        dptr->data[s_pos] = (void *)vmalloc(PAGE_SIZE << dptr->order);
        if (!dptr->data[s_pos])
            goto nomem;
        memset(dptr->data[s_pos], 0, PAGE_SIZE << dptr->order);
    }
And these lines release memory:
    /* Release the quantum set */
    for (i = 0; i < qset; i++)
        if (dptr->data[i])
            vfree(dptr->data[i]);
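The size expression PAGE_SIZE << dptr->order used in the allocation code doubles the quantum size for each increment of the order. The user-space sketch below works through that arithmetic; the 4 KB page size and the helper names are assumptions for the example, and real page sizes differ between architectures.

```c
#include <assert.h>

/* ASSUMPTION: 4 KB pages; the real PAGE_SIZE is per-architecture. */
#define SKETCH_PAGE_SIZE 4096UL

/* Size in bytes of one quantum allocated with the given order. */
unsigned long sketch_quantum_bytes(unsigned int order)
{
    assert(order < 20);  /* keep the shift well within range */
    return SKETCH_PAGE_SIZE << order;
}

/* Number of quanta needed to hold `bytes` bytes of data. */
unsigned long sketch_quanta_needed(unsigned long bytes, unsigned int order)
{
    unsigned long q = sketch_quantum_bytes(order);

    return bytes / q + (bytes % q != 0);
}
```

With these assumptions, order 4 gives a quantum of 65536 bytes (16 pages), so a 1048576-byte file fits in 16 quanta, whereas order 0 needs 256 single-page quanta for the same data.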
If you compile both modules with debugging enabled, you can look at
their data allocation by reading the files they create in
/proc
. The following snapshots were taken on two
different systems:
    salma% cat /tmp/bigfile > /dev/scullp0; head -5 /proc/scullpmem
    Device 0: qset 500, order 0, sz 1048576
      item at e00000003e641b40, qset at e000000025c60000
           0:e00000003007c000
           1:e000000024778000

    salma% cat /tmp/bigfile > /dev/scullv0; head -5 /proc/scullvmem
    Device 0: qset 500, order 4, sz 1048576
      item at e0000000303699c0, qset at e000000025c87000
           0:a000000000034000
           1:a000000000078000

    salma% uname -m
    ia64

    rudo% cat /tmp/bigfile > /dev/scullp0; head -5 /proc/scullpmem
    Device 0: qset 500, order 0, sz 1048576
      item at c4184780, qset at c71c4800
           0:c262b000
           1:c2193000

    rudo% cat /tmp/bigfile > /dev/scullv0; head -5 /proc/scullvmem
    Device 0: qset 500, order 4, sz 1048576
      item at c4184b80, qset at c71c4000
           0:c881a000
           1:c882b000

    rudo% uname -m
    i686
The values show two different behaviors. On IA-64, physical addresses and virtual addresses are mapped to completely different address ranges (0xE and 0xA), whereas on x86 computers vmalloc returns virtual addresses just above the mapping used for physical memory.
[30] Actually, some architectures define ranges of “virtual” addresses as reserved to address physical memory. When this happens, the Linux kernel takes advantage of the feature, and both the kernel and get_free_pages addresses lie in one of those memory ranges. The difference is transparent to device drivers and other code that is not directly involved with the memory-management kernel subsystem.