November 30, 2015

Linux memory: What are those buffers?

Let's figure out how the 'free' command on Linux gathers the information it prints, especially the buffers field.
Here is a sample output of the free command on Linux:
# free
             total       used       free     shared    buffers     cached
Mem:       1004300     852292     152008          4     194872     335584
-/+ buffers/cache:     321836     682464 
Swap:      1048572      88500     960072
A simple strace makes it evident that free is just reading and parsing /proc/meminfo:
# strace -e open,read free 
.
open("/proc/meminfo", O_RDONLY)         = 3
read(3, "MemTotal:        1004300 kB\nMemF"..., 2047) = 1170
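Since free only parses /proc/meminfo, its columns can be rebuilt from that file. Below is a minimal Python sketch, assuming the older procps free shown above, where "used" is MemTotal minus MemFree and the "-/+ buffers/cache" row subtracts (or adds) Buffers plus Cached. The function names are hypothetical, and other procps versions may compute these columns differently.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: value}; values are in kB."""
    fields = {}
    for line in text.splitlines():
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)
        parts = rest.split()
        if parts:
            fields[name.strip()] = int(parts[0])  # drop the trailing "kB" unit
    return fields


def free_columns(m):
    """Rebuild the classic free 'Mem:' and '-/+ buffers/cache' rows (kB)."""
    used = m["MemTotal"] - m["MemFree"]
    buffers_cache = m["Buffers"] + m["Cached"]
    return {
        "total": m["MemTotal"],
        "used": used,
        "free": m["MemFree"],
        "buffers": m["Buffers"],
        "cached": m["Cached"],
        "used_minus_bc": used - buffers_cache,
        "free_plus_bc": m["MemFree"] + buffers_cache,
    }


# Values taken from the sample free output above.
sample = """MemTotal:        1004300 kB
MemFree:          152008 kB
Buffers:          194872 kB
Cached:           335584 kB
"""

cols = free_columns(parse_meminfo(sample))
print(cols["used"], cols["used_minus_bc"], cols["free_plus_bc"])  # 852292 321836 682464
```

The three printed numbers match the "used" column and the "-/+ buffers/cache" row of the sample output.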
Looking at the kernel source meminfo.c, the information collected by the si_meminfo and si_swapinfo functions is displayed in kB. The /proc/meminfo fields of interest map as follows:
"MemTotal:       %8lu kB\n"  ->  K(i.totalram),
"MemFree:        %8lu kB\n"  ->  K(i.freeram),
"Buffers:        %8lu kB\n"  ->  K(i.bufferram),
"Cached:         %8lu kB\n"  ->  K(cached),
"SwapCached:     %8lu kB\n"  ->  K(total_swapcache_pages),
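The values handed to K() are page counts, and K() turns them into kB. As far as I know, meminfo.c defines it as a shift, `#define K(x) ((x) << (PAGE_SHIFT - 10))`. Here is a Python equivalent, assuming 4 KiB pages (PAGE_SHIFT = 12) as on x86_64.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages, as on x86_64


def K(pages, page_shift=PAGE_SHIFT):
    """Convert a page count to kB by shifting left by (PAGE_SHIFT - 10)."""
    return pages << (page_shift - 10)


print(K(25639))   # 102556: each 4 KiB page is 4 kB
print(K(1, 16))   # 64: one 64 KiB page
```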
As seen here, the Buffers field is populated by the si_meminfo call, which in turn does:
        val->bufferram = nr_blockdev_pages();
nr_blockdev_pages() is defined in block_dev.c as shown below
long nr_blockdev_pages(void)
{
        struct block_device *bdev;
        long ret = 0;
        spin_lock(&bdev_lock);
        list_for_each_entry(bdev, &all_bdevs, bd_list) {
                ret += bdev->bd_inode->i_mapping->nrpages;
        }
        spin_unlock(&bdev_lock);
        return ret;
}
As seen above, this function returns the sum of 'nrpages' across the address spaces (i_mapping) of all block device inodes (bd_inode). So the 'Buffers' field is directly tied to the block devices themselves. nrpages is the number of resident pages in use by an address space.
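To make the pointer chain concrete, here is a toy Python model of what nr_blockdev_pages() walks. It is a sketch only: the classes keep just the fields named in the kernel code above, there is no locking, and the device names and page counts in the example are made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AddressSpace:
    nrpages: int = 0  # resident page cache pages in this mapping


@dataclass
class Inode:
    i_mapping: AddressSpace = field(default_factory=AddressSpace)


@dataclass
class BlockDevice:
    name: str
    bd_inode: Inode = field(default_factory=Inode)


def nr_blockdev_pages(all_bdevs: List[BlockDevice]) -> int:
    """Sum nrpages over every block device inode's mapping (Buffers, in pages)."""
    # The kernel holds bdev_lock while walking all_bdevs; a plain list needs no lock.
    return sum(bdev.bd_inode.i_mapping.nrpages for bdev in all_bdevs)


# Hypothetical devices and counts, for illustration only.
sda1 = BlockDevice("sda1")
sdb = BlockDevice("sdb")
sda1.bd_inode.i_mapping.nrpages = 1200
sdb.bd_inode.i_mapping.nrpages = 350
print(nr_blockdev_pages([sda1, sdb]))  # 1550
```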

Looking at the source, I found two functions in mm/filemap.c that update nrpages: __delete_from_page_cache and add_to_page_cache_locked.

Of all the I/O that passes through the page cache, pages whose inode mapping belongs to a file are accounted under the cached field, while pages whose mapping belongs to a block device are accounted under the buffers field.
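The accounting rule above can be illustrated with a toy Python model (hypothetical classes, not kernel code). It assumes, as fs/proc/meminfo.c does in kernels of this era, that Cached is derived as the total file page count (NR_FILE_PAGES) minus swap cache pages minus bufferram; that subtraction is what keeps a block device page from being counted under both Buffers and Cached.

```python
class AddressSpace:
    """Hypothetical stand-in for struct address_space: which page indexes are resident."""

    def __init__(self, is_blockdev):
        self.is_blockdev = is_blockdev  # True for a bd_inode mapping, False for a regular file
        self.pages = set()

    @property
    def nrpages(self):
        return len(self.pages)


class PageCache:
    """Toy page cache keeping a global file-page counter next to per-mapping nrpages."""

    def __init__(self):
        self.mappings = []
        self.nr_file_pages = 0  # plays the role of the global NR_FILE_PAGES counter

    def new_mapping(self, is_blockdev):
        mapping = AddressSpace(is_blockdev)
        self.mappings.append(mapping)
        return mapping

    def add_page(self, mapping, index):
        # Loosely mirrors add_to_page_cache_locked(): nrpages and the global count go up.
        if index not in mapping.pages:
            mapping.pages.add(index)
            self.nr_file_pages += 1

    def delete_page(self, mapping, index):
        # Loosely mirrors __delete_from_page_cache(): both counts go down.
        if index in mapping.pages:
            mapping.pages.remove(index)
            self.nr_file_pages -= 1

    def meminfo(self, swapcache_pages=0):
        # Buffers: nrpages summed over block device mappings, like nr_blockdev_pages().
        buffers = sum(m.nrpages for m in self.mappings if m.is_blockdev)
        # Cached: file pages minus swap cache minus buffers, clamped at zero.
        cached = max(self.nr_file_pages - swapcache_pages - buffers, 0)
        return {"Buffers": buffers, "Cached": cached}


pc = PageCache()
disk = pc.new_mapping(is_blockdev=True)   # e.g. /dev/vda1 read with dd
data = pc.new_mapping(is_blockdev=False)  # e.g. a regular file read with cat
for index in range(3):
    pc.add_page(disk, index)
pc.add_page(data, 0)
print(pc.meminfo())  # {'Buffers': 3, 'Cached': 1}
pc.delete_page(disk, 0)
print(pc.meminfo())  # {'Buffers': 2, 'Cached': 1}
```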

So if you read a block device directly, for example using dd, that should increase the buffers field in the free output. Let's give it a try:
[root@cent8 ~]# free
             total       used       free     shared    buffers     cached
Mem:       1004300     312268     692032          4         64       9844
-/+ buffers/cache:     302360     701940 
Swap:      1048572      88496     960076
[root@cent8 ~]# dd if=/dev/vda1 of=/dev/null bs=1024k count=100
100+0 records in
100+0 records out
104857600 bytes (105 MB) copied, 0.279128 s, 376 MB/s 
[root@cent8 ~]# free
             total       used       free     shared    buffers     cached
Mem:       1004300     428852     575448          4     102592      10008
-/+ buffers/cache:     316252     688048 
Swap:      1048572      88496     960076 
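A quick check of the numbers in the two free outputs above: dd read 104857600 bytes, and Buffers went from 64 kB to 102592 kB.

```python
bytes_read = 104857600     # reported by dd: 100 records of 1 MiB
buffers_before_kb = 64     # Buffers before dd
buffers_after_kb = 102592  # Buffers after dd

read_kb = bytes_read // 1024
delta_kb = buffers_after_kb - buffers_before_kb
print(read_kb, delta_kb, delta_kb - read_kb)  # 102400 102528 128
```

So Buffers grew by slightly more than the amount dd read.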
As expected, 'Buffers' increased by around 100 MB. Another use of the block device inode mapping is file metadata, i.e. file-system-related disk blocks, directories, etc. Let's run find to read a bunch of files and directories and observe the 'buffers' field:
[root@cent8 ~]# free
             total       used       free     shared    buffers     cached
Mem:       1004300     432796     571504          4     104612      12092
-/+ buffers/cache:     316092     688208 
Swap:      1048572      88496     960076 
[root@cent8 ~]# find / > /dev/null
[root@cent8 ~]# free
             total       used       free     shared    buffers     cached
Mem:       1004300     494556     509744          4     148880      12328
-/+ buffers/cache:     333348     670952 
Swap:      1048572      88496     960076 
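Both experiments follow the same pattern: snapshot Buffers, run a workload, snapshot again. Here is a minimal Python sketch of that pattern, assuming Python 3.5+ (for subprocess.run) and root privileges; the function names are hypothetical. Other activity on the system also moves Buffers, so treat the result as approximate.

```python
import re
import subprocess


def parse_buffers_kb(text):
    """Extract the Buffers value (kB) from /proc/meminfo-format text."""
    match = re.search(r"^Buffers:\s+(\d+) kB$", text, re.MULTILINE)
    if match is None:
        raise ValueError("no Buffers line found")
    return int(match.group(1))


def read_buffers_kb(path="/proc/meminfo"):
    """Read the current Buffers value (kB) from the live meminfo file."""
    with open(path) as f:
        return parse_buffers_kb(f.read())


def buffers_growth_kb(cmd, check=True):
    """Run cmd (an argument list), discard its stdout, return the change in Buffers (kB)."""
    before = read_buffers_kb()
    subprocess.run(cmd, check=check, stdout=subprocess.DEVNULL)
    after = read_buffers_kb()
    return after - before
```

For example, as root, buffers_growth_kb(["dd", "if=/dev/vda1", "of=/dev/null", "bs=1024k", "count=100"]) repeats the first experiment, and buffers_growth_kb(["find", "/"], check=False) the second; check=False is there because find may exit non-zero on harmless errors. Pages already in the page cache are not read from disk again, so repeating a command right away usually shows little growth.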
Sure enough, about 40 MB+ of data was pulled into 'buffers'. We can find the nrpages used per block device with crash, via live kernel debugging: basically, by finding the block_device.bd_inode->i_mapping->nrpages value for each block_device.

Using the crash commands below, you will be able to find this info:

1. list block_device.bd_list -H all_bdevs -s block_device.bd_inode
   => all_bdevs is the list head of all block_device structures; the list is walked using bd_list as the link to the next member.
2. Once you have all the bd_inode addresses, find each device's i_mapping (its address_space struct) with:
   struct inode.i_mapping [address of bd_inode struct from step 1]
3. Find nrpages from that address_space struct (for each device) with:
   struct address_space.nrpages [address of address_space struct from step 2]
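Steps 2 and 3 are mechanical, so the glue between them is easy to script. Below is a hedged Python sketch of that glue; it is not the nrpages_by_dev script mentioned next. It assumes crash prints struct members as "name = value" lines (for example "  bd_inode = 0xffff..."), the addresses in the example are made up, and actually feeding the generated commands to crash is left out.

```python
import re

# Assumed crash output shape for a member line, e.g. "  bd_inode = 0xffff88003e400100".
MEMBER_RE = re.compile(r"^[ \t]*(\w+)[ \t]*=[ \t]*(0x[0-9a-fA-F]+|\d+)", re.MULTILINE)


def member_values(crash_output, member):
    """Collect the values printed for one struct member, in output order."""
    values = []
    for name, raw in MEMBER_RE.findall(crash_output):
        if name == member:
            values.append(int(raw, 16) if raw.lower().startswith("0x") else int(raw))
    return values


def step2_commands(bd_inodes):
    """Step 2: one 'struct inode.i_mapping' command per bd_inode address."""
    return ["struct inode.i_mapping %x" % addr for addr in bd_inodes]


def step3_commands(mappings):
    """Step 3: one 'struct address_space.nrpages' command per address_space."""
    return ["struct address_space.nrpages %x" % addr for addr in mappings]


# Hypothetical output of step 1 (made-up addresses).
step1_output = """ffff88003e400000
  bd_inode = 0xffff88003e400100
ffff88003e400800
  bd_inode = 0xffff88003e400900
"""

inodes = member_values(step1_output, "bd_inode")
for command in step2_commands(inodes):
    print(command)
```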
I wrote a simple Python script to do the same: nrpages_by_dev
[root@cent8 ~]# /tmp/nrpages_by_dev;free
Running /usr/bin/crash to determine nrpages by device ...
Device#              DeviceName           nrpages              nrpages - KB        
--------------------------------------------------------------------------------
264241153            "vda1                25639                102556              
265289728            "dm-0                11844                47376               
             total       used       free     shared    buffers     cached
Mem:       1004300     689448     314852          4     149932     153600
-/+ buffers/cache:     385916     618384 
Swap:      1048572      88496     960076 
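A quick consistency check on the output above: converting the per-device nrpages to kB and summing them should reproduce the Buffers column that free printed right after the script. Assuming 4 KiB pages on this VM, it matches exactly.

```python
PAGE_KB = 4  # assumption: 4 KiB pages

nrpages_by_dev = {"vda1": 25639, "dm-0": 11844}  # from the script output above
buffers_kb = 149932                              # Buffers column from the free run above

total_kb = sum(nrpages_by_dev.values()) * PAGE_KB
print(total_kb, total_kb == buffers_kb)  # 149932 True
```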
P.S.: The script above has no error checking, and I have tested it only on Fedora/CentOS Linux. Please use it with *care*. It assumes the crash and kernel-debuginfo RPMs are installed and that 'crash -s' works (and of course, you need to be root to run it). I am also interested in finding this info by other means, perhaps via ftrace or perf, to avoid the crash/debuginfo package dependency. Let me know if anyone knows!
 
