Using Python's mmap module to map files into memory, explained in detail
Reposted from: http://blog.chinaunix.net/uid-20393955-id-1645587.html
The examples use the following text, saved as lorem.txt:
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec
egestas, enim et consectetuer ullamcorper, lectus ligula rutrum leo,
a elementum elit tortor eu quam. Duis tincidunt nisi ut ante. Nulla
facilisi. Sed tristique eros eu libero. Pellentesque vel
arcu. Vivamus purus orci, iaculis ac, suscipit sit amet, pulvinar eu,
lacus. Praesent placerat tortor sed nisl. Nunc blandit diam egestas
dui. Pellentesque habitant morbi tristique senectus et netus et
malesuada fames ac turpis egestas. Aliquam viverra fringilla
leo. Nulla feugiat augue eleifend nulla. Vivamus mauris. Vivamus sed
mauris in nibh placerat egestas. Suspendisse potenti. Mauris
massa. Ut eget velit auctor tortor blandit sollicitudin. Suspendisse
imperdiet justo.
        
Reading data:
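A memory-mapped file is created with the mmap() function. The first argument is a file descriptor, taken either from a file object's fileno() method or from os.open(). The caller is responsible for opening the file before calling mmap() and closing it once it is no longer needed. The second argument is the size of the mapping in bytes. A value of 0 maps the whole file. If the value is larger than the current file size, Windows extends the file to that length, while on Unix mmap() raises ValueError. The optional access argument selects ACCESS_READ (read-only), ACCESS_WRITE (write-through: changes are written to the file) or ACCESS_COPY (copy-on-write: changes stay in memory).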
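import mmap

with open('lorem.txt', 'rb') as f:
    # Length 0 maps the whole file; ACCESS_READ makes the mapping read-only
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        # read() returns bytes and advances the current position
        print('First 10 bytes via read :', m.read(10).decode())
        # Slicing does not change the position
        print('First 10 bytes via slice:', m[:10].decode())
        print('2nd 10 bytes via read :', m.read(10).decode())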
Output:
$ python mmap_read.py
First 10 bytes via read : Lorem ipsu
First 10 bytes via slice: Lorem ipsu
2nd 10 bytes via read : m dolor si
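The description above also allows a descriptor from os.open() and a length of 0 for the whole file. The following is a minimal sketch of both points, assuming a scratch file in a temporary directory; the helper map_with_os_open and the file name sample.bin are hypothetical, chosen only for illustration.

```python
import mmap
import os
import tempfile


def map_with_os_open(data, length):
    """Write `data` to a scratch file and return what a read-only mapping of `length` bytes sees.

    A length of 0 maps the whole file. The helper and 'sample.bin' are hypothetical names.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'sample.bin')
        with open(path, 'wb') as out:
            out.write(data)
        # A raw descriptor from os.open() works as well as f.fileno()
        fd = os.open(path, os.O_RDONLY)
        try:
            with mmap.mmap(fd, length, access=mmap.ACCESS_READ) as m:
                return m[:]  # the whole mapping: `length` bytes, or the full file for 0
        finally:
            os.close(fd)  # the caller closes the descriptor it opened
```

For example, map_with_os_open(b'Lorem ipsum dolor', 5) should return b'Lorem', while passing b'' with a length of 0 raises ValueError, because an empty file cannot be mapped.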
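Writing data:
To change a file through a mapping, open it in 'r+' mode (not 'w', which would truncate it) and map it without an access argument; the mapping is then writable, and changes made to it, here by slice assignment, reach the file. flush() writes pending changes back to disk.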
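import mmap
import shutil

# Work on a copy so that lorem.txt itself stays unchanged
shutil.copyfile('lorem.txt', 'lorem_copy.txt')

word = b'consectetuer'
reversed_word = word[::-1]  # avoid shadowing the built-in reversed()
print('Looking for :', word.decode())
print('Replacing with :', reversed_word.decode())

with open('lorem_copy.txt', 'r+b') as f:
    # No access argument: a writable mapping whose changes reach the file
    with mmap.mmap(f.fileno(), 0) as m:
        print('Before:')
        print(m.readline().rstrip().decode())
        m.seek(0)  # rewind
        loc = m.find(word)
        # The replacement must be exactly as long as the slice it overwrites
        m[loc:loc + len(word)] = reversed_word
        m.flush()  # write the change back to the file
        m.seek(0)  # rewind
        print('After :')
        print(m.readline().rstrip().decode())

        f.seek(0)  # rewind
        print('File :')
        print(f.readline().rstrip().decode())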
Output:
$ python mmap_write_slice.py
Looking for : consectetuer
Replacing with : reutetcesnoc
Before:
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec
After :
Lorem ipsum dolor sit amet, reutetcesnoc adipiscing elit. Donec
File :
Lorem ipsum dolor sit amet, reutetcesnoc adipiscing elit. Donec
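Slice assignment is not the only way to change a writable mapping: mmap objects also offer the file-like write() and write_byte() methods, which write at the current position and advance it. The sketch below, assuming a temporary file and a hypothetical helper named overwrite_demo, uses both, and also checks that a slice assignment of a different length is rejected with IndexError. A mapping cannot grow or shrink through slice assignment, which is why the example above replaces the word with its reverse, a string of the same length.

```python
import mmap
import tempfile


def overwrite_demo(data):
    """Modify a temp file holding `data` through mmap (hypothetical helper).

    Returns (file contents afterwards, whether a wrong-size slice assignment was rejected).
    """
    with tempfile.TemporaryFile() as f:
        f.write(data)
        f.flush()
        # No access argument: a writable mapping whose changes reach the file
        with mmap.mmap(f.fileno(), 0) as m:
            m.write(b'LOREM')       # overwrites bytes 0-4; position is now 5
            m.write_byte(ord('_'))  # overwrites byte 5; write_byte() takes an int
            try:
                m[0:5] = b'abc'     # 3 bytes into a 5-byte slice
                size_error = False
            except IndexError:
                size_error = True   # slice assignment must keep the same size
            m.flush()               # push the changes to the file
        f.seek(0)
        return f.read(), size_error
```

With this sketch, overwrite_demo(b'Lorem ipsum') should return (b'LOREM_ipsum', True).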
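With ACCESS_COPY the mapping is copy-on-write: changes are made only to the in-memory copy and are never written back to the file on disk.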
import mmap
import shutil

# Start from a fresh copy of the example file
shutil.copyfile('lorem.txt', 'lorem_copy.txt')

word = b'consectetuer'
reversed_word = word[::-1]

with open('lorem_copy.txt', 'r+b') as f:
    # ACCESS_COPY: copy-on-write, changes stay in memory only
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_COPY) as m:
        print('Memory Before:')
        print(m.readline().rstrip().decode())
        print('File Before :')
        print(f.readline().rstrip().decode())
        print()

        m.seek(0)  # rewind
        loc = m.find(word)
        m[loc:loc + len(word)] = reversed_word

        m.seek(0)  # rewind
        print('Memory After :')
        print(m.readline().rstrip().decode())

        f.seek(0)  # rewind
        print('File After :')
        print(f.readline().rstrip().decode())
Output:
$ python mmap_write_copy.py
Memory Before:
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec
File Before :
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec

Memory After :
Lorem ipsum dolor sit amet, reutetcesnoc adipiscing elit. Donec
File After :
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Donec
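To compare the three access modes directly, the following sketch applies the same one-byte change under ACCESS_READ, ACCESS_WRITE and ACCESS_COPY and then reads the file back. It assumes a temporary file, and the helper edit_with is hypothetical. A read-only mapping refuses the change with TypeError, ACCESS_WRITE changes both the mapping and the file, and ACCESS_COPY changes only the mapping.

```python
import mmap
import tempfile


def edit_with(access, data=b'lorem'):
    """Try to capitalize the first byte through a mapping opened with `access` (hypothetical helper).

    Returns (bytes seen through the mapping, or None if the write was refused,
             bytes in the file afterwards).
    """
    with tempfile.TemporaryFile() as f:
        f.write(data)
        f.flush()
        with mmap.mmap(f.fileno(), 0, access=access) as m:
            try:
                m[0:1] = b'L'
                m.flush()        # no-op for ACCESS_COPY mappings
                seen = m[:]
            except TypeError:    # raised when the mapping is ACCESS_READ
                seen = None
        f.seek(0)
        return seen, f.read()


for name in ('ACCESS_READ', 'ACCESS_WRITE', 'ACCESS_COPY'):
    print(name, edit_with(getattr(mmap, name)))
```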
Regular expressions:
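    An mmap object can be used in most places where a bytearray is expected, so the re module can search a mapped file directly (in Python 3 the pattern must be a bytes pattern):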
import mmap
import re

# A bytes pattern is required because the mapping exposes bytes, not str
pattern = re.compile(
    rb'(\.\W+)?([^.]?nulla[^.]*?\.)',
    re.DOTALL | re.IGNORECASE | re.MULTILINE,
)

with open('lorem.txt', 'rb') as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        # findall() scans the mapped bytes without reading them into a string first
        for match in pattern.findall(m):
            print(match[1].replace(b'\n', b' ').decode())
Output:
$ python mmap_regex.py
Nulla facilisi.
Nulla feugiat augue eleifend nulla.
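Since the regular expression runs over the mapped bytes, every match also knows its byte offset within the file. A minimal sketch using re.finditer() and match.start() follows; it assumes a temporary file, and find_offsets is a hypothetical helper name. It collects the starting offset of each case-insensitive occurrence of a word without first reading the file into a bytes object.

```python
import mmap
import re
import tempfile


def find_offsets(data, word):
    """Return the byte offsets of every case-insensitive occurrence of `word` (hypothetical helper)."""
    # re.escape() accepts bytes and returns a bytes pattern, matching the mapped data
    pattern = re.compile(re.escape(word), re.IGNORECASE)
    with tempfile.TemporaryFile() as f:
        f.write(data)
        f.flush()
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            # match.start() is the offset of the match inside the mapped file
            return [match.start() for match in pattern.finditer(m)]
```

With this sketch, find_offsets(b'Nulla facilisi. Nulla feugiat nulla.', b'nulla') should return [0, 16, 30].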
References: mmap (http://docs.python.org/lib/module-mmap.html)