
Using mongo's mapreduce from Python for simple statistics and group by operations

python · 水墨上仙 · 1916 views

This post uses mongo's mapreduce from Python to do simple statistics and group by operations. mapreduce performs quite well and can stand in for SQL's group by.

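The data in mongo looks like this:
doc1 = {
    "freq": 1
    ...
}
doc2 = {
    "freq": 3
    ...
}
The goal is to count the number of documents with freq=1, the number with freq=2, and so on.
This is a typical mapreduce task, so it is a good chance to try out mongo's mapreduce.
It works reasonably well for simple aggregations like this one; whether it suits more complex applications remains to be seen.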
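To make the task concrete, here is a small in-memory sketch of what the map, group, and reduce steps compute, written in plain Python with no MongoDB involved. The names `map_doc`, `reduce_counts`, and `freq_distribution` are illustrative assumptions, not part of pymongo, and the sample documents are made up to match the shape shown above.

```python
from collections import defaultdict


def map_doc(doc):
    """Mimic the map step: emit one (key, value) pair per document."""
    return doc["freq"], {"count": 1}


def reduce_counts(key, values):
    """Mimic the reduce step: sum the partial counts for one key.

    The return value has the same shape as each input value, so a
    reduced result can safely be fed into another reduce call.
    """
    return {"count": sum(v["count"] for v in values)}


def freq_distribution(docs):
    """Group the emitted values by key, then reduce each group."""
    groups = defaultdict(list)
    for doc in docs:
        key, value = map_doc(doc)
        groups[key].append(value)
    return {key: reduce_counts(key, values) for key, values in groups.items()}


# Hypothetical sample data: two documents with freq=1, one with freq=3.
docs = [{"freq": 1}, {"freq": 3}, {"freq": 1}]
result = freq_distribution(docs)
print(result)  # {1: {'count': 2}, 3: {'count': 1}}
```

Returning `{count: total}` rather than a bare number matters because a reduce function may be called again on its own earlier output for the same key, so its input and output shapes should match.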

#!/usr/bin/env python
# Requires PyMongo 3.x: Collection.map_reduce was removed in PyMongo 4.0.
import pymongo
from bson.code import Code


def calc_freq_distribution(collection_handler):
    """Count documents per freq value and write the counts to a CSV file."""
    out_collection_name = collection_handler.name + '_freqdist'
    # Map step: emit one {count: 1} value per document, keyed by its freq.
    map = Code("function () {"
               "  emit(this.freq, {count: 1});"
               "}")

    # Reduce step: sum the partial counts emitted for one freq value.
    reduce = Code("function (key, values) {"
                  "  var total = 0;"
                  "  for (var i = 0; i < values.length; i++) {"
                  "    total += values[i].count;"
                  "  }"
                  "  return {count: total};"
                  "}")
    # Results are stored in out_collection_name; map_reduce returns that collection.
    result = collection_handler.map_reduce(map, reduce, out=out_collection_name)
    fname = out_collection_name + '.csv'
    with open(fname, 'w') as f:
        for doc in result.find():
            f.write(','.join([str(doc['_id']), str(doc['value']['count'])]) + '\n')
    # Return the CSV file name so the caller has something meaningful to print.
    return fname


if __name__ == '__main__':
    # pymongo.Connection was removed in PyMongo 3.0; MongoClient replaces it.
    conn = pymongo.MongoClient('192.168.1.1', 27018)
    input_collection = conn.cname.things
    print(calc_freq_distribution(input_collection))

    merge_ham = conn.antispam.mergeham
    print(calc_freq_distribution(merge_ham))
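
For a plain group-and-count like this one, MongoDB's aggregation framework is an alternative to map-reduce worth knowing about. The sketch below is not from the original script. It assumes a collection object exposing pymongo's `Collection.aggregate` method, and the helper names `build_freq_pipeline` and `freq_counts` are hypothetical.

```python
def build_freq_pipeline():
    """Return a pipeline that counts documents per distinct freq value."""
    return [
        # Rough equivalent of SQL: SELECT freq, COUNT(*) ... GROUP BY freq
        {"$group": {"_id": "$freq", "count": {"$sum": 1}}},
        # Sort by freq so the output order is stable.
        {"$sort": {"_id": 1}},
    ]


def freq_counts(collection):
    """Run the pipeline and return a list of (freq, count) tuples.

    `collection` is assumed to be a pymongo Collection; any object with
    a compatible aggregate() method works as well.
    """
    pipeline = build_freq_pipeline()
    return [(doc["_id"], doc["count"]) for doc in collection.aggregate(pipeline)]
```

With a live connection this might be called as `freq_counts(conn.cname.things)`; unlike the map-reduce version, it returns the counts directly instead of writing them to an output collection first.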

