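bert-as-service uses BERT as a sentence encoder and hosts it as a service over ZeroMQ, so mapping sentences to fixed-length vector representations takes only two lines of code.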
Prerequisites
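Windows 10 + Python 3.5 + TensorFlow 1.2.1 (bert-as-service itself requires TensorFlow ≥ 1.10; the server status output further down reports 1.10.0)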
Installation
- Install TensorFlow.
- Install bert-as-service
bert-as-service requires Python ≥ 3.5 and TensorFlow ≥ 1.10:
pip install bert-serving-server
pip install bert-serving-client
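The version requirement can be checked before installing. The following is a minimal sketch and not part of bert-as-service: `parse_version` and `meets_requirements` are hypothetical helper names, and the thresholds are the ones stated in this step. Versions are compared as integer tuples because a plain string comparison would rank "1.2.1" above "1.10.0".

```python
import re


def parse_version(text):
    """Turn a version string such as '1.10.0' into a tuple of ints, e.g. (1, 10, 0)."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep only the leading digits of each piece
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_requirements(python_version, tensorflow_version):
    """Check Python >= 3.5 and TensorFlow >= 1.10 (thresholds taken from this post)."""
    python_ok = tuple(python_version[:2]) >= (3, 5)
    tensorflow_ok = parse_version(tensorflow_version) >= (1, 10)
    return python_ok and tensorflow_ok
```

With TensorFlow already installed, a call such as `meets_requirements(sys.version_info, tf.__version__)` would report whether the current environment qualifies.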
- Download the pretrained Chinese BERT model
  - BERT-Base, Uncased: 12-layer, 768-hidden, 12-heads, 110M parameters
  - BERT-Large, Uncased: 24-layer, 1024-hidden, 16-heads, 340M parameters
  - BERT-Base, Cased: 12-layer, 768-hidden, 12-heads, 110M parameters
  - BERT-Large, Cased: 24-layer, 1024-hidden, 16-heads, 340M parameters
  - BERT-Base, Multilingual Cased (New): 104 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
  - BERT-Base, Multilingual Cased (Old): 102 languages, 12-layer, 768-hidden, 12-heads, 110M parameters
  - BERT-Base, Chinese: Chinese Simplified and Traditional, 12-layer, 768-hidden, 12-heads, 110M parameters
- Start the bert-as-service server
Replace the model path with your own:
bert-serving-start -model_dir /tmp/english_L-12_H-768_A-12/ -num_worker=2
usage: xxxx\Anaconda3\envs\py35\Scripts\bert-serving-start -model_dir D:\env\bert\chinese_L-12_H-768_A-12 -num_worker=2
ARG VALUE
__________________________________________________
ckpt_name = bert_model.ckpt
config_name = bert_config.json
cors = *
cpu = False
device_map = []
do_lower_case = True
fixed_embed_length = False
fp16 = False
gpu_memory_fraction = 0.5
graph_tmp_dir = None
http_max_connect = 10
http_port = None
mask_cls_sep = False
max_batch_size = 256
max_seq_len = 25
model_dir = D:\env\bert\chinese_L-12_H-768_A-12
no_position_embeddings = False
no_special_token = False
num_worker = 2
pooling_layer = [-2]
pooling_strategy = REDUCE_MEAN
port = 5555
port_out = 5556
prefetch_size = 10
priority_batch_size = 16
show_tokens_to_client = False
tuned_model_dir = None
verbose = False
xla = False
I:VENTILATOR:freeze, optimize and export graph, could take a while...
I:GRAPHOPT:model config: D:\env\bert\chinese_L-12_H-768_A-12\bert_config.json
I:GRAPHOPT:checkpoint: D:\env\bert\chinese_L-12_H-768_A-12\bert_model.ckpt
I:GRAPHOPT:build graph...
I:GRAPHOPT:load parameters from checkpoint...
I:GRAPHOPT:optimize...
I:GRAPHOPT:freeze...
I:GRAPHOPT:write graph to a tmp file: C:\Users\Memento\AppData\Local\Temp\tmpo07002um
I:VENTILATOR:bind all sockets
I:VENTILATOR:open 8 ventilator-worker sockets
I:VENTILATOR:start the sink
I:SINK:ready
I:VENTILATOR:get devices
W:VENTILATOR:no GPU available, fall back to CPU
I:VENTILATOR:device map:
worker 0 -> cpu
worker 1 -> cpu
I:WORKER-0:use device cpu, load graph from C:\Users\Memento\AppData\Local\Temp\tmpo07002um
I:WORKER-1:use device cpu, load graph from C:\Users\Memento\AppData\Local\Temp\tmpo07002um
I:WORKER-0:ready and listening!
I:WORKER-1:ready and listening!
I:VENTILATOR:all set, ready to serve request!
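The startup sequence above can also be driven from a script. The following is a rough sketch rather than anything shipped with bert-as-service: `build_command`, `wait_until_ready` and `start_server` are hypothetical helpers, the flags are the ones used in this post, and readiness is detected by looking for the text of the final log line. It assumes `bert-serving-start` is on the PATH.

```python
import subprocess

# Text of the last log line printed once all workers are up (taken from the log above).
READY_MARKER = "all set, ready to serve request!"


def build_command(model_dir, num_worker=2, http_port=None):
    """Assemble the bert-serving-start command line from the flags used in this post."""
    command = ["bert-serving-start", "-model_dir", model_dir, "-num_worker", str(num_worker)]
    if http_port is not None:
        command += ["-http_port", str(http_port)]
    return command


def wait_until_ready(lines):
    """Consume log lines until the ready marker appears; return False if the stream ends first."""
    for line in lines:
        if READY_MARKER in line:
            return True
    return False


def start_server(model_dir, num_worker=2):
    """Launch the server and block until it reports readiness."""
    process = subprocess.Popen(
        build_command(model_dir, num_worker),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge both streams so the log is captured either way
        universal_newlines=True,
    )
    if not wait_until_ready(process.stdout):
        raise RuntimeError("server exited before becoming ready")
    return process
```

A call such as `start_server(r"D:\env\bert\chinese_L-12_H-768_A-12")` would return the running process. One caveat of this design: after readiness the caller should keep reading `process.stdout`, otherwise the pipe may eventually fill up and stall the server.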
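- Call the bert-as-service server from Python
from bert_serving.client import BertClient
# Connect to the server started above (default ports 5555/5556).
bc = BertClient(ip="localhost", check_version=False, check_length=False)
vec = bc.encode(['你好', '你好呀', '我很好'])
print(vec)
Output: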
[[ 0.2894022 -0.13572647 0.07591158 ... -0.14091237 0.54630077
-0.30118054]
[ 0.4535432 -0.03180456 0.3459639 ... -0.3121457 0.42606848
-0.50814617]
[ 0.6313594 -0.22302179 0.16799903 ... -0.1614125 0.23098437
-0.5840646 ]]
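A common next step is to compare the returned vectors with each other. The sketch below is not part of bert-as-service: `cosine_similarity` and `most_similar` are hypothetical helpers written in plain Python, and they accept any sequences of numbers, including the rows of the array printed above.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def most_similar(query, candidates):
    """Return the index of the candidate vector with the highest cosine similarity to the query."""
    scores = [cosine_similarity(query, candidate) for candidate in candidates]
    return max(range(len(scores)), key=scores.__getitem__)
```

For instance, `cosine_similarity(vec[0], vec[1])` would score the first two sentences against each other, and `most_similar(vec[0], vec[1:])` would pick the closer of the remaining two.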
Highlights
- State-of-the-art: build on pretrained 12/24-layer BERT models released by Google AI, which is considered as a milestone in the NLP community.
- Easy-to-use: require only two lines of code to get sentence/token-level encodes.
- Fast: 900 sentences/s on a single Tesla M40 24GB. Low latency, optimized for speed. See benchmark.
- Scalable: scale nicely and smoothly on multiple GPUs and multiple clients without worrying about concurrency. See benchmark.
- Reliable: tested on multi-billion sentences; days of running without a break or OOM or any nasty exceptions.
Monitoring dashboard
Add the parameter -http_port 8081 when starting the server
to expose a query service on port 8081.
Request http://localhost:8081/status/server
to see the server status:
{
"ckpt_name": "bert_model.ckpt",
"client": "7a033047-f177-45fd-9ef5-45781b10d322",
"config_name": "bert_config.json",
"cors": "*",
"cpu": false,
"device_map": [],
"do_lower_case": true,
"fixed_embed_length": false,
"fp16": false,
"gpu_memory_fraction": 0.5,
"graph_tmp_dir": null,
"http_max_connect": 10,
"http_port": 8081,
"mask_cls_sep": false,
"max_batch_size": 256,
"max_seq_len": 25,
"model_dir": "D:\\env\\bert\\chinese_L-12_H-768_A-12",
"no_position_embeddings": false,
"no_special_token": false,
"num_concurrent_socket": 8,
"num_process": 3,
"num_worker": 1,
"pooling_layer": [
-2
],
"pooling_strategy": 2,
"port": 5555,
"port_out": 5556,
"prefetch_size": 10,
"priority_batch_size": 16,
"python_version": "3.5.6 |Anaconda, Inc.| (default, Aug 26 2018, 16:05:27) [MSC v.1900 64 bit (AMD64)]",
"pyzmq_version": "20.0.0",
"server_current_time": "2021-03-03 15:53:03.859211",
"server_start_time": "2021-03-03 10:00:21.128310",
"server_version": "1.10.0",
"show_tokens_to_client": false,
"statistic": {
"avg_last_two_interval": 1665.306127225,
"avg_request_per_client": 8.333333333333334,
"avg_request_per_second": 0.09246377980293276,
"avg_size_per_request": 102.58333333333333,
"max_last_two_interval": 17484.7365829,
"max_request_per_client": 53,
"max_request_per_second": 0.9194538223647459,
"max_size_per_request": 601,
"min_last_two_interval": 1.087602199997491,
"min_request_per_client": 2,
"min_request_per_second": 0.00005719274038008647,
"min_size_per_request": 1,
"num_active_client": 0,
"num_data_request": 12,
"num_max_last_two_interval": 1,
"num_max_request_per_client": 1,
"num_max_request_per_second": 1,
"num_max_size_per_request": 1,
"num_min_last_two_interval": 1,
"num_min_request_per_client": 6,
"num_min_request_per_second": 1,
"num_min_size_per_request": 1,
"num_sys_request": 63,
"num_total_client": 9,
"num_total_request": 75,
"num_total_seq": 1231
},
"status": 200,
"tensorflow_version": [
"1",
"10",
"0"
],
"tuned_model_dir": null,
"ventilator -> worker": [
"tcp://127.0.0.1:52440",
"tcp://127.0.0.1:52441",
"tcp://127.0.0.1:52442",
"tcp://127.0.0.1:52443",
"tcp://127.0.0.1:52444",
"tcp://127.0.0.1:52445",
"tcp://127.0.0.1:52446",
"tcp://127.0.0.1:52447"
],
"ventilator <-> sink": "tcp://127.0.0.1:52439",
"verbose": false,
"worker -> sink": "tcp://127.0.0.1:52467",
"xla": false,
"zmq_version": "4.3.3"
}
You can then build a front end to visualize this data, or use plugin/dashboard from the bert-as-service project directly.
References:
- https://github.com/hanxiao/bert-as-service#monitoring-the-service-status-in-a-dashboard
- https://bert-as-service.readthedocs.io/en/latest/tutorial/add-monitor.html
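Before building a full dashboard, the status endpoint can be polled from a short script. This is a minimal sketch using only the standard library; `fetch_status` and `summarize_status` are hypothetical names, the URL assumes the server was started with -http_port 8081 as above, and the selected fields are simply a few of those visible in the JSON output.

```python
import json
import urllib.request


def fetch_status(base_url="http://localhost:8081"):
    """GET /status/server and decode the JSON body."""
    with urllib.request.urlopen(base_url + "/status/server", timeout=5) as response:
        return json.loads(response.read().decode("utf-8"))


def summarize_status(status):
    """Pick a few fields from the status document for a compact health summary."""
    stats = status.get("statistic", {})
    return {
        "server_version": status.get("server_version"),
        "num_worker": status.get("num_worker"),
        "total_requests": stats.get("num_total_request"),
        "total_sequences": stats.get("num_total_seq"),
        "active_clients": stats.get("num_active_client"),
    }
```

With the server running, `print(summarize_status(fetch_status()))` would print the selected fields. Missing keys come back as `None` instead of raising, which keeps the sketch tolerant of differences between server versions.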
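Q&A
Q: Starting the bert-as-service server fails with a message that cudart64_100.dll is missing.
A: Download the DLL file and place it in the C:\Windows\System32 directory, then reopen the command-line window and run the command again.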
Q: fail to optimize the graph!, TypeError: cannot unpack non-iterable NoneType object
A: Downgrade to TensorFlow 1.10.0 and make sure the model path is an absolute path:
pip uninstall tensorflow
pip uninstall tensorflow-estimator
conda install --channel https://conda.anaconda.org/aaronzs tensorflow
References:
- https://github.com/hanxiao/bert-as-service/issues/467
- https://blog.csdn.net/cktcrawl/article/details/103028725
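The path half of that answer can be verified before launching the server. The following sketch is an assumption-laden helper, not part of bert-as-service: `check_model_dir` is a hypothetical name, and the file names it looks for are the `config_name` and `ckpt_name` values printed in the server's argument table earlier in this post.

```python
import os


def check_model_dir(model_dir):
    """Return a list of human-readable problems; an empty list means the directory looks usable."""
    problems = []
    if not os.path.isabs(model_dir):
        problems.append("model_dir is not an absolute path")
    if not os.path.isdir(model_dir):
        problems.append("model_dir does not exist or is not a directory")
        return problems
    names = os.listdir(model_dir)
    if "bert_config.json" not in names:
        problems.append("missing bert_config.json")
    # A TensorFlow checkpoint is stored as several files sharing the prefix bert_model.ckpt.
    if not any(name.startswith("bert_model.ckpt") for name in names):
        problems.append("no files with the prefix bert_model.ckpt")
    return problems
```

Calling `check_model_dir(r"D:\env\bert\chinese_L-12_H-768_A-12")` would return an empty list if the directory is absolute and contains the expected files. It does not validate the TensorFlow version, so the downgrade step still applies.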