Modify the corresponding lines in cpu_example.py (or gpu_example.py):
model_id = "bert-base-uncased"
We can also load a model from a directory containing a saved pre-trained model:
model_id = "your_saved_model"  # directory of the saved model
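For context, here is a minimal sketch of how these lines fit into the example script; the input sentence is an illustrative placeholder, and the tokenizer and model calls follow the standard transformers API:

```python
import torch
import transformers
import turbo_transformers

model_id = "bert-base-uncased"  # or "your_saved_model" to load from a local directory
tokenizer = transformers.BertTokenizer.from_pretrained(model_id)
torch_model = transformers.BertModel.from_pretrained(model_id)
torch_model.eval()

# Wrap the loaded PyTorch model with turbo_transformers' accelerated runtime.
tt_model = turbo_transformers.BertModel.from_torch(torch_model)

input_ids = torch.tensor(
    tokenizer.encode("Test turbo_transformers.", add_special_tokens=True)
).unsqueeze(0)  # batch of one sentence
res = tt_model(input_ids)
```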
cd /workspace
python tools/convert_huggingface_bert_tf_to_npz.py bert-base-uncased /workspace/bert_tf.npz
Update the corresponding lines in bert_example.py:
tt_model = turbo_transformers.BertModel.from_npz(
'/workspace/bert_tf.npz', cfg)
python bert_example.py
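For reference, a minimal sketch of how the npz checkpoint is loaded; it assumes cfg is a transformers.BertConfig whose architecture matches the converted checkpoint (the default BertConfig corresponds to bert-base-uncased):

```python
import transformers
import turbo_transformers

# cfg must describe the same architecture as the converted checkpoint.
cfg = transformers.BertConfig()  # defaults match bert-base-uncased

tt_model = turbo_transformers.BertModel.from_npz(
    '/workspace/bert_tf.npz', cfg)
```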
Attention: if you want to use Turbo with the C++ backend instead of onnxruntime, do not directly link the MKL that comes with a conda-installed PyTorch; doing so leads to poor performance in our hand-crafted C++ version. Instead, install an official MKL and set the MKL path in CMakeLists.txt. As a less elegant alternative, you can uninstall OpenNMT-py and downgrade torch to 1.1.0.
I have prepared a BERT-only runtime image on Docker Hub:
thufeifeibear/turbo_transformers_cpu:bert_only_v0.1
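To use it, pull the image and start a container; the volume mount below is illustrative and assumes your working files live in the current directory:

```bash
docker pull thufeifeibear/turbo_transformers_cpu:bert_only_v0.1
docker run -it --rm -v $PWD:/workspace \
    thufeifeibear/turbo_transformers_cpu:bert_only_v0.1 /bin/bash
```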
Because TurboTransformers accelerates the major hotspots (embedding + BERT encoder + pooler), users may have to implement the less time-consuming post-processing layers themselves, according to their own needs. We take a classification task as an example: it requires a Linear layer after the pooler.
To do so, implement __init__, __call__, from_torch, and from_pretrained. For the full implementation and a description of the class, refer to bert_for_sequence_classification_example.py; a sketch follows below.
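A minimal sketch of such a class, with hypothetical names, assuming turbo's BertModel returns (sequence_output, pooled_output) like the HuggingFace model; the actual implementation is in bert_for_sequence_classification_example.py:

```python
import transformers
import turbo_transformers


class BertForSequenceClassification:
    """Hypothetical wrapper: turbo-accelerated encoder + plain torch classifier."""

    def __init__(self, bert_model, classifier):
        self.bert = bert_model        # turbo_transformers.BertModel (embedding + encoder + pooler)
        self.classifier = classifier  # torch.nn.Linear post-processing layer

    def __call__(self, input_ids):
        # Assumes the turbo model mirrors HF's (sequence_output, pooled_output) return.
        _, pooled_output = self.bert(input_ids)
        return self.classifier(pooled_output)

    @staticmethod
    def from_torch(torch_model):
        # Accelerate only the hot path; reuse the torch classifier weights as-is.
        bert = turbo_transformers.BertModel.from_torch(torch_model.bert)
        return BertForSequenceClassification(bert, torch_model.classifier)

    @staticmethod
    def from_pretrained(model_id):
        torch_model = transformers.BertForSequenceClassification.from_pretrained(model_id)
        torch_model.eval()
        return BertForSequenceClassification.from_torch(torch_model)
```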