Environment:
1. OS: Windows 11
2. Python: 3.9.10
1. Installation
pip install bert-score
2. Example Code
This example was generated by ChatGPT; both the reference sentences and the candidate (generated) sentences are passed in as lists.
from bert_score import score

# define the reference sentences and the candidate sentences
refs = ["the cat sat on the mat.", "it was raining outside."]
cands = ["the cat sat on the mat.", "it was pouring outside."]

# compute BERTScore (verbose must be the Python boolean True, not "true")
p, r, f1 = score(cands, refs, lang='en', model_type="roberta-large", verbose=True)

# print the results (each is a tensor with one value per candidate)
print("Precision:", p)
print("Recall:", r)
print("F1 score:", f1)
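`score()` returns three torch tensors (precision, recall, F1), one entry per candidate sentence. To report a single corpus-level number, average the F1 tensor with `f1.mean().item()`. A stand-in sketch of the same aggregation with plain Python floats (the values are illustrative, not real BERTScore outputs):

```python
# stand-in per-candidate F1 values; with bert_score you would instead
# call f1.mean().item() on the returned tensor
f1_values = [1.0, 0.92]

# corpus-level F1 is the mean over candidates
avg_f1 = sum(f1_values) / len(f1_values)
print(f"avg F1: {avg_f1:.2f}")  # prints "avg F1: 0.96"
```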
3. Runtime Error
OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like roberta-large is not the path to a directory containing a file named config.json.
Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.
4. Solution (if huggingface.co is reachable)
Because the code above first checks whether the roberta-large model files exist locally, you can pre-download them into the local .cache directory with the following code.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# downloading once caches the files under ~/.cache/huggingface
bert_tokenizer = AutoTokenizer.from_pretrained('roberta-large')
bert_model = AutoModelForSequenceClassification.from_pretrained('roberta-large')
5. Solution (if huggingface.co is blocked)
Because the code above first checks whether the roberta-large model files exist locally, you can instead download the model files from a domestic Hugging Face mirror site and place them in the correct location.
Directory structure of the local cache when the roberta-large model files are absent:
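Alternatively, recent versions of huggingface_hub honor the `HF_ENDPOINT` environment variable, so you can point downloads at a mirror without editing any code. A sketch for Windows cmd (hf-mirror.com is one widely used mirror; this is an assumption — substitute whichever mirror you trust, and `your_script.py` stands for your own script):

```shell
:: point huggingface_hub / transformers at a mirror for this session
set HF_ENDPOINT=https://hf-mirror.com
python your_script.py
```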
c:\users\user\.cache\huggingface> tree
c:.
├─datasets
├─hub
├─metrics
│ └─rouge
│ └─default
└─modules
Directory structure of the local cache once the roberta-large model files are present (722…1d59 is a SHA hash identifying the snapshot):
c:\users\user\.cache\huggingface> tree
c:.
├─datasets
├─hub
│ ├─.locks
│ │ └─models--roberta-large
│ └─models--roberta-large
│ ├─blobs
│ ├─refs
│ │ └─main
│ └─snapshots
│ └─722cf37b1afa9454edce342e7895e588b6ff1d59
│ ├─config.json
│ ├─merges.txt
│ ├─pytorch_model.bin
│ ├─tokenizer.json
│ ├─tokenizer_config.json
│ └─vocab.json
├─metrics
│ └─rouge
│ └─default
└─modules
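After placing the downloaded files, you can check the expected layout from Python before rerunning bert-score. A minimal sketch (paths assume the default cache location shown in the tree above):

```python
from pathlib import Path

# default Hugging Face hub cache, matching the tree listing above
hub_cache = Path.home() / ".cache" / "huggingface" / "hub"
model_dir = hub_cache / "models--roberta-large"

# the actual model files live under snapshots/<commit-hash>/
snapshots = model_dir / "snapshots"
if snapshots.is_dir():
    for snap in snapshots.iterdir():
        print(snap.name, sorted(p.name for p in snap.iterdir()))
else:
    print("roberta-large is not cached yet:", model_dir)
```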
6. Test Code
from transformers import pipeline

# quick sanity check that transformers can load and run a model
classifier = pipeline("sentiment-analysis")
res = classifier(["we are very happy.", "we are very sad."])
print(res)