Getting Started with HanLP in Java: A Detailed Installation and Usage Tutorial

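HanLP is an NLP toolkit written in Java. It is made up of a collection of models and algorithms, and its goal is to bring natural language processing into production use. HanLP is full-featured, efficient, cleanly architected, built on up-to-date corpora, and customizable, which makes it easy to pick up. This tutorial walks through the basics of using HanLP from Java.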

1. Installing HanLP

Maven dependency

<dependency>
    <groupId>com.hankcs</groupId>
    <artifactId>hanlp</artifactId>
    <version>portable-1.8.4</version> <!-- check the official site for the latest version -->
</dependency>

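Note: the portable version bundles a small built-in dictionary, which is enough for basic tasks. For the full feature set, you need to download the complete data package.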

2. Basic features

(1) Word segmentation

import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.seg.common.Term;
import java.util.List;

public class BasicDemo {
    public static void main(String[] args) {
        String text = "你好，欢迎使用HanLP！这是一段测试文本。";
        // Standard segmentation
        List<Term> termList = HanLP.segment(text);
        System.out.println(termList);
        // Sample output (tags may vary with the dictionary version):
        // [你好/vl, ，/w, 欢迎/v, 使用/v, HanLP/nx, ！/w, 这是/r, 一段/m, 测试/vn, 文本/n, 。/w]
    }
}

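(2) Part-of-speech tagging

HanLP's segmentation result already carries a part-of-speech tag for each term (for example, n = noun, v = verb). Iterating over the termList from the previous example: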

for (Term term : termList) {
    System.out.println(term.word + " : " + term.nature);
}

Common part-of-speech tags:

n: noun
v: verb
w: punctuation
nx: foreign word
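
To make the tag table above concrete, here is a minimal sketch that splits the printed `word/tag` form and looks up a description for the tag. It uses only the standard library. The class `PosTagSketch`, its methods, and the lookup table are hypothetical helpers for illustration, not part of HanLP, and the table covers only the four tags listed above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PosTagSketch {
    // Only the four tags listed in this tutorial; the real tag set is much larger.
    private static final Map<String, String> TAGS = new LinkedHashMap<>();
    static {
        TAGS.put("n", "noun");
        TAGS.put("v", "verb");
        TAGS.put("w", "punctuation");
        TAGS.put("nx", "foreign word");
    }

    /** Splits "word/tag" at the last slash and returns {word, tag}. */
    public static String[] split(String token) {
        int slash = token.lastIndexOf('/');
        if (slash < 0) {
            return new String[]{token, ""};
        }
        return new String[]{token.substring(0, slash), token.substring(slash + 1)};
    }

    /** Exact match first, then fall back to the first letter (e.g. "vn" is treated as a verb subtype). */
    public static String describe(String tag) {
        if (TAGS.containsKey(tag)) {
            return TAGS.get(tag);
        }
        if (!tag.isEmpty() && TAGS.containsKey(tag.substring(0, 1))) {
            return TAGS.get(tag.substring(0, 1)) + " (subtype " + tag + ")";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        // Tokens copied from the sample segmentation output earlier in this tutorial.
        for (String token : new String[]{"欢迎/v", "HanLP/nx", "测试/vn", "。/w"}) {
            String[] parts = split(token);
            System.out.println(parts[0] + " : " + parts[1] + " (" + describe(parts[1]) + ")");
        }
    }
}
```

The first-letter fallback is a simplification: it assumes that longer tags refine the category named by their first letter, which holds for the tags shown here but should be checked against the full tag set before relying on it.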

3. Advanced features

(1) Keyword extraction

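// HanLP.extractKeyword is backed by com.hankcs.hanlp.summary.TextRankKeyword
import com.hankcs.hanlp.HanLP;
import java.util.List;
List<String> keywords = HanLP.extractKeyword(text, 5); // extract the top 5 keywords
System.out.println(keywords); // Sample output: [文本, 测试, HanLP, 欢迎, 使用]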
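
The keyword extractor above is TextRank-based, so a rough sketch of the idea may help: words that co-occur within a small window are linked in a graph, and each word's score is repeatedly recomputed from its neighbours' scores. The code below is a simplified illustration using only the standard library, not HanLP's implementation. The class `TextRankSketch`, the damping factor of 0.85, and the fixed 50 iterations are assumptions, and the input is expected to be already segmented with stop words removed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TextRankSketch {
    static final double DAMPING = 0.85; // assumed value, common in TextRank write-ups
    static final int ITERATIONS = 50;   // fixed iteration count instead of a convergence check

    /** Returns up to k words ranked by a simplified TextRank score. */
    public static List<String> topKeywords(List<String> words, int window, int k) {
        // Build an undirected graph: words at most `window` positions apart are linked.
        Map<String, Set<String>> graph = new LinkedHashMap<>();
        for (int i = 0; i < words.size(); i++) {
            graph.computeIfAbsent(words.get(i), w -> new LinkedHashSet<>());
            for (int j = i + 1; j < words.size() && j <= i + window; j++) {
                if (words.get(i).equals(words.get(j))) {
                    continue; // no self-links
                }
                graph.computeIfAbsent(words.get(j), w -> new LinkedHashSet<>());
                graph.get(words.get(i)).add(words.get(j));
                graph.get(words.get(j)).add(words.get(i));
            }
        }

        // Start every word at 1.0 and iterate the score update.
        Map<String, Double> score = new LinkedHashMap<>();
        for (String w : graph.keySet()) {
            score.put(w, 1.0);
        }
        for (int it = 0; it < ITERATIONS; it++) {
            Map<String, Double> next = new LinkedHashMap<>();
            for (String w : graph.keySet()) {
                double sum = 0.0;
                for (String nb : graph.get(w)) {
                    // Each neighbour shares its score equally among its own neighbours.
                    sum += score.get(nb) / graph.get(nb).size();
                }
                next.put(w, (1 - DAMPING) + DAMPING * sum);
            }
            score = next;
        }

        // Sort by score, highest first; ties keep first-seen order.
        final Map<String, Double> finalScore = score;
        List<String> ranked = new ArrayList<>(graph.keySet());
        ranked.sort((a, b) -> Double.compare(finalScore.get(b), finalScore.get(a)));
        return new ArrayList<>(ranked.subList(0, Math.min(k, ranked.size())));
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("测试", "文本", "分词", "测试", "文本", "关键词");
        System.out.println(topKeywords(words, 2, 3));
    }
}
```

A word that sits next to many different words collects score from all of them, which is why frequent, well-connected terms tend to rise to the top.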

(2) Named entity recognition (NER)

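List<Term> termList = HanLP.segment("马云在阿里巴巴工作。");
for (Term term : termList) {
    if (term.nature.toString().startsWith("nr")) { // nr = person name
        System.out.println("Person: " + term.word);
    } else if (term.nature.toString().startsWith("ns")) { // ns = place name
        System.out.println("Place: " + term.word);
    }
}
// Output as given in the original article: Person: 马云  Place: 阿里巴巴
// (阿里巴巴 is a company; if your dictionary tags it as an organization, nt, it will not be printed)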
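
The prefix checks above generalize naturally to grouping terms by entity type. The sketch below shows that step on its own, using only the standard library. `EntityGroupingSketch` is a hypothetical helper, the (word, tag) pairs are hand-written stand-ins for HanLP's `Term` objects, and the tags in the sample data are assumed rather than produced by an actual run.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class EntityGroupingSketch {
    /** Maps a tag to an entity type by prefix, or returns null if it is not an entity tag. */
    public static String entityType(String nature) {
        if (nature.startsWith("nr")) return "person";       // nr = person name
        if (nature.startsWith("ns")) return "place";        // ns = place name
        if (nature.startsWith("nt")) return "organization"; // nt = organization name
        return null;
    }

    /** Groups {word, tag} pairs by entity type, keeping the order in which words appear. */
    public static Map<String, List<String>> group(List<String[]> terms) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String[] term : terms) {
            String type = entityType(term[1]);
            if (type != null) {
                result.computeIfAbsent(type, t -> new ArrayList<>()).add(term[0]);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Hand-written pairs standing in for segmentation output; the tags are assumed.
        List<String[]> terms = Arrays.asList(
                new String[]{"清华大学", "ntu"},
                new String[]{"位于", "v"},
                new String[]{"北京市", "ns"},
                new String[]{"海淀区", "ns"});
        System.out.println(group(terms));
    }
}
```

With real HanLP output, the same logic would read `term.word` and `term.nature.toString()` instead of the array elements.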

(3) Custom dictionaries

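// Option 1: point HanLP at a custom dictionary file
HanLP.Config.CustomDictionaryPath = new String[]{"data/dictionary/custom/CustomDictionary.txt"};
HanLP.Config.enableDebug(); // optional: turns on debug logging
// Option 2: add words at runtime (requires import com.hankcs.hanlp.dictionary.CustomDictionary)
CustomDictionary.add("量子计算", "n 1024");    // adds the word if absent; "n 1024" = tag and frequency
CustomDictionary.insert("神经网络", "n 1024"); // inserts the word, overwriting any existing entry
// Segment using the custom dictionary
System.out.println(HanLP.segment("量子计算是未来趋势"));
// Output: [量子计算/n, 是/v, 未来/t, 趋势/n]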
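
The second argument in the calls above, `"n 1024"`, packs a part-of-speech tag and a frequency into one string. As a small illustration of that format, the sketch below parses such a string and renders a one-line dictionary entry of the form `word tag frequency`. `DictEntrySketch` is a hypothetical helper, not part of HanLP, and it handles only a single tag/frequency pair.

```java
public class DictEntrySketch {
    public final String word;
    public final String nature;
    public final int frequency;

    private DictEntrySketch(String word, String nature, int frequency) {
        this.word = word;
        this.nature = nature;
        this.frequency = frequency;
    }

    /** Parses an attribute string such as "n 1024" (tag, whitespace, frequency). */
    public static DictEntrySketch of(String word, String natureWithFrequency) {
        String[] parts = natureWithFrequency.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException(
                    "expected \"<tag> <frequency>\", got: " + natureWithFrequency);
        }
        return new DictEntrySketch(word, parts[0], Integer.parseInt(parts[1]));
    }

    /** Renders the entry as a single space-separated dictionary line. */
    public String toDictionaryLine() {
        return word + " " + nature + " " + frequency;
    }

    public static void main(String[] args) {
        DictEntrySketch entry = DictEntrySketch.of("量子计算", "n 1024");
        System.out.println(entry.toDictionaryLine());
    }
}
```

Validating entries this way before writing them to a dictionary file catches malformed lines early, rather than at dictionary load time.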

4. Advanced configuration

(1) Switching segmentation modes

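// Speed-first dictionary segmentation, no part-of-speech tags
// (requires import com.hankcs.hanlp.tokenizer.SpeedTokenizer)
List<Term> fastSegResult = SpeedTokenizer.segment(text);
// Standard segmentation (with part-of-speech tags)
List<Term> stdSegResult = HanLP.segment(text);
// Standard segmenter with person-name recognition switched on.
// For high-accuracy NLP segmentation, HanLP also provides NLPTokenizer, which needs the full data package.
List<Term> nlpSegResult = HanLP.newSegment().enableNameRecognize(true).seg(text);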

(2) Loading the full data package

Download the data package and unzip it.
Then configure hanlp.properties:
```
root=path/to/hanlp-data
```
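
Because a wrong `root` is a common source of "file not found" errors, it can help to check what the setting resolves to. The sketch below reads `root` from properties-format text with `java.util.Properties` and joins it with a relative data path. `RootConfigSketch` and its methods are hypothetical helpers, and the assumption that data paths are resolved relative to `root` should be checked against the HanLP version you use.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

public class RootConfigSketch {
    /** Reads the "root" key from properties-format text; returns null if the key is absent. */
    public static String readRoot(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String root = props.getProperty("root");
        return root == null ? null : root.trim();
    }

    /** Joins root and a relative path with exactly one '/' between them. */
    public static String resolve(String root, String relativePath) {
        return root.endsWith("/") ? root + relativePath : root + "/" + relativePath;
    }

    public static void main(String[] args) {
        String root = readRoot("root=path/to/hanlp-data\n");
        System.out.println(resolve(root, "data/dictionary/custom/CustomDictionary.txt"));
    }
}
```

Note that the properties format treats a backslash as an escape character, so on Windows it is safer to write the path with forward slashes.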

5. Complete example

import com.hankcs.hanlp.HanLP;
import com.hankcs.hanlp.seg.common.Term;
import java.util.List;

public class HanLPFullDemo {
    public static void main(String[] args) {
        String text = "清华大学位于北京市海淀区。";
        // Segmentation + part-of-speech tagging
        List<Term> terms = HanLP.segment(text);
        System.out.println("Segmentation result: " + terms);
        // Named entity recognition (place names)
        terms = HanLP.newSegment().enablePlaceRecognize(true).seg(text);
        for (Term term : terms) {
            if (term.nature.toString().startsWith("ns")) {
                System.out.println("Place: " + term.word);
            }
        }
        // Keyword extraction
        List<String> keywords = HanLP.extractKeyword(text, 3);
        System.out.println("Keywords: " + keywords);
    }
}

Output: