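[AI] Transformers Pipeline (Overview): Minimal-Code Use of 300k+ Large Models

1. Introduction

2. The pipeline library

2.1 Overview

2.2 Instantiating a pipeline object from a task

2.2.1 Instantiating "automatic speech recognition" from a task

2.2.2 Task list

2.2.3 Default model per task

2.3 Instantiating a pipeline object from a model

2.3.1 Instantiating "automatic speech recognition" from a model

2.3.2 Looking up which task a model belongs to

3. Summary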

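1. Introduction

pipeline is an abstraction in the Hugging Face transformers library that offers a minimal way to run inference with large models. It groups all models into 4 broad categories: audio, computer vision, natural language processing (NLP), and multimodal. These break down into 28 tasks and together cover about 320,000 models.

This article introduces pipeline as a whole. Later articles in this column each take one task as their topic and show how to use it.

2. The pipeline library

2.1 Overview

A pipeline is a simple and convenient way to use a model for inference. Pipelines are objects that abstract away most of the library's complex code and expose a simple API dedicated to several tasks, including named entity recognition, masked language modeling, sentiment analysis, feature extraction, and question answering. There are two main ways to use them: instantiating from a task (2.2) or from a model (2.3).

2.2 Instantiating a pipeline object from a task

2.2.1 Instantiating "automatic speech recognition" from a task

The task for automatic speech recognition is automatic-speech-recognition: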

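import os

# Set these before importing transformers so they take effect.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"  # download models through a mirror
os.environ["CUDA_VISIBLE_DEVICES"] = "2"  # expose only GPU 2 to this process

from transformers import pipeline

speech_file = "./output_video_enhanced.mp3"
# No model is given, so the task's default model is used (see 2.2.3).
pipe = pipeline(task="automatic-speech-recognition")
result = pipe(speech_file)
print(result)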
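
The two environment variables are set before `from transformers import pipeline` for a reason: to my understanding, `huggingface_hub` reads `HF_ENDPOINT` once at import time, and `CUDA_VISIBLE_DEVICES` has to be in place before CUDA is initialized. Both names are case-sensitive on Linux. A minimal sketch of a helper that makes this ordering explicit; `configure_hf_env` is a hypothetical name, not part of any library:

```python
import os


def configure_hf_env(endpoint=None, gpu_ids=None):
    """Set Hugging Face / CUDA environment variables for this process.

    Hypothetical helper: call it BEFORE importing transformers or torch.
    Returns the resulting values of both variables.
    """
    if endpoint is not None:
        # Mirror (or official endpoint) used for model downloads.
        os.environ["HF_ENDPOINT"] = endpoint
    if gpu_ids is not None:
        # Comma-separated GPU indices, e.g. "2" or "0,1".
        os.environ["CUDA_VISIBLE_DEVICES"] = ",".join(str(i) for i in gpu_ids)
    return {k: os.environ.get(k) for k in ("HF_ENDPOINT", "CUDA_VISIBLE_DEVICES")}


settings = configure_hf_env(endpoint="https://hf-mirror.com", gpu_ids=[2])
print(settings)
```

Only after this call would the script go on to import `pipeline` and build the pipeline object.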

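2.2.2 Task list

There are 28 tasks in total. To use one, replace the task passed to pipeline in the 2.2.1 code with its name; the names are the keys of the dictionary shown in 2.2.3.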
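
The task names can be read off the keys of the dictionary quoted in 2.2.3. A minimal sketch that collects those 28 names together with their `type` field and validates a name before it is handed to `pipeline`; `TASK_TYPES`, `check_task`, and `tasks_by_type` are illustrative names, not library API, and the mapping reflects only the version of the source quoted in this article:

```python
import difflib
from collections import defaultdict

# Task name -> "type" field, copied from the dictionary quoted in 2.2.3.
TASK_TYPES = {
    "audio-classification": "audio",
    "automatic-speech-recognition": "multimodal",
    "text-to-audio": "text",
    "feature-extraction": "multimodal",
    "text-classification": "text",
    "token-classification": "text",
    "question-answering": "text",
    "table-question-answering": "text",
    "visual-question-answering": "multimodal",
    "document-question-answering": "multimodal",
    "fill-mask": "text",
    "summarization": "text",
    "translation": "text",
    "text2text-generation": "text",
    "text-generation": "text",
    "zero-shot-classification": "text",
    "zero-shot-image-classification": "multimodal",
    "zero-shot-audio-classification": "multimodal",
    "image-classification": "image",
    "image-feature-extraction": "image",
    "image-segmentation": "multimodal",
    "image-to-text": "multimodal",
    "object-detection": "multimodal",
    "zero-shot-object-detection": "multimodal",
    "depth-estimation": "image",
    "video-classification": "video",
    "mask-generation": "multimodal",
    "image-to-image": "image",
}


def check_task(name):
    """Return name if it is a known task, else raise with close matches."""
    if name in TASK_TYPES:
        return name
    hints = difflib.get_close_matches(name, list(TASK_TYPES), n=3)
    raise ValueError(f"unknown task {name!r}; did you mean {hints}?")


def tasks_by_type():
    """Group the task names, sorted alphabetically, by their type field."""
    groups = defaultdict(list)
    for task in sorted(TASK_TYPES):
        groups[TASK_TYPES[task]].append(task)
    return dict(groups)


print(len(TASK_TYPES), "tasks")
for kind, tasks in tasks_by_type().items():
    print(f"{kind}: {', '.join(tasks)}")
```

With a name that passes the check, `pipeline(task=check_task("summarization"))` would be the only change relative to the 2.2.1 code.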

2.2.3 Default model per task

pipeline configures a default model for every task. The defaults can be seen in the pipeline source code:

SUPPORTED_TASKS = {
    "audio-classification": {
        "impl": AudioClassificationPipeline,
        "tf": (),
        "pt": (AutoModelForAudioClassification,) if is_torch_available() else (),
        "default": {"model": {"pt": ("superb/wav2vec2-base-superb-ks", "372e048")}},
        "type": "audio",
    },
    "automatic-speech-recognition": {
        "impl": AutomaticSpeechRecognitionPipeline,
        "tf": (),
        "pt": (AutoModelForCTC, AutoModelForSpeechSeq2Seq) if is_torch_available() else (),
        "default": {"model": {"pt": ("facebook/wav2vec2-base-960h", "55bb623")}},
        "type": "multimodal",
    },
    "text-to-audio": {
        "impl": TextToAudioPipeline,
        "tf": (),
        "pt": (AutoModelForTextToWaveform, AutoModelForTextToSpectrogram) if is_torch_available() else (),
        "default": {"model": {"pt": ("suno/bark-small", "645cfba")}},
        "type": "text",
    },
    "feature-extraction": {
        "impl": FeatureExtractionPipeline,
        "tf": (TFAutoModel,) if is_tf_available() else (),
        "pt": (AutoModel,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("distilbert/distilbert-base-cased", "935ac13"),
                "tf": ("distilbert/distilbert-base-cased", "935ac13"),
            }
        },
        "type": "multimodal",
    },
    "text-classification": {
        "impl": TextClassificationPipeline,
        "tf": (TFAutoModelForSequenceClassification,) if is_tf_available() else (),
        "pt": (AutoModelForSequenceClassification,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("distilbert/distilbert-base-uncased-finetuned-sst-2-english", "af0f99b"),
                "tf": ("distilbert/distilbert-base-uncased-finetuned-sst-2-english", "af0f99b"),
            },
        },
        "type": "text",
    },
    "token-classification": {
        "impl": TokenClassificationPipeline,
        "tf": (TFAutoModelForTokenClassification,) if is_tf_available() else (),
        "pt": (AutoModelForTokenClassification,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("dbmdz/bert-large-cased-finetuned-conll03-english", "f2482bf"),
                "tf": ("dbmdz/bert-large-cased-finetuned-conll03-english", "f2482bf"),
            },
        },
        "type": "text",
    },
    "question-answering": {
        "impl": QuestionAnsweringPipeline,
        "tf": (TFAutoModelForQuestionAnswering,) if is_tf_available() else (),
        "pt": (AutoModelForQuestionAnswering,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("distilbert/distilbert-base-cased-distilled-squad", "626af31"),
                "tf": ("distilbert/distilbert-base-cased-distilled-squad", "626af31"),
            },
        },
        "type": "text",
    },
    "table-question-answering": {
        "impl": TableQuestionAnsweringPipeline,
        "pt": (AutoModelForTableQuestionAnswering,) if is_torch_available() else (),
        "tf": (TFAutoModelForTableQuestionAnswering,) if is_tf_available() else (),
        "default": {
            "model": {
                "pt": ("google/tapas-base-finetuned-wtq", "69ceee2"),
                "tf": ("google/tapas-base-finetuned-wtq", "69ceee2"),
            },
        },
        "type": "text",
    },
    "visual-question-answering": {
        "impl": VisualQuestionAnsweringPipeline,
        "pt": (AutoModelForVisualQuestionAnswering,) if is_torch_available() else (),
        "tf": (),
        "default": {
            "model": {"pt": ("dandelin/vilt-b32-finetuned-vqa", "4355f59")},
        },
        "type": "multimodal",
    },
    "document-question-answering": {
        "impl": DocumentQuestionAnsweringPipeline,
        "pt": (AutoModelForDocumentQuestionAnswering,) if is_torch_available() else (),
        "tf": (),
        "default": {
            "model": {"pt": ("impira/layoutlm-document-qa", "52e01b3")},
        },
        "type": "multimodal",
    },
    "fill-mask": {
        "impl": FillMaskPipeline,
        "tf": (TFAutoModelForMaskedLM,) if is_tf_available() else (),
        "pt": (AutoModelForMaskedLM,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("distilbert/distilroberta-base", "ec58a5b"),
                "tf": ("distilbert/distilroberta-base", "ec58a5b"),
            }
        },
        "type": "text",
    },
    "summarization": {
        "impl": SummarizationPipeline,
        "tf": (TFAutoModelForSeq2SeqLM,) if is_tf_available() else (),
        "pt": (AutoModelForSeq2SeqLM,) if is_torch_available() else (),
        "default": {
            "model": {"pt": ("sshleifer/distilbart-cnn-12-6", "a4f8f3e"), "tf": ("google-t5/t5-small", "d769bba")}
        },
        "type": "text",
    },
    # This task is a special case as it's parametrized by SRC, TGT languages.
    "translation": {
        "impl": TranslationPipeline,
        "tf": (TFAutoModelForSeq2SeqLM,) if is_tf_available() else (),
        "pt": (AutoModelForSeq2SeqLM,) if is_torch_available() else (),
        "default": {
            ("en", "fr"): {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
            ("en", "de"): {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
            ("en", "ro"): {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
        },
        "type": "text",
    },
    "text2text-generation": {
        "impl": Text2TextGenerationPipeline,
        "tf": (TFAutoModelForSeq2SeqLM,) if is_tf_available() else (),
        "pt": (AutoModelForSeq2SeqLM,) if is_torch_available() else (),
        "default": {"model": {"pt": ("google-t5/t5-base", "686f1db"), "tf": ("google-t5/t5-base", "686f1db")}},
        "type": "text",
    },
    "text-generation": {
        "impl": TextGenerationPipeline,
        "tf": (TFAutoModelForCausalLM,) if is_tf_available() else (),
        "pt": (AutoModelForCausalLM,) if is_torch_available() else (),
        "default": {"model": {"pt": ("openai-community/gpt2", "6c0e608"), "tf": ("openai-community/gpt2", "6c0e608")}},
        "type": "text",
    },
    "zero-shot-classification": {
        "impl": ZeroShotClassificationPipeline,
        "tf": (TFAutoModelForSequenceClassification,) if is_tf_available() else (),
        "pt": (AutoModelForSequenceClassification,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("facebook/bart-large-mnli", "c626438"),
                "tf": ("FacebookAI/roberta-large-mnli", "130fb28"),
            },
            "config": {
                "pt": ("facebook/bart-large-mnli", "c626438"),
                "tf": ("FacebookAI/roberta-large-mnli", "130fb28"),
            },
        },
        "type": "text",
    },
    "zero-shot-image-classification": {
        "impl": ZeroShotImageClassificationPipeline,
        "tf": (TFAutoModelForZeroShotImageClassification,) if is_tf_available() else (),
        "pt": (AutoModelForZeroShotImageClassification,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("openai/clip-vit-base-patch32", "f4881ba"),
                "tf": ("openai/clip-vit-base-patch32", "f4881ba"),
            }
        },
        "type": "multimodal",
    },
    "zero-shot-audio-classification": {
        "impl": ZeroShotAudioClassificationPipeline,
        "tf": (),
        "pt": (AutoModel,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("laion/clap-htsat-fused", "973b6e5"),
            }
        },
        "type": "multimodal",
    },
    "image-classification": {
        "impl": ImageClassificationPipeline,
        "tf": (TFAutoModelForImageClassification,) if is_tf_available() else (),
        "pt": (AutoModelForImageClassification,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("google/vit-base-patch16-224", "5dca96d"),
                "tf": ("google/vit-base-patch16-224", "5dca96d"),
            }
        },
        "type": "image",
    },
    "image-feature-extraction": {
        "impl": ImageFeatureExtractionPipeline,
        "tf": (TFAutoModel,) if is_tf_available() else (),
        "pt": (AutoModel,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("google/vit-base-patch16-224", "3f49326"),
                "tf": ("google/vit-base-patch16-224", "3f49326"),
            }
        },
        "type": "image",
    },
    "image-segmentation": {
        "impl": ImageSegmentationPipeline,
        "tf": (),
        "pt": (AutoModelForImageSegmentation, AutoModelForSemanticSegmentation) if is_torch_available() else (),
        "default": {"model": {"pt": ("facebook/detr-resnet-50-panoptic", "fc15262")}},
        "type": "multimodal",
    },
    "image-to-text": {
        "impl": ImageToTextPipeline,
        "tf": (TFAutoModelForVision2Seq,) if is_tf_available() else (),
        "pt": (AutoModelForVision2Seq,) if is_torch_available() else (),
        "default": {
            "model": {
                "pt": ("ydshieh/vit-gpt2-coco-en", "65636df"),
                "tf": ("ydshieh/vit-gpt2-coco-en", "65636df"),
            }
        },
        "type": "multimodal",
    },
    "object-detection": {
        "impl": ObjectDetectionPipeline,
        "tf": (),
        "pt": (AutoModelForObjectDetection,) if is_torch_available() else (),
        "default": {"model": {"pt": ("facebook/detr-resnet-50", "2729413")}},
        "type": "multimodal",
    },
    "zero-shot-object-detection": {
        "impl": ZeroShotObjectDetectionPipeline,
        "tf": (),
        "pt": (AutoModelForZeroShotObjectDetection,) if is_torch_available() else (),
        "default": {"model": {"pt": ("google/owlvit-base-patch32", "17740e1")}},
        "type": "multimodal",
    },
    "depth-estimation": {
        "impl": DepthEstimationPipeline,
        "tf": (),
        "pt": (AutoModelForDepthEstimation,) if is_torch_available() else (),
        "default": {"model": {"pt": ("Intel/dpt-large", "e93beec")}},
        "type": "image",
    },
    "video-classification": {
        "impl": VideoClassificationPipeline,
        "tf": (),
        "pt": (AutoModelForVideoClassification,) if is_torch_available() else (),
        "default": {"model": {"pt": ("MCG-NJU/videomae-base-finetuned-kinetics", "4800870")}},
        "type": "video",
    },
    "mask-generation": {
        "impl": MaskGenerationPipeline,
        "tf": (),
        "pt": (AutoModelForMaskGeneration,) if is_torch_available() else (),
        "default": {"model": {"pt": ("facebook/sam-vit-huge", "997b15")}},
        "type": "multimodal",
    },
    "image-to-image": {
        "impl": ImageToImagePipeline,
        "tf": (),
        "pt": (AutoModelForImageToImage,) if is_torch_available() else (),
        "default": {"model": {"pt": ("caidas/swin2SR-classical-sr-x2-64", "4aaedcb")}},
        "type": "image",
    },
}
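
Reading a default out of this structure takes one special case: most tasks store it under `default["model"][framework]` as a (model id, revision) pair, while `translation` keys its defaults by a `(src, tgt)` language pair first. A minimal sketch over a three-entry excerpt of the dictionary above; `default_model` and `DEFAULTS_EXCERPT` are hypothetical names, and only the `default` fields are reproduced so that no pipeline or model classes are needed:

```python
# Excerpt of the dictionary above, keeping only the "default" fields.
DEFAULTS_EXCERPT = {
    "automatic-speech-recognition": {
        "default": {"model": {"pt": ("facebook/wav2vec2-base-960h", "55bb623")}},
    },
    "summarization": {
        "default": {
            "model": {
                "pt": ("sshleifer/distilbart-cnn-12-6", "a4f8f3e"),
                "tf": ("google-t5/t5-small", "d769bba"),
            }
        },
    },
    "translation": {
        "default": {
            ("en", "fr"): {
                "model": {
                    "pt": ("google-t5/t5-base", "686f1db"),
                    "tf": ("google-t5/t5-base", "686f1db"),
                }
            },
        },
    },
}


def default_model(registry, task, framework="pt", langs=None):
    """Return (model_id, revision) for a task, or None if there is no default."""
    defaults = registry[task]["default"]
    if "model" not in defaults:
        # Translation-style entry: defaults are keyed by (src, tgt) first.
        if langs is None:
            raise ValueError(f"{task!r} needs a (src, tgt) language pair")
        if langs not in defaults:
            return None
        defaults = defaults[langs]
    return defaults["model"].get(framework)


print(default_model(DEFAULTS_EXCERPT, "automatic-speech-recognition"))
print(default_model(DEFAULTS_EXCERPT, "summarization", framework="tf"))
print(default_model(DEFAULTS_EXCERPT, "translation", langs=("en", "fr")))
```

The first lookup explains which model the 2.2.1 code loads when only the task is given.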

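2.3 Instantiating a pipeline object from a model

2.3.1 Instantiating "automatic speech recognition" from a model

If you don't want the task's default model, you can specify a model from Hugging Face: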

import os

# Set these before importing transformers so they take effect.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

from transformers import pipeline

speech_file = "./output_video_enhanced.mp3"
# Equivalent, with the task stated explicitly:
# pipe = pipeline(task="automatic-speech-recognition", model="openai/whisper-medium")
pipe = pipeline(model="openai/whisper-medium")  # the task is inferred from the model
result = pipe(speech_file)
print(result)
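
The printed result is a dictionary. For speech recognition it carries the transcript under the `text` key and, as far as I know, when the pipeline is called with `return_timestamps=True` it also carries a `chunks` list of timestamped segments. A minimal sketch of turning such a result into readable lines; `format_transcript` is a hypothetical helper, and the sample dictionary is made-up illustrative data, not real model output:

```python
def format_transcript(result):
    """Render an ASR pipeline result as one line per timestamped chunk."""
    chunks = result.get("chunks")
    if not chunks:
        # No timestamps requested: fall back to the plain transcript.
        return result["text"].strip()
    lines = []
    for chunk in chunks:
        start, end = chunk["timestamp"]
        # The end time can be missing for the final chunk.
        end_label = "?" if end is None else f"{end:.2f}"
        lines.append(f"[{start:.2f}-{end_label}] {chunk['text'].strip()}")
    return "\n".join(lines)


# Made-up sample in the assumed shape of a timestamped result.
sample = {
    "text": " hello world. this is a test.",
    "chunks": [
        {"timestamp": (0.0, 1.5), "text": " hello world."},
        {"timestamp": (1.5, None), "text": " this is a test."},
    ],
}
print(format_transcript(sample))
```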

2.3.2 Looking up which task a model belongs to

The mapping between models and tasks can be browsed at https://huggingface.co/tasks.

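3. Summary

This article is part 0 of the Transformers pipeline column. Later articles will each cover one task, 28+ tasks in all. Working through the pipeline usage of these 28 tasks gives you a handle on the 300k+ open-source large models on Hugging Face, spanning audio, computer vision, natural language processing, multimodal, and even reinforcement learning, and puts you on the way to becoming an expert in large models!

Likes, saves, comments, and follows are much appreciated. If you still have time, feel free to read my other articles: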

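AI: Engineering

AI Agent R&D Path, Engineering (1): Docker Speeds Up AI Agent Development

AI Agent R&D Path, Engineering (2): One-Click Deployment of the Dify Agent Development Platform

AI Agent R&D Path, Engineering (3): One-Click Deployment of the Ollama LLM Inference Serving Framework

AI Agent R&D Path, Engineering (4): One-Click Deployment of the Xinference LLM Inference Serving Framework

AI Agent R&D Path, Engineering (5): One-Click Deployment of the LocalAI LLM Inference Serving Framework

AI: Models

AI Agent R&D Path, Models (1): Installing, Deploying, and Using the LLaMA-Factory LLM Training Framework on Networks in China

AI Agent R&D Path, Models (2): DeepSeek-V2-Chat Training and Inference in Practice

AI Agent R&D Path, Models (3): The Open-Source vs. Closed-Source Debate for Chinese LLMs

AI Agent R&D Path, Models (4): Getting Started with PyTorch Development in One Article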