A Complete Guide to Deploying Stable Diffusion Locally with Python

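Want to try AI image generation but worried about privacy? Don't want to pay for API calls? This article walks you through deploying Stable Diffusion locally with Python, from setting up the environment to generating your first image, so you end up with an AI artist of your own.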

1. What Is Stable Diffusion?

Stable Diffusion (SD) is an open-source deep learning model for text-to-image generation: you type a text description and it generates a high-quality image.

1.1 Why Deploy Locally?

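┌────────────────────────────────────────────────────────────────────────────┐
│                     Cloud Service vs. Local Deployment                     │
├──────────────────┬──────────────────────────┬──────────────────────────────┤
│ Dimension        │ Cloud (Midjourney, etc.) │ Local deployment (SD)        │
├──────────────────┼──────────────────────────┼──────────────────────────────┤
│ Cost             │ Subscription / per image │ One-time hardware cost       │
│ Privacy          │ Data uploaded to cloud   │ ✅ Fully local, no leak risk │
│ Freedom          │ Limited by moderation    │ ✅ No platform restrictions  │
│ Customizability  │ Fixed models             │ ✅ Any model / LoRA          │
│ API integration  │ Limited / paid           │ ✅ Full control via Python   │
│ Hardware         │ None                     │ Requires an NVIDIA GPU       │
└──────────────────┴──────────────────────────┴──────────────────────────────┘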

1.2 SD Version History

SD 1.4 (2022) → SD 1.5 (2022) → SD 2.0/2.1 (2022) → SDXL (2023) → SD 3.0 (2024) → SD 3.5 (2024)
   │               │                                   │
   └───────────────┘ classic, stable                   └── high quality, recommended

2. Hardware and Environment Requirements

2.1 Minimum / Recommended Configuration

Hardware	Minimum	Recommended
GPU	NVIDIA, 8 GB VRAM	NVIDIA, 12 GB+ VRAM (RTX 3060 or better)
RAM	16 GB	32 GB
Disk	10 GB	50 GB+ SSD
CUDA	11.7+	12.x

2.2 Environment Setup

# 1. Create a Python virtual environment (Python 3.10 recommended)
conda create -n sd python=3.10 -y
conda activate sd
# 2. Install PyTorch (CUDA 12.1 build)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# 3. Check that CUDA is available
python -c "import torch; print(f'cuda: {torch.cuda.is_available()}, gpu: {torch.cuda.get_device_name(0)}')"
# Example output: cuda: True, gpu: NVIDIA GeForce RTX 4060

3. Option 1: The diffusers Library (for Developers)

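diffusers is Hugging Face's official library for diffusion models. Everything is driven through a pure Python API, which makes it the best fit for Python developers who want to integrate image generation into their own projects.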

3.1 Installation

pip install diffusers transformers accelerate safetensors
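Before downloading several gigabytes of weights, it can help to confirm that the packages installed so far are importable and that PyTorch sees a GPU. The sketch below is a hypothetical helper (`check_environment` is not part of any library); it probes packages with the standard library's `importlib.util.find_spec`, so it runs even where PyTorch is missing, and imports torch only when it is actually installed.

```python
import importlib.util
import sys

# Packages installed in sections 2.2 and 3.1 (adjust to your own setup)
REQUIRED = ("torch", "diffusers", "transformers", "accelerate", "safetensors")


def check_environment(packages=REQUIRED):
    """Hypothetical helper: report missing packages and whether CUDA is usable."""
    report = {"python": sys.version.split()[0], "missing": [], "cuda": False}
    for name in packages:
        # find_spec returns None when the package cannot be imported
        if importlib.util.find_spec(name) is None:
            report["missing"].append(name)
    if "torch" in packages and "torch" not in report["missing"]:
        import torch  # only imported when it is installed
        report["cuda"] = torch.cuda.is_available()
    return report
```

Running `print(check_environment())` after the installs should show an empty `missing` list and `cuda: True` on a working GPU setup.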

3.2 Basic Text-to-Image

import torch
from diffusers import StableDiffusionPipeline
def basic_text2img(prompt: str, output_path: str = "output.png"):
    """
    Basic text-to-image: text in, image out
    Pipeline: text prompt → CLIP text encoder → UNet denoising (in latent space) → VAE decoder → image
    """
    # Load the model (downloaded automatically on first run, roughly 4-7 GB)
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16,       # half precision to save VRAM
        safety_checker=None              # disable the safety checker (optional)
    )
    pipe.to("cuda")
    # Enable VRAM optimization
    pipe.enable_attention_slicing()      # compute attention in slices to reduce VRAM usage
    # Generate the image
    image = pipe(
        prompt=prompt,
        num_inference_steps=30,          # denoising steps (20-50; more steps = finer detail, slower)
        guidance_scale=7.5,              # CFG scale (7-12; higher = follows the prompt more closely)
        width=512,
        height=512
    ).images[0]
    image.save(output_path)
    print(f"Image saved: {output_path}")
    return image
# Usage example
basic_text2img(
    "a beautiful sunset over the ocean, highly detailed, 4k, photorealistic",
    "sunset.png"
)
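The 512x512 size above is not arbitrary. In SD 1.x and SDXL, the UNet denoises a 4-channel latent tensor that the VAE has downsampled 8x in each spatial dimension, which is why diffusers rejects widths and heights that are not multiples of 8. The hypothetical helper below only does that arithmetic, as a quick sanity check for custom sizes; it is not a diffusers API.

```python
# Spatial downsampling factor and latent channel count of the SD 1.x / SDXL VAE
VAE_SCALE_FACTOR = 8
LATENT_CHANNELS = 4


def latent_shape(width: int, height: int, batch_size: int = 1):
    """Return the (batch, channels, height, width) latent shape the UNet works on."""
    if width % VAE_SCALE_FACTOR or height % VAE_SCALE_FACTOR:
        raise ValueError(
            f"width and height must be multiples of {VAE_SCALE_FACTOR}, got {width}x{height}"
        )
    return (batch_size, LATENT_CHANNELS, height // VAE_SCALE_FACTOR, width // VAE_SCALE_FACTOR)


print(latent_shape(512, 512))    # SD 1.5 default -> (1, 4, 64, 64)
print(latent_shape(1024, 1024))  # SDXL default   -> (1, 4, 128, 128)
```

This also explains why SDXL at 1024x1024 costs noticeably more than SD 1.5 at 512x512: the latent has four times as many positions.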

3.3 Using SDXL (Higher Quality)

import torch
from diffusers import StableDiffusionXLPipeline
def sdxl_text2img(prompt: str, output_path: str = "sdxl_output.png"):
    """
    SDXL text-to-image: noticeably better quality than SD 1.5
    SDXL advantages:
    - 1024x1024 default resolution
    - better understanding of the prompt text
    - more realistic color and detail
    """
    pipe = StableDiffusionXLPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0",
        torch_dtype=torch.float16,
        variant="fp16"
    )
    pipe.to("cuda")
    # VRAM optimizations (SDXL is a larger model, so these matter more)
    pipe.enable_attention_slicing()
    pipe.enable_vae_slicing()
    image = pipe(
        prompt=prompt,
        negative_prompt="blurry, low quality, distorted, deformed",
        num_inference_steps=40,
        guidance_scale=8.0,
        width=1024,
        height=1024
    ).images[0]
    image.save(output_path)
    print(f"SDXL image saved: {output_path}")
    return image

3.4 Key Parameters Explained

┌────────────────────────────────────────────────────────────────────────┐
│                    Stable Diffusion Core Parameters                    │
├─────────────────────┬──────────────────────────────────────────────────┤
│ Parameter           │ Description                                      │
├─────────────────────┼──────────────────────────────────────────────────┤
│ prompt              │ Positive prompt: what you want in the image      │
│ negative_prompt     │ Negative prompt: elements you don't want         │
│ num_inference_steps │ Denoising steps, 20-50 (higher: better, slower)  │
│ guidance_scale      │ CFG scale, 7-12 (higher: closer to prompt,       │
│                     │ but the image may look stiff)                    │
│ width / height      │ Image size: 512 / 768 / 1024 (multiple of 8)     │
│ seed                │ Random seed: a fixed seed reproduces the image   │
└─────────────────────┴──────────────────────────────────────────────────┘
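`guidance_scale` controls classifier-free guidance (CFG). At each denoising step the UNet predicts the noise twice, once conditioned on the prompt and once unconditioned (the negative prompt, if given, takes the unconditioned slot), and the pipeline extrapolates from the unconditional prediction toward the conditional one: `uncond + scale * (cond - uncond)`. The sketch below applies that combination to plain Python lists; the real pipeline does the same thing on whole noise tensors.

```python
def cfg_combine(noise_uncond, noise_cond, guidance_scale):
    """Classifier-free guidance: uncond + scale * (cond - uncond), element-wise."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


uncond = [0.10, -0.20, 0.05]
cond = [0.30, -0.10, 0.05]
print(cfg_combine(uncond, cond, 1.0))  # scale 1: just the conditional prediction
print(cfg_combine(uncond, cond, 7.5))  # scale 7.5: pushed far past it, toward the prompt
```

A scale of 1 means no extra guidance, and larger values exaggerate the difference the prompt makes, which is why very high CFG values tend to produce the stiff, oversaturated look mentioned in the table.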

3.5 Working Around Limited VRAM

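def low_vram_generate(pipe, prompt, output_path="output.png"):
    """
    Low-VRAM generation (may run on GPUs with as little as 4-6 GB of VRAM)
    Do not call pipe.to("cuda") after enabling CPU offload; offload manages device placement itself.
    """
    # 1. CPU offload: each sub-model is moved to the GPU only while it runs, then back to the CPU
    pipe.enable_model_cpu_offload()
    # 2. Attention slicing: lowers peak VRAM during attention computation
    pipe.enable_attention_slicing()
    # 3. VAE slicing: decodes in slices (helps most when generating several images at once)
    pipe.enable_vae_slicing()
    # 4. Keep the resolution low
    image = pipe(
        prompt=prompt,
        width=512,
        height=512,
        num_inference_steps=20   # fewer steps
    ).images[0]
    image.save(output_path)
    return image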
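Why does `torch_dtype=torch.float16` save so much VRAM? Weight memory is roughly parameter count times bytes per parameter, and float16 uses 2 bytes where float32 uses 4. The hypothetical helper below does that back-of-the-envelope estimate; the ~860M parameter figure for the SD 1.5 UNet is the size cited in the original Stable Diffusion release, and the result covers the UNet weights only (activations, the text encoder and the VAE come on top).

```python
BYTES_PER_PARAM = {"float32": 4, "float16": 2}


def weight_memory_gb(num_params: float, dtype: str = "float16") -> float:
    """Approximate memory needed just to hold the weights, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


unet_params = 860e6  # SD 1.5 UNet, approximate
print(f"fp32: {weight_memory_gb(unet_params, 'float32'):.2f} GB")  # fp32: 3.44 GB
print(f"fp16: {weight_memory_gb(unet_params, 'float16'):.2f} GB")  # fp16: 1.72 GB
```

The same arithmetic shows why CPU offload helps: weights that are not currently running do not need to occupy VRAM at all.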

4. Option 2: Stable Diffusion WebUI (for Non-Developers)

If you prefer a graphical interface, AUTOMATIC1111's WebUI is the most popular choice.

4.1 Installing the WebUI

# 1. Clone the repository
git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
cd stable-diffusion-webui
# 2. Windows: run webui-user.bat (or double-click it in Explorer)
webui-user.bat
# 3. Linux
./webui.sh

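Once it has started, the console prints the local address, http://127.0.0.1:7860 by default; open it in your browser to see the web interface (add the --autolaunch flag to have the browser open automatically).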
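Scripts that drive the WebUI need the server to be up before they send requests, and the first launch can take a while. Below is a minimal standard-library sketch that polls the default address until the server answers; `wait_for_webui` is a hypothetical helper, and it polls the root page because the `/sdapi/...` routes only exist when the WebUI was started with `--api`.

```python
import time
import urllib.error
import urllib.request

# Talk to localhost directly, ignoring any http_proxy environment variables
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_webui(url: str = "http://127.0.0.1:7860/", timeout: float = 120.0,
                   interval: float = 2.0) -> bool:
    """Poll url until the server answers or timeout seconds pass; True if it answered."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with _OPENER.open(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            return True  # an HTTP error status still means the server itself is up
        except OSError:
            time.sleep(interval)  # refused or timed out: not ready yet, retry
    return False
```

A script can then guard its API calls with `if wait_for_webui(): ...` instead of failing with a connection error while the model is still loading.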

4.2 Calling the WebUI API from Python

import base64
import json
import requests
def sd_webui_api(prompt: str, output_path: str = "api_output.png"):
    """
    Call the local WebUI through its HTTP API
    Start the WebUI with the API enabled first: ./webui.sh --api
    (on Windows, add --api to COMMANDLINE_ARGS in webui-user.bat)
    """
    url = "http://127.0.0.1:7860/sdapi/v1/txt2img"
    payload = {
        "prompt": prompt,
        "negative_prompt": "blurry, low quality, deformed",
        "steps": 30,
        "cfg_scale": 7.5,
        "width": 512,
        "height": 512,
        "sampler_name": "DPM++ 2M Karras",
        "seed": -1  # -1 means random
    }
    response = requests.post(url, json=payload, timeout=600)
    response.raise_for_status()
    result = response.json()
    # Decode the base64 image and save it
    image_data = base64.b64decode(result["images"][0])
    with open(output_path, "wb") as f:
        f.write(image_data)
    print(f"API generation finished: {output_path}")
    # "info" is a JSON-encoded string; parse it to get the seed actually used, for reproduction
    info = json.loads(result.get("info", "{}"))
    print(f"seed: {info.get('seed', 'unknown')}")
# Usage
sd_webui_api("a cyberpunk city at night, neon lights, rain, 4k")
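The response handling is where scripts most often break: `images` holds base64-encoded PNGs, and `info` is itself a JSON string that has to be decoded a second time. The hypothetical helper below isolates that parsing so it can be tested without a running WebUI; the fake response in the example mimics only the two fields used here.

```python
import base64
import json


def parse_txt2img_response(result: dict):
    """Return (list of image bytes, seed) from a /sdapi/v1/txt2img JSON response."""
    images = [base64.b64decode(b64) for b64 in result.get("images", [])]
    # "info" arrives as a JSON-encoded string, not as a nested object
    info = json.loads(result.get("info") or "{}")
    return images, info.get("seed")


fake = {
    "images": [base64.b64encode(b"\x89PNG...").decode()],
    "info": json.dumps({"seed": 1234567, "prompt": "a cyberpunk city"}),
}
images, seed = parse_txt2img_response(fake)
print(len(images), seed)  # 1 1234567
```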

5. Model Management

5.1 Downloading Models

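"""
Model download guide
Recommended sources:
1. Hugging Face: https://huggingface.co/models?pipeline_tag=text-to-image
2. Civitai:      https://civitai.com (the largest SD model community)
Commonly used models:
"""
models = {
    # ---- SD 1.5 family ----
    # Note: runwayml has taken this repo down from Hugging Face; if it no longer resolves,
    # the community mirror "stable-diffusion-v1-5/stable-diffusion-v1-5" can be used instead
    "sd15": "runwayml/stable-diffusion-v1-5",
    "anything-v5": "stablediffusionapi/anything-v5",          # anime style
    "realistic-vision": "SG161222/Realistic_Vision_V5.1",     # realistic portraits
    # ---- SDXL family ----
    "sdxl": "stabilityai/stable-diffusion-xl-base-1.0",
    "sdxl-turbo": "stabilityai/sdxl-turbo",                   # fast generation
    "juggernaut-xl": "RunDiffusion/Juggernaut-XL-v9",         # high-quality all-rounder
    # ---- SD 3.0+ ----
    # diffusers needs the "-diffusers" variant of this repo, loaded with StableDiffusion3Pipeline
    "sd3": "stabilityai/stable-diffusion-3-medium-diffusers",
}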
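The entries above span three model families, and each family needs a different diffusers pipeline class and has a different native resolution. The lookup table below is a sketch; the per-key assignments are assumptions based on what each model is built on, and diffusers also offers `AutoPipelineForText2Image`, which picks the class from the checkpoint's own config.

```python
# Hypothetical lookup: key in the models dict -> (diffusers pipeline class, typical width/height)
PIPELINE_INFO = {
    "sd15": ("StableDiffusionPipeline", 512),
    "anything-v5": ("StableDiffusionPipeline", 512),
    "realistic-vision": ("StableDiffusionPipeline", 512),
    "sdxl": ("StableDiffusionXLPipeline", 1024),
    "sdxl-turbo": ("StableDiffusionXLPipeline", 512),  # SDXL Turbo was trained at 512x512
    "juggernaut-xl": ("StableDiffusionXLPipeline", 1024),
    "sd3": ("StableDiffusion3Pipeline", 1024),
}


def load_by_key(key: str, models: dict):
    """Load models[key] with the matching pipeline class (needs torch + diffusers installed)."""
    import torch
    import diffusers

    class_name, size = PIPELINE_INFO[key]
    pipeline_cls = getattr(diffusers, class_name)  # e.g. diffusers.StableDiffusionXLPipeline
    pipe = pipeline_cls.from_pretrained(models[key], torch_dtype=torch.float16)
    return pipe, size
```

Typical use: `pipe, size = load_by_key("sdxl", models)`, then `pipe.to("cuda")` and pass `width=size, height=size` when generating, so each model runs at the resolution it was trained for.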

5.2 Loading a Local Model

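import torch
from diffusers import StableDiffusionPipeline

def load_local_model(model_path: str):
    """
    Load a local model (single .safetensors file or diffusers-format folder)

    model_path can be:
    - a local .safetensors / .ckpt file: "./models/model.safetensors"  -> from_single_file
    - a local diffusers-format folder:   "./models/sd15"               -> from_pretrained
    - a Hugging Face repo id:            "runwayml/stable-diffusion-v1-5" -> from_pretrained
    """
    if model_path.endswith((".safetensors", ".ckpt")):
        # Single-file checkpoints, e.g. downloaded from Civitai
        pipe = StableDiffusionPipeline.from_single_file(model_path, torch_dtype=torch.float16)
    else:
        # Folder with model_index.json + subfolders, or a repo id on the Hub
        pipe = StableDiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.float16)
    pipe.to("cuda")
    return pipe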

5.3 Changing the Style with LoRA

def apply_lora(pipe, lora_path: str, lora_scale: float = 0.8):
    """
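    Load a LoRA style adapter

    LoRA: a lightweight adapter that changes the style without modifying the base model,
    e.g. anime, watercolor, or a particular artist's style
    """
    pipe.load_lora_weights(lora_path)

    # Control the LoRA strength at generation time via cross_attention_kwargs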
    image = pipe(
        prompt="a girl in a garden, masterpiece",
        cross_attention_kwargs={"scale": lora_scale}
    ).images[0]
    return image
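What `lora_scale` actually scales: for each adapted layer, a LoRA stores two small matrices B (d x r) and A (r x k) with a small rank r, and the effective weight becomes W + scale · (B·A), so the base weights W are never overwritten. The pure-Python sketch below shows that update on tiny matrices, ignoring the extra alpha/r factor that LoRA implementations typically fold into the scale.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (n x m) and b (m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def apply_lora_delta(w, b, a, scale):
    """Return W + scale * (B @ A); the input W is left unchanged."""
    delta = matmul(b, a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]


W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (2 x 2)
B = [[1.0], [2.0]]            # 2 x 1, rank r = 1
A = [[3.0, 4.0]]              # 1 x 2
print(apply_lora_delta(W, B, A, 0.5))  # [[2.5, 2.0], [3.0, 5.0]]
print(apply_lora_delta(W, B, A, 0.0))  # [[1.0, 0.0], [0.0, 1.0]] (scale 0 = base model)
```

Because only B and A are stored, a LoRA file is tiny compared with a full checkpoint, and setting the scale to 0 recovers the base model exactly.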

6. Advanced Features

6.1 Image-to-Image

import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

def image_to_image(init_image_path: str, prompt: str, strength: float = 0.7):
    """
    Image-to-image: AI repainting on top of an existing image

    Args:
        init_image_path: path to the source image
        prompt: prompt for the repaint
        strength: repaint strength (0.0-1.0)
                  0.3 → subtle changes
                  0.7 → moderate changes
                  0.9 → almost a complete repaint
    """
    init_image = Image.open(init_image_path).convert("RGB")
    init_image = init_image.resize((512, 512))

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        torch_dtype=torch.float16
    ).to("cuda")

    image = pipe(
        prompt=prompt,
        image=init_image,
        strength=strength,
        num_inference_steps=30
    ).images[0]

    image.save("img2img_output.png")
    print("Image-to-image finished")
    return image

# Example: turn a photo into an oil painting
image_to_image("photo.jpg", "oil painting style, masterpiece, highly detailed")
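How `strength` interacts with `num_inference_steps`: img2img adds noise to the input image only part of the way and then runs just the tail of the denoising schedule, so about `int(num_inference_steps * strength)` steps actually execute. The helper below mirrors that arithmetic as a sketch; it is written here for illustration, not imported from diffusers.

```python
def img2img_effective_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate number of denoising steps img2img actually runs."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    return min(int(num_inference_steps * strength), num_inference_steps)


for s in (0.25, 0.5, 1.0):
    print(s, img2img_effective_steps(40, s))  # 0.25 -> 10, 0.5 -> 20, 1.0 -> 40
```

In practice this means a low strength combined with few steps leaves very little denoising work; if subtle edits look undercooked, raising `num_inference_steps` is usually the first thing to try.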

6.2 Inpainting (Partial Repainting)

import torch
from PIL import Image
from diffusers import StableDiffusionInpaintPipeline

def inpaint(image_path, mask_path, prompt):
    """
    Inpainting: repaint only the area covered by the mask
    (white mask pixels are repainted, black pixels are kept)

    Use cases:
    - remove an object from the image
    - replace the background
    - change a person's clothing
    """
    image = Image.open(image_path).convert("RGB").resize((512, 512))
    mask = Image.open(mask_path).convert("L").resize((512, 512))

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting",
        torch_dtype=torch.float16
    ).to("cuda")

    result = pipe(
        prompt=prompt,
        image=image,
        mask_image=mask,
        num_inference_steps=30
    ).images[0]

    result.save("inpaint_output.png")
    return result
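The mask is just a grayscale image: white pixels get repainted, black pixels are kept. If you don't have one from an image editor, a rectangular mask can be produced with the standard library alone. The sketch below writes a binary PGM file, a format Pillow can open, so the mask-loading line in the function above can read it unchanged; `write_rect_mask` is a hypothetical helper.

```python
def write_rect_mask(path: str, width: int, height: int, box):
    """Write a binary PGM mask: 255 (white, repaint) inside box, 0 (black, keep) elsewhere.

    box is (left, top, right, bottom) in pixels; right and bottom are exclusive.
    """
    left, top, right, bottom = box
    pixels = bytearray()
    for y in range(height):
        for x in range(width):
            inside = left <= x < right and top <= y < bottom
            pixels.append(255 if inside else 0)
    with open(path, "wb") as f:
        f.write(f"P5\n{width} {height}\n255\n".encode("ascii"))  # PGM header
        f.write(bytes(pixels))


# Example: repaint a 200x200 square in the middle of a 512x512 image
# write_rect_mask("mask.pgm", 512, 512, (156, 156, 356, 356))
```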

6.3 ControlNet: Precise Control over Composition

import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import StableDiffusionControlNetPipeline, ControlNetModel
from diffusers.utils import load_image

def controlnet_canny(image_path: str, prompt: str):
    """
    ControlNet: control the composition of the generated image with an edge map

    Supported control modes (each uses its own ControlNet model):
    - Canny edges: precise outline control
    - Depth: spatial structure control
    - Pose: human pose control
    - Scribble: control from rough sketches
    """
    # Load the ControlNet model
    controlnet = ControlNetModel.from_pretrained(
        "lllyasviel/sd-controlnet-canny",
        torch_dtype=torch.float16
    )

    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        controlnet=controlnet,
        torch_dtype=torch.float16
    ).to("cuda")

    # Load the input image and extract its edges
    input_image = load_image(image_path)
    image_np = np.array(input_image)
    edges = cv2.Canny(image_np, 100, 200)              # single-channel edge map (low/high thresholds)
    edges = np.stack([edges, edges, edges], axis=2)    # ControlNet expects a 3-channel image
    canny_image = Image.fromarray(edges)

    # Generate
    image = pipe(
        prompt=prompt,
        image=canny_image,
        num_inference_steps=30,
        guidance_scale=7.5
    ).images[0]

    image.save("controlnet_output.png")
    return image

7. Batch Generation and Automation

7.1 Batch-Generating Different Styles

import os

import torch
from diffusers import StableDiffusionPipeline

def batch_generate(pipe, base_prompt: str, styles: list, output_dir: str = "./outputs"):
    """
    Generate one image per style
    """
    os.makedirs(output_dir, exist_ok=True)

    for i, style in enumerate(styles):
        prompt = f"{base_prompt}, {style}"
        image = pipe(
            prompt=prompt,
            num_inference_steps=30,
            guidance_scale=7.5
        ).images[0]

        filename = f"{output_dir}/gen_{i:03d}_{style.replace(' ', '_')}.png"
        image.save(filename)
        print(f"[{i+1}/{len(styles)}] Generated: {filename}")

# Usage
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16
).to("cuda")

batch_generate(
    pipe,
    base_prompt="a beautiful mountain landscape",
    styles=[
        "oil painting style",
        "watercolor painting",
        "anime style",
        "photorealistic, 8k",
        "cyberpunk neon",
        "studio ghibli style"
    ]
)
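The file names above only replace spaces, so a style such as "photorealistic, 8k" leaves a comma in the name. A small hypothetical `slugify` helper that keeps only letters and digits, joined by underscores, is one way to get portable file names:

```python
import re


def slugify(text: str, max_len: int = 60) -> str:
    """Turn a style or prompt fragment into a safe file-name component."""
    slug = re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_").lower()
    return slug[:max_len] or "untitled"


print(slugify("photorealistic, 8k"))   # photorealistic_8k
print(slugify("studio ghibli style"))  # studio_ghibli_style
```

Inside the loop it would be used as `filename = f"{output_dir}/gen_{i:03d}_{slugify(style)}.png"`.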

7.2 Reproducing Results with a Fixed Seed

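import torch

def reproducible_generation(pipe, prompt: str, seed: int = 42):
    """
    Fix the random seed so the same settings produce the same image every time
    (on the same GPU and library versions; results can differ slightly across hardware)
    Useful for: testing, comparing prompts, sharing parameters
    """
    generator = torch.Generator("cuda").manual_seed(seed)

    image = pipe(
        prompt=prompt,
        generator=generator,
        num_inference_steps=30
    ).images[0]

    image.save(f"seed_{seed}.png")
    return image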
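A fixed seed only helps if you know which seed produced the image you liked. One common pattern, sketched here as a hypothetical helper, is to draw a concrete seed yourself whenever the caller asks for a random one (-1, the WebUI convention) and then record it, for example in the file name. The 32-bit range is an assumption chosen to match typical WebUI seeds; `torch.Generator.manual_seed` accepts larger values too.

```python
import random

MAX_SEED = 2**32 - 1


def resolve_seed(seed: int = -1) -> int:
    """Return seed unchanged, or a fresh random seed in [0, 2**32 - 1] when seed == -1."""
    if seed == -1:
        return random.randint(0, MAX_SEED)
    if not 0 <= seed <= MAX_SEED:
        raise ValueError(f"seed must be -1 or in [0, {MAX_SEED}]")
    return seed


seed = resolve_seed(-1)
print(f"using seed {seed}")  # then: reproducible_generation(pipe, prompt, seed=seed)
```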

8. Prompt Engineering

8.1 Prompt Structure Template

┌──────────────────────────────────────────────────────────────────────┐
│                       Universal Prompt Formula                       │
│                                                                      │
│  [Subject] + [Scene/setting] + [Lighting] + [Style] + [Quality tags] │
│                                                                      │
│  Example:                                                            │
│  a young woman (subject)                                             │
│  standing in a cherry blossom garden (scene)                         │
│  golden hour lighting, soft shadows (lighting)                       │
│  oil painting style (style)                                          │
│  masterpiece, best quality, highly detailed, 4k (quality tags)       │
└──────────────────────────────────────────────────────────────────────┘
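The formula maps directly onto string assembly. A minimal sketch (`build_prompt` is a hypothetical helper) that joins the non-empty parts with commas, the separator SD prompts conventionally use:

```python
def build_prompt(subject, scene="", lighting="", style="", quality=""):
    """Join [subject] + [scene] + [lighting] + [style] + [quality] into one prompt."""
    parts = [subject, scene, lighting, style, quality]
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_prompt(
    "a young woman",
    "standing in a cherry blossom garden",
    "golden hour lighting, soft shadows",
    "oil painting style",
    "masterpiece, best quality, highly detailed, 4k",
)
print(prompt)
```

Keeping the parts separate makes it easy to vary one slot (for example the style) while holding the others fixed, which pairs well with a fixed seed.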

8.2 Common Quality-Boosting Terms

quality_boosters = [
    # General quality
    "masterpiece", "best quality", "highly detailed",
    "sharp focus", "8k uhd", "high resolution",

    # Lighting
    "cinematic lighting", "volumetric lighting",
    "golden hour", "dramatic lighting",

    # Photography effects
    "bokeh", "depth of field", "film grain",
    "dslr, 35mm lens", "raw photo",
]

negative_prompt = (
    "blurry, low quality, low resolution, "
    "deformed, distorted, disfigured, bad anatomy, "
    "bad hands, missing fingers, extra fingers, "
    "watermark, text, logo, signature"
)
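When you stack your own negative terms on top of a default list like the one above, duplicates creep in, and every extra term eats into the CLIP text encoder's 77-token window (diffusers truncates longer prompts). A hypothetical helper that merges comma-separated term lists, keeping first-seen order and dropping case-insensitive duplicates:

```python
def merge_terms(*term_strings: str) -> str:
    """Merge comma-separated prompt strings, removing duplicate terms (case-insensitive)."""
    seen = set()
    merged = []
    for s in term_strings:
        for term in s.split(","):
            term = term.strip()
            key = term.lower()
            if term and key not in seen:
                seen.add(key)
                merged.append(term)
    return ", ".join(merged)


print(merge_terms("blurry, low quality, watermark", "Blurry, extra fingers"))
# blurry, low quality, watermark, extra fingers
```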

9. A Complete Utility Class

"""
SDToolkit: a ready-to-use Stable Diffusion utility class
"""
from pathlib import Path
from typing import Optional

import torch
from PIL import Image
from diffusers import StableDiffusionPipeline


class SDToolkit:
    """Local Stable Diffusion generation toolkit"""

    def __init__(self, model_name: str = "runwayml/stable-diffusion-v1-5"):
        print(f"Loading model: {model_name} ...")
        self.pipe = StableDiffusionPipeline.from_pretrained(
            model_name,
            torch_dtype=torch.float16,
            safety_checker=None
        )
        self.pipe.to("cuda")
        self.pipe.enable_attention_slicing()
        print("Model loaded!")

    def generate(
        self,
        prompt: str,
        negative_prompt: str = "",
        width: int = 512,
        height: int = 512,
        steps: int = 30,
        cfg_scale: float = 7.5,
        seed: int = -1,
        output_path: Optional[str] = None
    ) -> Image.Image:
        """Generate one image (seed=-1 means a random seed)"""
        generator = None
        if seed != -1:
            generator = torch.Generator("cuda").manual_seed(seed)

        result = self.pipe(
            prompt=prompt,
            negative_prompt=negative_prompt,
            width=width,
            height=height,
            num_inference_steps=steps,
            guidance_scale=cfg_scale,
            generator=generator
        )

        image = result.images[0]

        if output_path:
            Path(output_path).parent.mkdir(parents=True, exist_ok=True)
            image.save(output_path)
            print(f"Saved: {output_path}")

        return image

    def batch_generate(self, prompts: list, output_dir: str = "./outputs"):
        """Generate one image per prompt"""
        Path(output_dir).mkdir(parents=True, exist_ok=True)
        images = []
        for i, prompt in enumerate(prompts):
            out_path = f"{output_dir}/img_{i:03d}.png"
            img = self.generate(prompt=prompt, output_path=out_path)
            images.append(img)
        print(f"\nBatch generation finished: {len(images)} images")
        return images


# ===== Usage example =====
if __name__ == "__main__":
    sd = SDToolkit("runwayml/stable-diffusion-v1-5")

    # Single image
    sd.generate(
        prompt="a cute cat wearing a tiny hat, masterpiece, best quality",
        negative_prompt="blurry, deformed, bad anatomy",
        output_path="outputs/cat.png"
    )

    # Batch generation
    sd.batch_generate([
        "a serene lake at sunrise, photorealistic, 8k",
        "a cyberpunk street, neon lights, rain, highly detailed",
        "a medieval castle on a cliff, fantasy art, epic",
        "a steaming cup of coffee on a wooden table, cozy atmosphere"
    ])

10. Troubleshooting

┌───────────────────────────────────────────────────────────────────┐
│                    Common Problems & Solutions                    │
├──────────────────────────────┬────────────────────────────────────┤
│ Problem                      │ Solution                           │
├──────────────────────────────┼────────────────────────────────────┤
│ CUDA out of memory           │ Lower resolution / fewer steps /   │
│                              │ enable attention_slicing           │
├──────────────────────────────┼────────────────────────────────────┤
│ Image is completely black    │ Check if safety_checker misfired;  │
│                              │ avoid sensitive words in prompt    │
├──────────────────────────────┼────────────────────────────────────┤
│ Blurry / low-quality images  │ Raise steps (30+); add quality     │
│                              │ tags; try a better model (SDXL)    │
├──────────────────────────────┼────────────────────────────────────┤
│ Deformed hands or faces      │ Use a negative prompt; try SDXL;   │
│                              │ fix afterwards with inpainting     │
├──────────────────────────────┼────────────────────────────────────┤
│ Model download too slow      │ Use a mirror; manually download    │
│                              │ the .safetensors file              │
├──────────────────────────────┼────────────────────────────────────┤
│ Generation too slow          │ Use sdxl-turbo; lower steps;       │
│                              │ enable xformers; upgrade the GPU   │
└──────────────────────────────┴────────────────────────────────────┘

11. Learning Path

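Beginner ──────────────────────────────────────── Advanced
SD 1.5 text-to-image
  │
  ├── Master prompt engineering
  ├── Learn parameter tuning
  │
  ▼
SDXL high-quality generation
  │
  ├── Image-to-image (img2img)
  ├── Inpainting
  │
  ▼
ControlNet precise control
  │
  ├── Canny / depth / pose
  ├── LoRA style fine-tuning
  │
  ▼
Going further
  │
  ├── ComfyUI workflows
  ├── Train your own LoRA
  ├── Video generation (SVD)
  └── Integrate into a web app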

Summary

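This article covered the core of deploying Stable Diffusion locally:

The diffusers pure-Python API approach: suited to developers integrating SD into their own code
The WebUI graphical approach: suited to getting started quickly, with an HTTP API for scripting
Advanced features: image-to-image, inpainting, ControlNet
Batch generation, VRAM optimization, and prompt engineering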



April 12, 2026 • Python