Core Concepts and Practical Techniques of Multithreaded Programming in Python

I. What Is Multithreading?

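A thread is the smallest unit of execution that the operating system can schedule. A process can contain multiple threads. These threads share the process's resources (such as memory and file handles), but each thread has its own execution stack and program counter.

Multithreading means running several threads inside one program so that they "appear" to carry out different tasks at the same time.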
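Here is a minimal sketch of the shared-memory point. The names `worker` and `shared_results` are only illustrative. Two threads write into the same dictionary, and each thread keeps its own local variable on its own stack.

```python
import threading

shared_results = {}  # one dict object, visible to every thread in the process

def worker(n):
    local_total = 0  # local variable: lives in this thread's own stack frame
    for i in range(n):
        local_total += i
    # each thread writes its own key, so no two threads update the same entry
    shared_results[threading.current_thread().name] = local_total

threads = [
    threading.Thread(target=worker, args=(n,), name=f"worker-{n}")
    for n in (10, 100)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared_results)
```

Both threads see and modify the same `shared_results` object. Neither thread can see the other's `local_total`.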

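II. Why Use Multithreading?

1. Speeding up I/O-bound tasks

A program often has to wait for I/O such as network requests, database queries, or file reads and writes. A single thread simply sits idle until each wait is over. With multiple threads, one thread can wait while another keeps working.

For example:

Single-threaded: downloading 10 images, each with a 1-second wait, takes about 10 seconds in total
Multithreaded: downloading all 10 images at the same time takes roughly 1 second in total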
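You can see this effect without a network by simulating each download with `time.sleep`. Like a real I/O wait, `time.sleep` releases the GIL. The sketch below uses a hypothetical `fake_download` helper and 0.2-second waits, and absolute timings will vary by machine.

```python
import threading
import time

def fake_download(i, results):
    # time.sleep stands in for waiting on the network
    time.sleep(0.2)
    results[i] = f"image{i}.jpg"

def sequential(n):
    results = {}
    start = time.perf_counter()
    for i in range(n):
        fake_download(i, results)
    return results, time.perf_counter() - start

def threaded(n):
    results = {}
    threads = [threading.Thread(target=fake_download, args=(i, results)) for i in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.perf_counter() - start

seq_results, seq_time = sequential(5)
thr_results, thr_time = threaded(5)
print(f"sequential: {seq_time:.2f}s, threaded: {thr_time:.2f}s")
```

The sequential version pays for every wait one after another. The threaded version overlaps the waits, so it finishes in roughly the time of a single one.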

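2. Better user experience

In a GUI program, if a slow operation blocks the main thread, the interface freezes and stops responding. Running the slow task in a background thread keeps the interface responsive.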
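The idea can be shown without a GUI toolkit. In this console sketch, which is not tied to any particular toolkit, a loop in the main thread stands in for the GUI event loop. That loop keeps running while a worker thread does the slow job.

```python
import threading
import time

def long_task(result):
    # stand-in for a slow operation (e.g. a file export) that would freeze a GUI
    time.sleep(0.5)
    result["value"] = 42

result = {}
worker = threading.Thread(target=long_task, args=(result,), daemon=True)
worker.start()

ticks = 0
# the simulated "event loop" keeps running while the worker is busy
while worker.is_alive():
    ticks += 1  # a real GUI would process events and repaint here
    time.sleep(0.05)

worker.join()
print(f"main loop ran {ticks} times; result = {result['value']}")
```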

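III. Multithreading Modules in Python

Python provides two main threading modules:

Module	Description	Typical use
`threading`	High-level, thread-based API	General-purpose multithreaded programming
`_thread`	Low-level module (rarely used directly)	Low-level control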

IV. Quick Start: Creating Threads

Method 1: Using threading.Thread

```python
import threading
import time

def task(name, seconds):
    print(f"Thread {name} started")
    time.sleep(seconds)
    print(f"Thread {name} finished")

# Create the threads
t1 = threading.Thread(target=task, args=("a", 2))
t2 = threading.Thread(target=task, args=("b", 1))

# Start the threads
t1.start()
t2.start()

# Wait for both threads to finish
t1.join()
t2.join()

print("All threads finished")
```

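Typical output (the order of the first two lines can vary between runs):

```text
Thread a started
Thread b started
Thread b finished
Thread a finished
All threads finished
```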
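The same pattern works for any number of threads: build them in a list, start them all, then join them all. Here is a minimal sketch that reuses the `task` function from Method 1 with shorter sleeps. The `task-…` thread names are only illustrative.

```python
import threading
import time

def task(name, seconds):
    print(f"Thread {name} started")
    time.sleep(seconds)
    print(f"Thread {name} finished")

jobs = [("a", 0.3), ("b", 0.1), ("c", 0.2)]
threads = [threading.Thread(target=task, args=job, name=f"task-{job[0]}") for job in jobs]

for t in threads:
    t.start()
for t in threads:
    t.join()  # joining in list order is fine: join() just waits for each thread

print("All threads finished:", all(not t.is_alive() for t in threads))
```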

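Method 2: Subclassing threading.Thread

```python
import threading
import time

class MyThread(threading.Thread):
    def __init__(self, name):
        # Pass the name to Thread itself; Thread already has a `name` attribute
        super().__init__(name=name)

    def run(self):
        # run() is what executes in the new thread once start() is called
        print(f"Thread {self.name} started")
        time.sleep(2)
        print(f"Thread {self.name} finished")

t = MyThread("custom-thread")
t.start()
t.join()
```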
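`Thread` discards whatever `run()` returns, so a subclass is a natural place to keep a result. The `SumThread` class below is a hypothetical illustration of that pattern.

```python
import threading

class SumThread(threading.Thread):
    """Hypothetical subclass that keeps the value computed in run()."""

    def __init__(self, numbers, name=None):
        super().__init__(name=name)
        self.numbers = numbers
        self.result = None

    def run(self):
        # Thread ignores run()'s return value, so store the result on self
        self.result = sum(self.numbers)

t = SumThread(range(1, 101), name="sum-thread")
t.start()
t.join()  # after join() returns, run() has finished and self.result is set
print(t.name, t.result)  # sum-thread 5050
```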

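V. Threads vs. Processes

Feature	Thread	Process
Creation overhead	Low	High
Memory	Shares the process's memory	Separate memory space
Switching speed	Fast	Slow
GIL	Subject to the GIL (no speedup for CPU-bound work)	Not limited by the GIL (each process has its own)
Best suited for	I/O-bound tasks	CPU-bound tasks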

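VI. What Is the GIL (Global Interpreter Lock)?

The GIL (Global Interpreter Lock) is a mechanism in CPython, the standard Python interpreter. It ensures that only one thread executes Python bytecode at any given moment.

Impact:

I/O-bound tasks: multithreading still helps, because a thread releases the GIL while it waits on I/O
CPU-bound tasks: multithreading does not improve performance and may even make the program slower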

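Workarounds:

For CPU-bound work, use the multiprocessing module (or concurrent.futures.ProcessPoolExecutor). Each process has its own interpreter and its own GIL.
For I/O-bound work, use concurrent.futures.ThreadPoolExecutor (a thread pool) or asyncio (asynchronous programming). These make concurrent I/O easier to manage, but they do not get around the GIL.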
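For I/O-bound work, `asyncio` gets concurrency from a single thread: each coroutine yields to the event loop while it waits. This is a minimal sketch in which `asyncio.sleep` stands in for a network call and the `fake_request` helper is hypothetical.

```python
import asyncio
import time

async def fake_request(i):
    # asyncio.sleep stands in for a network wait; it hands control back to the event loop
    await asyncio.sleep(0.2)
    return i * 10

async def main():
    # run all five coroutines concurrently; gather() returns results in argument order
    return await asyncio.gather(*(fake_request(i) for i in range(5)))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

All five waits overlap, so the total time is close to one wait rather than five. Like threads, this does not speed up CPU-bound code.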

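VII. Thread Synchronization: Locks (Lock)

When several threads modify shared data at the same time, the data can end up inconsistent. A lock ensures that only one thread at a time runs the protected code, which keeps the data thread-safe.

```python
import threading

balance = 0
lock = threading.Lock()

def deposit():
    global balance
    for _ in range(100000):
        lock.acquire()  # acquire the lock
        try:
            balance += 1
        finally:
            lock.release()  # always release it, even if an exception occurs

t1 = threading.Thread(target=deposit)
t2 = threading.Thread(target=deposit)

t1.start()
t2.start()
t1.join()
t2.join()

print(f"Final balance: {balance}")  # prints 200000
```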

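What happens without the lock:

`balance += 1` is not atomic. It reads the value, adds 1, and writes the result back. Two threads can interleave these steps and lose updates, so the final balance may come out below 200000 (for example 199847). How often this happens depends on the Python version and on timing.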
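`Lock` also works as a context manager, which is the more common style. `with lock:` acquires the lock and releases it on exit, even if the body raises. Here is the same deposit example written that way.

```python
import threading

balance = 0
lock = threading.Lock()

def deposit(times):
    global balance
    for _ in range(times):
        # "with lock:" acquires the lock and always releases it when the block ends
        with lock:
            balance += 1

threads = [threading.Thread(target=deposit, args=(100_000,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"Final balance: {balance}")  # Final balance: 200000
```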

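VIII. Thread Pools: Best Practice for Managing Threads

Creating and destroying a thread by hand for every task is inefficient. A thread pool is recommended instead, because it reuses a fixed set of worker threads.

Using ThreadPoolExecutor

```python
from concurrent.futures import ThreadPoolExecutor
import time

def task(n):
    print(f"Task {n} started")
    time.sleep(1)
    return f"Task {n} done"

# Create a thread pool with at most 5 worker threads
with ThreadPoolExecutor(max_workers=5) as executor:
    # Submit 10 tasks; each submit() returns a Future immediately
    futures = [executor.submit(task, i) for i in range(10)]

    # result() blocks until that task finishes, so results print in submission order
    for future in futures:
        print(future.result())
```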

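Advantages:

Manages the thread lifecycle automatically
Caps the number of concurrent threads, which avoids exhausting resources
Concise API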
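If you only need the return values in the same order as the inputs, `executor.map` is shorter than `submit` plus a list of futures. This is a minimal sketch in which `time.sleep` stands in for I/O.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def task(n):
    time.sleep(0.1)  # stand-in for an I/O wait
    return n * n

with ThreadPoolExecutor(max_workers=5) as executor:
    # map() yields results in input order, no matter which task finished first
    results = list(executor.map(task, range(10)))

print(results)  # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
```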

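IX. Practical Example: Downloading Images in Bulk

This example uses the third-party requests library.

```python
import requests
from concurrent.futures import ThreadPoolExecutor

def download_image(url):
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # treat HTTP errors such as 404 as failures
        filename = url.split("/")[-1]
        with open(filename, "wb") as f:
            f.write(response.content)
        print(f"✅ Downloaded: {filename}")
    except (requests.RequestException, OSError) as e:
        # network/HTTP errors and file-writing errors
        print(f"❌ Download failed: {url}, error: {e}")

urls = [
    "https://example.com/image1.jpg",
    "https://example.com/image2.jpg",
    "https://example.com/image3.jpg",
]

# Download concurrently with a thread pool
with ThreadPoolExecutor(max_workers=3) as executor:
    # download_image handles its own errors, so map()'s results are not needed
    executor.map(download_image, urls)
```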
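Sometimes you want to know which downloads failed, not just print a message. Using `submit` together with `as_completed` lets you inspect each future as it finishes. So that the sketch runs offline, it replaces the real network call with a hypothetical `fake_download` that fails on any URL containing "bad".

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
import time

def fake_download(url):
    """Hypothetical stand-in for download_image: sleeps, then fails on 'bad' URLs."""
    time.sleep(0.1)
    if "bad" in url:
        raise ValueError(f"cannot fetch {url}")
    return url.split("/")[-1]

urls = [
    "https://example.com/image1.jpg",
    "https://example.com/bad.jpg",
    "https://example.com/image3.jpg",
]

succeeded, failed = [], []
with ThreadPoolExecutor(max_workers=3) as executor:
    # map each future back to the URL it is working on
    future_to_url = {executor.submit(fake_download, url): url for url in urls}
    for future in as_completed(future_to_url):  # yields futures as they finish
        url = future_to_url[future]
        try:
            succeeded.append(future.result())  # re-raises the worker's exception
        except ValueError as exc:
            failed.append((url, str(exc)))

print("ok:", sorted(succeeded))
print("failed:", failed)
```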

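X. Common Errors and Solutions

Error	Cause	Solution
`RuntimeError: can't start new thread`	The number of threads exceeds the system limit	Use a thread pool to cap concurrency
Inconsistent data	Several threads modify shared data at the same time	Protect it with a `Lock`
Program appears frozen	The main thread is blocked	Move slow work to a worker thread
CPU-bound task gets slower	GIL limitation	Switch to `multiprocessing` (multiple processes)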
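The first row of the table relies on the pool capping how many threads run at once. This sketch submits 50 tasks to a pool of 3 workers and records the highest number of tasks that ever ran simultaneously. The counters `active` and `peak` are only illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
import threading
import time

lock = threading.Lock()
active = 0  # tasks currently running
peak = 0    # highest value `active` ever reached

def task(n):
    global active, peak
    with lock:
        active += 1
        peak = max(peak, active)
    time.sleep(0.05)  # stand-in for I/O
    with lock:
        active -= 1
    return n

# 50 tasks, but never more than 3 running at once
with ThreadPoolExecutor(max_workers=3) as executor:
    results = list(executor.map(task, range(50)))

print(f"tasks: {len(results)}, peak concurrency: {peak}")
```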

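XI. Best Practices Summary

Prefer a thread pool (ThreadPoolExecutor) over creating threads by hand
Use threads for I/O-bound tasks and processes for CPU-bound tasks
Protect shared data that threads modify with a lock (threading.Lock)
Choose a sensible number of threads. A common rule of thumb is at most about 2 × the number of CPU cores, though I/O-bound work often benefits from more. Since Python 3.8, ThreadPoolExecutor defaults to min(32, CPU count + 4) workers.
Avoid touching the GUI from worker threads. Most GUI toolkits (for example Tkinter and Qt) expect widgets to be used only from the main thread, and breaking this rule can cause crashes or erratic behavior.
Don't create too many threads, because each one consumes memory and other system resources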
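A common way to follow the "no GUI calls from worker threads" rule is to have workers post results to a `queue.Queue` and let the main thread apply them. In Tkinter, for instance, the main thread often polls such a queue with `root.after`. The console sketch below uses that structure without a real toolkit, so the "UI update" is just a list append.

```python
import queue
import threading
import time

ui_updates = queue.Queue()  # thread-safe channel from workers to the main thread

def worker(job_id):
    time.sleep(0.1)  # stand-in for slow background work
    # no widget access here: just report the result
    ui_updates.put(f"job {job_id} done")

workers = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for w in workers:
    w.start()

shown = []
# main-thread "event loop": only this thread updates the (simulated) UI
while len(shown) < len(workers):
    try:
        message = ui_updates.get(timeout=1)
    except queue.Empty:
        continue
    shown.append(message)  # a real GUI would update a label here

for w in workers:
    w.join()
print(sorted(shown))
```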

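XII. Suggested Learning Path

Stage	Topics
Beginner	`threading.Thread`, `Lock`, `ThreadPoolExecutor`
Intermediate	`queue.Queue` (communication between threads), `Event`, `Condition`
Advanced	`asyncio` asynchronous programming, `multiprocessing` (multiple processes)
Hands-on	Concurrent web crawlers, web servers, GUI background tasks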
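As a first look at the intermediate topics, here is a minimal `threading.Event` sketch. In it, a background thread repeats some work until the main thread sets the event to ask it to stop. The `heartbeat` function is hypothetical.

```python
import threading
import time

stop_event = threading.Event()
heartbeats = []

def heartbeat():
    # wait() returns True as soon as the event is set, False if the timeout expires
    while not stop_event.wait(timeout=0.05):
        heartbeats.append(time.perf_counter())

t = threading.Thread(target=heartbeat)
t.start()
time.sleep(0.3)   # let the background thread run for a while
stop_event.set()  # ask it to stop
t.join()
print(f"beats recorded: {len(heartbeats)}, still running: {t.is_alive()}")
```

Compared with a plain `time.sleep` loop, `Event.wait` lets the thread react to the stop request immediately instead of finishing its current sleep first.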