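1. Requirements
We have an Excel spreadsheet that contains a device field device_id, a last-used-time field end_time, and several other fields.
For each device in the spreadsheet, we need to find its latest usage time, and then, based on that latest time, pick out the list of device IDs whose latest usage falls in April or May.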
2. Reading the Excel spreadsheet
import pandas as pd

# Read the Excel file
file_path = r"c:\users\downloads\file_record.xlsx"  # replace with your own file path
df = pd.read_excel(file_path)

# Show the first few rows
# print(df.head())
# print(df)
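3. Finding the latest time
First, convert the end_time column of the DataFrame to a datetime type so that its values can be compared and filtered as dates.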
# Assuming 'df' is your DataFrame and 'end_time' is initially stored as strings,
# convert it to datetime (harmless if it is already datetime)
df['end_time'] = pd.to_datetime(df['end_time'])
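If the spreadsheet may contain values that are not valid dates, the conversion above raises an error. A minimal sketch of one way to handle that, using the `errors="coerce"` option of `pd.to_datetime`; the sample rows here are made up for illustration and are not from the original file:

```python
import pandas as pd

# Hypothetical sample data standing in for the Excel sheet.
sample = pd.DataFrame({
    "device_id": ["A", "B", "C"],
    "end_time": ["2024-04-01 08:00:00", "2024-05-02 09:30:00", "not a date"],
})

# errors="coerce" turns unparseable values into NaT instead of raising.
sample["end_time"] = pd.to_datetime(sample["end_time"], errors="coerce")

# Collect the rows whose end_time could not be parsed, for inspection.
bad_rows = sample[sample["end_time"].isna()]
print(bad_rows)
```

Rows with NaT are ignored by `max()` in the grouping step, so it is worth checking `bad_rows` before relying on the result.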
Then group by device ID, take the latest (that is, the maximum) end_time in each group, and reset the index.
# Group by 'device_id' and find the max (latest) 'end_time' for each group
latest_end_times = df.groupby('device_id')['end_time'].max().reset_index()
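The grouping above returns only device_id and end_time, so the other fields of the sheet are dropped. If the whole row of the latest record is needed, one possible variant is to use `idxmax()` to locate that row. This is a sketch under assumptions: the `user` column and the sample values are hypothetical, and every device is assumed to have at least one valid end_time.

```python
import pandas as pd

# Hypothetical sample data; 'user' stands in for the sheet's other fields.
records = pd.DataFrame({
    "device_id": ["A", "A", "B", "B"],
    "end_time": pd.to_datetime([
        "2024-03-10 10:00:00",
        "2024-04-15 10:00:00",
        "2024-05-01 10:00:00",
        "2024-02-20 10:00:00",
    ]),
    "user": ["u1", "u2", "u3", "u4"],
})

# idxmax() gives, per device, the index label of the row with the latest end_time.
latest_idx = records.groupby("device_id")["end_time"].idxmax()

# Select those rows in full, keeping every column.
latest_rows = records.loc[latest_idx].reset_index(drop=True)
print(latest_rows)
```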
4. Filtering by specific months
From the latest times computed above, select the devices whose latest time falls in April or May.
# Filter 'latest_end_times' to keep only devices whose 'end_time' is in April or May
filtered_devices = latest_end_times[
    (latest_end_times['end_time'].dt.month == 4)
    | (latest_end_times['end_time'].dt.month == 5)
]
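A month-only comparison matches April and May of any year. If the data spans several years and only one year is wanted, a date-range filter is one alternative. The sketch below assumes the year 2024 purely for illustration (the original does not state a year) and uses made-up sample rows:

```python
import pandas as pd

# Hypothetical per-device latest times.
latest = pd.DataFrame({
    "device_id": ["A", "B", "C", "D"],
    "end_time": pd.to_datetime([
        "2024-04-15 10:00:00",
        "2023-05-01 10:00:00",
        "2024-06-03 10:00:00",
        "2024-05-31 23:59:59",
    ]),
})

# Month-only filter: equivalent to (month == 4) | (month == 5), any year.
month_only = latest[latest["end_time"].dt.month.isin([4, 5])]

# Range filter: half-open interval [2024-04-01, 2024-06-01), assumed year 2024.
start = pd.Timestamp("2024-04-01")
end = pd.Timestamp("2024-06-01")
in_range = latest[(latest["end_time"] >= start) & (latest["end_time"] < end)]

print(month_only["device_id"].tolist())
print(in_range["device_id"].tolist())
```

Using a half-open interval avoids having to spell out the last instant of May.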
5. Printing the results
Iterate over the result and print each device ID with its latest time.
for index, row in filtered_devices.iterrows():
    device_id = row['device_id']
    latest_end_time = row['end_time']
    print(f"device id: {device_id}, latest end time: {latest_end_time}")

# 'filtered_devices' now contains the devices whose latest 'end_time' is in April or May
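Since the stated goal is a list of device IDs, the column can also be converted directly to a Python list. A small sketch, with a hypothetical `filtered` DataFrame standing in for `filtered_devices`; `itertuples()` is shown as a lighter alternative to `iterrows()` for printing:

```python
import pandas as pd

# Hypothetical stand-in for 'filtered_devices'.
filtered = pd.DataFrame({
    "device_id": ["A", "D"],
    "end_time": pd.to_datetime(["2024-04-15 10:00:00", "2024-05-31 23:59:59"]),
})

# Plain Python list of device IDs.
device_ids = filtered["device_id"].tolist()
print(device_ids)

# itertuples() yields one namedtuple per row; columns are accessed as attributes.
for row in filtered.itertuples(index=False):
    print(f"device id: {row.device_id}, latest end time: {row.end_time}")
```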
6. Complete code
The complete code is as follows.
import pandas as pd

# Read the Excel file
file_path = r"c:\users\downloads\file_record.xlsx"  # replace with your own file path
df = pd.read_excel(file_path)

# Show the first few rows
# print(df.head())

# Assuming 'end_time' is initially stored as strings, convert it to datetime
df['end_time'] = pd.to_datetime(df['end_time'])

# Group by 'device_id' and find the max (latest) 'end_time' for each group
latest_end_times = df.groupby('device_id')['end_time'].max().reset_index()

# Keep only devices whose latest 'end_time' is in April or May
filtered_devices = latest_end_times[
    (latest_end_times['end_time'].dt.month == 4)
    | (latest_end_times['end_time'].dt.month == 5)
]

# Print each device ID with its latest time
for index, row in filtered_devices.iterrows():
    device_id = row['device_id']
    latest_end_time = row['end_time']
    print(f"device id: {device_id}, latest end time: {latest_end_time}")