Parsing and cleaning a text block of store hours in Python
Published: 2020-12-20 12:03:07 | Category: Python | Source: compiled from the web
I am scraping a website and extracting store hours in the following format:

    """Hours
    Monday 9:30 AM - 9:00 PM
    Tuesday 9:30 AM - 9:00 PM
    Wednesday 9:30 AM - 9:00 PM
    Thursday 9:30 AM - 9:00 PM
    Friday 9:30 AM - 11:00 PM
    Saturday 9:30 AM - 11:00 PM
    Sunday 11:00 AM - 6:00 PM
    Holiday Hours
    Thanksgiving Day 11:00 AM - 6:00 PM"""

I would like to process it so that it ends up like this:

    """Mon-Thu 9:30AM-9:00PM
    Fri-Sat 9:30AM-11:00PM
    Sun & Hol 11:00AM-6:00PM"""

I am happy to take a pseudocode suggestion and build the rest myself for the sake of learning. I just cannot get anywhere with this on my own.

Solution
I think this is a good use case for itertools.groupby(): we can use it to group consecutive days that share the same time range. Something along these lines:

    from itertools import groupby
    from operator import itemgetter
    from pprint import pprint

    data = """Hours
    Monday 9:30 AM - 9:00 PM
    Tuesday 9:30 AM - 9:00 PM
    Wednesday 9:30 AM - 9:00 PM
    Thursday 9:30 AM - 9:00 PM
    Friday 9:30 AM - 11:00 PM
    Saturday 9:30 AM - 11:00 PM
    Sunday 11:00 AM - 6:00 PM
    Holiday Hours
    Thanksgiving Day 11:00 AM - 6:00 PM"""

    # filter relevant rows with weekdays only:
    # skip the "Hours" header and the two trailing holiday lines,
    # then split each row into [day name, time range]
    rows = [row.split(" ", 1) for row in data.splitlines()[1:-2]]

    # group consecutive days by a time range
    result = []
    for time_range, group in groupby(rows, key=itemgetter(1)):
        days_in_group = [item[0] for item in group]

        # abbreviate the first and last day of the group to three letters
        first_day, last_day = days_in_group[0][:3], days_in_group[-1][:3]
        range_end = "-" + str(last_day) if first_day != last_day else ""

        result.append("{begin}{end} {time_range}".format(
            begin=first_day, end=range_end, time_range=time_range))

    pprint(result)

Prints:

    ['Mon-Thu 9:30 AM - 9:00 PM',
     'Fri-Sat 9:30 AM - 11:00 PM',
     'Sun 11:00 AM - 6:00 PM']

Note that this works even if every day has a different time range.
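
The printed result still differs from the format requested in the question in two ways: the time ranges keep their inner spaces, and the holiday row is dropped instead of being folded into "Sun & Hol". The following is a minimal sketch of one way to close that gap, not part of the original answer. It assumes that any row carrying a time range whose label is not a weekday name is a holiday row, and that a holiday is merged into the first weekday group with an identical time range. The names `summarize_hours`, `ROW`, `WEEKDAYS` and the fallback label "Hol" are hypothetical, chosen only for illustration.

```python
import re
from itertools import groupby

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday")

# A label followed by a time range such as "9:30 AM - 9:00 PM".
# Header lines ("Hours", "Holiday Hours") do not match and are skipped.
ROW = re.compile(
    r"^(?P<label>.+?)\s+"
    r"(?P<start>\d{1,2}:\d{2}\s*[AP]M)\s*-\s*"
    r"(?P<end>\d{1,2}:\d{2}\s*[AP]M)$"
)


def summarize_hours(text):
    """Return compact lines such as 'Mon-Thu 9:30AM-9:00PM'."""
    weekday_rows = []       # (three-letter day, compact time range)
    holiday_ranges = set()  # compact time ranges seen on non-weekday rows

    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match is None:
            continue
        # "9:30 AM" -> "9:30AM", then join start and end with a hyphen
        start = "".join(match["start"].split())
        end = "".join(match["end"].split())
        time_range = start + "-" + end
        if match["label"] in WEEKDAYS:
            weekday_rows.append((match["label"][:3], time_range))
        else:
            # assumption: every other row with a time range is a holiday
            holiday_ranges.add(time_range)

    result = []
    # groupby only merges *consecutive* rows with an equal key
    for time_range, group in groupby(weekday_rows, key=lambda row: row[1]):
        days = [day for day, _ in group]
        label = days[0] if len(days) == 1 else days[0] + "-" + days[-1]
        if time_range in holiday_ranges:
            # fold the holiday into the first group with the same hours
            label += " & Hol"
            holiday_ranges.discard(time_range)
        result.append(label + " " + time_range)

    # holidays whose hours match no weekday group get their own line
    result.extend("Hol " + time_range for time_range in sorted(holiday_ranges))
    return result


SAMPLE = """Hours
Monday 9:30 AM - 9:00 PM
Tuesday 9:30 AM - 9:00 PM
Wednesday 9:30 AM - 9:00 PM
Thursday 9:30 AM - 9:00 PM
Friday 9:30 AM - 11:00 PM
Saturday 9:30 AM - 11:00 PM
Sunday 11:00 AM - 6:00 PM
Holiday Hours
Thanksgiving Day 11:00 AM - 6:00 PM"""

print("\n".join(summarize_hours(SAMPLE)))
```

Matching rows with a regular expression instead of slicing with `[1:-2]` means the sketch does not depend on the exact number of header or holiday lines, at the cost of silently skipping any row whose time range is written differently.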