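```python
import re
from bs4 import BeautifulSoup

# req: the response for the NetEase playlist page, fetched earlier
soup = BeautifulSoup(req.text, 'html.parser')
tag = soup.find("ul", attrs={'class': ['f-hide']})
neteaseList = []
for song in tag.find_all("li"):
    # Drop (...), full-width （...） and [...] annotations from the name
    name = re.sub(r'(\(.*?\))|(（.*?）)|(\[.*?\])', '', song.contents[0].text)
    neteaseList.append(name.strip())
```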
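As with the QQ Music list, the bracketed annotations are stripped from each song name and the results are stored in the neteaseList list.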
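The bracket-stripping step can also be pulled out into a standalone function. This is a minimal sketch that assumes only the three bracket styles above: ASCII parentheses, full-width parentheses and square brackets. The name `clean_name` and the sample titles are hypothetical. The trailing `.strip()` removes the space that usually precedes a dropped suffix such as ` (Live)`, so that space does not affect the later similarity comparison.

```python
import re

# Matches ASCII (...), full-width （...） and [...] annotations, non-greedily
BRACKETS = re.compile(r'\(.*?\)|（.*?）|\[.*?\]')


def clean_name(name: str) -> str:
    """Remove bracketed annotations and surrounding whitespace from a song name."""
    return BRACKETS.sub('', name).strip()


print(clean_name('晴天 (Live)'))         # 晴天
print(clean_name('七里香（伴奏）'))        # 七里香
print(clean_name('Hello [Remastered]'))  # Hello
```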
Data analysis
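To compare the two playlists, I use the difflib library to score how similar each pair of song names is. A NetEase song counts as present on QQ Music if its name scores at least 0.6 against some QQ Music song. The remaining songs are the ones missing from the QQ Music playlist.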
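```python
import difflib

# NetEase songs that have a close match (ratio >= 0.6) somewhere in qqList
songList = set()
for s1 in neteaseList:
    for s2 in qqList:
        t = difflib.SequenceMatcher(None, s1, s2).ratio()
        if t >= 0.6:
            songList.add(s1)
            break  # one match is enough; skip the remaining QQ songs
# NetEase songs with no close match on QQ Music
results = set(neteaseList).difference(songList)
```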
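The nested loop above can be sketched as a reusable function. This is a minimal sketch that assumes the same 0.6 cutoff, and the name `unmatched` and the toy strings are hypothetical. It uses `any()` so that the inner loop stops at the first sufficiently similar name. It returns the same set as building `songList` and then taking the difference. `SequenceMatcher.ratio()` is defined as 2*M/T, where M is the number of matched characters and T is the total length of both strings.

```python
import difflib


def unmatched(source, target, threshold=0.6):
    """Return the names in source with no close match (ratio >= threshold) in target."""
    missing = set()
    for s1 in source:
        # any() short-circuits at the first target name that is similar enough
        if not any(difflib.SequenceMatcher(None, s1, s2).ratio() >= threshold
                   for s2 in target):
            missing.add(s1)
    return missing


# 'abcd' vs 'abce': M = 3 ('abc'), T = 8, so ratio = 2*3/8
print(difflib.SequenceMatcher(None, 'abcd', 'abce').ratio())  # 0.75
print(unmatched(['abcd', 'xyz'], ['abce']))                    # {'xyz'}
```

For long playlists, `difflib.get_close_matches(s1, target, n=1, cutoff=0.6)` applies the same `ratio()` cutoff. Before calling `ratio()`, it first discards candidates using the cheaper upper bounds `real_quick_ratio()` and `quick_ratio()`.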
Writing the results to a file
```python
# One song name per line, UTF-8 so Chinese titles are preserved
with open('list.txt', 'w', encoding='utf-8') as f:
    for s in results:
        f.write(s + '\n')
```
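Because `results` is a set, its iteration order is arbitrary. For strings, the order can also change between runs because of hash randomization, so `list.txt` may come out in a different order each time. The sketch below is a small variation that assumes a sorted, reproducible file is preferable. The function name `write_results` is hypothetical, and the temporary directory is only there for the demonstration.

```python
import os
import tempfile


def write_results(results, path):
    """Write one name per line, sorted so the file is identical across runs."""
    with open(path, 'w', encoding='utf-8') as f:
        for s in sorted(results):
            f.write(s + '\n')


path = os.path.join(tempfile.mkdtemp(), 'list.txt')
write_results({'b', 'a'}, path)
with open(path, encoding='utf-8') as f:
    print(f.read().splitlines())  # ['a', 'b']
```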