soup = BeautifulSoup(html, 'html.parser')
movies = soup.find_all('div', class_='item')
for movie in movies:
    title = movie.find('span', class_='title').text
    rating = movie.find('span', class_='rating_num').text
    director = movie.find('div', class_='bd').p.text.split('\n')[1].strip().split(':')[1]
    actors = [actor.strip() for actor in movie.find('div', class_='bd').p.text.split('\n')[2].strip().split('/')]
    print(title, rating, director, actors)
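In the code above, we first parse the HTML with the BeautifulSoup library and find the div element of every movie (the ones with class item).
Then, for each movie, we use the find() method to look up its title, rating, director, and lead actors, and store them in variables.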
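The director and actor lines are pulled out of a multi-line text block by position, so a movie whose block has fewer lines, or no colon, would raise an IndexError. Here is a minimal sketch of a more defensive version of that string handling. The helper name parse_info and the sample text are hypothetical, made up for illustration; the real page text may be laid out differently. Unlike the loop above, it also strips whitespace around the director's name.

```python
def parse_info(text):
    """Split a multi-line info block into (director, actors).

    Assumes the same layout as the loop above: the second line holds
    'label: director' and the third line holds actors separated by '/'.
    """
    lines = text.split('\n')
    director, actors = '', []
    if len(lines) > 1 and ':' in lines[1]:
        # Keep everything after the first colon and trim whitespace.
        director = lines[1].strip().split(':', 1)[1].strip()
    if len(lines) > 2:
        # Drop empty pieces so a trailing '/' does not yield '' entries.
        actors = [a.strip() for a in lines[2].strip().split('/') if a.strip()]
    return director, actors


# Hypothetical sample text, not taken from the real page.
sample = "\nDirector: Jane Doe\nActor A / Actor B / Actor C\n"
director, actors = parse_info(sample)
print(director, actors)
```

Inside the loop, this could be called as parse_info(movie.find('div', class_='bd').p.text), which also avoids searching for the same div twice.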
Saving the data to a local file
Finally, we can save the scraped data to a local file:
import csv

# newline='' lets the csv module control line endings itself
with open('top250.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    # Header row
    writer.writerow(['Title', 'Rating', 'Director', 'Actors'])
    for movie in movies:
        title = movie.find('span', class_='title').text
        rating = movie.find('span', class_='rating_num').text
        director = movie.find('div', class_='bd').p.text.split('\n')[1].strip().split(':')[1]
        actors = [actor.strip() for actor in movie.find('div', class_='bd').p.text.split('\n')[2].strip().split('/')]
        writer.writerow([title, rating, director, actors])
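One detail worth knowing: csv.writer converts non-string values with str(), so writing the actors list directly puts its Python representation (brackets and quotes included) into the cell. A rough sketch of one alternative is to join the names into a single string before writing and split them again when reading. The rows below are made-up placeholder data, and an in-memory buffer stands in for the file so the round trip is easy to try.

```python
import csv
import io

# Placeholder rows standing in for the scraped values (hypothetical data).
rows = [
    ('Movie A', '9.0', 'Jane Doe', ['Actor A', 'Actor B']),
]

# An in-memory text buffer used in place of top250.csv.
buffer = io.StringIO(newline='')
writer = csv.writer(buffer)
writer.writerow(['title', 'rating', 'director', 'actors'])
for title, rating, director, actors in rows:
    # Join the list so the cell holds plain text instead of a list repr.
    writer.writerow([title, rating, director, ' / '.join(actors)])

# Read the data back and recover the list of actors.
buffer.seek(0)
records = list(csv.DictReader(buffer))
print(records[0]['actors'].split(' / '))
```

The separator ' / ' is an arbitrary choice; any string that cannot appear inside an actor's name would work.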