
How to use BeautifulSoup to analyze web page information in Python

python · 水墨上仙 · 1276 views

This Python code uses BeautifulSoup to analyze a web page: it finds all links on the page, searches the span tags, and prints the contents of every span whose class is titletext.

#import the library used to query a website
import urllib2
 
#specify the url you want to query
url = "http://www.75271.com"
 
#Query the website and read the returned html into the variable 'html'
page = urllib2.urlopen(url)
html = page.read()
 
#import the Beautiful soup functions to parse the data returned from the website
from BeautifulSoup import BeautifulSoup
 
#Parse the html in the 'html' variable, and store it in Beautiful Soup format
soup = BeautifulSoup(html)
 
#soup.head is the head tag and soup.head.title is the title tag
print soup.head
print soup.head.title
 
#to print the length of the page in bytes, use the len function on the html string
#(the response object returned by urlopen has no length of its own)
print len(html)
 
#create a new variable to store the data you want to find.
tags = soup.findAll('a')
 
#to print all the links
print tags
 
#to get all spans with class 'titletext' and print the contents of each one
titles = soup.findAll('span', attrs = { 'class' : 'titletext' })
for title in titles:
    print title.contents
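
The listing above targets Python 2 (urllib2) and the old BeautifulSoup 3 package. As a rough sketch only, the same steps might look like this in Python 3, assuming the beautifulsoup4 package (imported as bs4) is installed. The sample HTML and the summarize helper are hypothetical and not part of the original post; they are used so the example runs without a network connection.

```python
from bs4 import BeautifulSoup  # assumption: beautifulsoup4 is installed

# Hypothetical sample page, used here instead of a live download
SAMPLE_HTML = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <a href="/first">First link</a>
    <a href="/second">Second link</a>
    <span class="titletext">Headline one</span>
    <span class="other">Not a title</span>
    <span class="titletext">Headline two</span>
  </body>
</html>
"""


def summarize(html):
    """Return the page title, all link targets, and the text of titletext spans."""
    # html.parser is the parser bundled with the standard library
    soup = BeautifulSoup(html, "html.parser")
    title = soup.head.title.get_text()
    # find_all is the bs4 name for what BeautifulSoup 3 called findAll
    links = [a.get("href") for a in soup.find_all("a")]
    titles = [span.get_text() for span in soup.find_all("span", class_="titletext")]
    return title, links, titles


title, links, titles = summarize(SAMPLE_HTML)
print(title)
print(links)
print(titles)
```

For a live page, the bytes returned by urllib.request.urlopen(url).read() could be passed to summarize in place of SAMPLE_HTML.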

