[Building a Web Crawler]
import os
import urllib.request
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup

URL = 'https://www.daum.net/'
headers = {'Content-Type': 'application/json; charset=utf-8'}
res = requests.get(URL, headers=headers)
soup = BeautifulSoup(res.text, 'html.parser')

# urlretrieve does not create directories, so make sure ./img exists.
os.makedirs('./img', exist_ok=True)

i = 0
for img in soup.find_all("img"):
    # An <img> tag may carry its address in src, in data-src, or in both.
    # Each attribute is checked on its own, so an image that has only one
    # of the two is no longer skipped.
    for attr in ("src", "data-src"):
        link = img.get(attr)
        if link is None:
            continue
        pos = link.find("http")
        if pos == -1:
            # No scheme in the value (e.g. "//host/a.png" or "/a.png"):
            # resolve it against the page URL.
            link = urljoin(URL, link)
        else:
            # Drop anything in front of "http".
            link = link[pos:]
        i = i + 1
        # Every file is saved as <number>.jpg, whatever its real format is.
        img_name = str(i) + ".jpg"
        print(img_name)
        urllib.request.urlretrieve(link, "./img/" + img_name)
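
The step in the loop that turns a `src` or `data-src` value into a downloadable address can also be tried on its own, without touching the network. The snippet below is only a minimal sketch: `resolve_img_url` is a hypothetical helper name that does not appear in the script above, the `example.com` links are made-up sample values, and it relies on the standard library's `urljoin` instead of prepending `"http:"` by hand.

```python
from urllib.parse import urljoin

PAGE_URL = "https://www.daum.net/"


def resolve_img_url(link, page_url=PAGE_URL):
    """Hypothetical helper: make an <img> address absolute.

    urljoin keeps links that are already absolute and fills in the
    scheme and host for protocol-relative or relative ones.
    """
    return urljoin(page_url, link.strip())


# Made-up sample values of the kinds an <img> tag may contain.
samples = [
    "//example.com/a.png",         # protocol-relative
    "/b.png",                      # relative to the site root
    "https://example.com/c.png",   # already absolute
]

for link in samples:
    print(link, "->", resolve_img_url(link))
```

With a helper like this, a protocol-relative link inherits the scheme of the page it came from, and a path such as `/b.png` becomes a full address on the same host.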