[python] Let's crawl and download a podcast!

First, install the required packages with pip or pip3.

1. pip3 install requests

2. pip3 install bs4

3. itertools ships with the Python standard library, so there is nothing to install for it.

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin
from itertools import count
from time import sleep

def get_list(pid):
    # Walk the episode list page by page and save every episode as an mp3.
    page_url = "http://www.podbbang.com/podbbangchnew/episode_list"
    # The mp3 request is sent with the channel page as its Referer.
    headers = {'Referer': 'http://www.podbbang.com/ch/{pid}'.format(pid=pid)}

    for page in count(1):
        params = {'id': pid, 'page': page}
        print('page = {}'.format(page))

        r = requests.get(page_url, params=params)
        r.encoding = "utf-8"
        soup = BeautifulSoup(r.text, 'html.parser')

        for li_tag in soup.select('li'):
            try:
                title = li_tag.find('dt')['title']
                link = urljoin(page_url, li_tag.find('a')['href'])
                print(title, link)
            except (TypeError, KeyError):
                # An <li> without a titled <dt> or a link is treated as the end of the list.
                print('END')
                return None
            else:
                mp3_bin = requests.get(link, headers=headers).content
                # Remove "." and "/" from the title before using it as a file name.
                title = title.replace('.', '').replace('/', '')
                filename = '{}.mp3'.format(title)
                with open(filename, 'wb') as f:
                    f.write(mp3_bin)
        sleep(0.5)  # pause briefly between pages


get_list(podcast_number)  # replace podcast_number with the podcast's id

Just put the podcast's id inside get_list(), and you will see the episodes being downloaded into the folder the script is run from.
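To see what get_list() ends up requesting for a given id, here is a small sketch that uses only the standard library to build the same page URL and to resolve an episode link the way urljoin does. The id 12345 and the href values are made up for illustration and do not refer to a real podcast; the helper names build_page_url and resolve_link are also hypothetical.

```python
from urllib.parse import urlencode, urljoin

PAGE_URL = "http://www.podbbang.com/podbbangchnew/episode_list"

def build_page_url(pid, page):
    # Same query parameters that get_list() hands to requests.get().
    return PAGE_URL + "?" + urlencode({'id': pid, 'page': page})

def resolve_link(href):
    # Relative hrefs are resolved against the list page; absolute ones are kept as-is.
    return urljoin(PAGE_URL, href)

if __name__ == "__main__":
    print(build_page_url(12345, 1))      # hypothetical podcast id
    print(resolve_link("/a/b.mp3"))      # hypothetical relative href
```

Printing the URL like this before running the full download is a cheap way to check that the id was typed correctly.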

If a title contains "." or "/", using it as a file name can cause errors on Windows or Linux, so the code was changed to replace those characters with an empty string.
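The two replace() calls only cover "." and "/". As a minimal sketch of a slightly broader cleanup, the helper below also strips the other characters Windows rejects in file names. The name safe_filename and the 'untitled' fallback are assumptions made for this example, not part of the original script.

```python
import re

# Characters Windows does not allow in file names; "/" is also the path
# separator on Linux. "." is included only to mirror the replace() calls above.
_UNSAFE = re.compile(r'[\\/:*?"<>|.]')

def safe_filename(title, ext='.mp3'):
    # Drop every unsafe character, then trim surrounding whitespace.
    cleaned = _UNSAFE.sub('', title).strip()
    # Fall back to a fixed name if nothing is left after cleaning.
    return (cleaned or 'untitled') + ext
```

With this in place, the filename line could become filename = safe_filename(title). Note that two different titles can still clean down to the same name, in which case the later download overwrites the earlier one.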

License: Attribution, NonCommercial, NoDerivatives
