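Saw someone posting about AfreecaTV, so it's time to let my crawler out!
Introduction
While doing my daily check-in today, a post title from one of the veterans here caught my eye.
At first I assumed it was one of those outfit-change videos that are popular on short-video platforms, but when I opened it I found a familiar live-streaming platform.
That reminded me of a rough script I wrote last year, so I'm posting it here for everyone to use.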
It crawls the VOD videos listed on a streamer's home page, and the result is a txt file: https://ls小东西lb.lanzoub.com/iFMg40k68t9i
Sample playback: https://vod-archive-global-cdn-z02.afreecatv.com/v101/hls/se/afreeca/station/2021/0524/01/1621786468833836.失眠il/original/both/playlist.m3u8
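For anyone who wants to post-process the txt file, here is a minimal sketch, not part of the original script. It assumes the file contains whitespace-separated entries, where each entry is either a playlist URL ending in `.m3u8` or a failed lookup that the script records as the title number followed by `err!`. The function name `split_results` is my own.

```python
def split_results(text):
    """Split the crawler's txt output into playlist URLs and failed title numbers."""
    urls, failed = [], []
    # str.split() with no argument handles both spaces and newlines
    for entry in text.split():
        if entry.endswith('err!'):
            # Keep only the title number so it can be retried later
            failed.append(entry[:-len('err!')])
        elif entry.endswith('.m3u8'):
            urls.append(entry)
    return urls, failed
```

Reading the file with `open(bjname + '.txt', encoding='utf-8').read()` and passing the text in should give one list to feed to a player or downloader and another list of title numbers worth retrying.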
This is only a starting point, so improvements from the experts here are very welcome.
import random
import re
import time

import requests

# Captures the path that precedes /playlist.m3u8 in the video-info response.
# The prefix is reproduced as posted and may need adjusting.
findre = re.compile('失眠il:(.*?)/playlist.m3u8')

# Browser-like request headers; fill in 'cookie' if the API requires a login.
hd = {
    'cookie': '',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36',
    'accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'accept-encoding': 'gzip, deflate, br',
    'accept-language': 'zh-CN,zh;q=0.9,en-US;q=0.8,en;q=0.7,zh-HK;q=0.6',
    'sec-ch-ua': '"Not?A_Brand";v="8", "Chromium";v="108", "Google Chrome";v="108"',
    'sec-ch-ua-platform': '"Windows"',
    'sec-fetch-site': 'same-site',
}

def getpageinfo(bjname):
    # Ask the listing API for page 1 and read the total page count from its metadata.
    url = 'https://bjapi.afreecatv.com/api/{0}/vods/all?page=1&per_page=60'.format(bjname)
    indexpage = requests.get(url, headers=hd)
    totalpage = indexpage.json()['meta']['last_page']
    print(totalpage)
    return totalpage

def getallvodinfo(bjname, sid, bbsid, tid):
    # Look up one VOD and build its playlist URL; on failure return '<tid>err!'.
    try:
        resapi = 'https://stbbs.afreecatv.com/api/video/get_video_info.php?szBjId={0}&nStationNo={1}&nBbsNo={2}&nTitleNo={3}'.format(bjname, sid, bbsid, tid)
        infos = requests.get(resapi, headers=hd)
        playurl = findre.findall(infos.text)[0]
        print(playurl)
        realurl = 'https://vod-archive-global-cdn-z02.afreecatv.com/v101/hls/' + playurl + '/original/both/playlist.m3u8'
        return realurl
    except Exception:  # a bare except would also swallow Ctrl-C
        print(str(tid) + 'err!')
        return str(tid) + 'err!'
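
def getpagecnt(bjname, page):
    # Fetch one listing page and append every VOD's playlist URL to <bjname>.txt.
    baseurl = 'https://bjapi.afreecatv.com/api/' + bjname + '/vods/all?per_page=60&page='
    # Use the page argument; the original relied on the global loop variable i.
    pagecnt = requests.get(baseurl + str(page), headers=hd).json()
    with open(bjname + '.txt', 'a', encoding='utf-8') as wt:
        for p in pagecnt['data']:
            u = getallvodinfo(bjname, p['station_no'], p['bbs_no'], p['title_no'])
            wt.write(u + '\n')  # one entry per line
            time.sleep(random.randint(10, 40))  # random pause between videos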

if __name__ == '__main__':
    bjname = input('bjname:')
    startpage = input('Start page (default 1): ') or '1'
    totalpage = getpageinfo(bjname)
    for i in range(int(startpage), totalpage + 1):
        print('Now on page {0}'.format(i))
        getpagecnt(bjname, i)
        time.sleep(random.randint(30, 120))  # longer random pause between pages

I'm just a beginner, so please go easy on me!
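
One possible improvement, offered as a sketch rather than as part of the original script: the script gives up on a video after a single failed request, so a small retry helper with exponential backoff could recover from transient network errors. The name `with_retry` and its parameters are hypothetical, and the injectable `sleep` argument is there only so the helper can be exercised without really waiting.

```python
import random
import time


def with_retry(func, attempts=3, base_delay=5.0, sleep=time.sleep):
    """Call func() until it succeeds or the attempts run out.

    Waits base_delay * 2**n seconds plus up to 1 second of random jitter
    after the n-th failure (n starts at 0), then re-raises the last error.
    """
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    last_error = None
    for n in range(attempts):
        try:
            return func()
        except Exception as exc:  # broad on purpose for this sketch; narrow it in real use
            last_error = exc
            if n < attempts - 1:
                sleep(base_delay * 2 ** n + random.random())
    raise last_error
```

A call such as `with_retry(lambda: requests.get(resapi, headers=hd, timeout=10))` inside getallvodinfo would be one way to use it; the 10-second timeout is an assumed value, not something the original script sets.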