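Today I'll demonstrate how to fetch a list of proxy IPs from IP海 (the code below targets the free proxy list published at kuaidaili.com).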
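What are proxy IPs good for? They are a staple of web scraping: routing requests through them helps us get past the anti-scraping mechanisms on the target server. Another way to avoid those mechanisms is to randomize the User-Agent (UA) header, and combining both techniques works best.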
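To make the two techniques concrete, here is a minimal VBA sketch that sends one request through a proxy with a randomly chosen User-Agent, using the same WinHttp.WinHttpRequest.5.1 object as the main code below. The helper names RandomUserAgent and FetchWithProxy are hypothetical, the UA strings are illustrative samples rather than a maintained list, the timeout values are arbitrary assumptions, and the proxy is expected in "ip:port" form.

```vba
' Hypothetical helper: pick one User-Agent string at random.
' Assumption: these UA strings are illustrative samples, not a curated list.
Function RandomUserAgent() As String
    Static seeded As Boolean
    Dim uas As Variant
    uas = Array( _
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36", _
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0", _
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15")
    ' Seed the random number generator once per session
    If Not seeded Then
        Randomize
        seeded = True
    End If
    ' Random index between LBound and UBound, inclusive
    RandomUserAgent = uas(LBound(uas) + Int(Rnd * (UBound(uas) - LBound(uas) + 1)))
End Function

' Hypothetical helper: GET a URL through an HTTP proxy ("ip:port") with a random UA.
Function FetchWithProxy(ByVal url As String, ByVal proxyServer As String) As String
    Const HTTPREQUEST_PROXYSETTING_PROXY As Long = 2   ' route the request via proxyServer
    On Error GoTo er
    With CreateObject("WinHttp.WinHttpRequest.5.1")
        .SetProxy HTTPREQUEST_PROXYSETTING_PROXY, proxyServer
        ' Assumed timeouts (ms): resolve, connect, send, receive; free proxies are often slow
        .SetTimeouts 5000, 5000, 10000, 10000
        .Open "GET", url, False
        .SetRequestHeader "User-Agent", RandomUserAgent()
        .Send
        FetchWithProxy = .ResponseText
    End With
    Exit Function
er:
    FetchWithProxy = "Request failed: " & Err.Description
End Function
```

For example, `Debug.Print FetchWithProxy("https://httpbin.org/ip", "1.2.3.4:8080")`, with 1.2.3.4:8080 standing in for a real proxy from your list, should print a small JSON body showing the IP address the server sees, which is a quick way to confirm the proxy is actually in use.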
Enough talk, here is the code. It is commented in detail, so just read through it!
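' Sleep (delay) function from the Windows API
#If VBA7 Then
    Private Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
    Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If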
Function 取得網頁源碼(Optional ByVal pages As Integer = 1) As String
    ' Download the HTML of one page of the free proxy list
    Dim iurl As String: iurl = "https://www.kuaidaili.com/free/inha/" & pages
    On Error GoTo er
    With CreateObject("WinHttp.WinHttpRequest.5.1")   ' HTTP request object
        .Open "GET", iurl, False                      ' synchronous GET request
        .send                                         ' send the request
        ' Treat anything other than HTTP 200 as a failure
        If .Status <> 200 Then
            取得網頁源碼 = "Request failed: HTTP " & .Status
            Exit Function
        End If
        ' Return the page source
        取得網頁源碼 = .responseText
    End With
    Exit Function
er:
    取得網頁源碼 = "Query error: " & Err.Description
End Function
Sub 解析網頁源碼()
    Dim sht As Worksheet: Set sht = Worksheets("IP地址池")
    Dim xmldocstr As String
    Dim HTMLDoc As Object, TDElements As Object, infotb As Object
    Dim heads As Object, Contents As Object, Content As Object
    Dim p As Integer, i As Long, j As Long, k As Long
    Dim rw As Long   ' Long, not Integer: Integer overflows past row 32767

    sht.Cells.ClearContents   ' clear the results of the previous run

    ' Test run: fetch the first 5 pages
    For p = 1 To 5
        xmldocstr = 取得網頁源碼(p)
        ' Rough sanity check: an error message is far shorter than a real page
        If Len(xmldocstr) < 100 Then
            MsgBox "Page " & p & ": " & xmldocstr
            Exit Sub
        End If
        ' Parse the HTML
        Set HTMLDoc = CreateObject("htmlfile")
        HTMLDoc.body.innerHTML = xmldocstr
        ' Locate the proxy table: the second child of the element with id="list"
        Set TDElements = HTMLDoc.getElementById("list")
        If TDElements Is Nothing Then
            MsgBox "Page " & p & ": proxy table not found"
            Exit Sub
        End If
        Set infotb = TDElements.Children(1)
        ' Read the header row (thead > tr) and write it to row 1
        Set heads = infotb.Children(0).Children(0)
        For j = 0 To heads.Cells.Length - 1
            sht.Cells(1, j + 1) = heads.Children(j).innerText
        Next
        ' Read the data rows (tbody > tr)
        Set Contents = infotb.Children(1)
        For i = 0 To Contents.Rows.Length - 1
            Set Content = Contents.Children(i)
            ' First empty row below the data already written
            rw = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row + 1
            For k = 0 To Content.Cells.Length - 1
                ' Write each cell's text into the sheet
                sht.Cells(rw, k + 1) = Content.Children(k).innerText
            Next
            DoEvents   ' keep Excel responsive
        Next
        Sleep 800   ' pause 800 ms between pages; if page 2 onward comes back empty, increase it
        DoEvents
    Next
End Sub
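Once the sheet is filled, a crawler still needs a way to pull proxies out of it. The sketch below picks one data row at random and returns it as "ip:port". It assumes headers in row 1, the IP in column A and the port in column B; that depends on the site's column order, so check your sheet. RandomProxyFromPool is a hypothetical name.

```vba
' Hypothetical helper: return a random "ip:port" from the IP地址池 sheet,
' or "" when the sheet holds no data rows.
' Assumption: row 1 = headers, column A = IP, column B = PORT.
Function RandomProxyFromPool() As String
    Static seeded As Boolean
    Dim sht As Worksheet: Set sht = Worksheets("IP地址池")
    Dim lastRow As Long, r As Long
    lastRow = sht.Cells(sht.Rows.Count, "A").End(xlUp).Row
    If lastRow < 2 Then Exit Function          ' header only (or empty): return ""
    If Not seeded Then
        Randomize
        seeded = True
    End If
    r = 2 + Int(Rnd * (lastRow - 1))           ' random data row in 2..lastRow
    RandomProxyFromPool = Trim$(sht.Cells(r, 1).Value) & ":" & Trim$(sht.Cells(r, 2).Value)
End Function
```

Free proxies go stale quickly, so in practice you would try the returned proxy and draw another one if the request fails.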
Note: never let your scraper touch anyone's private data, and it is best to follow the Robots protocol (robots.txt)!
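Following the Robots protocol means reading the site's robots.txt before crawling a path. The sketch below is a deliberately simplified check: it only considers the group that applies to all crawlers (User-agent: *) and plain Disallow path prefixes, and it ignores Allow lines, the * and $ wildcards and bot-specific groups, so treat it as an illustration rather than a complete parser. IsPathDisallowed is a hypothetical helper name.

```vba
' Hypothetical helper: simplified robots.txt check for the "User-agent: *" group.
' Ignores Allow rules, wildcards (* and $) and crawler-specific groups.
Function IsPathDisallowed(ByVal robotsTxt As String, ByVal urlPath As String) As Boolean
    Dim lines() As String, ln As String, i As Long, pos As Long
    Dim field As String, fieldVal As String
    Dim inStarGroup As Boolean, lastWasUA As Boolean
    lines = Split(Replace(robotsTxt, vbCr, ""), vbLf)
    For i = LBound(lines) To UBound(lines)
        ln = lines(i)
        pos = InStr(ln, "#")                       ' strip comments
        If pos > 0 Then ln = Left$(ln, pos - 1)
        pos = InStr(ln, ":")
        If pos > 0 Then
            field = LCase$(Trim$(Left$(ln, pos - 1)))
            fieldVal = Trim$(Mid$(ln, pos + 1))
            If field = "user-agent" Then
                ' Consecutive User-agent lines share one group
                If lastWasUA Then
                    inStarGroup = inStarGroup Or (fieldVal = "*")
                Else
                    inStarGroup = (fieldVal = "*")
                End If
                lastWasUA = True
            Else
                lastWasUA = False
                ' An empty Disallow value means "nothing is disallowed"
                If field = "disallow" And inStarGroup And Len(fieldVal) > 0 Then
                    If Left$(urlPath, Len(fieldVal)) = fieldVal Then
                        IsPathDisallowed = True
                        Exit Function
                    End If
                End If
            End If
        End If
    Next
End Function
```

You would download https://www.kuaidaili.com/robots.txt with the same WinHttp request pattern as 取得網頁源碼 and call, for example, IsPathDisallowed(robotsText, "/free/inha/1") before requesting that page.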
Article source: https://mp.weixin.qq.com/s/ZMborUHj6p4hkNFt3LR10w