Use HTML Parser to Extract Links from a Web Page
This article describes how to use our HTML parser library, HTMLParser.Net, to
parse and analyze a web page and extract all of its outgoing links: images, page links, FTP links, mail links, and so on. The library does
the hard work of building a hierarchical view of all the tags. The only thing you need to specify is which information
you are interested in extracting.
Extracting the links takes about three lines of code. First, create
a Parser object, passing the page's URL as an argument. Then call the GetAllOutLinks
method on it. It returns a PageData object whose OutLinks collection holds one LinkData entry, with its URL and depth, for each link.
Sub ExtractOutlinksFromPage(ByVal strUrl As String)
    Dim obParser As Parser
    Dim obPageData As PageData

    ' Create a parser for the page at the given URL.
    obParser = New Parser(New System.Uri(strUrl))

    ' Collect the outgoing links found on the page.
    obPageData = obParser.GetAllOutLinks(1, True)

    ' Print the number of links, then the depth and URL of each one.
    Console.WriteLine(obPageData.OutLinks.Count)
    For Each obLinkData As LinkData In obPageData.OutLinks
        Console.WriteLine("Depth[{0}] : Link Url={1}", obLinkData.Depth, obLinkData.Url)
    Next
End Sub
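Once you have the URLs, you may want to group them by the link types mentioned above (image, page link, FTP, mail). The following is a minimal sketch of one way to do that with the standard System.Uri class. ClassifyLink, the category names, and the list of image extensions are assumptions made for illustration. They are not part of HTMLParser.Net, and the library may offer its own way to do this.

```vb
Imports System
Imports System.IO

Module LinkClassifier
    ' Hypothetical helper (not part of HTMLParser.Net): returns a coarse
    ' category for a URL based on its scheme and file extension.
    Function ClassifyLink(ByVal strUrl As String) As String
        Dim obUri As Uri = Nothing

        ' Relative or malformed URLs cannot be classified by scheme.
        If Not Uri.TryCreate(strUrl, UriKind.Absolute, obUri) Then
            Return "Unknown"
        End If

        If obUri.Scheme = Uri.UriSchemeMailto Then Return "Mail"
        If obUri.Scheme = Uri.UriSchemeFtp Then Return "FTP"

        If obUri.Scheme = Uri.UriSchemeHttp OrElse obUri.Scheme = Uri.UriSchemeHttps Then
            ' Assumption: a few common file extensions are treated as images.
            Select Case Path.GetExtension(obUri.AbsolutePath).ToLowerInvariant()
                Case ".gif", ".jpg", ".jpeg", ".png"
                    Return "Image"
                Case Else
                    Return "PageLink"
            End Select
        End If

        Return "Other"
    End Function

    Sub Main()
        ' Sample URLs chosen for illustration only.
        Dim arUrls() As String = {
            "http://example.com/index.html",
            "http://example.com/images/logo.png",
            "ftp://example.com/files/archive.zip",
            "mailto:someone@example.com"
        }

        For Each strUrl As String In arUrls
            Console.WriteLine("{0} : {1}", ClassifyLink(strUrl), strUrl)
        Next
    End Sub
End Module
```

Inside the loop in ExtractOutlinksFromPage, you could pass each link's URL to such a helper. Convert the URL to a string first if the LinkData.Url property is not already one.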