
Use HTML Parser To Extract Links From Web Page

This article describes how to use our HTML parser library, HTMLParser.Net, to parse a web page and extract all of its outgoing links, such as image links, page links, FTP links and mail links. The library does the hard work of building a hierarchical view of all the tags in the page; you only need to specify which information you want to extract.

Extracting the links takes only a few lines of code. First, create a Parser object, passing the page's URL to its constructor. Then call the GetAllOutLinks method on it. The method returns a PageData object whose OutLinks collection holds one LinkData entry for each link found; each entry exposes the link's Url and its Depth.

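Sub ExtractOutlinksFromPage(ByVal strUrl As String)
    Dim obParser As Parser
    Dim obPageData As PageData
    ' Create a parser for the page at the given URL.
    obParser = New Parser(New System.Uri(strUrl))
    ' Collect all outgoing links from the page (arguments as in the original sample).
    obPageData = obParser.GetAllOutLinks(1, True)
    ' Print how many links were found.
    Console.WriteLine(obPageData.OutLinks.Count)
    ' Print each link together with its depth.
    For Each obLinkData As LinkData In obPageData.OutLinks
        Console.WriteLine("Depth[{0}] : Link Url={1}", obLinkData.Depth, obLinkData.Url)
    Next
End Sub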
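Once you have the link URLs, you may want to group them into the kinds of links mentioned at the start of this article (mail, FTP, images, ordinary pages). The sketch below does not use HTMLParser.Net at all; it only assumes you already have each URL as a string (for example from obLinkData.Url, converted with ToString if it is not already a string). The ClassifyLink function and the list of image extensions are hypothetical illustrations, not part of the library. Mail and FTP links are detected with the standard System.Uri scheme constants, while images are guessed from the file extension.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO

Module LinkClassifier
    ' Hypothetical list of file extensions treated as images; adjust as needed.
    Private ReadOnly ImageExtensions As HashSet(Of String) = New HashSet(Of String)(
        New String() {".gif", ".jpg", ".jpeg", ".png", ".bmp"},
        StringComparer.OrdinalIgnoreCase)

    ' Hypothetical helper: returns "Mail", "FTP", "Image", "Page" or "Other"
    ' for a link URL, based on its scheme and (for HTTP links) its extension.
    Function ClassifyLink(ByVal strUrl As String) As String
        Dim obUri As Uri = Nothing
        ' Relative or malformed URLs cannot be classified here.
        If Not Uri.TryCreate(strUrl, UriKind.Absolute, obUri) Then
            Return "Other"
        End If
        If obUri.Scheme = Uri.UriSchemeMailto Then Return "Mail"
        If obUri.Scheme = Uri.UriSchemeFtp Then Return "FTP"
        If obUri.Scheme = Uri.UriSchemeHttp OrElse obUri.Scheme = Uri.UriSchemeHttps Then
            ' AbsolutePath excludes the query string, so "a.gif?x=1" still counts as an image.
            If ImageExtensions.Contains(Path.GetExtension(obUri.AbsolutePath)) Then
                Return "Image"
            End If
            Return "Page"
        End If
        Return "Other"
    End Function

    Sub Main()
        Dim strUrls As String() = {
            "http://www.example.com/index.htm",
            "http://www.example.com/logo.GIF",
            "mailto:someone@example.com",
            "ftp://ftp.example.com/file.zip"}
        ' Print the category of each sample URL.
        For Each strUrl As String In strUrls
            Console.WriteLine("{0} -> {1}", ClassifyLink(strUrl), strUrl)
        Next
    End Sub
End Module
```

Classifying by extension is only a heuristic: an image served from a URL without a recognizable extension will be reported as a page.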
Copyright © Netomatix