Excel VBA videos | Excel VBA Part 47 - Browsing to Websites and Scraping Web Page

Posted by Andrew Gould on 14 November 2016

Scraping a web page in VBA involves getting a reference to an HTML document and then processing its elements to extract the useful parts and write them to another location in a readable format. This video shows you how to do that using both Internet Explorer and basic XML HTTP requests. You'll learn a bit about the Document Object Model, as well as how to identify HTML elements by name and by tag type. You'll also see how to loop through various collections of HTML elements and their children using an example involving exchange rate tables.

You can download any files that you need to follow the video here.

You can increase the size of the video:

Full screen mode for YouTube

You can view the video in full screen mode as shown on the left, using the icon at the bottom right of the frame.

You can also increase the quality of the video:

Changing resolution

You can improve the resolution of the video using another icon at the bottom right of the frame. This will slow down the connection speed, but increase the display and sound quality. This icon only becomes visible when you start playing the video.

Finally, if nothing happens when you play the video, check that you're not using IE in compatibility view.

This page has 2 threads Add post
12 Apr 17 at 12:58

Hi Andrew,

Many thanks for your reply, I have implemented the changes

and the routine works perfectly.

Best regards

Tony95

13 Apr 17 at 06:40

You're welcome, Tony!

11 Apr 17 at 14:46

Hi Andrew

I have been watching your videos for some time and must say thanks for the very

informative content. I am trying to adapt the code from part 47 but I only want to loop

over 1 table. I have tried adding (0) to the end of line "Set HTMLTables = HTMLPage.getElementsByTagName("table")

but it produces an error. Any ideas please?

Best regards

Tony95

11 Apr 17 at 21:44

Hi Tony,

When you add (0) to the end of the line you're changing the type returned by the instruction. The (0) indicates that you're returning a single item but the HTMLTables variable has a type of IHTMLElementCollection. Just change the type of variable to IHTMLElement (or use the HTMLTable variable instead), something like this:

Sub ProcessHTMLPage(HTMLPage As MSHTML.HTMLDocument)

    Dim HTMLTable As MSHTML.IHTMLElement
    Dim HTMLRow As MSHTML.IHTMLElement
    Dim HTMLCell As MSHTML.IHTMLElement
    Dim RowNum As Long, ColNum As Integer
    
    Set HTMLTable = HTMLPage.getElementsByTagName("table")(0)
        
    Worksheets.Add
    Range("A1").Value = HTMLTable.className
    Range("B1").Value = Now
    
    RowNum = 2
    For Each HTMLRow In HTMLTable.getElementsByTagName("tr")
        
        ColNum = 1
        For Each HTMLCell In HTMLRow.Children
            Cells(RowNum, ColNum) = HTMLCell.innerText
            ColNum = ColNum + 1
        Next HTMLCell
        
        RowNum = RowNum + 1
    Next HTMLRow
            
End Sub