Training videos for learning VBA - SCRAPING WEBSITES

VBA - scraping websites - 3 videos

Excel VBA Part 47 - Browsing to Websites and Scraping Web Page

Posted by Andrew Gould on 14 November 2016

Scraping a web page in VBA involves getting a reference to an HTML document and then processing its elements to extract the useful parts and write them to another location in a readable format. This video shows you how to do that using both Internet Explorer and basic XML HTTP requests. You'll learn a bit about the Document Object Model, as well as how to identify HTML elements by name and by tag type. You'll also see how to loop through various collections of HTML elements and their children using an example involving exchange rate tables.

Excel VBA Part 48 - Scraping Multiple Web Pages

Posted by Andrew Gould on 17 November 2016

PLEASE NOTE - The design of the website used in this video has changed since the video was recorded. This means that the code shown in the video no longer works. The downloadable file contains both the original version of the code and a version which works with the current version of the website. What's better than scraping one web page? Scraping lots of them with the same procedure, of couse! This video explains how to loop over multiple pages using Microsoft's HTML and XML object libraries. You'll learn about HTML tags and classes, the Document Object Model and how to loop over elements on a page.

Excel VBA Part 49 - Downloading Files from Websites

Posted by Andrew Gould on 21 November 2016

PLEASE NOTE - The design of the website used in this video has changed since the video was recorded. This means that the code shown in the video no longer works. The downloadable file contains both the original version of the code and a version which works with the current version of the website. Excel VBA doesn't have a native method for downloading files from websites but you can declare an API function that will enable you to do this. This video takes you through the process of declaring the API function and using it in your code, along with a bunch of other useful techniques such as using folder pickers, creating folders with FileSystemObjects and opening a Windows Explorer window using the Shell function.

This page has 6 threads Add post
15 Sep 18 at 01:06

Hi Andrew

Are you able to help on below, I am attempting to scrape the "Bid Price" from a website but get "[object] as a return value.

I'm sure have faulted with element collections, would appreciate you assistance.

Thnak you

Sub Price_from_BB()

Dim XMLPriceRequest As New MSXML2.XMLHTTP60
    Dim HTMLPriceSheet As New MSHTML.HTMLDocument         
    Dim ProductPrice As MSHTML.IHTMLElementCollection   

    XMLPriceRequest.Open "GET", "https://barxis.barcap.com/ZA/10/en/instruments.app?statusId=4,5#/details/350041", False
    XMLPriceRequest.send

    HTMLPriceSheet.body.innerHTML = XMLPriceRequest.responseText

    Set ProductPrice = HTMLPriceSheet.getElementById("buttonSELL350041")

    Sheet1.Range("A1").Value = HTMLPriceSheet
    Sheet1.Range("B1").Value = Now
    Set XMLPriceRequest = Nothing
   
End Sub

 

 

 

15 Sep 18 at 09:46

Hi Brendon,

The reason that you're seeing [object] in cell A1 is that you've assigned the HTMLDocument to it.  I imagine that what you're arrempting to do is write the table of assets into the worksheet?  In which case you'll probably need to loop over the individual elements of the table to identify the values you need.  The techniques you need to do this are explained in the Excel VBA Part 47 - Browsing to Websites and Scraping Web Page video.

I hope that helps!

06 Sep 18 at 09:21

Hi Andrew,

I am trying to upload a file to a web page, the following are the steps I followed:

1. Open the web page 

2. Wait for until the page is getting loaded

3. In this webpage I am uploading the file into the first “Upload a File” browser.

4. Get the input element by tag name as “input”

5. Hit the “browse” button, since the paste portion is disabled.

6. Enter the file path in the “Choose File to Upload” window (path is in excel eg :D:\MArgin Discovery (Businness Assurance)\practice2.xlsx )

7. Enter After 5th step, I am not able to enter the file path in the “Choose File to Upload” window, looks like the macro is not supporting for this.

Here is my code:

Sub File_Test()
    Dim HTMLDoc As MSHTML.HTMLDocument
    Dim HTMLButtons As MSHTML.IHTMLElementCollection
    Dim HTMLButton As MSHTML.IHTMLElement
    Dim ie As Object

    Set ie = CreateObject("internetexplorer.application")
    ie.Visible = True
    ie.navigate "http://www.htmlquick.com/reference/tags/input-file.html"

    Do While ie.readyState <> READYSTATE_COMPLETE
    Loop

    Set HTMLDoc = ie.document
    Set HTMLButtons = HTMLDoc.getElementsByTagName("input")

    For Each HTMLButton In HTMLButtons
        If HTMLButton.Type = "file" Then
            HTMLButton.Click
            HTMLButton.Value = "C:\Documents\Test\Temp.txt"
            Exit For
        End If
    Next


Kindly help how to enter path name when input type is file.

Thank you

Regards,

Jouhar

06 Sep 18 at 10:02

I was going to suggest Stack Overflow but I see that you've already posted the question there!  You might try some of the suggestions at this link https://www.ozgrid.com/forum/forum/help-forums/excel-general/100265-automate-internet-explorer-file-upload

18 Aug 18 at 22:32

 Andrew,

Thanks again for your awesome videos. 

On VBA part 47 video, you show us how to input a value, click the "Go" search button, and scrape contents from queried URL.  I used the same technique to scrape the page after the search button is clicked.  It didn't return value (innertext).

Wonder if you can help me out.

Thanks,

Suntao

20 Aug 18 at 07:27

You can reference the Document property of the IE object after navigating to the second page to access the HTML and text of the page.  You may need to introduce a delay to your code to allow IE to catch up with the instruction to navigate to a new page before you attempt to read its contents.  The example below shows how you could do this:

Sub NavigateToAnotherPage()

    Dim IE As New SHDocVw.InternetExplorer
    Dim HTMLDoc As MSHTML.HTMLDocument
    Dim HTMLParagraph As MSHTML.IHTMLElement
    Dim HTMLParagraphs As MSHTML.IHTMLElementCollection
    
    IE.Visible = True
    IE.navigate "en.wikipedia.org/wiki/Main_Page"
    
    Do While IE.ReadyState <> READYSTATE_COMPLETE
    Loop
    
    Debug.Print "First Page = ", IE.LocationName, IE.LocationURL
    
    IE.Document.forms("searchform").elements("search").Value = "Document Object Model"
    IE.Document.forms("searchform").elements("go").Click
    
    Do While IE.ReadyState <> READYSTATE_COMPLETE
    Loop
    
    'Wait two seconds
    Application.Wait Now + TimeValue("00:00:02")
    
    Set HTMLDoc = IE.Document
    Set HTMLParagraphs = HTMLDoc.getElementsByTagName("p")
    
    Debug.Print "Second Page = ", IE.LocationName, IE.LocationURL
    
    For Each HTMLParagraph In HTMLParagraphs
        Debug.Print HTMLParagraph.innerText
    Next HTMLParagraph
    
End Sub

21 Aug 18 at 06:11

Andrew, 

Thanks for your comment.  It solved my problem! 

Suntao

Andrew G  
21 Aug 18 at 07:39

No problem, happy to hear it helped!

12 Jun 18 at 14:37

Thank you sir for your tutorials

 please help me, i'm trying to run your example code, but it gives me error "access denied",

how can I solve this kind of error?

13 Jun 18 at 07:01

Hi, the most likely reason for the error is that the Wise Owl website has been updated to use https rather than http since the video was made.  You should be able to replace any reference to "http" in the code to "https" to solve the problem.

I hope that helps!

25 May 18 at 09:17

Viewed Video #47 on youtube.

Any plans to expand on XHR in Excel VBA, especially with session cookies?

25 May 18 at 12:34

There aren't, I'm afraid!

25 May 18 at 15:35

Please could you point in the direction of any instructional material or course or video that details only XHR and session cookies within an Excel VBA environment?

Andy B  
28 May 18 at 14:25

'Fraid i don't know of any, beyond Googling!

17 May 18 at 10:29

Hi Andrew,

Thank you so much for the 3 videos about Web scraping. It did guide me a lot on how to identify HTML code. Your approach is great. I seem unable to find any routine of auto login (username, password) to web site via VBA code.

Greetings,

Dude6571

18 May 18 at 07:53

Hi, we don't have a video on how to do this but there are plenty of suggestions out there for ways to do this:

https://stackoverflow.com/questions/48919491/login-to-a-website-using-vba

https://stackoverflow.com/questions/21244051/using-vba-to-automate-ie-login-having-errors

https://www.ozgrid.com/forum/forum/help-forums/excel-general/69057-automate-username-password-log-in-to-web-page

https://stackoverflow.com/questions/24038230/fill-user-name-and-password-in-a-webpage-using-vba

Of course the exact technique you'll use will depend on the page you're using to login but hopefully that's enough to point you in the right direction!