VBA - scraping websites videos | Excel VBA Part 57.2 - Basic Web Scraping with Selenium and Google Chrome

Posted by Andrew Gould on 15 January 2021


In this video you'll learn how to perform basic web scraping tasks using Selenium Basic and Google Chrome. You'll see how to open Chrome, navigate to a web page and interact with items on the page including typing text into an input box and clicking a button. You'll learn how to check if a web element exists on the page to avoid run time errors and how to set a timeout value. You'll see how to reference a collection of elements and loop over them to identify the ones you're interested in. You'll also learn how to write a table into an Excel worksheet using two different techniques, as well as how to extract hyperlinks from a web table and assign them to cells in an Excel worksheet.

See our full range of VBA training resources, or test your knowledge of VBA with one of our VBA skills assessment tests.

This video has the following accompanying files:

File name Type Description
Pt57_2 Basic Web Scraping with Selenium and Google Chrome.xlsm Excel workbook with macros

Click to download a zipped copy of the above files.

There are no exercises for this video.

Making a  video bigger

You can increase the size of your video to make it fill the screen like this:

View full screen

Play your video (the icons shown won't appear until you do), then click on the full screen icon which appears as shown at its bottom right-hand corner.

 

When you've finished viewing a video in full screen mode, just press the Esc key to return to normal view.

Improving the quality of a video

To improve the quality of a video, first click on the Settings icon:

Settings icon

Make sure you're playing your video so that the icons shown appear, then click on this gear icon at the bottom right-hand corner.

 

Choose to change the video quality:

Video quality

Click on Quality as shown to bring up the submenu.

 

The higher the number you choose, the better will be your video quality (but the slower the connection speed):

Connection speed

Don't choose the HD option unless you have a fast enough connection speed to support it!

 

Is your Wise Owl speaking too slowly (or too quickly)?  You can also use the Settings menu above to change your playback speed.

This page has 7 threads Add post
14 Jan 22 at 17:47

Enjoyed video on web scraping using Selenium and Google Chrome. The Selenium library is a must with most sites no longer compatible with IE so this video was very helpful :-)

I have a two simple question s(at least to begin with):

Is there a way to run the script without the Google Chrome running in the foregrand (or maybe run the Chrome instance minimized and then exit after webscraping completed)?

How can the scraped data be placed below the search screen/button (no new sheet each time search is performed)?

Any assistance would be appreciated.

------------

 

15 Jan 22 at 07:51

Hi!

You can open a hidden Chrome window as described here. If you want to close Chrome automatically at the end, declare the ChromeDriver variable within the subroutine - this will close Chrome automatically when the variable goes out of scope at the end of the procedure. If you want to close the browser window explicitly you can apply the Close method to the ChromeDriver variable.

If you want to add new search results to the end of an existing list you can refer to the techniques described here.

If you want to replace previous search results with the new ones you can delete the existing results by applying the Clear method to the range of cells. If you need help with basic cell selection you can refer to the techniques described here.

I hope it helps!

14 Jan 22 at 15:39

Pls help, i am getting error while running Element not found for name

<input type="text" id="_id2:logon:USERNAME" name="_id2:logon:USERNAME" value="SDN004">

Option Explicit
Private ch As Selenium.ChromeDriver
Sub test()

Set ch = New Selenium.ChromeDriver
ch.Start baseUrl:="http://sapbi.crb.apmoller.net/BOE/CMC"
ch.Get "/"
ch.FindElementByName("_id2:logon:USERNAME").SendKeys "SDN004"

End Sub

15 Jan 22 at 07:59

Hi!

I don't have access to the page you're using so I can't offer precise help but if finding the element by name isn't working you could try some of the other search strategies described here.

I hope it helps!

09 Dec 21 at 21:34

Andrew, first let me thank you for your Videos and your easy going persona, which is the embodiment of your websiet (WiseOwl).

I've learned how to scrape websites using selemium basic from you, with a great deal of success. I've run into a bit of a problem which I do not know how to solve, and I've agonized over coming here for help in order not to bother you. But I am out of ideas after a few weeks os researching. So I hope you find the time. 

I am simply taking a table from the website and sending it to excel by using (NomPrpVoteTbl.AsTable.ToExcel DS.Cells(lRow, 1).

However I am getting the following error: 

Run-time error '-2147417856 (80010100)'  System Call Failed. (Exception from HRESULT: 0x80010100 (RPC_E_SYS_CALL_FAILED))

This is in a loop and I am executing the "asTable.toExcel" a few dozen times. If I limit the amount of times in the loop the error does not happen.

Any help, or pointers would greatly be appreciated.

 

 

 

10 Dec 21 at 08:28

Hello, I appreciate the restraint in requesting help!

Sadly, I'm afraid that I don't have any insight into why this is happening or know of a solution to the problem. Is it possible to break your procedure into multiple smaller ones to reduce the number of tables you're processing each time?

10 Dec 21 at 15:28

Andrew, thanks for the reply. 

I feel like such an idiot for bothering you, especially since I just discovered (accidentally) that this is a "threading" issue with windows. I am calling this process from within another macro workbook, if I call it from within its itself the error does not occur, no matter how many iterations. Now all I have to do is figure out how threading works on windows and how to manipulate within VBA I am all set :(. 

Again, thanks for the quick reply and have a wonderful holiday season. 

Andrew G  
11 Dec 21 at 08:46

Interesting! And I don't think that I would kick myself for not recognising what is a fairly esoteric problem!

As I understand it there's no native way for VBA to control threading. This might help to give you a primer on the topic https://analystcave.com/excel-multithreading-vba-vs-vbscript-vs-c-net/

I hope it helps!

08 Jun 21 at 12:28

Thanks for sharing such great informative training videos. Even toughest of the tough topic are made easy to understand. Even a person without any coding knowledge can digest your topics and become an expert..

Im getting a

Run-time error '1004': Exception from HRESULT: 0x800A03EC

Table.AsTable.ToExcel ThisWorkbook.Worksheets("DirStatus").Cells(2, 10)

Suprisingly the same code was running fine for 30-35 days, however offlate from past two days without any changes being done to the code it authomatically started throwing up this error. If possible kindly give your thoughts on the same..

Thank again for sharing all your training vidoes.

 

 

08 Jun 21 at 13:40

Hi there!

I don't know the answer to that one but here are some suggested causes and fixes:

https://www.codeproject.com/Questions/470089/Exception-from-HRESULT-0x800A03EC-Error

https://stackoverflow.com/questions/7099770/hresult-0x800a03ec-on-worksheet-range

I hope it helps!

28 May 21 at 01:32

Hi Andrew,

Subsequent to my earier post, i made some adjustments. I am still stuck at how to get the output data into one results sheet in the pivot table format. Your assistance would be greatly appreciated. The sample input data is valid for this code. 

Thanks much!

 

Option Explicit

Private chromebrowser As Selenium.ChromeDriver ' declaration at the module level. note we changed DIM to Private


Sub LoopScrapePropTaxWebsiteNew()

Dim FindBy As New Selenium.By ' declaration of an auto instancing BY object to query any changes to HTML class name

Dim PropertyOwnerQueryResults As Selenium.WebElements ' Declaration of a variable that can take multiple web elements
Dim PropertyTaxResults As Selenium.WebElements   ' Declaration of a variable that can take only one web element
Dim PropertyOwnerQueryResult As Selenium.WebElement
Dim PropertyTaxResult As Selenium.WebElement
Dim PropInfoTable As Selenium.WebElement
Dim PropTaxtable As Selenium.WebElement
Dim ValNo As String
Dim StratNo As String

Set chromebrowser = New Selenium.ChromeDriver

chromebrowser.Start 'The base URL in google chrome
chromebrowser.Get "https://ptsqueryonline.fsl.org.jm/PTSOnlineWeb/ptsquery.jsp" ' access the home page. If we wanted to access any page, we would just indicate the page name after the forward slash

Application.ScreenUpdating = False  'This line will turn off the screen flickering

Worksheets("PropTaxData").Activate ' Go back to the property Tax Sheet
Range("A2").Select
    
'-----------Loop to download the Property Tax Information
    
    Do Until ActiveCell.Value = ""
        
        ValNo = ActiveCell.Value
        StratNo = ActiveCell.Offset(0, 1).Value
        
    If chromebrowser.IsElementPresent(FindBy.Name("valuationno"), 5000) = False Then ' this is to account for any changes in class name as time passes. the number is the timeout period in millisecond, ie 3000 milliseconds is 3 sec
        chromebrowser.Quit ' code to exit the chrome browser
        MsgBox "Could not find search input box", vbExclamation  ' Code to diplay message box with error
        Exit Sub  ' Exit sub routine
    End If ' end if statement

    chromebrowser.FindElementByName("valuationno").SendKeys ValNo

    If chromebrowser.IsElementPresent(FindBy.Name("stratalotno"), 5000) = False Then ' this is to account for any changes in class name as time passes. the number is the timeout period in millisecond, ie 3000 milliseconds is 3 sec
        chromebrowser.Quit ' code to exit the chrome browser
        MsgBox "Could not find search input box", vbExclamation  ' Code to diplay message box with error
        Exit Sub  ' Exit sub routine
    End If ' end if statement

    chromebrowser.FindElementByName("stratalotno").SendKeys StratNo

    If chromebrowser.IsElementPresent(FindBy.Class("btn"), 5000) = False Then ' this is to account for any changes button name as time passes. the number is the timeout period in millisecond, ie 3000 milliseconds is 3 sec
        chromebrowser.Quit ' code to exit the chrome browser
        MsgBox "Could not find submit button", vbExclamation  ' Code to diplay message box with error
        Exit Sub  ' Exit sub routine
    End If ' end if statement

    chromebrowser.FindElementByClass("btn", 3000).Click

'---------------Retriving the results of the property tax query

    Set PropertyOwnerQueryResults = chromebrowser.FindElementsByClass("form-horizontal")

    If PropertyOwnerQueryResults.Count = 0 Then      ' Condition to account for an empty result set
        'chromebrowser.Quit
        MsgBox "No Results Found" & " " & "for Valuation No" & " " & ValNo, vbExclamation
        Exit Sub
    End If

        For Each PropertyOwnerQueryResult In PropertyOwnerQueryResults ' loops through the web elements

            Set PropInfoTable = PropertyOwnerQueryResult.FindElementByTag("tbody")
             'Debug.Print PropInfoTable.Text ' print the webeleemnt texts to the immediate window
             
        Next PropertyOwnerQueryResult

    ProcessTable PropInfoTable
    
    chromebrowser.FindElementById("back").Click 'Select the back button
     
    Worksheets("PropTaxData").Activate
        
    ActiveCell.Offset(1, 0).Select ' Go to the next cell
    
    Loop
    
    'Application.ScreenUpdating = True

End Sub


Sub ProcessTable(TableToProcess As Selenium.WebElement)

    Dim AllRows As Selenium.WebElements
    Dim SingleRow As Selenium.WebElement
    Dim AllRowCells As Selenium.WebElements
    Dim SingleCell As Selenium.WebElement
    Dim OutputSheet As Worksheet
    Dim RowNum As Long, ColNum As Long
    Dim TargetCell As Range
    

    Set OutputSheet = ThisWorkbook.Worksheets("Results")
    
    Set AllRows = TableToProcess.FindElementsByTag("tr")

    For Each SingleRow In AllRows
    
        RowNum = RowNum + 1
        Set AllRowCells = SingleRow.FindElementsByTag("td")
        
        If AllRowCells.Count = 0 Then
            Set AllRowCells = SingleRow.FindElementsByTag("th")
        End If
        
        For Each SingleCell In AllRowCells
        
            ColNum = ColNum + 1
            
            Set TargetCell = OutputSheet.Cells(RowNum, ColNum)
            
            TargetCell.Value = SingleCell.Text
            
            'Debug.Print SingleCell.Text
        Next SingleCell
    
        ColNum = 0
        
    Next SingleRow

End Sub

27 May 21 at 21:55

Hi Andrew,

I am loving your content. The videos are insightful, simple and easy to understand. Using Selenium, I am writing a webscraping script to loop over an excel array of values to login a website, copy the particulars from that website and write the contents to an excel sheet.  The data is returned across as a table. I know the script works as the data is generated in the immediate window.  I need your guidance how to write the details to an excel file in a pivot table format with the appropriate headers.

Here is some sample values to test the script

16402008114

17006012198

00604005040

06901022020

031D5Y27115

19002019045

031D5Y27129

031D5Y27134

105D5Z02004

031D5Y27109

16301002229

Looking forward to your kind support

 

Option Explicit

Private chromebrowser As Selenium.ChromeDriver ' declaration at the module level. note we changed DIM to Private


Sub LoopScrapePropTaxWebsite()

Dim FindBy As New Selenium.By ' declaration of an auto instancing BY object to query any changes to HTML class name

Dim PropertyOwnerQueryResults As Selenium.WebElements ' Declaration of a variable that can take multiple web elements
Dim PropertyTaxResults As Selenium.WebElements   ' Declaration of a variable that can take only one web element
Dim PropertyOwnerQueryResult As Selenium.WebElement
Dim PropertyTaxResult As Selenium.WebElement
Dim PropInfoTable As Selenium.WebElement
Dim PropTaxtable As Selenium.WebElement
Dim ValNo As String
Dim StratNo As String

Set chromebrowser = New Selenium.ChromeDriver

chromebrowser.Start 'The base URL in google chrome
chromebrowser.Get "https://ptsqueryonline.fsl.org.jm/PTSOnlineWeb/ptsquery.jsp" ' access the home page. If we wanted to access any page, we would just indicate the page name after the forward slash

Application.ScreenUpdating = False  'This line will turn off the screen flickering

Worksheets("PropTaxData").Activate ' Go back to the property Tax Sheet
Range("A2").Select
    
'-----------Loop to download the Property Tax Information
    
    Do Until ActiveCell.Value = ""
        
        ValNo = ActiveCell.Value
        StratNo = ActiveCell.Offset(0, 1).Value
        
    If chromebrowser.IsElementPresent(FindBy.Name("valuationno"), 5000) = False Then ' this is to account for any changes in class name as time passes. the number is the timeout period in millisecond, ie 3000 milliseconds is 3 sec
        chromebrowser.Quit ' code to exit the chrome browser
        MsgBox "Could not find search input box", vbExclamation  ' Code to diplay message box with error
        Exit Sub  ' Exit sub routine
    End If ' end if statement

    chromebrowser.FindElementByName("valuationno").SendKeys ValNo

    If chromebrowser.IsElementPresent(FindBy.Name("stratalotno"), 5000) = False Then ' this is to account for any changes in class name as time passes. the number is the timeout period in millisecond, ie 3000 milliseconds is 3 sec
        chromebrowser.Quit ' code to exit the chrome browser
        MsgBox "Could not find search input box", vbExclamation  ' Code to diplay message box with error
        Exit Sub  ' Exit sub routine
    End If ' end if statement

    chromebrowser.FindElementByName("stratalotno").SendKeys StratNo

    If chromebrowser.IsElementPresent(FindBy.Class("btn"), 5000) = False Then ' this is to account for any changes button name as time passes. the number is the timeout period in millisecond, ie 3000 milliseconds is 3 sec
        chromebrowser.Quit ' code to exit the chrome browser
        MsgBox "Could not find submit button", vbExclamation  ' Code to diplay message box with error
        Exit Sub  ' Exit sub routine
    End If ' end if statement

    chromebrowser.FindElementByClass("btn", 3000).Click

'---------------Retriving the results of the property tax query

    Set PropertyOwnerQueryResults = chromebrowser.FindElementsByClass("topInfo")

    If PropertyOwnerQueryResults.Count = 0 Then      ' Condition to account for an empty result set
        'chromebrowser.Quit
        MsgBox "No Results Found" & " " & "for Valuation No" & " " & ValNo, vbExclamation
        Exit Sub
    End If

        For Each PropertyOwnerQueryResult In PropertyOwnerQueryResults ' loops through the web elements
            Debug.Print PropertyOwnerQueryResult.Text ' print the webeleemnt texts to the immediate window
        
        Next PropertyOwnerQueryResult


    Set PropertyTaxResults = chromebrowser.FindElementsByClass("middleTable")
    
    If PropertyTaxResults.Count = 0 Then          ' Condition to account for an empty result set
        chromebrowser.Quit
        MsgBox "No Results Found", vbExclamation
        Exit Sub
    End If

        For Each PropertyTaxResult In PropertyTaxResults  ' This For Loop goes through the elements of the results table and prints the text in the immediate window
    
            Debug.Print PropertyTaxResult.Text
            

        Next PropertyTaxResult
        
    
    chromebrowser.FindElementById("back").Click 'Select the back button
     
    Worksheets("PropTaxData").Activate
        
    ActiveCell.Offset(1, 0).Select ' Go to the next cell
    
    Loop
    
   

End Sub

28 May 21 at 06:22

Hi there!

I think that the best thing to do is take a look at some of our videos on creating pivot tables! You can see the list of pivot table videos here https://www.wiseowl.co.uk/vba-macros/videos/vba-pivot-tables-charts/

I hope it helps!

29 May 21 at 22:04

Success! Thanks alot Andrew!

Andrew G  
30 May 21 at 06:05

Excellent!

28 May 21 at 07:17

Hi Andrew,

Thanks for the prompt response. I asked the wrong question. I need guidance on how to consolidate the data from the website to one sheet. The data is currently downloaded to multiple tabs.

Thanks

Andrew G  
29 May 21 at 19:11

OK, that makes more sense! There are plenty of ways to build a list by finding the end of the current list (your suggestion is a good one) and we even have a video dedicated to discussing different techniques! https://www.wiseowl.co.uk/vba-macros/videos/vba-advanced/last-row-column-cell/

A quick way to avoid testing if the sheet is empty is to start at the bottom of column A, end upwards (this will stop at the first cell containing a value or cell A1 if the column is empty) and then offset one row downwards. 

Range("A1048576").End(xlUp).Offset(1, 0)

This will leave row 1 of the worksheet empty but you could use that for column headings or just delete the row at the end.

I hope it helps!

29 May 21 at 13:52

Hi Andrew,

I took out the code that adds a new work sheet and replaced it with this line:

Set OutputSheet = ThisWorkbook.Worksheets("Results"). 

The challenge now is to prevent the results from being overwritten each time a new data set is added.

In other words, I need a dynamic way for the results sheet to be populated with all the results. I was thinking to code the output sheet using the following conditions:

Condition 1:-

If the results sheet is blank, then start writing the results to the output sheet starting at cell A1 

Condition 2:-

If the output sheet is not blank, then go to the first empty cell in column A and then beginning writing the results.

I have been getting an errors each time I tried writing the code.

Your guidance would be greatly appreciated 

Andrew G  
29 May 21 at 06:47

Hi! I was going to say that you can remove the code which creates the new worksheet for each table that you're copying but I can't see anything in the two sets of sample code you've posted which does this. How are you creating the new worksheets that your tables are being written to?

12 Mar 21 at 07:24

Hi Andrew,

Thank you so much for your reply to my post under Excel VBA Part 47.1. I've managed to apply and adjust the code scroll through all pages and tabs on the AustralianSuper site. When I tried to use a similar approach on another website, I seem to be running into a challenge. On this page https://www.pfzw.nl/over-ons/zo-beleggen-we/waarin-we-beleggen/overzicht-aandelen.html I want to copy the table (like with the AustralianSuper). But all of these FindElementByXXX don't seem to find the table, while in the inspector I can see it there clearly:

 Set PFZWTable = ch.FindElementByClass("data-table style-scope p-data-table")
 Set PFZWTable = ch.FindElementByCss(".data-table.style-scope.p-data-table")
 Set PFZWTable = ch.FindElementByTag("Table")
 Set PFZWTable = ch.FindElementsByXPath("//*[@id='tabledatasource-642984829']//div/div/div/div/table")

Inspector element: <table class="data-table style-scope p-data-table" style="min-width: 830px">, XPath //*[@id="tabledatasource-642984829"]//div/div/div/div/table

I've also given the page a little while to load (ch.Wait "3000"), just in case it needed time to load. I seem to be struggling with FindElementBy the above options in general. On other pages, the only FindElement that works for me is FindElementByLinkText, and in one case ch.FindElementByCss(".next") to click the Next button. 

What am I doing wrong? Am I missing something? If an element seems to have the tag 'table' why does ch.FindElementByTag("Table") when there is only one table tag on the page? The same for FindElementByClass("data-table style-scope p-data-table"). I've watch a lot of (but admittedly not all of) your video 57.4, but still can't seem find what I'm doing wrong.

Your guidance on this is really appreciated! 

Best,

19 Mar 21 at 10:40

I thought you'd like to know that I've added a video answer to the question which you can see here. Thanks for the question!

12 Mar 21 at 17:40

Well I've learnt something about web design that I didn't previously know!  Working code below but you might find this useful as background information first

https://developer.mozilla.org/en-US/docs/Web/Web_Components/Using_shadow_DOM

Here's the code which writes the table you're trying to get into a new worksheet:

Private ch As Selenium.ChromeDriver

Sub ScrapePFZW()

    Dim ShadowHost As Selenium.WebElement
    Dim ShadowRoot As Selenium.WebElement
    Dim PFZWTable As Selenium.WebElement
    Dim ws As Worksheet
    
    Set ch = New Selenium.ChromeDriver
    
    ch.Start
    ch.Get "https://www.pfzw.nl/over-ons/zo-beleggen-we/waarin-we-beleggen/overzicht-aandelen.html"
    
    'Get a reference to the parent element of the shadow root
    Set ShadowHost = ch.FindElementByCss("p-data-table")
    
    'Return a reference to the shadow root of the parent element
    Set ShadowRoot = ch.ExecuteScript("return arguments[0].shadowRoot", ShadowHost)
    
    'Search within the shadow DOM using normal FindElement methods
    Set PFZWTable = ShadowRoot.FindElementByCss("table")
    
    Set ws = ThisWorkbook.Worksheets.Add
    PFZWTable.AsTable.ToExcel ws.Range("A1")
    
End Sub

Hope that helps!