VBA - scraping websites videos | Excel VBA Part 46 - Querying Web Pages with Query Tables

Posted by Andrew Gould on 14 November 2016


Querying web pages in Excel VBA is simple when you use Query Tables. This video shows you how to create a query table linked to a web page, how to choose which tables from the web page to return, and how to pass values to a URL query string to create dynamic, updatable web queries in your Excel workbooks.

This video has the following accompanying files:

File name Type Description
Web Query Tables.xlsm Excel workbook with macros

Click to download a zipped copy of the above files.

There are no exercises for this video.

Making a  video bigger

You can increase the size of your video to make it fill the screen like this:

View full screen

Play your video (the icons shown won't appear until you do), then click on the full screen icon which appears as shown at its bottom right-hand corner.

 

When you've finished viewing a video in full screen mode, just press the Esc key to return to normal view.

Improving the quality of a video

To improve the quality of a video, first click on the Settings icon:

Settings icon

Make sure you're playing your video so that the icons shown appear, then click on this gear icon at the bottom right-hand corner.

 

Choose to change the video quality:

Video quality

Click on Quality as shown to bring up the submenu.

 

The higher the number you choose, the better will be your video quality (but the slower the connection speed):

Connection speed

Don't choose the HD option unless you have a fast enough connection speed to support it!

 

Is your Wise Owl speaking too slowly (or too quickly)?  You can also use the Settings menu above to change your playback speed.

This page has 7 threads Add post
27 Apr 20 at 22:34

Hi. I am trying to connect this portal. first I can not connect withh Data>Get and Transform>from web

secondly how can tell vba if internet connection lost or for any reason couldnt get data from link, ignore any error and ask me do you want try again

SourceDataFile > http://members.tsetmc.com/tsev2/excel/MarketWatchPlus.aspx?d=0

Connection >   OLEDB;Provider=Microsoft.ACE.OLEDB.12.0;User ID=Admin;Data Source=C:\Users\Nima\AppData\Local\Microsoft\Windows\INetCache\Content.MSO\518F1202.xlsx;Mode=Share Deny Write;Extended Properties="HDR=YES;";Jet OLEDB:System database="";Jet OLEDB:Registry Path="";Jet OLEDB:Engine Type=37;Jet OLEDB:Database Locking Mode=0;Jet OLEDB:Global Partial Bulk Ops=2;Jet OLEDB:Global Bulk Transactions=1;Jet OLEDB:New Database Password="";Jet OLEDB:Create System Database=False;Jet OLEDB:Encrypt Database=False;Jet OLEDB:Don't Copy Locale on Compact=False;Jet OLEDB:Compact Without Replica Repair=False;Jet OLEDB:SFP=False;Jet OLEDB:Support Complex Data=False;Jet OLEDB:Bypass UserInfo Validation=False;Jet OLEDB:Limited DB Caching=False;Jet OLEDB:Bypass ChoiceField Validation=False

Connection name > MarketWatchPlus

thanks

28 Apr 20 at 07:25

Hi there, you might find this set of videos on error-handling useful https://www.wiseowl.co.uk/online-training/excel-vba/error-handling/

I'm afraid I can't offer any insight into why you can't connect to the site you've mentioned. It's not one I'm familiar with, sorry!

20 May 19 at 16:51

Andrew,

Quick question here:

Assuming I have some data (10 rows) on Sheet1 via a query, if I execute this:

Sheet1.Range("A1").ListObject.QueryTable.Refresh BackgroundQuery:=False

will it clear the the entire contents of Sheet1 first before populating it?

My concern is if the webpage which the query is fetching from has been updated / changed and now only contains fewer than 10 rows (say only 3 rows), I don't want the query to simply grab those 3 rows and paste onto Range("A1") (and leaving the old rows 4 to 10 data still on the sheet).

Thanks

 

21 May 19 at 21:29

Hi Duggie,

From memory (it's been a while) the query table will grow and shrink according to the number of rows returned when it's refreshed so it won't leave historical data sitting in the worksheet if the query returns fewer rows the next time it's refreshed.

I hope that helps!

22 May 19 at 23:21

Thanks Andrew.

What I have noticed is the columns DO NOT alter, ie if the query only contains 10 columns and I manually add an eleventh, after clicking refresh, the eleventh column remains.

 

 

08 May 19 at 20:58

Andrew,

Do you know if it's possible to delete date from a worksheet but still preserving the query behind it?

 

09 May 19 at 08:10

Hi duggie,

Yes, if you create a query table and then subsequently delete the cells the connection will still exist.  You can test that pretty easily with the code in the sample workbook:

Sub ImportXratesThenDeleteTable()

    Dim ws As Worksheet
    Dim qt As QueryTable
    Dim URL As String
    
    URL = "http://www.x-rates.com/table/?from=USD&amount=1"
    
    Set ws = Worksheets.Add
    
    Debug.Print "Count of query tables at start " & ws.QueryTables.Count
    
    Set qt = ws.QueryTables.Add( _
        Connection:="URL;" & URL, _
        Destination:=Range("A5"))
        
    With qt
        .RefreshOnFileOpen = True
        .Name = "XRates"
        .WebFormatting = xlWebFormattingRTF
        .WebSelectionType = xlSpecifiedTables
        .WebTables = "1"
    
        .Refresh
    End With
    
    Debug.Print "Count of query tables after creating one " & ws.QueryTables.Count
    
    Range("A5").CurrentRegion.Delete
    
    Debug.Print "Count of query tables after deleting cells " & ws.QueryTables.Count
    
End Sub

Output:

Count of query tables at start 0
Count of query tables after creating one 1
Count of query tables after deleting cells 1

I hope that helps!

09 May 19 at 11:01

Andrew,

I think I stumbled across a solution myself instead!

Instead of

    Range("A5").CurrentRegion.Delete

I changed it to 

     Range("A5").CurrentRegion.Clear

and after the worksheet was wiped, I could stil right-click and click refresh to retrieve the table.

Thanks again for your help.

 

 

09 May 19 at 10:54

Andrew,

Thanks for your prompt response.

I tried your code and it worked - sort of!

I put a break point in this line:

Range("A5").CurrentRegion.Delete

and could see in the immediate window that the count of query tables is 1.

After executing the Delete line, the count of query tables is still 1.

By now the worksheet contains no data. However, right-clicking on cell A5, I do not see the option to refresh the query.

I tested this in Excel 2010.

 

 

Andrew G  
10 May 19 at 06:52

Oh, but that's what you asked for!  The cells are deleted but the query remains in the workbook.  You didn't mention that you wanted to retain the ability to right-click the cells to refresh the table!

Anyway, glad that you found a solution that worked for you!

02 May 19 at 14:12

Hi,

If the table on a webpage contains 100 records, yet only the first 50 is shown on the first webpage and the remaining on the second webpage, how can I adapt the code so it brings back the full 100 records onto my worksheet?

Thanks

 

03 May 19 at 07:23

Hi duggie,

It's difficult to answer that one without seeing the pages you mention but can you just import each table separately and place one below the other in a worksheet?  It sounds like the simplest approach to what you've described!

04 Apr 19 at 16:46

I have followed your code and produced this, which works perfectly:

 

Sub ImportWebPage()

    Dim qt As QueryTable
    Dim URL As String
   
    URL = "SomeWebPage"
   
    Set qt = Sheet1.QueryTables.Add(Connection:="URL;" & URL, Destination:=Sheet1.Range("A1"))
   
    With qt
   
        .RefreshOnFileOpen = True
   
        .Name = "SomeName"
       
        .WebFormatting = xlWebFormattingRTF
       
        .WebSelectionType = xlSpecifiedTables
       
        .WebTables = "1"
       
        .Refresh
   
    End With
   
End Sub

 

The problem is that I have additional code running afterwards but the code needs to run AFTER the code above has refreshed.

 

Adding Wait or DoEvents hasn't helped.

How can I ensure the web page has completely refreshed before I continue with the next part of the code?

Thanks

05 Apr 19 at 07:03

You could adapt the code shown here https://docs.microsoft.com/en-us/office/vba/api/excel.querytable.refreshing to wait until the refresh has finished.  Perhaps a simple Do Loop immediately after the .Refresh statement:

Do While .Refreshing

Loop

Hope it helps!

05 Apr 19 at 14:11

Thanks.

I found adding this line seemed to work around 90% of the time:

.BackgroundQuery = False

though I'm not sure if it was a fluke!

Andrew G  
08 Apr 19 at 08:15

Hopefully 90% is good enough!

07 Sep 18 at 10:21

Hi,

What can we do If the URL does not include the specific structure to concatenate the parameters and select specific values from a table? For example the URL for Google Finance bellow:

https://www.google.com/search?stick=H4sIAAAAAAAAA...q=finance&tbm=fin#wptab=s:H4sIA...

Thanks in advance for the assistance.

11 Sep 17 at 23:42

I am curious as to getting to a web page that requires a log in, in my case I am attempting to retrieve data from my online brokerage account. Primarily my watch lists and maybe a couple of other tables that I can reference in the Excel workbook.

After watching several of your tutorials I have a basic understanding of VBA (and growing). HTML is completely beyond my abilities, therefore, I am attempting to watch the videos again an adapt them to my specific instance.

While this isn't really required, it is an attempt to keep learning Excel and VBA.

Thank you for all of the great lessons.

13 Sep 17 at 07:57

Hi, I haven't done this myself but here's an example of someone attempting the same thing using QueryTables and VBA which you might find useful:

https://stackoverflow.com/questions/23657843/excel-vba-passing-username-password-to-web-query-connection

If it's not important to use VBA then I'd recommend using Excel's Power Query tool instead (it's called Get & Transform in Excel 2016). This will almost certainly be the easiest way to achieve what you're attempting! 

https://support.office.com/en-gb/article/Connect-to-a-web-page-Power-Query-b2725d67-c9e8-43e6-a590-c0a175bd64d8

I hope that helps!

14 Sep 17 at 10:28

Thanks for the links, I will look them over.