Read our blogs, tips and tutorials
Try our exercises or test your skills
Watch our tutorial videos or shorts
Take a self-paced course
Read our recent newsletters
License our courseware
Book expert consultancy
Buy our publications
Get help in using our site
559 attributed reviews in the last 3 years
Refreshingly small course sizes
Outstandingly good courseware
Whizzy online classrooms
Wise Owl trainers only (no freelancers)
Almost no cancellations
We have genuine integrity
We invoice after training
Review 30+ years of Wise Owl
View our top 100 clients
Search our website
We also send out useful tips in a monthly email newsletter ...
Advice on how to scrape tables of data or HTML from webpages using VBA Part two of a three-part series of blogs |
---|
You can use VBA to extract data from web pages, either as whole tables or by parsing the underlying HTML elements. This blog shows you how to code both methods (the technique is often called "web-scraping").
|
Pretty much anything you can do in Excel, you can do in VBA too - and that includes extracting information from a website's table.
Supposing that you want to create an Excel workbook with an up-to-date list of all of the Wise Owl Excel VBA courses (who wouldn't want to do this?). You could do this in 3 easy stages. First, choose to get data from a website:
Choose the option shown on the Data tab of the Excel ribbon.
Secondly, type or paste in a URL (website address) and click on Go:
Set the address you want to go to and click on Go.
Thirdly, click on the yellow-and-black arrow next to the table you want to import, and then click on the Import button:
The arrow gets a green background when you select it; the Import button is at the bottom right of the form.
After going through one more fairly self-explanatory dialog box, you'll see the imported data:
The standard import doesn't bring in any formatting.
The method shown above is for older versions of Excel, although the principle is the same for newer versions. You can find out much more about Excel on one of our online or classroom Excel courses or VBA courses.
To import this table of data in VBA, first tell your program where the website can be found:
Sub GetCourseList()
'add a table of courses into an Excel workbook
'the website containing the files listing courses
Const prefix As String = WiseOwlWebsiteAddress & "/training/dates/"
Const FileName As String = "microsoft-excel-advanced"
Dim qt As QueryTable
Dim ws As Worksheet
'using a worksheet variable means autocompletion works better
Set ws = ActiveSheet
The next thing to do is to create a query table, specifying where the results should go and where it should get its data from:
'set up a table import (the URL; tells Excel that this query comes from a website)
Set qt = ws.QueryTables.Add( _
Connection:="URL;" & prefix & FileName & ".htm", _
Destination:=Range("A1"))
Note the URL; prefix, to tell VBA that it should get the data from a website, rather than a database or any other file format.
You can now set various parameters:
Just a few of the many ways in which you can customise a query table (although the default settings usually work OK).
Here is an example of some customisation for your query table:
'tell Excel to refresh the query whenever you open the file
qt.RefreshOnFileOpen = True
'giving the query a name can help you refer to it later
qt.Name = "ExcelAdvancedCoursesFromWiseOwl"
'you want to import column headers
qt.FieldNames = True
'need to know name or number of table to bring in
'(we'll bring in the first table)
qt.WebSelectionType = xlSpecifiedTables
qt.WebTables = 1
The website might contain several tables; here we're choosing to import the first one only.
Finally, we need to run the query!
'import the data
qt.Refresh BackgroundQuery:=False
End Sub
When you run this macro, you should get a current list of all Wise Owl Excel VBA courses in your active worksheet.
Now that you've got a set of rows, it's up to you how you use them. Here's one example, which displays the imported course dates in a message box:
The code below would display this message at the time of writing (clearly by the time you read this, the list will be different!).
Here's some sample code to create and display this message:
Sub UsingCourseList()
Dim StartCell As Range
Dim c As Range
Dim msg As String
'go to top left cell
Set StartCell = Cells.find("Start date")
If StartCell Is Nothing Then
MsgBox "No start cell found"
Exit Sub
End If
'start message to show
msg = "Wise Owl Excel VBA courses are:" & vbCrLf
'loop over all cells in this column, adding to list
For Each c In Range(StartCell.Offset(1, 0), StartCell.End(xlDown))
'add this course into message, with blank line
msg = msg & vbCrLf & Format(c.Value, "dd/mm/yyyy")
Next c
'display results
MsgBox msg
End Sub
Importing data tables like this is much the easier of the two methods covered by this blog, but what happens when you want to get less structured data from a web page? For that, you need to parse the web page's HTML (a process known as scraping websites).
Parts of this blog |
---|
|
Some other pages relevant to the above blogs include:
From: | Flori |
When: | 28 Aug 18 at 07:06 |
Hi, thank you so much for your tutorilas. I'm a new user of VBA and HTML and this has helped me a lot!
I just tried your macro above and and it worked perfectly! I was so happy, but after trying other pages and creating them on a different tabs, suddenly the modules that worked didn't pull the information, just gave me a blank sheet. The only difference was that on the last page that I ran, I didn't have permisions to access the page, so they give me the permisions and after finishing that macro for that last page, nothing worked. Do you have any idea of what happened? Internet Explorer is open and shows the information but the excel sheet is null or empty - nothing on it.
From: | Andy B |
When: | 28 Aug 18 at 09:57 |
It's hard to get results from a website when there's an intermediate layer of security to pass through - see my colleague Andy Gould's answers to a similar question on this page.
From: | snoonan |
When: | 13 Jun 17 at 17:42 |
Would anyone know how to make this work if the website in question requires you to run a query before showing results? For example, it is a price comparison website, where you are required to enter in a post code to bring up results (aka a list of shops in a certain area that sell a product at a certain price). There is no option to view all the data for every post code. I would like to extract all the pricing info for each result, and for every post code. Help would be greatly appreciated!
From: | GoldenDaily |
When: | 11 Nov 16 at 12:57 |
Is there a way to extract more than one table or have this code loop through and bring in all tables?
From: | Andrew G |
When: | 11 Nov 16 at 18:22 |
Hi, probably the simplest way to do this is to set the WebSelectionType property to xlAllTables - here's a brief example to show you what I mean:
Sub GetCourseList()
Dim URL As String
Dim qt As QueryTable
Dim ws As Worksheet
Set ws = Worksheets.Add
URL = "https://www.wiseowl.co.uk/courses/"
Set qt = ws.QueryTables.Add( _
Connection:="URL;" & URL, _
Destination:=Range("A1"))
With qt
.RefreshOnFileOpen = True
.Name = "CoursesFromWiseOwl"
.FieldNames = True
.WebSelectionType = xlAllTables
.Refresh BackgroundQuery:=False
End With
End Sub
From: | ArunChipa |
When: | 26 Feb 21 at 04:51 |
Andrew, It is pulling the table only. is any code to pull header too..
From: | Andrew G |
When: | 26 Feb 21 at 08:28 |
Hi Arun,
QueryTables will only detect table elements on a page. The headers for each section sit outside the table so won't be included. If you wanted to extract the headers as well you'll need a more detailed approach - this video is a good place to start
https://www.wiseowl.co.uk/vba-macros/videos/vba-scrape-websites/web-scraping-selenium-chrome/
I hope it helps!
From: | diegoalejo15 |
When: | 23 Mar 17 at 10:46 |
Thanks Andy, do you know how to apply this to a site that requires login and the table is in a asp link (i mean it´s not in the main page). I can do it through the "from web" excel button but after a while the session expires and have to do it again that´s why I want to use vba.
Regards
From: | Andrew G |
When: | 23 Mar 17 at 21:35 |
That depends on the website and how the username and password are being requested but query tables probably aren't the way to go here. Here's an example of one technique using an XMLHTTPRequest http://stackoverflow.com/questions/17686375/querytable-authentication
Alternatively, if the credentials are entered into textboxes on the webpage you may need something a little more like this using Internet Explorer http://stackoverflow.com/questions/10506833/data-from-password-protected-website-using-vba Of course, you'll need to identify the names of the user name and password text boxes for the specific page you're attempting to access.
We have videos explaning the basics of using XMLHTTPRequest and Internet Explorer (including how to follow links on a webpage) which you might find useful although you'll have to tailor the code to suit the page you're attempting to use:
https://www.wiseowl.co.uk/vba-macros/videos/vba-scrape-websites/scraping-web-pages/
https://www.wiseowl.co.uk/vba-macros/videos/vba-scrape-websites/scraping-multiple-web-pages/
https://www.wiseowl.co.uk/vba-macros/videos/vba-scrape-websites/downloading-files/
I hope that helps to point you in the right direction!
Kingsmoor House
Railway Street
GLOSSOP
SK13 2AA
Landmark Offices
99 Bishopsgate
LONDON
EC2M 3XD
Holiday Inn
25 Aytoun Street
MANCHESTER
M1 3AE
© Wise Owl Business Solutions Ltd 2024. All Rights Reserved.