
Web Scraping training course

Learn how to automate web data scraping from any type of website using Python, Beautiful Soup and Selenium.

JBI training course London UK

"The course was very comprehensive with clear examples and good exercises. It was good to understand the capabilities and basics of Python. The material is very clear. I like the fact that additional material has been shared for practice."

EW, Electronic Engineer, Python, January 2021

Public Courses

01/04/24 - 3 days
£1795 +VAT
13/05/24 - 3 days
£1795 +VAT
24/06/24 - 3 days
£1795 +VAT

Customised Courses

* Train a team
* Tailor content
* Flex dates
From £1200 / day

  • Gain an introduction to Python and Web Scraping
  • Install and launch the Python Code Editor IDLE 
  • Create and run your first Python program - Hello World!
  • Learn what Python is used for and why you should care
  • Use basic datatypes to learn concepts such as statements, expressions and block structure 
  • Split strings into lists and learn basic list operations, including searching and sorting
  • Explore container types that give you real-world examples of reading, processing and saving data 
  • Use functions for code reuse and structure in Python and move from scripts to programs
  • Learn variables in detail with types and modules
  • Explore Python Standard Library 
  • Gain an introduction to the object-oriented features of Python
  • Learn to scrape JavaScript-rendered sites with Python and third-party libraries
  • Learn Web Scraping with BeautifulSoup and Selenium
  • Launch Selenium in a real browser from Python, and navigate a website from a Python program

COURSE AGENDA

Day 1


Starting Out: The Python Code Editor IDLE
•    Getting started: checking that everyone has Python installed and can launch IDLE (or their preferred code editor)
•    Hello World! Creating and running your first Python program.
•    The command line. Running Python scripts from the command line.
•    The interactive interpreter, your best friend for experimenting with Python.
•    Resources: ensuring everyone has the course material and is able to find and use the exercises, example code and data.

 

What is Python
A brief introduction to Python, what it is used for and why you should care.

 

The Basic Datatypes
Along the way in this section we’ll come across concepts such as statements and expressions, comments, and block structure by indentation, all of which are vital when programming with Python.
Working with Numbers
•    Integers and floating point numbers
•    The dangers of floating point 
•    Basic maths operations
•    The math module
•    Converting numbers
•    A note about other number types (decimals and fractions)
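
As a small illustration of the floating point danger listed above, the following sketch (not taken from the course material) compares ordinary floats with the decimal and fractions modules from the standard library. The values are arbitrary.

```python
import math
from decimal import Decimal
from fractions import Fraction

# Binary floating point cannot store 0.1 exactly, so small errors creep in.
total = 0.1 + 0.2
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True: compare floats with a tolerance instead

# Decimal and Fraction keep exact values when built from strings or integers.
exact_decimal = Decimal("0.1") + Decimal("0.2")
exact_fraction = Fraction(1, 10) + Fraction(2, 10)
print(exact_decimal == Decimal("0.3"))    # True
print(exact_fraction == Fraction(3, 10))  # True

# Converting between number types.
print(int("42") + 1)     # 43
print(float("2.5") * 2)  # 5.0
print(int(3.9))          # 3 (int truncates towards zero rather than rounding)
```
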
Working with Text
•    The string data-type
•    The print function
•    Different kinds of quotes including multi-line strings
•    Escaping data in strings (with a note about string formatting, covered in more detail later)
•    Unicode, encodings (for sending, receiving and storing text) and fancy characters
•    String slicing and indexing
•    The len function and in operator
•    String methods
•    Asking for data from the user with the input function
•    Converting data to strings with str and repr
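
The string topics above can be tried out in the interactive interpreter. The snippet below is an illustrative sketch with a made-up string, not an extract from the course exercises.

```python
title = "Web Scraping with Python"

# Indexing and slicing (positions start at 0; the end of a slice is excluded).
print(title[0])     # W
print(title[:3])    # Web
print(title[-6:])   # Python

# The len function and the in operator.
print(len(title))            # 24
print("Scraping" in title)   # True

# A couple of string methods.
print(title.lower())            # web scraping with python
print(title.replace(" ", "_"))  # Web_Scraping_with_Python

# str gives the readable form; repr shows the quotes and escape sequences.
line = "first\tsecond"
print(str(line))    # the two words separated by a real tab character
print(repr(line))   # 'first\tsecond'
```
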
Basic Code Structure
•    Loops
•    If/else blocks 
•    A discussion of True and False
•    Comparisons
•    The while statement
•    The pass statement
•    Attributes and function calls
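
A short sketch of the code structure topics above, using made-up numbers. It shows block structure by indentation, how Python treats values as True or False, and the while and pass statements.

```python
numbers = [3, 0, 8, -2]

for number in numbers:
    if number > 5:
        print(number, "is large")
    elif number:        # 0 counts as False, any other number counts as True
        print(number, "is small but not zero")
    else:
        pass            # placeholder: deliberately do nothing for zero

# A while loop repeats until its condition becomes False.
countdown = 3
steps = []
while countdown > 0:
    steps.append(countdown)   # a method call on the list object
    countdown -= 1
print(steps)   # [3, 2, 1]

# Empty strings and empty lists are False; non-empty ones are True.
print(bool(""), bool("text"), bool([]), bool([0]))   # False True False True
```
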

 

The Container Types


The List
•    Splitting strings into a list
•    Basic list operations
•    Iterating over a list
•    Searching a list
•    List methods
•    Sorting
Dictionaries
•    The dictionary type
•    When to use a dictionary rather than a list
•    Dictionary operations and methods
The file Type
•    Reading text from and writing text to a file
•    Bringing it together, a real world example of reading, processing and saving data with files and the basic data types
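
One possible shape for the "reading, processing and saving" example mentioned above is sketched here. The file names and the sample records are hypothetical, and a temporary folder is used so nothing is left behind.

```python
import os
import tempfile

# Hypothetical sample data: one "name,department" record per line.
raw_text = "Ada,Engineering\nGrace,Research\nLinus,Engineering\n"

# Work in a temporary folder (the file names are made up for this sketch).
folder = tempfile.mkdtemp()
source_path = os.path.join(folder, "staff.txt")
report_path = os.path.join(folder, "report.txt")

with open(source_path, "w", encoding="utf-8") as handle:
    handle.write(raw_text)

# Read the file, split each line into a list and count people per department.
counts = {}
with open(source_path, encoding="utf-8") as handle:
    for line in handle:
        fields = line.strip().split(",")
        department = fields[1]
        counts[department] = counts.get(department, 0) + 1

# Sort the department names and save a short report.
with open(report_path, "w", encoding="utf-8") as handle:
    for department in sorted(counts):
        handle.write(f"{department}: {counts[department]}\n")

with open(report_path, encoding="utf-8") as handle:
    report = handle.read()
print(report)
```

A dictionary suits the counting step because each department name is looked up by key; a list would need a search on every line.
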

 

Functions


Functions are the most basic element of code reuse and structure in Python, and the first step in moving from scripts to programs.
•    The def statement
•    Taking arguments
•    Returning values
•    Function scope, globals and local variables
Errors and Exception Handling
•    What happens when things go wrong
•    What is an exception
•    Catching exceptions
•    Raising exceptions
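
To make the function and exception topics above concrete, here is a minimal sketch. The function name parse_price and its inputs are invented for illustration.

```python
def parse_price(text):
    """Convert a price string such as '£1,795' into a float."""
    cleaned = text.replace("£", "").replace(",", "").strip()
    try:
        value = float(cleaned)
    except ValueError:
        # Catch the low-level error and raise one that names the bad input.
        raise ValueError(f"not a valid price: {text!r}")
    if value < 0:
        raise ValueError(f"price cannot be negative: {text!r}")
    return value


print(parse_price("£1,795"))   # 1795.0

try:
    parse_price("call us")
except ValueError as error:
    print("Problem:", error)   # Problem: not a valid price: 'call us'
```
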
Data Handling
•    None
•    Tuples, how are they different from lists
•    Tuple packing and unpacking
•    Lists of tuples, real world data handling
•    Formatting output with string formatting, writing CSV files
•    Working with sequences (len, max, min, indexing and slicing)
Looping Revisited
•    The for loop and iterables
•    The loop variables
•    The break statement
•    The continue statement
•    Using range to loop over numbers
•    Keeping a count with enumerate
•    Looping with tuples and multiple variables
•    List comprehensions, a handy shortcut
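
The data handling and looping topics above fit together naturally. The sketch below uses invented rows and writes the CSV to an in-memory buffer rather than a real file, purely to keep the example self-contained.

```python
import csv
import io

# Hypothetical rows, stored as a list of tuples: (title, days, price).
courses = [("Course A", 3, 900), ("Course B", 2, 600), ("Course C", 3, 750)]

# enumerate keeps a count; tuple unpacking gives each field a name.
for number, (title, days, price) in enumerate(courses, start=1):
    print(f"{number}. {title:<10} {days} days  £{price}")

# A list comprehension builds a new list in a single expression.
three_day_titles = [title for title, days, _ in courses if days == 3]
print(three_day_titles)   # ['Course A', 'Course C']

# break stops a loop early: find the first course under £700.
first_cheap = None
for title, days, price in courses:
    if price < 700:
        first_cheap = title
        break
print(first_cheap)        # Course B

# Write the rows as CSV (to a text buffer here; a file opened with open() works the same way).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["title", "days", "price"])
writer.writerows(courses)
print(buffer.getvalue())
```
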
 

Day 2

 

Variables in Detail
•    What happens with assignment
•    It’s a name not a variable
•    References, assignment never copies
•    Reassigning names
•    Identity and equality
•    Scope revisited
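
The points above about names and references are easiest to see with a mutable object such as a list. This is an illustrative sketch only.

```python
first = [1, 2, 3]
second = first            # a second name for the same list, not a copy
second.append(4)
print(first)              # [1, 2, 3, 4]
print(first is second)    # True: both names refer to one object

duplicate = list(first)   # an explicit (shallow) copy is a separate object
print(duplicate == first) # True: equal contents
print(duplicate is first) # False: different objects

first = [0]               # reassigning a name leaves the old object untouched
print(second)             # [1, 2, 3, 4]
```
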
Types
•    Everything has a type
•    Type converter functions
•    Checking the type
•    Everything is an object
Program Structure: Functions Revisited and Modules
•    Organising your functions
•    Default values and keyword arguments for functions
•    Multiple return values
•    Functions don’t receive copies (mutable arguments)
•    Importing functions from modules
•    Module as namespace
•    The standard library and third party modules
•    Importing executes code
•    Scripts as programs and as modules (the “main” module)
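
Several of the function topics above appear in the following sketch. The function names are hypothetical and chosen only for the example.

```python
def add_tag(tag, tags=None, *, upper=False):
    """Append tag to tags, creating a new list when none is supplied."""
    if tags is None:          # avoids sharing one mutable default list between calls
        tags = []
    tags.append(tag.upper() if upper else tag)
    return tags


def min_and_max(values):
    """Return two values at once (in fact a single tuple)."""
    return min(values), max(values)


shared = ["python"]
add_tag("selenium", shared)          # the caller's list is modified, not a copy
print(shared)                        # ['python', 'selenium']
print(add_tag("soup", upper=True))   # ['SOUP']

lowest, highest = min_and_max([3, 1, 4])
print(lowest, highest)               # 1 4

if __name__ == "__main__":
    # Runs only when the file is executed as a script, not when it is imported.
    print("running as the main module")
```
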
The sys module
•    Introduction to sys
•    Module search path and command line handling
•    The input and output streams
A Quick Tour of the Python Standard Library
•    The os module and os.path (working with the underlying platform and files)
•    Shell operations with shutil (more working with files)
•    The time and datetime modules
•    The subprocess module
•    Regular expressions (a more powerful way to work with text)
•    json encoding and decoding
•    The random module for random numbers and data
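
A few of the standard library modules listed above can be sampled in a handful of lines. The text being searched is made up, and the day/month/year order of the dates is an assumption of this sketch.

```python
import json
import os.path
import re
from datetime import datetime

# Regular expressions: pull date-like strings out of free text.
text = "Courses start on 01/04/24, 13/05/24 and 24/06/24."
found = re.findall(r"\d{2}/\d{2}/\d{2}", text)
print(found)   # ['01/04/24', '13/05/24', '24/06/24']

# datetime: convert the strings to dates (day/month/year order assumed).
start_dates = [datetime.strptime(item, "%d/%m/%y").date() for item in found]
print(start_dates[0].isoformat())   # 2024-04-01

# json: encode a dictionary to text and decode it again.
encoded = json.dumps({"course": "Web Scraping", "days": 3})
decoded = json.loads(encoded)
print(decoded["days"])   # 3

# os.path: split a file name from its extension.
print(os.path.splitext("report.csv"))   # ('report', '.csv')
```
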

 

Object Orientation


An introduction to the object-oriented features of Python, aimed mostly at understanding Python objects and libraries rather than writing new classes.
•    Object orientation in a nutshell
•    Objects for wrapping up data and methods to work on them
•    Using objects (hint: we’ve already done a lot of it)
•    The class statement
•    Functions in a class as methods
•    The self parameter
•    The __init__ special method
•    Instance data and attributes
•    A brief discussion of inheritance
•    A practical example of inheritance, creating new exceptions
•    Other magic methods, using string conversion as the example
•    Attribute access from the outside (with getattr and friends)
•    The inner workings of objects (objects as dictionaries – the deepest secret of Python) (optional topic dependent on time)
•    Properties (optional topic dependent on time)
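
The object-oriented topics above can be summarised in one small class. The class names Page and ScrapeError are hypothetical, picked to fit the scraping theme of the course.

```python
class ScrapeError(Exception):
    """A new exception type, created by inheriting from Exception."""


class Page:
    def __init__(self, url, title):
        # Instance data is stored as attributes on self.
        self.url = url
        self.title = title

    def __str__(self):
        # Used by str() and print().
        return f"{self.title} ({self.url})"

    def __repr__(self):
        # Used by repr() and the interactive interpreter.
        return f"Page({self.url!r}, {self.title!r})"

    def require_title(self):
        if not self.title:
            raise ScrapeError(f"no title found for {self.url}")
        return self.title


page = Page("https://example.com", "Example Domain")
print(str(page))               # Example Domain (https://example.com)
print(repr(page))              # Page('https://example.com', 'Example Domain')
print(getattr(page, "title"))  # Example Domain
print(page.__dict__)           # {'url': 'https://example.com', 'title': 'Example Domain'}
```
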


Day 3

 

Web Scraping with BeautifulSoup and Selenium
The goal of this day is to teach the principles and the nuts and bolts of scraping JavaScript-rendered sites using Python and third-party libraries. By the end of the day attendees will be able to create simple scripts and programs that visit websites, interact with them by entering search terms and navigating the site, and pull out structured data that can be written to data files on the computer. CSS selectors and XPath are huge topics and won’t be covered in full, but attendees will get enough information to work with most websites. Along the way, tips and techniques will enable attendees to teach themselves and, with a little help from Google, work out more complex scenarios. Sample data will be provided so attendees can wrestle with some moderately complex examples.
BeautifulSoup
•    Making a web request with the requests module
•    Reading an HTML page with BeautifulSoup
•    Extracting data by tag and the tag type
•    A note about parsers (html.parser, lxml and html5lib)
•    Navigating the DOM (Document Object Model)
•    Where’s my data? Attributes on tags
•    Pulling data out into Python objects and writing it to files
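
The BeautifulSoup steps above might look roughly like the sketch below. It assumes the third-party beautifulsoup4 package is installed, and it parses a small invented HTML string instead of making a live request; with the requests module the text would typically come from requests.get(url).text.

```python
from bs4 import BeautifulSoup

# A small made-up page stands in for the text of a real HTTP response.
html = """
<html><body>
  <h1>Public Courses</h1>
  <ul id="courses">
    <li><a href="/web-scraping">Web Scraping</a> <span class="days">3</span></li>
    <li><a href="/python">Python</a> <span class="days">2</span></li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")   # html.parser ships with Python

print(soup.h1.get_text())   # Public Courses

# Walk the list items and pull the data out into Python dictionaries.
rows = []
for item in soup.find("ul", id="courses").find_all("li"):
    link = item.find("a")
    rows.append({
        "title": link.get_text(strip=True),
        "href": link["href"],   # tag attributes are read like dictionary keys
        "days": int(item.find("span", class_="days").get_text()),
    })
print(rows)
```
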
Selenium
•    Launching a real browser from Python and the interactive interpreter
•    The basic API: navigating a site, finding elements, extracting data
•    More advanced interactions: clicking links and buttons, entering text, selecting radio buttons, scrolling into view, working with iframes, etc.
•    Navigating a website from a Python program
•    CSS selectors and XPath. View source is your friend

 



•    Why do we have to wait? JavaScript-rendered sites, some practical techniques
•    Show me the source: how we view real HTML (the DOM) from a JavaScript website
Best of Friends: BeautifulSoup with Selenium
In this section we will pull together what we’ve learned about the Python programming language, BeautifulSoup and Selenium to extract structured data from complex websites.
•    Fetching a page with Selenium and handling it with BeautifulSoup
•    Pulling out data into Python collections and re-structuring the data with simple loops and data processing
•    Formatting the data for output and writing files
•    Debugging techniques and exploratory sessions
•    Further examples of using the Selenium API to navigate and interact with websites, and using the combination of CSS selectors, XPath and the BeautifulSoup API to work only with the data we’re interested in
•    Better ways to work with structured data using the pandas library and different output formats (optional topic)
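
One way the Selenium and BeautifulSoup combination described above could be put together is sketched below. It assumes the selenium and beautifulsoup4 packages plus a local Chrome installation; the function names, the URL and the table#results selector are all placeholders invented for the example.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, css_selector, timeout=10):
    """Load url in Chrome, wait for css_selector to appear, return the rendered HTML."""
    # Imported here so extract_rows below can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # JavaScript-rendered content can arrive after the page loads, so wait for it.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        return driver.page_source   # the DOM as HTML, after scripts have run
    finally:
        driver.quit()


def extract_rows(html):
    """Turn each data row of the (hypothetical) results table into a list of cell texts."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        [cell.get_text(strip=True) for cell in row.select("td")]
        for row in soup.select("table#results tr")
        if row.select("td")   # skip header rows, which contain th cells only
    ]


# Against a live site (placeholder URL and selector):
# html = fetch_rendered_html("https://example.com/search", "table#results")
sample = "<table id='results'><tr><th>Name</th><th>Age</th></tr><tr><td>Ada</td><td>42</td></tr></table>"
print(extract_rows(sample))   # [['Ada', '42']]
```

Keeping the browser step and the parsing step in separate functions means the parsing can be developed and debugged against saved HTML without launching a browser each time.
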
 


Marketers, SEO practitioners, Lead generators and Data Vendors. 


5 star

4.8 out of 5 average




“JBI did a great job of customizing their syllabus to suit our business needs and also bringing our team up to speed on the current best practices. Our teams varied widely in terms of experience and the Instructor handled this particularly well - very impressive”

Brian F, Team Lead, RBS, Data Analysis Course, 20 April 2022

 

 


Newsletter

 

Sign up for the JBI Training newsletter to stay updated with world-class technology training opportunities, including Analytics, AI, ML, DevOps, Web, Backend and Security. Our Power BI Training Course is especially popular. Gain new skills, useful tips, and validate your expertise with an industry-leading organisation, all tailored to your schedule and learning preferences.



Our Web Scraping training course is an immersive three-day event with the goal of teaching technically minded non-programmers the basics of Python and modern Web Scraping technologies. They will leave the course ready and able to extract useful data from all types of websites, including those rendered in JavaScript. 

The course is fast-paced, interactive, practical and hands-on, and is presented in an approachable way for non-programmers.  


CONTACT
+44 (0)20 8446 7555

[email protected]


 

Copyright © 2023 JBI Training. All Rights Reserved.
JB International Training Ltd  -  Company Registration Number: 08458005
Registered Address: Wohl Enterprise Hub, 2B Redbourne Avenue, London, N3 2BS

Modern Slavery Statement & Corporate Policies | Terms & Conditions | Contact Us
