Web Scraping Training Course

Learn how to automate web data scraping from any type of website using Python, Beautiful Soup and Selenium.

NEXT COURSE
8 Apr London

Web Scraping training course (code: PYWEBSCR)

TRAINING COURSE OVERVIEW

Our Web Scraping training course is an immersive 3-day event that teaches technically minded non-programmers the basics of Python and modern web scraping technologies, so that they leave the course ready and able to extract useful data from all types of websites, including those rendered in JavaScript.

The course is fast-paced, interactive, practical and hands-on, and is presented in a way that is approachable for non-programmers.

AUDIENCE

Marketers, SEO practitioners, lead generators and data vendors.


COURSE AGENDA

Day 1
Starting Out: The Python Code Editor IDLE
•    Getting started: checking that everyone has Python installed and can launch IDLE (or their preferred code editor)
•    Hello World! Creating and running your first Python program.
•    The command line. Running Python scripts from the command line.
•    The interactive interpreter, your best friend for experimenting with Python.
•    Resources: ensuring everyone has the course material and is able to find and use the exercises, example code and data.

What is Python
A brief introduction to Python, what it is used for and why you should care.

The Basic Datatypes
Along the way in this section we’ll come across concepts like statements and expressions, comments, and block structure by indentation, all of which are vital when programming in Python.
Working with Numbers
•    Integers and floating point numbers
•    The dangers of floating point 
•    Basic maths operations
•    The math module
•    Converting numbers
•    A note about other number types (decimals and fractions)
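As a rough illustration of the floating point topic above, the following minimal sketch shows the classic rounding surprise and two standard library alternatives. The values are arbitrary examples and are not taken from the course material.

```python
import math
from decimal import Decimal
from fractions import Fraction

# Binary floating point cannot store 0.1 or 0.2 exactly,
# so their sum is very slightly different from 0.3.
total = 0.1 + 0.2
print(total == 0.3)              # False
print(math.isclose(total, 0.3))  # True: compare with a tolerance instead

# Decimal values built from strings keep exact decimal digits.
exact = Decimal("0.1") + Decimal("0.2")
print(exact == Decimal("0.3"))   # True

# Fraction stores an exact numerator and denominator.
third = Fraction(1, 3)
print(third * 3 == 1)            # True
```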
Working with Text
•    The string data-type
•    The print function
•    Different kinds of quotes including multi-line strings
•    Escaping data in strings (with a note about string formatting, covered in more detail later)
•    Unicode, encodings (for sending, receiving and storing text) and fancy characters
•    String slicing and indexing
•    The len function and in operator
•    String methods
•    Asking for data from the user with the input function
•    Converting data to strings with str and repr
Basic Code Structure
•    Loops
•    If/else blocks 
•    A discussion of True and False
•    Comparisons
•    The while statement
•    The pass statement
•    Attributes and function calls

The Container Types
The List
•    Splitting strings into a list
•    Basic list operations
•    Iterating over a list
•    Searching a list
•    List methods
•    Sorting
Dictionaries
•    The dictionary type
•    When to use a dictionary rather than a list
•    Dictionary operations and methods
The file Type
•    Reading text from and writing text to a file
•    Bringing it together, a real world example of reading, processing and saving data with files and the basic data types

Functions
Functions are the most basic element of code reuse and structure in Python. They mark the move from scripts to programs.
•    The def statement
•    Taking arguments
•    Returning values
•    Function scope, globals and local variables
Errors and Exception Handling
•    What happens when things go wrong
•    What is an exception
•    Catching exceptions
•    Raising exceptions
Data Handling
•    None
•    Tuples and how they differ from lists
•    Tuple packing and unpacking
•    Lists of tuples, real world data handling
•    Formatting output with string formatting, writing CSV files
•    Working with sequences (len, max, min, indexing and slicing)
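The data handling topics above might come together along these lines. This is only a sketch: the product rows and the names format_report and to_csv_text are made up for illustration.

```python
import csv
import io

# Hypothetical sample data: a list of (product, price, quantity) tuples.
rows = [
    ("widget", 2.50, 4),
    ("gadget", 10.00, 1),
    ("gizmo", 0.75, 12),
]

def format_report(rows):
    """Return one aligned text line per row using string formatting."""
    lines = []
    for name, price, quantity in rows:  # tuple unpacking in the loop
        lines.append(f"{name:<8}{price * quantity:>8.2f}")
    return lines

def to_csv_text(rows):
    """Write the rows as CSV into an in-memory text buffer."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["product", "price", "quantity"])
    writer.writerows(rows)
    return buffer.getvalue()

for line in format_report(rows):
    print(line)
print(to_csv_text(rows))
```

To write a real file instead of an in-memory buffer, the same writer can be given a file opened with open("products.csv", "w", newline=""), where the filename is again just an example.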
Looping Revisited
•    The for loop and iterables
•    The loop variables
•    The break statement
•    The continue statement
•    Using range to loop over numbers
•    Keeping a count with enumerate
•    Looping with tuples and multiple variables
•    List comprehensions, a handy shortcut
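A short sketch of several of the looping tools listed above, using an invented list of words:

```python
words = ["spam", "eggs", "", "ham"]

# enumerate pairs each item with a running count (starting at 1 here).
numbered = []
for count, word in enumerate(words, start=1):
    if not word:
        continue  # skip empty strings and move on to the next item
    numbered.append(f"{count}: {word}")

# A list comprehension builds a new list in a single expression.
lengths = [len(word) for word in words if word]

# range produces numbers to loop over; break stops the loop early.
first_big_square = None
for n in range(1, 20):
    if n * n > 50:
        first_big_square = n
        break

print(numbered)          # ['1: spam', '2: eggs', '4: ham']
print(lengths)           # [4, 4, 3]
print(first_big_square)  # 8
```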

Day 2

Variables in Detail
•    What happens with assignment
•    It’s a name not a variable
•    References, assignment never copies
•    Reassigning names
•    Identity and equality
•    Scope revisited
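The point that assignment never copies can be seen in a few lines. The names below are illustrative only.

```python
first = [1, 2, 3]
second = first           # a second name for the same list, not a copy
second.append(4)

print(first)             # [1, 2, 3, 4]: the change shows through both names
print(first is second)   # True: identical object

duplicate = list(first)  # an explicit (shallow) copy
print(duplicate == first)  # True: equal contents
print(duplicate is first)  # False: a different object

duplicate.append(5)      # changing the copy leaves the original alone
print(first)             # [1, 2, 3, 4]
```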
Types
•    Everything has a type
•    Type converter functions
•    Checking the type
•    Everything is an object
Program Structure: Functions Revisited and Modules
•    Organising your functions
•    Default values and keyword arguments for functions
•    Multiple return values
•    Functions don’t receive copies (mutable arguments)
•    Importing functions from modules
•    Module as namespace
•    The standard library and third party modules
•    Importing executes code
•    Scripts as programs and as modules (the “main” module)
The sys module
•    Introduction to sys
•    Module search path and command line handling
•    The input and output streams
A Quick Tour of the Python Standard Library
•    The os module and os.path (working with the underlying platform and files)
•    Shell operations with shutil (more working with files)
•    The time and datetime modules
•    The subprocess module
•    Regular expressions (a more powerful way to work with text)
•    JSON encoding and decoding with the json module
•    The random module for random numbers and data

Object Orientation
An introduction to the object oriented features of Python. The aim is mostly to understand Python objects and libraries rather than to write new classes.
•    Object orientation in a nutshell
•    Objects for wrapping up data and methods to work on them
•    Using objects (hint: we’ve already done a lot of it)
•    The class statement
•    Functions in a class as methods
•    The self parameter
•    The __init__ special method
•    Instance data and attributes
•    A brief discussion of inheritance
•    A practical example of inheritance, creating new exceptions
•    Other magic methods, using string conversion as the example
•    Attribute access from the outside (with getattr and friends)
•    The inner workings of objects (objects as dictionaries – the deepest secret of Python) (optional topic dependent on time)
•    Properties (optional topic dependent on time)
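As one possible illustration of the class statement, __init__, instance attributes, inheritance from Exception and string conversion, here is a minimal sketch. ScrapeError and its attributes are hypothetical names chosen for this example, not part of any library.

```python
class ScrapeError(Exception):
    """Hypothetical exception raised when a page cannot be fetched."""

    def __init__(self, url, status):
        super().__init__(url, status)
        # Instance data is stored as attributes on self.
        self.url = url
        self.status = status

    def __str__(self):
        # Called by str() and print() to produce a readable message.
        return f"could not fetch {self.url} (status {self.status})"


try:
    raise ScrapeError("https://example.com/page", 404)
except ScrapeError as error:
    message = str(error)

print(message)  # could not fetch https://example.com/page (status 404)
```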

Day 3

Web Scraping with BeautifulSoup and Selenium
The goal of this day is to teach the principles and the nuts and bolts of scraping JavaScript-rendered sites using Python and third-party libraries. By the end of the day attendees will be able to create simple scripts and programs that can visit websites, interact with them by entering search terms and navigating the site, and pull out structured data that can be written to data files on the computer. The advanced details of using CSS selectors and XPath to pull out the data won’t be covered in full, as these are huge topics, but enough information to work with most websites will be provided. Along the way, tips and techniques will be shared so that attendees can teach themselves and work out more complex scenarios on their own, with a little help from Google. Sample data will be provided so that attendees can wrestle with some moderately complex examples.
BeautifulSoup
•    Making a web request with the requests module
•    Reading an HTML page with BeautifulSoup
•    Extracting data by tag and the tag type
•    A note about parsers (html.parser, lxml and html5lib)
•    Navigating the DOM (Document Object Model)
•    Where’s my data? Attributes on tags
•    Pulling data out into Python objects and writing it to files
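A minimal sketch of the BeautifulSoup steps above, assuming the bs4 package is installed. The HTML snippet, tag names and class names are invented for illustration; on a real site the text would typically come from something like requests.get(url).text.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page.
html = """
<html><body>
  <h1>Products</h1>
  <ul id="products">
    <li class="item"><a href="/p/1">Widget</a> <span class="price">2.50</span></li>
    <li class="item"><a href="/p/2">Gadget</a> <span class="price">10.00</span></li>
  </ul>
</body></html>
"""

# "html.parser" is the parser bundled with Python, so no extra install is needed.
soup = BeautifulSoup(html, "html.parser")

products = []
for item in soup.find_all("li", class_="item"):
    link = item.find("a")
    price = item.find("span", class_="price")
    products.append({
        "name": link.get_text(strip=True),
        "url": link["href"],               # tag attributes are read like dictionary keys
        "price": float(price.get_text()),
    })

print(soup.h1.string)
print(products)
```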
Selenium
•    Launching a real browser from Python and the interactive interpreter
•    The basic API: navigating a site, finding elements, extracting data
•    More advanced interactions: clicking links and buttons, entering text, selecting radio buttons, scrolling into view, iframes, etc.
•    Navigating a website from a Python program
•    CSS selectors and XPath. View source is your friend
•    Why do we have to wait? JavaScript-rendered sites, some practical techniques
•    Show me the source: how we view the real HTML (the DOM) of a JavaScript website
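The waiting technique mentioned above could look roughly like the sketch below. It assumes Selenium 4 with Chrome and a matching driver available; fetch_rendered_html, the URL and the CSS selector are hypothetical names chosen for this example.

```python
def fetch_rendered_html(url, css_selector, timeout=10):
    """Open url in Chrome, wait for css_selector to appear, return the page HTML."""
    # Imported inside the function so this sketch can be loaded and read
    # on a machine where Selenium and a browser are not yet set up.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # launches a real browser window
    try:
        driver.get(url)
        # Explicit wait: block until JavaScript has added the element,
        # or raise TimeoutException after `timeout` seconds.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, css_selector))
        )
        # page_source returns the page as the browser currently holds it.
        return driver.page_source
    finally:
        driver.quit()  # always close the browser, even after an error
```

A call such as fetch_rendered_html("https://example.com/search?q=python", "ul#results li") would return a string of HTML, which can then be passed to BeautifulSoup(html, "html.parser") for extraction.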
Best of Friends: BeautifulSoup with Selenium
In this section we will pull together what we’ve learned about the Python programming language, BeautifulSoup and Selenium to extract structured data from complex websites.
•    Fetching a page with Selenium and handling it with BeautifulSoup
•    Pulling out data into Python collections and re-structuring the data with simple loops and data processing
•    Formatting the data for output and writing files
•    Debugging techniques and exploratory sessions
•    Further examples of using the Selenium API to navigate and interact with websites, and of combining CSS selectors, XPath and the BeautifulSoup API to work only with the data we’re interested in
•    Better ways to work with structured data using the pandas library and different output formats (optional topic)
 

HIGHLIGHTS

  • Introduction to Python & Web Scraping
  • The Python Code Editor IDLE
  • What is Python
  • The Basic Datatypes
  • The Container Types
  • The List
  • Functions
  • Variables in Detail
  • Python Standard Library
  • Object Orientation
  • BeautifulSoup
  • Selenium


PUBLIC COURSES (LONDON, UK)
 

8th Apr 2019 - 3 days £1795

PRIVATE COURSES


  Bring a JBI course to your office
  and train a whole team onsite
  0800 028 6400
or request quote

