1 November 2017
What is Python?
Python courses are popular because Python is a clear and powerful object-oriented programming language that runs on all major platforms. The simplicity of the language and the breadth of its built-in functions make it straightforward to retrieve and manipulate data from the web. As a result, there has been a surge in the use of Python for web development, and for search engines in particular. This is evident from the Organisations Using Python page, which includes both Google and Ultraseek.
The concept behind a basic search engine:
Let's walk through the steps in building a basic search engine to better understand how Python can be applied to developing one. Building a search engine involves several stages. These include:
- Creating the Index
- Querying the Index
- Ranking the Results
Creating the Index:
When creating a search engine, the first step is to create an inverted index of all the web pages that will be searched: a mapping from each word to the pages, and the positions within those pages, where it appears. This involves parsing and tokenising the text of each web page. Punctuation and whitespace must be removed, along with common words that carry little meaning (a, I, or, and, etc.) if necessary. The parsed words must also be converted to lowercase and stored together with their positions, so that phrases as well as single words can be searched. Once the index is complete, every page covered by the search engine is fully mapped out. Python's ability to retrieve data from the web and manipulate text makes it well suited to parsing and tokenising the data from the relevant web pages.
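The indexing step described above can be sketched in a few lines of Python. This is a minimal illustration rather than a production indexer: it assumes the page text has already been downloaded and stripped of HTML, and the names `tokenise`, `build_index` and the sample pages are hypothetical.

```python
import re
from collections import defaultdict


def tokenise(text):
    """Lowercase the text and split it into words, dropping punctuation and whitespace."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages, stop_words=frozenset()):
    """Build an inverted index of the form word -> {page: [positions]}.

    `pages` maps a page name (hypothetical here) to its plain text.
    Positions are counted before stop words are dropped, so the distance
    between two indexed words still matches the original text.
    """
    index = defaultdict(lambda: defaultdict(list))
    for page, text in pages.items():
        for position, word in enumerate(tokenise(text)):
            if word in stop_words:
                continue
            index[word][page].append(position)
    return index


# Illustrative input only; a real engine would fetch these pages from the web.
pages = {
    "a.html": "Python is clear. Python is powerful!",
    "b.html": "A clear search engine",
}
index = build_index(pages, stop_words={"a", "i", "or", "and"})
```

Storing a list of positions per page, rather than just the page name, is what makes phrase searches possible later on.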
Querying the Index:
The next stage involves accepting queries, formatting them, and then searching the index for the web pages containing the word or phrase that has been queried. Queries can be formatted in much the same way as words are parsed for the index, i.e. words are separated and made lowercase, whitespace is removed, etc. The search engine should then search the index and return the web pages, and the positions within them, where each word appears. This task is also simplified by Python's built-in string and dictionary functions, which are often not included in other programming languages.
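As a rough sketch of the querying stage, the code below normalises a query in the same way as the indexed text and then looks it up. The small `INDEX` dictionary is a hand-written stand-in for a real inverted index, and the function names are hypothetical.

```python
import re

# Hypothetical index in the shape word -> {page: [positions]}.
INDEX = {
    "python": {"a.html": [0, 3], "b.html": [2]},
    "search": {"b.html": [3], "c.html": [0]},
    "engine": {"b.html": [4], "c.html": [5]},
}


def normalise(query):
    """Format a query like the indexed text: lowercase, no punctuation or whitespace."""
    return re.findall(r"[a-z0-9]+", query.lower())


def lookup(index, query):
    """Return the pages and positions for each word in the query."""
    return {word: index.get(word, {}) for word in normalise(query)}


def phrase_search(index, query):
    """Return {page: [start positions]} where the query words appear consecutively."""
    words = normalise(query)
    if not words:
        return {}
    results = {}
    for page, starts in index.get(words[0], {}).items():
        # A start position matches if every later word sits at start + its offset.
        matches = [
            start
            for start in starts
            if all(
                start + offset in index.get(word, {}).get(page, [])
                for offset, word in enumerate(words[1:], start=1)
            )
        ]
        if matches:
            results[page] = matches
    return results
```

Calling `phrase_search(INDEX, "Search Engine!")` only matches pages where "engine" directly follows "search", which is why the index keeps word positions and not just page names.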
Ranking the Results:
The final stage of creating a search engine involves ranking the results that were returned by the query using one or more scoring formulas. These formulas determine a score for each web page based on its relevance to the original query. The links to each page can then be displayed in order of their score, highest first.
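The article does not say which formulas are used, so the sketch below swaps in one common choice, TF-IDF, purely as an assumption: a page scores higher when a query word appears in it often and in few other pages. The `INDEX` data and the `rank` function are hypothetical.

```python
import math

# Hypothetical index in the shape word -> {page: [positions]}.
INDEX = {
    "python": {"a.html": [0, 3], "b.html": [2]},
    "engine": {"b.html": [4]},
}


def rank(index, query_words, total_pages):
    """Score pages with a simple TF-IDF sum and return them best first.

    TF-IDF is an assumed scoring formula, not one named by the article.
    """
    scores = {}
    for word in query_words:
        postings = index.get(word, {})
        if not postings:
            continue
        # Words found in fewer pages are more distinctive, so they weigh more.
        idf = math.log(total_pages / len(postings))
        for page, positions in postings.items():
            # Term frequency is the number of times the word occurs in the page.
            scores[page] = scores.get(page, 0.0) + len(positions) * idf
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = rank(INDEX, ["python", "engine"], total_pages=3)
```

Here `ranked` is a list of `(page, score)` pairs, so the links can be displayed by iterating over it in order.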