Building a GUI for scraping Wikipedia using Python and PySide/PyQt

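In this tutorial, we will build a Python GUI application with PySide2 that scrapes article text from Wikipedia. (PyQt5 provides very similar widget classes, so the code can be adapted with small changes.) We will download pages with Python's built-in urllib module and use the BeautifulSoup library to parse the HTML.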

Step 1: Installing the necessary libraries
Before building the application, install the required libraries. Open a command prompt or terminal and run the following commands:

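pip install PySide2
pip install beautifulsoup4

The BeautifulSoup library is published on PyPI as beautifulsoup4, but it is imported in code as bs4.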
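
To confirm that both installs succeeded, you can run a short check like the one below. It is a minimal sketch that relies only on the standard library's importlib.util.find_spec; the helper name missing_modules is hypothetical. Note that it checks import names (PySide2, bs4), not PyPI package names.

```python
import importlib.util


def missing_modules(names):
    """Return the module names that Python cannot find on the import path.

    missing_modules is a hypothetical helper, not part of the original tutorial.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names used by this tutorial (beautifulsoup4 is imported as bs4)
    missing = missing_modules(["PySide2", "bs4"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All dependencies are installed.")
```

If the script reports a missing module, rerun the matching pip command in the same Python environment you use to run the scraper.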

Step 2: Importing the required modules
Next, create a new Python script and add the following imports at the top:

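import sys                          # command-line arguments and exit codes
from PySide2 import QtWidgets       # Qt widget classes (window, button, label)
from urllib.request import urlopen  # downloads web pages over HTTP(S)
from bs4 import BeautifulSoup       # parses the downloaded HTML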

Step 3: Creating the GUI application
Now, let’s create the window for our Wikipedia scraper. It contains a text field where the user enters a search term, a button that starts the scraping, and a label that displays the result. Add the following code to your Python script:

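class WikipediaScraper(QtWidgets.QWidget):
    def __init__(self):
        super().__init__()

        # Window title and geometry: x, y, width, height in pixels
        self.setWindowTitle("Wikipedia Scraper")
        self.setGeometry(100, 100, 400, 200)

        # Text field where the user types the article title
        self.search_input = QtWidgets.QLineEdit(self)
        self.search_input.setGeometry(10, 10, 200, 30)

        # Button that calls scrape_wikipedia() when clicked
        self.scrape_button = QtWidgets.QPushButton("Scrape", self)
        self.scrape_button.setGeometry(220, 10, 70, 30)
        self.scrape_button.clicked.connect(self.scrape_wikipedia)

        # Label for the scraped text; word wrap breaks long paragraphs
        # into lines that fit the label's width
        self.result_label = QtWidgets.QLabel(self)
        self.result_label.setGeometry(10, 50, 380, 140)
        self.result_label.setWordWrap(True)

        self.show()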

Step 4: Implementing the Wikipedia scraping logic
Now, let’s implement the scraping logic. When the user clicks the "Scrape" button, the app builds the Wikipedia URL for the search term, downloads the page, extracts the text of its paragraphs (<p> elements), and shows that text in the result label. Add the following method inside the WikipediaScraper class, indented to the same level as __init__:

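    def scrape_wikipedia(self):
        # Wikipedia article URLs use underscores in place of spaces
        search_term = self.search_input.text().strip().replace(" ", "_")
        if not search_term:
            self.result_label.setText("Please enter a search term.")
            return
        url = f"https://en.wikipedia.org/wiki/{search_term}"

        try:
            # Download the raw HTML of the article
            with urlopen(url) as response:
                html = response.read()
        except (OSError, ValueError) as exc:
            # HTTPError and URLError are OSError subclasses; non-ASCII
            # characters in the URL raise UnicodeEncodeError (a ValueError)
            self.result_label.setText(f"Could not load {url}: {exc}")
            return

        soup = BeautifulSoup(html, "html.parser")

        # Join the text of every non-empty <p> element with blank lines
        paragraphs = [p.get_text().strip() for p in soup.find_all("p")]
        result_text = "\n\n".join(p for p in paragraphs if p)

        self.result_label.setText(result_text)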
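
The URL handling and text cleanup in scrape_wikipedia can also be pulled out into plain functions that are easy to test without starting the GUI. The sketch below is an optional variant, not part of the original tutorial: build_wikipedia_url and clean_paragraphs are hypothetical names. It uses urllib.parse.quote so that titles with non-ASCII letters still produce a valid URL, and it strips numeric footnote markers such as [1], which get_text() picks up from Wikipedia's reference links.

```python
import re
from urllib.parse import quote

# Hypothetical helpers (not part of the original tutorial).

# Matches numeric footnote markers such as "[1]" or "[23]".
FOOTNOTE_RE = re.compile(r"\[\d+\]")


def build_wikipedia_url(search_term):
    """Turn a free-text search term into an English Wikipedia article URL."""
    # Article titles use underscores instead of spaces
    title = search_term.strip().replace(" ", "_")
    # Percent-encode characters that are not URL-safe (e.g. non-ASCII letters)
    return "https://en.wikipedia.org/wiki/" + quote(title)


def clean_paragraphs(texts):
    """Remove footnote markers and drop paragraphs that end up empty."""
    cleaned = []
    for text in texts:
        text = FOOTNOTE_RE.sub("", text).strip()
        if text:
            cleaned.append(text)
    return cleaned
```

Inside scrape_wikipedia you could then call build_wikipedia_url(self.search_input.text()) and pass the generator (p.get_text() for p in soup.find_all("p")) to clean_paragraphs before joining the result with blank lines. Other bracketed notes, such as [citation needed], are left untouched by this pattern.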

Step 5: Running the GUI application
To run the application, create an instance of the WikipediaScraper class and start the Qt event loop. Add the following code at the bottom of your script, outside the class:

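if __name__ == "__main__":
    # A Qt GUI needs one QApplication instance before any widgets are created
    app = QtWidgets.QApplication(sys.argv)
    window = WikipediaScraper()  # the constructor calls show()
    # exec_() runs the event loop; its return value becomes the exit code
    sys.exit(app.exec_())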

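Save the script and run it. A window with a text field and a "Scrape" button should appear. Enter the title of a Wikipedia article (for example, Linux) and click the button to scrape that page. Because the label has a fixed size, only the beginning of a long article will be visible, and the window may stop responding briefly while the page downloads, since the request runs on the GUI thread.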

Congratulations! You have built a Python GUI application with PySide/PyQt that scrapes text from Wikipedia. You can extend it further by adding features such as error handling, displaying images, or saving the scraped text to a file. Happy coding!
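
As a starting point for the last suggestion, saving the scraped text to a file could look roughly like this. It is a sketch that assumes you already have the paragraph texts as a list of strings; format_paragraphs and save_paragraphs are hypothetical names. In the GUI, you could pair it with QtWidgets.QFileDialog.getSaveFileName to let the user choose the path.

```python
import tempfile
from pathlib import Path

# Hypothetical helpers (not part of the original tutorial).


def format_paragraphs(paragraphs):
    """Join paragraphs with blank lines and end the text with a newline."""
    return "\n\n".join(paragraphs) + "\n"


def save_paragraphs(paragraphs, path):
    """Write paragraphs to `path` as UTF-8 text and return the Path written."""
    path = Path(path)
    # UTF-8 keeps non-ASCII article text (accents, other scripts) intact
    path.write_text(format_paragraphs(paragraphs), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Demo: write two sample paragraphs into a temporary directory
    out_dir = Path(tempfile.mkdtemp())
    saved = save_paragraphs(["First paragraph.", "Second paragraph."], out_dir / "article.txt")
    print(saved.read_text(encoding="utf-8"))
```

Writing with an explicit encoding avoids platform-dependent defaults, which matters because Wikipedia articles often contain characters outside ASCII.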
