Tutorial on Web Scraping with Python, NoSQL, and FastAPI with Scheduled Execution


In this tutorial, we will cover how to build a web scraping application using Python, NoSQL, and FastAPI. We will use FastAPI to create a simple web server that scrapes a website on a schedule and saves the data to a NoSQL database.


To follow along with this tutorial, you will need the following tools installed on your machine:

  • Python 3.x
  • FastAPI
  • MongoDB (or any other NoSQL database of your choice)
  • Requests and Beautiful Soup libraries
  • APScheduler

Step 1: Set up the environment

First, create a new directory for your project and create a virtual environment inside it. You can do this by running the following commands:

mkdir web_scraping_app
cd web_scraping_app
python3 -m venv venv

Step 2: Install dependencies

Activate the virtual environment by running:

source venv/bin/activate

Then, install the required libraries using pip:

pip install fastapi uvicorn pymongo requests beautifulsoup4 apscheduler

Step 3: Create a scraper module

Create a new Python module called scraper.py and add the following code to it:

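import requests
from bs4 import BeautifulSoup

def scrape_website(url):
    # Fetch the page; fail fast on stalled connections and HTTP errors
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')

    # Add your scraping logic here. As a placeholder, keep the page title.
    data = {
        "url": url,
        "title": soup.title.string if soup.title else None,
    }

    return data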
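
The "Add your scraping logic here" placeholder is where the page-specific parsing goes. As a minimal sketch, assuming a page that lists articles as h2 headings wrapping links (the markup, the extract_articles name, and the field names are all hypothetical), the parsing step might look like this:

```python
from bs4 import BeautifulSoup

def extract_articles(html):
    """Return one dict per <h2> heading that wraps a link (assumed page layout)."""
    soup = BeautifulSoup(html, "html.parser")
    articles = []
    for heading in soup.find_all("h2"):
        link = heading.find("a")
        if link is None:
            continue  # skip headings that do not contain a link
        articles.append({
            "title": link.get_text(strip=True),
            "href": link.get("href"),
        })
    return articles

# Hypothetical sample markup, used here instead of a live request
SAMPLE_HTML = """
<html><body>
  <h2><a href="/post/1">First post</a></h2>
  <h2>No link here</h2>
  <h2><a href="/post/2"> Second post </a></h2>
</body></html>
"""

articles = extract_articles(SAMPLE_HTML)
```

Keeping the parsing in a function that takes an HTML string, rather than a URL, lets you try it against saved pages without hitting the site each time.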

Step 4: Set up the FastAPI app

Create a new Python module called main.py and add the following code to it:

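from contextlib import asynccontextmanager
from fastapi import FastAPI
from apscheduler.schedulers.background import BackgroundScheduler
from scraper import scrape_website

# Placeholder target; replace with the page you want to scrape
URL = "https://example.com"

scheduler = BackgroundScheduler()

def start_scheduler():
    # Scrape once an hour in a background thread
    scheduler.add_job(scrape_website, "interval", hours=1, args=[URL])
    scheduler.start()

@asynccontextmanager
async def lifespan(app):
    # Start the scheduler with the server, stop it on shutdown
    start_scheduler()
    yield
    scheduler.shutdown()

app = FastAPI(lifespan=lifespan)

if __name__ == "__main__":
    import uvicorn
    # Listens on localhost only; use "0.0.0.0" to accept outside connections
    uvicorn.run(app, host="127.0.0.1", port=8000)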

Step 5: Connect to the NoSQL database

Modify the scraper.py module to save the scraped data to a NoSQL database. For example, if you are using MongoDB, you can add the following code to the module:

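import pymongo

client = pymongo.MongoClient("mongodb://localhost:27017/")
db = client["web_scraping_app"]
collection = db["scraped_data"]

def save_to_database(data):
    # Store one scraped record as a MongoDB document
    collection.insert_one(data)

# In scrape_website, call save_to_database(data) just before `return data`
# so that every scheduled run is persisted.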
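
Because the scraper runs repeatedly, it can help to record where and when each document was collected before it is written. The following is only a sketch: build_document, source_url, and scraped_at are hypothetical names, not part of pymongo or of the code above.

```python
from datetime import datetime, timezone

def build_document(url, data, now=None):
    """Return a copy of `data` tagged with its source URL and a UTC timestamp."""
    scraped_at = now or datetime.now(timezone.utc)
    document = dict(data)  # copy so the caller's dict is not mutated
    document["source_url"] = url
    document["scraped_at"] = scraped_at
    return document

# A fixed timestamp is passed here only to make the example reproducible
record = build_document(
    "https://example.com",
    {"title": "Example Domain"},
    now=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
# With pymongo, the result could then be stored via collection.insert_one(record)
```

Working on a copy also matters because pymongo's insert_one adds an _id field to the dictionary it is given, which would otherwise show up in the caller's data.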

Step 6: Run the application

Start the FastAPI server by running:

python main.py

Your web scraping application will now run on a schedule, scraping the website every hour and saving the data to the NoSQL database.
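
A job that runs unattended every hour will sooner or later meet a failed request or an unreachable database. One way to handle that, sketched here under the assumption that you want a single place to log failures, is to wrap the scrape and save steps in one job function. The run_scrape_job name and the stand-in functions are hypothetical; in the real app you would pass scrape_website and save_to_database.

```python
import logging

logger = logging.getLogger("scrape_job")

def run_scrape_job(url, scrape, save):
    """Scrape `url` and store the result; return True on success, False on failure."""
    try:
        data = scrape(url)
        save(data)
    except Exception:
        # Log the traceback and leave the retry to the next scheduled run
        logger.exception("Scrape of %s failed", url)
        return False
    return True

# Stand-ins for scrape_website and save_to_database, for demonstration only
stored = []

def fake_scrape(url):
    return {"url": url, "title": "Example"}

def failing_scrape(url):
    raise ConnectionError("network unreachable")

ok = run_scrape_job("https://example.com", fake_scrape, stored.append)
failed = run_scrape_job("https://example.com", failing_scrape, stored.append)
```

The wrapper could then be registered in place of the bare scraper, for example with scheduler.add_job(run_scrape_job, "interval", hours=1, args=[url, scrape_website, save_to_database]).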


In this tutorial, we have seen how to build a web scraping application using Python, NoSQL, and FastAPI. You can customize this application by adding more complex scraping logic, using a different NoSQL database, or scheduling the scraping at different intervals. Happy coding!
