ETL with Apache Airflow: PostgreSQL and S3 connection

Yesi Days
3 min read · Mar 22, 2023

Managing and processing data are crucial tasks for businesses of all sizes. Extract, Transform, and Load (ETL) pipelines enable organizations to move data between systems and transform it into useful information. In this post, we’ll explore how to configure Airflow connections to PostgreSQL and S3.

If you need more information about “How to build a data pipeline,” you can read my previous post.

Now let’s go step by step to build our ETL.

Install and set up Airflow: First, install Apache Airflow on your system by following the official installation guide.

Initialize the Airflow database: After installation, initialize the Airflow metadata database by running the following command (on Airflow 2.7 and later, this command is deprecated in favor of airflow db migrate):

airflow db init

Start the Airflow web server and scheduler: Run the following commands in separate terminals to start the Airflow web server and scheduler:

airflow webserver --port 8080
airflow scheduler
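
Once both processes are running, a quick way to confirm they are up is to query the web server's /health endpoint, which reports the status of the metadata database and the scheduler. The snippet below is a minimal sketch, assuming the web server listens on http://localhost:8080 as in the command above; the helper names fetch_health and unhealthy_components are hypothetical and not part of Airflow.

```python
import json
from urllib.request import urlopen


def fetch_health(base_url="http://localhost:8080", timeout=5):
    """Fetch the JSON document served by the Airflow web server at /health.

    Assumption: the web server was started on port 8080 as shown above.
    """
    with urlopen(f"{base_url}/health", timeout=timeout) as resp:
        return json.load(resp)


def unhealthy_components(health):
    """Return the names of components whose status is not 'healthy'."""
    return [
        name
        for name, info in health.items()
        if info.get("status") != "healthy"
    ]


# Illustrative payload (not real output): the scheduler has not started yet.
sample = {
    "metadatabase": {"status": "healthy"},
    "scheduler": {"status": "unhealthy", "latest_scheduler_heartbeat": None},
}
print(unhealthy_components(sample))
```

Against a live instance, calling unhealthy_components(fetch_health()) should return an empty list when both the database and the scheduler are reachable.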

To use S3, you need to install the Amazon provider package with the next command; for more information, you can review the documentation:

pip install 'apache-airflow[amazon]'
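
With the provider installed, Airflow still needs connection details for PostgreSQL and S3. One option, besides the web UI, is to define each connection as an environment variable named AIRFLOW_CONN_ followed by the upper-cased connection ID, whose value is a connection URI. The following is a minimal sketch of building those URIs with the standard library only. The helper names, host, database, and credentials are all placeholders invented for illustration; postgres_default and aws_default are the connection IDs that the PostgreSQL and AWS hooks look up by default.

```python
from urllib.parse import quote, urlencode


def build_conn_uri(conn_type, login="", password="", host="",
                   port=None, schema="", extra=None):
    """Build a connection URI, URL-encoding every component.

    Encoding matters because secrets often contain '/', '@' or ':'.
    """
    uri = f"{conn_type}://"
    if login or password:
        uri += quote(login, safe="")
        if password:
            uri += ":" + quote(password, safe="")
        uri += "@"
    uri += quote(host, safe="")
    if port is not None:
        uri += f":{port}"
    uri += "/" + quote(schema, safe="")
    if extra:
        # Extra fields (for example the AWS region) travel as query parameters.
        uri += "?" + urlencode(extra)
    return uri


def env_var_name(conn_id):
    """Name of the environment variable Airflow reads for a connection ID."""
    return "AIRFLOW_CONN_" + conn_id.upper()


# Placeholder values only; replace them with your own settings.
postgres_uri = build_conn_uri(
    "postgres", login="etl_user", password="p@ss/word",
    host="localhost", port=5432, schema="sales",
)
aws_uri = build_conn_uri(
    "aws", login="AKIAEXAMPLE", password="secret/key",
    extra={"region_name": "us-east-1"},
)

print(f"export {env_var_name('postgres_default')}='{postgres_uri}'")
print(f"export {env_var_name('aws_default')}='{aws_uri}'")
```

Note that talking to PostgreSQL from a DAG also requires the PostgreSQL provider package (apache-airflow-providers-postgres). Connections defined through environment variables do not appear in the connection list of the web UI, so a secrets backend or the UI itself may be a better fit for shared deployments.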

Create a new DAG (Directed Acyclic Graph): In Airflow, workflows are represented as DAGs. Create…


Written by Yesi Days

GDE Machine Learning | Data Scientist | PhD in Artificial Intelligence | Content creator | Ex-backend