I’m Henry Okonkwo, a dedicated Data Engineer with two years of experience designing scalable ETL pipelines, optimizing data warehouse architectures, and developing dimensional data models to support data-driven decision-making. With a decade of experience working with data and an MBA, I combine technical proficiency in SQL, Python, and Airflow with business acumen gained in the power utility and renewable energy sectors.
Skilled in integrating diverse data sources, ensuring data quality, and building self-serve analytics platforms, I excel at collaborating with cross-functional teams to translate business needs into actionable data solutions. Passionate about emerging technologies, I aim to drive innovation and create impactful solutions that enable sustainable growth.
This project leverages Microsoft Azure to create an end-to-end data pipeline for analyzing the Tokyo Olympics 2020 dataset. It integrates Azure Data Factory, Azure Data Lake, Azure Databricks, and Azure Synapse Analytics to transform raw data into actionable insights. Key highlights include visual dashboards in Power BI showcasing medal tallies, athlete performance, and participation trends, demonstrating scalable cloud-native data engineering techniques.
This capstone project involves building a comprehensive data platform for retail analytics, integrating OLTP (MySQL), NoSQL (MongoDB), and a data warehouse. The project automates incremental data synchronization, builds reporting dashboards, and deploys a SparkML machine learning model for sales projections. It highlights advanced ETL workflows, BI dashboard creation, and the use of big data tools for real-world business challenges.
This project processes over 6 million taxi trip records using Mage AI and Google Cloud. The pipeline extracts, transforms, and loads data into Google BigQuery for analysis. Key insights include revenue trends, peak demand hours, and operational metrics like trip duration and fare per mile. It showcases the application of cloud-based tools to create a scalable, efficient ETL workflow for large-scale data processing.
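An operational metric like fare per mile reduces to a simple aggregation over trip records. Here is a minimal sketch in plain Python; the field names `fare_amount` and `trip_distance` follow the widely used NYC taxi schema and are illustrative assumptions, not this project's exact columns:

```python
# Minimal sketch: average fare per mile across taxi trip records.
# Field names (fare_amount, trip_distance) are assumed, not taken from
# the project's actual schema.

def fare_per_mile(trips):
    """Return the overall average fare per mile across valid trips."""
    total_fare = 0.0
    total_miles = 0.0
    for trip in trips:
        # Skip records with zero or negative distance (bad meter readings).
        if trip["trip_distance"] > 0:
            total_fare += trip["fare_amount"]
            total_miles += trip["trip_distance"]
    return total_fare / total_miles if total_miles else 0.0

sample = [
    {"fare_amount": 12.5, "trip_distance": 5.0},
    {"fare_amount": 7.5, "trip_distance": 2.5},
    {"fare_amount": 4.0, "trip_distance": 0.0},  # filtered out
]
print(round(fare_per_mile(sample), 2))  # prints 2.67
```

At pipeline scale the same aggregation would run as a BigQuery SQL query rather than in-process Python, but the logic is identical.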
Mastery in Python for data engineering, leveraging its robust libraries and frameworks to build efficient ETL pipelines, automate workflows, and perform complex data analysis. Skilled in integrating Python with modern tools like Apache Airflow and Apache Spark for big data processing and analytics. Certified in "Python for Data Engineering".
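The extract-transform-load pattern behind these pipelines can be sketched in a few lines of standard-library Python. The stage functions and the `reading_id`/`kwh` schema below are hypothetical illustrations, not code from any of the projects above:

```python
import csv
import io
import sqlite3

# Illustrative ETL sketch: extract CSV text, transform rows, load into SQLite.
# The schema (reading_id, kwh) is a hypothetical example, not a real dataset.

def extract(csv_text):
    """Parse raw CSV text into a list of dicts (extract stage)."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def transform(rows):
    """Cast types and drop malformed rows (transform stage)."""
    out = []
    for row in rows:
        try:
            out.append((int(row["reading_id"]), float(row["kwh"])))
        except (KeyError, ValueError):
            continue  # skip records that fail validation
    return out

def load(records, conn):
    """Write cleaned records into the target table (load stage)."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (reading_id INTEGER, kwh REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", records)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract("reading_id,kwh\n1,3.2\n2,bad\n3,4.8\n"), ), conn)
print(conn.execute("SELECT COUNT(*), ROUND(SUM(kwh), 1) FROM readings").fetchone())  # prints (2, 8.0)
```

In production, an orchestrator such as Airflow would schedule each stage as a task and handle retries; the stages themselves stay this simple.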
Expertise in crafting impactful dashboards and visualizations using Tableau, Power BI, Looker Studio, and Excel to drive data-driven decision-making. Skilled in deploying visualization solutions on Microsoft Azure and Google Cloud platforms. Certified in "Data Warehouse Fundamentals".
Proficient in designing, querying, and optimizing relational and non-relational databases using MySQL, PostgreSQL, Microsoft Access, SQLite, IBM Db2, and NoSQL systems. Skilled in creating scalable data pipelines and managing data warehouses on Azure and Google Cloud. Certified in "Databases and SQL for Data Science" and "Relational Database Administration Essentials".
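One common query-optimization pattern, adding an index so lookups avoid full-table scans, can be demonstrated with Python's built-in SQLite bindings. The `sales` table and its columns are hypothetical examples:

```python
import sqlite3

# Illustrative sketch: adding an index replaces a full-table scan with an
# index search. Table and column names here are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales (region, amount) VALUES (?, ?)",
                 [("west", 10.0), ("east", 20.0), ("west", 5.0)])

query = "SELECT SUM(amount) FROM sales WHERE region = 'west'"

# Without an index, the planner scans every row for this predicate.
plan_before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

conn.execute("CREATE INDEX idx_sales_region ON sales(region)")

# With the index, the planner searches only the matching rows.
plan_after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(plan_before[0][-1])  # e.g. "SCAN sales"
print(plan_after[0][-1])   # e.g. "SEARCH sales USING INDEX idx_sales_region (region=?)"
```

The exact plan wording varies between SQLite versions, but the shift from SCAN to SEARCH is what matters; the same scan-versus-seek reasoning applies in MySQL and PostgreSQL via their own `EXPLAIN` output.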
Adept at translating complex datasets into compelling narratives using data visualization tools like Tableau and Power BI. Skilled in presenting insights from ETL pipelines, big data, and analytics projects across cloud platforms. Experienced in combining storytelling techniques with actionable insights to guide strategic decisions.
Expertise in leveraging Microsoft Azure services for building, deploying, and managing cloud-based data engineering and analytics solutions. Skilled in using Azure Data Factory, Azure Synapse Analytics, and Azure Databricks to streamline ETL pipelines, data lake creation, and big data management. Proficient in integrating Azure services with open-source tools like Apache Kafka and Airflow for seamless data workflows.
Skilled in utilizing Mage AI for building and managing data pipelines with minimal coding. Proficient in creating modular workflows for data transformation, ETL, and orchestration. Experienced in integrating Mage AI with big data tools like Spark and Hadoop and cloud platforms like Microsoft Azure and Google Cloud to optimize performance and scalability.
Proficient in Google Cloud tools like BigQuery, Dataflow, and Cloud Storage to handle large-scale data engineering tasks. Expertise in deploying scalable ETL pipelines, managing data lakes, and optimizing query performance on Google Cloud Platform. Certified in "Google Cloud Big Data and Machine Learning Fundamentals".
Extensive experience in designing and implementing data engineering solutions using ETL tools, big data frameworks, and cloud platforms. Proficient in managing data workflows with Apache Kafka, Apache Spark, and SQL/NoSQL databases. Skilled in building data lakes, warehouses, and pipelines across Azure, Google Cloud, and on-premise systems. Certified as an IBM Data Engineering Professional.