GCP Data Engineering-End to End Project-Healthcare Domain

Industry-standard project in the healthcare domain using GCP services such as GCS, BigQuery, Dataproc, and Composer, with GitHub and CI/CD

  • This project focuses on building a data lake in Google Cloud Platform (GCP) for Revenue Cycle Management (RCM) in the healthcare domain.

  • The goal is to centralize, clean, and transform data from multiple sources, enabling healthcare providers and insurance companies to streamline billing, claims processing, and revenue tracking.

  • GCP Services Used:

    • Google Cloud Storage (GCS): Stores raw and processed data files.

    • BigQuery: Serves as the analytical engine for storing and querying structured data.

    • Dataproc: Used for large-scale data processing with Apache Spark.

    • Cloud Composer (Apache Airflow): Automates ETL pipelines and workflow orchestration.

    • Cloud SQL (MySQL): Stores transactional Electronic Medical Records (EMR) data.

    • GitHub & Cloud Build: Enables version control and CI/CD implementation.

    • CI/CD (Continuous Integration & Continuous Deployment): Automates deployment pipelines for data processing and ETL workflows.

  • Techniques Involved:

    • Metadata-driven approach

    • SCD (Slowly Changing Dimension) Type 2 implementation

    • CDM (Common Data Model)

    • Medallion architecture

    • Logging and monitoring

    • Error handling

    • Optimizations

    • CI/CD implementation

    • Additional best practices

  • Data Sources:

    • EMR (Electronic Medical Records) data from two hospitals

    • Claims files

    • CPT (Current Procedural Terminology) codes

    • NPI (National Provider Identifier) Data

  • Expected Outcomes:

    • Efficient Data Pipeline: Automating the ingestion and transformation of RCM data.

    • Structured Data Warehouse: Gold-layer tables in BigQuery, ready for analytical queries.

    • KPI Dashboards: Insights into revenue collection, claims processing efficiency, and financial trends.
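
Of the techniques listed above, SCD Type 2 is the easiest to illustrate in a few lines. The sketch below is not the course's implementation (which is not shown here and would more likely run in BigQuery or Spark); it is a minimal pure-Python illustration of the idea. The PatientVersion record, its column names, and the apply_scd2 helper are all hypothetical. It shows the core rule: when a tracked attribute changes, the current row is closed out and a new current row is opened, so history is preserved rather than overwritten.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class PatientVersion:
    # Hypothetical dimension row; the column names are illustrative only.
    patient_id: str
    address: str
    valid_from: date
    valid_to: Optional[date]  # None while this version is still open
    is_current: bool


def apply_scd2(history, incoming, load_date):
    """Apply incoming {patient_id: address} to history, SCD Type 2 style."""
    result = []
    keys_with_current_row = set()
    for row in history:
        if row.is_current:
            keys_with_current_row.add(row.patient_id)
        new_address = incoming.get(row.patient_id)
        changed = new_address is not None and new_address != row.address
        if row.is_current and changed:
            # Close the old version instead of overwriting it.
            result.append(replace(row, valid_to=load_date, is_current=False))
            # Open a new current version carrying the changed attribute.
            result.append(
                PatientVersion(row.patient_id, new_address, load_date, None, True)
            )
        else:
            # Closed versions and unchanged current rows pass through as-is.
            result.append(row)
    # Keys without a current row are inserted as new current versions.
    for patient_id, address in incoming.items():
        if patient_id not in keys_with_current_row:
            result.append(PatientVersion(patient_id, address, load_date, None, True))
    return result


# Made-up sample data: P1 changes address, P2 is a new patient.
history = [PatientVersion("P1", "12 Oak St", date(2024, 1, 1), None, True)]
incoming = {"P1": "98 Elm Ave", "P2": "5 Pine Rd"}
updated = apply_scd2(history, incoming, date(2024, 6, 1))
for version in updated:
    print(version)
```

Re-running the same load against the updated history leaves it unchanged, which is the behaviour one would usually want from a scheduled pipeline that may retry.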