Journey into Data Engineering
1. Introduction: From Intern to Data Engineer
My journey as a Data Engineer began with curiosity — how do banks turn millions of data points into insights that drive lending or risk strategy? Over time, I found myself designing ETL pipelines, automating workflows, and even building machine learning pipelines to predict loan delinquency.
Before becoming a full-time Data Engineer, I started as a Data Engineering intern for three months. That experience grounded me in the fundamentals of data pipelines, governance, and business reporting. Later, I joined a bank, where I spent 1.5 years as a Data Engineer supporting business intelligence, automation, and analytics for multiple departments.
2. Core Projects & Technical Contributions
2.1 Building Data Models for Business Intelligence
- Tools: SQL Server, Power BI, ETL scripting
- Summary: Built data models and conducted analysis to support BI and decision-making across departments.
- Impact: Contributed to 10+ Power BI reports, improving data accessibility and operational visibility.
- Reflection: Learned how to balance data accuracy with performance, and how visualization drives real business decisions. A simplified example of the loads behind these models is sketched below.
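To make the "ETL scripting" part concrete, here is a minimal Python sketch of a staging-to-reporting load that feeds a Power BI model. The connection string, schema, and table names (stg.LoanBalances, rpt.LoanSummary) are illustrative assumptions, not the bank's actual objects.

```python
# Minimal staging-to-reporting load sketch (illustrative names only).
# Assumes a local SQL Server instance and a simple truncate-and-reload pattern.
import pyodbc

CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=AnalyticsDW;Trusted_Connection=yes;"
)

def refresh_loan_summary() -> None:
    """Rebuild a hypothetical reporting table consumed by Power BI."""
    with pyodbc.connect(CONN_STR) as conn:
        cur = conn.cursor()
        # Rebuild the reporting table from staging data.
        cur.execute("TRUNCATE TABLE rpt.LoanSummary;")
        cur.execute(
            """
            INSERT INTO rpt.LoanSummary (LoanId, Branch, Balance, AsOfDate)
            SELECT LoanId, Branch, SUM(Balance), MAX(AsOfDate)
            FROM stg.LoanBalances
            GROUP BY LoanId, Branch;
            """
        )
        conn.commit()

if __name__ == "__main__":
    refresh_loan_summary()
```

In production, a step like this runs on a schedule and writes to the tables that the Power BI datasets refresh from.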
2.2 Machine Learning for Loan Delinquency Prediction
- Tech: Python, Scikit-learn, Power BI integration
- Summary: Developed and deployed ML models to predict SFR loan delinquencies.
- Impact: Improved prediction accuracy by 15% compared to traditional methods.
- Reflection: Learned how to integrate ML workflows into production pipelines and how model outputs can empower risk management; a minimal training sketch follows this list.
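The production models are proprietary, so the sketch below only illustrates the basic scikit-learn workflow on synthetic data: train a classifier, evaluate it, and surface delinquency probabilities for downstream risk reporting. The gradient-boosting choice and the three placeholder features are assumptions for illustration.

```python
# Rough scikit-learn training sketch (synthetic data, placeholder features).
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 5_000

# Synthetic stand-ins for loan features (e.g., payment history, utilization, LTV).
X = rng.normal(size=(n, 3))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n) > 1.0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = GradientBoostingClassifier(random_state=42)
model.fit(X_train, y_train)

# Delinquency probabilities are what downstream risk dashboards consume.
proba = model.predict_proba(X_test)[:, 1]
print(f"ROC AUC: {roc_auc_score(y_test, proba):.3f}")
```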
2.3 PowerShell-Based ETL Optimization
- Tech: PowerShell, SQL Server
- Summary: Re-engineered 20+ ETL pipelines for better performance and maintainability.
- Impact: Achieved 42% faster data model updates and more reliable scheduling.
- Reflection: Gained experience in optimizing scripts, automating data refreshes, and designing modular ETL components (a rough analogue of the pattern is sketched below).
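The real scripts were PowerShell, but the pattern that made them maintainable, with each refresh step as a small, independently timed and logged unit, translates directly. Here is a rough Python analogue; the stored-procedure names, connection string, and schema are hypothetical.

```python
# Python analogue of the modular, logged refresh pattern (the actual scripts
# were PowerShell). Step names and the procedure list are hypothetical.
import logging
import time
import pyodbc

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("etl")

CONN_STR = (
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=localhost;DATABASE=AnalyticsDW;Trusted_Connection=yes;"
)

# Each refresh step is an independent stored procedure, run in order.
REFRESH_STEPS = ["etl.LoadLoans", "etl.LoadPayments", "etl.BuildLoanSummary"]

def run_step(cursor: pyodbc.Cursor, proc: str) -> None:
    start = time.perf_counter()
    cursor.execute(f"EXEC {proc};")
    log.info("%s finished in %.1fs", proc, time.perf_counter() - start)

def main() -> None:
    with pyodbc.connect(CONN_STR, autocommit=True) as conn:
        cur = conn.cursor()
        for proc in REFRESH_STEPS:
            run_step(cur, proc)

if __name__ == "__main__":
    main()
```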
2.4 Bloomberg API Data Pipeline
- Tech: Python, Bloomberg API
- Summary: Designed a pipeline to process Bloomberg financial data and automate ingestion workflows.
- Impact: Reduced third-party data dependency, saving $23,760 annually.
- Reflection: Learned how to integrate external data sources securely and manage cost efficiency in data design; the core request pattern is sketched below.
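For context on what the ingestion automation wrapped around, the request/response core of the Bloomberg Python API (blpapi) looks roughly like the sketch below. The host, port, ticker, and field are illustrative, and a real run requires an authenticated Bloomberg connection; the production pipeline added scheduling, retries, and storage on top of this.

```python
# Minimal blpapi reference-data request sketch (illustrative ticker and field).
# Requires a local Bloomberg connection (e.g., a running Terminal) to execute.
import blpapi

options = blpapi.SessionOptions()
options.setServerHost("localhost")
options.setServerPort(8194)

session = blpapi.Session(options)
session.start()
session.openService("//blp/refdata")

service = session.getService("//blp/refdata")
request = service.createRequest("ReferenceDataRequest")
request.getElement("securities").appendValue("IBM US Equity")  # example ticker
request.getElement("fields").appendValue("PX_LAST")            # example field
session.sendRequest(request)

# Drain events until the final response arrives.
while True:
    event = session.nextEvent(500)
    for msg in event:
        print(msg)
    if event.eventType() == blpapi.Event.RESPONSE:
        break

session.stop()
```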
3. Expanding into Cloud Data Engineering (AWS)
3.1 Using AWS EMR Studio with PySpark
In my day-to-day work, I explored distributed data processing on AWS EMR Studio to handle larger datasets and parallelize data transformations.
I learned how EMR differs from a local Spark setup and practiced writing PySpark scripts that scale data workflows efficiently, along the lines of the example below.
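As a flavor of those workflows, here is a minimal PySpark aggregation of the sort I ran in EMR Studio notebooks. The S3 paths and column names are placeholders; the same script runs locally by pointing the reads and writes at local files.

```python
# Minimal PySpark transformation: monthly loan balances by branch.
# Paths and column names are hypothetical.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("loan-aggregation").getOrCreate()

loans = spark.read.parquet("s3://my-bucket/staging/loans/")  # placeholder path

monthly = (
    loans
    .withColumn("month", F.date_trunc("month", F.col("as_of_date")))
    .groupBy("branch", "month")
    .agg(
        F.sum("balance").alias("total_balance"),
        F.countDistinct("loan_id").alias("loan_count"),
    )
)

monthly.write.mode("overwrite").parquet("s3://my-bucket/marts/loan_monthly/")
spark.stop()
```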
3.2 Exploring AWS SageMaker
I am currently learning AWS SageMaker for end-to-end machine learning model training and deployment.
My goal is to bridge Data Engineering and Data Science, from building clean datasets to automating model pipelines in the cloud; a first training-job sketch is shown below.
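As a first step in that direction, the sketch below shows how a managed SageMaker training job can be launched with the SageMaker Python SDK's scikit-learn estimator. The IAM role ARN, S3 path, and train.py entry script are assumptions for illustration, not an existing setup.

```python
# Exploratory SageMaker training-job sketch (not production code).
# Assumes an AWS account with a SageMaker execution role and a scikit-learn
# training script "train.py" alongside this file; all names are placeholders.
import sagemaker
from sagemaker.sklearn.estimator import SKLearn

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # hypothetical

estimator = SKLearn(
    entry_point="train.py",      # scikit-learn training script
    framework_version="1.2-1",
    py_version="py3",
    instance_type="ml.m5.large",
    instance_count=1,
    role=role,
    sagemaker_session=session,
)

# Launch a managed training job against data already staged in S3.
estimator.fit({"train": "s3://my-bucket/train/"})
```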
4. Reflection: From Data Movement to Data Impact
Working as a Data Engineer in banking taught me that data work goes beyond pipelines and dashboards.
It’s about building trustworthy data systems that allow analysts, strategists, and executives to make informed decisions.
I’ve learned:
- Data quality and observability are critical foundations.
- Automation saves time, but clarity and documentation save teams.
- Collaboration with business users is key to delivering meaningful insights.
5. Future Roadmap & Learning Goals
Short-Term
- Master AWS services like SageMaker, Glue, and Redshift.
- Build a small personal project using AWS data lake architecture.
Long-Term
- Contribute to open-source data tools.
- Write deep-dive case studies on production-scale data systems.
- Mentor others transitioning into data engineering.
⚙️ My Data Engineer Toolkit
- Languages: Python, SQL, PowerShell
- Frameworks: PySpark, Scikit-learn
- Tools: Power BI, AWS EMR, SageMaker, Bloomberg API
- Practices: Modular ETL design, pipeline optimization, logging, documentation
🪞 Lessons Learned
- Clean, well-documented data pipelines scale better than complex ones.
- Performance tuning is as much art as engineering.
- The best Data Engineers understand business context as deeply as data architecture.
Last updated: October 2025
Author: Grace L.