Employment Experience
Cred-IQ
Senior Software/Data Engineer (August 2023 - Present)
Cred-IQ is a small commercial real-estate data startup that aggregates mainly CMBS-related data on properties across the country. At the time of writing, the company employs around 20 people, 6 of whom (including myself) are responsible for engineering. Because the team is so small, I am responsible for a wide range of initiatives. Initially, about 90% of my work centered on data pipelines and the remaining 10% on the infrastructure supporting those pipelines. During 2024, my role shifted to focus more on sys-admin work, cloud infrastructure, architecture design, and backend development, and I am also typically the engineer who interfaces with clients during custom development initiatives. Alongside these newer responsibilities, I still maintain several data pipelines and help with data architecture.
The main technologies I work with on a day-to-day basis:
- AWS (Many different services)
- Rust (API / Backend)
- Python (Data Pipelines)
- Airflow (Orchestration)
- Docker
- Postgres
- Snowflake
- Linux Environments
Navigator
Data Engineer (April 2022 - August 2023)
When I first joined Navigator, the company had just under 50 people; as of July 2023 that number is around 25. Working for a startup has been a great experience: I have had the opportunity to take on entire projects myself, provide insight into best practices within software development, and guide strategy for long-term efforts I believe the data team should embark on. Navigator provides a visualization platform that sits on top of a client's commercial real estate data. We build dashboards for clients across the industry, including but not limited to multifamily, office, industrial, retail, investment, brokerage, development, and occupier. Because of this, I have had the opportunity to build system integrations with large commercial real estate data warehouse platforms like Yardi, MRI, Onesite, and Snowflake. Our infrastructure primarily resides on Azure and makes use of a number of different Azure products. For the most part, my job is to build custom ETL pipelines that ingest data from whatever systems a client uses to house their data, centralizing it inside our platform so that we can build a visualization layer on top of it. A few key projects I have been involved with:
Main Responsibilities
Designing, Scoping, and Implementing Custom ETL Pipelines and Ingestion Systems
Because the platform sat atop Azure, I built integrations around Azure's blob storage system as the staging area for raw data, with data arriving via SFTP pushes and pulls, email, API pulls, website scraping, Box, SharePoint, and several other channels. I primarily used Python to transform this raw data into a usable form and then used SQL to ingest it into the SQL servers. These processes were typically built on top of Azure Functions to provide a cost-effective, serverless (microservices) ingestion architecture, but I also used tools like Azure Logic Apps, Azure Data Factory, and MongoDB for this task.
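As a rough illustration of the shape these pipelines took, here is a minimal sketch of a blob-triggered ingestion function: it picks up a raw CSV from the staging container, cleans it with pandas, and appends it to a SQL table. The blob binding, table name, and connection-string setting are placeholders rather than the actual production values.

```python
# Sketch of a blob-triggered ingestion function (Python v1 programming model;
# the blob binding itself lives in function.json). Names and connection
# strings here are illustrative placeholders.
import io
import os

import azure.functions as func
import pandas as pd
from sqlalchemy import create_engine


def main(rawblob: func.InputStream) -> None:
    # Read the raw CSV that landed in the staging container.
    df = pd.read_csv(io.BytesIO(rawblob.read()))

    # Example transformation step: normalize column names and drop empty rows.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    df = df.dropna(how="all")

    # Load into the destination SQL database (connection string from app settings).
    engine = create_engine(os.environ["SQL_CONNECTION_STRING"])
    df.to_sql("staged_client_data", engine, if_exists="append", index=False)
```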
Designing, Developing, and Implementing Standard CRE Data Architectures
During my time at Navigator, I was fortunate enough to be part of an initiative we call Cornerstone. Cornerstone moves away from custom development for each client's data needs by providing a standardized architecture tailored to the client's specific portfolio use case (multifamily, office, industrial, etc). This approach significantly brought down the turnaround time from inception to deployment by letting us 'plug and play' a client's data with standard dashboards. I designed and built the entire ingestion system for the Cornerstone offering, and it has since become the company's most popular product. The idea was to build a system that takes in client data that can look very different from one client to the next and conforms it to a standard, so that at the other end of my processes the structure of the data is identical across all clients. There is still nuance from client to client, mainly around custom account trees and finance structures that have to be handled case by case, but for the most part this approach has opened up new possibilities within the industry.
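To make the "conform to a standard" idea concrete, here is a small hypothetical sketch: each client gets a lightweight mapping from its export format to the standard schema, so downstream dashboards only ever see one structure. The clients and column names are invented for illustration.

```python
# Illustrative sketch of the Cornerstone conformance idea: per-client column
# mappings feed a single standard schema. Column names here are hypothetical.
import pandas as pd

STANDARD_COLUMNS = ["property_id", "account_code", "period", "amount"]

# Each client gets a small mapping from their export format to the standard.
CLIENT_MAPPINGS = {
    "client_a": {"PropID": "property_id", "GLAccount": "account_code",
                 "Month": "period", "Actual": "amount"},
    "client_b": {"property": "property_id", "acct": "account_code",
                 "fiscal_period": "period", "amt": "amount"},
}


def conform(df: pd.DataFrame, client: str) -> pd.DataFrame:
    """Rename a client's columns to the standard schema and keep only those columns."""
    renamed = df.rename(columns=CLIENT_MAPPINGS[client])
    return renamed[STANDARD_COLUMNS]
```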
Deployed Custom Apache Airflow Instance
During my time at Navigator, I regularly encountered datasets that pushed the limits of the existing Azure infrastructure. Free-tier Azure Functions have significant limits on runtime and memory usage, so once a dataset hits a certain size a normal Azure Function will time out when attempting to ingest it. To solve this, I designed and built a system I call the Hefty Data Loader. It gets around these restrictions by using a queue-trigger system for Azure Functions, and consists of an orchestrator and a queue-processing function. The orchestrator determines how to split up a particular dataset, the transformations that are needed, the destination, etc., and then splits the dataset into manageable chunks to process individually. The queue-processing function transforms the data based on the orchestrator's instructions and sends it to the required destination, one chunk at a time. Using this system, we can easily split a 5-million-row dataset into 10-thousand-row chunks and process each chunk without hitting the limits of Azure Functions. I could have opted to pay more for premium functions instead, but as a startup I designed a cost-effective solution that works around the issue.
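A rough sketch of the pattern, with placeholder queue and storage names (the real implementation lives behind Azure Functions bindings): the orchestrator enqueues one message per chunk, and a queue-triggered worker transforms and loads each chunk independently, so no single invocation hits the runtime or memory limits.

```python
# Sketch of the Hefty Data Loader pattern. Queue name, connection-string
# setting, and work-item fields are illustrative placeholders.
import json
import os

import pandas as pd
from azure.storage.queue import QueueClient

CHUNK_SIZE = 10_000  # e.g. split a 5M-row dataset into 10k-row chunks


def orchestrate(source_path: str, destination_table: str) -> None:
    """Decide how to split the dataset and enqueue one work item per chunk."""
    queue = QueueClient.from_connection_string(
        os.environ["STORAGE_CONNECTION_STRING"], "hefty-loader-queue"
    )
    # Stream the file in chunks; each chunk becomes one queue message.
    for i, chunk in enumerate(pd.read_csv(source_path, chunksize=CHUNK_SIZE)):
        queue.send_message(json.dumps({
            "source": source_path,
            "destination": destination_table,
            "skip_rows": i * CHUNK_SIZE,
            "num_rows": len(chunk),
        }))


def process_chunk(message: str) -> None:
    """Queue-triggered worker: transform and load a single chunk per the instructions."""
    work = json.loads(message)
    chunk = pd.read_csv(
        work["source"],
        skiprows=range(1, work["skip_rows"] + 1),  # keep the header row
        nrows=work["num_rows"],
    )
    # ...apply the required transformations and write the chunk to work["destination"]...
```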
Cognizant
AI&A Data Engineer (June 2021 - April 2022)
3M CNN
My first and shortest project: I was brought in to design an image-classification machine learning model to grade images of paint jobs done by 3M robots on a scale of 1 (bad) to 5 (good / passable for delivery). The client had problems with products being rejected due to poor paint quality from their automated robots, and the idea was to catch these defects during the painting process so as to reduce overhead at the tail end of their pipeline. I worked for 3 weeks on several models for different classification metrics (paint texture, color, and consistency). The project was put on hold before we really got off the ground due to budget constraints on the client's end.
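For context, the models were along the lines of the following minimal sketch of a 5-class image classifier; the real models and preprocessing were more involved, and the input size and layer sizes here are illustrative assumptions.

```python
# Minimal sketch of a 5-class image classifier for grading paint quality
# (labels 0-4 standing in for grades 1-5). Sizes are illustrative assumptions.
from tensorflow.keras import layers, models


def build_paint_quality_model(input_shape=(224, 224, 3), num_classes=5):
    model = models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),  # one score per grade
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```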
HBO Max Data Transformation Automation
This project was not my main project for HBO Max, but it is notable because I was able to produce a measurable impact very quickly. I was brought in over Christmas break to aid another team with their daily tasks while my whole team and much of the other team took time off to travel. HBO Max's pipeline for getting content onto the platform is much more disparate and complex than you would think, and this team's job was to manually aggregate data across 5 different systems, from Snowflake to Google BigQuery to Excel documents generated and sent over email, and a few others. Every week they would spend hours determining the stage and status of content in the pipeline by aggregating all of this data into Smartsheet (basically online Excel) for the client.
When I joined the team on a temporary basis, I quickly realized that a significant amount of their work could be automated with a few Python scripts and got to work developing them. At the end of January I was able to train the team on these scripts and deploy them to aid their workstream, saving the team an average of 10 hours a week that had previously been spent on manual ETL. Because of this work, I won the Cognizant Silver Award.
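The scripts themselves were internal, but the gist was along these lines: pull content status from Snowflake and BigQuery, merge it with the emailed Excel exports, and produce one consolidated status table (which was then pushed to Smartsheet). The queries, column names, and credentials below are hypothetical placeholders.

```python
# Hedged sketch of the aggregation the scripts automated. All identifiers,
# queries, and credentials are placeholders.
import pandas as pd
import snowflake.connector
from google.cloud import bigquery


def build_status_report(excel_path: str) -> pd.DataFrame:
    # Content status from Snowflake.
    sf_conn = snowflake.connector.connect(
        account="ACCOUNT", user="USER", password="PASSWORD", warehouse="WH"
    )
    sf_status = pd.read_sql("SELECT content_id, sf_status FROM content_status", sf_conn)

    # Delivery status from BigQuery (uses default application credentials).
    bq_status = bigquery.Client().query(
        "SELECT content_id, delivery_status FROM pipeline.deliveries"
    ).to_dataframe()

    # Metadata that arrived as an emailed Excel export.
    excel_meta = pd.read_excel(excel_path)

    # Merge everything on the shared content identifier.
    report = sf_status.merge(bq_status, on="content_id", how="outer")
    report = report.merge(excel_meta, on="content_id", how="left")
    return report
```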
HBO Max Dashboard Data Aggregation Project
My main project with HBO Max was building a system that aggregates data from across the entire content pipeline into a single data warehouse (Snowflake), along with a dashboard so that the client could track any piece of content on its journey through the pipeline. During my time on this project, I had 4 project managers (at the same time) and one other technical resource.
The project managers all had other projects under their wing, so I ended up doing almost all of the:
- Sprint Planning
- Client Interfacing
- Requirement Discovery
- Infrastructure Design
- Implementation
The other technical resource was not familiar with Python or data warehousing techniques and was there primarily to build the dashboard in PowerBI, so a large portion of the technical workload fell to me along with the project management. I met with the client's teams from around the globe to understand each system I needed to integrate and to gain access to those systems. The access portion proved to be the biggest headache of the project, as it would often take weeks for a request to make its way through all of the red tape.
The client already had an enterprise Apache Airflow instance that I was allowed to build my ETL processes on top of. I was able to successfully integrate around half of the needed systems into their Snowflake Warehouse before I took a job at Navigator.
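The DAGs followed the usual extract-transform-load shape; a stripped-down sketch is below, with placeholder task bodies and a made-up DAG ID standing in for the real pipelines.

```python
# Illustrative sketch of one ingestion DAG: extract from an upstream content
# system and load into Snowflake on a daily schedule. Task bodies, connection
# details, and names are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract_from_source(**context):
    """Pull the latest records from one of the upstream content systems."""
    ...


def load_to_snowflake(**context):
    """Write the transformed records into the Snowflake warehouse."""
    ...


with DAG(
    dag_id="content_pipeline_ingestion",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract", python_callable=extract_from_source)
    load = PythonOperator(task_id="load", python_callable=load_to_snowflake)
    extract >> load
```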
Cognizant Silver Award
I was awarded the Cognizant Silver Award in January 2022 for my work aiding another project when there was a lack of bandwidth, designing and implementing scripts to automate their ETL processes and measurably reducing that team's workload while improving turnaround for the client (see above for a detailed description).
Bastian Solutions - Mobile Robotics
Software Engineering Intern (June 2019 - August 2019)
The Mobile Robotics division of Bastian Solutions is an R&D facility in Dallas, Texas focused on specific use cases for mobile robots. For example, the primary focus during my time with the company was a robot called Ultra, whose goal was to autonomously load and unload 18-wheelers. Several working units had already shipped to clients in Japan when I first started, and 2 more with additional features were being worked on.
My primary responsibilities during this internship were:
- Aiding in testing / debugging the main Ultra robot to get it ready to ship to the customer
- Repairing, updating, and improving upon an old, smaller robot
- My main intern project: solving a design flaw with the facility's waterjet (used to cut out parts for the robots)
Testing / Debugging
Fault Detection Logging System
I developed a script that continuously cycles image and position data through a buffer and, when a fault occurs, dumps the relevant data into a formatted document and displays it on the Human Machine Interface (HMI). This allows a technician or engineer to quickly and easily determine what went wrong, especially if they were not watching the robot at the time of the fault.
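Conceptually, the system worked like the simplified sketch below; the real version ran on the robot and rendered its report on the HMI, and the buffer size and file format here are illustrative.

```python
# Simplified sketch of the fault-capture idea: keep a rolling buffer of recent
# image/position samples and dump it to a report the moment a fault fires.
import json
import time
from collections import deque

BUFFER = deque(maxlen=200)  # most recent samples only


def record_sample(image_path: str, position: tuple) -> None:
    """Called continuously in the main loop to keep the buffer current."""
    BUFFER.append({"time": time.time(), "image": image_path, "position": position})


def on_fault(fault_code: str) -> None:
    """Dump the buffered context to a formatted report for the technician."""
    report = {"fault": fault_code, "samples": list(BUFFER)}
    with open(f"fault_{int(time.time())}.json", "w") as f:
        json.dump(report, f, indent=2)
```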
Loading Boxes, Debugging Faults
I also did a lot of manual labor, moving boxes onto conveyor belts for the robot to load into an 18-wheeler during tests. This was certainly strenuous but super fun, because I was also learning a lot about the robot's operation and helping debug faults when they occurred.
Small Robot
There was a small robot that was one of the first robots built at the facility, used as a test bed and to try out ROS. It had not been booted or working in a long time and had mostly been stripped for parts. My task was to rebuild it so that it could be used as a testing platform again.
I added Lidar sensors, repaired the electronics that needed it, got it booted and working again, wiped the computer, and installed a newer version of Ubuntu along with a new version of ROS (Robot Operating System). During my free time I was able to learn and practice ROS in Python and C++ on this platform, and it was super fun to play around with.
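As a flavor of that practice work, here is a minimal rospy publisher of the kind I experimented with on the rebuilt robot; the topic name, node name, and message are made up.

```python
# Minimal rospy publisher used for practice; names here are illustrative.
import rospy
from std_msgs.msg import String


def talker():
    pub = rospy.Publisher("test_bed/status", String, queue_size=10)
    rospy.init_node("practice_node")
    rate = rospy.Rate(1)  # publish once per second
    while not rospy.is_shutdown():
        pub.publish("small robot test bed alive")
        rate.sleep()


if __name__ == "__main__":
    talker()
```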
Intern Project
I was the Software Engineering Intern, and there were also one EE and one ME intern. As a group, we were tasked with solving an issue with the facility's waterjet.
The waterjet was used to cut out robot parts and used a tank of sand as an abrasive material during the cutting process. The sand was essential: if the sand tank became empty during a cut, the waterjet would not detect this or stop, and would continue the job without actually cutting. In these instances the parts had to be completely scrapped and started over, wasting a significant amount of time and money (this problem occurred a lot). As delivered, the tank had no fill indicator or any way of determining how much sand remained.
We designed a simple yet elegant system using a few 3D-printed parts, an Arduino, some load sensors, and a solenoid to build a scale that sits underneath the sand tank and can be calibrated to its empty and full levels with a few buttons. An LCD screen displayed the percent fill and changed color to yellow under 30% fill and red under 15% fill, with each color change accompanied by a unique series of loud chimes to alert technicians. At 5%, the system actuated a solenoid mounted above an E-Stop, halting the waterjet so that the tank could be refilled and the job resumed from where it was paused without scrapping the part. I wrote all the code for this in C (since it was an Arduino). This project was super fulfilling because we were able to build and complete it and see a measurable impact from our work in solving a major issue the facility constantly ran into.
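The firmware itself was written in C for the Arduino; purely as an illustration of the calibration and threshold logic it implemented, here is a sketch in Python with made-up calibration readings.

```python
# Rough sketch (in Python, for readability) of the calibration and threshold
# logic the Arduino firmware implemented in C. The raw readings are invented;
# the thresholds follow the description above.
EMPTY_READING = 1200.0  # raw load-cell reading captured at "empty" calibration
FULL_READING = 8200.0   # raw load-cell reading captured at "full" calibration


def percent_fill(current_reading: float) -> float:
    """Convert a raw load-cell reading into a 0-100% fill level."""
    span = FULL_READING - EMPTY_READING
    return 100.0 * (current_reading - EMPTY_READING) / span


def indicator_state(current_reading: float) -> str:
    """Map the fill level onto the LCD / chime / E-Stop behaviour."""
    fill = percent_fill(current_reading)
    if fill < 5:
        return "actuate E-Stop solenoid"   # pause the waterjet before sand runs out
    if fill < 15:
        return "red LCD + critical chime"
    if fill < 30:
        return "yellow LCD + warning chime"
    return "normal"
```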
CitiGroup
Software Engineering Intern (July 2020 - August 2020)
My CitiGroup internship was fairly short, because this was just after COVID began and they were not prepared to do the program completely remotely. Each week we were put in a different group of interns and given a project to be completed by the end of the week. There were about 5 projects over 6 weeks.
The most notable project for me was the last one, in which I built a tool to generate 3D visualizations of options chains across all expiration dates for several metrics, including implied volatility, beta, theta, open interest, strike price, volume, etc. I then deployed it to an AWS EC2 instance with some basic HTML so that we could feed in a ticker and dynamically generate and render the charts. It was pretty barebones but very fun to build. I built the visualizations using Python and Plotly, with Flask to serve them up.
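The core of the tool looked roughly like the sketch below: pivot an options chain into a strike-by-expiration grid and render it as a 3D surface with Plotly. The DataFrame column names are assumptions about the data shape, and the Flask/EC2 serving layer is omitted.

```python
# Sketch of the visualization idea: an implied-volatility surface across
# strikes and expirations. Column names (strike, days_to_expiry, implied_vol)
# are assumed for illustration.
import pandas as pd
import plotly.graph_objects as go


def iv_surface(chain: pd.DataFrame) -> go.Figure:
    """Pivot an options chain into a strike x expiration grid and plot it in 3D."""
    grid = chain.pivot_table(index="days_to_expiry", columns="strike",
                             values="implied_vol")
    fig = go.Figure(data=go.Surface(x=grid.columns, y=grid.index, z=grid.values))
    fig.update_layout(scene=dict(
        xaxis_title="Strike",
        yaxis_title="Days to expiration",
        zaxis_title="Implied volatility",
    ))
    return fig
```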