969 days in the life of a Data Analyst

Before my Master’s, I worked as a Data Analyst at One Mount Group (a leading tech ecosystem with 3 digital products) from Oct 2021 until May 2024. I was recruited through the Talent Incubation Program (Fresh Geeks) and was mentored by Vu Hoang (now a PhD student in Information Systems, CMU). I started in the VinID analytics team (lifestyle & fintech app, 13+ million users) and, from mid-2022, also worked in the OneHousing team (proptech, 1-2 million in monthly traffic).

My employee card. Looks very cool, doesn't it?

My work at One Mount revolved around three pillars: Business Intelligence (analytics & reporting), Data Engineering, and Machine Learning. Some tasks were unconventional for an analyst, because I tend to gravitate towards more technical and experimental work. Below is a non-exhaustive list of the projects I contributed to, along with what I did and what I learnt. I have done my best to describe them without revealing sensitive information.

Machine Learning:

  • (2023) VinID Customer Income Prediction (xgboost): I leveraged our existing feature store and experimented with XGBoost to classify customers into 3 income ranges (multi-class classification). The project was unsuccessful due to the lack of meaningful predictors and a time mismatch between labels (collected in 2019) and features (no data from 2019, so we used data from 2020-2022 as a proxy).

  • (2023) VinID Voucher attributes (decision tree): I fitted a decision tree on vouchers with high/low redemption rates and interpreted the tree to identify the voucher attributes that most affect redemption.

  • (2022) OneHousing x VinID Lookalike customers (catboost): I used a CatBoost model (binary classification) to identify customers who are similar to existing homebuyers, using the feature store from VinID. I also engineered some new features that the model ranked as highly important. I learnt how to formalize business questions into data science problems, to diagnose a model’s performance, and to automate steps in the machine learning pipeline to make experimentation easier. This was also my first exposure to the imbalanced learning problem.

  • (2022) VinID Notification Interaction Prediction (catboost): I used a CatBoost model to identify customers who are likely to interact with a notification. I also engineered some new features. I learnt how to quickly experiment with different model configurations and feature combinations.

  • (2021) VinID Winmart holiday sales prediction (Prophet): I attempted to predict 2022 Tet holiday item-level sales using Facebook’s Prophet library. The model was unsuccessful due to the lack of representative data, as the 2021 data was heavily skewed by the COVID-19 pandemic. This was my first exposure to predictive analytics and time series problems.
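For readers unfamiliar with the imbalanced learning problem mentioned in the lookalike project above, here is a minimal sketch of one common first response: weighting each class inversely to its frequency. The labels are made up for illustration and the function name `balanced_class_weights` is hypothetical; this is not the code or necessarily the technique used in the project.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get larger weights, so misclassifying them costs more.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical labels: 1 = existing homebuyer (rare), 0 = everyone else.
labels = [0] * 95 + [1] * 5
weights = balanced_class_weights(labels)

for cls in sorted(weights):
    print(f"class {cls}: weight {weights[cls]:.3f}")
```

Weights like these are typically passed to the model's class-weight setting, so the rare positive class is not drowned out by the majority class during training.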

Analytics Projects:

  • (2024) OneHousing Customer Journey Analysis: We analyzed the common paths (each step is a feature on the site) that customers took after entering our website. We learnt that there was no clear common path, due to a lack of internal links between pages.

  • (2023) OneHousing Non-Listing Content Problem: We explored the behavior of organic users (i.e., those who found our website via Google) and attempted to find patterns that would identify high-likelihood house buyers. I led the initiative along with two other analysts, proposed ideas to track the behavior, proposed a metric that corresponded with high retention, and did the early exploratory analysis. I also informed the data tracking template and data warehouse design for this problem.

  • (2022) OneHousing x VinID Growth Project: We linked customer attributes (demographic, socio-economic, spatial data, etc.) to real estate purchasing behavior to identify key customer segments. My team and I provided early insights on customer profiles to inform the acquisition strategy. I also proposed data collection and experimentation methods, and built a dashboard to monitor key project metrics.
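To make the idea of "common paths" in the customer journey analysis above more concrete, here is a minimal sketch that counts the opening steps of each session. The session data, page names, and the `top_paths` helper are all hypothetical, and this is an illustration of the general approach rather than the analysis we actually ran.

```python
from collections import Counter


def top_paths(sessions, length=3, n=3):
    """Return the n most common opening paths of `length` steps.

    Sessions shorter than `length` steps are skipped.
    """
    prefixes = Counter(
        tuple(steps[:length]) for steps in sessions if len(steps) >= length
    )
    return prefixes.most_common(n)


# Hypothetical sessions: each is the ordered list of site features visited.
sessions = [
    ["home", "search", "listing", "contact"],
    ["home", "search", "listing"],
    ["article", "home", "search"],
    ["home", "search", "listing", "mortgage_calculator"],
    ["article"],
]
paths = top_paths(sessions)

for path, count in paths:
    print(" > ".join(path), count)
```

If the most frequent path covers only a small share of sessions, that is a hint that there is no dominant journey, which is the kind of finding described above.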

Data Engineering:

  • (2023) OneHousing Alert Engine (Python, SQL, dbt, Airflow): I designed and developed a system that automatically detects mismatched records between two data sources and sends alerts to the Operations and Sales teams. This significantly reduces the time spent on data reconciliation. I learnt to think in systems.

  • (2023) OneHousing CEO Daily Update Bot (Python, SQL, dbt, Airflow): I built a script that sends daily Slack updates to the CEO about real estate deals in OneHousing. I learnt how to work with the Slack and Tableau APIs, as well as PyODBC.

  • (2023) VinID Voucher, Notification, Ticketing Datamart (dimensional data warehouse design): I designed and built dimensional datamarts for various VinID business functions, covering vouchers, app notifications, and ticketing. I learnt a lot about dimensional modelling and data warehouse design in the process.

  • (2022) VinID Data Platform Migration (BigQuery -> Dremio): We changed our data platform and query engine from BigQuery to Dremio. I rewrote and optimized SQL queries and data pipelines to fit the new platform. I learnt ELT best practices, and most of my subsequent pipelines adhered to the dbt style guide.

  • Data pipelines for all dashboards/reports/models (SQL, dbt, Airflow): I built data processing pipelines (partially or entirely) for all projects that I was involved in. I learnt how to write readable code and manage my code with Git.
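The core check behind an alert engine like the one described above can be sketched in a few lines. This is a simplified, pure-Python illustration under stated assumptions: the real system was built with SQL, dbt, and Airflow, and the record layout, field names, and the `find_mismatches` helper here are all hypothetical.

```python
def find_mismatches(source_a, source_b, fields):
    """Compare two {record_id: record} mappings.

    Reports IDs present in only one source, and fields whose values differ.
    """
    issues = []
    for key in sorted(source_a.keys() | source_b.keys()):
        a, b = source_a.get(key), source_b.get(key)
        if a is None or b is None:
            # Third element names the source the record is missing from.
            issues.append((key, "missing", "source_a" if a is None else "source_b"))
            continue
        for field in fields:
            if a.get(field) != b.get(field):
                issues.append((key, field, (a.get(field), b.get(field))))
    return issues


# Hypothetical deal records from two systems.
sales_system = {
    "D1": {"status": "signed", "price": 100},
    "D2": {"status": "pending", "price": 200},
}
ops_system = {
    "D1": {"status": "signed", "price": 100},
    "D2": {"status": "signed", "price": 200},
    "D3": {"status": "pending", "price": 300},
}
issues = find_mismatches(sales_system, ops_system, fields=["status", "price"])

for issue in issues:
    print(issue)
```

In a warehouse setting the same comparison would more likely be expressed as a SQL full outer join on the record ID, with a scheduler running it periodically and forwarding any non-empty result as an alert.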

Dashboards:

  • (2024) OneHousing Marketing Dashboard (Power BI): We built an executive dashboard for high-level metrics (acquired users, MAU, lead funnel, etc.) of the OneHousing website, with detailed analytical views for specific marketing functions. I learnt how to work with Power BI.

  • (2023) OneHousing Online-to-Offline Dashboard (Tableau): I built an operational dashboard to monitor detailed lead generation and conversion activities by salespeople.

  • (2022) VinID Ticketing Dashboard (Superset): I built an operational dashboard about on-app ticket sales (concerts, football matches, recreational parks, etc.) and the conversion funnel.

  • (2021) VinID OneView (web-based & Looker Studio): We built a centralized business intelligence platform containing company-wide key metrics for C-level executives (MTU, MAU, etc.). I built 2 high-level dashboards in 2021: Merchant (about voucher metrics such as claims/redeems) and Product (about north-star app metrics). In 2022, I took over as sole maintainer of all related data pipelines (200+ tables). I learnt how to debug and track down data errors in complex pipelines.



