My work experience
Morgan Stanley DAL Project
At Morgan Stanley, our Data Access Layer (DAL) relies heavily on a bitemporal time-series datastore. This is essential because the firm's risk systems must be able to reproduce any past calculation exactly as it was known at that moment in time. This level of historical accuracy is critical for audit, regulatory reporting, and replaying past risk numbers.
Traditional time-series databases track only one timeline: when the data was written. But in finance, this is not enough. We need to track two independent timelines:
- Valid Time (Business Time): When the data logically applies. Example: “This delta value is valid for 2 November.”
- Transaction Time (System Time): When the system actually stored or learned the data. Example: “We wrote this data on 10 November.”
These two timelines can differ, and both must be stored. For example, on 2 November we may calculate a delta value and store it (valid time = 2 Nov, transaction time = 2 Nov). If on 10 November we discover the 2 November calculation was wrong and reinsert the corrected number, the valid time remains 2 Nov but the transaction time becomes 10 Nov. The system must keep both versions—what we originally believed, and the corrected value. A single-temporal datastore would overwrite the old value, losing the historical truth.
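To make the two timelines concrete, here is a minimal, self-contained sketch (illustrative only, not the DAL's actual schema) of a bitemporal record and an "as of" lookup:

```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Illustrative only: a bitemporal fact keyed by both timelines.
record DeltaPoint(LocalDate validDate,       // business time: the date the delta applies to
                  LocalDate transactionDate, // system time: when we stored/learned it
                  double delta) {}

class BitemporalLookup {
    // "What did we believe on asOfSystemDate about the delta for businessDate?"
    static Optional<DeltaPoint> asOf(List<DeltaPoint> history,
                                     LocalDate businessDate,
                                     LocalDate asOfSystemDate) {
        return history.stream()
                .filter(p -> p.validDate().equals(businessDate))
                .filter(p -> !p.transactionDate().isAfter(asOfSystemDate))
                .max(Comparator.comparing(DeltaPoint::transactionDate)); // latest version known at that time
    }

    public static void main(String[] args) {
        List<DeltaPoint> history = List.of(
                new DeltaPoint(LocalDate.of(2024, 11, 2), LocalDate.of(2024, 11, 2), 0.42),   // original write
                new DeltaPoint(LocalDate.of(2024, 11, 2), LocalDate.of(2024, 11, 10), 0.45)); // correction

        // Replaying as of 5 Nov returns the original (later found to be wrong) value;
        // replaying as of 12 Nov returns the correction. Both versions are kept.
        System.out.println(asOf(history, LocalDate.of(2024, 11, 2), LocalDate.of(2024, 11, 5)));
        System.out.println(asOf(history, LocalDate.of(2024, 11, 2), LocalDate.of(2024, 11, 12)));
    }
}
```

Both rows for 2 November survive; which one you see depends only on the system date you replay from.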
My project involved introducing a new single-temporal time-series datastore to complement our existing bitemporal datastore. This required extending multiple layers of the DAL pipeline. The first task was implementing new serialization and deserialization logic using Protobuf so that the broker could distinguish a new putTimeSeries command from a traditional put command.
We introduced a custom @timestamp annotation and required all time-series entities to extend a TimeSeries trait. This allowed the serialization layer to correctly identify time-series writes and package them into the new Protobuf command format.
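The snippet below is only a rough sketch of that marker pattern; the real entities extend a TimeSeries trait, so the Java interface and annotation here are stand-ins to show the idea, and all names are illustrative:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.time.Instant;

// Stand-in marker: real DAL entities extend a TimeSeries trait rather than a Java interface.
interface TimeSeries {}

// Stand-in for the custom @timestamp annotation, marking the field that carries the
// point's business timestamp so the serializer knows where to find it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Timestamp {}

class DeltaSeriesPoint implements TimeSeries {
    @Timestamp
    Instant ts;        // the valid-time timestamp for this point
    double delta;      // the measured value
}

class SerializationLayer {
    // Entities marked as TimeSeries are routed to the putTimeSeries path;
    // everything else falls back to the traditional put command.
    static String commandFor(Object entity) {
        return (entity instanceof TimeSeries) ? "putTimeSeries" : "put";
    }
}
```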
On the broker side, we added new routing logic to map these commands to dedicated execution actions. These actions interact with our custom time-series executor, which connects to the SingleStore database and performs the write operations. This end-to-end integration enabled the DAL to support a new type of time-series storage alongside our existing bitemporal engine.
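A simplified sketch of the broker-side routing, with hypothetical command and executor names:

```java
import java.util.function.Consumer;

// Hypothetical command payloads; the real broker works on the Protobuf messages described above.
record PutCommand(byte[] payload) {}
record PutTimeSeriesCommand(byte[] payload) {}

class BrokerRouter {
    private final Consumer<PutCommand> bitemporalExecutor;           // existing bitemporal write path
    private final Consumer<PutTimeSeriesCommand> timeSeriesExecutor; // new SingleStore-backed path

    BrokerRouter(Consumer<PutCommand> bitemporalExecutor,
                 Consumer<PutTimeSeriesCommand> timeSeriesExecutor) {
        this.bitemporalExecutor = bitemporalExecutor;
        this.timeSeriesExecutor = timeSeriesExecutor;
    }

    // Map each decoded command to its dedicated execution action.
    void route(Object command) {
        if (command instanceof PutTimeSeriesCommand ts) {
            timeSeriesExecutor.accept(ts);   // write into the single-temporal time-series store
        } else if (command instanceof PutCommand put) {
            bitemporalExecutor.accept(put);  // existing bitemporal path
        } else {
            throw new IllegalArgumentException("Unknown command: " + command);
        }
    }
}
```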
Binance Internship
During my internship at Binance, I primarily worked on a knowledge graph system designed to monitor fraudulent transactions and identify relationships between user accounts through vertices and edges. The system stores each edge's from and to vertices in a SQL database and queries the graph dynamically based on user activity. For simple two-hop queries, SQL was sufficient, but for deeper graph traversals, we leveraged a specialized graph database called Nebula.
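For illustration, a two-hop traversal over a hypothetical account_edges(from_account, to_account) table reduces to a single self-join, which is why plain SQL was enough for shallow queries:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Table and column names are illustrative, not the production schema.
class TwoHopQuery {
    // Accounts reachable from a seed account via exactly one intermediate account.
    static List<String> twoHopNeighbours(Connection conn, String seedAccount) throws SQLException {
        String sql = """
                SELECT DISTINCT e2.to_account
                FROM account_edges e1
                JOIN account_edges e2 ON e1.to_account = e2.from_account
                WHERE e1.from_account = ?
                """;
        List<String> result = new ArrayList<>();
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, seedAccount);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    result.add(rs.getString(1));
                }
            }
        }
        return result;
    }
}
```

Each additional hop adds another self-join, which is where a dedicated graph database becomes the better tool.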
My main responsibility was to integrate AI-driven analysis into the graph to assist risk analysts. The AI module was tasked with identifying isolated or abnormal nodes and providing a high-level overview of the knowledge graph. To support this, I implemented an asynchronous task-handling system using a PriorityBlockingQueue. Tasks were queued and continuously polled by a dispatcher thread, which wrapped them in runnable jobs and submitted them to a thread pool for execution. Once a task completed, the results were persisted and made available to the frontend, which retrieved them through periodic polling.
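A condensed sketch of that dispatch pattern (class and method names are illustrative):

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.PriorityBlockingQueue;

// Illustrative task: lower priority value = more urgent.
class AnalysisTask implements Comparable<AnalysisTask> {
    final int priority;
    final String graphId;    // which graph snapshot to analyse

    AnalysisTask(int priority, String graphId) {
        this.priority = priority;
        this.graphId = graphId;
    }

    @Override
    public int compareTo(AnalysisTask other) {
        return Integer.compare(this.priority, other.priority);
    }
}

class AiTaskDispatcher {
    private final PriorityBlockingQueue<AnalysisTask> queue = new PriorityBlockingQueue<>();
    private final ExecutorService workers = Executors.newFixedThreadPool(4);

    void submit(AnalysisTask task) {
        queue.put(task);
    }

    // A single dispatcher thread polls the queue and hands tasks to the worker pool;
    // results are then persisted for the frontend to pick up by polling.
    void start() {
        Thread poller = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    AnalysisTask task = queue.take();          // blocks until a task is available
                    workers.submit(() -> runAnalysis(task));   // execute on the thread pool
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        poller.setDaemon(true);
        poller.start();
    }

    private void runAnalysis(AnalysisTask task) {
        // ... call the AI module, then persist the result for the frontend ...
    }
}
```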
In addition to the AI integration, I also contributed to the implementation of authorization for graph snapshots. Each graph snapshot could be saved and accessed based on user roles — such as viewer, editor, or admin — defined in a SQL table that mapped users to their respective permissions. This ensured that users could only create, edit, or view snapshots according to their assigned roles.
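Conceptually, the check boils down to a mapping from role to permitted actions. The sketch below uses illustrative enum and class names; in the real system the mapping lives in the SQL table described above:

```java
import java.util.Map;
import java.util.Set;

enum SnapshotRole { VIEWER, EDITOR, ADMIN }
enum SnapshotAction { VIEW, EDIT, CREATE, DELETE }

class SnapshotAuthorizer {
    // Which actions each role may perform on a snapshot.
    private static final Map<SnapshotRole, Set<SnapshotAction>> ALLOWED = Map.of(
            SnapshotRole.VIEWER, Set.of(SnapshotAction.VIEW),
            SnapshotRole.EDITOR, Set.of(SnapshotAction.VIEW, SnapshotAction.EDIT, SnapshotAction.CREATE),
            SnapshotRole.ADMIN,  Set.of(SnapshotAction.values()));

    static boolean isAllowed(SnapshotRole role, SnapshotAction action) {
        return ALLOWED.getOrDefault(role, Set.of()).contains(action);
    }
}
```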
Another feature I developed was manual node creation within the graph, allowing users to add new nodes directly through the frontend. The system verified whether a node already existed in the database before insertion. I also implemented CSV parsing for bulk import of nodes and edges, which included validation to ensure that all referenced nodes existed in the database before being added to a snapshot.
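A simplified version of the bulk-import validation, assuming a hypothetical "fromNodeId,toNodeId" row format:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Edges are only accepted into the snapshot when both endpoints already exist in the
// database, mirroring the validation described above. Names are illustrative.
class EdgeCsvImporter {
    record Edge(String fromId, String toId) {}

    static List<Edge> parseAndValidate(List<String> csvLines, Set<String> existingNodeIds) {
        List<Edge> accepted = new ArrayList<>();
        for (String line : csvLines) {
            String[] cols = line.split(",");
            if (cols.length != 2) {
                continue; // malformed row, skipped (the real system reports it back to the user)
            }
            String from = cols[0].trim();
            String to = cols[1].trim();
            if (existingNodeIds.contains(from) && existingNodeIds.contains(to)) {
                accepted.add(new Edge(from, to));   // safe to add to the snapshot
            }
        }
        return accepted;
    }
}
```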
Morgan Stanley Internship
During my internship at Morgan Stanley, I contributed to the modernization of an existing risk control dashboard system. Previously, the team used a QlikView dashboard that relied on Autosys jobs to pull data from a kdb+ database, all hosted on on-premise servers. My task was to help migrate this setup to a cloud-based environment using Snowflake and Power BI.
I developed a new Power BI dashboard to visualize trade rejection and risk control data, implementing row-level entitlements to restrict data visibility based on user attributes such as region or country. This ensured that users could only access data relevant to their specific roles or markets.
As part of the migration effort, I rewrote the legacy kdb+ scripts in SQL for Snowflake, allowing Power BI to connect directly to the Snowflake data warehouse. I also created Autosys jobs to call stored procedures on a daily schedule, automatically loading the latest trade rejection data into Snowflake so the dashboards always reflected current data.
This migration project significantly streamlined the data workflow by moving analytics from on-premise systems to the cloud, improving maintainability, scalability, and accessibility for the global risk management team.
Illumina Internship
During my internship at Illumina, I worked on the indexing and search system that powers metadata retrieval for laboratory workflows. The system relies on Elasticsearch and RabbitMQ to track changes across various entities such as samples, projects, steps, files, and containers.
The search indexer listens to multiple RabbitMQ queues: sampleIndexingQueue, stepIndexingQueue, projectIndexingQueue, fileIndexingQueue, and containerIndexingQueue. Each message includes an entityType, which determines which index service (e.g., analyteIndexService, projectIndexService) should process the update. A separate indexingQueue is used for operations like index creation and mapping updates.
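The sketch below shows roughly how such a consumer can fan messages out by entityType, assuming a Spring AMQP listener; only the queue names come from the description above, everything else is illustrative:

```java
import java.util.Map;
import org.springframework.amqp.rabbit.annotation.RabbitListener;
import org.springframework.stereotype.Component;

// Minimal supporting types for the sketch.
record IndexingMessage(String entityType, String entityId) {}

interface IndexService {
    void processUpdate(String entityId);
}

@Component
class IndexingMessageListener {

    // e.g. "ANALYTE" -> analyteIndexService, "PROJECT" -> projectIndexService
    private final Map<String, IndexService> indexServices;

    IndexingMessageListener(Map<String, IndexService> indexServices) {
        this.indexServices = indexServices;
    }

    @RabbitListener(queues = {"sampleIndexingQueue", "stepIndexingQueue",
                              "projectIndexingQueue", "fileIndexingQueue",
                              "containerIndexingQueue"})
    void onIndexingMessage(IndexingMessage message) {
        // Route the update to the index service registered for this entity type.
        IndexService service = indexServices.get(message.entityType());
        if (service != null) {
            service.processUpdate(message.entityId());
        }
    }
}
```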
Each index service handles document updates in Elasticsearch. Bulk operations are used for initial indexing or large refreshes, while updateByQuery is used for modifying existing documents. Update behavior depends on field type: non-list fields are overwritten, while list fields require selective add/remove handling. One technical challenge we faced was a mapping explosion caused by deeply nested structures and custom UDFs, which required careful restructuring of the index mappings; we resolved this using Elasticsearch's flattened field type.
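As an illustration of the updateByQuery path, here is a hedged sketch using the Elasticsearch High Level REST Client; the index, field, and script names are invented for the example:

```java
import java.util.Map;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.reindex.BulkByScrollResponse;
import org.elasticsearch.index.reindex.UpdateByQueryRequest;
import org.elasticsearch.script.Script;
import org.elasticsearch.script.ScriptType;

class ProjectIndexUpdater {

    // Rename a project on every sample document that references it; a non-list field like
    // this is simply overwritten by the painless script.
    static long renameProject(RestHighLevelClient client, String projectId, String newName) throws Exception {
        UpdateByQueryRequest request = new UpdateByQueryRequest("samples");
        request.setQuery(QueryBuilders.termQuery("projectId", projectId));
        request.setScript(new Script(ScriptType.INLINE, "painless",
                "ctx._source.projectName = params.name", Map.of("name", newName)));
        request.setConflicts("proceed"); // don't abort the whole batch on a single version conflict

        BulkByScrollResponse response = client.updateByQuery(request, RequestOptions.DEFAULT);
        return response.getUpdated();
    }
}
```

A flattened field indexes an entire JSON object as a single field with keyword-like leaf values, which is what stopped the per-UDF sub-fields from multiplying the mapping.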
To support fast and flexible search, I helped enhance a custom search engine built on Elasticsearch rather than SQL. Elasticsearch allowed much faster queries over nested data through inverted indexes and boolean queries combining term, wildcard, and range clauses.
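For example, a combined boolean query might look like this (field names illustrative):

```java
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;

class SampleSearchQueries {
    // Active samples whose name matches a pattern and that were created after a given date.
    static BoolQueryBuilder activeSamplesMatching(String namePattern, String sinceDate) {
        return QueryBuilders.boolQuery()
                .must(QueryBuilders.termQuery("status", "ACTIVE"))             // exact match on a keyword field
                .must(QueryBuilders.wildcardQuery("sampleName", namePattern))  // e.g. "LIB-2023-*"
                .filter(QueryBuilders.rangeQuery("createdDate").gte(sinceDate)); // date lower bound
    }
}
```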
Concurrency was another major area of focus. Elasticsearch uses optimistic locking based on version numbers, which can lead to version conflicts during bulk indexing or concurrent updates. To handle this, we used a retry queue for failed updateByQuery operations, incrementing the retry count on each failure and discarding tasks after five attempts. During retries, a pessimistic locking mechanism was applied: the affected document IDs were pre-queried so that only one process updated a given set of documents at a time.
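A stripped-down sketch of that retry bookkeeping, with illustrative names:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Each failed updateByQuery batch carries its retry count and the pre-queried document ids
// it "owns" while retrying, so no other writer touches those documents in the meantime.
class RetryingUpdateTask {
    final String indexName;
    final Set<String> lockedDocumentIds;
    int attempts;

    RetryingUpdateTask(String indexName, Set<String> lockedDocumentIds) {
        this.indexName = indexName;
        this.lockedDocumentIds = lockedDocumentIds;
    }
}

class UpdateRetryQueue {
    private static final int MAX_ATTEMPTS = 5;
    private final LinkedBlockingQueue<RetryingUpdateTask> retryQueue = new LinkedBlockingQueue<>();
    private final Set<String> documentsUnderUpdate = ConcurrentHashMap.newKeySet();

    // Called when an updateByQuery batch fails with a version conflict.
    void scheduleRetry(RetryingUpdateTask task) {
        task.attempts++;
        if (task.attempts > MAX_ATTEMPTS) {
            documentsUnderUpdate.removeAll(task.lockedDocumentIds); // give up and release the reservation
            return;
        }
        documentsUnderUpdate.addAll(task.lockedDocumentIds);        // pessimistic: reserve the ids
        retryQueue.offer(task);
    }

    // Other writers check this before touching the same documents.
    boolean isLocked(String documentId) {
        return documentsUnderUpdate.contains(documentId);
    }
}
```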
J&T Internship
During my internship at J&T Express, I worked on several internal systems that supported part-time workforce operations and logistics configuration. One of my main responsibilities was enhancing the Part-Time Management System, which handled part-timer onboarding, supplier binding, job allocation, and check-in / check-out tracking.
I also developed a Country Configuration module that managed the list of countries, provinces, and cities that J&T delivers to. For this feature, I built the backend endpoints using Spring Boot, handling full CRUD operations and ensuring clean data relationships without relying on heavy SQL joins.
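The endpoints followed a standard Spring Boot CRUD shape; the controller below is a simplified sketch with hypothetical route, DTO, and service names:

```java
import java.util.List;
import org.springframework.web.bind.annotation.DeleteMapping;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.PutMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// One row per country, province, or city; parentId links a city to its province, etc.
record RegionDto(Long id, String type, String name, Long parentId, Integer orderId) {}

interface RegionService {
    List<RegionDto> findAll();
    RegionDto create(RegionDto region);
    RegionDto update(long id, RegionDto region);
    void delete(long id);
}

@RestController
@RequestMapping("/api/regions")
class RegionController {

    private final RegionService regionService;

    RegionController(RegionService regionService) {
        this.regionService = regionService;
    }

    @GetMapping
    List<RegionDto> listAll() {
        return regionService.findAll();
    }

    @PostMapping
    RegionDto create(@RequestBody RegionDto region) {
        return regionService.create(region);
    }

    @PutMapping("/{id}")
    RegionDto update(@PathVariable long id, @RequestBody RegionDto region) {
        return regionService.update(id, region);
    }

    @DeleteMapping("/{id}")
    void delete(@PathVariable long id) {
        regionService.delete(id);
    }
}
```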
One of the major technical challenges I solved involved concurrency on the Order Management page. Each country, province, and city stored an orderId, which determined its display sequence. The orderId is always assigned as the previous maximum value plus one. However, because multiple users could insert data at the same time, this occasionally produced duplicate orderId values, even though each orderId must be unique.
To prevent this race condition, I implemented a distributed lock using Redis. When a user begins an insert operation, the system writes a lock key into Redis. Other users who attempt to insert data detect the lock and briefly sleep until it is released. Once freed, they acquire the lock and safely compute the next orderId before inserting their record. This ensures that all services in the cluster coordinate consistently and no duplicate ordering values occur.
Our schema stores all countries, provinces, and cities in a single SQL table, because this dataset is relatively small and stable. Keeping everything in one table avoids expensive joins and gives the system faster read performance. Since orderId is not a primary key, the database does not enforce its uniqueness for us, hence the need for manual locking.
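Here is a minimal sketch of the locking flow using the Jedis client; the key name, timeout, and max-orderId lookup are illustrative:

```java
import java.util.function.IntConsumer;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;

class OrderIdAllocator {

    private static final String LOCK_KEY = "lock:region:orderId"; // illustrative key name

    int insertWithNextOrderId(Jedis jedis, IntConsumer insertRow) throws InterruptedException {
        // Spin until we manage to create the lock key; NX = only set if absent,
        // EX = expire after 10s so a crashed writer cannot block everyone forever.
        while (!"OK".equals(jedis.set(LOCK_KEY, "locked", SetParams.setParams().nx().ex(10)))) {
            Thread.sleep(50); // another writer holds the lock; back off briefly and retry
        }
        try {
            int nextOrderId = queryMaxOrderId() + 1; // previous maximum plus one, now race-free
            insertRow.accept(nextOrderId);           // perform the actual INSERT with this orderId
            return nextOrderId;
        } finally {
            jedis.del(LOCK_KEY);                     // release the lock for the next writer
        }
    }

    private int queryMaxOrderId() {
        // ... SELECT MAX(order_id) FROM region ...
        return 0;
    }
}
```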
Oceanus Internship
During my internship at Oceanus, I developed two proof-of-concept solutions for the finance and customer service departments. The first project focused on streamlining financial workflows through CSV import and export functionality. The finance team frequently handled transaction records manually, so I built a system that allowed them to easily upload or download CSV files connected to the Prisma-managed MySQL database.
To support this, I used the papaparse library to convert CSV data into JSON during imports and JSON back into CSV for exports. For imports, the system parsed the CSV file, validated the transformed JSON records, and inserted them into the MySQL database. For exports, the application retrieved the relevant records, converted them to CSV format, and generated a downloadable file for the finance team. This significantly reduced manual data entry and improved the accuracy of transaction tracking.
The second project involved building a live chat application for the customer service department. I implemented real-time messaging using Pusher WebSockets and stored all chat messages in a MySQL database for auditability and future retrieval. The application also supported web push notifications, allowing agents to receive alerts whenever new messages arrived. Together, these features provided a smoother and more responsive communication channel between customers and support staff.