← Back Home
INDs Discovery Using Akka
INDs Finder Using Akka is a distributed data profiling system
designed to compute Inclusion Dependencies (INDs) across large scale datasets
using the Akka actor framework. The project focuses on scalable
distributed processing, asynchronous task scheduling,
and efficient dependency discovery for high volume attribute sets.
Responsibilities
Designed and implemented a distributed system
for computing Inclusion Dependencies (INDs)
across multiple datasets using Akka.
Developed actor based workflows using the Akka Actor Model
to enable concurrent and asynchronous task execution. Tasks shared to Worker actors once
they join the Master actor based on pull propagation.
Optimized scalability and performance through
partitioning strategies - Consistent Hash Ring, Naive approaches like sharing columns failed
with OOM errors.
Worked on improving execution efficiency,
concurrency handling, and distributed coordination.
Technologies & Domains
Akka
Actor Model
Distributed Systems
Concurrency
Asynchronous Processing
IND Detection
Data Profiling
Scalable Computing
Parallel Processing
Master Slave Architecture