← Back Home

INDs Discovery Using Akka

Github Repository Unavailable Project Report

INDs Finder Using Akka is a distributed data profiling system designed to compute Inclusion Dependencies (INDs) across large scale datasets using the Akka actor framework. The project focuses on scalable distributed processing, asynchronous task scheduling, and efficient dependency discovery for high volume attribute sets.

Responsibilities

Designed and implemented a distributed system for computing Inclusion Dependencies (INDs) across multiple datasets using Akka.

Developed actor based workflows using the Akka Actor Model to enable concurrent and asynchronous task execution. Tasks shared to Worker actors once they join the Master actor based on pull propagation.

Optimized scalability and performance through partitioning strategies - Consistent Hash Ring, Naive approaches like sharing columns failed with OOM errors.

Worked on improving execution efficiency, concurrency handling, and distributed coordination.

Technologies & Domains

Akka Actor Model Distributed Systems Concurrency Asynchronous Processing IND Detection Data Profiling Scalable Computing Parallel Processing Master Slave Architecture