We are looking for a hands-on Data Architect to design, build, and optimize a cutting-edge data lakehouse solution from the ground up. This role requires deep technical expertise in big data architectures, data modeling, ETL/ELT pipelines, cloud and on-prem solutions, and real-time analytics.
As part of a fast-paced startup, you'll be directly involved in coding, implementing, and scaling the platform: not just designing, but building alongside the team. You'll take ownership of data strategies, governance, and architecture to enable high-performance analytics, AI, and business intelligence.
Requirements:
Key Responsibilities
Hands-On Architecture Development:
Build and deploy a scalable, open-source data lakehouse integrating structured, semi-structured, and unstructured data.
Design real-time and batch data processing pipelines using open-source frameworks (Apache Spark, Flink, Trino, Iceberg, Delta Lake, etc.).
Develop cost-effective, high-performance data storage strategies (columnar formats: Parquet, ORC).
Implement best practices for data security, governance, access control, and compliance (GDPR, CCPA, etc.).
Ensure seamless data integration across cloud and on-prem environments.
Data Engineering & ETL Pipelines:
Develop high-performance ETL/ELT pipelines to ingest data from diverse sources (APIs, databases, IoT, logs).
Optimize query performance using indexing, caching, materialized views, and distributed computing.
This position is open to all candidates.