Platform Engineer

Job ID:  14652
Date:  12 Jun 2025
Location:  Dholera, GJ, IN
Department:  Fab AI.Digital
Business:  Semifab

About The Business - 


Tata Electronics Private Limited (TEPL) is a greenfield venture of the Tata Group with expertise in manufacturing precision components.

Tata Electronics (a wholly owned subsidiary of Tata Sons Pvt. Ltd.) is building India’s first AI-enabled, state-of-the-art semiconductor foundry. This facility will produce chips for applications such as power management ICs, display drivers, microcontrollers (MCUs), and high-performance computing logic, addressing growing demand in markets such as automotive, computing and data storage, wireless communications, and artificial intelligence.

Tata Electronics is a subsidiary of the Tata Group. The Tata Group operates in more than 100 countries across six continents, with the mission ‘To improve the quality of life of the communities we serve globally, through long-term stakeholder value creation based on leadership with Trust.’

 

Job Responsibilities -


•    Architect and implement a scalable, offline Data Lake for structured, semi-structured, and unstructured data in an on-premises, air-gapped environment. 
•    Collaborate with Data Engineers, Factory IT, and Edge Device teams to enable seamless data ingestion and retrieval across the platform. 
•    Integrate with upstream systems like MES, SCADA, and process tools to capture high-frequency manufacturing data efficiently. 
•    Monitor and maintain system health, including compute resources, storage arrays, disk I/O, memory usage, and network throughput. 
•    Optimize Data Lake performance via partitioning, deduplication, compression (Parquet/ORC), and implementing effective indexing strategies. 
•    Select, integrate, and maintain tools like Apache Hadoop, Spark, Hive, HBase, and custom ETL pipelines suitable for offline deployment. 
•    Build custom ETL workflows for bulk and incremental data ingestion using Python, Spark, and shell scripting. 
•    Implement data governance policies covering access control, retention periods, and archival procedures with security and compliance in mind. 
•    Establish and test backup, failover, and disaster recovery protocols specifically designed for offline environments. 
•    Document architecture designs, optimization routines, job schedules, and standard operating procedures (SOPs) for platform maintenance. 
•    Conduct root cause analysis for hardware failures, system outages, or data integrity issues. 
•    Drive system scalability planning for multi-fab or multi-site future expansions.


Essential Attributes (Tech-Stacks) -


•    Hands-on experience designing and maintaining offline or air-gapped Data Lake environments.
•    Deep understanding of Hadoop ecosystem tools: HDFS, Hive, MapReduce, HBase, YARN, ZooKeeper, and Spark.
•    Expertise in custom ETL design and large-scale batch and stream data ingestion.
•    Strong scripting and automation capabilities using Bash and Python.
•    Familiarity with data compression formats (ORC, Parquet) and ingestion frameworks (e.g., Flume).
•    Working knowledge of message queues such as Kafka or RabbitMQ, with a focus on integration logic.
•    Proven experience in system performance tuning, storage efficiency, and resource optimization.


Qualifications -


•    BE/ME in Computer Science, Machine Learning, Electronics Engineering, Applied Mathematics, or Statistics.


Desired Experience Level -

•    4 years of relevant experience post Bachelor’s
•    2 years of relevant experience post Master’s
•    Experience in the semiconductor industry is a plus

Learn More About Tata Electronics