About The Company
DeepAware AI (YC S25) is at the forefront of developing secure, efficient, and autonomous infrastructure tailored for the rapidly evolving AI landscape. Focused on creating innovative solutions that optimize data center operations, energy consumption, and security, DeepAware AI combines cutting-edge research with practical engineering to address some of the most pressing challenges in AI infrastructure management. The company's mission is to build scalable, reliable, and intelligent systems that support the deployment and sustainability of AI technologies worldwide. With a team of passionate professionals and a culture that fosters innovation, DeepAware AI is committed to making a significant impact on the future of AI-powered infrastructure.
About The Role
As an AI/ML Engineer at DeepAware AI, you will play a pivotal role in designing, developing, and deploying advanced machine learning models that power our next-generation Data Center Infrastructure Management (DCIM) platform. Your work will focus on leveraging reinforcement learning to optimize GPU workload scheduling and energy efficiency, implementing anomaly detection systems to enhance security and prevent failures, and developing algorithms that contribute to cost savings and operational excellence. You will collaborate closely with cross-functional teams including data engineers, infrastructure specialists, and robotics experts to create scalable solutions that address real-world AI infrastructure challenges. This role offers an exciting opportunity to work at the intersection of AI, energy systems, and robotics, contributing to innovative solutions that have a tangible impact on the sustainability and reliability of AI data centers.
Qualifications
- Strong background in machine learning, with proven experience in reinforcement learning techniques
- Proficiency in Python programming
- Experience with deep learning frameworks such as PyTorch or TensorFlow
- Hands-on experience with distributed training and deployment in production environments
- Knowledge of energy systems, scheduling algorithms, or operations research is a plus
- Ability to work effectively in a fast-paced startup environment with a proactive ownership mindset
- Excellent problem-solving skills and collaborative spirit
- Familiarity with cloud computing platforms and containerization technologies
Responsibilities
- Develop and refine reinforcement learning models tailored for GPU workload placement and power optimization
- Implement real-time anomaly detection pipelines for threat detection and failure alerts
- Collaborate with data engineers to ensure the availability of high-quality, production-ready datasets
- Benchmark machine learning models against industry standards and integrate them into production systems
- Contribute to the design and implementation of scalable architecture and deployment strategies for AI infrastructure
- Stay updated with the latest advancements in AI, reinforcement learning, and infrastructure optimization
- Participate in code reviews, documentation, and knowledge sharing within the team
Benefits
- Competitive base salary range: $130,000 - $170,000 per year
- Flexible work arrangements, including remote work for exceptional candidates
- Health, dental, and vision insurance coverage
- Generous paid time off and holidays
- Opportunities for professional growth and development
- Collaborative and innovative work environment
- Participation in cutting-edge projects impacting AI infrastructure sustainability
Equal Opportunity
DeepAware AI is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. All qualified applicants will receive consideration for employment without regard to race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status. We believe that diverse perspectives foster innovation and drive our mission forward.