Staff Software Engineer - Agentic AI Platform

Cloudera
9 days ago
Full-time
Remote
United States

Business Area:

Engineering

Seniority Level:

Mid-Senior level

Job Description: 

Our Data Services Pillar is the heart of data innovation. We don’t just work with technology; we build it. Our mission is to empower data practitioners by creating seamless, enterprise-grade experiences for data engineering, warehousing, streaming, operational databases, and AI.

Join Cloudera’s Machine Learning Platform team as a Staff Software Engineer. You will be a core technical leader responsible for designing, building, and delivering our next-generation AI and MLOps platform. This enterprise-grade platform enables organizations to create, deploy, and orchestrate agentic applications and multi-agent platforms using foundation models with enterprise data at scale in a hybrid cloud environment.

As a Staff-level engineer, you will drive technical architecture, advocate for engineering best practices, and collaborate closely with cross-functional teams (Product, Design, Frontend, and Field Engineering) to enhance developer velocity and platform agility.

Our Core Tech Stack:

  • Backend: Python, gRPC, SQL

  • Infrastructure: Kubernetes, Knative, KEDA, Docker, Hybrid Cloud (AWS, GCP, Azure, OpenShift)

  • GenAI & ML: LangChain, CrewAI, LlamaIndex, closed- and open-source LLMs

  • Data & Vector DBs: Qdrant, Pinecone, or Milvus; Redis; Postgres

In this role you will:

  • Architect and Build: Design, code, and implement elegant, scalable, enterprise-quality platform and application services and SDKs that support autonomous agents, multi-agent collaboration, and advanced tool-use capabilities.

  • Implement Agentic Evaluation & Observability: Build automated evaluation pipelines and trajectory tracking to continuously benchmark, audit, and evaluate agent reasoning loops, tool-calling accuracy, and guardrail compliance.

  • Architect Memory Management Systems: Develop and optimize robust stateful memory architectures for enterprise agents, covering short-term context-window strategies, long-term semantic and episodic memory persistence, and secure cross-session state management.

  • Enable RAG use cases:

  • Lead by Example: Advocate for and establish engineering best practices, coding standards, and rigorous system design methodologies.

  • Enhance Platform Velocity: Work to enhance developer velocity, framework abstraction, and team agility across the AI platform ecosystem.

  • Collaborate Cross-Functionally: Build strong technical relationships with platform and UI engineers, data scientists, quality engineers, UX designers, Product Management, Field Engineering, and external enterprise partners to deliver cohesive user experiences.

  • Mentorship: Act as a senior technical leader on the team, mentoring junior engineers and actively contributing to a culture of engineering excellence and craftsmanship.

What You Bring (Required Experience):

  • Experience: 8+ years of software engineering experience building scalable backend microservices and distributed systems.

  • Core Languages: Deep expertise in Python, Go, Java, or C#/C++, along with gRPC and SQL.

  • Cloud & Containerization: Extensive hands-on experience designing and developing microservices on Kubernetes, plus expertise in at least one major cloud platform (AWS, GCP, or Microsoft Azure).

  • Agentic Frameworks & Evaluation: Proven experience with open-source agentic frameworks (e.g., LangGraph, AutoGen, CrewAI) and evaluation tooling (LangSmith, Langfuse, or Phoenix).

  • Agentic & Generative AI Mastery: Proven experience building applications that use advanced LLM orchestration paradigms (e.g., ReAct loops, planning/reflection frameworks, multi-agent systems) alongside standard foundation models, prompt engineering, and RAG architectures using vector databases (e.g., Pinecone, Milvus).

  • Memory & State Infrastructure: Deep understanding of caching layers, relational databases, and vector databases optimized for agentic state persistence and memory retrieval.

  • System Design: Demonstrated ability to go deep into complex distributed systems, crafting both high-level architecture and low-level technical designs.

  • Education: BS/MS in Computer Science, Software Engineering, or a related field (or equivalent professional experience).

  • Soft Skills: Self-driven with a strong sense of ownership, paired with excellent written and verbal communication skills.

Bonus Points (Preferred Experience):

  • AI/ML Orchestration: Experience with ML orchestration and serving software (KServe, Knative).

  • Big Data: Familiarity with distributed data technologies such as Apache Spark and Hive.

  • Full-Stack Exposure: Experience with React, HTML, and CSS to better collaborate with UI teams.

  • Data Science Ecosystem: Experience building applications alongside data scientists using tools like Python, TensorFlow, PyTorch, MLflow, or R.

  • Agile Environments: A proven track record of collaborating with agile teams across geographically dispersed and remote locations.


This role is not eligible for immigration sponsorship or relocation.
 

What you can expect from us:

  • Generous PTO Policy 

  • Support for work-life balance with Unplugged Days

  • Flexible WFH Policy 

  • Mental & Physical Wellness programs 

  • Phone and Internet Reimbursement program 

  • Access to Continued Career Development 

  • Comprehensive Benefits and Competitive Packages 

  • Paid Volunteer Time

  • Employee Resource Groups

EEO/VEVRAA
