Sr. AI Software Engineer

OneOncology
1 day ago
Full-time
Remote
United States

OneOncology is positioning community oncologists to drive the future of medical care through a patient-centric, physician-driven, and technology-powered model that helps improve the lives of everyone living with cancer and other diseases. Our team is bringing together leaders from across the marketplace to help drive OneOncology’s mission and vision.

Why join us? This is an exciting time to join OneOncology. Our values-driven culture reflects our startup enthusiasm, supported by industry leaders in oncology, urology, technology, and finance. We are looking for talented and highly motivated individuals who demonstrate a natural desire to improve and build new processes that support the meaningful work of independent physicians and the patients they serve.

Job Description:

We are seeking a Senior AI Engineer to lead the design, development, and production deployment of advanced AI systems on Azure and Databricks, reporting to the Manager, Software Engineering & AI. You will own the technical direction for agentic AI platforms, Retrieval-Augmented Generation (RAG) architectures, and large-scale model training and serving across the organization. This is a senior individual-contributor role with significant architectural authority: you will set standards, mentor engineers, partner closely with product and clinical stakeholders, and be accountable for the reliability, cost, and clinical safety of the AI systems you deliver. Success in this role requires deep machine learning expertise, proven production experience on Azure and Databricks, and the judgment to make sound trade-offs in a regulated healthcare environment.

 

Responsibilities:  

  • Architect and lead end-to-end agentic AI systems enabling autonomous decision-making, multi-step planning, and tool use; design planning, reasoning, memory, and orchestration layers with strong evaluation, observability, and guardrails 

  • Drive performance, safety, and cost optimization for agents operating in complex, dynamic environments 

  • Own production RAG architecture across structured and unstructured oncology data; optimize for retrieval quality, latency, and grounding 

  • Design scalable indexing, hybrid retrieval strategies, and evaluation frameworks; lead fine-tuning, adapters, and prompt engineering to improve LLM performance 

  • Lead training and deployment of LLMs and deep learning models on Azure ML and Databricks, including distributed GPU training 

  • Define scalable data preprocessing, feature engineering, and labeling pipelines; drive optimization across hyperparameters, architecture, quantization, and serving 

  • Own full model lifecycle: versioning, evaluation, monitoring, drift detection, and retraining 

  • Set technical direction for AI workloads across Azure and Databricks; design secure, scalable, HIPAA-compliant data and ML pipelines 

  • Establish standards for cost, performance, reliability, and capacity planning, including FinOps for AI infrastructure 

  • Define and enforce MLOps practices including CI/CD, automated evaluation, A/B testing, canary deployments, and incident response 

  • Build reusable platform tooling to accelerate safe and efficient AI feature delivery 

  • Mentor engineers, lead design reviews, and elevate code quality and evaluation rigor 

  • Collaborate cross-functionally to deliver AI solutions aligned with clinical and business outcomes 

  • Translate ambiguity into clear technical roadmaps and communicate trade-offs to executive, technical, and clinical stakeholders 

  • Shape organizational AI strategy, standards, and responsible AI practices 

  • Evaluate emerging models and techniques, driving adoption where valuable 

  • Represent the organization in industry forums, conferences, and partner engagements 

  • Additional responsibilities as assigned to help drive our mission of improving the lives of everyone living with cancer. 

 

Required Qualifications:

  • Bachelor’s or Master’s in Computer Science, AI, or related field (or equivalent experience)  

  • 7+ years of software/ML engineering experience, including 3+ years building and operating production ML/LLM systems

  • Proven track record delivering AI/ML systems from prototype to scaled production  

  • Deep expertise in LLMs, transformers, fine-tuning, and inference optimization  

  • Expert-level Python skills; strong foundations in distributed systems, data engineering, and software design

  • Production experience with Azure and Databricks (Azure ML, Databricks workflows, MLflow, Spark pipelines)

  • Hands-on experience with vector databases (e.g., Azure AI Search, Databricks Vector Search, pgvector, FAISS) and RAG patterns  

  • Experience with LLM orchestration and agent frameworks (e.g., LangChain, LlamaIndex, Semantic Kernel)  

  • Strong evaluation practices (offline/online evals, golden datasets, quality metrics for LLM/agent systems)  

  • Excellent problem-solving, communication, and cross-functional collaboration skills 

Preferred: 

  • Experience designing and shipping agentic AI systems in production. 

  • Experience implementing RAG over regulated or sensitive data (healthcare, finance, legal). 

  • Experience with MLOps tooling and platform engineering for AI (model registries, feature stores, serving infrastructure, observability). 

  • Experience with healthcare data standards (FHIR, HL7) and operating within HIPAA-compliant environments. 

  • Experience mentoring engineers and leading technical initiatives across teams. 

  • Contributions to open-source AI projects, publications, or patents. 
