Sr. AI Software Engineer

OneOncology

1 day ago

Full-time

Remote

United States

OneOncology is positioning community oncologists to drive the future of medical care through a patient-centric, physician-driven, and technology-powered model to help improve the lives of everyone living with cancer and other diseases. Our team is bringing leaders together in the marketplace to help drive OneOncology’s mission and vision.

Why join us? This is an exciting time to join OneOncology. Our values-driven culture reflects our startup enthusiasm, supported by industry leaders in oncology, urology, technology, and finance. We are looking for talented and highly motivated individuals who demonstrate a natural desire to improve and build new processes that support the meaningful work of independent physicians and the patients they serve.

Job Description:

We are seeking a Senior AI Engineer to lead the design, development, and production deployment of advanced AI systems on Azure and Databricks, reporting to the Manager, Software Engineering & AI. You will own the technical direction for agentic AI platforms, Retrieval-Augmented Generation (RAG) architectures, and large-scale model training and serving across the organization. This is a senior individual-contributor role with significant architectural authority: you will set standards, mentor engineers, partner closely with product and clinical stakeholders, and be accountable for the reliability, cost, and clinical safety of the AI systems you deliver. Success in this role requires deep machine learning expertise, proven production experience on Azure and Databricks, and strong judgment to make sound trade-offs in a regulated healthcare environment.

Responsibilities:

Architect and lead the development of end-to-end agentic AI systems enabling autonomous decision-making, multi-step planning, and tool use; design planning, reasoning, memory, and orchestration layers with strong evaluation, observability, and guardrails

Drive performance, safety, and cost optimization for agents operating in complex, dynamic environments

Own production RAG architecture across structured and unstructured oncology data; optimize for retrieval quality, latency, and grounding

Design scalable indexing, hybrid retrieval strategies, and evaluation frameworks; lead fine-tuning, adapters, and prompt engineering to improve LLM performance

Lead training and deployment of LLMs and deep learning models on Azure ML and Databricks, including distributed GPU training

Define scalable data preprocessing, feature engineering, and labeling pipelines; drive optimization across hyperparameters, architecture, quantization, and serving

Own full model lifecycle: versioning, evaluation, monitoring, drift detection, and retraining

Set technical direction for AI workloads across Azure and Databricks; design secure, scalable, HIPAA-compliant data and ML pipelines

Establish standards for cost, performance, reliability, and capacity planning, including FinOps for AI infrastructure

Define and enforce MLOps practices including CI/CD, automated evaluation, A/B testing, canary deployments, and incident response

Build reusable platform tooling to accelerate safe and efficient AI feature delivery

Mentor engineers, lead design reviews, and elevate code quality and evaluation rigor

Collaborate cross-functionally to deliver AI solutions aligned with clinical and business outcomes

Translate ambiguity into clear technical roadmaps and communicate trade-offs to executive, technical, and clinical stakeholders

Shape organizational AI strategy, standards, and responsible AI practices

Evaluate emerging models and techniques, driving adoption where valuable

Represent the organization in industry forums, conferences, and partner engagements

Additional responsibilities as assigned to help drive our mission of improving the lives of everyone living with cancer.

Required Qualifications:

Bachelor’s or Master’s in Computer Science, AI, or related field (or equivalent experience)

7+ years of software/ML engineering experience, including 3+ years building and operating production ML/LLM systems

Proven track record delivering AI/ML systems from prototype to scaled production

Deep expertise in LLMs, transformers, fine-tuning, and inference optimization

Expert-level Python; strong foundations in distributed systems, data engineering, and software design

Required experience with Azure and Databricks (Azure ML, Databricks workflows, MLflow, Spark pipelines)

Hands-on experience with vector databases (e.g., Azure AI Search, Databricks Vector Search, pgvector, FAISS) and RAG patterns

Experience with LLM orchestration and agent frameworks (e.g., LangChain, LlamaIndex, Semantic Kernel)

Strong evaluation practices (offline/online evals, golden datasets, quality metrics for LLM/agent systems)

Excellent problem-solving, communication, and cross-functional collaboration skills

Preferred:

Experience designing and shipping agentic AI systems in production

Experience implementing RAG over regulated or sensitive data (healthcare, finance, legal)

Experience with MLOps tooling and platform engineering for AI (model registries, feature stores, serving infrastructure, observability)

Experience with healthcare data standards (FHIR, HL7) and operating within HIPAA-compliant environments

Experience mentoring engineers and leading technical initiatives across teams

Contributions to open-source AI projects, publications, or patents

#LI-AN1

#LI-REMOTE
