AI Researcher — Training Optimization

Featherless AI
AI Researcher
Salary not listed
Posted 40 days ago
Remote Anywhere

Job Description

About the Role

We’re looking for an AI Researcher focused on training optimization to help us push the efficiency, stability, and scalability of large-scale model training. You’ll work at the intersection of research and systems, developing novel techniques to reduce training cost, accelerate convergence, and improve model quality, while validating ideas through rigorous experiments and publications.

This role is ideal for someone who enjoys turning research insights into practical training wins, and who has a track record (or strong ambition) of publishing applied ML research.

What You’ll Work On

- Design and evaluate training optimization techniques for large models (e.g. optimization algorithms, schedulers, normalization, curriculum strategies)
- Improve training efficiency and stability across long runs and large datasets
- Research and implement methods such as:
  - Optimizer and scheduler innovations
  - Mixed-precision, low-precision, and memory-efficient training
  - Gradient noise reduction, scaling laws, and convergence analysis
  - Training-time regularization and robustness techniques
- Run large-scale experiments, analyze results, and translate findings into actionable improvements
- Author or co-author research papers, technical reports, or blog posts
- Collaborate closely with infrastructure and inference teams to ensure training decisions translate to real-world performance

What We’re Looking For

- Strong background in machine learning research, with an emphasis on training dynamics and optimization
- Experience training large neural networks (LLMs, multimodal models, or large sequence models)
- Publication experience in ML venues (e.g. NeurIPS, ICML, ICLR, ACL, EMNLP, COLM, arXiv) or equivalent high-quality open research
- Solid understanding of:
  - Optimization theory and practice
  - Backpropagation, gradient flow, and training stability
  - Distributed and large-batch training
- Proficiency in Python and modern ML frameworks (PyTorch preferred)
- Ability to independently design experiments and reason from data

Nice to Have

- Experience with non-standard architectures (e.g. RNN variants, long-context models, hybrid systems)
- Experience optimizing training on GPUs at scale (FSDP, ZeRO, custom kernels)
- Contributions to open-source ML or research codebases
- Comfort operating in fast-moving, ambiguous startup environments

Why This Role

- Real influence over core model training decisions
- Freedom to pursue and publish novel research
- Direct access to large-scale experiments and real production constraints
- A small, senior team that values thinking deeply and shipping thoughtfully

Originally posted on Himalayas