The Data Scientist is responsible for working with our data science and data engineering teams to develop AI and machine learning algorithms to support the Company’s product roadmap and growth strategies.
- Work with data engineers to define data collection/reporting requirements, engineering metrics, aggregates for analysis, and ETLs, and to automate data pipelines to scale experimentation.
- Analyze various datasets associated with the business problem using statistical & data mining techniques in Python.
- Develop hypotheses from data & domain understanding, and statistically test those hypotheses to provide actionable insights.
- Build data models and infrastructure to version data used for model training.
- Utilize education in machine learning and statistics to find appropriate scientific techniques for a business problem.
- Research previously published work in similar domains and understand how those algorithms are designed. Optimize those algorithms and apply them at scale.
- Research and implement emerging technologies that may be relevant to the current problem statement.
- Design natural language processing pipelines using deep neural networks and other machine learning techniques in Python.
- Develop training & tuning methodologies to train machine learning models in the most efficient way.
- Design and implement evaluation strategies for various models on baseline datasets to measure progress on various metrics.
- Build scalable data science pipelines using big data tools & techniques to efficiently ingest and preprocess large amounts of data before training a machine learning model on it.
- Approach and frame problems progressively, making appropriate approximations about the data.
- Develop solutions to the business problem by decomposing it into small, specific tasks that can be independently developed and evaluated in a plug-and-play fashion.
- Integrate various tools and technologies, such as Amazon Web Services, Python, PostgreSQL, and Apache Spark, to streamline data science pipelines.
- Develop code infrastructure to enable a continuous improvement and continuous development pipeline.
- Prepare documentation and visualizations for intra-team communication and correspondence on project progress.
- Enable and develop computer science/data science best practices and reporting templates.
- Design and develop a codebase with appropriate testing and coding hygiene.
- Utilize the various capabilities of hardware and software resources to enable efficient processing of data and training of deep neural networks.
- Understand the architectural design of various machine learning libraries, such as PyTorch and TensorFlow, to find the best fit for the business problem as well as the hardware resources available.
- Implement deep neural architectures and machine learning algorithms at a mathematical level. Tweak these algorithms and architectures according to the problem at hand.
A Bachelor’s Degree in Computer Science, Math, or Data Science is required
- Bachelor’s Degree in Computer Science or Information Systems
- Environment – Office or remote office environment, with minimal travel
- Physical Requirements – Sitting, standing, walking, and using a keyboard, mouse, computer, and telephone.