Research & Open Source

This section covers my previous engagements with social impact organizations.

Odia Gen AI, Research Collaborator
  • Building large language models (LLMs) and AI agents for Indian languages
  • Lead a team that has open-sourced more than 1M instructions across more than 5 Indian languages
  • Released SFT and PEFT (LoRA/QLoRA) models for Bengali, Hindi, Odia, and Punjabi
  • Currently working on better embedding models and tokenizers for Indian languages
  • Check out our work here
  • Working on a couple of papers that we hope to publish soon
DriQ, Research Scientist and Consultant
  • Creating a novel AI system that uses sensor data to monitor the state of residents' diapers in nursing homes
  • Working in collaboration with Dr Gregory Dean and his team to deliver a potential AI product lineup that they can deploy at hospitals
  • Products include: a deployed XGBoost model that predicts wet or dry state along with percentage wetness; a Urinary Tract Infection predictor; and an LLM-powered chatbot hosted on their website for customer interaction
  • Currently also aiming to release one of the largest pediatric urology instruction sets and preference datasets
Google Summer of Code Fellow, CERN Switzerland, Geant4 Team, 2022
  • Worked in a team of 3, mentored by 2 CERN physicists
  • Performance Optimization:
    • Experimented with various parameters (floating-point precision, event counts, angles, and energy ranges) to maximize the data handling capabilities of the pipeline on an 8GB CPU setup.
    • Restructured Python code into Kubeflow function format for improved efficiency and maintainability.
  • Streamlined Workflow:
    • Implemented a user-friendly, one-click pipeline solution within Kubeflow, abstracting complex workflow details for ease of use.
    • Configured Persistent Memory with EOS in Kubeflow Pipeline, ensuring data integrity and accessibility.
  • Scaling and Resource Efficiency:
    • Adapted the training loop to efficiently handle large datasets (1 TB) on limited hardware resources (8GB CPU), enhancing scalability and reducing computational costs.
  • Hyperparameter Tuning:
    • Integrated Katib Hyperparameter tuning seamlessly into the pipeline, optimizing model performance and resource utilization.
  • Documentation and Knowledge Sharing:
    • Produced well-designed, thoroughly documented code that serves as a valuable resource for users seeking to implement Kubeflow methodologies for diverse workflows.
CORD.AI, Core Founding Member, NLP Community Lead
  • CORD.AI is a research community led by a group of students and early-career professionals to promote research and professional participation in AI.
  • Our goal is to help young students and working professionals get a taste of research and work on impactful problems.
Save the Children (USA), AI for Social Good [Fight against Online Child Grooming]
  • Currently leading the technical department of an online anti-child-grooming system that helps protect young children from groomers and predators on platforms such as YouTube, Instagram, Twitter, Steam, and Discord.
  • Working in collaboration with GitHub and Save the Children Sweden.
  • Recently received funding.