Hello, I am
KUNAL MISHRA
I am a software developer and machine learning engineer. With 3+ years of experience in the field, I have gained expertise in a wide range of technologies. My portfolio showcases a selection of my work, including projects that demonstrate my abilities in building and deploying machine learning models and developing software applications.
My journey is more than coding; it's about innovatively bridging machine learning and software development to forge new paths in technology.
Work Experience 🧑‍💻
Machine Learning Software Engineer
Hometap (May 2023 - Dec 2023)
Used Apache Spark and Hadoop to gather historical data on home equity investments, and trained a predictive model with XGBoost that raised risk-assessment accuracy to 93%.
Gathered user interaction data from the fintech platform using the pandas and NumPy libraries, including user profiles, investment transactions, and engagement metrics.
Worked with the frontend team to enhance the user experience, using A/B testing to increase session time by 10%.
Senior Machine Learning Engineer
Applied Roots (Jun 2020 - Dec 2021)
Assignment Plagiarism Detector: Built an LLM from the ground up in PyTorch to measure the similarity between multiple notebooks. Embedded each document as a 300-dimensional vector using Doc2Vec.
Created an ETL pipeline to extract information from 500,000 IPython notebooks and trained the model using an autoencoder.
Reduced the false positive rate to 18% and the inference latency for comparing two notebooks to 3 seconds.
Containerized the app with Docker and deployed it as a scalable API using AWS ECS, S3, and ELB, with Amazon RDS for managed database services.
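The notebook comparison described above can be illustrated with a small similarity check. This is a minimal sketch under stated assumptions, not the production code: cosine similarity is assumed as the scoring function (the original does not name one), the names `cosine_similarity` and `is_suspicious` and the 0.9 threshold are hypothetical, and the toy 4-dimensional vectors stand in for the 300-dimensional Doc2Vec embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


def is_suspicious(a: Sequence[float], b: Sequence[float],
                  threshold: float = 0.9) -> bool:
    """Flag a pair of notebooks whose embeddings point in nearly the same direction.

    The threshold is a hypothetical value chosen for illustration only.
    """
    return cosine_similarity(a, b) >= threshold


# Toy embeddings (assumed values, 4 dimensions instead of 300).
notebook_a = [0.2, 0.1, 0.7, 0.0]
notebook_b = [0.4, 0.2, 1.4, 0.0]  # same direction as notebook_a, scaled by 2
notebook_c = [0.0, 0.9, 0.0, 0.3]  # mostly unrelated direction

print(is_suspicious(notebook_a, notebook_b))
print(is_suspicious(notebook_a, notebook_c))
```

Cosine similarity ignores vector length, so a notebook that repeats the same content at greater length would still score close to the original.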
Machine Learning Engineer
Applied Roots (Apr 2019 - May 2020)
Orchestrated a data pipeline with Apache Airflow that fetched information and loaded it into Snowflake, improving data reliability and accessibility.
Integrated the pipeline with an EC2 instance for further analysis and processing of user behavioral data.
Projects 🚀
Trained the Tiny Tales GPT (30-million-parameter) model from scratch and deployed it to production for $15
Tiny Tales GPT is a 30-million-parameter language model trained from scratch on 1 billion tokens. Training used Distributed Data Parallel (DDP) on two A100 GPUs.
After training, an inference script predicts tokens from the trained model given the input context vector.
Developed a REST API service with the Flask framework that exposes the inference service to end users, and deployed the web service on Google Cloud Platform.
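The token-prediction step of an inference script like the one described above can be sketched as an autoregressive loop. This is an illustration only: greedy decoding is an assumption (the actual script may sample with temperature or top-k), and `greedy_generate`, `toy_model`, `VOCAB_SIZE`, and `EOS_ID` are hypothetical names, with `toy_model` standing in for the trained 30-million-parameter network.

```python
from typing import Callable, List, Sequence


def greedy_generate(
    next_token_logits: Callable[[Sequence[int]], Sequence[float]],
    context: List[int],
    max_new_tokens: int,
    eos_id: int,
) -> List[int]:
    """Extend `context` one token at a time, always taking the highest-scoring token."""
    tokens = list(context)  # copy so the caller's list is not mutated
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)
        next_id = max(range(len(logits)), key=lambda i: logits[i])
        tokens.append(next_id)
        if next_id == eos_id:
            break  # stop once the end-of-sequence token is produced
    return tokens


VOCAB_SIZE = 5  # hypothetical tiny vocabulary
EOS_ID = 4      # hypothetical end-of-sequence token id


def toy_model(tokens: Sequence[int]) -> List[float]:
    """Stand-in for the trained model: always favors (last token + 1), capped at EOS."""
    target = min(tokens[-1] + 1, EOS_ID)
    return [1.0 if i == target else 0.0 for i in range(VOCAB_SIZE)]


print(greedy_generate(toy_model, [0], max_new_tokens=10, eos_id=EOS_ID))
```

In a real service, the `next_token_logits` callable would wrap a forward pass of the trained model, and the returned ids would be decoded back to text before being sent in the API response.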
AWS-Hosted Flask Microservice Ecosystem
Engineered a Flask-driven microservice ecosystem on AWS, leveraging EC2 with AMI for rapid deployment, alongside S3, RDS, ELB, Lambda, SES, and SNS for a resilient, scalable infrastructure.
Image Recognition Model for American Sign Language
Developed a hybrid CNN and ResNet50 model, optimized for ASL gesture classification, leveraging PyTorch's DDP for multi-GPU training.
Image Processing application with GUI
Developed an advanced, interactive GUI and command-line application using the Java SDK and JUnit, supporting extensive image enhancement and manipulation. Applied the command design pattern and an object-oriented architecture for a scalable, efficient code structure.
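The command design pattern mentioned above can be sketched in a few lines. The original application is written in Java; this sketch uses Python purely for illustration, and every name in it (`Command`, `Brighten`, `FlipHorizontal`, `parse`, `run`, and the script syntax) is hypothetical rather than taken from the project.

```python
from abc import ABC, abstractmethod
from typing import List

Image = List[List[int]]  # grayscale pixels in the range 0-255


class Command(ABC):
    """One image operation, packaged as an object so it can be queued or scripted."""

    @abstractmethod
    def execute(self, image: Image) -> Image:
        ...


class Brighten(Command):
    def __init__(self, amount: int) -> None:
        self.amount = amount

    def execute(self, image: Image) -> Image:
        # Clamp every pixel to the valid 0-255 range after shifting it.
        return [[max(0, min(255, p + self.amount)) for p in row] for row in image]


class FlipHorizontal(Command):
    def execute(self, image: Image) -> Image:
        return [list(reversed(row)) for row in image]


def parse(line: str) -> Command:
    """Turn one script line (hypothetical syntax) into a command object."""
    name, *args = line.split()
    if name == "brighten":
        return Brighten(int(args[0]))
    if name == "flip-horizontal":
        return FlipHorizontal()
    raise ValueError(f"unknown command: {name}")


def run(script: List[str], image: Image) -> Image:
    """Apply each scripted command in order and return the final image."""
    for line in script:
        image = parse(line).execute(image)
    return image


result = run(["brighten 50", "flip-horizontal"], [[0, 100], [200, 250]])
print(result)
```

Because each operation is its own class, a GUI button and a command-line script can both create the same command objects, and adding a new operation does not require changing `run`.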
Context-based Question Answering model
Built a sequence-to-sequence (seq2seq) answer-generation model that takes a context and a question, trained on a collection of 100k data points using bidirectional GRUs and fine-tuning.