Mayank Kumar Singh

Web development for Autonise AI

Last Updated: 5th Mar 2023

For our edtech startup Autonise, we built the website using Ionic, a framework for cross-platform mobile apps and Progressive Web Apps (PWAs). It was a course management portal with payment and user management systems. It is currently inactive but still online at http://autonise.com.

End-to-End Automatic Speech Recognition based on Hybrid CTC/Attention mechanism

Last Updated: 5th May 2021

Reimplemented the paper Hybrid CTC/Attention Architecture for End-to-End Speech Recognition in pure Python and PyTorch. An earlier implementation exists in ESPnet, but its pre-processing step uses Kaldi, which is written in C++. The pipeline also seemed complicated for people who just want to train and run the ASR model. The code is available on GitHub.

Web Development

Last Updated: 26th April 2018

I created the website https://www.primeacademypune.com using plain HTML, CSS and JavaScript for the client side and Django for the server side. The server was deployed on Amazon Web Services. I also deployed Moodle on AWS and hosted it at https://online.primeacademypune.com.

Hyperspectral Tissue Image Segmentation and Classification

Last Updated: 26th April 2019

Annotating the epithelium, stromal and goblet cells in a hyperspectral tissue image is very time consuming for doctors, so we tried to automate the process. We used Non-negative Matrix Factorisation (NMF) and Semi-Supervised NMF (SSNMF) for dimensionality reduction because the spectra contained a lot of redundant data. We then classified every pixel using different classifiers such as SVM, NN and Random Forest. Next, we added spatial information by taking windows of different sizes around each pixel, which improved the accuracy by 2-3 percent. Regularisation in the NN increased the accuracy by a further 2-3 percent.
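
The spatial-window step can be illustrated with a minimal sketch. It assumes the image is stored as nested lists of shape rows x columns x bands, and that pixels near the border reuse the nearest valid pixel. Both choices, along with the name window_features, are assumptions made for illustration and are not taken from the project.

```python
def window_features(image, row, col, half_width):
    """Concatenate the spectra inside a square window centred on (row, col).

    image is a rows x cols x bands nested list (assumed layout).
    Coordinates outside the image are clamped to the border (assumed policy).
    """
    rows, cols = len(image), len(image[0])
    features = []
    for dr in range(-half_width, half_width + 1):
        for dc in range(-half_width, half_width + 1):
            r = min(max(row + dr, 0), rows - 1)
            c = min(max(col + dc, 0), cols - 1)
            features.extend(image[r][c])
    return features


# Tiny 2 x 2 image with 2 spectral bands per pixel.
image = [
    [[0.1, 0.2], [0.3, 0.4]],
    [[0.5, 0.6], [0.7, 0.8]],
]
single_pixel = window_features(image, 0, 0, half_width=0)  # spectrum only
with_context = window_features(image, 0, 0, half_width=1)  # 3 x 3 window
```

The longer feature vector would then be passed to the per-pixel classifier in place of the single spectrum.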

Diabetic Retinopathy Lesion Segmentation

Last Updated: 26th April 2019

Advances in medical science have made it possible to detect Diabetic Retinopathy at an early stage. But because the large number of doctors required for reliable diagnosis is not available, and the number of people affected by diabetes keeps increasing, there is a strong need for automation. We approached the challenge of automating the segmentation of different types of lesions, such as Microaneurysms, Haemorrhages, Soft Exudates and Hard Exudates, using a novel architecture, Fusion Net.

CRAFT Text-Detection

Last Updated: 26th April 2019

Traditional text detection methods use word-level bounding boxes as targets and for counting true positives. Character-Region Awareness For Text-Detection (CRAFT) uses weak supervision to make character-level predictions from word-level annotations. This is achieved by first training the model on a synthetic dataset, then fine-tuning it on a real dataset and using the transcriptions to get the character predictions.
The full method is described in the original paper. I reimplemented the training of CRAFT on GitHub.

Pixel-Link Text Detection

Last Updated: 26th April 2019

Traditional text detection methods have focused on bounding box regression to detect text in real images. This inherently has many problems, such as curved text, the aspect-ratio sensitivity of regression models, the density of predictions and so on. Pixel-Link treats text detection as a pixel-level instance segmentation problem, in which each pixel is grouped with a neighbouring pixel if the two are joined by a "link" that is also predicted by the model. For a more in-depth analysis, you can read the original paper. I reimplemented Pixel-Link on GitHub.
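
The grouping step can be sketched with a union-find structure. This is a minimal illustration rather than the code from my reimplementation: it assumes the model output has already been thresholded into a binary text mask and a list of positive links between pixel coordinates, and the names group_pixels and find are made up for this example.

```python
def find(parent, pixel):
    """Return the root of the set containing pixel, shortening the path."""
    while parent[pixel] != pixel:
        parent[pixel] = parent[parent[pixel]]
        pixel = parent[pixel]
    return pixel


def group_pixels(text_mask, links):
    """Group text pixels into instances by following positive links.

    text_mask: 2D list, truthy where a pixel is predicted as text.
    links: iterable of ((r1, c1), (r2, c2)) pairs predicted as linked.
    """
    parent = {
        (r, c): (r, c)
        for r, row in enumerate(text_mask)
        for c, is_text in enumerate(row)
        if is_text
    }
    for a, b in links:
        # A link only counts when both of its ends are text pixels.
        if a in parent and b in parent:
            root_a, root_b = find(parent, a), find(parent, b)
            if root_a != root_b:
                parent[root_b] = root_a
    groups = {}
    for pixel in parent:
        groups.setdefault(find(parent, pixel), []).append(pixel)
    return sorted(sorted(group) for group in groups.values())


mask = [
    [1, 1, 0, 1],
    [0, 1, 0, 1],
]
links = [
    ((0, 0), (0, 1)),
    ((0, 1), (1, 1)),
    ((0, 3), (1, 3)),
    ((0, 1), (0, 2)),  # ignored: (0, 2) is not a text pixel
]
instances = group_pixels(mask, links)
```

Each resulting group is one text instance, from which a bounding box can then be extracted.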

Ping - Pong

Last Updated: 26th April 2019

I wanted to practise the basics of reinforcement learning and convolutional neural networks by training a neural network to predict how the bat should move to hit the ball back. I implemented the entire graphical interface using basic Python and NumPy arrays, and the code has been uploaded to GitHub for reference.

Image Stitching

Last Updated: 26th April 2019

The task was to stitch heavily blurred frames of a video into a single image of the whole tissue, because the microscope can only see a limited part of it at a time. The project is challenging because the input has random noise, smudges on the microscope lens, heavy blur and automatically adjusted brightness. It was a challenging but fruitful way to apply what I have learnt in signals and systems.

Socket Programming

Last Updated: 26th April 2019

Due to a severe lack of funds (well, I spent all the money on eating :p), I wanted to use my laptop's inbuilt camera for my real-time face detection project. Having recently learnt socket programming in my CS 224 minor, I thought, why not implement it? So, using Python, I wrote a server and a client program to send the images as bytes and reconstruct them on my PC. I sent the data over UDP because I needed a fast response rather than reliability. To increase the transfer speed and use less bandwidth, I first convert the images to JPEG, send the bytes, and decode them on the other side into a NumPy image, which reduces the bandwidth by up to 40 times. The implementation has been uploaded to GitHub.
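
An encoded frame can be larger than a single UDP datagram, so one way to send it is to split it into numbered chunks and rebuild it on the receiving side. The sketch below is only an illustration of that idea and is not the code from my repository: the header layout, the 1400-byte payload limit, and the names split_frame, reassemble and send_frame are all assumptions.

```python
import socket
import struct

# Assumed header: frame id (4 bytes), chunk index (2), chunk count (2).
HEADER = struct.Struct("!IHH")
MAX_PAYLOAD = 1400  # assumed to fit under a typical 1500-byte MTU


def split_frame(frame_id, data, max_payload=MAX_PAYLOAD):
    """Split one encoded frame into header-tagged datagrams."""
    count = max(1, -(-len(data) // max_payload))  # ceiling division
    return [
        HEADER.pack(frame_id, i, count)
        + data[i * max_payload:(i + 1) * max_payload]
        for i in range(count)
    ]


def reassemble(datagrams):
    """Rebuild one frame from its datagrams, in any order.

    Returns None if a chunk is missing, which can happen over UDP.
    Assumes all datagrams belong to the same frame.
    """
    chunks = {}
    count = None
    for packet in datagrams:
        _, index, count = HEADER.unpack(packet[:HEADER.size])
        chunks[index] = packet[HEADER.size:]
    if count is None or len(chunks) != count:
        return None
    return b"".join(chunks[i] for i in range(count))


def send_frame(sock, address, frame_id, data):
    """Send every chunk of a frame through a UDP socket."""
    for packet in split_frame(frame_id, data):
        sock.sendto(packet, address)


jpeg_bytes = bytes(range(256)) * 20  # stand-in for a JPEG-encoded frame
packets = split_frame(7, jpeg_bytes)
restored = reassemble(reversed(packets))  # arrival order does not matter
```

On the sending side, send_frame would be called with a socket created via socket.socket(socket.AF_INET, socket.SOCK_DGRAM). A frame with a missing chunk is simply dropped, which matches the choice of speed over reliability.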

RSA key encryption

Last Updated: 26th April 2019

Wouldn't it be ideal if nobody stole anything in this world or had any malicious intent? Well, we do not live in such an idealized world. So to save my projects from all those trying to steal them, I made an RSA file encryption program. The basics are simple; the implementation, not so much.

How does it work?

So suppose Cooper wants to send a message to Murph from some other galaxy, but there are some aliens trying to intercept this message. First Cooper tells Murph, "Hey, I want to send a message to you." Then Murph sends a public key to Cooper. But the aliens in between also intercept this key. They are like, "Haha, whatever you send will be received by us. Even if you send a way to change the code, we will understand it." Obviously Cooper does not know whether the key has been seen by the aliens or not. He uses the public key to encode the entire message and sends it to Murph. And alas, the aliens get the message. But wait, it is gibberish. And even though they are giving it their all, they can't seem to use the public key to decode the message.

The message goes to Murph, and Murph too receives a gibberish message. Now how will she decode it? Well, she has an advantage over the aliens: a private key, which enables her to decode the message easily. This is how public-key encryption works in principle.
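
The story can be traced with a toy example. This is a minimal sketch with tiny textbook primes, not my file encryption program and not secure. It encrypts one byte at a time with no padding, and the function names are made up for illustration. It needs Python 3.8 or later for the modular inverse via pow.

```python
def make_keys(p, q, e):
    """Build a (public, private) key pair from two primes and an exponent."""
    n = p * q
    phi = (p - 1) * (q - 1)
    d = pow(e, -1, phi)  # private exponent: inverse of e modulo phi
    return (e, n), (d, n)


def encrypt(message, public_key):
    """Cooper's side: encode each byte with Murph's public key."""
    e, n = public_key
    return [pow(byte, e, n) for byte in message.encode("utf-8")]


def decrypt(cipher, private_key):
    """Murph's side: only the private key turns the gibberish back."""
    d, n = private_key
    return bytes(pow(c, d, n) for c in cipher).decode("utf-8")


# Murph generates the keys and shares only the public one.
public_key, private_key = make_keys(61, 53, 17)

# Cooper (and the aliens) know the public key; only Murph can decrypt.
cipher = encrypt("STAY", public_key)
plain = decrypt(cipher, private_key)
```

With primes this small the aliens could factor n immediately and recover the private key. Real RSA relies on primes large enough to make that factoring impractical.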

I am currently working as a Research Scientist at Sony Research India after a 2-year stint at Sony Japan R&D labs. I have been intrigued by machine learning since high school, and my vision is to understand the complex decision process of humans and implement it with the technologies we have, to push humans one step further.

I started my journey with hyperspectral images for cancer detection and pixel-level segmentation, and have since worked on speech source separation, text detection and recognition, Quant algorithms, Reinforcement Learning, Automatic Speech Recognition, Singing/Emotional Speech Voice Conversion, Vocoders and Audio Steganography. It has been a great 6 years exploring this field.

- 5th March 2023