Mayank Kumar Singh

Publications

Last Updated: 5th March 2023

2022


Hierarchical Diffusion Models for Singing Voice Neural Vocoder

N. Takahashi, M. K. Singh and Y. Mitsufuji, "Hierarchical Diffusion Models for Singing Voice Neural Vocoder," ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023.

Nonparallel Emotional Voice Conversion For Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network & Virtual Domain Pairing

M. K. Singh, N. Shah and Y. Mitsufuji, "Nonparallel Emotional Voice Conversion For Unseen Speaker-Emotion Pairs Using Dual Domain Adversarial Network & Virtual Domain Pairing," ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 2023.

Cross-modal Face- and Voice-style Transfer

N. Takahashi, M. K. Singh and Y. Mitsufuji, "Cross-modal Face- and Voice-style Transfer"

Robust One-Shot Singing Voice Conversion

N. Takahashi, M. K. Singh and Y. Mitsufuji, "Robust One-Shot Singing Voice Conversion"

2021


Source Mixing and Separation Robust Audio Steganography

N. Takahashi, M. K. Singh and Y. Mitsufuji, "Source Mixing and Separation Robust Audio Steganography," ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, Singapore, 2022, pp. 241-245, doi: 10.1109/ICASSP43922.2022.9746486.

Hierarchical disentangled representation learning for singing voice conversion

N. Takahashi, M. K. Singh and Y. Mitsufuji, "Hierarchical disentangled representation learning for singing voice conversion," 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 2021, pp. 1-7, doi: 10.1109/IJCNN52387.2021.9533583.

2020


NENET: An Edge Learnable Network for Link Prediction in Scene Text

Mayank Kumar Singh, Sayan Banerjee, Subhasis Chaudhuri, "NENET: An Edge Learnable Network for Link Prediction in Scene Text"

2019


Improving Voice Separation by Incorporating End-To-End Speech Recognition

N. Takahashi, M. K. Singh, S. Basak, P. Sudarsanam, S. Ganapathy and Y. Mitsufuji, "Improving Voice Separation by Incorporating End-To-End Speech Recognition," ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Barcelona, Spain, 2020, pp. 41-45.

I am currently working as a Research Scientist at Sony Research India, after a two-year stint at Sony's R&D labs in Japan. I have been intrigued by machine learning since high school, and my goal is to understand the complex decision-making process of humans and implement it with the technologies we have, pushing humanity one step further.

I started my journey with hyperspectral images for cancer detection and pixel-level segmentation, and have since worked on speech source separation, text detection and recognition, quant algorithms, reinforcement learning, automatic speech recognition, singing and emotional speech voice conversion, vocoders, and audio steganography. It has been a great six years exploring this field.

Contact Information