Raja Kumar

I am pursuing my Master's in Computer Science at the University of California, Santa Cruz, where I am part of the VIS group, advised by Professor James Davis.

Previously, at Samsung Research, I worked with the on-device-AI team on developing and deploying deep learning models on Samsung devices. I also worked on model compression techniques, DNN quantization in particular. Currently, I am doing research on 3D face reconstruction.

I obtained my Bachelor's degree in Electrical Engineering from IIT Kharagpur.

Research Interests: 3D vision and its intersection with computer vision, computer graphics, and machine learning

Email  /  GitHub  /  Google Scholar  /  LinkedIn  /  Twitter  /  CV


Projects/Papers

Paper: Disjoint Pose and Shape for 3D Face Reconstruction
Raja Kumar, Jiahao Luo, Alex Pang, James Davis
IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023

Existing methods for 3D face reconstruction from a few casually captured images employ deep learning based models along with a 3D Morphable Model (3DMM) as a face geometry prior. Structure from Motion (SfM) followed by Multi-View Stereo (MVS), on the other hand, uses dozens of high-resolution images to reconstruct accurate 3D faces. However, it produces noisy and stretched-out results when only two views are available. In this paper, taking inspiration from both these methods, we propose an end-to-end pipeline that disjointly solves for pose and shape to make the optimization stable and accurate. We use a face shape prior to estimate face pose, and use stereo matching followed by a 3DMM to solve for the shape. The proposed method achieves end-to-end topological consistency, enables an iterative face pose refinement procedure, and shows remarkable improvement in both quantitative and qualitative results over existing state-of-the-art methods.


Paper: 3D Face Reconstruction: Is model-based classic passive stereo competitive?
Jiahao Luo, Raja Kumar, Alex Pang, James Davis
Under Review

The highest quality 3D face reconstructions are produced using multi-view stereo methods, reporting errors below 0.5 mm. Unfortunately, these methods typically employ dozens of high-resolution cameras in a large laboratory capture gantry. In contrast, state-of-the-art 3D face reconstructions using sophisticated deep learning models and few views are suited for casual mobile phone imaging outside the lab and report a mean error of 1-2 mm. This paper investigates whether classic passive stereo methods can be used in scenarios with only a few low-resolution images available. We expect to find that they cannot, since multi-view stereo performs well only when many high-resolution images are provided. When only two low-resolution images are available, stereo produces very noisy results which are not directly usable. However, our analysis shows that this visually noisy data has lower error than the state-of-the-art methods we compare against. We find that the visual artifacts from stereo can be removed by using a morphable face model to constrain the face shape.

pdf

Project: Dense 2D facial landmarks detector using 3D face model

Most existing face-alignment models detect the standard 68 face landmarks. However, tasks such as 3D face reconstruction and face recognition need dense facial landmarks. DAD-3DHeads proposed a dense 2D landmark detector based on 3D face reconstruction, but its performance deteriorates when the face occupies only a part of the image. To improve on this, I used the MediaPipe face detector to locate the face, passed the detected face region to DADNet, and then projected the detected landmarks back into the original image coordinates. The proposed pipeline is robust to all kinds of input images.
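The projection step above comes down to simple coordinate bookkeeping. Below is a minimal sketch of it, not the project's actual code: the helper names (`expand_box`, `landmarks_to_image_coords`), the crop margin, and the 256x256 detector input size are assumptions made for illustration, and the face box and landmark values are made up rather than produced by MediaPipe or DADNet.

```python
import numpy as np

def expand_box(box, image_size, margin=0.2):
    """Grow an (x0, y0, x1, y1) box by `margin` on each side, clipped to the image."""
    x0, y0, x1, y1 = box
    width, height = image_size
    dx, dy = margin * (x1 - x0), margin * (y1 - y0)
    return (max(0.0, x0 - dx), max(0.0, y0 - dy),
            min(float(width), x1 + dx), min(float(height), y1 + dy))

def landmarks_to_image_coords(landmarks, crop_box, input_size):
    """Map (N, 2) landmarks from the resized crop back to original image pixels."""
    x0, y0, x1, y1 = crop_box
    in_w, in_h = input_size
    # Undo the resize (crop size / detector input size), then undo the crop offset.
    scale = np.array([(x1 - x0) / in_w, (y1 - y0) / in_h])
    offset = np.array([x0, y0])
    return np.asarray(landmarks, dtype=float) * scale + offset

# Hypothetical values: a face box from a detector on a 1920x1080 image, and
# three landmarks predicted on the crop after resizing it to 256x256.
face_box = (400, 200, 600, 400)
crop_box = expand_box(face_box, image_size=(1920, 1080), margin=0.25)
crop_landmarks = np.array([[0.0, 0.0], [128.0, 128.0], [256.0, 256.0]])
image_landmarks = landmarks_to_image_coords(crop_landmarks, crop_box, (256, 256))
print(image_landmarks)
```

Expanding the box before cropping leaves some context around the face for the landmark model; the right margin would need to be tuned for the detector actually used.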


Project: Task Oriented Conversational Modelling With Subjective Knowledge

Existing task-oriented conversational models rely on database (DB) and API-based systems. However, users' questions very often require information that such systems cannot provide. Answers to these questions are nonetheless available in the form of customer reviews and FAQs. DSTC-11 proposes a three-stage pipeline consisting of knowledge-seeking turn detection, knowledge selection, and response generation to create a conversational model grounded on this subjective knowledge. In this work, we focus on improving the knowledge selection module to enhance overall system performance. In particular, we propose entity retrieval methods that result in an accurate and faster knowledge search. Our proposed Named Entity Recognition (NER) based entity retrieval method results in a 7x faster search compared to the baseline model. We also explore a potential keyword extraction method that can improve the accuracy of knowledge selection. Preliminary results show a 4% improvement in exact match score on the knowledge selection task.

pdf / code

Paper: Hybrid and non-uniform quantization methods using retro synthesis data for efficient inference
Raja Kumar, Tejpratap GVSL, Pradeep NS
IJCNN, 2021

We propose a data-independent post-training quantization scheme that eliminates the need for training data. This is achieved by generating a faux dataset referred to as ‘Retro-Synthesis Data’. We also introduce two futuristic variants of post-training quantization methods, namely ‘Hybrid Quantization’ and ‘Non-Uniform Quantization’, for efficient and accurate inference.

pdf / code

Project: Automatic Classification and Localisation of Defects in Hot-Rolled Steel Surfaces

My Bachelor's thesis, aimed at developing an efficient DL solution for real-time detection and segmentation of defects in hot-rolled steel surfaces. We train an R-CNN model in the frequency domain using DCT to improve accuracy, achieving a mAP@0.5 score of 82.4.



Teaching Assistant
  • [Fall 2023] CSE12: Computer Systems and Assembly Programming
  • [Summer 2023] CSE102: Introduction to Analysis of Algorithms
  • [Spring 2023] CSE120: Computer Architecture
  • [Winter 2023] CSE12: Computer Systems and Assembly Programming
  • [Fall 2022] CSE101: Introduction to Data Structures and Algorithms

News/Updates
  • [Oct 2023] Attended ICCV 2023 in Paris
  • [Aug 2023] Our paper on 3D face reconstruction was accepted at the ICCV 2023 Workshops
  • [Sep 2022] I joined UCSC as a CSE Master's student starting in Fall 2022
  • [March 2022] I joined the Silverlabs recommendation team as an ML Engineer
  • [April 2021] Our paper on quantization was accepted at IJCNN 2021
  • [June 2019] I joined the Samsung Research on-device-AI team as an ML Engineer

Design and source code from Jon Barron's website