Research Statement

Most of my undergraduate research focused on document image processing. I also worked on deep learning approaches to computer vision problems. Currently, my focus is on applying machine learning techniques to real-world problems such as treatment planning and game solving.

Abstract

Scanned document images stored as PDF are not searchable. OCR tries to remedy this by converting document images into editable and searchable data, but it has its own limitations in the presence of equations, both mathematical and chemical. OCR for mathematical equations is already a major research area and has produced successful results. However, chemical equation segmentation has been a less-travelled road. In this paper, we present a novel method for automated generation of searchable PDFs of segmented chemical equations from scanned document images by performing chemical symbol recognition and auto-correction of the OCR output. We use an existing OCR system, pattern recognition techniques, contextual data analysis and a standard LaTeX package to generate the chemical equation in searchable PDF format. The effectiveness of the proposed method is verified through exhaustive testing on 234 document images.
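As a rough illustration of the final step, a recognized chemical equation can be typeset as searchable text in LaTeX. The abstract does not name the package used; the sketch below assumes the widely used mhchem package and an arbitrary example reaction, neither of which is taken from the paper.

```latex
% Minimal sketch: mhchem is an assumption, not the package named in the paper.
\documentclass{article}
\usepackage[version=4]{mhchem}
\begin{document}
% The equation is emitted as real text, so the compiled PDF stays searchable.
\ce{2H2 + O2 -> 2H2O}
\end{document}
```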


Publication

Proceedings of the 2016 ACM Symposium on Document Engineering
Vienna, Austria — September 13 – 16, 2016
Pages 147-156

Abstract

Segmentation of mathematical equations from document images is already a major research area for improving the performance of OCR systems. Although chemical equations share similar spatial properties with non-chemical equations (for example, mathematical equations), efforts to segment them are yet to be explored. This paper presents a novel method for segmenting and identifying chemical and other equations in heterogeneous document images that may contain graphics, tables and text, and then classifying them into two categories: chemical and non-chemical equations. This study, the first of its kind as far as our knowledge goes, not only improves OCR performance but also leads to the creation of a chemical database and the formation of a bond electron matrix from chemical equations or formulae. In our proposed method, we extract the equations using morphological operators and histogram analysis, and classify the extracted equations using an open source OCR engine. The effectiveness of the proposed method is demonstrated by testing it on 152 document images. Test results show an accuracy of 97.4% and 97.45% for segmentation and classification, respectively.
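To give a feel for the histogram analysis mentioned above, here is a minimal sketch of a horizontal projection profile, a common way to split a binarized page into horizontal bands of ink. It is not the paper's pipeline: the morphological step is omitted, and the function names, the `min_ink` threshold and the toy `page` are all hypothetical.

```python
def row_profile(image):
    """Count foreground (1) pixels in each row of a binary image."""
    return [sum(row) for row in image]


def segment_rows(image, min_ink=1):
    """Return (start, end) row ranges, end exclusive, whose ink count is >= min_ink."""
    bands = []
    start = None
    for y, ink in enumerate(row_profile(image)):
        if ink >= min_ink and start is None:
            start = y                    # a band of ink begins
        elif ink < min_ink and start is not None:
            bands.append((start, y))     # a blank row closes the band
            start = None
    if start is not None:
        bands.append((start, len(image)))  # band runs to the bottom edge
    return bands


# Toy binary page: 1 = ink, 0 = background (made-up data).
page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
]
print(segment_rows(page))
```

Each returned band is a candidate text or equation line; a real system would then examine the band's shape and contents before passing it to a classifier.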


Publication

Eighth International Conference on Advances in Pattern Recognition (ICAPR)
Kolkata, West Bengal, India
4-7 Jan. 2015

Abstract

Indian classical dance has existed for over 5000 years and is widely practised and performed all over the world. However, the semantic meaning of the dance gestures and body postures, as well as the intricate steps accompanied by music and the recital of poems, is fully understood only by the connoisseur. The general audience watching a concert rarely appreciates or understands the ideas conveyed by the dancer. Can machine learning algorithms help a novice understand the semantic intricacies being expertly conveyed by the dancer? In this work, we aim to address this highly challenging problem and propose deep-learning-based algorithms to identify body postures and hand gestures in order to comprehend the intended meaning of the dance performance. Specifically, we propose a convolutional neural network and validate its performance on standard datasets for poses and hand gestures as well as on constrained and real-world datasets of classical dance. We use transfer learning to show that pre-trained deep networks can reduce training time and also improve accuracy. Interestingly, we show with experiments performed using Kinect in constrained laboratory settings and data from YouTube that it is possible to identify body poses and hand gestures of the performer to understand the semantic meaning of the enacted dance piece.


Publication

Signal Processing: Image Communication, Elsevier
Volume 47, September 2016
Pages 529-548
