Arpan Sheetal

Data Scientist @ Innoplexus

About Arpan Sheetal

Arpan Sheetal is a Data Scientist at Innoplexus in Pune, India, where he has worked since 2016. He specializes in entity disambiguation and has engineered solutions that increased processing speed and reduced operational costs.

Work at Innoplexus

Arpan Sheetal has worked at Innoplexus as a Data Scientist since 2016, based in the Pune area, India. Over eight years, he engineered a hybrid Elasticsearch cluster for the disambiguation pipeline, which increased processing speed by 50% and reduced operational costs by 68%. He also improved entity disambiguation by 85% by applying a knowledge graph containing 3.4 billion nodes and 1.7 billion connections.

Education and Expertise

Arpan Sheetal studied at the Indian Institute of Technology, Delhi, from 2011 to 2016, earning a Master of Technology (MTech) in Mathematics and Computing. This academic background gave him a strong foundation in data science and computational techniques. His expertise includes PyTorch-BigGraph for graph embedding and RandomForestClassifier for classification, both of which he applied to entity disambiguation.

Background

Before joining Innoplexus, Arpan Sheetal worked as a Software Engineer at ElectronicTender for two months in 2014. He also spent five years as an engineering student at the Indian Institute of Technology, Delhi. This combination of education and early career experience underpins his skills in data science and software engineering.

Achievements in Data Science

Arpan Sheetal has contributed to several data science initiatives at Innoplexus. He improved clustering accuracy in the disambiguation pipeline by 76% by integrating organization hierarchy and affiliation phrase extraction. He also automated the extraction of 14 variables from web pages using a hybrid system that combines heuristics with Named Entity Recognition (NER) models, which reduced manual effort by 90%. Additionally, he introduced Dask into the data transformation process, achieving a tenfold performance improvement.
