VISHAL KESWANI

DATA SCIENTIST


I hear and I forget. I see and I remember. I do and I understand. ~ Confucius

Internship Projects


Streamlined the pipeline: converting raw images to TFRecords, training, and converting checkpoints to TFLite format. Customized object detection (COCO dataset) and face detection (Open Images dataset) for Raspberry Pi 4. Trained a quantized MobileNet V2 on the LFW dataset and user faces, achieving face-recognition accuracy of up to 93%. Linked the face detection and recognition systems, using frame averaging and thresholding for robust predictions.

Obtained Word documents containing chat threads between clients and care engineers (CEs). Extracted e-mail bodies to CSV, followed by tokenization, stop-word removal, stemming and lowercasing. Used keyword search to build a repository of client queries and CE questions, and mapped the two to each other. Vectorized each row using tf-idf scores and used the vectors to compute cosine similarity. Performed k-means and k-medoids clustering on client queries; for a new query, ranked the centroids by similarity. Reported inquiry questions for the first 3 clusters (based on the mapping) in decreasing order of similarity.
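The tf-idf vectorization and cosine-similarity step above can be sketched in plain Python. This is a minimal illustration, not the project code: the sample queries and function names are invented for the example, and real pipelines would use a library such as scikit-learn.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute sparse tf-idf vectors for a list of tokenized documents."""
    n = len(docs)
    # document frequency: number of docs each term appears in
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (tf[t] / len(doc)) * idf[t] for t in tf})
    return vectors

def cosine_similarity(u, v):
    """Cosine similarity between two sparse (dict) vectors."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# illustrative client queries (tokenized)
queries = [
    "cannot login to my account".split(),
    "reset my account password".split(),
    "invoice payment failed".split(),
]
vectors = tfidf_vectors(queries)
sims = [cosine_similarity(vectors[0], v) for v in vectors]
```

Centroids (from k-means/k-medoids) would then be ranked for a new query using the same similarity measure.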

Designed a web-based betting game-cum-experiment using HTML, CSS and JavaScript, consisting of 4 types of trials corresponding to low and high cognitive sample size in the two settings (hedonic and utilitarian). Used the logistic function to simulate outcomes of bets, and applied the Box-Muller method to generate a random Gaussian sample used to calculate each player's percentile based on the total amount after every bet. Computed mean reaction times and performed outlier detection using the IQR method in MATLAB; utilitarian choices appear to converge for both sample sizes while hedonic ones diverge from each other.
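The Box-Muller transform and percentile calculation above can be sketched as follows. This is a Python illustration of the technique (the experiment itself used JavaScript); the parameters and function names are chosen for the example.

```python
import math
import random

def box_muller(n, mu=0.0, sigma=1.0, seed=42):
    """Generate n Gaussian samples from uniform draws via Box-Muller."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        u1 = 1.0 - rng.random()   # in (0, 1], so log(u1) is defined
        u2 = rng.random()
        r = math.sqrt(-2.0 * math.log(u1))
        z0 = r * math.cos(2.0 * math.pi * u2)
        z1 = r * math.sin(2.0 * math.pi * u2)
        samples.extend([mu + sigma * z0, mu + sigma * z1])
    return samples[:n]

def percentile_of(value, sample):
    """Percentile rank of a player's total within the Gaussian sample."""
    return 100.0 * sum(1 for x in sample if x <= value) / len(sample)

# e.g. a reference population of totals with mean 100 and sd 15
sample = box_muller(10000, mu=100.0, sigma=15.0)
```

Each pair of uniform draws yields two independent standard normal values, which are then scaled and shifted.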

Research Projects


Used Word2vec word embeddings trained from scratch on the corpus, as well as pre-trained BERT word embeddings. Classified the embeddings using distance metrics (L2, L1), cosine similarity and simple classifiers (Logistic Regression, Naïve Bayes). Word2vec with Naïve Bayes and BERT with Logistic Regression gave the best test accuracy of 88% and a mean rank of 1.2 (FinSim'2020 shared task).

Obtained memes annotated by the organizers, having both text and image modalities. Implemented a CNN (image-only) and an ANN (text-only), combined using SVM, Random Forest and a neural network. Fine-tuned BERT (text-only) and a Multimodal Bitransformer (text+image); also used Naïve Bayes (text-only). ANN+Word2vec performed best, beating the baseline Macro-F1 by 63% on the sentiment analysis task under Memotion Analysis (Task 8, SemEval'2020).

Explored different algorithms from the extreme multi-label classification repository (manikvarma.org). Parabel, PfastreXML and FastXML were the top picks based on performance on most evaluation metrics. Trained the three on a dataset of 10k users, represented by ~16k features, with ~3k labels of interest. FastXML performed best overall, with a model size of 1.42 MB, prec@1 of 80.4% and prec@3 of 49.7% on the test dataset.

Extracted Twitter data (~10M tweets) filtered by the keyword 'tradewar' from January 2018 (the beginning of the US-China trade war). Performed preprocessing: lowercasing, stop-word removal, and replacing emoticons with words. Trained Naïve Bayes, Maximum Entropy and Decision Tree classifiers; Naïve Bayes gave the best accuracy (~74%) and the least training time on a sample. Classified tweets as positive/negative; the classification probabilities gave a polarity score for each tweet. Aggregated polarity scores into a daily trend, thereby constructing a subjective proxy for policy decisions.
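The aggregation step above, from per-tweet classification probabilities to a daily polarity trend, can be sketched as follows. This is a minimal illustration under the assumption that polarity is taken as P(positive) minus P(negative); the sample data and function names are invented for the example.

```python
from collections import defaultdict

def polarity(p_positive):
    """Map a classifier's positive-class probability to a score in [-1, 1]."""
    return 2.0 * p_positive - 1.0

def daily_trend(tweets):
    """Average per-tweet polarity scores into a daily series.

    tweets: iterable of (date_string, p_positive) pairs.
    """
    buckets = defaultdict(list)
    for date, p in tweets:
        buckets[date].append(polarity(p))
    return {d: sum(v) / len(v) for d, v in sorted(buckets.items())}

# illustrative classifier outputs
tweets = [
    ("2018-01-22", 0.9),   # confidently positive tweet
    ("2018-01-22", 0.2),   # fairly negative tweet
    ("2018-01-23", 0.1),
]
trend = daily_trend(tweets)
```

The resulting daily series is what serves as the subjective proxy for policy sentiment.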

Obtained seasonally adjusted real US GDP (quarterly) from the U.S. Bureau of Economic Analysis. The autocorrelation function and the Dickey-Fuller and Phillips-Perron tests supported first differencing for stationarity. Used a 9:1 train-test split (due to high prediction time) with walk-forward validation (rolling forecasts). Used MSE for tuning and model evaluation; ARIMA(1,1,1) performed best among the econometric models. An LSTM showed a ~23% reduction in test MSE with 2 neurons and 18% with just 1 (batch size = 1, epochs = 10).
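The walk-forward (rolling forecast) evaluation above can be sketched in plain Python. As a stand-in for the fitted ARIMA/LSTM, this sketch uses a naive last-value forecast; the series and function names are illustrative, not the project's data.

```python
def walk_forward_mse(series, train_frac=0.9, forecast=lambda h: h[-1]):
    """Walk-forward (rolling one-step-ahead) evaluation with MSE.

    At each test step the model sees all history up to that point,
    forecasts one step ahead, and then the true value is appended
    to the history before the next step.
    """
    split = int(len(series) * train_frac)
    history = list(series[:split])
    errors = []
    for actual in series[split:]:
        pred = forecast(history)          # refit/predict would go here
        errors.append((actual - pred) ** 2)
        history.append(actual)            # roll the window forward
    return sum(errors) / len(errors)

# toy quarterly series; 9:1 split leaves one test point
series = [100, 102, 101, 105, 107, 110, 112, 115, 118, 121]
mse = walk_forward_mse(series)
```

In practice the `forecast` argument would refit an ARIMA or LSTM on `history` at each step, which is what makes rolling forecasts slow on long test sets.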

Implemented PCA and NMF from scratch and performed dimensionality reduction on MNIST digits for K = 2, 3. Plotted the transformed data in 2D and 3D, along with the basis vectors and reconstructed samples for the two cases. Used NMF for image classification and obtained an accuracy of around 80% on test data for K = 3.
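The from-scratch PCA above can be sketched with NumPy: center the data, form the covariance matrix, and project onto the top-K eigenvectors. This is a minimal sketch on synthetic data (a stand-in for MNIST); the function names are chosen for the example.

```python
import numpy as np

def pca(X, k):
    """PCA from scratch: project X (n_samples x n_features) onto the
    top-k eigenvectors of the sample covariance matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    components = eigvecs[:, ::-1][:, :k]      # top-k, descending variance
    return Xc @ components, components, mean

rng = np.random.default_rng(0)
# synthetic data stretched strongly along the first feature axis
X = rng.normal(size=(200, 5)) * np.array([10.0, 1.0, 1.0, 1.0, 1.0])
Z, components, mean = pca(X, k=2)

# reconstruction from the K-dimensional representation
X_rec = Z @ components.T + mean
```

Reconstruction quality depends on how much variance the top-K components capture, which is what the plotted reconstructed samples visualize.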

Contact

  vkeswani@iitk.ac.in
  keswanivishal1997@gmail.com
  LinkedIn
  Facebook
  Ajmer, Rajasthan

About me

Hi, I am Vishal, a dual-degree student at IIT Kanpur with a minor in Machine Learning. I work in Natural Language Processing, Machine Learning and Econometrics, aspire to build a career in these fields, and have completed many relevant projects and courses. My projects span Sentiment Analysis, Recommendation Systems, Image Processing, Matrix Factorization, Time Series Forecasting and more. If you want to collaborate with me, please visit the Contact section.

I have worked in Python, MATLAB, R, C and C++, depending on course and project requirements. I am also familiar with web technologies, including HTML, CSS, JavaScript and SQL. I have used many Python packages, including TensorFlow, PyTorch, Pm4Py, NumPy, Pandas, NLTK, Keras, TextBlob and scikit-learn. I am comfortable with both Linux and Windows, and have taken a course in bash scripting.

I took part in international challenges under SemEval-2020 and FinNLP-2020, in which my team stood first. Check out the corresponding PUBLICATIONS.

Resume

CV

Academics

Extra-curriculars
