VISHAL KESWANI

DATA SCIENTIST


I hear and I forget. I see and I remember. I do and I understand. ~ Confucius

Internship Projects


Streamlined the pipeline: converting raw images to TFRecords, training, and converting checkpoints to TFLite format. Customized object detection (COCO dataset) and face detection (Open Images dataset) for Raspberry Pi 4. Trained a quantized MobileNet V2 on the LFW dataset and user faces, achieving face-recognition accuracy of up to 93%. Linked the face detection and recognition systems, using frame averaging and thresholding for robust predictions.

Obtained Word documents containing chat threads between clients and care engineers (CEs). Extracted e-mail bodies to CSV, followed by tokenization, stop-word removal, stemming and lowercasing. Used keyword search to build a repository of client queries and CE questions, and mapped the two to each other. Vectorized each row using tf-idf scores and used the vectors to compute cosine similarity. Performed k-means and k-medoids clustering on client queries; for a new query, ranked the centroids by similarity. Reported inquiry questions for the first 3 clusters (based on the mapping) in decreasing order of similarity.
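The tf-idf vectorization and cosine-similarity step above can be sketched in plain Python. This is a minimal illustration, not the project code: the sample queries and function names are invented for the example, and real pipelines would use a library such as scikit-learn.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute sparse tf-idf vectors for a list of tokenized documents."""
    n = len(docs)
    # document frequency: number of docs each term appears in
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log(n / df[t]) for t in df}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (tf[t] / len(doc)) * idf[t] for t in tf})
    return vectors

def cosine_similarity(u, v):
    """Cosine similarity between two sparse (dict) vectors."""
    dot = sum(u[t] * v.get(t, 0.0) for t in u)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# illustrative client queries (tokenized)
queries = [
    "cannot login to my account".split(),
    "reset my account password".split(),
    "invoice payment failed".split(),
]
vectors = tfidf_vectors(queries)
sims = [cosine_similarity(vectors[0], v) for v in vectors]
```

Centroids (from k-means/k-medoids) would then be ranked for a new query using the same similarity measure.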

Designed a web-based betting game-cum-experiment using HTML, CSS and JavaScript, consisting of 4 types of trials corresponding to low and high cognitive sample size in the two settings (hedonic and utilitarian). Used the logistic function to simulate outcomes of bets, and applied the Box-Muller method to generate a random Gaussian sample used to calculate each player's percentile based on the total amount after every bet. Computed mean reaction times and performed outlier detection using the IQR method in MATLAB; utilitarian choices appear to converge for both sample sizes while hedonic ones diverge from each other.
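The Box-Muller transform and percentile calculation above can be sketched as follows. This is a Python illustration of the technique (the experiment itself used JavaScript); the parameters and function names are chosen for the example.

```python
import math
import random

def box_muller(n, mu=0.0, sigma=1.0, seed=42):
    """Generate n Gaussian samples from uniform draws via Box-Muller."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        u1 = 1.0 - rng.random()   # in (0, 1], so log(u1) is defined
        u2 = rng.random()
        r = math.sqrt(-2.0 * math.log(u1))
        z0 = r * math.cos(2.0 * math.pi * u2)
        z1 = r * math.sin(2.0 * math.pi * u2)
        samples.extend([mu + sigma * z0, mu + sigma * z1])
    return samples[:n]

def percentile_of(value, sample):
    """Percentile rank of a player's total within the Gaussian sample."""
    return 100.0 * sum(1 for x in sample if x <= value) / len(sample)

# e.g. a reference population of totals with mean 100 and sd 15
sample = box_muller(10000, mu=100.0, sigma=15.0)
```

Each pair of uniform draws yields two independent standard normal values, which are then scaled and shifted.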

Research Projects


Used Word2vec word embeddings trained from scratch on the corpus, as well as pre-trained BERT word embeddings. Classified the embeddings using distance metrics (L2, L1), cosine similarity and simple classifiers (Logistic Regression, Naïve Bayes). Word2vec with Naïve Bayes and BERT with Logistic Regression gave the best test accuracy of 88% and a mean rank of 1.2 (FinSim'2020 shared task).

Obtained memes annotated by the organizers, having both text and image modalities. Implemented a CNN (image-only) and an ANN (text-only), combined using SVM, Random Forest and a neural network. Fine-tuned BERT (text-only) and a Multimodal Bitransformer (text+image); also used Naïve Bayes (text-only). ANN+Word2vec performed best, beating the baseline Macro-F1 by 63% on the sentiment analysis task under Memotion Analysis (Task 8, SemEval'2020).

Explored different algorithms from the extreme multi-label classification repository (manikvarma.org). Parabel, PfastreXML and FastXML were the top picks based on performance on most evaluation metrics. Trained the three on a dataset of 10k users, represented by ~16k features, with ~3k labels of interest. FastXML performed best overall, with a model size of 1.42 MB, prec@1 of 80.4% and prec@3 of 49.7% on the test dataset.

Extracted Twitter data (~10M tweets) filtered by the keyword 'tradewar' from January 2018 (the beginning of the US-China trade war). Performed preprocessing: lowercasing, stop-word removal, and replacing emoticons with words. Trained Naïve Bayes, Maximum Entropy and Decision Tree classifiers; Naïve Bayes gave the best accuracy (~74%) and the least training time on a sample. Classified tweets as positive/negative; the classification probabilities gave a polarity score for each tweet. Aggregated polarity scores into a daily trend, thereby constructing a subjective proxy for policy decisions.
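The aggregation step above, from per-tweet classification probabilities to a daily polarity trend, can be sketched as follows. This is a minimal illustration under the assumption that polarity is taken as P(positive) minus P(negative); the sample data and function names are invented for the example.

```python
from collections import defaultdict

def polarity(p_positive):
    """Map a classifier's positive-class probability to a score in [-1, 1]."""
    return 2.0 * p_positive - 1.0

def daily_trend(tweets):
    """Average per-tweet polarity scores into a daily series.

    tweets: iterable of (date_string, p_positive) pairs.
    """
    buckets = defaultdict(list)
    for date, p in tweets:
        buckets[date].append(polarity(p))
    return {d: sum(v) / len(v) for d, v in sorted(buckets.items())}

# illustrative classifier outputs
tweets = [
    ("2018-01-22", 0.9),   # confidently positive tweet
    ("2018-01-22", 0.2),   # fairly negative tweet
    ("2018-01-23", 0.1),
]
trend = daily_trend(tweets)
```

The resulting daily series is what serves as the subjective proxy for policy sentiment.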

Obtained seasonally adjusted real US GDP (quarterly) from the U.S. Bureau of Economic Analysis. The autocorrelation function and the Dickey-Fuller and Phillips-Perron tests supported first differencing for stationarity. Used a 9:1 train-test split (due to high prediction time) with walk-forward validation (rolling forecasts). Used MSE for tuning and model evaluation; ARIMA(1,1,1) performed best among the econometric models. An LSTM showed a ~23% reduction in test MSE with 2 neurons and 18% with just 1 (batch size = 1, epochs = 10).
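The walk-forward (rolling forecast) evaluation above can be sketched in plain Python. As a stand-in for the fitted ARIMA/LSTM, this sketch uses a naive last-value forecast; the series and function names are illustrative, not the project's data.

```python
def walk_forward_mse(series, train_frac=0.9, forecast=lambda h: h[-1]):
    """Walk-forward (rolling one-step-ahead) evaluation with MSE.

    At each test step the model sees all history up to that point,
    forecasts one step ahead, and then the true value is appended
    to the history before the next step.
    """
    split = int(len(series) * train_frac)
    history = list(series[:split])
    errors = []
    for actual in series[split:]:
        pred = forecast(history)          # refit/predict would go here
        errors.append((actual - pred) ** 2)
        history.append(actual)            # roll the window forward
    return sum(errors) / len(errors)

# toy quarterly series; 9:1 split leaves one test point
series = [100, 102, 101, 105, 107, 110, 112, 115, 118, 121]
mse = walk_forward_mse(series)
```

In practice the `forecast` argument would refit an ARIMA or LSTM on `history` at each step, which is what makes rolling forecasts slow on long test sets.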

Implemented PCA and NMF from scratch and performed dimensionality reduction on MNIST digits for K = 2, 3. Plotted the transformed data in 2D and 3D, along with the basis vectors and reconstructed samples for the two cases. Used NMF for image classification and obtained an accuracy of around 80% on test data for K = 3.
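The from-scratch PCA above can be sketched with NumPy: center the data, form the covariance matrix, and project onto the top-K eigenvectors. This is a minimal sketch on synthetic data (a stand-in for MNIST); the function names are chosen for the example.

```python
import numpy as np

def pca(X, k):
    """PCA from scratch: project X (n_samples x n_features) onto the
    top-k eigenvectors of the sample covariance matrix."""
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / (len(X) - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    components = eigvecs[:, ::-1][:, :k]      # top-k, descending variance
    return Xc @ components, components, mean

rng = np.random.default_rng(0)
# synthetic data stretched strongly along the first feature axis
X = rng.normal(size=(200, 5)) * np.array([10.0, 1.0, 1.0, 1.0, 1.0])
Z, components, mean = pca(X, k=2)

# reconstruction from the K-dimensional representation
X_rec = Z @ components.T + mean
```

Reconstruction quality depends on how much variance the top-K components capture, which is what the plotted reconstructed samples visualize.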

Contact

  vkeswani@iitk.ac.in
  keswanivishal1997@gmail.com
  LinkedIn
  Facebook
  Ajmer, Rajasthan

About me

Hi, I am Vishal, a dual-degree student at IIT Kanpur with a minor in Machine Learning. I work in Natural Language Processing, Machine Learning and Econometrics, aspire to build a career in these fields, and have completed many relevant projects and courses. My projects span Sentiment Analysis, Recommendation Systems, Image Processing, Matrix Factorization, Time Series Forecasting and more. If you want to collaborate with me, please visit the Contact section.

I have worked in Python, MATLAB, R, C and C++, depending on course and project requirements. I am also familiar with web technologies, including HTML, CSS, JavaScript and SQL. I have used many Python packages, including TensorFlow, PyTorch, Pm4Py, NumPy, Pandas, NLTK, Keras, TextBlob and scikit-learn. I am comfortable with both Linux and Windows, and have taken a course in bash scripting.

I took part in international challenges under SemEval-2020 and FinNLP-2020, in which my team stood first. Check out the corresponding PUBLICATIONS.

Resume

CV

Academics

Extra-curriculars
