Open Source
Over the last few years, I have developed several open-source packages that are widely used by the community. You can find an overview of these projects and packages here.
BERTopic
BERTopic is a novel topic modeling technique that leverages BERT embeddings and c-TF-IDF to create dense clusters, allowing for easily interpretable topics.
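As a rough illustration of the c-TF-IDF idea, the sketch below scores terms per cluster instead of per document. It is not BERTopic's implementation: the function name `c_tf_idf`, the whitespace tokenization, the toy clusters, and the exact weighting (in-class term frequency times the log of one plus the average words per class divided by the term's overall frequency) are assumptions made for illustration, and the clustering of BERT embeddings is skipped entirely.

```python
import math
from collections import Counter


def c_tf_idf(docs_per_class):
    """Score terms per class. `docs_per_class` maps a label to a list of strings.

    Hypothetical helper for illustration only; not part of BERTopic.
    """
    # Treat all documents in a class as one long document and count its terms.
    class_counts = {
        label: Counter(" ".join(docs).lower().split())
        for label, docs in docs_per_class.items()
    }

    # Frequency of every term across all classes.
    total_counts = Counter()
    for counts in class_counts.values():
        total_counts.update(counts)

    # Average number of words per class.
    avg_words = sum(total_counts.values()) / len(class_counts)

    scores = {}
    for label, counts in class_counts.items():
        n_words = sum(counts.values())
        scores[label] = {
            # In-class term frequency, down-weighted for terms common everywhere.
            term: (freq / n_words) * math.log(1 + avg_words / total_counts[term])
            for term, freq in counts.items()
        }
    return scores


# Toy clusters (assumed data); "great" occurs in both, so it is less distinctive.
clusters = {
    "sports": ["team wins football match", "player scores great goal"],
    "cooking": ["bake great bread", "stir hot soup"],
}
scores = c_tf_idf(clusters)
top_sports = sorted(scores["sports"], key=scores["sports"].get, reverse=True)[:3]
```

Terms that occur in only one cluster end up with a higher score than terms shared across clusters, which is what makes the top-scoring words usable as a topic description.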
PolyFuzz
PolyFuzz performs fuzzy string matching and string grouping, and contains extensive evaluation functions.
PolyFuzz is meant to bring fuzzy string matching techniques together within a single framework.
KeyBERT
KeyBERT is a minimal and easy-to-use keyword extraction technique that leverages BERT embeddings to create keywords and keyphrases that are most similar to a document.
Concept
Concept introduces the concept of Concept Modeling. It takes inspiration from topic modeling techniques to cluster images, find commonalities (i.e. concepts) and create a multimodal representation.
SoAn
Created a package that allows for in-depth analyses (sentiment analysis, topic modeling, etc.) of WhatsApp conversations.
VLAC
VLAC leverages clusters of word embeddings to create features from a collection of documents, which can then be used to classify those documents.
ReinLife
Using reinforcement learning, entities learn to survive, reproduce, and maximize the fitness of their kin.
Analyses
A small selection of projects and analyses I did early in my career to further develop my Data Science skills. It's nice to see where I started and where I ended up.
Reviewer
A package for scraping user reviews from IMDb, generating c-TF-IDF-based word clouds, and extracting popular characters from reviews.
Disney
Tournament brackets are generated based on a seed score calculated from data scraped from IMDb and Rotten Tomatoes.
Hurdle Model
Used Apple Store data to analyze which aspects of a business model (entry timing and technological innovation) influence the performance of mobile games.
Boardgame Exploration
Created an application for exploring board game matches that I tracked over the last year. Streamlit and Heroku were used for deployment.