Data Science Projects

DBSCAN-Clustering

An article explaining how the DBSCAN clustering algorithm works, using RNA sequences from patients with various tumor types (written in pt_BR).
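For readers who want the gist before opening the article: DBSCAN grows clusters outward from "core" points that have at least `min_pts` neighbors within distance `eps`, and labels points that no cluster can reach as noise. The following is a minimal pure-Python sketch of that idea, not the code from the article; the function names, the toy 2-D points, and the parameter values are illustrative assumptions (real RNA expression data would have far more dimensions).

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (including i)."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE (-1)."""
    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # points[i] is a core point: start a new cluster and expand it
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also core: keep growing
    return labels


# Toy data (hypothetical): two tight squares and one isolated point.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

This version rescans every point for each neighborhood query, which is fine for a demonstration but quadratic in the number of points; library implementations typically use a spatial index instead.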

Yelp Heatmap

A project demonstrating how Apache Spark can be used to manipulate data, with the end goal of generating a heatmap from user reviews.
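The aggregation behind a heatmap usually has a map-then-reduce-by-key shape, which is what makes it a natural fit for Spark. The sketch below shows that shape in plain Python rather than Spark, so it runs without a cluster. It assumes each review carries the latitude and longitude of the reviewed business, which the project description does not state; `bin_reviews`, `cell_size`, and the sample coordinates are hypothetical.

```python
from collections import Counter
from math import floor


def to_cell(lat, lon, cell_size):
    """Map step: snap a coordinate to the integer index of its grid cell."""
    return (floor(lat / cell_size), floor(lon / cell_size))


def bin_reviews(reviews, cell_size=0.5):
    """Reduce step: count how many reviews fall into each grid cell."""
    counts = Counter()
    for lat, lon in reviews:
        counts[to_cell(lat, lon, cell_size)] += 1
    return counts


# Hypothetical (lat, lon) pairs, one per review.
reviews = [(36.1, -115.2), (36.2, -115.1), (36.7, -115.2), (33.4, -112.0)]
counts = bin_reviews(reviews)
for cell, n in sorted(counts.items()):
    print(cell, n)
```

The per-cell counts are the heatmap intensities. In a Spark version the same two steps would typically become a `map` to `(cell, 1)` pairs followed by a `reduceByKey` that sums them.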

Comedy is Dead

A study of the comedy genre through the years using graph theory. It uses an IMDb API to download the movie data automatically and pandas to organize it (article in pt_BR).

Forest Cover Classifier

This repo hosts an analysis of the "Forest Cover Type" dataset, comparing the results of a Random Forest classifier with hyperparameter tuning against an SGD classifier.