Pushap Gandhi Portfolio

DonorsChoose.org Application Screening

Project Overview

Established in 2000, DonorsChoose.org empowers public school teachers nationwide to request essential materials and experiences for their students. The organization receives a very large number of project proposals each year and currently needs a large number of volunteers to manually review and approve submissions before they can be featured on the DonorsChoose.org website.

The objective is to develop predictive algorithms capable of determining whether a DonorsChoose.org project proposal submitted by a teacher will be approved.

View Dataset

GitHub Repository

Performance Metric

Submissions are evaluated on the area under the Receiver Operating Characteristic (ROC) curve (AUC), computed between the predicted probability of approval and the observed approval outcome. An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of approved proposals above rejected ones.
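As a rough illustration of what this metric measures, the sketch below computes AUC from scratch using the rank-sum identity. The function name `roc_auc` and the toy labels and scores are made up for this example; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used instead.

```python
def roc_auc(y_true, y_score):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)

    # Assign 1-based average ranks so that tied scores share the same rank.
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")

    # Sum of ranks of the positive (approved) examples.
    pos_rank_sum = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


# Toy example: 1 = approved, 0 = not approved (illustrative values only).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))
```

The value can be read as the probability that a randomly chosen approved proposal receives a higher score than a randomly chosen rejected one, with ties counted as half.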

Technologies Used

Project Details

The problem is formulated as binary classification, where '0' denotes a not-accepted project proposal and '1' denotes an accepted one. Three distinct approaches are employed to address this challenge: Naive Bayes, Decision Tree, and Gradient Boosting Decision Trees (GBDT, using XGBoost).

Approach 1: Naive Bayes

For this approach, the following operations are performed:

Featurization:

Set 1: categorical and numerical features + preprocessed essay (BOW)

Set 2: categorical and numerical features + preprocessed essay (TF-IDF)

Hyperparameter tuning

Training with the best hyperparameters
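The steps above could be sketched roughly as follows with scikit-learn. This is a minimal illustration, not the project's actual code: the toy essays, the single scaled numeric column, the choice of `alpha` as the tuned hyperparameter, and the grid values are all assumptions made for the example.

```python
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB

# Toy stand-ins for the real data (hypothetical values, for illustration only).
essays = [
    "students need books for reading",
    "we need new science kits",
    "my students love reading books",
    "science kits help students learn",
    "requesting tablets for games",
    "games and snacks for the class party",
    "tablets for fun games",
    "snacks for the party",
]
# One numeric feature scaled to [0, 1]; MultinomialNB requires non-negative inputs.
price_scaled = [[0.2], [0.3], [0.1], [0.4], [0.9], [0.8], [0.95], [0.7]]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

# Set 1 style featurization: bag-of-words essay counts stacked with numeric features.
# In a real workflow the vectorizer should be fitted on the training split only.
vectorizer = CountVectorizer()
X_text = vectorizer.fit_transform(essays)
X = hstack([X_text, csr_matrix(price_scaled)]).tocsr()

# Hyperparameter tuning: pick the smoothing strength by cross-validated ROC AUC.
search = GridSearchCV(
    MultinomialNB(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0]},
    scoring="roc_auc",
    cv=2,
)
search.fit(X, labels)
best_alpha = search.best_params_["alpha"]

# GridSearchCV refits on all supplied data with the best setting by default.
model = search.best_estimator_
approval_prob = model.predict_proba(X)[:, 1]
```

Replacing `CountVectorizer` with `TfidfVectorizer` would give a Set 2 style featurization with the rest of the sketch unchanged.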

Results

Set 1

Set 2

Approach 2: Decision Tree

For this approach, the following operations are performed:

Featurization:

Set 1: categorical and numerical features + preprocessed essay (BOW)

Set 2: categorical and numerical features + preprocessed essay (TF-IDF)

Hyperparameter tuning

Training the model with the best hyperparameters

Selecting the features that have non-zero feature importance

Training the model again on only these features
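The feature-selection steps above might look roughly like this with scikit-learn. The synthetic data and the hyperparameter values (`max_depth=3`, `min_samples_split=10`) are placeholders chosen for the example and are not taken from the project.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic data (illustrative only): 6 features, of which only columns 0 and 2
# carry any signal about the label.
rng = np.random.default_rng(0)
X = rng.random((200, 6))
y = (X[:, 0] + X[:, 2] > 1.0).astype(int)

# Step 1: train a tree with the (assumed) best hyperparameters.
tree = DecisionTreeClassifier(max_depth=3, min_samples_split=10, random_state=0)
tree.fit(X, y)

# Step 2: keep only the columns the tree actually used in a split,
# i.e. those with non-zero feature importance.
keep = np.flatnonzero(tree.feature_importances_ > 0)
X_reduced = X[:, keep]

# Step 3: retrain on the reduced feature set.
final_tree = DecisionTreeClassifier(max_depth=3, min_samples_split=10, random_state=0)
final_tree.fit(X_reduced, y)

print("kept feature indices:", keep.tolist())
```

With sparse BOW or TF-IDF matrices, most columns typically end up with zero importance, so this step can shrink the feature space considerably before retraining.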

Results

Set 1

Set 2

Approach 3: XGBoost

For this approach, the following operations are performed:

Featurization:

Set 1: categorical features (response coding, using probability values), numerical features + project title (TF-IDF) + essay (TF-IDF) + essay sentiment score

Set 2: categorical features (response coding, using probability values), numerical features + project title (TF-IDF-weighted W2V) + preprocessed essay (TF-IDF-weighted W2V)

Hyperparameter tuning

Training with the best hyperparameters
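Two of the featurization ideas named above, response coding and TF-IDF-weighted word vectors, can be sketched in plain Python as follows. All function names, the fallback of (0.5, 0.5) for unseen categories, and the toy inputs are assumptions made for this illustration; the project's own implementation may differ.

```python
from collections import Counter, defaultdict


def fit_response_coding(categories, labels):
    """Learn (P(y=0 | category), P(y=1 | category)) from training rows only."""
    counts = defaultdict(Counter)
    for cat, y in zip(categories, labels):
        counts[cat][y] += 1
    table = {}
    for cat, c in counts.items():
        total = c[0] + c[1]
        table[cat] = (c[0] / total, c[1] / total)
    return table


def apply_response_coding(categories, table, default=(0.5, 0.5)):
    """Replace each category with its probability pair; unseen ones get `default`."""
    return [table.get(cat, default) for cat in categories]


def tfidf_weighted_vector(tokens, idf, word_vectors, dim):
    """Average word vectors, weighting each word by term frequency times IDF."""
    tf = Counter(tokens)
    total = [0.0] * dim
    weight_sum = 0.0
    for word, count in tf.items():
        # Skip words missing from either the IDF table or the embedding table.
        if word in idf and word in word_vectors:
            w = (count / len(tokens)) * idf[word]
            total = [t + w * v for t, v in zip(total, word_vectors[word])]
            weight_sum += w
    return [t / weight_sum for t in total] if weight_sum else total


# Toy usage (hypothetical values).
train_states = ["CA", "CA", "NY", "CA"]
train_labels = [1, 0, 1, 1]
table = fit_response_coding(train_states, train_labels)
encoded = apply_response_coding(["NY", "TX"], table)

doc_vec = tfidf_weighted_vector(
    ["books", "science"],
    idf={"books": 1.0, "science": 3.0},
    word_vectors={"books": [1.0, 0.0], "science": [0.0, 1.0]},
    dim=2,
)
```

Fitting the response-coding table on the training split only, and applying it unchanged to validation and test rows, avoids leaking label information into the features.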

Results

Set 1

Set 2

Get in Touch

Connect with me through the following platforms: