
Showing posts from September, 2014

Great article on the difference between L1 and L2 regularization.

I find this to be a pretty complex topic, but I think that this article explains the differences very intuitively. What is the simple intuition? I am by no means an expert, but here is the basic idea:

1. L1 regularization (a penalty on the sum of the absolute values of the weights) produces sparse solutions, and therefore has built-in feature selection
2. L2 regularization (a penalty on the sum of the squared weights) does not

Suppose you had 6 weights, and your L1 regularization term had a choice between a few sparse weights, or many smaller weights:

L1 = |0|+|0|+|0|+|-5|+|0|+|1.4| = 6.4
OR
L1 = |1.2|+|1.3|+|.8|+|2.4|+|1.8|+|1.4| = 8.9

In this case, your optimization algorithm would converge towards the solution with a few non-zero (sparse) weights, because the absolute value term gives it the smaller penalty.

Now let's take a look at the situation with the L2-norm:

L2 = 0^2+0^2+0^2+(-5)^2+0^2+1.4^2 = 26.96
OR
L2 = 1.2^2+1.3^2+.8^2+2.4^2+1.8^2+1.4^2 = 14.73 ...
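As a quick check of the arithmetic, here is a minimal R sketch that computes both penalties for the two weight vectors in the example (the first L1 sum comes to 6.4). The names `sparse_w`, `dense_w`, `l1_penalty` and `l2_penalty` are my own, chosen only for illustration.

```r
# Two candidate weight vectors from the example
sparse_w <- c(0, 0, 0, -5, 0, 1.4)            # a few non-zero weights
dense_w  <- c(1.2, 1.3, 0.8, 2.4, 1.8, 1.4)   # many smaller weights

# L1 penalty: sum of absolute values of the weights
l1_penalty <- function(w) sum(abs(w))

# L2 penalty: sum of squared weights
l2_penalty <- function(w) sum(w^2)

l1_penalty(sparse_w)  # 6.4
l1_penalty(dense_w)   # 8.9
l2_penalty(sparse_w)  # 26.96
l2_penalty(dense_w)   # 14.73
```

The L1 penalty is smaller for the sparse vector, while the L2 penalty is smaller for the vector with many small weights, because squaring punishes the single large weight of -5 heavily.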

Simple logistic regression model and ROC Curve with R + intuitive explanation of ROC curve

I recently discovered a cool R package called pROC that is very convenient for making ROC curves. Here is an example of how to use it in R after creating a model. I've included the entire process: splitting the data into training and test sets, fitting the model, validating it by predicting on the test set, and finally drawing the ROC curve.

    library(sqldf)
    library(ggplot2)
    library(reshape)
    library(pROC)

    setwd('/home/willie/data_science/XXXXX')
    system('ls')

    ### generate training and test set using 80/20 split ###
    splitdf <- function(dataframe, seed=NULL) {
      if (!is.null(seed)) set.seed(seed)
      index <- 1:nrow(dataframe)
      # sample 20% of the row indices; these rows become the test set
      trainindex <- sample(index, trunc(length(index)/5))
      trainset <- dataframe[-trainindex, ]
      testset <- dataframe[trainindex, ]
      list(trainset=trainset, testset=testset)
    }

    splits <- splitdf(mydata, seed = 888)
    training_data <- splits$trainset
    test_data <- splits$testset

    ### create a logistic regression model m...
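Since the excerpt is cut off before the model is fitted, here is a minimal, self-contained sketch of the same workflow from split to ROC curve. It is not the original post's model: the data are simulated, and the names `sim_data`, `x`, `y`, `fit`, `pred` and `roc_obj` are assumptions made purely for illustration. It uses `glm()` from base R and `roc()`, `auc()` and `plot()` from pROC.

```r
library(pROC)

# Simulated data (an assumption; stands in for the post's `mydata`)
set.seed(888)
n <- 500
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.5 + 1.5 * x))
sim_data <- data.frame(x = x, y = y)

# 80/20 split: hold out 20% of the rows as the test set
test_idx <- sample(seq_len(n), trunc(n / 5))
training_data <- sim_data[-test_idx, ]
test_data <- sim_data[test_idx, ]

# Fit a logistic regression model on the training set
fit <- glm(y ~ x, family = binomial, data = training_data)

# Predicted probabilities for the held-out test set
pred <- predict(fit, newdata = test_data, type = "response")

# Build the ROC curve: 0 is the control level, 1 is the case level,
# and higher predicted probabilities indicate cases (direction "<")
roc_obj <- roc(response = test_data$y, predictor = pred,
               levels = c(0, 1), direction = "<")

auc(roc_obj)   # area under the curve
plot(roc_obj)  # draw the ROC curve
```

Setting `levels` and `direction` explicitly avoids relying on pROC's automatic detection, which can silently flip the curve when a model performs worse than chance.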