Programing/R- programming

AI Performance metrics CI 구해주는 R 코드

sosal 2024. 7. 22. 02:15
반응형

매번 똑같은거 Metrics 구해놓고

CI 다시 계산한다고 시간을 보내고 있는 나를 발견....

 

그냥 블로그에 올려놓고 생각나면 편하게 와서 긁어 쓰려구 포스팅

 

함수 정의

 
library(boot)
library(Epi)
library(pROC)

calc_metrics <- function(data, indices) {
    d <- data[indices, ]
    tp <- sum(d$predictions == 1 & d$label == 1)
    tn <- sum(d$predictions == 0 & d$label == 0)
    fp <- sum(d$predictions == 1 & d$label == 0)
    fn <- sum(d$predictions == 0 & d$label == 1)
   
    acc <- (tp + tn) / (tp + tn + fp + fn)
    sens <- tp / (tp + fn)
    spec <- tn / (tn + fp)
    ppv <- tp / (tp + fp)
    npv <- tn / (tn + fn)
   
    return(c(acc, sens, spec, ppv, npv))
}

get_CI <- function(final, nPos, nNeg){
    tp <- round(final$sens * nPos, 0)
    fn <- round(nPos - tp,0)
    tn <- round(final$spec * nNeg,0)
    fp <- round(nNeg - tn,0)
    data_boot <- data.frame(predictions = c(rep(1, tp + fp), rep(0, tn + fn)),
                                            label = c(rep(1, tp), rep(0, fp), rep(1, fn), rep(0, tn)))
    results <- boot(data_boot, statistic = calc_metrics, R = 1000)
   
    target_metrics = c("acc", "sens", "spec", "ppv", "npv")
    for(i in 1:length(target_metrics)){
        if(final[target_metrics[i]] == 1) CI = c(1.000, 1.000)
        else CI = boot.ci(results, type = "perc", index = i)$percent[4:5]
        final[target_metrics[i]] = paste0(round(final[target_metrics[i]], 3), " (", round(CI[1], 3),"-", round(CI[2], 3), ")")
    }
    return(final)
}

getMetrics <- function(data){
    res = Epi::ROC(data$pred, data$label==1, plot=F)
    AUC = round(res$AUC, 3)
    AUC_CI = pROC::ci(pROC::roc(data$label==1 ~ data$pred))
    AUC = paste0(AUC, ' (', round(AUC_CI[1], 3),'-',round(AUC_CI[3], 3),')')
    res = data.frame(res$res)
    res$npv = 1 - res$pvp
    res$ppv = 1 - res$pvn
    res$AUC = AUC
    res$acc = (res$sens * sum(data$label == 1) + res$spec * sum(data$label == 0)) / nrow(data)
    final = res[which.max(res$sens + res$spec), c("AUC", "acc", "sens", "spec", "ppv", "npv")]
    nPos = sum(data$label==1)
    nNeg = sum(data$label==0)
    return(get_CI(final, nPos, nNeg))}
 

 

데이터 정의

data = data.frame(
    label = sample( c(0, 1), 300, replace=T),
    pred <- runif(300, min = 0, max = 1))

getMetrics(data)

 

 

AUC는 roc의 CI 활용,

Accuracy, Sensitivity, Specificity, PPV, NPV는 Bootstrap으로 계산

 

잘 동작함!