Dear Sawtooth Experts,
Data collection for my study is complete, and I want to conduct in-sample and out-of-sample validation for my CVA study.
I had 15 CVA questions plus 1 test-retest reliability question (identical to the first CVA question). I also included 3 holdout choice tasks, each with 3 alternatives.
My overall results seem good and highlighted some interesting things.
I ran a simple test-retest reliability check in SPSS on the paired CVA tasks; the correlation was significant, suggesting that reliability is quite good.
However, I am really concerned about my sample validation.
My sample size is 424, which is not large.
First, for MAE, I obtained the following results:
(I estimated utilities with HB-OLS and used RFC/Genetic for simulation; since none of my attributes were price related, I applied correlated errors to all attributes in the study.)
In sample (n = 424)

Predicted   Actual   |Error|
39.00%      44.10%    5.10%
29.00%      23.60%    5.40%
32.00%      32.30%    0.30%
Holdout 1 MAE: 3.60%

35.70%      22.60%   13.10%
31.50%      25.50%    6.00%
32.80%      51.90%   19.10%
Holdout 2 MAE: 12.73%

30.10%      20.80%    9.30%
33.50%      54.20%   20.70%
36.40%      25.00%   11.40%
Holdout 3 MAE: 13.80%
Out of sample
(random split using SPSS, resulting in subsamples of 207 and 217)

Predicted (n = 217) vs Actual (n = 207)

Predicted   Actual   |Error|
39.10%      43.50%    4.40%
28.70%      21.30%    7.40%
32.20%      35.30%    3.10%
Holdout 1 MAE: 4.97%

34.50%      21.70%   12.80%
32.60%      29.00%    3.60%
32.90%      49.30%   16.40%
Holdout 2 MAE: 10.93%

29.80%      19.80%   10.00%
33.00%      55.60%   22.60%
37.20%      24.60%   12.60%
Holdout 3 MAE: 15.07%

Predicted (n = 207) vs Actual (n = 217)

Holdout 1 MAE: 4.07%
Holdout 2 MAE: 14.23%
Holdout 3 MAE: 12.83%
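For anyone checking the arithmetic, each holdout MAE above is just the mean of the absolute differences between predicted and actual shares. A minimal sketch (using the in-sample Holdout 1 numbers; the function name is my own):

```python
def mae(predicted, actual):
    """Mean absolute error between predicted and actual shares, in percentage points."""
    assert len(predicted) == len(actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)

# In-sample Holdout 1 shares (percent): predicted vs actual
predicted = [39.00, 29.00, 32.00]
actual = [44.10, 23.60, 32.30]
print(round(mae(predicted, actual), 2))  # -> 3.6
```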
As presented, only Holdout 1 seems alright, and validation is quite poor for Holdouts 2 and 3, if I have understood the technical papers and other forum postings correctly. Am I right? Is there a good reference range for MAE? For CBC studies with 3 alternative products, an MAE of around 4-5 seems to be considered acceptable. I wonder what could have caused this, and what I should tell my audience rather than simply saying my conjoint model was not very successful. I would really appreciate your help and advice in figuring this out.
Secondly, although hit rate is not as good a measure as MAE or MSE, I still wanted to check it, so this is what I did:
In the Sawtooth Software choice simulator, with HB-OLS utilities, I selected the "First Choice" rule for each scenario and ran the simulation at the individual level (producing Individual Results) to get a 1/0 hit metric. I exported the results to Excel and compared them with the actual choices for the 3 holdout tasks.
Thus, for hit rate, I obtained the following results:
              Hits / n   Hit rate
Holdout set 1  156/424    36.79%
Holdout set 2  165/424    38.92%
Holdout set 3  191/424    45.05%
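For what it's worth, the hit-rate calculation described above reduces to counting matches between the simulator's first-choice prediction and each respondent's actual holdout choice. A minimal sketch (the toy data and names are my own):

```python
def hit_rate(predicted, actual):
    """Share of respondents whose predicted first choice matches their actual choice."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)

# Toy example: 4 respondents choosing among alternatives 1-3
predicted = [1, 2, 3, 1]
actual = [1, 3, 3, 2]
print(hit_rate(predicted, actual))  # -> 0.5

# Holdout set 1 above: 156 hits out of 424 respondents
print(round(156 / 424 * 100, 2))  # -> 36.79
```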
For 3-4 product alternatives, a hit rate of 55-75% is the usual range, so I am quite sure my range of 36.79% to 45.05% is not good enough.
Given the MAE and hit rate results, I am considering reporting this as a limitation of the conjoint portion of my study: that my conjoint model should be interpreted with caution given its relatively low hit rates and high MAE.
At this point, data collection is already complete, and these are the actual results of my study.
Is there any other advice or suggestion for this issue?
I have looked through many peer-reviewed journal articles using conjoint analysis, and it seems most researchers omit reporting the reliability and validity of their conjoint analysis. However, I want to do as much as I can and be transparent in my study.
Any comments are appreciated.
I really appreciate your taking the time to look into my lengthy questions.
In- and Out of- sample validation results using MAE and Hit Rate