There's a mistake in the paper: Equation 4 marginalizes over z_t
incorrectly. It should be

p(I \mid \theta) = \prod_t \int p(I_t, z_t \mid \theta) \, dz_t
                 = \prod_t \int \prod_{w,j} p(I_w(p_{jt}), z_t, W_{wjt} \mid \theta) \, dz_t

Working through the derivation, the only effect is that the prior gets
far too much weight (in effect, its variance is reduced). We haven't
tried fixing it yet.
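Why over-weighting the prior acts like shrinking its variance: if, purely for illustration, the prior on z_t is Gaussian N(mu, sigma^2) and it effectively gets raised to a power k > 1, then N(z; mu, sigma^2)^k is proportional to N(z; mu, sigma^2 / k). The sketch below checks this numerically on a grid. The Gaussian form, the exponent k, and the helper name tempered_gaussian_variance are assumptions made for this example, not the paper's model.

```python
import math


def tempered_gaussian_variance(mu, sigma, k, n_grid=20001):
    """Variance of N(z; mu, sigma^2)^k after renormalization, on a grid.

    Hypothetical helper: assumes a 1-D Gaussian prior whose density is
    (wrongly) raised to the power k instead of entering once.
    """
    half_width = 10.0 * sigma
    step = 2.0 * half_width / (n_grid - 1)
    zs = [mu - half_width + i * step for i in range(n_grid)]
    # k * log N(z; mu, sigma^2), dropping constants that cancel when normalizing
    weights = [math.exp(-0.5 * k * ((z - mu) / sigma) ** 2) for z in zs]
    total = sum(weights)
    mean = sum(w * z for w, z in zip(weights, zs)) / total
    return sum(w * (z - mean) ** 2 for w, z in zip(weights, zs)) / total


if __name__ == "__main__":
    sigma = 2.0
    for k in (1, 4, 16):
        # grid estimate vs. the closed form sigma^2 / k
        print(k, tempered_gaussian_variance(0.0, sigma, k), sigma ** 2 / k)
```

Each grid estimate should agree closely with sigma**2 / k: the more weight the prior gets, the narrower it becomes.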