Bob randomly chooses to measure each photon in one of two bases: the horizontal-vertical basis or the diagonal ($\pm 45^\circ$) basis. The former can distinguish a horizontal photon from a vertical one, while the latter can distinguish a $+45^\circ$ photon from a $-45^\circ$ photon.
What Bob doesn't know is whether he measured a given photon in the same basis Alice used to prepare it. For example, Bob may measure a $-45^\circ$ photon with the horizontal-vertical detector, in which case the outcome is random: horizontal or vertical with equal probability, which tells him nothing about the bit Alice sent.
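To make the "unreliable result" concrete, here is a minimal sketch of a single measurement. It is a toy classical model, not a quantum simulation: the function name `measure`, the basis labels `"+"` (horizontal-vertical) and `"x"` (diagonal), and the 0/1 bit encoding are all assumptions made for illustration.

```python
import random

def measure(photon_basis, photon_bit, detector_basis, rng):
    """Toy model of Bob's detector (hypothetical helper, not a real API).

    If the detector basis matches the basis the photon was prepared in,
    the encoded bit is read back exactly. Otherwise the outcome is a
    fair coin flip that carries no information about the encoded bit.
    """
    if photon_basis == detector_basis:
        return photon_bit
    return rng.randint(0, 1)

rng = random.Random(0)

# A diagonal photon encoding bit 1, measured 1000 times in the wrong basis.
wrong_basis_results = [measure("x", 1, "+", rng) for _ in range(1000)]
ones = sum(wrong_basis_results)
print(f"wrong basis: {ones} ones out of 1000")

# The same photon measured in the matching basis always returns the bit.
print("right basis:", measure("x", 1, "x", rng))
```

With the mismatched detector roughly half of the results come out as 1 and half as 0, whereas the matching detector always returns the encoded bit.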
That is why, as you may notice in the second-to-last line of your diagram, there is a "compatibility" property. After sending all the necessary photons to Bob, Alice uses a classical channel to tell Bob which basis she used for each photon, and Bob tells her for each one whether he used the compatible detector or not.
Since the bases themselves don't reveal the orientation of the photon (for example, knowing it was the horizontal-vertical basis doesn't tell you whether the photon was sent with horizontal or vertical polarization), there is in theory no security problem in doing that. Alice and Bob then throw away every photon value that was not sent and measured in the same basis (such photons are incompatible). They are left only with the photon values (bits) that are identical on both sides (the same $1/0$ value), which they can then use as a key.
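The sifting step described above can be sketched in a few lines of Python. This is only an illustrative toy model under stated assumptions: no eavesdropper, no channel noise, a mismatched basis modeled as a fair coin flip, and hypothetical names (`bb84_sift`, the basis labels `"+"` for horizontal-vertical and `"x"` for diagonal).

```python
import random

def bb84_sift(n_photons, seed=None):
    """Toy sifting run (hypothetical helper): returns (alice_key, bob_key)."""
    rng = random.Random(seed)

    # Alice picks a random bit and a random basis for each photon.
    alice_bits = [rng.randint(0, 1) for _ in range(n_photons)]
    alice_bases = [rng.choice("+x") for _ in range(n_photons)]

    # Bob independently picks a random basis for each measurement.
    bob_bases = [rng.choice("+x") for _ in range(n_photons)]

    # Matching basis: Bob reads Alice's bit. Mismatch: random outcome.
    bob_bits = [
        bit if a_basis == b_basis else rng.randint(0, 1)
        for bit, a_basis, b_basis in zip(alice_bits, alice_bases, bob_bases)
    ]

    # Over the classical channel they compare bases only (never bits)
    # and keep the positions where the bases agree.
    keep = [i for i in range(n_photons) if alice_bases[i] == bob_bases[i]]

    alice_key = [alice_bits[i] for i in keep]
    bob_key = [bob_bits[i] for i in keep]
    return alice_key, bob_key

alice_key, bob_key = bb84_sift(32, seed=1)
print("kept", len(alice_key), "of 32 photons")
print("keys identical:", alice_key == bob_key)
```

On average about half of the photons survive sifting, since Bob's random basis matches Alice's with probability $1/2$, and in this idealized model the two surviving keys always agree.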
Of course, if very few photons are sent, an attacker listening on the photon channel might get lucky and orient his detector exactly as Bob did for every photon, and hence recover the key as well. With $n$ kept photons and independent random guesses, the chance of that is $(1/2)^n$, so in general such a protocol becomes more secure the more photons are sent. There are probably many other, more complicated attacks, involving weak measurements and so on, but that is a separate topic.
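For the lucky-guess scenario only, a rough calculation shows how quickly the attacker's odds shrink. The sketch below assumes the attacker picks each basis independently and uniformly at random; the function name `prob_all_bases_guessed` is made up for illustration, and it says nothing about the more sophisticated attacks mentioned above.

```python
def prob_all_bases_guessed(n_kept):
    """Chance of guessing the right basis for every one of n_kept photons,
    assuming independent fair guesses between the two bases."""
    return 0.5 ** n_kept

for n in (1, 8, 64, 256):
    print(f"{n:>3} kept photons: {prob_all_bases_guessed(n):.3e}")
```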