I don't have any insight into Koyo's DL implementation, but I suspect they keep a second copy of the image register to implement edges. That would give the "once everywhere" behavior rather than the "once per instruction" behavior that Do-more has. We considered that approach, but with Do-more's larger image register the performance hit was unacceptable. I'd actually like theirs better conceptually if the hit weren't so bad, but engineering is all about values and trade-offs.
And wow...I just realized I completely misspoke about our implementation. We don't use workspace for that, because it doesn't work right during bumpless run mode updates. We started with that and weren't happy with the results. Instead we use edge memory, which DmD assigns on a per-instruction basis. The behavior is the same, but where we store the prior state is different. The critical issue is run mode update behavior...it needs to work perfectly.
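
For anyone trying to picture the difference between the two behaviors, here is a rough Python sketch. It is only an illustration, not Koyo's or Do-more's actual code: the class names, the `slot` numbering, the 8-bit image size, and the initial "off" prior state are all made up for the example, and the shadow-copy model is just the guess about Koyo described above.

```python
class ShadowCopyEdges:
    """Hypothetical model: edges come from a full previous-scan copy
    of the image register ("once everywhere")."""

    def __init__(self, size):
        self.image = [False] * size
        self.prev = [False] * size  # second copy of the image register

    def rising(self, addr):
        # Any instruction that looks at addr this scan sees the same answer.
        return self.image[addr] and not self.prev[addr]

    def end_of_scan(self):
        # The whole image is copied every scan, so the cost grows
        # with the size of the image register.
        self.prev = list(self.image)


class PerInstructionEdges:
    """Hypothetical model: each edge instruction owns one bit of edge
    memory holding the state it saw last time it executed
    ("once per instruction")."""

    def __init__(self, size, num_edge_instructions):
        self.image = [False] * size
        self.edge_memory = [False] * num_edge_instructions

    def rising(self, slot, addr):
        current = self.image[addr]
        fired = current and not self.edge_memory[slot]
        self.edge_memory[slot] = current  # updated only when this instruction runs
        return fired


def run_demo():
    X0 = 0  # made-up input bit
    shadow = ShadowCopyEdges(size=8)
    per_instr = PerInstructionEdges(size=8, num_edge_instructions=2)
    shadow_log, per_instr_log = [], []

    for scan in range(5):
        bit = scan >= 1            # X0 turns on at scan 1 and stays on
        shadow.image[X0] = bit
        per_instr.image[X0] = bit

        # Instruction A executes every scan.
        if shadow.rising(X0):
            shadow_log.append(("A", scan))
        if per_instr.rising(0, X0):
            per_instr_log.append(("A", scan))

        # Instruction B sits in code that only starts executing at scan 3.
        if scan >= 3:
            if shadow.rising(X0):
                shadow_log.append(("B", scan))
            if per_instr.rising(1, X0):
                per_instr_log.append(("B", scan))

        shadow.end_of_scan()

    return shadow_log, per_instr_log


if __name__ == "__main__":
    shadow_log, per_instr_log = run_demo()
    print("shadow copy:    ", shadow_log)
    print("per instruction:", per_instr_log)
```

In this sketch the shadow-copy model fires only for A on scan 1, because by the time B starts executing the edge is long gone everywhere. The per-instruction model fires for A on scan 1 and again for B on scan 3, since B's own edge bit has never seen X0 on. The per-instruction model also never copies the image register, which is the performance trade-off mentioned above.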