## Circle floodfill ideas?

### Re: Circle floodfill ideas?

Ok so the constraints are that we have to work with MOS calls, which are what are slowing us down; so we need to minimise the number of OS calls we make!

I got the grouping of vertical line segments into single DRAWs working. Code. Hopefully as this version calls the OS less than half as many times as previously, the code might go twice as fast? I can't work out how to do 45-degree line segments, but vertical line segments was fairly straight forward.

Edit: Added version with alternating colours of vlines, for illustration.

I got the grouping of vertical line segments into single DRAWs working. Code. Hopefully as this version calls the OS less than half as many times as previously, the code might go twice as fast? I can't work out how to do 45-degree line segments, but vertical line segments was fairly straight forward.

Edit: Added version with alternating colours of vlines, for illustration.

Last edited by mike12f on Sat Nov 27, 2021 5:50 pm, edited 2 times in total.

### Re: Circle floodfill ideas?

Thinking about this more, I suspect there is a neat diagonal algorithm that could be found, that could cover more angles than just 0, 45 and 90 degrees. Basically, if the previous hline length on row y was DeltaX, and on the next y row we find we need another hline length of the same DeltaX, then we should hold off drawing the first one, and join them together into a single draw statement. This could be repeated for n consecutive DeltaX values on consecutive rows - group them into a single straight line DRAW. This strategy could cover vlines (i.e. where DeltaX is 0) and pure 45-degree diagonal lines (where DeltaX is 1 on each consecutive row) but also other gradients (e.g. where deltaX is 2 on each consecutive row). If we could program this then we'd be grouping a much larger number of OSWRCH calls into single calls, and hopefully largely nullify the OS-calling time overheads.

Edit: Concept-demo, not finished.

Version 2 (Seems to be working, but needs testing properly.)

Edit3: It's working pretty well but in this example the longest diagonal yellow line segment is a little gnarled, compared to an earlier version where the pixels do not line up exactly; I'm not sure why.

Edit: Concept-demo, not finished.

Version 2 (Seems to be working, but needs testing properly.)

Edit3: It's working pretty well but in this example the longest diagonal yellow line segment is a little gnarled, compared to an earlier version where the pixels do not line up exactly; I'm not sure why.

- TobyLobster
**Posts:**216**Joined:**Sat Aug 31, 2019 7:58 am-
**Contact:**

### Re: Circle floodfill ideas?

Perhaps related, I did just notice that if Exclusive OR plotting your example of Version 2 (replace eeach GCOL 0,X with GCOL 3,X) then there's a place where the same pixel is drawn twice.

### Re: Circle floodfill ideas?

Thanks. That GCOL3,X idea is a good diagnostic idea. For example, here is the behaviour on the vertical-line batch algorithm, which clearly was wrong. Here is an almost-fix to it (which I can fully fix if we decide to put that version into 6502 code).

I'll try to investigate and fix that GCOL3,X bug in the diagonal batcher algorithm next... Edit, Ok I see it's just on the line y=0, which is caused by the top-to-bottom reflection code. We'd have to add an IF statement somewhere to catch this, but that is do-able. I'll see if I can diagnose that gnarled bug next as this could affect which version we convert next to 6502 - the diagonal-batch version or the vline batch version. Would you mind converting the vline-batcher version (or the 45 degree version below) to see how fast it is? Can we beat the MOS ellipse routine for speed (even with that EOR at y=0 bug for now)?

Edit: I've investigated the "gnarled" bug and I think it's probably because of the way DRAW works, Unfortunately it does not break a line segment into equal-length horizontal chunks Example. This means it probably can't be trusted to do the trick I wanted it to do, except for lines which are oriented at strict multiples of 45 degrees. Hence this version is possibly the best I can do as it only aggregates line segments which are oriented at multiples of 45 degrees, and so hopefully avoids the gnarled edge problem.

I'll try to investigate and fix that GCOL3,X bug in the diagonal batcher algorithm next... Edit, Ok I see it's just on the line y=0, which is caused by the top-to-bottom reflection code. We'd have to add an IF statement somewhere to catch this, but that is do-able. I'll see if I can diagnose that gnarled bug next as this could affect which version we convert next to 6502 - the diagonal-batch version or the vline batch version. Would you mind converting the vline-batcher version (or the 45 degree version below) to see how fast it is? Can we beat the MOS ellipse routine for speed (even with that EOR at y=0 bug for now)?

Edit: I've investigated the "gnarled" bug and I think it's probably because of the way DRAW works, Unfortunately it does not break a line segment into equal-length horizontal chunks Example. This means it probably can't be trusted to do the trick I wanted it to do, except for lines which are oriented at strict multiples of 45 degrees. Hence this version is possibly the best I can do as it only aggregates line segments which are oriented at multiples of 45 degrees, and so hopefully avoids the gnarled edge problem.

- TobyLobster
**Posts:**216**Joined:**Sat Aug 31, 2019 7:58 am-
**Contact:**

### Re: Circle floodfill ideas?

I'll take a look at converting the latest to 6502.

That is unfortunate. However, the DRAW command does seem to split the line into two equal parts if the difference in Y is only one, and the length along X is even. (And similarly with X and Y reversed), e.g. here Perhaps we can use this?

### Re: Circle floodfill ideas?

TobyLobster wrote: ↑Mon Nov 29, 2021 2:11 pmI'll take a look at converting the latest to 6502.

That is unfortunate. However, the DRAW command does seem to split the line into two equal parts if the difference in Y is only one, and the length along X is even. (And similarly with X and Y reversed), e.g. here Perhaps we can use this?

I think we'll get diminishing returns from that, as it only covers fairly short lines (2 pixels upwards at a time). Also we were already starting to get diminishing returns chasing after fast routines to do more and more specific gradients - because on average something like 50% of lines we require will have a gradient with magnitude >1 (that is covered by all of the vline groupings), and for the lines with shallow gradients (<1) we could divide some of those up into gradients of 45 degrees (we've got those covered), and hline stubs (we've got those covered). The rest, I hope, are fairly infrequent. Also, my code is getting rather messy now, which is disappointing, so I'm not keen on adding any more edge cases. I think the code will need a tidy it up in future, and would welcome any wisdom there! But hopefully, as it is, we can get the time currently spent on MOS routines (55%) down significantly with the vline and 45 degree line groupings? Anyway if you can code it then great, but beware that the code may need tidying up again in the near future (hopefully only minor tweaks from then on?). Still hoping we can beat the Plot&C5 ellipse command for speed! Mike.

- TobyLobster
**Posts:**216**Joined:**Sat Aug 31, 2019 7:58 am-
**Contact:**

### Re: Circle floodfill ideas?

I've coded the latest version (in a 'diagonals' branch here) and sadly it's slightly slower than our previous version. I think the extra calculations required are cancelling out the benefit of fewer calls for the diagonal lines. Perhaps we have reached the limit of what is possible for "OS Legal". There are still some improvements that can speed up the 6502 implementation which I will think about.

- TobyLobster
**Posts:**216**Joined:**Sat Aug 31, 2019 7:58 am-
**Contact:**

### Re: Circle floodfill ideas?

Ah - possibly my code isn't working properly yet... I'm investigating.

### Re: Circle floodfill ideas?

Ok, thanks for trying. Fingers crossed we get an improvement.

I just ran your latest version at https://bbc.godbolt.org/?autoboot&disc= ... LLIPSE.SSD

Then typed MOVE 40,40, and it drew 2 coloured dots onto the screen. Very strange!

If we don't beat the built-in ellipse routine, then it's hats off to the original MOS programmers. They must have done it under more time-pressure than we have.

I just ran your latest version at https://bbc.godbolt.org/?autoboot&disc= ... LLIPSE.SSD

Then typed MOVE 40,40, and it drew 2 coloured dots onto the screen. Very strange!

If we don't beat the built-in ellipse routine, then it's hats off to the original MOS programmers. They must have done it under more time-pressure than we have.

- TobyLobster
**Posts:**216**Joined:**Sat Aug 31, 2019 7:58 am-
**Contact:**

### Re: Circle floodfill ideas?

Ok, I fixed it, and it has improved the speed. Latest timings show we now have about the same duration as the Acornsoft Graphics Extension Rom (GXR) as used in OWLET e.g. here, taking about 63/64 centiseconds for our test ellipse.

The GXR version (back in the day) was taken and improved to become the version in the Master MOS. The Master MOS version is still faster than ours (46 centiseconds for our test ellipse, as opposed to the 63 centiseconds).

Coloured ellipse, matching the colours of the BASIC version: Regular ellipse, slightly faster due to not changing colours: Source is here

The GXR version (back in the day) was taken and improved to become the version in the Master MOS. The Master MOS version is still faster than ours (46 centiseconds for our test ellipse, as opposed to the 63 centiseconds).

Coloured ellipse, matching the colours of the BASIC version: Regular ellipse, slightly faster due to not changing colours: Source is here

### Re: Circle floodfill ideas?

Great work. I measure 62 centi-seconds for the plain-colour version of your code, versus 63 for the PLOT command. That's not a big difference but we have the handicap of calling the OS repeatedly from our code which they don't. Still, I'm surprised we are still not thrashing it because I presume they are computing a 40bit multiplication at every step and a 40bit sqrt at every step? Is that the case for the OS routine? Plus, to rub salt into the wound, I suspect their code length will be shorter than ours (as they'll just reuse their pre-existing in multiplication and sqrt routine). Do you have a link to their ellipse code within your MOS dissasembly?

What proportion of the runtime of our latest version is now spent on OS calls? Previously it was 55% wasn't it?

I've looked at your assembly code and I can't see any speed-ups possible (although my BASIC source code could maybe be tweaked a bit to squeeze a bit more efficiency perhaps? For example, those *2 constants could be removed by rearranging the maths a little, but I think the benefit will be minimal.) Thank you for collaborating on this - it's been fun, even if it's not reaching the ManicMiner 2021 levels of glory.

What proportion of the runtime of our latest version is now spent on OS calls? Previously it was 55% wasn't it?

I've looked at your assembly code and I can't see any speed-ups possible (although my BASIC source code could maybe be tweaked a bit to squeeze a bit more efficiency perhaps? For example, those *2 constants could be removed by rearranging the maths a little, but I think the benefit will be minimal.) Thank you for collaborating on this - it's been fun, even if it's not reaching the ManicMiner 2021 levels of glory.

- TobyLobster
**Posts:**216**Joined:**Sat Aug 31, 2019 7:58 am-
**Contact:**

### Re: Circle floodfill ideas?

A while back I looked at disassembling the GXR ROM, but didn't get very far with the ellipse code before moving on to something else. However, they are doing something along the same lines we are with an integer based routine. The outline ellipse code starts at &B9EA for anyone wanting to look. I think there is what could be a 40bit x 40bit multiply in initialisation code used for both circles and ellipses at &BEF6. The ellipse code only draws individual pixels, not line segments, using an internal OS routine to do just that. So they do skip the overhead of calling the OS multiple times.

There's certainly ways to make our code shorter, at the expense of speed though. e.g. The two halves of the ellipse are essentially running the same code with two different sets of data. So this could be one routine. There's also the possibility of encoding the arithmetic commands as bytes of data that gets read and interpreted. That would save memory, but with the speed penalty of decoding the data.

I've only disassembled the standard BBC Micro MOS, and that doesn't have an ellipse routine. Only the GXR and Master MOS have ellipse routines, and I've not looked at the Master MOS.

Without the call the the OS it takes 26 centiseconds, 62 centiseconds with. So about 58%. I would have thought it would go down, being fewer calls to the OS, but I guess my optimisations to the 6502 code evened that percentage change out. But the overall time has come down, which is the goal.

As mentioned, a much faster version for games would be possible with poking directly to the screen memory, using more zero page memory etc. I'll leave that as an exercise for the reader. Also for the enthusiastic reader, further development includes filled ellipses, pattern filling, ellipse arcs, chords, etc.

Thank you - I've very much enjoyed it too.mike12f wrote: ↑Fri Dec 03, 2021 4:20 pmI've looked at your assembly code and I can't see any speed-ups possible (although my BASIC source code could maybe be tweaked a bit to squeeze a bit more efficiency perhaps? For example, those *2 constants could be removed by rearranging the maths a little, but I think the benefit will be minimal.) Thank you for collaborating on this - it's been fun, even if it's not reaching the ManicMiner 2021 levels of glory.

### Re: Circle floodfill ideas?

I've just ran a little investigation into the count of the number of DRAW commands called for the various versions of this algorithm (for that one particular ellipse we've been testing on):

With just aggregating hlines, there were 406 DRAW calls needed.

With aggregating vlines+hlines, there were 182 DRAW calls needed

With aggregating vlines+hlines+45degree diagonals, there were 140 DRAW calls needed

With aggregating vlines+hlines+any diagonal orientation (with the gnarled appearance bug), there are 132 DRAW calls needed.

So there clearly are diminishing returns appearing there. Just aggregating hlines+vlines gave the biggest bang for buck.

As calling the OS currently takes 34 centiseconds of the run time, and I don't think we can lower the number of DRAW calls much below 140 calls, we therefore can never reduce that 34 centi-seconds any further.

With just aggregating hlines, there were 406 DRAW calls needed.

With aggregating vlines+hlines, there were 182 DRAW calls needed

With aggregating vlines+hlines+45degree diagonals, there were 140 DRAW calls needed

With aggregating vlines+hlines+any diagonal orientation (with the gnarled appearance bug), there are 132 DRAW calls needed.

So there clearly are diminishing returns appearing there. Just aggregating hlines+vlines gave the biggest bang for buck.

As calling the OS currently takes 34 centiseconds of the run time, and I don't think we can lower the number of DRAW calls much below 140 calls, we therefore can never reduce that 34 centi-seconds any further.

- TobyLobster
**Posts:**216**Joined:**Sat Aug 31, 2019 7:58 am-
**Contact:**

### Re: Circle floodfill ideas?

It is possible to draw that example ellipse in 65 DRAW calls, here, but working out the exact lines to draw to remain a 'pixel perfect' ellipse in general is not known, at least not to me.

### Re: Circle floodfill ideas?

That's interesting. Is there a fast algorithm we could use to decide those points?