Starting a Prince of Persia port...

Got a programming project in mind? Tell everyone about it!
User avatar
Rich Talbot-Watkins
Posts: 1108
Joined: Thu Jan 13, 2005 5:20 pm
Location: Palma, Mallorca

Re: Starting a Prince of Persia port...

Postby Rich Talbot-Watkins » Tue Nov 14, 2017 1:49 pm

I still feel like there should be no need for the double buffer. Even if the player plot is 140 scanlines, and the save/restore background 40 scanlines each, that still comes within the frame budget - so with raster timing so that you start the erase when the beam hits the screen block to restore gives you 312 scanlines to erase, plot the player, and plot the foreground. The flicker problem will always arise if there's too much time between erasing and replotting the player - I have no idea if this would require big code restructuring, but you should always be aiming for minimal time between those two.

I haven't considered other animated things going on so far (e.g. an opponent) or moving/falling scenery tiles. If no bounding boxes are overlapping, these could also be erased/replotted in quick succession, minimising the flicker some more. Overlapping moving graphics would be a tricky problem (would involve handling them as a single batch - erase all then replot all) but not impossible to do.

As long as the player looks nice, I personally wouldn't be offended by a bit of flicker elsewhere (we all played and enjoyed Chuckie Egg after all!).

So this is my suggestion I guess: secret option #3 - single buffered, rework the low-level rendering code. Not saying it's an easy option, but, until convinced otherwise, I wouldn't rule it out just yet. The advantage of PoP is that, for the most part, there's only one moving sprite, with entirely static scenery throughout. You could still avoid the foreground scenery overplot by treating pixel bit 3 as 'foreground' which the sprites don't overwrite - I'm presuming that'd save you another 40 scanlines too.

crj
Posts: 225
Joined: Thu May 02, 2013 4:58 pm

Re: Starting a Prince of Persia port...

Postby crj » Tue Nov 14, 2017 4:18 pm

If there really is no way to make the code more compact and it really is only 2K you need, you could just shrink MODE 2 by a little instead of resorting to MODE 5? Shaving 32 pixels off the bottom would give you 2560 bytes. Or you could shave just 16 pixels and have 1280 bytes in each of main and shadow memory.

Alternatively, if I understand what you're up to properly, maybe you could use the whole of Hazel? You're not going to be double-buffering during disc accesses, so when you want to touch disc copy your code out of Hazel into the spare display buffer and re-initialise ADFS.

Of course, if you don't mind adding extra hardware such as a second processor, or a cartridge stuffed with RAM life becomes easy. Or, actually, more in the spirit of retro kit, ship the game as a sideways ROM that presents as one 16K bank but is actually much larger and pages bits of itself in and out as needed.

User avatar
kieranhj
Posts: 500
Joined: Sat Sep 19, 2015 10:11 pm
Location: Farnham, Surrey, UK

Re: Starting a Prince of Persia port...

Postby kieranhj » Tue Nov 14, 2017 5:21 pm

crj wrote:If there really is no way to make the code more compact and it really is only 2K you need, you could just shrink MODE 2 by a little instead of resorting to MODE 5? Shaving 32 pixels off the bottom would give you 2560 bytes. Or you could shave just 16 pixels and have 1280 bytes in each of main and shadow memory.

Unfortunately I need closer to 8K based on my estimations. There is around 2K sloshing about in the system but very hard to utilise as in small chunks spread across all the banks. It could be possible to shave a few columns off the left & right side of the screen to save a bit of RAM, although will add some complication to the clipping as the background plotting all assumes unclipped for speed (moving sprites would all be OK.) Ditto top & bottom - it's 192 lines across 3 corridors so 64 lines per corridor, so could shave a bit off the top/bottom or remap the data to be <64 lines.

crj wrote:Alternatively, if I understand what you're up to properly, maybe you could use the whole of Hazel? You're not going to be double-buffering during disc accesses, so when you want to touch disc copy your code out of Hazel into the spare display buffer and re-initialise ADFS.

It looks like on a Master pages &C000 & C100 of HAZEL contain the DFS catalog. As JGH noted page &C200 gets trashed by DFS during OSWORD &7F. It could be possible to use these 3x pages for data during level gameplay that isn't carried over between level load. Page &DF00 contains FS configuration so presuming (?) file loading will be unhappy if I trash this.

crj wrote:Of course, if you don't mind adding extra hardware such as a second processor, or a cartridge stuffed with RAM life becomes easy. Or, actually, more in the spirit of retro kit, ship the game as a sideways ROM that presents as one 16K bank but is actually much larger and pages bits of itself in and out as needed.

The C64 version did this, as I understand it. My goal is to create a single DFS floppy disc that runs on a vanilla BBC Master. I believe this is possible, just a question of feature tradeoff. :)

User avatar
kieranhj
Posts: 500
Joined: Sat Sep 19, 2015 10:11 pm
Location: Farnham, Surrey, UK

Re: Starting a Prince of Persia port...

Postby kieranhj » Tue Nov 14, 2017 5:44 pm

Rich Talbot-Watkins wrote:I still feel like there should be no need for the double buffer. Even if the player plot is 140 scanlines, and the save/restore background 40 scanlines each, that still comes within the frame budget - so with raster timing so that you start the erase when the beam hits the screen block to restore gives you 312 scanlines to erase, plot the player, and plot the foreground. The flicker problem will always arise if there's too much time between erasing and replotting the player - I have no idea if this would require big code restructuring, but you should always be aiming for minimal time between those two.

I have no doubt that you're right - it should be possible to make the player flicker free with raster timing and other tricks (like foreground bit 3) but all my investigations and experiments feel like I'm fighting the original architecture of the code, which is set up to be double buffered. The moment I turned it on was like a revelation - once I'd done the hard graft of refactoring memory usage it just worked and is pleasingly rock solid. I know I'm being mean as I haven't shared it yet for everyone to judge for themselves but I'm finding it hard to go back to any flicker after getting this far! (I still have a bug with the buffer swapping implementation as occasionally there is a small bit of tearing.)

Rich Talbot-Watkins wrote:I haven't considered other animated things going on so far (e.g. an opponent) or moving/falling scenery tiles. If no bounding boxes are overlapping, these could also be erased/replotted in quick succession, minimising the flicker some more. Overlapping moving graphics would be a tricky problem (would involve handling them as a single batch - erase all then replot all) but not impossible to do.

As long as the player looks nice, I personally wouldn't be offended by a bit of flicker elsewhere (we all played and enjoyed Chuckie Egg after all!).

Yeah, the player is definitely the most important thing on the screen, which is why I'm also loathed to reduce the resolution of the sprites. The draw loop is already sorted and batched into passes to avoid the overlapping sprites problem (which will occur whenever the player has a sword fight with a guard) so the big low-level rendering change would be to split this up to do localised batches, i.e. erase and redraw everything (in layers) affected by the region surrounding each animated object in turn, combining the regions together for overlapping sprites. I will continue to ponder this possibility but looking at how much there is still to port I'm finding it hard to continue down that path, being honest.

Rich Talbot-Watkins wrote:So this is my suggestion I guess: secret option #3 - single buffered, rework the low-level rendering code. Not saying it's an easy option, but, until convinced otherwise, I wouldn't rule it out just yet. The advantage of PoP is that, for the most part, there's only one moving sprite, with entirely static scenery throughout. You could still avoid the foreground scenery overplot by treating pixel bit 3 as 'foreground' which the sprites don't overwrite - I'm presuming that'd save you another 40 scanlines too.

Definitely not easy. :wink: It might have been easier to start with the PC rendering engine, if it wasn't in C, as the Apple II version has many quirks that are a consequence of the strange artefact colour hi-res mode. For instance the foreground components are drawn twice with the second layer being ORA'd on top of the first to alter the pixel parity. This doesn't make sense for MODE 2 but trying to surgically alter that part of the code has resulted in various unintended rendering artifacts whenever I've messed with it.

crj
Posts: 225
Joined: Thu May 02, 2013 4:58 pm

Re: Starting a Prince of Persia port...

Postby crj » Tue Nov 14, 2017 8:35 pm

kieranhj wrote:all my investigations and experiments feel like I'm fighting the original architecture of the code, which is set up to be double buffered

OK. Having put on my 8-bit Acorn hacker hat and suggested tricks for scavenging more RAM from the Master, I'll now take it off and instead put on my computer sciencist hat...

I've not seen this code, but I assume that there's a large pool of sprites to plot, and a relatively small number of routines which do that plotting. I also assume that the amount of foreground is a small proportion of the toal screen area.

If both those conditions are true, you could get sneaky. When plotting a sprite, don't actually write the data to screen memory. Instead, have two instances of a data structure that is:
  • A linear buffer of blocks (screen address+size+mask+data) to be written to the screen
  • A binary min-heap of pointers to those blocks, sorted by screen address
For each frame, do the following:
  • Perform all your sprite writes into the this-frame structure
  • Wait for Vsync
  • Commit the this-frame structure to screen in screen-address order.
  • As you commit each block, save the data you overwrite as a block in the next-frame structure
  • Swap which structure you're using as this-frame and which as next-frame
That should be a (very) efficient way to get a mixture of removing the previous frame's sprites and applying this frame's sorted into a flicker-free order. And the resulting data structure would probably be a lot smaller than a second bank of screen memory.

The commit routine would be the only thing that had to access screen memory. If you placed the structures in low memory, you could run most of the time with shadow memory in place, and code anywhere could add things to the this-frame structure.

If that basically worked but wasn't compact enough, you could finesse things:
  • Put the this-frame and next-frame heaps in the same block of memory, one ascending and the other descending.
  • Only store masks when necessary.
  • Directly refer to sprites rather than copying them into the buffer whenever they're suitably aligned and in appropriate memory.

User avatar
kieranhj
Posts: 500
Joined: Sat Sep 19, 2015 10:11 pm
Location: Farnham, Surrey, UK

Re: Starting a Prince of Persia port...

Postby kieranhj » Wed Nov 15, 2017 5:05 pm

crj wrote:OK. Having put on my 8-bit Acorn hacker hat and suggested tricks for scavenging more RAM from the Master, I'll now take it off and instead put on my computer sciencist hat...

I've not seen this code, but I assume that there's a large pool of sprites to plot, and a relatively small number of routines which do that plotting. I also assume that the amount of foreground is a small proportion of the toal screen area.

If both those conditions are true, you could get sneaky. When plotting a sprite, don't actually write the data to screen memory. Instead, have two instances of a data structure that is...

Hey crj, thank you for the suggestions. This is very interesting and could potentially lend itself to the way that PoP structures its drawing code anyway. Everything is placed into image (display) lists by the gameplay code as required then a separate rendering function (in a separate memory bank) performs the plot operations and is the only code that actually touches the screen memory. I will continue to ponder how this approach might fit in and help matters.

The challenge I have is that to reduce flicker I need to restore, store & plot the player (moving) sprites as quickly as possible, so in a localised manner, rather than doing everything in passes. To do this I would need to flag all plot operations that overlap the bounding box of the moving sprite - I do have the bounding information easily available. In order to keep the screen looking correct I still need to obey the plot operation order so - restore, wipes, background, store, moving (mid) sprites, foreground. This could be possible on a localised way but is more complicated (and lengthy) than just a restore, store, plot.

Lastly any overlapping moving sprites would have to be combined together into a single group - this happens regularly and I believe all the operations will take longer than a single frame to complete, bringing the flicker back.

E.g. player standing in front of a torch on the wall sword fighting a guard who is standing behind a pillar. (This happens on level 1.) The swords are both considered separate moving objects, believe it or not, so there would be 4x mid sprites all overlapping. To ensure all plotting happens correctly the order has to be: restore x4, plot background (torch frame), store + plot mid sprites left to right x4, plot foreground pillar.

The only way I see any of the single frame solutions coming close to working would be to not draw foreground sprites ever so would have to use the top-bit as foreground approach. I did investigate this previously but struggled to implement last time; I will take another look.

User avatar
tricky
Posts: 1875
Joined: Tue Jun 21, 2011 8:25 am
Contact:

Re: Starting a Prince of Persia port...

Postby tricky » Wed Nov 15, 2017 6:31 pm

You and stardot got a mention on "The Retro Hour" podcast, they seemed very impressed that PoP could be done on a beeb and said that they should do more episodes on the beeb as it was such a big part of the UK retro computer history.

sbadger
Posts: 218
Joined: Mon Mar 25, 2013 1:12 pm
Location: Farnham, Surrey

Re: Starting a Prince of Persia port...

Postby sbadger » Wed Nov 15, 2017 8:33 pm

tricky wrote:You and stardot got a mention on "The Retro Hour" podcast, they seemed very impressed that PoP could be done on a beeb and said that they should do more episodes on the beeb as it was such a big part of the UK retro computer history.


Not come across this before.. sounds good. What episode was the one you mention?
Thanks
A3020 | BBCBx3 | Electrn | Masterx3 | RPix3
A600 | C64 bbin x2| C64 C | Toastrack | XB360 | GB | GBC | GBA | GBASP | DS | 3DS XL x2| MD | MS
Atari 7600 | PS1-2-3-4 | PSP | Vita | SNES | GC | N64 | Wii & U | Switch | Jamma Cab | Sony PVMx2

crj
Posts: 225
Joined: Thu May 02, 2013 4:58 pm

Re: Starting a Prince of Persia port...

Postby crj » Thu Nov 16, 2017 3:10 am

kieranhj wrote:The challenge I have is that to reduce flicker I need to restore, store & plot the player (moving) sprites as quickly as possible, so in a localised manner, rather than doing everything in passes. To do this I would need to flag all plot operations that overlap the bounding box of the moving sprite

OK. Here's an alternative:

Allocate a bunch of 64-byte blocks. I'm not sure how many you'd need, but since you want to claw back 8Kbytes you'd probably be hoping to need less than about 10K's worth. The blocks must be at even addresses; putting them on 64-byte boundaries kinda makes sense, though. When I talk about a block address being "null", I mean the high byte is zero.

There are several kinds of block.

A free block is not currently in use. It simply contains a pointer to the next free block. (If the pointer is null, it's the last free block.)

A revert block represents a 32-byte fragment of screen memory which will need to be reverted to the background image during the next update. The first 32 bytes are the background data; the second 32 bytes are unused

A modify block represents a 32-byte fragment of screen memory which is to be changed. The first 32 bytes are the background data; the second 32 the data to be put there

A tree block represents a kilobyte of screen memory. It contains 32 entries, one for each 32 bytes of that kilobyte. (32*32=1024, of course.) The entry is:
  • Null: no block for this 32 bytes
  • Odd address: addr & ~1 is a revert block
  • Even address: a modify block
At the root, have a dedicated 60-byte structure. For each kilobyte of screen memory, have a count and an address. The count is the number of revert and/or modify blocks in that kilobyte, and address is a tree block if, and only if count>0. If count=0, the address is null.

So. You're preparing the next frame. If you want a byte to be written to screen memory at address addr next frame, do something like:

Code: Select all

ByteAddr getAddressOf(ScreenAddr addr) {
  k = addr>>1024; // Kilobyte
  f = (addr>>5)&0x1F; // Fragment
  b = addr & 0x1F; // Byte within fragment

  tBlock = rootNode[k].tBlock;
  if (!tBlock) {
    tBlock = getFreeBlock();
    fillWithZeros(tNode);
    rootNode[k].tNode = tNode;
    rootNode[k].count = 0;
  }

  mBlock = tBlock[f];
  if (!mBlock) {
    mBlock = getFreeBlock();
    ++rootNode[k].count;
    copyScreenToBackground(mBlock,k,f);
    mBlock ^= 1;
  }

  if (mBlock&1) {
    mBlock |= 1;
    copyBackgroundToForeground(mBlock);
    tBlock[f] = mBlock;
  }

  return mBlock[b];
}

Block getFreeBlock() {
  assert(first_free); // Otherwise, we're out of memory!
  result = first_free;
  free_head = result.next_free;
}


This could obviously be optimised when writing consecutive bytes by keeping hold of mBlock and tBlock and skipping a lot of checks. The amount of state you need to hold is small, so you could comfortably simultaneously keep state for writing to several adjacent rows of screen memory if that helped with sprite plotting. (NB: Every row would step into a new modify block simultaneously, but they'd step into new tree blocks at different positions.)

Having prepared a frame:

Code: Select all

function frameToScreen() {
  waitForVsync();
  for (k=0; k<20; ++k) {
    if (tBlock = rootNode[k].tBlock) {
      for (f=0; f<32; ++f) {
        if (block = tBlock[f]) {
          if (block&1) {
            rBlock = block & ~1;
            copyBackgroundToScreen(rBlock, k, f);
            releaseBlock(rBlock);
            tBlock[f] = NULL;
            if (--rootNode[k].count==0) {
              releaseBlock(tBlock);
              rootNode[k] = NULL;
            }
          } else {
            mBlock = block;
            copyForegroundToScreen(mBlock, k, f);
            mBlock |= 1;
          }
        }
      }
    }
  }
}

function releaseBlock(block) {
  block.next_free = first_free;
  first_free = block;
}


Note I've no idea if that can be represented efficiently enough in 6502 assembler, but I'm not seeing anything inherently tricky for an 8-bit CPU to do.

crj
Posts: 225
Joined: Thu May 02, 2013 4:58 pm

Re: Starting a Prince of Persia port...

Postby crj » Thu Nov 16, 2017 3:51 am

Note, incidentally, that this is almost equivalent to triple-buffering. At any given moment, you have access to what's on screen, the background without any foreground sprites, and what's going to be on screen next frame.


Return to “projects”

Who is online

Users browsing this forum: No registered users and 2 guests