# -------------------------------------------------------------------
# OBJECTPROCESSOR          (c) Copyright 1995-1996 Nat! & KKP
# -------------------------------------------------------------------
# These are some of the results/guesses that Klaus and Nat! found
# out about the Jaguar. Since we are not under NDA or anything from
# Atari we feel free to give this to you for educational purposes
# only. Thanks to NEUROMANCER for many worthy corrections and
# the GPU-object info.
#
# Please note, that this is not official documentation from Atari
# or derived work thereof (both of us have never seen the Atari docs)
# and Atari isn't connected with this in any way.
#
# Please use this informationphile as a starting point for your own
# exploration and not as a reference. If you find anything inaccurate,
# missing, needing more explanation etc. by all means please write
# to us:
#    nat@zumdick.rhein-main.de
# or
#    kkp@gamma.dou.dk
#
# If you could do us a small favor, don't use this information for
# those lame flame-wars on r.g.v.a or the mailing list.
#
# HTML soon ?
# -------------------------------------------------------------------
# $Id: op.html,v 1.28 1997/03/30 02:27:13 nat Exp $
#
# If there are two theories I put the more likely one first.
# -------------------------------------------------------------------

Things to know about the Objectprocessor (OP):
==============================================

-1 Imagine a phrase being an entity of 64 bits (or 8 bytes for that
   matter).

0. The object list is a linked list.

1. The object list is traversed by the object processor for each!
   scanline.

2. The Objectprocessor probably works like this:

   Whenever a new linebuffer needs to be filled, the OP is called
   to do its chore, while the videosystem is busy displaying the
   other linebuffer.
   The OP does its work by traversing the objectlist and interpreting
   each object in sequence. Each object has per linebuffer the chance
   ONCE to fill the linebuffer. (Note that this does not necessarily
   mean per scanline, since with the special HDB2-mode it can happen
   that two linebuffers are used for each scanline!) It fills the
   linebuffer at a specified horizontal position for a specified width.
   The data in the linebuffer is always overwritten (except when the
   Read-Modify-Write bit is set). If the active object has the
   transparent bit set, it will not overwrite values in the linebuffer
   when its source pixel has the value zero. The 'transparency' check
   is done before looking up the pixel's color in the CLUT
   (1 - 256 color modes).

2.1 The sooner an object appears in the list the more in the
   background it appears.
   The linebuffer is initialized by the video chip with the
   linebuffer-backgroundcolor (BG) before the OP starts filling
   the linebuffer.
   One may also assume that the OP normally traverses the linebuffer
   from left to right, except when the horizontal flip bit is set.
   (Very useful information indeed! (har) )
   Each bitmap object is made up of pixels. These pixels can either
   contain the color itself (direct) as in CrY and True-Color modes
   or be an index into a Colorlookuptable (indirect).

2.2 We assume that the OP writes into the linebuffer locally, so that
   the object-data is read over the bus, but not written into the
   linebuffer over the bus (which would be way evil)

2.3 If all these theories are true, then the OP has on the average
   one scanline time to prepare the linebuffer. (In a setup where one
   linebuffer is used per scanline)

2.4 The videosystem can deal with 16bit RGB/CrY-color and 24bit RGB
   pixels. The size of the pixels the OP writes into the linebuffer
   and pulls out of the CLUT depends on the pixel-type chosen for the
   videosystem.

2.5 The objects in the objectlist are *modified* by the OP. This means
   that an object list is only good for one frame. You need to
   continually refresh your object list each VBLANK.

3. The last object must be a STOP object.

4. The Objectlist must be double-phrase aligned. This means that the
   lower nybble of the address must be zero.
   (Maybe this is wrong and it is just object alignment that you
   should take into account)

5. The address of the image of an object must be (as expected)
   phrase aligned (zero in the lower 3 bits)

6. There are five different objects that the Objectprocessor knows
   about. These are:

      1. Bitmapped Object
      2. Scaled bitmapped object
      3. GPU-Object (interrupts the GPU)
      4. Branch object
      5. Stop object (marks the end of the object list)

   The objects have different sizes. The minimum size of an object
   is a "phrase". Also note the alignment constraints.

   Object type  Number   Size in phrases   Alignment in phrases
   -------------------------------------------------------------
   BITMAP         0           2                 2
   SCALE          1           3                 4 !!
   GPU            2           1                 1
   BRANCH         3           1                 1
   STOP           4           1                 1

7. To keep the Objectprocessor from fetching data (and wasting
   bandwidth) during the VBLANK you usually put two branch objects
   at the beginning of the display list, that branch to the stop
   object if the first displayable scanline has not been reached or
   the last displayable scanline has already been displayed.

7.1 The OP must not take more than a scanline's worth of time to
   process the object list, else the display tears. (If using a
   single linebuffer per scanline)

8  The OP usually hogs the bus when doing data transfers, since it
   is normally the most highly prioritized (interesting) device on
   the bus.

9  In the special mode where two linebuffers are used for each
   scanline, you should remember that the OP executes the object list
   twice. That will give you quite some headaches. For example sprites
   crossing the "boundary" will have to be split in two objects, which
   will be really painful if those sprites are scaled objects.
   See the branch object for an idea of how to set up separate lists
   for each linebuffer.

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
9  Your friendly OP-registers
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

RW: OLP ($F00020)
~~~~~~~~~~~~~~~~~
   32       28        24        20       16       12        8        4        0
    +--------^---------^---------^--------+--------^--------^--------^--------+
    |              low_word               |             high_word             |
    +-------------------------------------+-----------------------------------+

   low_word:
   high_word:
      The address of the object list. The 32 bit address is word
      swapped. So you gotta store it like this:

         move.l   #objlist,d0
         swap     d0
         move.l   d0,OLP

      It seems a good idea to set this on every VBL. (My programs run
      more predictably this way)

RW: OB ($F00010)
~~~~~~~~~~~~~~~~
   32       28        24        20       16       12        8        4        0
    +--------^---------^---------^--------^--------^--------^--------^--------+
  0 |                               object-data                               |
    +-------------------------------------------------------------------------+
   64       60        56        52       48       44       40       36       32
    +--------^---------^---------^--------^--------^--------^--------^--------+
  1 |                               object-data                               |
    +-------------------------------------------------------------------------+

   object-data:
      This is used to pass data/pointer to the GPU when using a GPU
      object. Lord knows what the second phrase is for...

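As a host-side sanity check, here is a minimal sketch in C of the value the
68K snippet for OLP computes (the two 16-bit words swapped), combined with
the alignment rule from item 4. The function names are made up for this
example, nothing here comes from Atari, and it only mirrors the guesses in
this file.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: nonzero if the object list address has a zero
   lower nybble, i.e. is double-phrase aligned as item 4 assumes. */
int olp_is_aligned(uint32_t addr)
{
    return (addr & 0xFu) == 0;
}

/* Hypothetical helper: swap the two 16-bit halves of the address,
   which is what `swap d0` does before the long write to OLP. */
uint32_t olp_encode(uint32_t addr)
{
    assert(olp_is_aligned(addr));   /* assumption taken from item 4 */
    return (uint32_t)(addr << 16) | (addr >> 16);
}
```

On the Jaguar itself the result would be written as one long to $F00020,
just like the `move.l d0,OLP` shown for the OLP register.
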
R: OBF ($F00026)
~~~~~~~~~~~~~~~~
   32       28        24        20       16       12        8        4        0
    +--------^---------^---------^--------^--------^--------^--------^-----+--+
    |                                 data                                 :f |
    +----------------------------------------------------------------------+--+

   data + flag (f):
      The STOP object's data field is copied here.

   flag (f):
      The object processor flag. You can hook up an IRQ (Level 2) (?)
      to this bit, which can in turn serve to interrupt the GPU and
      the 68K (and possibly also the DSP). This can be used to generate
      HBLANK-like interrupts, although the STOP seldom occurs in the
      blanking period of the video chip, but much sooner!

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
10 This is what a branch object looks like:
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Phrase #0:
   63       56        48        40       32       24       16        8    3   0
    +--------^---------^-----+---^--------^--------+--------+--+-----^----+---+
    |         unused         |    Link-address     | unused |CC|   VCnt   |011|
    +------------------------+---------------------+--------+--+----------+---+
                                 42..........24      23..16 15.14 13...3  2..0
              21bits                 19bits           8bit  2bit  11bits  3bits

   The branch objects are used to compare the current scanline with the
   value stored in the branch object. Depending on the branch
   instruction's comparison mode, the branch is taken either on
   < == != or >. A taken branch uses the Link-address and continues at
   the phrase-indexed object. If the comparison fails the OP simply
   examines and handles the next object in the list.

   Link-address:
      See the bitmapped object for more info on the link address.

   VCnt:
      This is the value you compare the vertical scanline counter
      with (VC).
      For CC code 10 the operation goes:

         if( object->VCnt < VC) goto object->link;

   Condition codes (CC):

   Values     Comparison/Branch
   --------------------------------------------------
     000       Branch on equal            (VCnt==VC)
     001       Branch on less than        (VCnt>VC)
     010       Branch on greater than     (VCnt<VC)

   [...] HC in the video chip (maybe for every scanline (?), you can
   branch when the OP detects, that it is filling the second linebuffer.

   Other theory:
     CC is 3 bits long and there exists a fifth value:
     100       Branch if on second halfline

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
11 This is what a stop object looks like:
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Phrase #0 (1 of 1):
   63       56        48        40       32       24       16        8        0
    +--------^---------^---------^--------^--------^--------^--------^----+---+
    |                                data                                 |100|
    +---------------------------------------------------------------------+---+

   data:
      Data is copied into the object status register. The lowest bit
      can be used to trigger IRQs, the rest can be used at the
      programmer's whim.

:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
12. This is what a bitmap object looks like:
:::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::

Phrase #0 (1 of 2):
   63       56        48        40       32       24       16        8        0
    +--------^---------^-----+------------^--------+--------^--+-----^----+---+
    |      data-address      |    Link-address     |  Height   |   YPos   |000|
    +------------------------+---------------------+-----------+----------+---+
        63 .............43        42.........24      23....14    13....3   2.0
             21 bits                 19 bits          10 bits    11 bits  3 bits
                                                                          (11.8)

   data-address:    Pointer to the bitmap ***DESTROYED BY THE OP***
   link-address:    Pointer to the next object
   height:          Height in pixels
   y-pos:           Vertical position ***DESTROYED BY THE OP***
   type:            Object type

   data-address:    bits 63-43
      An address is a memory address in terms of phrases. To get the
      byte address you have to shift it up by 3.
      (or in this example to get the data-address you would fetch the
      upper lword with the 68K and do):

         move.l   (a0),d0     ; fetch it (bits 63-32)
         moveq    #11,d1      ; or some other less lame way
         lsr.l    d1,d0       ; shift it down for phrase address
         lsl.l    #3,d0       ; shift it up by 3 for byte address

   link-address:    bits 42-24
      The link address strings the object list together. So it really
      is a linked list, not just an array. OK an array would have been
      better and the link could have been a number of phrases to skip.
      It misses the upper two bits to form a proper full 24 bit
      address. This means that objects must reside in the lower 4 MB.
      This also addresses a phrase, not a byte. For the byte address
      shift it up by three.

   height:
      The height of the object is also stored in the first phrase.
      This is the number of pixels an object has in its vertical
      extent.

   ypos:
      The YPos is predictably the vertical position of the object on
      the screen. The vertical position is the halfline vertical
      position. In video terms the first theoretically possible
      _visible_ position (depending on your overscanning) will be at
      VDB (see Video). Therefore for non interlaced screens this value
      is Y * 2 + VDB, for interlaced just Y + VDB.

      Theory 1:
         Like on the Falcon the screen is divided into two horizontal
         halflines. Except for really wide screens in excess of 1024
         pixels horizontally, you always stay in the first halfline.
         (That's why it's eleven bits, and the height is only 10 bits.)
         A problem with this theory is, that the Xpos field is 12 bits
         anyway...

      Theory 2:
         This means that in interlace mode this is the "true" vertical
         position on the screen. In non-interlaced (non-flicker) modes,
         you should multiply your Y-Pos by two and stuff that into the
         object. (That's why it's eleven bits, and the height is only
         10 bits.)

   type:
      Lastly the object type indicates with a 0 (000) that this object
      is a normal non-scaled bitmap object.

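The phrase #0 layout above can be sketched as a small C packer. This is only
an illustration of the field positions as guessed in this file. All function
names are hypothetical, and the Y position formula is the one given under
ypos (Y * 2 + VDB for non interlaced screens, Y + VDB for interlaced).

```c
#include <assert.h>
#include <stdint.h>

enum { OBJ_TYPE_BITMAP = 0 };   /* type field 000, bits 2-0 */

/* Hypothetical helper: YPos value as described under "ypos". */
unsigned bitmap_ypos(unsigned y, unsigned vdb, int interlaced)
{
    return interlaced ? y + vdb : y * 2 + vdb;
}

/* Hypothetical helper: build phrase #0 of a bitmap object from byte
   addresses. Both addresses are stored as phrase addresses (byte >> 3). */
uint64_t bitmap_phrase0(uint32_t data_addr, uint32_t link_addr,
                        unsigned height, unsigned ypos)
{
    assert((data_addr & 7u) == 0);                   /* phrase aligned   */
    assert((link_addr & 7u) == 0);
    assert((data_addr >> 3) < (UINT32_C(1) << 21));  /* 21 bit field     */
    assert((link_addr >> 3) < (UINT32_C(1) << 19));  /* 19 bits: low 4MB */
    assert(height < (1u << 10));                     /* 10 bit field     */
    assert(ypos   < (1u << 11));                     /* 11 bit field     */

    return ((uint64_t)(data_addr >> 3) << 43)        /* bits 63-43 */
         | ((uint64_t)(link_addr >> 3) << 24)        /* bits 42-24 */
         | ((uint64_t)height << 14)                  /* bits 23-14 */
         | ((uint64_t)ypos << 3)                     /* bits 13-3  */
         | (uint64_t)OBJ_TYPE_BITMAP;                /* bits 2-0   */
}

/* Recover the byte addresses again: shift down to the phrase address,
   then up by 3. */
uint32_t bitmap_data_addr(uint64_t phrase0)
{
    return (uint32_t)(phrase0 >> 43) << 3;
}

uint32_t bitmap_link_addr(uint64_t phrase0)
{
    return (uint32_t)((phrase0 >> 24) & 0x7FFFFu) << 3;
}
```

Since the OP is said to destroy the data-address and y-pos fields, a packer
like this would have to be run again for every frame.
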
Phrase #1 (2 of 2):
   63       56        48        40       32       24       16        8        0
    +--------^-+------+^----+----^--+-----^---+----^----+---+---+----^--------+
    |  unused  |1stpix| flag|  idx  | iwidth  | dwidth  | p | d |    x-pos    |
    +----------+------+-----+-------+---------+---------+---+---+-------------+
      63...55   54..49 48.45  44.38   37..28    27..18 17.15 14.12  11.....0
       9bit      6bit  4bit   7bit    10bit     10bit  3bit  3bit    12bit  (6.4)

   Curiously there seem to be some unused bits in the top half of this
   second phrase. Anyway starting from the left:

   1stpix:          Pixels to skip
   flags (flag):    How to handle the source data
   index (idx):     Index into the CLUT
   iwidth:          Width of the image
   dwidth:          Offset to the next line of the image
   pitch:           Increment for the Datapointer
   depth:           Pixeldepth of the bitmap
   x-pos:           Horizontal position of the object

   1stpix:          bits 54-49
      This is a field of 6 bits that contains the number of 'bits' to
      skip before fetching the first pixel. This must be used whenever
      your bitmap data isn't phrase aligned. Maybe most often used for
      CLUT modes. You get the value you want to write here by
      calculating:

         pixelindex * bits_per_pixel   (f.e. 8 for 256 color mode)

   flags:           bits 48-45
      You can tell the Objectprocessor the way it should handle the
      display data. These are the values you set here:

      Bit3       Bit2          Bit1              Bit0
      ----------------------------------------------------------------
      Release    Transparent   ReadModifyWrite   Horizontal Flip

      Horizontal flip / aka Reflect:
         Lets the Objectprocessor run its path from the other end of
         the sprite data, which should effectively flip your sprite
         data.

         Ex: an eight bit sprite is normally drawn as
                01234567
             flipped
                76543210
                ^
                |
                start at XPOS.

      ReadModifyWrite:
         The object processor reads the pixel from the line buffer,
         does something with the bitmap pixel value and the linebuffer
         pixel value and stores the result back into the linebuffer.
         For CrY-color the lower byte of the bitmap pixel value is sign
         extended and added to the lower byte of the linebuffer pixel
         value, thereby increasing or decreasing (depending on the
         sign) the intensity of the linebuffer pixel. This is a
         'saturating add' meaning that you don't wrap around, but
         subtractions stick at 0 and additions stick at 255.
         The CrY hues (upper byte) are mangled even more strangely. The
         effect could (with the right values) be like looking through
         a colored glass (your bitmap object with the RMW-flag set)
         onto the background (the other bitmap objects below it).
         This might be similar to what happens when gouraud-shading.
         Refer to the blitter docs.

      Transparent:
         When the source pixel is zero, this pixel will not be written.
         This is the way to achieve transparent sprites with the GPU.
         (Both CLUT and non-CLUT pixels)

      Release:
         If cleared then the OP 'hogs' the bus for the time it takes
         to fetch the scanline data of the object. If this bit is set,
         then the bustime is shared with other processors. If you have
         lotsa interrupts going, this might be worthwhile. Should
         apparently NOT be set on objects with more than 8 bitplanes,
         probably because then the OP might glitch.

   index (idx):     bits 44-38
      Index into the ColorLookUpTable (CLUT)
      This information is only used for 1 - 2 or 4 bitplane objects,
      to determine the offset in the CLUT to use.

      1 bitplane     2 bitplane      4 bitplane
      -------------------------------------------------------
      iiiiiiii       iiiiii0         iiiii00

      The value is shifted left once and then used as an index into
      the CLUT. Note that in 2 + 4 bitplane modes not all bits are in
      use, because the lower bits are replaced with the pixel value.
      For example in 4-bits-per-pixel mode pixel #7 and an idx value
      of 64 gives you an index of (64*2)+7 -> 135

      So you preload the CLUT with the colors you want to use, for
      example green at index #241.
      When you want to display a small green arrow on the screen (as a
      pointer) for example you set your object to transparent, and the
      index to 120. When the object processor fetches a set pixel, it
      will write the green value into the linebuffer.

   iwidth:          bits 37-28
      Tell the OP how many *phrases* to draw in each line. This is the
      actual number of phrases to draw, not the horizontal index to
      index the next line (dwidth). This is probably not just
      #pixels_to_draw / bits_per_pixel, but rather the number of
      phrases the object spans. If a 32bit object spans two phrases
      you should enter a two here.

   dwidth:          bits 27-18
      The horizontal phrase offset the OP should use to index to the
      next line. If your data is laid out in consecutive strips of
      horizontal data like this:

         screen :
                  00000000000
                  11111111111
                  22222222222
                  33333333333

         memory