Psion Organiser II Machine Code.

A few examples of things I needed to make, that may be of use to others.

BYTES: A program to write raw bytes of arbitrary count to arbitrary locations.

There are programs by Jaap Scherphuis and others that mix OPL and machine code, most of which use either POKCNV% or CONV$ to get raw bytes of code into various arrays. I wanted to make a single general-purpose program to replace them, and wrote this to teach myself how to use machine code in the Psion Organiser II XP and LZ computers.


      Start of machine code, with the address of the input data array passed in register D.
    18  1 2  XGDX           ;Swap D and X registers.
    EC  2 5  LDD $02,X      ;Get destination address...
    DD  2 4  STD $43        ;...Store it in scratch register 2.
    EE  2 5  LDX $00,X      ;Get address of hex data...
    DF  2 4  STX $41        ;...Store it in scratch register 1.
    4F  1 1  CLRA           ;Clear top byte of word.
    E6  2 4  LDAB $00,X     ;Get string length.
    3A  1 1  ABX            ;Add length, point to end of string.
    54  1 1  LSRB           ;Convert length to byte count...
    DD  2 4  STD $45        ;...Store it in scratch register 3.
      Loop, converting hex numbers to single bytes.
    09  1 1  DEX            ;X steps by 1 to point at 2-byte word.
    EC  2 5  LDD $00,X      ;Get digits to combine into one number.
    81  2 2  CMPA $40       ;Compare high digit with $40.
    25  2 3  BCS $02        ;If below $40 (digit 0-9), skip next...
    80  2 2  SUBA $07       ;...Else subtract 7, so A-F give nibbles 10-15.
    C1  2 2  CMPB $40       ;Compare low digit with $40.
    25  2 3  BCS $02        ;If below $40 (digit 0-9), skip next...
    C0  2 2  SUBB $07       ;...Else subtract 7, so A-F give nibbles 10-15.
    48  1 1  ASLA           ;Shift A left.
    48  1 1  ASLA           ;Shift A left.
    48  1 1  ASLA           ;Shift A left.
    48  1 1  ASLA           ;Shift A left.
    C4  2 2  ANDB $0F       ;Clear the top four bits.
    1B  1 1  ABA            ;Add both bytes into A.
    36  1 3  PSHA           ;Stack the resulting byte.
    09  1 1  DEX            ;Second DEX: X moves back 2 per digit pair.
    9C  2 4  CPX $41        ;Test for start of string.
    22  2 3  BHI $E4        ;If address is more than start, repeat loop.
      Exit loop...
    DE  2 4  LDX $43        ;Get destination address.
    D6  2 3  LDAB $46       ;Get byte count.
      Loop, storing bytes at destination.
    32  1 3  PULA           ;Get stacked byte.
    A7  2 4  STAA $00,X     ;Store it.
    08  1 1  INX            ;Point to next byte.
    5A  1 1  DECB           ;Decrease B.
    26  2 3  BNE $F9        ;If counter is not zero, repeat loop.
      Exit loop...
    DE  2 4  LDX $45        ;Get byte count for return.
    39  1 5  RTS            ;Return to OPL.
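
As a cross-check of the listing above, here is a minimal Python model of the digit conversion in the first loop. It is only a sketch: the names hex_pair_to_byte and decode are made up for illustration, it assumes upper-case hex digits and an even number of them, and it walks the string forwards where the machine code walks it backwards and uses the stack to restore the order.

```python
def hex_pair_to_byte(hi: int, lo: int) -> int:
    """Model of one pass of the first loop: two ASCII hex digits -> one byte."""
    # CMPA/CMPB with $40, then SUBA/SUBB 7: letters 'A'-'F' ($41-$46)
    # become $3A-$3F, so their low nibble is 10-15.
    if hi >= 0x40:
        hi -= 7
    if lo >= 0x40:
        lo -= 7
    hi = (hi << 4) & 0xFF    # four ASLA: low nibble moves up, high nibble falls off
    lo &= 0x0F               # ANDB $0F: keep only the low nibble
    return (hi + lo) & 0xFF  # ABA


def decode(hex_text: str) -> bytes:
    """Convert a whole string of upper-case hex digits, two digits per byte."""
    raw = hex_text.encode("ascii")
    return bytes(hex_pair_to_byte(raw[i], raw[i + 1])
                 for i in range(0, len(raw) - 1, 2))


answer = decode("426563617573652070726F706572207465612069732074686566742E")
print(answer.decode("ascii"))
```

The subtract-7 step is what lets one AND and four shifts handle both 0-9 and A-F without a lookup table.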


To use the machine code, you could use either POKCNV% or CONV$ to get the code (shown in the REM statements in the first of two small procedures below) into an OPL string, but I prefer to translate the whole of that first procedure to an object-code OB3 file (OPL text not included) to put in a pack image. After translation, I open the OB3 file in a hex editor and replace the string of XXXX... with the code bytes. The fit is exact, and XXXX... makes it very easy to see where to put it.

Note that the code returns a byte count. This is essential for completing an OPL string because, unlike C, OPL has no string terminator: you have to set the length directly. This is why string arrays are so useful for code. Much larger buffers (like allocator cells) can be filled too, because the program will write anywhere you tell it to (so be very careful!). The returned byte count can be used as an offset to add to a running total, so a few calls to this program can write a big chunk of code a few bytes at a time.
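
The running-total idea can be sketched in Python, with a bytearray standing in for Organiser RAM. This is a model under assumptions, not the real thing: bytes_model is a hypothetical stand-in for the machine code (it leans on Python's bytes.fromhex rather than the nibble arithmetic), and the base address is arbitrary.

```python
def bytes_model(memory: bytearray, dest: int, hex_text: str) -> int:
    """Hypothetical stand-in for BYTES: write decoded bytes at dest, return the count."""
    data = bytes.fromhex(hex_text)
    memory[dest:dest + len(data)] = data
    return len(data)


memory = bytearray(64)   # pretend RAM, all zeros
base = 4                 # arbitrary start address for this sketch
offset = 0               # running total of bytes written so far

# Two calls, using the two REM CODE lines from the procedure below as the chunks.
for chunk in (
    "18EC02DD43EE00DF414FE6003A54DD4509EC00814025028007C1402502",
    "C00748484848C40F1B36099C4122E4DE43D64632A700085A26F9DE4539",
):
    offset += bytes_model(memory, base + offset, chunk)

print(offset)            # total number of bytes written by both calls
```

Each call lands directly after the previous one because the destination is always base plus the total returned so far.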

The second procedure tests the construction of a standard OPL string by giving the answer to the joke. I don't know what it says about me. Maybe I'm a bad man, but after all these years I still find it funny...


    REM CODE:18EC02DD43EE00DF414FE6003A54DD4509EC00814025028007C1402502
    REM CODE:C00748484848C40F1B36099C4122E4DE43D64632A700085A26F9DE4539
    LOCAL C$(58),S$(255),V%(2)
    S$=H$ :V%(1)=ADDR(S$) :V%(2)=A%


    LOCAL S$(36),A%
    S$="Why don't Class War drink Earl Grey?"
    A%=ADDR(S$) :POKEB A%,BYTES:(A%+1,"426563617573652070726F706572207465612069732074686566742E")
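
What that last OPL line does to the string can be modelled in Python. This is a sketch that assumes only what the procedure itself shows: the byte at ADDR(S$) holds the length and the characters start at the next address. The helper names opl_string and read_opl_string are invented for the illustration.

```python
def opl_string(text: str, max_len: int) -> bytearray:
    """Model of an OPL string cell: one length byte, then max_len character bytes."""
    cell = bytearray(1 + max_len)
    cell[0] = len(text)
    cell[1:1 + len(text)] = text.encode("ascii")
    return cell


def read_opl_string(cell: bytearray) -> str:
    """Read back only as many characters as the length byte says."""
    return cell[1:1 + cell[0]].decode("ascii")


s = opl_string("Why don't Class War drink Earl Grey?", 36)
answer = bytes.fromhex("426563617573652070726F706572207465612069732074686566742E")

s[1:1 + len(answer)] = answer   # what BYTES does when told to write at A%+1
stale = read_opl_string(s)      # length byte is still 36: leftover question text shows
s[0] = len(answer)              # what POKEB A%,<returned count> does
print(read_opl_string(s))
```

Until the length byte is poked, the tail of the old question is still part of the string, which is why the returned byte count matters.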


Some context as to why this exists... I'd bought several Organiser-related things in more than one transaction from a guy called Jason Brown on eBay, and we ended up messaging each other a lot for a while, and part of that was a discussion of machine code, which I knew nothing about at the time. I eventually discovered that it's much easier than I assumed it might be, almost like making extremely compact and durable constructions in Lego. At least, that's the best analogy I can come up with... Maybe writing a Haiku is a better analogy at times, or constructing small electronic circuits with the fewest parts possible. It's an exercise in efficiency.

I'd sent him this:


    LOCAL N%,X%,O%,H%,L%
    N%=LEN(H$) :X%=N%+1
    WHILE N% :H%=L% :O%=X%-N%
     IF N% AND 1 :O%=O%/2
      POKEB A%+O%-1,16*H%+L%

It is fast, and small, and replaces POKCNV% and CONV$ effectively, but he sent me this:

  OLD VERSION. By Jason Brown: a replacement for CONV$ that takes a pointer (address) for a string
  and works in place so on exit, the calling OPL knows where to look for it.
    18  1 2  XGDX           ; Save passed string address
    E6  2 4  LDAB $00,X     ; Load string length
    54  1 1  LSRB           ; B = B/2
    27  2 3  BEQ End        ; If empty string, then return
    E7  2 4  STAB $00,X     ; Update stored string length for OPL
    08  1 1  INX            ; Advance address to start of data
    DF  2 4  STX $41        ; Initialize Read address
    DF  2 4  STX $43        ; Initialise Write address
    37  1 3  PSHB           ; Store num of loops remaining
    DE  2 4  LDX $41        ; Load read pointer
    A6  2 4  LDAA $00,X     ; Load next char
    80  2 2  SUBA $30       ; Subtract %0
    81  2 2  CMPA $0A       ;
    25  2 3  BCS Cont1      ; Skip if less than 10
    80  2 2  SUBA $07       ; Subtract (%A-%0-10=7d)
    C6  2 2  LDAB $10       ; Load 16 into AccB
    3D  1 7  MUL            ; D = A*B (result fits in B)
    A6  2 4  LDAA $01,X     ; Load next char
    80  2 2  SUBA $30       ; Subtract %0
    81  2 2  CMPA $0A       ;
    25  2 3  BCS Cont2      ; Skip if less than 10
    80  2 2  SUBA $07       ; Subtract (%A-%0-10=7d)
    1B  1 1  ABA            ; A = A+B
    08  1 1  INX            ;
    08  1 1  INX            ; Increment read pointer by 2
    DF  2 4  STX $41        ;   ...and update saved version
    DE  2 4  LDX $43        ; Load write pointer
    A7  2 4  STAA $00,X     ; Store result
    08  1 1  INX            ; Increment write pointer by 1
    DF  2 4  STX $43        ;   ...and update saved version
    33  1 3  PULB           ; Retrieve loops to go counter
    5A  1 1  DECB           ;   ...and decrease it
    26  2 3  BNE Loop       ; Loop until no more loops needed
    39  1 5  RTS            ; Return to calling OPL

In comparison, it is dazzlingly fast, and I knew that at some point I had to learn machine code. When I did, I found that there were ways to beat his code for speed. To see why, compare the two programs carefully to see how they differ in approach, noting the number of instruction cycles needed to complete the task.

The new version makes far better use of the stack to avoid swapping pointers. Its two loops together contain fewer instructions than the single loop in the old version, and most of those instructions take two cycles or less. The total code size is bigger in the new version, and the unlooped cycle count is larger, but it does more. The old version cannot write bytes to any location other than the source string, and returns no byte count. The new version will write bytes to any accessible RAM, return a byte count, and do it at least 33% faster. The old version's loop is 42 bytes long and takes 69 cycles per pass, but the new version's two loops total only 35 bytes and 50 cycles.
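
The cycle totals can be checked by adding up the third column of each listing. The Python below is just that arithmetic, under two assumptions: every instruction in a loop is counted once (no branch skips anything), and the old version's loop starts at its PSHB, since the labels are not shown in the listing.

```python
# Cycle counts copied from the listings, loop bodies only.
# Old version: PSHB through BNE.
old_loop = [3, 4, 4, 2, 2, 3, 2, 2, 7, 4, 2, 2, 3,
            2, 1, 1, 1, 4, 4, 4, 1, 4, 3, 1, 3]
# New version, first loop: DEX through BHI.
new_convert_loop = [1, 5, 2, 3, 2, 2, 3, 2, 1, 1, 1, 1, 2, 1, 3, 1, 4, 3]
# New version, second loop: PULA through BNE.
new_store_loop = [3, 4, 1, 1, 3]

old_cycles = sum(old_loop)
new_cycles = sum(new_convert_loop) + sum(new_store_loop)
ratio = old_cycles / new_cycles

print(old_cycles, new_cycles, round(ratio, 2))
```

Both versions handle one output byte per pass, so the ratio of the two totals is the speed-up inside the loops; the setup code outside them is not included.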

CRC16: A program to write and test checksums for MODBUS or Psion protocols.

I ended up putting off my effort to work with MODBUS indefinitely due to several other programs I was working on, but this CRC16 generator was the first thing I wrote, because I knew that an OPL program could never have done this fast enough to cope with the bulk of serial comms signals in the Morningstar MPPT photovoltaic charge controller I'd intended to work with.

    Start of machine code, with the address of the MODBUS string passed in register D.
    18  1 2  XGDX           ;Swap D and X registers.
    DF  2 4  STX $43        ;Store string's start address.
    E6  2 4  LDAB $00,X     ;Get string length...
    C0  2 2  SUBB $02       ;...minus 2 bytes for CRC.
    3A  1 1  ABX            ;Add length to string address.
    DF  2 4  STX $41        ;Store string's end address.
    DE  2 4  LDX $43        ;Reload string's start address.
    CC  3 3  LDD $FFFF      ;Initialise CRC in D register.
    Outer loop, iterating bytes for XOR with CRC.
      08  1 1  INX            ;X steps by 1 to point at next byte.
      E8  2 4  EORB $00,X     ;XOR CRC with byte addressed by X.
      DF  2 4  STX $43        ;Store address, freeing X.
    Inner loop, bitshifting and XOR with $A001.
        CE  3 3  LDX $0008      ;Initialise loop counter.
        04  1 1  LSRD           ;Shift data right.
        24  2 3  BCC $04        ;If lsb was zero, bypass XOR.
        88  2 2  EORA $A0       ;If lsb was NOT zero, do XOR.
        C8  2 2  EORB $01       ;Second byte of XOR.
        09  1 1  DEX            ;Decrease loop count in X.
        26  2 3  BNE $F6        ;If counter is not zero, repeat loop.
    Exit inner loop...
      DE  2 4  LDX $43        ;Reload address into X.
      9C  2 4  CPX $41        ;Test for end of string.
      25  2 3  BCS $E8        ;If address is below end, repeat loop.
    Exit outer loop...
    36  1 3  PSHA           ;Swap CRC bytes: A to Stack...
    17  1 1  TBA            ;...B to A...
    33  1 3  PULB           ;...Stack to B.
    DD  2 4  STD $41        ;Store copy of CRC.
    Test calculated CRC match with CRC bytes and set CRC bytes if needed.
    A3  2 5  SUBD $01,X     ;Try to cancel CRC bytes with CRC.
    27  2 3  BEQ $07        ;If CRC did not match...
    DC  2 4  LDD $41        ;...Get stored copy of CRC.
    ED  2 5  STD $01,X      ;...Set CRC bytes with CRC.
    CE  3 3  LDX $0001      ;...Set X to 1.
    26  2 3  BNE $03        ;If CRC did match...
    CE  3 3  LDX $0000      ;...Set X to 0.
    09  1 1  DEX            ;Convert X to return OPL Boolean True/False.
    39  1 5  RTS            ;Return to OPL.

CRC16 is useful in general too, and Psion use it in their own protocol, but with one specific variation from the code above. Note the line "LDD $FFFF      ;Initialise CRC in D register." Initialise that to $0000 instead of $FFFF for Psion protocol. It's less robust in cases where some chunk of bytes begins with a sequence of $00, but it's what Psion did, so for compatibility it's best to do the same. I can't think of any reason to reinvent that particular wheel (except in emulating their protocol OUTSIDE their own systems, not inside), but there it is. :) More to the point: there may be various systems that use a CRC16 two-byte checksum, and some of them differ only in the value used to initialise the checksum (others also change the polynomial, the bit order or a final XOR), so the method may find many uses.
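
For checking results off the Organiser, the same bit-by-bit method can be modelled in Python. This is a sketch rather than a port: the function names crc16 and check_or_set are invented here, the init parameter stands for the value loaded by the LDD line ($FFFF for MODBUS, $0000 for the Psion variant described above), and the frame is a plain byte buffer with no OPL length byte in front.

```python
def crc16(data: bytes, init: int = 0xFFFF) -> int:
    """Bitwise CRC-16 using the reflected polynomial $A001, as in the listing."""
    crc = init
    for byte in data:
        crc ^= byte              # EORB $00,X: XOR the byte into the low half
        for _ in range(8):       # LDX $0008: eight shifts per byte
            carry = crc & 1
            crc >>= 1            # LSRD
            if carry:            # BCC skips the XOR when the lsb was zero
                crc ^= 0xA001    # EORA $A0 and EORB $01
    return crc


def check_or_set(frame: bytearray, init: int = 0xFFFF) -> bool:
    """Model of the routine's exit: compare the last two bytes with the CRC of
    the rest, overwrite them on a mismatch, and report whether they matched."""
    crc = crc16(bytes(frame[:-2]), init)
    tail = bytes([crc & 0xFF, crc >> 8])   # low byte first, after the A/B swap
    matched = bytes(frame[-2:]) == tail
    if not matched:
        frame[-2:] = tail
    return matched


frame = bytearray(b"123456789\x00\x00")    # payload plus two placeholder CRC bytes
first = check_or_set(frame)                # placeholder is wrong, so the CRC is written
second = check_or_set(frame)               # now the stored CRC matches
print(first, second, frame[-2:].hex())
```

The machine code returns -1 or 0 to OPL where this model returns True or False. Calling it once on an unsigned frame writes the checksum, and calling it on a received frame tests it.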