How the DMA code works

Mar 29, 2009 | 04:12 AM

jcsbanks

Thread Starter

Evolved Member

Joined: May 2006

Posts: 2,399

Likes: 6

From: UK

I put this code just before Tephra's code and steal his hook, so that my code (COPY and TIMEOUT, near the end of the code below) runs first before jumping to his. I also move Tephra's alt map vectors, and any other map vectors I want, to RAM. The Ecuflash definitions for the alt maps stay the same, because they edit the ROM copy of the alt maps, which the COPY routine copies to RAM as a 2K block. To detect whether a copy is needed, I check Tephra's variable, which he sets to 0xDEAD once everything has been set up.
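To make the COPY logic concrete, here is a minimal Python model of it (a model, not ECU code). The addresses and the 2K length come from the literal pool in the post (ROM 0x37B00, RAM 0xFFFF8500, LENGTH 0x800, flag at 0xFFFF841C) and are ROM-specific. The 4K RAM window starting at 0xFFFF8000 is purely an assumption of the model so that everything fits in one bytearray.

```python
import struct

ROM_SRC = 0x37B00      # ROM copy of the alt maps (from the post)
RAM_DST = 0xFFFF8500   # RAM destination (from the post)
LENGTH = 0x800         # 2K block (from the post)
DEAD_LOC = 0xFFFF841C  # Tephra's "setup done" flag (from the post)
DEAD_VAL = 0xDEAD

RAM_BASE = 0xFFFF8000  # model assumption: the modelled RAM window starts here


def read_word(ram: bytearray, addr: int) -> int:
    """Big-endian 16-bit read from the modelled RAM window."""
    return struct.unpack_from(">H", ram, addr - RAM_BASE)[0]


def copy_routine(rom: bytes, ram: bytearray) -> bool:
    """Mirror COPY: do nothing if the flag already reads 0xDEAD, otherwise copy
    LENGTH bytes from ROM to RAM one longword at a time. Returns True if copied."""
    if read_word(ram, DEAD_LOC) == DEAD_VAL:
        return False
    dst = RAM_DST - RAM_BASE
    for off in range(0, LENGTH, 4):  # add #4,r0 / cmp/hs r1,r0 / bf loop
        ram[dst + off:dst + off + 4] = rom[ROM_SRC + off:ROM_SRC + off + 4]
    return True
```

In the real ECU it is Tephra's code that writes 0xDEAD after the first pass; the model leaves that to the caller.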

TIMEOUT checks the ECU's logging flag (bit 7 of bit7allowslogging) and, if logging has stopped, kills the DMA process. This stops the comms hanging if the PC crashes, the cable is unplugged, or the engine is turned off.
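A small Python model of the TIMEOUT decision, assuming the flag word and the CHCR3 value have already been read. The mask 0xFFFFFFFC is the one the code uses; on SH-2 DMACs the two low bits of CHCR are, to my understanding, the DMA enable (DE) and transfer end (TE) flags, but check the hardware manual for your chip.

```python
from typing import Tuple

LOGGING_BIT = 0x80              # bit 7 of the logging flag word
CHCR_CLEAR_MASK = 0xFFFFFFFC    # clears the two low CHCR3 bits, as in the code


def timeout(logging_flags: int, chcr3: int, dma_op_flag: int) -> Tuple[int, int]:
    """Return the (CHCR3, DMAOPFLAG) values TIMEOUT would leave behind."""
    if logging_flags & LOGGING_BIT:
        return chcr3, dma_op_flag        # still logging: leave the DMA alone
    return chcr3 & CHCR_CLEAR_MASK, 0    # stop channel 3 and drop the 0x37 marker
```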

The rest of my code is interrupt-driven, with the following three vectors pointing to my code, plus four changes to stop existing ECU code from destroying my DMA processes (marked * because their addresses move between ROMs).

138 - DMA end interrupt vector (marked DMAEND below) - see comments
324 - RXI interrupt vector (marked RXI0 below). This code replaces the stock serial port/MUT/OBD receive interrupt routine and jumps back to the stock routine (via exitformut) if we are not MUT logging or the request is not E0, E1, or E2.
32c - transmit end interrupt vector (marked TEIE below) - see comments
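For anyone wiring these up by hand: the vector table is an array of 32-bit big-endian handler addresses, so offset 0x138 is vector 78, 0x324 is vector 201 and 0x32C is vector 203. Here is a Python sketch of patching a ROM image; the handler address used in the test is hypothetical and depends on where you place the code.

```python
import struct
from typing import Dict

DMAEND_VEC = 0x138  # DMA end interrupt (vector 78)
RXI0_VEC = 0x324    # RXI interrupt (vector 201)
TEI0_VEC = 0x32C    # transmit end interrupt (vector 203)


def patch_vectors(rom: bytearray, handlers: Dict[int, int]) -> Dict[int, int]:
    """Write each handler address (big-endian, 32-bit) at its vector offset.
    Returns the previous contents so the patch can be logged or undone."""
    old = {}
    for offset, address in handlers.items():
        if offset % 4:
            raise ValueError(f"vector offset {offset:#x} is not 4-byte aligned")
        old[offset] = struct.unpack_from(">I", rom, offset)[0]
        struct.pack_into(">I", rom, offset, address)
    return old
```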

*E13E - change cmp and bra conditions - stops ECU killing our DMA
*E3AE - change from DMA to RAM 8480 - this uses 0xFFFF8480 as our DMA flag to stop the ECU from interfering with our DMA
*E8F6 NOP - stops ECU killing our DMA
*FA72 NOP - stops ECU killing our DMA

Code:

RXI0:
  add     #-4, r15 SORT OUT THE STACK
  sts.l   pr, @-r15
  mov.l   r14, @-r15
  mov     r15, r14
  sts.l   macl, @-r15
  sts.l   mach, @-r15
  mov.l   r10, @-r15
  mov.l   r11, @-r15
  mov.l   r12, @-r15
  mov.l   r13, @-r15
  mov.l   r3, @-r15
  mov.l   r4, @-r15
  mov.l   r5, @-r15
  mov.l   r6, @-r15
  mov.l   r7, @-r15
  mov.l   r0, @-r15

  mov.l (int_disable),r10
  jsr @r10
  nop             

mov.l (RDR0),r10 READ SERIAL PORT
mov.b @r10,r10
extu.b r10,r10
mov.w (E0),r3
cmp/hs r3,r10 EXIT IF NOT E0,E1,E2
bf exitformut
mov.w (E2),r3
cmp/hi r3,r10
bt exitformut
mov.l   (bit7allowslogging), r0
mov.w   @r0, r0
tst     #0x80, r0 EXIT IF NOT LOGGING
bt      exitformut
mov.l   (mutorobd), r0
mov.w   @r0, r0
tst     #0x80, r0 EXIT IF NOT IN MUT MODE
bt      exitformut
mov.l   (receive_transmit_status_bits), r0
mov.w   @r0, r0
tst     #0x80, r0 CHECK WE ARE NOT GETTING AN ECHO OF SOMETHING WE JUST SENT
bf      exitformut

brE012: FOR SETTING UP FIRST DMA TRANSFER
mov.l (DMAOPFLAG2),r11
mov.l r10,@r11
mov #0,r0
mov.l (counter1),r10 STOP MUT TIMEOUT
mov.w r0,@r10
mov #0xfffffffc,r0
mov.l (CHCR3),r10 RESET DMA
mov.l @r10,r10
and r0,r10
mov.l (CHCR3),r11
mov.l r10,@r11
mov.l (RDR0),r0
mov.l (SAR3),r11 SOURCE IS SERIAL PORT
mov.l r0,@r11
mov.l (DMAread),r0
mov.l (DAR3),r11 DESTINATION IS OUR OWN...
mov.l r0,@r11
mov #6,r0 ...6 BYTE MEMORY BLOCK - FOUR BYTES ADDRESS, TWO BYTES LENGTH
mov.l (DMATCR3),r11
mov.l r0,@r11
mov #0x37,r0 0X37 IS THE NUMBER OF MY CHILDHOOD HOME - SUITABLE RANDOM NON ZERO NUMBER THAT WE CAN SET WHEN WE DON'T WANT THE ECU TO KILL OUR DMA PROCESS
mov.l (DMAOPFLAG),r10
mov.l r0,@r10
mov.l (SSR0),r10
mov.b @r10,r0
and #0x87,r0 RESET SERIAL PORT
mov.b r0,@r10
mov.l (DMA3CONFIGread),r0 SET DMA CHANNEL 3 TO READ FROM SERIAL PORT
mov.l (CHCR3),r10
mov.l r0,@r10
  mov.l (int_enable),r10
  jsr @r10
  nop
bra exit
nop

exitformut:
  mov.l (int_enable),r10
  jsr @r10
  nop
mov.l (serialreceivewithoutdma),r10 BACK TO NORMAL SERIAL INTERRUPT, NONE OF OUR BUSINESS!
jsr @r10
nop

exit:
  mov.l   @r15+, r0 STACK
  mov.l   @r15+, r7
  mov.l   @r15+, r6
  mov.l   @r15+, r5
  mov.l   @r15+, r4
  mov.l   @r15+, r3
  mov.l   @r15+, r13
  mov.l   @r15+, r12
  mov.l   @r15+, r11
  mov.l   @r15+, r10
  lds.l   @r15+, mach
  lds.l   @r15+, macl
  mov.l   @r15+, r14
  lds.l   @r15+, pr
  add     #4, r15
  rte RETURN FROM EXCEPTION/INTERRUPT
  nop


.align 2
int_disable:
.long 0x400
int_enable:
.long 0x41e
serialreceivewithoutdma:
.long 0xe32a /*TO BE CHANGED FOR EACH ROM*/
bit7allowslogging:
.long 0xffff6fea /*TO BE CHANGED FOR EACH ROM*/
mutorobd:
.long 0xffff6fb2 /*TO BE CHANGED FOR EACH ROM*/
receive_transmit_status_bits:
.long 0xffff6fe6 /*TO BE CHANGED FOR EACH ROM*/
counter1:
.long 0xffff7054 /*TO BE CHANGED FOR EACH ROM*/
SAR3:
.long 0xffffecf0
DAR3:
.long 0xffffecf4
DMATCR3:
.long 0xffffecf8
CHCR3:
.long 0xffffecfc
DMA3CONFIGread:
.long 0x20105
SSR0:
.long 0xfffff004
RDR0:
.long 0xfffff005
DMAOPFLAG:
.long 0xffff8480
DMAOPFLAG2:
.long 0xffff8484
DMAread:
.long 0xffff8488
E0:
.word 0xE0
E2:
.word 0xE2

/*--------------------------------*/
.align 4
DMAEND:
sts.l pr,@-r15 STACK
mov.l r0,@-r15
mov.l r10,@-r15
mov.l (_int_disable),r10 STOP INTERRUPTS
jsr @r10
nop

mov #0xfffffffc,r0
mov.l (_CHCR3),r10 CLEAR DMA PROCESS
mov.l @r10,r10
and r10,r0
mov.l (_CHCR3),r10
mov.l r0,@r10

mov.l (_DMAOPFLAG2),r0 SINCE WE USE TWO CONSECUTIVE DMA PROCESSES, I USE THIS VARIABLE TO KEEP TRACK OF WHAT WE'RE DOING - READ, WRITE ETC.
mov.l @r0,r0
mov #1,r10
cmp/eq r10,r0
bt brwrite
nop
mov.w (_E0),r10
cmp/eq r10,r0
bt brE0
nop
mov.w (_E1),r10
cmp/eq r10,r0
bt brE1
nop
mov.w (_E2),r10
cmp/eq r10,r0
bt brE2
nop
bra TEIEinvade
nop

brwrite:
mov.l (_SSR0),r10
mov.b @r10,r0
tst #4,r0
bf TEIEinvade DMA MAY HAVE ENDED BUT SERIAL PORT TRANSMISSION MAY NOT HAVE
nop

mov.w (_SCR0_CLRTIE_SETTEIE),r0 SETUP TRANSMIT END INTERRUPT IF NOT YET FINISHED
mov.l (_SCR0),r10
mov.b r0,@r10

mov.l (_int_enable),r10
jsr @r10
nop
mov.l @r15+,r10
mov.l @r15+,r0
lds.l @r15+,pr
rte
nop

brE0: USES EARLIER 4 BYTE ADDRESS AND 2 BYTE LENGTH TO LOG THE MUT TABLE
mov #1,r0
mov.l (_DMAOPFLAG2),r10
mov.l r0,@r10
mov.l (_DMAaddress),r0
mov.l @r0,r0
mov.l (_SAR3),r10
mov.l r0,@r10
mov.l (_TDR0),r0
mov.l (_DAR3),r10
mov.l r0,@r10
mov.l (_DMAlength),r0
mov.w @r0,r0
mov.l (_DMATCR3),r10
mov.l r0,@r10
mov #0x37,r0
mov.l (_DMAOPFLAG),r10
mov.l r0,@r10
mov.w (_SCR0_CLRRE_SETTIE),r0
mov.l (_SCR0),r10
mov.b r0,@r10
mov.l (_DMA3CONFIGwriteindirect),r0
mov.l (_CHCR3),r10
mov.l r0,@r10
mov.l (_int_enable),r10
jsr @r10
nop
mov.l @r15+,r10
mov.l @r15+,r0
lds.l @r15+,pr
rte
nop

brE1: USES PREVIOUS 4 BYTE ADDRESS AND 2 BYTE LENGTH TO WRITE A BLOCK OF RAM TO THE SERIAL PORT
mov #1,r0
mov.l (_DMAOPFLAG2),r10
mov.l r0,@r10
mov.l (_DMAaddress),r0
mov.l @r0,r0
mov.l (_SAR3),r10
mov.l r0,@r10
mov.l (_TDR0),r0
mov.l (_DAR3),r10
mov.l r0,@r10
mov.l (_DMAlength),r0
mov.w @r0,r0
mov.l (_DMATCR3),r10
mov.l r0,@r10
mov #0x37,r0
mov.l (_DMAOPFLAG),r10
mov.l r0,@r10
mov.w (_SCR0_CLRRE_SETTIE),r0
mov.l (_SCR0),r10
mov.b r0,@r10
mov.l (_DMA3CONFIGwritedirect),r0
mov.l (_CHCR3),r10
mov.l r0,@r10
mov.l (_int_enable),r10
jsr @r10
nop
mov.l @r15+,r10
mov.l @r15+,r0
lds.l @r15+,pr
rte
nop

brE2: USES PREVIOUS 4 BYTE ADDRESS AND 2 BYTE LENGTH TO READ A BLOCK FROM SERIAL PORT AND WRITE IT TO RAM
mov #2,r0
mov.l (_DMAOPFLAG2),r10
mov.l r0,@r10
mov.l (_RDR0),r0
mov.l (_SAR3),r10
mov.l r0,@r10
mov.l (_DMAaddress),r0
mov.l @r0,r0
mov.l (_DAR3),r10
mov.l r0,@r10
mov.l (_DMAlength),r0
mov.w @r0,r0
mov.l (_DMATCR3),r10
mov.l r0,@r10
mov #0x37,r0
mov.l (_DMAOPFLAG),r10
mov.l r0,@r10
mov.l (_SSR0),r10
mov.b @r10,r0
and #0x87,r0
mov.b r0,@r10
mov.l (_DMA3CONFIGread),r0
mov.l (_CHCR3),r10
mov.l r0,@r10
mov.l (_int_enable),r10
jsr @r10
nop
mov.l @r15+,r10 STACK
mov.l @r15+,r0
lds.l @r15+,pr
rte RETURN FROM EXCEPTION/INTERRUPT
nop

.align 4
TEIE: TRANSMIT END INTERRUPT
sts.l pr,@-r15
mov.l r0,@-r15
mov.l r10,@-r15
mov.l (_int_disable),r10
jsr @r10
nop
TEIEinvade: JUMP HERE FROM EARLIER IF TRANSMISSION HAS ALREADY FINISHED
mov #0,r0
mov.l (_DMAOPFLAG),r10 CLEAR OUR 0X37 VARIABLE
mov.l r0,@r10
mov.w (_SCR0_SETRE_CLRTEIE),r0 RESET SERIAL PORT INTERRUPT CONFIG
mov.l (_SCR0),r10
mov.b r0,@r10

mov.l (_SSR0),r10 RESET SERIAL PORT STATUS FOR NEXT COMMS
mov.b @r10,r0
and #0x87,r0
mov.b r0,@r10

mov.l (_int_enable),r10
jsr @r10
nop
mov.l @r15+,r10 STACK
mov.l @r15+,r0
lds.l @r15+,pr
rte
nop

.align 2

_SAR3:
.long 0xffffecf0
_DAR3:
.long 0xffffecf4
_DMATCR3:
.long 0xffffecf8
_CHCR3:
.long 0xffffecfc

_SCR0:
.long 0xfffff002
_TDR0:
.long 0xfffff003
_SSR0:
.long 0xfffff004
_RDR0:
.long 0xfffff005

_DMA3CONFIGwriteindirect:
.long 0x10011005
_DMA3CONFIGwritedirect:
.long 0x11005
_DMA3CONFIGread:
.long 0x20105

_DMAOPFLAG:
.long 0xffff8480
_DMAOPFLAG2:
.long 0xffff8484
_DMAaddress:
.long 0xffff8488
_DMAlength:
.long 0xffff848c

_int_disable:
.long 0x400
_int_enable:
.long 0x41e

_E0:
.word 0xE0
_E1:
.word 0xE1
_E2:
.word 0xE2

_SCR0_CLRTIE_SETTEIE:
.word 0x24
_SCR0_SETRE_CLRTEIE:
.word 0x70
_SCR0_CLRRE_SETTIE:
.word 0xa0

/*--------------------------------*/
.align 4
COPY: COPY ROM TO RAM IF TEPHRA'S DEAD VARIABLE IS NOT 0XDEAD
sts.l pr,@-r15
mov.l r0,@-r15
mov.l r1,@-r15
mov.l r2,@-r15
mov.l r10,@-r15
mov.l r11,@-r15

mov.w (DEADval),r0
mov.l (DEADloc),r1
mov.w @r1,r1
cmp/eq r1,r0
bt TIMEOUT
nop

mov.l (ROM),r10
mov.l (RAM),r11
mov.l (LENGTH),r1
mov #0,r0
loop:
mov.l @(r0,r10),r2
mov.l r2,@(r0,r11)
add #4,r0
cmp/hs r1,r0
bf loop
nop

TIMEOUT: KILLS DMA IF COMMS HAVE BEEN KILLED
mov.l (__int_disable),r10
jsr @r10
nop

mov.l (__bit7allowslogging),r10
mov.w @r10,r0
tst #0x80,r0
bf __exit
nop

mov #0xfffffffc,r0
mov.l (__CHCR3),r10
mov.l @r10,r10
and r10,r0
mov.l (__CHCR3),r10
mov.l r0,@r10

mov #0,r0
mov.l (__DMAOPFLAG),r10
mov.l r0,@r10

__exit:
mov.l (__int_enable),r10
jsr @r10
nop
mov.l @r15+,r11
mov.l @r15+,r10
mov.l @r15+,r2
mov.l @r15+,r1
mov.l @r15+,r0
lds.l @r15+,pr

mov.l (tephra),r10
jmp @r10
nop

.align 2
DEADloc:
.long 0xffff841c
__CHCR3:
.long 0xffffecfc
__DMAOPFLAG:
.long 0xffff8480
__bit7allowslogging: /*CHANGE FOR EACH ROM*/
.long 0xffff6fea
__int_disable:
.long 0x400
__int_enable:
.long 0x41e
tephra: /*CHANGE FOR 256K ECUS*/
.long 0x48000
ROM:
.long 0x37b00
RAM:
.long 0xFFFF8500
LENGTH:
.long 0x800
DEADval:
.word 0xDEAD
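The control flow spread across RXI0, DMAEND and TEIE can be hard to follow in assembly. This is a rough Python model of the decisions they make, based only on the code above; the function and enum names are made up for illustration, and the flag arguments stand for the 16-bit words RXI0 reads from RAM.

```python
from enum import Enum

E0, E1, E2 = 0xE0, 0xE1, 0xE2
SENDING, RECEIVING = 1, 2  # values DMAEND stores in DMAOPFLAG2


class Next(Enum):
    SEND_INDIRECT = "E0: walk pointer table, send bytes to TDR0"
    SEND_DIRECT = "E1: send RAM block to TDR0"
    RECEIVE = "E2: copy bytes from RDR0 into RAM"
    WAIT_TX_END = "enable TEIE and wait for the last byte to leave"
    FINISH = "clear 0x37 marker, restore SCR0, reset SSR0"


def rxi0_takes_over(byte: int, logging: int, mut_mode: int, rx_tx_status: int) -> bool:
    """True if RXI0 starts the 6-byte DMA read itself instead of calling the stock handler."""
    return (E0 <= byte <= E2
            and bool(logging & 0x80)          # bit7allowslogging
            and bool(mut_mode & 0x80)         # mutorobd
            and not (rx_tx_status & 0x80))   # bit 7 set: echo of our own transmission


def dmaend_next(op_flag2: int, tend_set: bool = False) -> Next:
    """What DMAEND does next, keyed on DMAOPFLAG2 (and SSR0 TEND after a send)."""
    if op_flag2 == SENDING:
        return Next.FINISH if tend_set else Next.WAIT_TX_END
    return {E0: Next.SEND_INDIRECT, E1: Next.SEND_DIRECT,
            E2: Next.RECEIVE}.get(op_flag2, Next.FINISH)
```

Note how an E2 passes through DMAEND twice: first with the command byte (start receiving), then with 2, which falls through to the TEIEinvade cleanup.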

Apr 17, 2009 | 09:49 PM

logic

Evolved Member

iTrader: (2)

Joined: Apr 2003

Posts: 1,022

Likes: 7

From: Berkeley, CA

Hi John,

Let me see if I have the client-side protocol correct here, so that I don't mis-implement something. The code is great, but I'd like to step back for a moment and put down an English version of it, if only to make sure I understand this correctly. I'm really tired right now, but I'm hoping this all comes out coherently.

You've taken over three MUT commands: E0 through E2. Each is transmitted as a normal MUT command (i.e. MUT already needs to be initialized).

There's a delay after the delivery of the command; the client suggests that this is baud-rate dependent? Is there a reasonable calculation that can be used to determine what ms delay is expected?

After the delay, the command payload is delivered, specific to the command. One note here: addresses are transmitted in... little-endian byte order? It looks like you're just deferring to x86 byte ordering, so I assume so. Running the values through htonl() ("network order") would have made this explicit and matched the SH default byte order, but I don't think it matters too much.

So, individual command payloads:

E0, "indirect copy": a four-byte start address, followed by a two-byte length. Specifies the ROM address of a lookup table of RAM addresses, sized according to the length specified (i.e. the MUT table). Responds back with the same six bytes? Then returns the single-byte value at each RAM address looked up.
E1, "direct copy": Same thing, but instead of specifying the location of a ROM lookup table, you're specifying a RAM address and length directly; i.e. pull a contiguous region of RAM.
E2 "direct write": Same thing again, but this time you send a RAM address to write to, followed by a two-byte length, and then the byte-by-byte contents of memory to write.
E3: This is used by Copy01button_Click() in the PC app, and I have absolutely no clue what it does. I see no reference to it in the SH code, and it doesn't look like it's attached to any GUI elements, so I assume this is just a bit of old stuff you were playing with? (There's also Mapsw0button_Click() and Mapsw1button_Click() that don't look like they're attached to any GUI elements either?)
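A sketch of how a client might frame these requests in Python. It packs the address and length big-endian, which jcsbanks confirms further down the thread; `build_request` is a hypothetical helper, not part of the existing PC app. The handler loads the length with `mov.w`, which sign-extends on the SH, so the sketch conservatively rejects lengths above 0x7FFF (and zero).

```python
import struct
from typing import Tuple

COMMANDS = (0xE0, 0xE1, 0xE2)
MAX_LENGTH = 0x7FFF  # conservative: the handler loads the length with mov.w (sign-extending)


def build_request(cmd: int, address: int, length: int,
                  data: bytes = b"") -> Tuple[bytes, bytes, bytes]:
    """Return (command byte, 6-byte address/length header, payload).
    The command goes out first; the header follows once the ECU is ready."""
    if cmd not in COMMANDS:
        raise ValueError("only E0, E1 and E2 are handled by the DMA code")
    if not 0 < length <= MAX_LENGTH:
        raise ValueError("length out of range")
    if cmd == 0xE2 and len(data) != length:
        raise ValueError("E2 needs exactly `length` bytes of data to write")
    header = struct.pack(">IH", address, length)  # 4-byte address, 2-byte length, big-endian
    payload = data if cmd == 0xE2 else b""
    return bytes([cmd]), header, payload
```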

So, it looks pretty straightforward from an implementation perspective: you essentially have an indirect-copy command (E0), a direct-copy command (E1), and a direct-write command (E2); using E0, it looks VERY easy to add on to an existing MUT logger without much in the way of code changes.

It looks like there might be an opportunity to automatically detect if your DMA code is there. If you send an E1, and you get a one-byte response back immediately, you know you've triggered a stock-ROM MUT command, rather than engaging a DMA response. If you get nothing back in a few ms, just continue on asking for a known address (say, the ROM ID or some such) just to complete the command, and make a note that you can now make requests via DMA rather than traditional MUT. Does this sound reasonable?
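The detection idea could look roughly like this. `send` and `read` are hypothetical transport callbacks (`read` returns up to n bytes, or `b""` on timeout), the sketch assumes the local K-line echo has already been stripped, and the 50 ms wait and probe address are placeholders rather than tested values.

```python
import struct
from typing import Callable

Send = Callable[[bytes], None]
Read = Callable[[int, float], bytes]  # (max bytes, timeout seconds) -> bytes, b"" on timeout


def detect_dma(send: Send, read: Read, probe_address: int, wait_s: float = 0.05) -> bool:
    """Send a bare E1. An immediate one-byte reply means the stock MUT handler
    answered it; silence means the DMA code is waiting for its 6-byte header."""
    send(b"\xE1")
    if read(1, wait_s):
        return False  # stock ROM treated E1 as an ordinary MUT request
    # Finish the DMA request with a one-byte read so the ECU is not left waiting.
    send(struct.pack(">IH", probe_address, 1))
    read(1, wait_s)
    return True
```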

Have I missed anything obvious here? Any glaring misunderstandings of what your code is doing? If this all looks right, I might write this up on the wiki in the next few days, along with what I know about MUT in "stock" form.

(This has me thinking: how hard would it be to make a baudrate change with this? How cool would that be...negotiate a MUT connection at 15625 initially, but if it turns out we can do DMA, then let the client negotiate a higher speed at whatever baud rate they're capable of? 38400 for your PocketPC, 62400 for a full-on PC application, etc? Implementation could be as simple as moving the reference to the baud rate into RAM, no? You could even have TIMEOUT switch back to 15625 in the event of disconnection, so that later clients could come along at the expected baud rate and not have a problem.)
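To gauge which rates are reachable, the usual SH SCI asynchronous-mode formula is N = Pφ / (64 × 2^(2n-1) × B) - 1, which simplifies to N = Pφ / (32 × 4^n × B) - 1 with Pφ in Hz and n the clock-select setting. The sketch below assumes a 20 MHz peripheral clock, which I have not confirmed for these ECUs; under that assumption 15625 and 62500 baud come out exact, while 38400 is about 1.7% off.

```python
from typing import Tuple


def sci_brr(pclk_hz: float, baud: float, n: int = 0) -> Tuple[int, float, float]:
    """Return (BRR value N, actual baud, error %) for SCI asynchronous mode,
    using N = Pclk / (32 * 4**n * B) - 1, where n is the clock-select setting (0-3)."""
    scale = 32 * 4 ** n
    N = round(pclk_hz / (scale * baud)) - 1
    if not 0 <= N <= 255:
        raise ValueError("rate not reachable with this clock and n")
    actual = pclk_hz / (scale * (N + 1))
    return N, actual, (actual / baud - 1) * 100
```

If the baud rate were moved into RAM as suggested, TIMEOUT could write the 15625 value of N back on disconnect.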

Apr 18, 2009 | 02:34 AM

jcsbanks

The delay was there to give MUT time to process the request before sending the address/length packet. It was baud-rate dependent, and I made it conservative. It may not be required at all now that I use the serial interrupt (as shown in this code, rather than a hook in the MUT code), which should be ready much quicker.
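For a rough figure on the delay question: with 8N1 framing each byte is 10 bits on the wire, so one byte takes 10000/baud ms, which is 0.64 ms at 15625 baud. A conservative delay could then be a few byte times; the multiplier in this sketch is an arbitrary placeholder, not what the existing client uses.

```python
def byte_time_ms(baud: float, bits_per_byte: int = 10) -> float:
    """Wire time for one byte: 8N1 is 1 start + 8 data + 1 stop = 10 bits."""
    return bits_per_byte * 1000.0 / baud


def conservative_delay_ms(baud: float, byte_times: int = 4) -> float:
    """Hypothetical rule of thumb: wait a few byte times after the command byte."""
    return byte_times * byte_time_ms(baud)
```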

Format is big endian.

E3 is defunct.

We could change the baud rate dynamically I think.

Confirmation of DMA capability could work as you say.

There is no real error checking in there, except that when you send a block using an Ecuflash ROM it is verified afterwards, and a message box flashes up if there is a mismatch.

What I'm not happy with is that the robustness I've seen on the 9 doesn't seem to carry over to the 8. Comms don't seem quite so reliable (I have abuse-tested my 9 setup in every way imaginable, including repeated reads/writes, pulling cables and turning off during reads/writes, with behaviour just as I want). It also seems that the 8 RAM maps may get overwritten; I can't see from where in the code, but a few people are getting weirdness that sounds tricky to test...

My friend with an 8MR and I are going to change our throttle body shaft seals today, if I get time I'll try the 96260009 on his and try some abuse testing and see what I can find.

Apr 18, 2009 | 09:42 AM

logic

Error checking/verification: that's a client-side issue in my mind (which the current client handles pretty well), since it's so easy to re-read what was written and validate it. On the other hand, it would be less protocol overhead to do a rudimentary checksum of the written data and return it after the write completes than to re-read the whole thing back, but writing isn't exactly a critical path (read performance matters from a logging perspective, but writing is an interactive, one-off thing).

Thinking about how this could work, you could include a one- or two-byte checksum with each transfer; i.e. E0+ADDR+LEN and E1+ADDR+LEN would get back the usual data followed by a checksum. For E2, you'd send E2+ADDR+LEN+DATA+CHKSUM and get back a response from the ECU of 0x00 for a good checksum; anything else would be an error.
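A sketch of the proposed E2 framing in Python. Both the checksum (sum of the data bytes modulo 256) and the 0x00-means-OK reply are hypothetical, since nothing like this exists in the ECU code yet; checksumming only the data bytes is a choice made here.

```python
import struct


def checksum8(data: bytes) -> int:
    """Hypothetical one-byte checksum: sum of the bytes modulo 256."""
    return sum(data) & 0xFF


def build_e2_frame(address: int, data: bytes) -> bytes:
    """E2 + 4-byte address + 2-byte length (big-endian) + data + checksum of the data."""
    header = struct.pack(">IH", address, len(data))
    return b"\xE2" + header + data + bytes([checksum8(data)])


def write_ok(reply: bytes) -> bool:
    """Under this proposal the ECU answers 0x00 for a good checksum; anything else is an error."""
    return reply == b"\x00"
```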

You could go further by enforcing a reasonable block size and reserving a chunk of memory of that size for transfers. Move all incoming data to that area and checksum it as it arrives. If it passes, copy it to its proper home; otherwise, return an error and don't bother moving it. That way, you shouldn't end up with a corrupted write except in completely bizarre cases, but you burn a bit of memory supporting it.
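A Python model of the staging-buffer idea: data lands in a reserved buffer, gets checked, and only then is copied to its destination. `BLOCK_MAX`, the reply codes and the sum-of-bytes modulo 256 checksum are all hypothetical choices for illustration.

```python
BLOCK_MAX = 256  # hypothetical size of the reserved staging buffer
OK, BAD_CHECKSUM, REJECTED = 0x00, 0x01, 0x02  # hypothetical reply codes


def staged_write(ram: bytearray, offset: int, data: bytes, received_sum: int) -> int:
    """Buffer the block, checksum it, and only copy it to its real home if the
    checksum matches. Returns the reply byte the ECU would send."""
    if len(data) > BLOCK_MAX or offset < 0 or offset + len(data) > len(ram):
        return REJECTED
    staging = bytearray(data)                  # stands in for the reserved RAM area
    if (sum(staging) & 0xFF) != received_sum:  # sum-of-bytes modulo 256
        return BAD_CHECKSUM                    # destination left untouched
    ram[offset:offset + len(staging)] = staging
    return OK
```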

Reliability: I honestly can't say I saw any problems with reliability in my test cases, but I didn't end up doing a lot of writing. E0 reliability has been rock solid in my testing, and that's with an underpowered machine that probably couldn't keep up half the time.

Hmm. I just had an interesting thought about the GUI: you could have the user pick and choose the items they want to log (perhaps using the existing MUT table as a guide to actual RAM addresses), construct a MUT-like table in RAM from that list, truncated to only the items they're interested in, and then use E0 to read that table rather than the real MUT table. You'd get slimmer responses, and wouldn't have the current problem of only being able to read a contiguous chunk of the MUT table.
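The table for E0 would just be consecutive 32-bit pointers. This sketch builds one in Python and models an E0 read as one byte per entry, following the reading of E0 given earlier in this post; the function names are hypothetical and the addresses in the test are made up.

```python
import struct
from typing import Callable, Iterable


def build_mini_table(ram_addresses: Iterable[int]) -> bytes:
    """Pack the chosen RAM addresses as consecutive big-endian 32-bit pointers."""
    return b"".join(struct.pack(">I", a) for a in ram_addresses)


def indirect_read(table: bytes, count: int, read_byte: Callable[[int], int]) -> bytes:
    """Model of an E0 read over the table: one byte from each pointed-to address."""
    pointers = struct.unpack(f">{count}I", table[:4 * count])
    return bytes(read_byte(p) for p in pointers)
```

Once the table is written to RAM (for example with E2), E0 would be pointed at its RAM address with a length equal to the number of entries.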

The more I look at this, the more I think I should have posted this stuff in the client thread, rather than this one.

(The funny thing is, I've taken a big interest in this lately, but I'm not even running this code on my own car anymore since I started testing out tephra's V7.)

Apr 18, 2009 | 03:10 PM

jcsbanks

I had a quick play with the 96260009, engine off because it was late in a residential area. I did find that repeated reads would sometimes give an error, but it was unpredictable and not repeatable. I have no idea why; on the 9 it either worked reliably or didn't work at all. It is disappointing, because I can't think of a single thing from the successful 9 patch that I haven't applied to the 8, and anything I had missed would have stopped it working. On the 9 it is really robust and works just as you'd want it to.

You can easily put a new mini-MUT table in RAM with only your required items in it, although the table itself is still contiguous. I have 19 items being logged presently.