[Dev] SH2 disassembler

Old Oct 15, 2009, 11:44 AM
  #1  
Evolved Member
Thread Starter
iTrader: (2)
 
logic's Avatar
 
Join Date: Apr 2003
Location: Berkeley, CA
Posts: 1,022
Likes: 0
Received 5 Likes on 4 Posts
[Dev] SH2 disassembler

(I suspect I can count the number of people who will be interested in this post on one hand, but since the project is far enough along that it produces semi-useful output, it's probably time to toss it over the wall and see if anyone cares. So, here we go...)

I've written a rudimentary, but automated, SH2 disassembler in Python. It's licensed under the GNU General Public License (Version 3), and can be downloaded from here:

https://github.com/logic/sh2dis/

(You'll need all files from that folder ending in ".py", or just click on "zip" in the left-hand sidebar to download a zip file of the whole project.)

This is not an IDA replacement, at least not yet, although my motivation was not having to point folks who are interested in ECU development toward a $1000 code analysis package that they'll only end up using 1% of.

In fact, as of now it has no user interface at all, simply a "demo application" (dis.py) that, given a ROM image from an Evo VIII or IX (and probably most other 7052- or 7055-based platforms, such as the Hayabusa ECU), tries to perform an automatic disassembly in much the same manner as acamus' onload.idc script. For instructions on using dis.py, run "dis.py --help".

Segment handling is modeled after IDA, and I've tried not to torpedo the possibility of implementing other processors (I'm thinking specifically of H8/500 and HC11, for obvious reasons), but I just haven't had the time to think about that yet. The output doesn't currently include IDA's comment-based cross-references, although that information is tracked and could be added pretty easily. It automatically labels "known" (i.e., from the platform docs) vectors and registers, and can follow most branches. Branch handling relies on very basic register assignment tracking, and there's a ton of room for improvement here (but it seems to be good enough for "in the wild" ROMs right now).

It requires Python 2.5 or 2.6. Python 3.0 will not work, period, full-stop; there's too much new stuff for this to be a trivial porting exercise right now.

In case it's not completely obvious yet: this is NOT end-user software. The target audience right now is other developers, and probably only those with a solid working knowledge of both IDA and Python. Knowing SH2 assembly wouldn't hurt, either.

Performance is not quite where I'd like it to be right now; it takes about 30 seconds on my old dev machine (Dual PIII 1GHz, Linux) to run through a complete disassembly and output, which feels a bit slower than IDA's automated analysis. I'll be very honest, I'm not worrying much about that just yet, since there's so much additional work to be done elsewhere. (If someone feels like tackling the main bottleneck, it's in sh2.py, in disasm_single(); a short-circuiting instruction matching scheme in there, perhaps along with better opcode storage in sh2opcodes.py, would probably cut runtime by more than half.)
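To make that suggestion a bit more concrete, here is a minimal sketch of one way a short-circuiting match could work: bucket the opcode patterns by their top nibble, so each lookup only tests a handful of candidates and stops at the first hit. The table layout, the names (OPCODES, build_buckets, match), and the choice of sample entries are assumptions for illustration only, not what sh2opcodes.py or disasm_single() actually contain; the encodings themselves follow the SH-2 instruction formats.

```python
# Hypothetical opcode table: (mask, value, mnemonic).
# Every SH-2 encoding fixes its top four bits, which is what makes bucketing possible.
OPCODES = [
    (0xFFFF, 0x0009, "nop"),
    (0xF0FF, 0x400B, "jsr @Rm"),
    (0xF0FF, 0x402B, "jmp @Rm"),
    (0xF000, 0xA000, "bra disp"),
    (0xF000, 0xD000, "mov.l @(disp,pc),Rn"),
    (0xF000, 0xE000, "mov #imm,Rn"),
]


def build_buckets(opcodes):
    """Group patterns by the top nibble of their fixed bits."""
    buckets = [[] for _ in range(16)]
    for mask, value, name in opcodes:
        if mask & 0xF000 != 0xF000:
            raise ValueError("pattern %s does not fix its top nibble" % name)
        buckets[value >> 12].append((mask, value, name))
    return buckets


BUCKETS = build_buckets(OPCODES)


def match(word):
    """Return the mnemonic for a 16-bit instruction word, or None if unknown."""
    for mask, value, name in BUCKETS[(word >> 12) & 0xF]:
        if word & mask == value:
            return name  # short-circuit: stop at the first matching pattern
    return None
```

With a full table, putting the most specific masks first within each bucket keeps overlapping patterns from shadowing each other.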

It's probably extremely buggy, and the source is certainly a mess as it sits right now. Bug reports and patches are welcome.

Last edited by logic; Apr 10, 2011 at 12:17 PM. Reason: New URL.
Old Oct 15, 2009, 09:53 PM
  #2  
EvoM Community Team
iTrader: (15)
 
fostytou's Avatar
 
Join Date: Sep 2006
Location: Aurora, IL
Posts: 3,143
Received 6 Likes on 6 Posts
Thanks dude!
Old Oct 15, 2009, 10:16 PM
  #3  
EvoM Guru
iTrader: (6)
 
tephra's Avatar
 
Join Date: Feb 2007
Location: Melbourne, Australia
Posts: 9,486
Received 66 Likes on 42 Posts
good job.

IDA is very expensive, I wish they could cut us (the evo devs) a deal - since we really aren't using it for commercial use
Old Oct 15, 2009, 10:27 PM
  #4  
Evolved Member
iTrader: (22)
 
codgi's Avatar
 
Join Date: Aug 2004
Location: Seattle, WA
Posts: 2,491
Received 41 Likes on 37 Posts
Good work. Maybe I'll sleep a little less when I am at home at xmas and actually poke about in this.

Originally Posted by tephra
good job.

IDA is very expensive, I wish they could cut us (the evo devs) a deal - since we really aren't using it for commercial use
Once software leaves its home, there is no way to tell its intent, so that's just the way it is.
Old Oct 16, 2009, 05:59 AM
  #5  
Evolved Member
Thread Starter
iTrader: (2)
 
logic's Avatar
 
Join Date: Apr 2003
Location: Berkeley, CA
Posts: 1,022
Likes: 0
Received 5 Likes on 4 Posts
Yeah, I can't blame the guy writing IDA (it's really just one fellow, at the core of it) for the pricing; it's such a niche market that it's tough to stay in business if you don't charge appropriately. (Really, who is your target audience? Antivirus/security software developers, pirates, and hobbyists. But only one of those groups has money to spend on software, and another group is going to actively try to redistribute your software.)

And codgi hit the nail on the head; especially for low-volume sales like this, a "non-commercial use" or "student" license doesn't really work, because when you need it, it's probably a single-project need. It'd be like one of those companies selling data recovery software giving away a trial version that does a few recoveries before you have to buy it; they'd never get any sales, because you generally only go looking for software like that when you have a single recovery to do.

But that still means we need something to work with, and I'd rather tell people "here, use this free thing that has a few rough edges" than give them a suggestion that's just going to lead to them hopping on The Pirate Bay. Unfortunately, it has a LOT of rough edges right now. Working on it.
Old Oct 16, 2009, 08:36 AM
  #6  
Evolved Member
Thread Starter
iTrader: (2)
 
logic's Avatar
 
Join Date: Apr 2003
Location: Berkeley, CA
Posts: 1,022
Likes: 0
Received 5 Likes on 4 Posts
Okay, I lied: it works with Python 2.5 now. (I wanted to be able to test it on another machine, which didn't have 2.6, and upgrading would be...er...non-trivial.)
Old Oct 16, 2009, 11:41 AM
  #7  
Evolved Member
iTrader: (1)
 
Danieln's Avatar
 
Join Date: Apr 2008
Location: EUROPE
Posts: 563
Likes: 0
Received 0 Likes on 0 Posts
Thanks mate
Old Oct 16, 2009, 04:30 PM
  #8  
Evolved Member
iTrader: (6)
 
donour's Avatar
 
Join Date: May 2004
Location: Tennessee, USA
Posts: 2,501
Received 1 Like on 1 Post
It's not _that_ slow. It only takes 10 seconds to process 96420008 on my mac laptop. That's 157k lines of output. I'm travelling this week, but I'm excited to take a look at it later.

d
Old Oct 16, 2009, 07:08 PM
  #9  
Evolved Member
Thread Starter
iTrader: (2)
 
logic's Avatar
 
Join Date: Apr 2003
Location: Berkeley, CA
Posts: 1,022
Likes: 0
Received 5 Likes on 4 Posts
I did mention that it was a PIII 1.0GHz, right? But, it's a handy little machine to ssh over to for stuff like this; keeps work and play separate.

Just in case anyone would rather just check it out directly, the git repository is available at: https://github.com/logic/sh2dis

Last edited by logic; Apr 10, 2011 at 12:19 PM. Reason: Moved to github.
Old Oct 16, 2009, 08:31 PM
  #10  
Account Disabled
iTrader: (3)
 
0xDEAD's Avatar
 
Join Date: Jun 2009
Location: central pa
Posts: 312
Likes: 0
Received 0 Likes on 0 Posts
Nice, thanks.
Old Oct 17, 2009, 09:26 AM
  #11  
Evolved Member
iTrader: (6)
 
donour's Avatar
 
Join Date: May 2004
Location: Tennessee, USA
Posts: 2,501
Received 1 Like on 1 Post
I already have some patches for you. No change in functionality yet, but some improvements to the interface, documentation, that kind of stuff.

Example: you don't want to be sending the output to stdout if you are calling diassemble() from the Profile module.
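A minimal sketch of the kind of interface change I mean, assuming the output routine can be handed a file-like object instead of always printing. Here emit_listing() is a made-up stand-in, not the real function in the project, and this is written for current Python 3 rather than the 2.x the project targets.

```python
import cProfile
import io
import sys


def emit_listing(lines, out=None):
    """Hypothetical stand-in for the disassembler's output routine."""
    out = sys.stdout if out is None else out  # default keeps the old behaviour
    for line in lines:
        out.write(line + "\n")


def profile_quietly(lines):
    """Profile emit_listing() while capturing its output instead of printing it."""
    sink = io.StringIO()
    profiler = cProfile.Profile()
    profiler.runcall(emit_listing, lines, out=sink)
    return profiler, sink.getvalue()
```

The profiler object can then be passed to pstats.Stats() for reporting, and the terminal only shows the timing data.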

Do you want them?

d
Old Oct 17, 2009, 12:19 PM
  #12  
Evolved Member
Thread Starter
iTrader: (2)
 
logic's Avatar
 
Join Date: Apr 2003
Location: Berkeley, CA
Posts: 1,022
Likes: 0
Received 5 Likes on 4 Posts
Patches are always welcome. I've been leaving some of the "fit and finish" stuff off to the side while I've been writing and re-writing the segment API and working out how best to integrate tests for instruction disassembly and register tracking. (It's nice working on something that actually lends itself well to automated testing for a change.)
Old Oct 19, 2009, 02:51 PM
  #13  
Evolved Member
Thread Starter
iTrader: (2)
 
logic's Avatar
 
Join Date: Apr 2003
Location: Berkeley, CA
Posts: 1,022
Likes: 0
Received 5 Likes on 4 Posts
Well, would you look at that; the output is starting to look familiar now:
Code:
00009C5C init:            mov.l @(0x14,pc),r15  ! [unk_9C74] = sp
                                                ! XREF: v_power_on_pc
                                                ! XREF: v_reset_pc ...
00009C5E                  mov.l @(0x18,pc),r0   ! [unk_9C78] = unk_FFFFABA0
00009C60                  mov.l @(0x18,pc),r1   ! [unk_9C7C] = unk_FFFFABA0
00009C62                  mov.l r1,@r0
00009C64                  mov #0x0,r0
00009C66                  ldc r0,vbr
00009C68                  ldc r0,gbr
00009C6A                  mov.l @(0x14,pc),r0   ! [unk_9C80] = sub_EE94
00009C6C                  jsr @r0               ! sub_EE94
00009C6E                  nop
00009C70                  bra reset
00009C72                  nop
         ! ------------------------------------------------------------
00009C74 unk_9C74:        .long sp              ! XREF: init
00009C78 unk_9C78:        .long unk_FFFFABA0    ! XREF: 0x9C5E
00009C7C unk_9C7C:        .long unk_FFFFABA0    ! XREF: 0x9C60
00009C80 unk_9C80:        .long sub_EE94        ! XREF: 0x9C6A
         ! ------------------------------------------------------------
00009C84 reset:           mov.l @(0x8,pc),r0    ! [unk_9C90] = v_int_trap1C
                                                ! XREF: v_gen_ill_inst
                                                ! XREF: 0x14 ...
00009C86                  ldc.l @r0+,sr
00009C88                  mov.l @(0x8,pc),r0    ! [unk_9C94] = init
00009C8A                  jmp @r0               ! init
00009C8C                  nop
This is just some candy (because I'm avoiding working on more important things), but it's neat to see the underlying construction start to show through in the output; this little snippet really demonstrates what it can do now.
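
For anyone wondering where the bracketed targets in those comments come from: on SH2, a "mov.l @(disp,pc),Rn" reads from the instruction's address plus 4, rounded down to a multiple of 4, plus the displacement. A minimal sketch of that arithmetic follows; the function names are made up for illustration and are not taken from the sh2dis source.

```python
def decode_movl_pc(word):
    """Split a 'mov.l @(disp,pc),Rn' word (1101nnnndddddddd) into (rn, disp in bytes)."""
    if word & 0xF000 != 0xD000:
        raise ValueError("not a PC-relative mov.l: 0x%04X" % word)
    rn = (word >> 8) & 0xF
    disp = (word & 0xFF) * 4  # the 8-bit field counts longwords
    return rn, disp


def pc_relative_long(addr, disp):
    """Address read by 'mov.l @(disp,pc),Rn' located at addr (disp in bytes)."""
    return ((addr + 4) & ~3) + disp


# Checked against the listing above:
#   00009C5C mov.l @(0x14,pc),r15   reads 0x9C74
#   00009C5E mov.l @(0x18,pc),r0    reads 0x9C78
assert pc_relative_long(0x9C5C, 0x14) == 0x9C74
assert pc_relative_long(0x9C5E, 0x18) == 0x9C78
```

The rounding step is why the instructions at 9C5E and 9C60 both use 0x18 yet land on different longwords.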

I have a whole new level of respect for IDA and Hex-Rays at this point.
Old Oct 20, 2009, 02:29 PM
  #14  
Evolved Member
Thread Starter
iTrader: (2)
 
logic's Avatar
 
Join Date: Apr 2003
Location: Berkeley, CA
Posts: 1,022
Likes: 0
Received 5 Likes on 4 Posts
Happiness is progress.

dis.py now takes a "-m" command-line argument, which applies Mitsubishi-specific fixups to the ROM. Right now, that means it automatically locates/disassembles jump tables (as indicated by use of MOVA), and also tries to locate the MUT table (and seems to mostly succeed). Both are borrowed from acamus' onload.idc script; he should get the credit for the way I implemented determination of their locations.
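
As a rough illustration of the MOVA part only: "mova @(disp,pc),r0" encodes as 0xC7dd, and the address it loads is computed the same way as a PC-relative long load. The sketch below just finds those instructions and the address each one points at. It is not how onload.idc or dis.py actually do it; find_mova is a made-up name, and a blind linear scan like this will also hit data that happens to look like MOVA, so a real pass would only consider addresses already known to be code.

```python
import struct


def find_mova(rom, start=0, end=None):
    """Yield (instruction address, referenced address) for each word that decodes as MOVA."""
    end = len(rom) if end is None else end
    for addr in range(start, end - 1, 2):
        (word,) = struct.unpack_from(">H", rom, addr)  # SH2 is big-endian
        if word & 0xFF00 == 0xC700:
            target = ((addr + 4) & ~3) + (word & 0xFF) * 4
            yield addr, target
```

Each referenced address is then a candidate for the start of a jump table; deciding how many entries it has is the harder part and is not attempted here.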

At this point, the results appear to be almost as good as when I load up a virgin ROM in IDA, after onload.idc and sh3.cfg do their magic.

Next up: "empty space" scanning and .ORG directive generation (any block of "FF" larger than, say, five bytes, gets turned into a .ORG directive in the output), and output of any referenced RAM and hardware register addresses as .EQU directives (e.g. "reg_PACRL .equ 0xFFFFF724"). Pretty soon, the output might actually be able to be fed back into gas directly.
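
A minimal sketch of what that scan could look like, assuming the ROM is held as a bytes object and that a .ORG should name the first address after each gap. None of this is in the repository yet; the function names and the six-byte threshold ("larger than five") are placeholders, and it is written for current Python 3, where iterating over bytes yields integers.

```python
def find_ff_runs(rom, min_len=6):
    """Yield (start, end) for each run of 0xFF bytes at least min_len long; end is exclusive."""
    start = None
    for i, b in enumerate(rom):
        if b == 0xFF:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                yield start, i
            start = None
    if start is not None and len(rom) - start >= min_len:
        yield start, len(rom)


def org_directives(rom, min_len=6):
    """One .org line per gap, pointing at the first byte after it."""
    return [".org 0x%X" % end
            for _start, end in find_ff_runs(rom, min_len)
            if end < len(rom)]  # nothing follows a trailing gap


def equ_line(name, addr):
    """Format a register or RAM label as an .equ directive."""
    return "%s .equ 0x%08X" % (name, addr)
```

For example, equ_line("reg_PACRL", 0xFFFFF724) gives the directive quoted above.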

Thinking further ahead, I'd like to add some kind of automated table extraction; i.e., look for any "02 XX FF FF" and "03 XX FF FF" sequences that have references to them (or try to actually parse out sub_C28, sub_CC6, etc. calls; I'll need to get smarter about saving register data with generated code to pull that off, though), and at least label them as tables (tbl_XXXX, instead of unk_XXXX) or something. It's candy, but potentially useful, especially if I can bolt up the code I already have for parsing EcuFlash XML files, which would give me something to cross-reference auto-located tables against. I'll probably have to add some concept of an "array" of values at that point too, for auto-prettifying the output.
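
The simple byte-pattern half of that idea might be sketched like this, assuming the set of referenced addresses is already available from the cross-reference tracking. label_tables and its arguments are hypothetical names, and the sketch only checks the four-byte pattern; it makes no attempt to interpret the XX byte or the table contents. Written for current Python 3.

```python
def label_tables(rom, referenced):
    """Map each referenced address that starts with 02/03 XX FF FF to a tbl_ label."""
    labels = {}
    for addr in sorted(referenced):
        head = rom[addr:addr + 4]
        if len(head) == 4 and head[0] in (0x02, 0x03) and head[2:] == b"\xff\xff":
            labels[addr] = "tbl_%X" % addr  # same naming style as unk_XXXX
    return labels
```

Restricting the check to referenced addresses keeps stray byte sequences elsewhere in the ROM from being labeled.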
Old Oct 20, 2009, 03:55 PM
  #15  
Evolved Member
iTrader: (5)
 
RoadSpike's Avatar
 
Join Date: Oct 2006
Location: Sacramento, CA
Posts: 3,805
Likes: 0
Received 2 Likes on 2 Posts
That's pretty kick ***. Maybe I can port that to straight C for a faster runtime, and so other people can run it without needing Python installed.

