[Dev] SH2 disassembler

Oct 15, 2009 | 11:44 AM

logic

Thread Starter

Evolved Member

iTrader: (2)

Joined: Apr 2003

Posts: 1,022

Likes: 7

From: Berkeley, CA

(I suspect I can count the number of people who will be interested in this post on one hand, but since the project is far enough along that it produces semi-useful output, it's probably time to toss it over the wall and see if anyone cares. So, here we go...)

I've written a rudimentary, but automated, SH2 disassembler in Python. It's licensed under the GNU General Public License (Version 3), and can be downloaded from here:

https://github.com/logic/sh2dis/

(You'll need all files from that folder ending in ".py", or just click on "zip" in the left-hand sidebar to download a zip file of the whole project.)

This is not an IDA replacement, at least not yet, although my motivation was not having to point folks who are interested in ECU development toward a $1000 code analysis package that they'll only end up using 1% of.

In fact, as of now it has no user interface at all, simply a "demo application" (dis.py) that, given a ROM image from an Evo VIII or IX (and probably most other 7052- or 7055-based platforms, such as the Hayabusa ECU), tries to perform an automatic disassembly in much the same manner as acamus' onload.idc script. For instructions on using dis.py, run "dis.py --help".

Segment handling is modeled after IDA, and I've tried not to torpedo the possibility of implementing other processors (I'm thinking specifically of H8/500 and HC11, for obvious reasons), but I just haven't had the time to think about that yet. The output doesn't currently include IDA's comment-based cross-references, although that information is tracked and could be added pretty easily. It automatically labels "known" (i.e. from the platform docs) vectors and registers, and can follow most branches. Branch handling relies on very basic register assignment tracking, and there's a ton of room for improvement here (but it seems to be good enough for "in the wild" ROMs right now).
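To make "register assignment tracking" a little more concrete, here is a minimal sketch in modern Python 3; it is not code from sh2dis. It assumes a hypothetical pre-decoded instruction format in which each PC-relative `mov.l` already carries the literal it loads, remembers which register holds which value, and uses that to resolve `jsr @rN`/`jmp @rN` destinations. `resolve_indirect_branches` and the tuple layout are illustrative names only.

```python
# Sketch only: a hypothetical pre-decoded instruction format, not sh2dis's.
#   ("load", addr, value, rn)   e.g. mov.l @(disp,pc),Rn with the literal already read
#   ("call", addr, rm)          jsr @Rm
#   ("jump", addr, rm)          jmp @Rm
#   ("other", addr, regs)       any other instruction; regs = registers it overwrites


def resolve_indirect_branches(insns):
    """Return {branch address: destination} for jsr/jmp through a known register."""
    known = {}    # register number -> value last loaded into it
    targets = {}
    for insn in insns:
        kind, addr = insn[0], insn[1]
        if kind == "load":
            value, rn = insn[2], insn[3]
            known[rn] = value
        elif kind in ("call", "jump"):
            rm = insn[2]
            if rm in known:
                targets[addr] = known[rm]
            # A callee may clobber any register and a jmp leaves this block,
            # so forget everything rather than guess.
            known.clear()
        elif kind == "other":
            for reg in insn[2]:
                known.pop(reg, None)
    return targets


# A few instructions from the init/reset listing posted later in the thread.
listing = [
    ("load", 0x9C6A, 0xEE94, 0),   # mov.l @(0x14,pc),r0 ; [unk_9C80] = sub_EE94
    ("call", 0x9C6C, 0),           # jsr @r0
    ("other", 0x9C86, [0]),        # ldc.l @r0+,sr (post-increments r0)
    ("load", 0x9C88, 0x9C5C, 0),   # mov.l @(0x8,pc),r0 ; [unk_9C94] = init
    ("jump", 0x9C8A, 0),           # jmp @r0
]
print({hex(k): hex(v) for k, v in resolve_indirect_branches(listing).items()})
# {'0x9c6c': '0xee94', '0x9c8a': '0x9c5c'}
```

Clearing all state at every call is deliberately pessimistic; a smarter tracker would know which registers the SH2 calling convention preserves.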

It requires Python 2.5 or 2.6. Python 3.0 will not work, period, full-stop; there's too much new stuff to make this a trivial porting exercise right now.

In case it's not completely obvious yet: this is NOT end-user software. The target audience for this is other developers right now, and probably only those with a solid working knowledge of both IDA and Python. Knowing SH2 assembly wouldn't hurt, either.

Performance is not quite where I'd like it to be right now; it takes about 30 seconds on my old dev machine (Dual PIII 1GHz, Linux) to run through a complete disassembly and output, which feels a bit slower than IDA's automated analysis. I'll be very honest, I'm not worrying much about that just yet, since there's so much additional work to be done elsewhere. (If someone feels like tackling the main bottleneck, it's in sh2.py, in disasm_single(); a short-circuiting instruction matching scheme in there, perhaps along with better opcode storage in sh2opcodes.py, would probably cut runtime by more than half.)
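One possible shape for the "short-circuiting instruction matching scheme" suggested above, sketched in Python as an assumption rather than a description of how sh2.py or sh2opcodes.py are laid out. Every SH2 instruction is a single 16-bit word, and every pattern below has its top nibble fully fixed, so grouping (mask, match) pairs by that nibble means a lookup tests only a handful of candidates and stops at the first hit. Only a few opcodes seen in this thread's listing are included; `OPCODES`, `BUCKETS` and `disasm_word` are hypothetical names.

```python
# Hypothetical opcode table: (mask, match, format). Field names are by bit
# position: n = bits 11-8, m = bits 7-4, i8 = bits 7-0, d4 = bits 7-0 times 4.
OPCODES = [
    (0xFFFF, 0x0009, "nop"),
    (0xF00F, 0x2002, "mov.l r{m},@r{n}"),
    (0xF0FF, 0x4007, "ldc.l @r{n}+,sr"),
    (0xF0FF, 0x400B, "jsr @r{n}"),
    (0xF0FF, 0x401E, "ldc r{n},gbr"),
    (0xF0FF, 0x402B, "jmp @r{n}"),
    (0xF0FF, 0x402E, "ldc r{n},vbr"),
    (0xF000, 0xD000, "mov.l @(0x{d4:X},pc),r{n}"),
    (0xF000, 0xE000, "mov #0x{i8:X},r{n}"),
]

# Group patterns by top nibble (fixed in every pattern above), most specific
# mask first, so a lookup only scans a short list and stops at the first hit.
BUCKETS = {}
for mask, match, fmt in OPCODES:
    BUCKETS.setdefault(match >> 12, []).append((mask, match, fmt))
for patterns in BUCKETS.values():
    patterns.sort(key=lambda p: bin(p[0]).count("1"), reverse=True)


def disasm_word(op):
    """Disassemble one 16-bit SH2 opcode, or fall back to a .word directive."""
    for mask, match, fmt in BUCKETS.get(op >> 12, ()):
        if (op & mask) == match:
            return fmt.format(n=(op >> 8) & 0xF, m=(op >> 4) & 0xF,
                              i8=op & 0xFF, d4=(op & 0xFF) * 4)
    return ".word 0x%04X" % op
```

For example, `disasm_word(0xDF05)` returns `"mov.l @(0x14,pc),r15"`, which is the encoding of the first instruction in the listing. A flat 65536-entry lookup table would be faster still, at the cost of memory and start-up time.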

It's probably extremely buggy, and the source is certainly a mess as it sits right now. Bug reports and patches are welcome.

Last edited by logic; Apr 10, 2011 at 12:17 PM. Reason: New URL.

Oct 15, 2009 | 09:53 PM

fostytou

EvoM Community Team

iTrader: (15)

Joined: Sep 2006

Posts: 3,143

Likes: 7

From: Aurora, IL

Thanks dude!

Oct 15, 2009 | 10:16 PM

tephra

EvoM Guru

iTrader: (6)

Joined: Feb 2007

Posts: 9,486

Likes: 67

From: Melbourne, Australia

good job.

IDA is very expensive; I wish they could cut us (the Evo devs) a deal, since we really aren't using it commercially.

Oct 15, 2009 | 10:27 PM

codgi

Evolved Member

iTrader: (22)

Joined: Aug 2004

Posts: 2,493

Likes: 41

From: Atlanta, GA

Good work. Maybe I'll sleep a little less when I am at home at xmas and actually poke about this

Quote:

Originally Posted by tephra

good job.

IDA is very expensive; I wish they could cut us (the Evo devs) a deal, since we really aren't using it commercially.

Once software leaves its home there is no way to tell its intent, so that's just the way it is.

Oct 16, 2009 | 05:59 AM

logic

Thread Starter

Evolved Member

iTrader: (2)

Joined: Apr 2003

Posts: 1,022

Likes: 7

From: Berkeley, CA

Yeah, I can't blame the guy writing IDA (it's really just one fellow, at the core of it) for the pricing; it's such a niche market that it's tough to stay in business if you don't charge appropriately. (Really, who is your target audience? Antivirus/security software developers, pirates, and hobbyists. But only one of those groups has money to spend on software, and another group is going to actively try to redistribute your software.)

And codgi hit the nail on the head; especially for low-volume sales like this, a "non-commercial use" or "student" license doesn't really work, because when you need it, it's probably a single-project need. It'd be like one of those companies selling data recovery software giving away a trial version that does a few recoveries before you have to buy it; they'd never get any sales, because you generally only go looking for software like that when you have a single recovery to do.

But that still means we need something to work with, and I'd rather tell people "here, use this free thing that has a few rough edges" than give them a suggestion that's just going to lead to them hopping on The Pirate Bay. Unfortunately, it has a LOT of rough edges right now. Working on it.

Oct 16, 2009 | 08:36 AM

logic

Thread Starter

Evolved Member

iTrader: (2)

Joined: Apr 2003

Posts: 1,022

Likes: 7

From: Berkeley, CA

Okay, I lied: it works with Python 2.5 now. (I wanted to be able to test it on another machine, which didn't have 2.6, and upgrading would be...er...non-trivial.)

Oct 16, 2009 | 11:41 AM

Danieln

Evolved Member

iTrader: (1)

Joined: Apr 2008

Posts: 563

Likes: 0

From: EUROPE

Thanks mate

Oct 16, 2009 | 04:30 PM

donour

Evolved Member

iTrader: (6)

Joined: May 2004

Posts: 2,502

Likes: 1

From: Tennessee, USA

It's not _that_ slow.

It only takes 10 seconds to process 96420008 on my mac laptop. That's 157k lines of output. I'm travelling this week, but I'm excited to take a look at it later.

d

Oct 16, 2009 | 07:08 PM

logic

Thread Starter

Evolved Member

iTrader: (2)

Joined: Apr 2003

Posts: 1,022

Likes: 7

From: Berkeley, CA

I did mention that it was a PIII 1.0GHz, right?

But it's a handy little machine to ssh over to for stuff like this; it keeps work and play separate.

Just in case anyone would rather just check it out directly, the git repository is available at: https://github.com/logic/sh2dis

Last edited by logic; Apr 10, 2011 at 12:19 PM. Reason: Moved to github.

Oct 16, 2009 | 08:31 PM

#10

0xDEAD

Account Disabled

iTrader: (3)

Joined: Jun 2009

Posts: 312

Likes: 0

From: central pa

Nice, thanks.

Oct 17, 2009 | 09:26 AM

#11

donour

Evolved Member

iTrader: (6)

Joined: May 2004

Posts: 2,502

Likes: 1

From: Tennessee, USA

I already have some patches for you. No change in functionality yet, but some improvements to the interface, documentation, that kind of stuff.

Example: you don't want to be sending the output to stdout if you are calling disassemble() from the profile module.
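A sketch of the kind of interface change being described, assuming a hypothetical `disassemble()` that takes an optional file-like `out` argument and only falls back to `sys.stdout` when none is given; the real function's signature isn't shown in this thread. With that in place, a profiling run can capture the listing in memory and keep the profiler report separate. This uses Python 3's `io.StringIO`, `cProfile` and `pstats`.

```python
import cProfile
import io
import pstats
import sys


def disassemble(listing_lines, out=None):
    """Hypothetical: write listing lines to `out`, defaulting to stdout."""
    if out is None:
        out = sys.stdout
    for line in listing_lines:
        out.write(line + "\n")


def profile_disassembly(listing_lines, top=10):
    """Run disassemble() under cProfile; return (listing, profiler report)."""
    listing = io.StringIO()
    profiler = cProfile.Profile()
    profiler.runcall(disassemble, listing_lines, out=listing)
    report = io.StringIO()
    pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(top)
    return listing.getvalue(), report.getvalue()
```

Looking up `sys.stdout` inside the function, rather than using it as a default argument value, also keeps the default working if stdout is redirected after import.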

Do you want them?

d

Oct 17, 2009 | 12:19 PM

#12

logic

Thread Starter

Evolved Member

iTrader: (2)

Joined: Apr 2003

Posts: 1,022

Likes: 7

From: Berkeley, CA

Patches are always welcome.

I've been leaving some of the "fit and finish" stuff off to the side while I've been writing and re-writing the segment API and working out how best to integrate tests for instruction disassembly and register tracking. (It's nice working on something that actually lends itself well to automated testing for a change.)

Oct 19, 2009 | 02:51 PM

#13

logic

Thread Starter

Evolved Member

iTrader: (2)

Joined: Apr 2003

Posts: 1,022

Likes: 7

From: Berkeley, CA

Well, would you look at that; the output is starting to look familiar now:

Code:

00009C5C init:            mov.l @(0x14,pc),r15  ! [unk_9C74] = sp
                                                ! XREF: v_power_on_pc
                                                ! XREF: v_reset_pc ...
00009C5E                  mov.l @(0x18,pc),r0   ! [unk_9C78] = unk_FFFFABA0
00009C60                  mov.l @(0x18,pc),r1   ! [unk_9C7C] = unk_FFFFABA0
00009C62                  mov.l r1,@r0
00009C64                  mov #0x0,r0
00009C66                  ldc r0,vbr
00009C68                  ldc r0,gbr
00009C6A                  mov.l @(0x14,pc),r0   ! [unk_9C80] = sub_EE94
00009C6C                  jsr @r0               ! sub_EE94
00009C6E                  nop
00009C70                  bra reset
00009C72                  nop
         ! ------------------------------------------------------------
00009C74 unk_9C74:        .long sp              ! XREF: init
00009C78 unk_9C78:        .long unk_FFFFABA0    ! XREF: 0x9C5E
00009C7C unk_9C7C:        .long unk_FFFFABA0    ! XREF: 0x9C60
00009C80 unk_9C80:        .long sub_EE94        ! XREF: 0x9C6A
         ! ------------------------------------------------------------
00009C84 reset:           mov.l @(0x8,pc),r0    ! [unk_9C90] = v_int_trap1C
                                                ! XREF: v_gen_ill_inst
                                                ! XREF: 0x14 ...
00009C86                  ldc.l @r0+,sr
00009C88                  mov.l @(0x8,pc),r0    ! [unk_9C94] = init
00009C8A                  jmp @r0               ! init
00009C8C                  nop

This is just some candy (because I'm avoiding working on more important things), but it's neat to see the underlying construction start to show through in the output; this little snippet really demonstrates what it can do now.

I have a whole new level of respect for IDA and Hex-Rays at this point.
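The `[unk_XXXX]` annotations on the `mov.l @(disp,pc),Rn` lines in the listing follow SH2's PC-relative addressing rule: round the instruction address down to a multiple of 4, add 4, then add the byte displacement shown in the listing (the encoded 8-bit field times 4). A quick Python check of that rule against every such line above; `mov_l_pc_target` is an illustrative name, not a function from sh2dis.

```python
def mov_l_pc_target(insn_addr, disp):
    """Address read by SH2 'mov.l @(disp,pc),Rn' (or 'mova').

    `disp` is the byte offset as printed in the listing, i.e. the encoded
    8-bit field already multiplied by 4.
    """
    return (insn_addr & ~3) + 4 + disp


# (instruction address, printed displacement, literal label) from the listing
CHECKS = [
    (0x9C5C, 0x14, 0x9C74),
    (0x9C5E, 0x18, 0x9C78),
    (0x9C60, 0x18, 0x9C7C),
    (0x9C6A, 0x14, 0x9C80),
    (0x9C84, 0x08, 0x9C90),
    (0x9C88, 0x08, 0x9C94),
]

for addr, disp, label in CHECKS:
    assert mov_l_pc_target(addr, disp) == label, hex(addr)
```

Note the rounding at 0x9C5E and 0x9C6A: without the `& ~3`, those two lines would point two bytes past the labelled literals.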

Oct 20, 2009 | 02:29 PM

#14

logic

Thread Starter

Evolved Member

iTrader: (2)

Joined: Apr 2003

Posts: 1,022

Likes: 7

From: Berkeley, CA

Happiness is progress.

dis.py now takes a "-m" command-line argument, which applies Mitsubishi-specific fixups to the ROM. Right now, that means it automatically locates/disassembles jump tables (as indicated by use of MOVA), and also tries to locate the MUT table (and seems to mostly succeed). Both are borrowed from acamus' onload.idc script; he should get the credit for the way I implemented determination of their locations.
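Locating a jump table starts from the address a MOVA instruction points at. A minimal Python sketch of just that step, assuming a big-endian ROM image in `rom` whose first byte sits at address `base`: `mova @(disp,pc),r0` is encoded as 0xC7 followed by an 8-bit displacement, and its target uses the same (PC & ~3) + 4 + disp×4 rule as `mov.l @(disp,pc)`. How acamus' script then decides each table's entry format and length isn't described in the thread, so that part is left out; `find_mova_targets` is a hypothetical helper.

```python
import struct


def find_mova_targets(rom, base=0):
    """Yield (instruction address, target address) for each 'mova @(disp,pc),r0'.

    `rom` is a big-endian SH2 image whose first byte sits at `base`. This is a
    naive sweep over every even offset, so data that merely looks like MOVA
    (0xC7xx) will show up too; a real pass would only look at code already
    reached by following branches.
    """
    for off in range(0, len(rom) - 1, 2):
        (word,) = struct.unpack_from(">H", rom, off)
        if (word & 0xFF00) == 0xC700:
            addr = base + off
            yield addr, (addr & ~3) + 4 + (word & 0xFF) * 4
```

For instance, the bytes `00 09 C7 02 00 09` loaded at 0x1000 contain one MOVA at 0x1002 pointing at 0x100C.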

At this point, the results appear to be almost as good as when I load up a virgin ROM in IDA, after onload.idc and sh3.cfg do their magic.

Next up: "empty space" scanning and .ORG directive generation (any block of "FF" larger than, say, five bytes gets turned into a .ORG directive in the output), and output of any referenced RAM and hardware register addresses as .EQU directives (e.g. "reg_PACRL .equ 0xFFFFF724"). Pretty soon, it might actually be possible to feed the output straight back into gas.
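Both planned outputs are simple to sketch. Assuming `rom` is the raw image loaded at address `base`, `find_ff_runs` below reports every run of 0xFF longer than five bytes (the threshold floated above), and `equ_lines` formats a name-to-address mapping as `.equ` directives in the style of the `reg_PACRL` example. Both helper names are hypothetical, written for Python 3.

```python
def find_ff_runs(rom, base=0, min_len=6):
    """Return (start, end) address pairs for runs of 0xFF at least `min_len` long.

    `end` is exclusive: it is the address where real data resumes, which is
    what an '.org' directive would jump to.
    """
    runs = []
    start = None
    for off, byte in enumerate(bytearray(rom)):
        if byte == 0xFF:
            if start is None:
                start = off
        elif start is not None:
            if off - start >= min_len:
                runs.append((base + start, base + off))
            start = None
    if start is not None and len(rom) - start >= min_len:
        runs.append((base + start, base + len(rom)))
    return runs


def equ_lines(symbols):
    """Format {name: address} as '.equ' directives, ordered by address."""
    return ["%s .equ 0x%08X" % (name, addr)
            for name, addr in sorted(symbols.items(), key=lambda kv: kv[1])]
```

`equ_lines({"reg_PACRL": 0xFFFFF724})` gives `["reg_PACRL .equ 0xFFFFF724"]`. When emitting `.org` for each run's end address, keep in mind that GNU as fills the skipped bytes with zero unless a fill value is given (`.org new-lc, fill`), so reproducing the original 0xFF padding needs that second argument.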

Thinking further ahead, I'd like to add some kind of automated table extraction; i.e. look for any "02 XX FF FF" and "03 XX FF FF" sequences that have references to them (or try to actually parse out sub_C28, sub_CC6, etc. calls; I'll need to get smarter about saving register data with generated code to pull that off, though), and at least label them as tables (tbl_XXXX, instead of unk_XXXX) or something. It's candy, but potentially useful, especially if I can bolt up the code I already have for parsing EcuFlash XML files, which would give me something to cross-reference auto-located tables against. I'll probably have to add some concept of an "array" of values at that point too, for auto-prettifying the output.
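The first half of that idea (pattern match, then keep only referenced hits) can be sketched in a few lines of Python 3. The thread doesn't say what the "XX" byte or the surrounding bytes mean, so this treats the four-byte sequence purely as a search key; `referenced` is assumed to be the set of addresses the disassembler has already seen used as data, and `TABLE_HEADER`/`find_candidate_tables` are hypothetical names.

```python
import re

# 0x02 or 0x03, any one byte, then FF FF: the byte patterns mentioned in the post.
TABLE_HEADER = re.compile(rb"[\x02\x03].\xff\xff", re.DOTALL)


def find_candidate_tables(rom, referenced, base=0):
    """Return {address: label} for pattern matches that something references.

    `referenced` is assumed to be the set of addresses already seen used as
    data (e.g. by PC-relative literals). Matches are non-overlapping, as
    returned by re.finditer.
    """
    found = {}
    for match in TABLE_HEADER.finditer(rom):
        addr = base + match.start()
        if addr in referenced:
            found[addr] = "tbl_%X" % addr
    return found
```

`re.DOTALL` matters here: without it, `.` would refuse to match an "XX" byte of 0x0A.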

Oct 20, 2009 | 03:55 PM

#15

RoadSpike

Evolved Member

iTrader: (5)

Joined: Oct 2006

Posts: 3,805

Likes: 2

From: Sacramento, CA

Quote:

Originally Posted by logic

That's pretty kick-***. Maybe I can port that to straight C for a faster runtime, and so other people can run it without needing Python installed.
