How to Reverse Engineer OS X and iOS Software


Why employ reverse engineering? When you have an executable but no access to its source code, and you still need to understand the inner workings of the software, you apply reverse engineering to it. Reverse engineering is useful in many different business situations:

  • Researching complex software problems
  • Improving compatibility with closed solutions and formats
  • Improving interaction with a particular platform
  • Making maintenance of legacy code easier

There are many other cases where you need to reverse engineer software. In this article we will look at how to reverse engineer an iOS app, as well as OS X software, and try to give you some practical advice on what you need to know and what tools you need to have.

Understanding binary structure

When reverse engineering a binary, you should know where the executable code is situated inside it. Knowing the binary structure is paramount to successfully learning how to reverse engineer software.

Executable binary format. The Mach-O executable format is very commonly used on systems based on the Mach kernel. Mach-O executables are contained in either 'thin' or 'fat' binaries. A thin binary has a sole Mach-O executable, while a fat binary can have many of them at once. Fat binaries are usually employed to combine executable code for several CPU architectures in a single file.

Header. The most important part of an iOS or OS X executable is the header. The header is the first thing that the loader reads when loading an image, so every binary starts with one. A header always begins with a magic number that identifies the file type. Different types of binaries employ different headers: thin binaries use a mach header, while fat binaries use their own fat header, which describes where all the mach headers in the binary are located.

The fat header starts with the 0xcafebabe magic number and contains information about every executable that resides in the binary file: CPU type and subtype, file offset, size, and alignment.
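
The fat header layout described above can be read with a few lines of Python. This is a minimal sketch, not a full parser: it assumes the classic 32-bit fat header with big-endian fields, and the function name parse_fat_header and the small CPU_NAMES table are made up for this illustration.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # magic number at the start of a fat header

# A few CPU type values from <mach/machine.h>; unknown ones are shown in hex.
CPU_NAMES = {
    0x00000007: "x86",
    0x01000007: "x86_64",
    0x0000000C: "arm",
    0x0100000C: "arm64",
}


def parse_fat_header(data):
    """Return one dict per architecture slice described by the fat header."""
    # Fat header fields are stored big-endian, whatever the target CPU is.
    magic, nfat_arch = struct.unpack_from(">II", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a fat binary (magic 0x%08x)" % magic)

    slices = []
    for index in range(nfat_arch):
        # Each 20-byte entry: cputype, cpusubtype, offset, size, align.
        cputype, cpusubtype, offset, size, align = struct.unpack_from(
            ">5I", data, 8 + index * 20)
        slices.append({
            "cpu": CPU_NAMES.get(cputype, hex(cputype)),
            "cpusubtype": cpusubtype,
            "offset": offset,         # where this slice's mach header starts
            "size": size,
            "alignment": 1 << align,  # align is stored as a power of two
        })
    return slices
```

Passing the first few kilobytes of a fat binary to parse_fat_header should be enough, since the fat header sits at the very start of the file.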


Fat Header of 'fat' executable

Mach Header of executable in 'thin' binary

Mach Header of the first executable in 'fat' binary

Each mach header starts with a magic number (0xfeedface for 32-bit executables, 0xfeedfacf for 64-bit ones) and contains general information about the executable, such as the target CPU type and subtype, loading options (flags), and the count and total size of the load commands. Load commands provide the crucial information for loading the image:

  • Sections and segments of the executable, as well as their mapping to virtual memory
  • Paths of linked dynamic libraries
  • Location of the symbol tables
  • Code signature
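
As a rough illustration of how the mach header leads to the load commands, the sketch below reads the header and then walks the commands that follow it. It assumes a little-endian thin binary (the case for x86-64 and ARM targets); parse_mach_header and iter_load_commands are names invented for this example.

```python
import struct

MH_MAGIC = 0xFEEDFACE     # 32-bit mach header
MH_MAGIC_64 = 0xFEEDFACF  # 64-bit mach header


def parse_mach_header(data, base=0):
    """Read the mach header that starts at offset `base`."""
    magic = struct.unpack_from("<I", data, base)[0]
    if magic == MH_MAGIC:
        header_size = 28
    elif magic == MH_MAGIC_64:
        header_size = 32  # the 64-bit header has one extra reserved field
    else:
        raise ValueError("unexpected magic 0x%08x" % magic)

    cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags = (
        struct.unpack_from("<6I", data, base + 4))
    return {
        "magic": magic,
        "cputype": cputype,
        "cpusubtype": cpusubtype,
        "filetype": filetype,
        "ncmds": ncmds,            # number of load commands
        "sizeofcmds": sizeofcmds,  # total size of all load commands
        "flags": flags,
        "header_size": header_size,
    }


def iter_load_commands(data, base=0):
    """Yield (cmd, cmdsize, file_position) for every load command."""
    header = parse_mach_header(data, base)
    pos = base + header["header_size"]
    for _ in range(header["ncmds"]):
        # Every load command begins with its type and its total size.
        cmd, cmdsize = struct.unpack_from("<II", data, pos)
        if cmdsize < 8:
            raise ValueError("corrupt load command at offset %d" % pos)
        yield cmd, cmdsize, pos
        pos += cmdsize
```

For a slice inside a fat binary, `base` would be the file offset taken from the fat header.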

Segments. Large parts of the executable that the loader maps to a certain virtual address space are called segments. Segments are divided into sections, each storing a certain type of information.

The 'TEXT' segment

Section 'text' of the 'TEXT' segment

Segments are described in the binary by load commands, which are byte streams. Each command starts with its command type and size, and the size may vary from one command to another.

Load Commands. Load commands describe each dynamic library dependency and include the path to the corresponding binary file. Moreover, load commands also give the locations of the import and stub tables, the symbol tables, and the table that contains information for the dynamic loader.
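
To make the segment and section layout more concrete, here is a hedged sketch that decodes one 64-bit segment load command together with the section records that follow it. It assumes a little-endian 64-bit binary; the struct layouts mirror the segment_command_64 and section_64 definitions in <mach-o/loader.h>, while parse_segment_64 is a name chosen for this example.

```python
import struct

LC_SEGMENT_64 = 0x19

# cmd, cmdsize, segname[16], vmaddr, vmsize, fileoff, filesize,
# maxprot, initprot, nsects, flags  (72 bytes)
SEGMENT_64 = struct.Struct("<II16sQQQQiiII")
# sectname[16], segname[16], addr, size, offset, align, reloff, nreloc,
# flags, reserved1, reserved2, reserved3  (80 bytes)
SECTION_64 = struct.Struct("<16s16sQQ8I")


def _name(raw):
    """Names are fixed 16-byte fields padded with NUL bytes."""
    return raw.split(b"\0", 1)[0].decode("ascii", "replace")


def parse_segment_64(data, pos):
    """Decode the LC_SEGMENT_64 command at `pos` and its sections."""
    (cmd, cmdsize, segname, vmaddr, vmsize, fileoff, filesize,
     maxprot, initprot, nsects, flags) = SEGMENT_64.unpack_from(data, pos)
    if cmd != LC_SEGMENT_64:
        raise ValueError("not an LC_SEGMENT_64 command")

    sections = []
    sect_pos = pos + SEGMENT_64.size
    for _ in range(nsects):
        fields = SECTION_64.unpack_from(data, sect_pos)
        sections.append({
            "name": _name(fields[0]),
            "segment": _name(fields[1]),
            "addr": fields[2],    # virtual address of the section
            "size": fields[3],
            "offset": fields[4],  # file offset of the section contents
        })
        sect_pos += SECTION_64.size

    return {
        "name": _name(segname),
        "vmaddr": vmaddr,
        "vmsize": vmsize,
        "fileoff": fileoff,
        "filesize": filesize,
        "sections": sections,
    }
```

The executable code itself is then the `size` bytes found at `offset` for the 'text' section of the 'TEXT' segment.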

Load command for loading CoreFoundation binary
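
A dynamic library dependency such as the one above can be decoded along these lines. This is only a sketch under the assumption of a little-endian binary; the field order follows the dylib_command structure from <mach-o/loader.h>, and parse_load_dylib and format_version are hypothetical helper names.

```python
import struct

LC_LOAD_DYLIB = 0x0C

# cmd, cmdsize, name offset, timestamp, current version, compatibility version
DYLIB_COMMAND = struct.Struct("<6I")


def format_version(value):
    """Versions are packed as 16 bits major, 8 bits minor, 8 bits patch."""
    return "%d.%d.%d" % (value >> 16, (value >> 8) & 0xFF, value & 0xFF)


def parse_load_dylib(data, pos):
    """Decode the LC_LOAD_DYLIB command that starts at `pos`."""
    cmd, cmdsize, name_offset, timestamp, current, compat = (
        DYLIB_COMMAND.unpack_from(data, pos))
    if cmd != LC_LOAD_DYLIB:
        raise ValueError("not an LC_LOAD_DYLIB command")

    # The path is a NUL-terminated string stored inside the command itself,
    # at `name_offset` bytes from the start of the command.
    raw = data[pos + name_offset:pos + cmdsize]
    path = raw.split(b"\0", 1)[0].decode("utf-8", "replace")
    return {
        "path": path,
        "current_version": format_version(current),
        "compatibility_version": format_version(compat),
    }
```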

Symbol tables. The main symbol table contains all symbols currently in use, both locally and externally defined, as well as the stubs generated for external calls made through the import table.

Symbol table

The table is divided into parts depending on whether a symbol is local, external, or debug. Each entry describes a certain part of the code by specifying its name offset in the string table, section ordinal, type, and other specific information.

The names of symbols from the main symbol table are contained in a separate string table. A separate dynamic symbol table links each import table entry to the corresponding symbol. There is also a separate table with the data that the dynamic loader needs for each external symbol.

Code signature. Although the code signature format is freely available as open source, it is, as in many other open-source projects, poorly documented. A signature can be viewed and managed with the codesign tool, which lets you work with its different parts. The location of the code signature within the binary file is provided by the corresponding load command.
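
The relationship between symbol table entries and the string table can be sketched as follows. The example assumes a little-endian 64-bit binary, so each entry has the 16-byte nlist_64 layout from <mach-o/nlist.h>; the offsets and counts passed in would normally come from the symbol table load command, and parse_symbols is a name used only here.

```python
import struct

# n_strx (name offset), n_type, n_sect, n_desc, n_value
NLIST_64 = struct.Struct("<IBBHQ")

N_STAB = 0xE0  # any of these bits set: debugging entry
N_EXT = 0x01   # external symbol


def read_string(data, start, limit):
    """Read a NUL-terminated name from the string table."""
    end = data.find(b"\0", start, limit)
    if end == -1:
        end = limit
    return data[start:end].decode("utf-8", "replace")


def parse_symbols(data, symoff, nsyms, stroff, strsize):
    """Decode `nsyms` entries at `symoff`, resolving names via the string table."""
    symbols = []
    for index in range(nsyms):
        n_strx, n_type, n_sect, n_desc, n_value = NLIST_64.unpack_from(
            data, symoff + index * NLIST_64.size)
        if n_type & N_STAB:
            kind = "debug"
        elif n_type & N_EXT:
            kind = "external"
        else:
            kind = "local"
        symbols.append({
            "name": read_string(data, stroff + n_strx, stroff + strsize),
            "kind": kind,
            "section": n_sect,  # section ordinal, 0 if none
            "value": n_value,
        })
    return symbols
```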

Load command for Code Signature

Code signature bytestream

The code signature itself has its own header, which consists of the following 32-bit big-endian fields:

  • Magic number (0xfade0cc0)
  • Code signature blob size
  • Slots count

The header is followed by the signature slot index: one pair of 32-bit values per slot, where the first value is the slot type and the second is the slot offset. The offset points to the beginning of a particular slot, counted from the start of the code signature.

Each slot starts with two 32-bit fields:

  • Magic number (it depends on the slot type; for example, 0xfade0c02 for the code directory)
  • Slot size
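
Walking the code signature header and its slot index could look like the sketch below. It assumes that every field is a 32-bit big-endian integer and that slot offsets are relative to the start of the signature, as in Apple's open-source definitions; the SLOT_NAMES table covers only a few slot types, and parse_code_signature is an illustrative name.

```python
import struct

CSMAGIC_EMBEDDED_SIGNATURE = 0xFADE0CC0

# A few slot types used in the index; others are reported in hex.
SLOT_NAMES = {
    0x00000: "code directory",
    0x00002: "requirements",
    0x00005: "entitlements",
    0x10000: "signature",
}


def parse_code_signature(data, base=0):
    """List the slots of the code signature that starts at offset `base`."""
    magic, length, count = struct.unpack_from(">III", data, base)
    if magic != CSMAGIC_EMBEDDED_SIGNATURE:
        raise ValueError("unexpected signature magic 0x%08x" % magic)

    slots = []
    for index in range(count):
        # Index entry: slot type and offset from the start of the signature.
        slot_type, offset = struct.unpack_from(">II", data, base + 12 + index * 8)
        # Every slot begins with its own magic number and size.
        slot_magic, slot_size = struct.unpack_from(">II", data, base + offset)
        slots.append({
            "type": SLOT_NAMES.get(slot_type, hex(slot_type)),
            "offset": offset,
            "magic": slot_magic,
            "size": slot_size,
        })
    return slots
```

Here `base` would be the file offset given by the code signature load command.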

Code signature slots are intended to contain a number of important things such as:

  • Code directory
  • Signature requirements
  • Sealed resource description
  • Entitlements
  • Code signature

The code directory contains the table of hashes, as well as the hash algorithm, code page size, and table size. The table is divided into a positive part, which holds the hashes of the pages of executable code, and a negative part, which optionally holds hashes of other code signature parts, as well as a hash of Info.plist.

Entitlements, resources, and code signing requirements are simple byte streams of the corresponding files situated inside the bundle.

The code signature slot contains a cryptographic signature of the code directory in CMS format.
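
The idea behind the positive part of the hash table can be illustrated with a short sketch: split the signed code into pages and hash each page. The page size of 4096 bytes and the SHA-256 algorithm are assumptions made for this example (the real values are stored in the code directory), and page_hashes and find_modified_pages are hypothetical helpers, not part of any Apple tool.

```python
import hashlib


def page_hashes(code, page_size=4096, algorithm="sha256"):
    """Hash `code` page by page; the last page may be shorter than the rest."""
    return [
        hashlib.new(algorithm, code[start:start + page_size]).digest()
        for start in range(0, len(code), page_size)
    ]


def find_modified_pages(code, expected, page_size=4096, algorithm="sha256"):
    """Return the indexes of pages whose hash differs from the expected table."""
    actual = page_hashes(code, page_size, algorithm)
    return [
        index
        for index, (have, want) in enumerate(zip(actual, expected))
        if have != want
    ]
```

Comparing freshly computed page hashes with the table taken from the code directory shows which pages of a binary were changed after signing.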

What else you need to know

Architecture. Desktop Macs have traditionally used x86-64 CPUs (newer models use ARM64), while mobile devices usually use some variation of the ARMv7 or ARMv8 architecture. Knowing the instruction set of the target CPU architecture is very important for successfully reverse engineering algorithms. It is also very beneficial to know the calling conventions and various ARM specifics, such as Thumb mode or opcode formats.

Caches. On modern systems, all system frameworks and dylibs are merged into a single file called the shared cache. It is located at /System/Library/Caches/com.apple.dyld/.

Reverse engineering tools

The Mac provides some tools for iOS and OS X reverse engineering out of the box. The tools in question include:

  • lldb – a powerful debugger
  • otool – a console tool that can be used to view the content of Mach-O executables
  • nm – lets you view the names and symbols contained inside Mach-O executables
  • codesign – provides detailed information about code signatures
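
For a first look at a binary, the console tools above can be driven from a small script. The sketch below only builds and runs a handful of common invocations; the selection of flags is a suggestion rather than a complete list, the helper names are invented, and tools that are not installed are skipped.

```python
import shutil
import subprocess


def inspection_commands(path):
    """Map a short description to the command line that produces it."""
    return {
        "mach header": ["otool", "-h", path],
        "load commands": ["otool", "-l", path],
        "linked libraries": ["otool", "-L", path],
        "symbols": ["nm", path],
        "code signature": ["codesign", "-dvvv", path],
    }


def run_available(path):
    """Run every command whose tool is installed and collect its output."""
    results = {}
    for label, argv in inspection_commands(path).items():
        if shutil.which(argv[0]) is None:
            continue  # tool not present on this machine
        proc = subprocess.run(argv, capture_output=True, text=True)
        # codesign prints its report on stderr, so keep both streams.
        results[label] = proc.stdout + proc.stderr
    return results
```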

Additionally, many third-party tools and utilities are available that can help you with the OS X and iOS reverse engineering process. Examples of such tools include:

  • Interactive DisAssembler (IDA) – one of the most useful and important tools for conducting complex and detailed research on executables.
  • MachOView – in terms of functionality, this freeware tool is similar to otool and nm (in that it lets you view the structure of Mach-O files), but it presents information in a much easier and more intuitive way thanks to its GUI. The main drawback of MachOView is that it is quite unstable.
  • Class-dump – a tool that extracts Objective-C class declarations from an executable and dumps them into normal header files.
  • Hopper – a great interactive shareware tool for iOS and OS X software reverse engineering.

Conclusion

Learning how to reverse engineer OS X software or iOS apps can be quite a challenge. It requires both advanced knowledge and programming experience to understand the structure of the software and the intent of the person who wrote it. However, by sticking with it, anybody can learn OS X and iOS software reverse engineering, and the skills gained in the process will greatly benefit you when it comes to improving your own software.

Dennis Turpitka

About Dennis Turpitka

Dennis Turpitka, CEO of Apriorit, is an expert in the business design and development of digital security solutions, virtualization and cloud computing R&D projects, and the establishment and management of software research. He is a successful entrepreneur who has organized several security start-ups.
