diff --git a/Ghidra/Features/Decompiler/certification.manifest b/Ghidra/Features/Decompiler/certification.manifest index 5e64a5bf6d..ccf3b465a8 100644 --- a/Ghidra/Features/Decompiler/certification.manifest +++ b/Ghidra/Features/Decompiler/certification.manifest @@ -1,6 +1,7 @@ ##VERSION: 2.0 ##MODULE IP: Crystal Clear Icons - LGPL 2.1 ##MODULE IP: FAMFAMFAM Icons - CC 2.5 +##MODULE IP: Modified Nuvola Icons - LGPL 2.1 ##MODULE IP: Oxygen Icons - LGPL 3.0 ##MODULE IP: Tango Icons - Public Domain Module.manifest||GHIDRA||||END| @@ -75,6 +76,8 @@ src/main/help/help/topics/DecompilePlugin/images/ForwardSlice.png||GHIDRA||||END src/main/help/help/topics/DecompilePlugin/images/Undefined.png||GHIDRA||||END| src/main/help/help/topics/DecompilePlugin/images/camera-photo.png||Tango Icons - Public Domain|||Tango|END| src/main/help/help/topics/DecompilePlugin/images/decompileFunction.gif||GHIDRA||reviewed||END| +src/main/help/help/topics/DecompilePlugin/images/document-properties.png||Tango Icons - Public Domain|||tango|END| +src/main/help/help/topics/DecompilePlugin/images/openFolder.png||Modified Nuvola Icons - LGPL 2.1||||END| src/main/help/help/topics/DecompilePlugin/images/page_edit.png||FAMFAMFAM Icons - CC 2.5||||END| src/main/help/help/topics/DecompilePlugin/images/page_white_copy.png||FAMFAMFAM Icons - CC 2.5||||END| src/main/help/help/topics/DecompilePlugin/images/reload3.png||Crystal Clear Icons - LGPL 2.1||||END| diff --git a/Ghidra/Features/Decompiler/src/main/doc/decompileplugin.xml b/Ghidra/Features/Decompiler/src/main/doc/decompileplugin.xml index 8eb4b3efb0..4b8fc27786 100644 --- a/Ghidra/Features/Decompiler/src/main/doc/decompileplugin.xml +++ b/Ghidra/Features/Decompiler/src/main/doc/decompileplugin.xml @@ -1,4 +1,8 @@ + + +]> @@ -32,13 +36,13 @@ The Decompiler is a full Plug-in within Ghidra and can be configured to be enabled or disabled within any particular tool. 
Default configurations will have the - plug-in enabled, but if its disabled for some reason, it can be enabled from within - a Code Browser by selecting the menu option + plug-in enabled, but if it is disabled for some reason, it can be enabled from within + a Code Browser by selecting the - File -> Configure + File -> Configure... - Then click on the Configure link under the - Ghidra Core section and check the box next to + menu option, then clicking on the Configure link under the + Ghidra Core section and checking the box next to DecompilePlugin. @@ -57,7 +61,7 @@ -  icon +  icon in the tool bar, or @@ -69,8 +73,8 @@ The window automatically decompiles and displays the function at the - current address. The address is set typically by left-clicking in the Listing window, - or invoking the Goto command (pressing the 'g' key) and manually entering + current address. The address is set typically by left-clicking in the Listing, + or invoking the Go To... command (pressing the 'G' key) and manually entering the address or some other label, but the Decompiler window follows any type of navigation in the Code Browser, triggering decompilation of the new function being displayed. @@ -92,7 +96,7 @@ Capabilities - Some of the primary capabilities of the decompiler include: + Some of the primary capabilities of the Decompiler include: @@ -101,7 +105,7 @@ Recovering Expressions - The decompiler does full data-flow analysis which allows it to + The Decompiler does full data-flow analysis which allows it to perform slicing on functions: complicated expressions, which have been split into distinct operations/instructions and then mixed together with other instructions by the compiling/optimizing process, are @@ -113,7 +117,7 @@ Recovering High-Level Scoped Variables - The decompiler understands how compilers + The Decompiler understands how compilers use processor stacks and registers to implement variables with different scopes within a function. 
Data-flow analysis allows it to follow what was originally a single variable as it moves from @@ -128,7 +132,7 @@ Recovering Function Parameters - The decompiler understands the parameter passing conventions of + The Decompiler understands the parameter-passing conventions of the compiler and can reconstruct the original form of function calls. @@ -138,7 +142,7 @@ Using Data-type, Name, and Signature Annotations - The decompiler automatically pulls in + The Decompiler automatically pulls in all the different data types and variable names that the user has applied to functions, and the C output is altered to reflect this. High-level variables are appropriately named, structure @@ -152,7 +156,7 @@ Propagating Local Data-types - The decompiler infers the data-type of unlabeled variables + The Decompiler infers the data-type of unlabeled variables by propagating information from other sources throughout a function. @@ -161,7 +165,7 @@ Recovering Structure Definitions - The decompiler can be used to create structures that match the usage + The Decompiler can be used to create structures that match the usage pattern of particular functions and variables, automatically discovering component offsets and data-types. @@ -181,10 +185,10 @@ P-code P-code is Ghidra's Intermediate Representation (IR) language. When analyzing a function, - the decompiler translates every machine instruction into p-code first and performs its - analysis directly on the operators and variables of the language. Output of the decompiler + the Decompiler translates every machine instruction into p-code first and performs its + analysis directly on the operators and variables of the language. Output of the Decompiler is also best understood in terms of p-code. This section presents the key concepts of - p-code. For a more detailed discussion see the document "P-Code Reference Manual". + p-code. For a more detailed discussion see the document "P-Code Reference Manual." 
Address Space @@ -210,7 +214,7 @@ For a p-code model of a specific processor, all elements of the processor state (including RAM, registers, flags, etc.) must be contained in some address space. The model will define multiple address spaces - to accomplish this, and beyond the raw translation of machine instructions to p-code, the decompiler + to accomplish this, and beyond the raw translation of machine instructions to p-code, the Decompiler can add additional spaces. Address space definitions that are common across many different processors include: @@ -240,7 +244,7 @@ A space dedicated to temporary registers. - It is used to hold intermediate values when modeling instruction behavior, and the decompiler + It is used to hold intermediate values when modeling instruction behavior, and the Decompiler uses it to allocate space for variables that don't directly correspond to the low level processor state. The name unique is reserved for this purpose and is present in all processor models. @@ -252,7 +256,7 @@ A space that represents bytes explicitly indexed through a stack pointer. - This is an example of an address space added by the decompiler beyond what the raw processor + This is an example of an address space added by the Decompiler beyond what the raw processor model defines. The stack space is a logical construction representing the set of bytes a single function might access through its stack pointer. Each stack address represents the offset of a byte in some underlying space (usually ram) relative @@ -296,7 +300,7 @@ Varnodes by themselves do not necessarily have a data-type associated with them. 
- The decompiler ultimately assigns a formal data-type, but at the lowest level of p-code, + The Decompiler ultimately assigns a formal data-type, but at the lowest level of p-code, varnodes inherit one the building block data-types from the p-code operations that act on them: @@ -411,10 +415,10 @@ Operator Tokens Most opcodes naturally correspond to a particular C operator token, - and in decompiler output, many of the operator tokens displayed correspond - directly to a p-code operation present in the decompiler's internal + and in Decompiler output, many of the operator tokens displayed correspond + directly to a p-code operation present in the Decompiler's internal representation. The biggest exception are the Branching - operations; the decompiler uses standard high-level language control-flow + operations; the Decompiler uses standard high-level language control-flow structures, like if/else, switch, and do/while blocks, instead of the low-level branching operations. But even here, there is some correspondence @@ -796,7 +800,7 @@ P-code Control Flow - P-code has natural control-flow, with the subtlety that flow + P-code has natural control flow, with the subtlety that flow happens both within and across machine instructions. Most p-code operators have fall-through semantics, meaning that flow moves to the next operator in the sequence associated with the instruction, or, if the operator is the @@ -807,7 +811,7 @@ Ghidra labels a machine instruction with one of the following Flow Types that describe - its overall control-flow. The Flow Type is derived directly from the control-flow of the p-code for the instruction, + its overall control flow. The Flow Type is derived directly from the control flow of the p-code for the instruction, with the basic types corresponding directly with a specific branching p-code operator. @@ -846,8 +850,8 @@ not specified. - The decompiler treats a CALLOTHER operation as a black box. 
It will keep track of data - flowing into and out of the operation but won't simplify or transform it. In decompiler + The Decompiler treats a CALLOTHER operation as a black box. It will keep track of data + flowing into and out of the operation but won't simplify or transform it. In Decompiler output, a CALLOTHER is usually displayed using its unique name, with functional syntax showing its inputs and output. @@ -857,17 +861,17 @@ Callother-Fixup, which is substituted for the CALLOTHER operation during decompilation, or by other Analyzers that use p-code. Callother-Fixups are applied by Ghidra for specific processor or compiler variants, - and a user can choose to apply them to an individual Program. (See ) + and a user can choose to apply them to an individual Program (see ). Internal Decompiler Functions - Certain p-code operations can show up in decompiler output that cannot be represented + Certain p-code operations can show up in Decompiler output that cannot be represented as either an operator token, a cast operation, or other depiction that is natural to - the language. The decompiler generally tries to eliminate these, but this isn't always - possible. The decompiler resorts to a functional syntax for these kinds + the language. The Decompiler generally tries to eliminate these, but this isn't always + possible. The Decompiler resorts to a functional syntax for these kinds of p-code operations, displaying them as if they were built-in functions for the language. @@ -1002,7 +1006,7 @@ The HighFunction A HighFunction is the collection of specific information - produced by the decompiler about a function, referring to the root class in the Ghidra + produced by the Decompiler about a function, referring to the root class in the Ghidra source which holds this information. 
The HighFunction is made up of the following explicit objects: @@ -1022,18 +1026,18 @@ - The decompiler's output provides a standalone view of the function which is distinct + The Decompiler's output provides a standalone view of the function which is distinct from any annotations about the function that are present in the Program database - and displayed in the Listing view (although the output may be informed by these annotations). + and displayed in the Listing (although the output may be informed by these annotations). The terms HighFunction, HighVariable, and - HighSymbol refer to this decompiler specific view of the function. + HighSymbol refer to this Decompiler specific view of the function. HighSymbol A HighSymbol is one of the explicit symbols recovered by the - decompiler. It is made up of a name and data-type and can describe either: + Decompiler. It is made up of a name and data-type and can describe either: @@ -1051,10 +1055,10 @@ An important aspect of HighSymbols is that they are distinct from the standard Ghidra symbols stored in the Program database and are part of - the decompiler's separate view of the function. When the decompiler displays + the Decompiler's separate view of the function. When the Decompiler displays declarations for symbols in its output for instance, it is displaying HighSymbols, which may not directly match up with database symbols. - The decompiler is generally + The Decompiler is generally informed by annotations in the database and may copy specific symbols from the database into its view, but it is generally free to invent new symbols discovered during its analysis. @@ -1069,12 +1073,12 @@ Varnodes in the Decompiler - Varnodes are the central variable concept for the decompiler. - They form the individual nodes in the decompiler's data-flow representation + Varnodes are the central variable concept for the Decompiler. 
+ They form the individual nodes in the Decompiler's data-flow representation of functions and are used during all stages of analysis. During the initial stages of analysis, varnodes simply represent specific storage locations that are accessed - in sequence by individual p-code operations. The decompiler immediately converts - the p-code into a graph based data-flow representation, called Static Single + in sequence by individual p-code operations. The Decompiler immediately converts + the p-code into a graph-based data-flow representation, called Static Single Assignment (SSA) form. In this form, the varnodes take on some additional attributes. @@ -1098,15 +1102,15 @@ - The scope extends via control-flow to each p-code operation that reads the + The scope extends via control flow to each p-code operation that reads the specific varnode as an operand. The value of the varnode between the defining p-code operation and the reading operations does not change. The scope of a varnode can be thought of as a set - of addresses within the function's body connected by control-flow. The address of the defining + of addresses within the function's body connected by control flow. The address of the defining p-code operation is referred to as the varnode's first use point or first use offset. - In the decompiler output for a specific high-level language like C or Java, + In the Decompiler output for a specific high-level language like C or Java, a varnode still has a scope and represents a variable in the high-level language only across this connected region of the code. A set of varnodes, with disjoint scopes, provides a complete @@ -1120,7 +1124,7 @@ A HighVariable is a set varnodes that, taken together, represent the storage of an entire variable in the high-level language - being output by the decompiler. Each varnode describes where the variable's + being output by the Decompiler. Each varnode describes where the variable's value is stored across some section of code. 
@@ -1148,7 +1152,7 @@ Merging Merging is the part of the analysis process where - the decompiler decides what varnodes get grouped together to create the final + the Decompiler decides what varnodes get grouped together to create the final HighVariables in the output. Each varnode's scope (see the discussion in ) provides the fundamental restriction on this process. Two varnodes cannot be merged if their scopes intersect. But this leaves a lot of @@ -1161,14 +1165,14 @@ to as forced merging. - The decompiler may also merge varnodes that could just as easily exist as separate + The Decompiler may also merge varnodes that could just as easily exist as separate variables. This is called speculative merging. - In addition to the intersection condition on varnode scopes, the decompiler only - speculatively merges variables that share the same data-type. Beyond this, the decompiler + In addition to the intersection condition on varnode scopes, the Decompiler only + speculatively merges variables that share the same data-type. Beyond this, the Decompiler prioritizes variable pairs that are read and written within the same instruction and - then pairs that are "near" each other in the control-flow of the function. + then pairs that are near each other in the control flow of the function. To a limited extent, users are able to control this kind of merging - (See ). + (see ). @@ -1185,12 +1189,12 @@ a calling convention and holds its specific rules and resource details. - Prototype models are architecture specific, and depending on the compiler, a single Program may make - use of multiple models. Subsequently, each distinct model has a name like __stdcall or - __thiscall. The decompiler makes use of the prototype model, as assigned to the function by the user or + Prototype models are architecture-specific, and depending on the compiler, a single Program may make + use of multiple models. Subsequently, each distinct model has a name like __stdcall or + __thiscall. 
The Decompiler makes use of the prototype model, as assigned to the function by the user or discovered in some other way, when performing its analysis of parameters. - It is possible for users to extend the set of prototype models available to a Program, - see . + It is possible for users to extend the set of prototype models available to a Program + (see ). A prototype model is typically used as a whole and is assigned by name to individual functions. But some of @@ -1209,12 +1213,61 @@ If the parameter is stored on the stack, the storage location is viewed as a constant offset in the stack space, where the offset is relative to the incoming value of stack pointer - (See the discussion in ). + (see the discussion in ). - The return value for the function, similarly, is stored at a single memory location. It - is guaranteed to be at that location only at points where the function is exited. There may be multiple exit - points, but they all share the same return value storage location. + The return value for the function, unless it is passed back on the stack, is also stored at a single + memory location. It is guaranteed to be at that location only at points where the function is exited. There may be multiple exit + points, but they all share the same return value storage location. For return values passed back on the stack, compilers + generally implement a special input register to hold the location where the value will be stored. See the + discussion of and the __return_storage_ptr__ below. + + + + Auto-Parameters + + Compiled binaries may pass values as parameters between functions that aren't in the formal + list of parameters as defined by the original source code for the program. These are referred to + as auto-parameters or sometimes hidden + parameters within the documentation. If the prototype model requires it, Ghidra will automatically + create an auto-parameter for a function to honor a user's request for a specific formal signature. 
+ See Function Editor Dialog. + Because reverse engineers need to see them, the + Decompiler will generally display auto-parameters explicitly in function prototypes as part of its output, even though + they would not be present in the original source. + Ghidra explicitly defines two auto-parameters: + + + + + this + + + Within Object Oriented languages, a function defined as a class method + often has a this parameter pointing to an instantiation of the + class' structure data-type. Within Ghidra, functions with the __thiscall + calling convention are automatically assigned a this parameter. + If the function is part of a class namespace and the class has an associated structure, the + this parameter will be a pointer to the structure, otherwise + it will be a pointer to the void data-type. + + + + + __return_storage_ptr__ + + + Most calling conventions allow the value returned by a function, if it is large enough, to be passed back + on the stack instead of in a register. This is usually implemented by having the calling function + pass an additional input parameter that holds a pointer to the location on + the stack where the return value should be stored. Ghidra labels this special parameter as + __return_storage_ptr__, which will be a pointer to the + data-type of the return value. + + + + + @@ -1226,7 +1279,7 @@ These encompass a calling convention's saved registers, where a calling function can store values it doesn't want to change unexpectedly, but also may include other registers that are known not to change, like the stack pointer. - The decompiler uses the information to determine which locations can be safely propagated across + The Decompiler uses the information to determine which locations can be safely propagated across a called function. @@ -1252,15 +1305,15 @@ Disassemble machine instructions from the underlying bytes and - Produce the raw p-code consumed by the decompiler and other analyzers. 
+ Produce the raw p-code consumed by the Decompiler and other analyzers. Specification files are selected based on the Language Id - assigned to the Program at the time it is imported into Ghidra. - (See Import Program) + assigned to the Program at the time it is imported into Ghidra + (see Import Program). x86:LE:32:default:windows @@ -1275,15 +1328,16 @@ Processor family Endianess Size of the address bus - Process variant + Processor variant Compiler producing the Program - A field with the value 'default' indicates either the preferred processor variant or the preferred compiler. + A field with the value default indicates either the preferred processor variant or the preferred compiler. Within the Ghidra installation, specification files are stored based on the overarching - processor family, such as 'MIPS' or 'x86'. For a specific family, files are located under + processor family, such as MIPS or + x86. For a specific family, files are located under <Root>/Ghidra/Processors/<Family>/data/languages @@ -1303,7 +1357,7 @@ These are the human readable SLEIGH language files. A single specification is rooted in one of the *.slaspec files, which may recursively include one or more *.sinc files. The format of these files is described - in the document "SLEIGH: A Language for Rapid Processor Specification". + in the document "SLEIGH: A Language for Rapid Processor Specification." @@ -1383,54 +1437,55 @@ Individual machine instructions make up the biggest source of information when the - decompiler analyzes a function. Instructions are translated from their - processor specific form into Ghidra's IR language (see ), + Decompiler analyzes a function. Instructions are translated from their + processor-specific form into Ghidra's IR language (see ), which provides both the control-flow behavior of the instruction and the detailed - semantics describing how the processor and memory state is affected. 
The translation is controlled by + semantics describing how the processor and memory state are affected. The translation is controlled by the underlying processor model and, except in limited circumstances, cannot be directly altered - from the tool. Flow Overrides (see below) can change how certain control-flow is translated, - and, depending on the processor, context registers may affect p-code (see ). + from the tool. Flow Overrides (see below) can change how certain control flow is translated + and, depending on the processor, how context registers affect p-code (see ). Outside of the tool, users can modify the model specification itself. - See the document "SLEIGH: A Language for Rapid Processor Specification". + See the document "SLEIGH: A Language for Rapid Processor Specification." - Decompiling a function starts by analyzing control-flow starting from the function's - first instruction. Control-flow is traced to additional instructions using flow information - from the underlying processor model. All paths are traced through instructions with - fall through, conditional jump, and other + Decompiling a function starts by analyzing the control flow of machine instructions. + Control flow is traced from the first instruction, through additional instructions depending + on their flow semantics (see ). All paths are traced through instructions with + any form of fall-through or jump semantics until an instruction with terminator semantics is - reached, which is usually a "return from subroutine" - instruction. Flow is not traced into called functions, in this situation. Instructions + reached, which is usually a formal return (return from subroutine) instruction. + Flow is not traced into called functions, in this situation. Instructions with call semantics are treated only as if they fall through. - An entry point is the address of the function's first instruction. + An entry point is the address of the instruction first + executed when the function is called. 
A function body is the set of addresses reached by control-flow - analysis (and the machine instructions at those addresses). + analysis and the machine instructions at those addresses. Entry Point The entry point address for a function plays a pivotal role for - analysis using the Ghidra decompiler. Ghidra generally associates + analysis using the Decompiler. Ghidra generally associates a formal Function Symbol and an underlying Function object at this address, which are the key elements that - need to be present to trigger decompilation. - (See Functions) + need to be present to trigger decompilation + (see Functions). The Function object stores the function body, parameters, local variables, and other information critical to the decompilation process. Function Symbols and Function objects are generally created automatically by a Ghidra - analyzer when initially importing a binary executable and running auto-analysis. - If necessary however, a user can manually create a Function object from the Listing window - by using Create Function command (pressing the 'f' key), when the cursor - is placed on the function's entry point. - (See Create Function) + analyzer when initially importing a binary executable and running Auto Analysis. + If necessary, however, a user can manually create a Function object from a Listing window + by using the Create Function command (pressing the 'F' key), when the cursor + is placed on the function's entry point + (see Create Function). @@ -1444,11 +1499,11 @@ to a navigation event to an arbitrary address. - The decompiler does not use the formal function body when it computes - control-flow; it recomputes its own idea of the function body starting from the entry point + The Decompiler does not use the formal function body when it computes + control flow; it recomputes its own idea of the function body starting from the entry point it is handed. 
If the formal function body was created manually, using a selection for instance, - or in other extreme circumstances, the decompiler's view of the function body may not match - the formal view. This can lead to confusing behavior, where clicking in a decompiler window + or in other extreme circumstances, the Decompiler's view of the function body may not match + the formal view. This can lead to confusing behavior, where clicking in a Decompiler window may unexpectedly navigate the window away from the function. @@ -1464,7 +1519,7 @@ Flow Overrides are applied by Analyzers or manually by the user. - The decompiler automatically incorporates any relevant Flow Overrides into its + The Decompiler automatically incorporates any relevant Flow Overrides into its analysis of a function. This can have a significant impact on results. The types of possible Flow Overrides include: @@ -1480,7 +1535,7 @@ the call target becomes the branch destination, and the instruction is no longer assumed to fall through. RETURN instructions become an - indirect branch, and the decompiler will attempt to recover branch + indirect branch, and the Decompiler will attempt to recover branch destinations using switch analysis. @@ -1535,20 +1590,20 @@
Comments - The decompiler automatically incorporates comments from the Program database into its + The Decompiler automatically incorporates comments from the Program database into its output. Comments in Ghidra are centralized and can be created and displayed by multiple - Program views, including the decompiler. Comments created from a decompiler window will - show up in the Listing window for instance, and vice versa. + Program views, including the Decompiler. Comments created from a Decompiler window will + show up in a Listing window for instance, and vice versa. - For the purposes of understanding comments within the decompiler, keep in mind that: + For the purposes of understanding comments within the Decompiler, keep in mind that: An individual comment is associated with a specific address in the Program. - There are 5 different kinds of comments. + There are 5 different types of comments: Plate @@ -1578,7 +1633,7 @@ Display - The decompiler collects and displays comments associated with any address in the + The Decompiler collects and displays comments associated with any address in the formal function body currently decompiling. The comments are integrated line by line into the decompiled code, and an individual comment is displayed on the line before the @@ -1588,27 +1643,27 @@ Because a single line of code typically encompasses multiple machine instructions, there is a possibility that multiple comments at different addresses apply to - the same line. In this case, the decompiler displays each comment on its + the same line. In this case, the Decompiler displays each comment on its own line, in address order, directly before the line of code. 
- Because the output of the decompiler can be a heavily transformed version compared - to the original machine instructions, its possible that individual instructions + Because the output of the Decompiler can be a heavily transformed version compared + to the original machine instructions, it is possible that individual instructions no longer have explicit tokens representing them in the output. Comments attached - to these instruction will still be displayed in the decompiler output with the + to these instruction will still be displayed in the Decompiler output with the closest associated line of code, usually within the same basic block. - By default, the decompiler displays only the Pre comments + By default, the Decompiler displays only the Pre comments within the body of the function. It also displays Plate comments, but only if they are attached to the entry point - of the function. In this case, they are displayed first in the decompiler output, + of the function. In this case, they are displayed first in the Decompiler output, along with WARNING comments, before the function declaration. Other comment - types can be configured to display in decompiler output, by changing the - decompiler Display options (See ). + types can be configured to be part of Decompiler output by changing the + Decompiler display options (see ). - Unlike the Listing window, the decompiler does not alter how a comment is + Unlike a Listing window, the Decompiler does not alter how a comment is displayed based on its type. All enabled types of comment are displayed in the same way, on a separate line before the line of code associated with the address. @@ -1618,7 +1673,7 @@ Unreachable Blocks - The decompiler may decide as part of its analysis that individual + The Decompiler may decide as part of its analysis that individual basic blocks are unreachable and not display them in the output. 
In this case, any comments associated with addresses in the unreachable block will also not be displayed. @@ -1628,9 +1683,9 @@ Warning Comments - The decompiler can generate internal warnings during its analysis and will incorporate - them into the output as comments in the same way as the user defined - comments described above. They are not part of Ghidra's comment system however and + The Decompiler can generate internal warnings during its analysis and will incorporate + them into the output as comments in the same way as the user-defined + comments described above. They are not part of Ghidra's comment system, however, and cannot be edited. They can be distinguished from normal comments by the word 'WARNING' at the beginning of the comment. @@ -1644,10 +1699,10 @@ Variable Annotations Variable annotations are the most important way to get names and data-types - that are meaningful to the user incorporated into the decompiler's output. + that are meaningful to the user incorporated into the Decompiler's output. A variable in this context is loosely defined as any piece of memory that code in the Program treats as a logical entity. - The decompiler works to incorporate all forms of annotation into its output + The Decompiler works to incorporate all forms of annotation into its output for any variable pertinent to the function being analyzed. @@ -1676,9 +1731,9 @@ local to a function. - Global variables annotations are created from the tool by applying a data-type to a memory - location in the Listing window, either by invoking a command from the Data - pop-up menu, or dragging a data-type from the Data Type Manager + Global variable annotations are created from the tool by applying a data-type to a memory + location in a Listing window, either by invoking a command from the Data + pop-up menu or by dragging a data-type from the Data Type Manager window directly onto the memory location. 
Refer to the documentation: @@ -1689,7 +1744,7 @@ - Local variables annotations are created from the Listing from various editor dialogs. See in particular: + Local variable annotations are created from the Listing using various editor dialogs. See, in particular: @@ -1734,7 +1789,7 @@ In order to widely accommodate different use cases, Ghidra's symbol table has extremely lax naming rules. Ghidra may allow names that conflict with the stricter rules of the language - the decompiler is attempting to produce. The decompiler does not currently + the Decompiler is attempting to produce. The Decompiler does not currently have an option that checks for this. Users should be aware of: @@ -1755,7 +1810,7 @@ Ghidra allows different functions to have the same name, even within the same namespace, in order to model languages that support function overloading. In most languages, such functions would be expected to have distinct prototypes to allow - the symbols to be distinguished in context. Ghidra and the decompiler however do not check + the symbols to be distinguished in context. Ghidra and the Decompiler, however, do not check for this, as prototypes may not be known. @@ -1767,10 +1822,10 @@ Variable Scope - All variables belong either to a global or local - scope, which directly affects how the variable is treated in the decompiler's data-flow + All variables belong to either a global or local + scope, which directly affects how the variable is treated in the Decompiler's data-flow analysis. - Annotations created by applying a data-type directly to a memory location in the listing + Annotations created by applying a data-type directly to a memory location in the Listing are automatically added to the formal global namespace. 
Ghidra can create other custom namespaces that are considered global in this sense, and renaming actions provide options that let individual global annotations be moved into @@ -1779,7 +1834,7 @@ create variable annotations that are local to that function. - A global variable annotation forces the decompiler to treat the memory location as if its value + A global variable annotation forces the Decompiler to treat the memory location as if its value persists beyond the end of the function. The variable must exist at all points of the function body, generally at the same memory location. @@ -1789,7 +1844,7 @@ at the instruction that first writes to them, and then exist only up to the last instruction that reads them. The memory location storing a local variable at one point of the function may be reused for different variables at other points. - This can cause ambiguity in how the decompiler should treat a given memory location used + This can cause ambiguity in how the Decompiler should treat a given memory location used for storing local variables, which the user may want to steer. See the discussion in . @@ -1806,17 +1861,17 @@ data-types that are tailored for the Program being analyzed. Data-types that are explicitly part of a variable annotation are, to the extent possible, automatically incorporated - into the decompiler's analysis. + into the Decompiler's analysis. Data-types Supported by the Decompiler - The decompiler understands traditional primitive data-types, in all their various sizes, + The Decompiler understands traditional primitive data-types in all their various sizes, like integers, floating-point numbers, booleans, and characters. It also understands pointers, structures, and arrays, letting it support arbitrarily complicated composite data-types. Ghidra provides some data-types with specialized display capabilities that don't have a natural representation - in the high-level language output by the decompiler. 
The decompiler treats these as + in the high-level language output by the Decompiler. The Decompiler treats these as black-box data-types, preserving the name, but treating the underlying data either as an integer or simply as an array of bytes. @@ -1824,26 +1879,26 @@ Undefined - The undefined data-types are supported, in their various sizes: + The undefined data-types are supported in their various sizes: undefined1, undefined2, undefined4, etc. In Ghidra, the undefined - data-types, let the user specify the size of a variable, while formally declaring that + data-types let the user specify the size of a variable, while formally declaring that other details about the data-type are unknown. - For the decompiler, undefined data-types as an annotation have the important special meaning - that the decompiler should let its analysis determine the final data-type presented in the - output for the variable (See below). + For the Decompiler, undefined data-types, as an annotation, have the important special meaning + that the Decompiler should let its analysis determine the final data-type presented in the + output for the variable (see below). Void The void data-type is supported but treated specially by - the decompiler, as does Ghidra in general. A void can be + the Decompiler, as does Ghidra in general. A void can be used to indicate the absence of a return value in function prototypes, but cannot be used as a general annotation on variables. A void pointer, void *, - is possible; the decompiler treats it as a pointer to an unknown data-type. + is possible; the Decompiler treats it as a pointer to an unknown data-type. @@ -1851,7 +1906,7 @@ Integer data-types, both signed and unsigned, are supported up to a size of 8 bytes. Larger sizes are supported internally but are generally represented as an array of bytes in - decompiler output. Odd integer sizes are also supported. + Decompiler output. Nonstandard integer sizes of 3, 5, 6, and 7 bytes are also supported. 
The standard C data-type names: int, short, @@ -1871,15 +1926,15 @@ Floating-point sizes of 4, 8, 10, and 16 are supported, mapping in all cases currently to the float, double, float10, and float16 - data-types respectively. The decompiler currently cannot display floating-point constants + data-types, respectively. The Decompiler currently cannot display floating-point constants that are bigger than 8 bytes. Character - ASCII or Unicode encoded character data-types are supported for sizes of 1, 2, and 4. The size effectively - chooses between the UTF8, UTF16, and UTF32 character encodings respectively. The standard + ASCII- and Unicode-encoded character data-types are supported for sizes of 1, 2, and 4. The size effectively + chooses between the UTF8, UTF16, and UTF32 character encodings, respectively. The standard C data-type names char and wchar_t are mapped to one of these sizes based on the processor and compiler selected when importing the Program. @@ -1888,11 +1943,11 @@ String - Terminated strings, encoded either in ASCII or Unicode, are supported. The decompiler converts + Terminated strings, encoded either in ASCII or Unicode, are supported. The Decompiler converts Ghidra's dedicated string data-types like string to - an "array of characters" data-type, such as char[], + an array-of-characters data-type, such as char[], where the character size matches the encoding. - A "pointer to character" data-type like + A pointer-to-character data-type like @@ -1903,11 +1958,11 @@ - is also treated as a potential string reference. The decompiler can infer terminated strings if this + is also treated as a potential string reference. The Decompiler can infer terminated strings if this kind of data-type propagates to constant values during its analysis. 
- Strings should be fully rendered in decompiler output, + Strings should be fully rendered in Decompiler output, with non-printable characters escaped using either traditional sequences like '\r', '\n' or using Unicode escape sequences like '\xFF'. @@ -1916,22 +1971,22 @@ Pointer Pointer data-types are fully supported. A pointer to any other supported data-type is - possible. The data-type being pointed to, whether its a primitive, structure, or another pointer, - informs how the decompiler renders a dereferenced pointer. - The decompiler assumes that a pointer variable may refer to an array of + possible. The data-type being pointed to, whether it is a primitive, structure, or another pointer, + informs how the Decompiler renders a dereferenced pointer. + The Decompiler assumes that a pointer variable may refer to an array of the underlying data-type and will use array notation if there is evidence of more than one element. The default pointer size is set based on the processor and compiler selected when the Program is - imported and generally matches the size of the ram (or equivalent) - address space. Different pointer sizes within the same Program are possible. The decompiler generally + imported and generally matches the size of the ram or equivalent + address space. Different pointer sizes within the same Program are possible. The Decompiler generally expects the pointer size to match the size of the address space being pointed to, but individual architectures can model different size pointers into the space (such as near pointers). For processors with more than one memory address space, pointer data-types currently cannot be directly - annotated to indicate a preferred address space. Where there is ambiguity, the decompiler attempts to + annotated to indicate a preferred address space. Where there is ambiguity, the Decompiler attempts to determine the correct address space from the context of its use within the function. 
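The escaping rule stated above for strings (traditional sequences such as '\r' and '\n', otherwise escapes such as '\xFF') can be pictured with a small sketch. This is only an illustration, not Ghidra code: the function name `render_string` is hypothetical, the input is assumed to be 1-byte character data, and the choice to pass printable ASCII through unchanged is an assumption rather than something the text specifies.

```python
def render_string(data: bytes) -> str:
    """Toy rendering of 1-byte character data (hypothetical helper, not a Ghidra API)."""
    # Traditional escape sequences named in the text: carriage return and newline.
    traditional = {0x0D: "\\r", 0x0A: "\\n"}
    out = []
    for b in data:
        if b in traditional:
            out.append(traditional[b])
        elif 0x20 <= b < 0x7F:
            # Assumption: printable ASCII is shown as-is.
            out.append(chr(b))
        else:
            # Everything else falls back to a \xNN style escape.
            out.append("\\x%02X" % b)
    return "".join(out)


if __name__ == "__main__":
    print(render_string(b"line one\r\n\xff"))
```

A real renderer would also have to deal with quotes, backslashes, and the 2- and 4-byte encodings; those are left out here to keep the sketch short.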
@@ -1945,21 +2000,21 @@ Structure - Structured data-types are fully supported. The decompiler does not automatically infer structures - when analyzing a function; it propagates structured data-types into the function from explicitly - annotated sources, like input parameters or global variables. Decompiler directed creation of - structures can be triggered by the user, see . + Structure data-types are fully supported. The Decompiler does not automatically infer structures + when analyzing a function; it propagates them into the function from explicitly + annotated sources, like input parameters or global variables. Decompiler-directed creation of + structures can be triggered by the user (see ). Enumeration - Enumerations are fully supported. The decompiler can propagate enumerations from explicitly + Enumerations are fully supported. The Decompiler can propagate enumerations from explicitly annotated sources throughout a function onto constants, which are then displayed with the appropriate label from the definition of the enumeration. If the constant does not match a - single value in the enumeration definition, the decompiler attempts to build a matching + single value in the enumeration definition, the Decompiler attempts to build a matching value by or-ing together multiple labels. - The decompiler can be made to break out constants representing packed flags, + The Decompiler can be made to break out constants representing packed flags, for instance, by labeling individual bit values within an enumeration. @@ -1968,7 +2023,7 @@ A Function Definition in Ghidra is a data-type that encodes information about the parameters and return value for a generic/unspecified function. - A formal function pointer is supported by the decompiler as a pointer + A formal function pointer is supported by the Decompiler as a pointer data-type that points to a Function Definition. 
A Function Definition specifically encodes: @@ -1982,7 +2037,7 @@ The data-type associated with the return value. - An indicator of the prototype model that should be + The name of a generic calling convention associated with the function. @@ -1990,25 +2045,27 @@ The Function Definition itself does not encode any storage information. Once the Function - Definition is associated with a Program, the indicator maps to one of the prototype models for the - specific processor and compiler. A Function Definition is currently limited to a prototype model + Definition is associated with a Program, its generic calling convention maps to one of the + specific prototype models for the processor and compiler. The prototype model is then used + to assign storage for parameters and return values, wherever the Function Definition is applied. + A Function Definition is currently limited to a prototype model with one of the following names: - __stdcall + __stdcall - __thiscall + __thiscall - __fastcall + __fastcall - __cdecl + __cdecl - __vectorcall + __vectorcall @@ -2019,69 +2076,68 @@ Forcing Data-types - The decompiler performs type propagation as part of its analysis - on functions. Data-type information is collected from variable annotations (and other sources), - which is then propagated via data-flow throughout the function to other variables and + The Decompiler performs type propagation as part of its analysis + on functions. Data-type information is collected from variable annotations and other sources, + which is then propagated via data flow throughout the function to other variables and constants where the data-type may not be immediately apparent. - With few exceptions, a variable annotation is forcing on the decompiler in the sense + With few exceptions, a variable annotation is forcing on the Decompiler in the sense that the storage location being annotated is considered an unalterable data-type source. 
During type propagation, the data-type may propagate to other variables, but the variable representing the storage location being annotated is guaranteed to have the given name and that data-type; it will not be overridden. - Users should be aware that variable annotations are forcing on the decompiler and may directly + Users should be aware that variable annotations are forcing on the Decompiler and may directly override aspects of its analysis. Because of this, variable annotations are the most powerful way - for the user to affect decompiler output, but setting an incomplete (or incorrect) data-type as - part of an annotation may produce poorer decompiler output. + for the user to affect Decompiler output, but setting an incomplete or incorrect data-type as + part of an annotation may produce poorer Decompiler output. The major exception to forcing annotations is if the data-type in the annotation is undefined. - Ghidra reserves the following names to represent formally undefined data-types: + Ghidra reserves specific names to represent formally undefined data-types, such as: undefined1 undefined2 undefined4 undefined8 - ... These allow annotations to be made even when the user doesn't have information about a variable's data-type. The number in the name only specifies the number of bytes in the variable. - The decompiler views a variable annotation with an undefined data-type only as an indication of what name + The Decompiler views a variable annotation with an undefined data-type only as an indication of what name should be used if a variable at that storage address exists. The data-type for the variable is filled in, using type propagation from other sources. For annotations that specifically label a function's formal parameters or return value, - the Signature Source also affects how they're treated by the decompiler. + the Signature Source also affects how they're treated by the Decompiler. 
If the Signature Source is set to anything other than DEFAULT, there is a forced - one-to-one correspondence between variable annotations and actual parameters in the decompiler's - view of the function. This is stronger than just forcing the data-type; the existence (or not) of + one-to-one correspondence between variable annotations and actual parameters in the Decompiler's + view of the function. This is stronger than just forcing the data-type; the existence or nonexistence of the variable itself is forced by the annotation in this case. If the Signature Source is forcing and there are no parameter annotations, a void prototype is forced on the function. A forcing Signature Source is set typically if debug symbols for the function are read in during - Program import (IMPORTED), or if the user manually edits the function prototype + Program import (IMPORTED) or if the user manually edits the function prototype directly (USER_DEFINED). If an annotation and the Signature Source force a parameter to exist, specifying an - undefined data-type in the annotation still directs the decompiler to fill in + undefined data-type in the annotation still directs the Decompiler to fill in the variable's data-type using type propagation. The same holds true for the return value; an - undefined annotation fixes the size of the return value, but the decompiler + undefined annotation fixes the size of the return value, but the Decompiler fills in its own data-type. - The decompiler may still use an undefined data-type to label a variable, + The Decompiler may still use an undefined data-type to label a variable, even after type propagation. If a variable is simply copied around within a function and there - are no other substantive operations or annotations on the variable, the decompiler may decide the undefined + are no other substantive operations or annotations on the variable, the Decompiler may decide the undefined data-type is appropriate. 
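The forcing rules described above can be summarized in a toy model. This is a sketch under stated assumptions, not the Decompiler's implementation: `Annotation`, `is_undefined`, and `displayed_type` are hypothetical names, and the "propagated" data-type stands in for whatever type propagation would conclude from other sources.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    """Hypothetical stand-in for a variable annotation."""
    name: str
    data_type: str  # e.g. "int" or "undefined4"


def is_undefined(data_type: str) -> bool:
    # Matches names of the form undefined1, undefined2, undefined4, undefined8.
    prefix = "undefined"
    return data_type.startswith(prefix) and data_type[len(prefix):].isdigit()


def displayed_type(annotation: Annotation, propagated: Optional[str]) -> str:
    """Pick the data-type shown for the annotated storage location."""
    if not is_undefined(annotation.data_type):
        # A defined data-type is forcing: propagation does not override it.
        return annotation.data_type
    if propagated is not None:
        # An undefined annotation supplies only the name and size;
        # the data-type is filled in from other sources.
        return propagated
    # Nothing substantive was learned, so the undefined data-type is kept.
    return annotation.data_type


if __name__ == "__main__":
    print(displayed_type(Annotation("count", "int"), "float"))
    print(displayed_type(Annotation("local_14", "undefined4"), "char *"))
    print(displayed_type(Annotation("local_14", "undefined4"), None))
```

The model deliberately ignores the Signature Source, which additionally forces the existence of parameters and the return value as a unit.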
@@ -2094,10 +2150,10 @@ Every variable annotation is associated with a single storage location, where the value of the variable is stored during execution: generally a register, stack location, or an address in the load image of the Program. The storage location does not necessarily hold the value for that - variable at all points of execution, and its possible for the variable value to be held in + variable at all points of execution, and it is possible for the variable value to be held in different storage locations at different points of execution. The set of execution points where the storage location does hold the variable value is called the annotation - scope; this is distinct from (but influenced by) the scope of the + scope; this is distinct from, but influenced by, the scope of the variable itself. The different types of storage location are listed below. @@ -2106,21 +2162,21 @@ A load-image address is a concrete address in the load image of the Program, typically in the ram address space. This kind of storage must be backed by a formal memory block for the Program, which typically corresponds to a specific - program section (such as the .text or .bss section). Because it is in the - load image directly, an annotation with this storage shows up directly in the Listing + program section, such as the .text or .bss section. Because it is in the + load image directly, an annotation with this storage shows up directly in any Listing window and can be directly manipulated there. In much of the Ghidra documentation, these annotations - are referred to as Data. See the section - Data in particular. + are referred to as Data. See the + Data section, in particular. - Although specific architectures may vary, generally a storage location at a load image address + Although specific architectures may vary, a storage location at a load image address generally represents a formal global variable, and the annotation is in scope - across all Program execution. 
For the decompiler, the storage location is treated as a + across all Program execution. For the Decompiler, the storage location is treated as a a single persistent variable in all functions that reference it. Within a - function, all distinct references to the storage location (varnodes) are merged. The decompiler + function, all distinct references (varnodes) to the storage location are merged. The Decompiler expects a value at the storage location to exist from before the start of the function, and any change to the value must be explicitly represented as an assignment to - the variable in decompiler output. + the variable in Decompiler output. @@ -2135,26 +2191,26 @@ single function and the variable must be local to that function. - Within the Listing window, a stack annotation is displayed as part of the function header - (at the entry point address of the function), with a syntax similar to: + Within a Listing window, a stack annotation is displayed as part of the function header + at the entry point address of the function, with a syntax similar to: undefined4 Stack[-0x14]:4 local_14 The middle field (the Variable Location field) indicates that the storage location is on the - stack, and the value in brackets indicates the offset of the storage location, relative to the incoming + stack, and the value in brackets indicates the offset of the storage location relative to the incoming stack pointer. The value after the colon indicates the number of bytes in the storage location. Currently, the entire body of the function is included - in the scope of any stack annotation, and the decompiler will allow only a single variable to exist + in the scope of any stack annotation, and the Decompiler will allow only a single variable to exist at the stack address. A stack annotation can be a formal parameter to the function, but otherwise the - decompiler does not expect to see a value that exists before the start of the function. 
+ Decompiler does not expect to see a value that exists before the start of the function. - The decompiler will continue to perform copy propagation and other transforms on - stack locations associated with a variable annotation. In particular, within decompiler output, - a specific write operation to a stack address may not show up as an explicit assignment to its variable, - if the value is simply copied to another location. + The Decompiler will continue to perform copy propagation and other transforms on + stack locations associated with a variable annotation. In particular, within Decompiler output, + if the value is simply copied to another location, + a specific write operation to a stack address may not show up as an explicit assignment to its variable. @@ -2162,7 +2218,7 @@ A variable annotation can refer to a specific register for the processor associated with the Program. In general, such an annotation will be for a variable local to a particular function. - Within the Listing window, this annotation is displayed as part of the function header, with + Within a Listing window, this annotation is displayed as part of the function header, with syntax like: int EAX:4 iVar1 @@ -2171,32 +2227,32 @@ the annotation, and the value after the colon indicates the number of bytes in the register. - For a local variable annotations with a register storage location, there is an expectation that the + For local variable annotations with a register storage location, there is an expectation that the register may be reused for different variables at different points of execution within the function. There may be more than one annotation, for different variables, that share the same register storage location. An annotation is associated with a first use point that describes where - the register first holds a value for the particular variable. (See the discussion - ) + the register first holds a value for the particular variable (see the discussion - ). 
The entire scope of the annotation is limited to the address regions between the first use point - and any points where the value is read. The decompiler may extend the scope as part of its + and any points where the value is read. The Decompiler may extend the scope as part of its merging process, but the full extent is not stored in the annotation. - Temporary Registers + Temporary Register Variable annotations can have a temporary register as a storage location. A temporary register is not specific to a processor but is produced at various stages of the decompilation process. See the discussion of the unique space in . These registers do not have a meaningful name, and - the specific storage address may change on successive decompilations. So within the - Listing window, this annotation is displayed as part of the function header, + the specific storage address may change on successive decompilations. So, within a + Listing window, this annotation is displayed as part of the function header with syntax like: int HASH:5f96367122:4 iVar2 The Variable Location field displays the internal hash used to uniquely - identify the temporary register within the data-flow of the function. + identify the temporary register within the data flow of the function. A temporary register annotation must be for a local variable, and as with an ordinary register, @@ -2212,7 +2268,7 @@ Every formal Function in Ghidra is associated with a set of variable annotations and other properties that make up the function prototype. Due to the nature of reverse engineering, - the function prototype may only include partial information and may be built up over time. Individual + the function prototype may include only partial information and may be built up over time. Individual elements include: @@ -2222,9 +2278,10 @@ Each formal input to the function can have a Variable Annotation that describes its name, data-type, - and storage location, at the moment control-flow enters the function. 
If annotations exist, they are shown - in the Listing Window as part of the Function header, and they usually correspond directly with symbols in the - function declaration produced by the decompiler. + and storage location. The storage location applies at the moment control flow enters the function. + If annotations exist, they are shown + in a Listing window as part of the Function header, and they usually correspond directly with symbols in the + function declaration produced by the Decompiler. @@ -2233,10 +2290,26 @@ The value returned by a function can have a special Variable Annotation that describes its data-type - and storage location, at the moment control-flow exits the function. If it exists, the annotation is shown - in the Listing Window as part of the Function header with the name <RETURN>, and it usually + and storage location. The storage location applies at the moment control flow exits the function. If it exists, the annotation is shown + in a Listing window as part of the Function header with the name <RETURN>, and it usually corresponds directly with the return value in the function declaration produced by - the decompiler. + the Decompiler. + + + + + Auto-Parameters + + + Specific prototypes may require auto-parameters like this + or __return_storage_ptr__. These are special input parameters + that compilers may use to implement specific high-level language concepts. See the discussion + in . Within Ghidra, auto-parameters are automatically created by the + Function Editor Dialog + if the desired prototype requires them. + Within a Listing window, auto-parameters look like other parameter annotations, but the storage field shows the + string (auto). Decompiler output will generally display auto-parameters as explicit variables + rather than hiding them. @@ -2244,20 +2317,20 @@ Calling Convention - The calling convention used by the function can be specified as part of the function prototype. 
The convention + The calling convention used by the function is specified as part of the function prototype. The convention is specified by name, referring to the formal that describes how storage locations are selected for individual parameters along with other information about how the compiler treats - the function. Available models are determined by the processor and compiler, but can be extended by the user. - See . + the function. Available models are determined by the processor and compiler, but may be extended by the user + (see ). - In the absence of parameter and return value annotations, the decompiler will use the prototype model as + In the absence of input parameter and return value annotations, the Decompiler will use the prototype model as part of its analysis to discover the input parameters and the return value of the function. - The name "unknown" is reserved to indicate that nothing is known about the calling convention. If - set to "unknown", depending on context, the decompiler may assign the calling convention based on - the Prototype Evaluation option (See ), or it + The name unknown is reserved to indicate that nothing is known about the calling convention. If + set to unknown, depending on context, the Decompiler may assign the calling convention based on + the Prototype Evaluation option (see ), or it may use the default calling convention for the architecture. @@ -2267,10 +2340,10 @@ Functions have a boolean property called variable arguments, which can be turned on - if the function is capable of being passed a variable number of inputs. This property informs the decompiler that + if the function is capable of being passed a variable number of inputs. This property informs the Decompiler that the function may take additional parameters beyond any with an explicit variable annotation. 
This affects decompilation of any function which calls the variable arguments function, allowing - the decompiler to discover unlisted parameters at a given call site. + the Decompiler to discover unlisted parameters at a given call site. @@ -2278,9 +2351,9 @@ No Return - A function can be marked explicitly as not returning, meaning that once - a call is made to the function, execution will never return to the caller. The decompiler uses this to - compute the correct control-flow in any calling functions. + A function can be marked with the no return property, meaning that once + a call is made to the function, execution will never return to the caller. The Decompiler uses this to + compute the correct control flow in any calling functions. @@ -2288,13 +2361,13 @@ In-Line - If the boolean property in-line is turned on for a particular function, - it directs the decompiler to inline the effects of the function into the decompilation of any of its calling functions. - The function will no longer appear as a direct function call in the decompilation, but all of its data-flow + If the in-line property is turned on for a particular function, + it directs the Decompiler to inline the effects of the function into the decompilation of any of its calling functions. + The function will no longer appear as a direct function call in the decompilation, but all of its data flow will be incorporated into the calling function. - This is useful for bookkeeping functions, where its important for the decompiler to + This is useful for bookkeeping functions, where it is important for the Decompiler to see its effects on the calling function. Functions that set up the stack frame for a caller or functions that look up or dispatch a switch destination are typical examples that should be marked in-line. @@ -2305,7 +2378,7 @@ This property is similar in spirit to marking a function as in-line. 
- A call-fixup directs the decompiler to replace any call to the function with a specific + A call-fixup directs the Decompiler to replace any call to the function with a specific chunk of raw p-code. The decompilation of any calling function no longer shows the function call, but the chunk of p-code incorporates the called function's effects. @@ -2316,7 +2389,7 @@ Call-fixups are specified by name. The name and associated p-code chunk are typically defined in the compiler specification for the Program. Users can extend the available set - of call-fixups. See . + of call-fixups (see ). @@ -2329,7 +2402,7 @@ Ghidra records a Signature Source for every function, indicating the origin of its prototype information. This is similar to the Symbol Source attached to Ghidra's symbol annotations - (See the documentation for + (see the documentation for Filtering in the Symbol Table). The possible types are: @@ -2354,25 +2427,25 @@ If the Signature Source is set to anything other than DEFAULT, the - function's prototype information is forcing on the decompiler. See the discussion - in + function's prototype information is forcing on the Decompiler (see the discussion + in ). Discovering Parameters The input parameter and return value annotations of the function prototype, like - any variable annotations, can be forcing on the decompiler. - See the complete discussion in . + any variable annotations, can be forcing on the Decompiler + (see the complete discussion in ). But keep in mind: - The input parameters and return value are all forced on the decompiler as a unit based on the + The input parameters and return value are all forced on the Decompiler as a unit based on the Signature Source. They are all forced if the type is set to anything other than DEFAULT; otherwise none of them are forced. 
- If the function prototype's annotations are not forced, the decompiler will attempt to discover the parameters + If the function prototype's annotations are not forcing, the Decompiler will attempt to discover the parameters and return value using the calling convention. The prototype model underlying the calling convention dictates which storage locations can be considered as parameters and their formal ordering. @@ -2387,7 +2460,7 @@ can be built for the function. - The decompiler will disregard the calling convention's rules in this situation and use the custom storage + The Decompiler will disregard the calling convention's rules in this situation and use the custom storage locations for parameters and the return value. Other aspects of the calling convention, like the unaffected list, will still be used. @@ -2397,8 +2470,8 @@
Data Mutability - Mutability is a description of how values in a specific memory region - (either a single variable or a larger block) can change during Program execution, based either on + Mutability is a description of how values in a specific memory region, + either a single variable or a larger block, can change during Program execution based either on properties or established rules. Ghidra recognizes the mutability settings: @@ -2407,17 +2480,17 @@ Volatile - Mutability affects decompiler analysis and can have a large impact the output. + Mutability affects Decompiler analysis and can have a large impact on the output. - Most memory has normal mutability, meaning: + Most memory has normal mutability; the value at the memory location may change over the course of executing the Program, but for a given section of code, the value will not change unless an instruction explicitly writes to it. Mutability can be set on an entire block of memory in the Program, typically from the Memory Map. - It can also be set as part of a single Variable Annotation. From the Listing Window for instance, + It can also be set as part of a single Variable Annotation. From a Listing window, for instance, use the Settings dialog. @@ -2425,9 +2498,9 @@ The constant mutability setting indicates that values within the memory region are read-only and don't change during Program execution. If a read-only variable is - accessed in a function being analyzed by the decompiler, its constant value, if present in the - Program's load image, replaces the variable within data-flow for the - function. The decompiler may propagate the constant and fold it in to other operations, which + accessed in a function being analyzed by the Decompiler, its constant value, if present in the + Program's load image, replaces the variable within data flow for the + function. The Decompiler may propagate the constant and fold it in to other operations, which can have a substantial impact on the final output. 
@@ -2436,17 +2509,17 @@ The volatile mutability setting indicates that values within the memory region may change unexpectedly, even if the code currently executing does not directly - write to it. If a volatile variable is accessed in a function being analyzed by the decompiler, + write to it. If a volatile variable is accessed in a function being analyzed by the Decompiler, each specific access is replaced with a built-in function call, which prevents constant propagation and other transforms across the access. The built-in functions are named based on - whether the access is a read or write and then the size - of the access. Within decompiler output, the first parameter to a built-in function is a symbol + whether the access is a read or write and on the size + of the access. Within the Decompiler output, the first parameter to a built-in function is a symbol indicating the volatile variable. The function returns a value in the case of a volatile read or takes a second parameter in the case of a volatile write. - write_volatile_1(DAT_mem_002b,0x20); X = read_volatile_2(SREG); + write_volatile_1(DAT_mem_002b,0x20); @@ -2456,14 +2529,14 @@
Constant Annotations - Ghidra provides numerous actions to control how specific constants are formatted or displayed. - An annotation can be applied directly to a constant in the Decompiler Window, which always affects - decompiler output. Or, an annotation can be applied to the constant operand of a specific machine - instruction displayed in the Listing Window. In this case, to the extent possible, the decompiler - attempts to track the operand and apply the annotation to the matching constant in the decompiler output. - However, the constant may be transformed from its value in the original machine instruction during the decompiler's - analysis. The decompiler will follow the constant through simple transformations, but if the constant strays - too far from its original value, the annotation will not be applied. The transforms followed are: + Ghidra provides numerous actions to control how a specific constant is formatted or displayed. + An annotation can be applied directly to a constant in a Decompiler window, which always affects + Decompiler output. Or, an annotation can be applied to the constant operand of a specific machine + instruction displayed in a Listing window. In this case, to the extent possible, the Decompiler + attempts to track the operand and apply the annotation to the matching constant in the Decompiler output. + However, the constant may be transformed from its value in the original machine instruction during the Decompiler's + analysis. The Decompiler will follow the constant through one of the following simple transformations, but + otherwise the annotation will not be applied. Signed or zero extension @@ -2479,56 +2552,54 @@ Ghidra can create an association between a name and a constant, called an equate. An equate is a descriptive string that is intended to replace the numeric form of the constant, and equates across the entire Program can be viewed from the - Equate Table. + Equates Table. 
An equate can be applied to a machine instruction with a constant operand by using the Set Equate - menu from the Listing Window. If the decompiler successfully follows the operand to a matching constant, - the equate's name is displayed as part of the decompiler's output as well as in the Listing Window. - A transformed operand is displayed as an expression, where the transforming operations are applied to - the equate symbol (representing the original constant). + menu from a Listing window. If the Decompiler successfully follows the operand to a matching constant, + the equate's name is displayed as part of the Decompiler's output as well as in any Listing window. + A transformed operand is displayed as an expression, where the transforming operation is applied to + the equate symbol representing the original constant. - Alternately an equate can be applied directly to a constant from the Decompiler Window using its + Alternatively, an equate can be applied directly to a constant from a Decompiler window using its menu. The constant may or may not have a corresponding instruction - operand but will be displayed in decompiler output using the descriptive string. - - - + operand but will be displayed in Decompiler output using the descriptive string. Format Conversions - Ghidra can apply a format conversion to integer constants that are displayed - in decompiler output. + Ghidra can apply a format conversion to any integer constant that is displayed + in Decompiler output. A conversion can be applied to the machine instruction containing the constant as an operand using the Convert menu option - from the Listing Window. If the decompiler successfully traces the operand to a matching constant, - the format conversion is applied in the decompiler output as well as in the Listing Window. + from a Listing window. 
If the Decompiler successfully traces the operand to a matching constant, + the format conversion is applied in the Decompiler output as well as in the Listing window. - Alternately, a conversion can be applied directly to an integer constant in the - Decompiler Window using its menu option. The constant may or may not - have a corresponding instruction operand but is displayed in decompiler output using the conversion. + Alternatively, a conversion can be applied directly to an integer constant in a + Decompiler window using its menu option. The constant may or may not + have a corresponding instruction operand but is displayed in Decompiler output using the conversion. - Conversions applied by the decompiler are currently limited to: + Conversions applied by the Decompiler are currently limited to: Binary - 0b01100001 - Decimal- 97 + Decimal - 97 Hexadecimal - 0x61 Octal - 0141 Char - 'a' - An appropriate header matching the format is prepended to the representation string, either "0b", "0x" or just - "0". The decompiler will not switch the signedness of the constant but preserves the signed or unsigned data-type - as determined by analysis. + If necessary, a header matching the format is prepended to the representation string, either "0b", "0x" or just + "0". A conversion will not switch the signedness of the constant; the signed or unsigned data-type associated + with the constant, as determined by analysis, is preserved. If the constant is negative, with a signed data-type, + the representation string will always start with a '-' character.
@@ -2539,15 +2610,15 @@ A register value in this context is a region of code in the Program where a specific register holds a known constant value. Ghidra maintains an explicit list of these values for the Program (see the documentation for Register Values), - which the decompiler can use when analyzing a function. - A register value benefits decompiler analysis, especially if the original compiler was aware - of the constant value, as the decompiler can recover address references calculated as offsets relative to the register + which the Decompiler can use when analyzing a function. + A register value benefits Decompiler analysis, especially if the original compiler was aware + of the constant value, as the Decompiler can recover address references calculated as offsets relative to the register and otherwise propagate the constant. - A register value is set by highlighting the region of code in the Listing Window and then invoking the - Set Register Values ... command - from the pop-up menu. The beginning and end of a region is indicated in the Listing Window with + A register value is set by highlighting the region of code in a Listing window and then invoking the + Set Register Values... command + from the pop-up menu. The beginning and end of a region is indicated in a Listing window with assume directives, and regions can be generally viewed from the Register Manager window. @@ -2555,15 +2626,15 @@ In order for a particular register value to affect decompilation, the region of code associated with the value must contain the entry point of the function, and of course the function must read from the register. Only the initial reads of the register are replaced with the constant value. 
- The decompiler will continue to respect later instructions that write to the register (even if the - instruction is inside the register value's region) + The Decompiler will continue to respect later instructions that write to the register, even if the + instruction is inside the register value's region. If a register value's region starts in the middle of a function, decompilation is not affected at all. Context Registers - There is a special class of registers, called context registers whose + There is a special class of registers called context registers whose values have a different affect on analysis and decompilation than described above. @@ -2573,7 +2644,7 @@ The value in a context register is examined when Ghidra decodes machine instructions from the underlying bytes in the Program. A specific value generally corresponds to a specific execution mode - of the processor. The ARM processor T bit for instance, which selects whether the + of the processor. The ARM processor T bit, for instance, which selects whether the processor is executing ARM or THUMB instructions, is modeled as a context register in Ghidra. The same set of bytes in the Program can be decoded to machine instructions in more than one way, depending on context register values. @@ -2588,17 +2659,17 @@ If a context register value is changed for a region that has already been disassembled, in order to see the affect of the change, the machine instructions in the region need to be cleared, and disassembly needs - to be triggered again. See the documentation on the - Clear Plugin. + to be triggered again (see the documentation on the + Clear Plugin). - Values for a context register are set in the same way as any other register, using the - Set Register Values ... command + Values for a context register are set in the same way as for any other register, using the + Set Register Values... command described above. 
Within the Register Manager window, - context registers are generally grouped together under the (pseudo-register) heading, contextreg. + context registers are generally grouped together under the contextreg pseudo-register heading. For details about how context registers are used in processor modeling, see - the document "SLEIGH: A Language for Rapid Processor Specification". + the document "SLEIGH: A Language for Rapid Processor Specification." Because context registers affect machine instructions, they also affect the underlying p-code and @@ -2615,20 +2686,48 @@ Decompiler Options - This lists configuration options that explicitly affect the behavior of the decompiler or - its output, independent of the code that is being decompiled. The bulk of these are + This page lists configuration options that explicitly affect the behavior of the Decompiler or + its output, independent of the code that is being decompiled. The bulk of these options are accessible by selecting the Code Browser menu - Edit -> Tool Options + Edit -> Tool Options... - and then picking the Decompiler sub-folder. These options are associated - with the particular tool (Code Browser) being used and will apply to decompilation of any Program - being analyzed by that tool. The three categories of options are: + and then picking the Decompiler folder. The options are associated + with the particular Code Browser or other tool being used and will apply to decompilation of any Program + being analyzed by that tool. There are three categories of options, which are listed by clicking either on the + Decompiler folder or one of its two subsections. - - affecting the engine behavior - affecting the decompiler's transformation process - affecting the final presentation of decompiler output + + + + + + + + + + Decompiler - lists that affect the engine behavior. + + +   + + + + + + +  Analysis - lists that affect the Decompiler's transformation process. 
+ + +   + + + + + + +  Display - lists that affect the final presentation of Decompiler output. + @@ -2636,18 +2735,18 @@ Options that are specific to the particular Program being analyzed are accessed by selecting the Code Browser menu - Edit -> Options for <Program> + Edit -> Options for <program>.... - Picking the Decompiler tab shows - that only affect the decompiler. Picking the tab + Picking the Decompiler section shows + that only affect the Decompiler. Picking the section shows a table of the available prototype models, call-fixups, and callother-fixups. These - affect more than just the decompiler but are also documented here. + affect more than just the Decompiler but are also documented here.
General Options - These options govern what resources are available to the Plug-in and the decompiler engine but do + These options govern what resources are available to the Plug-in and the Decompiler engine but do not affect how analysis is performed or results are displayed. @@ -2658,7 +2757,7 @@ Cache Size (Functions) - Decompilation results for a single function can be compute intensive to produce. + Producing decompilation results for a single function can be computationally intensive. This option specifies the number of functions whose decompilation results can be cached simultaneously. When navigating to a function that has been recently cached, as when navigating back and forth between a few functions, @@ -2670,7 +2769,7 @@ Decompiler Max-Payload (MBytes) - This is a limit on the number of bytes that can be produced by the decompiler process as output + This option limits the number of bytes that can be produced by the Decompiler process as output when decompiling a single function. A payload includes the actual characters to be displayed in the window, additional token markup, symbol information, and other details of the underlying syntax tree. The limit is specified in megabytes of data. If the limit is exceeded for a single @@ -2683,11 +2782,11 @@ Decompiler Timeout (seconds) - This option sets an upper limit on the number of seconds the decompiler spends attempting + This option sets an upper limit on the number of seconds the Decompiler spends attempting to analyze one function before aborting. - It is currently not enforced for the Decompilation - Window. Instead it applies to the DecompilerSwitchAnalyzer, the analyzeHeadless command, scripts, or other - plug-ins that make use of the decompiler service. + It is currently not enforced for a Decompiler + window. Instead it applies to the DecompilerSwitchAnalyzer, the analyzeHeadless command, scripts, and other + plug-ins that make use of the Decompiler service. 
@@ -2695,10 +2794,10 @@ Max Instructions per Function - This option sets a maximum number of machine instructions that the decompiler will attempt + This option sets a maximum number of machine instructions that the Decompiler will attempt to analyze for a single function, as a safeguard against analyzing a long sequence - of zeroes or other constant data. The decompiler will quickly throw an exception if it - traces control-flow into more than the indicated number of instructions. + of zeroes or other constant data. The Decompiler will quickly throw an exception if it + traces control flow into more than the indicated number of instructions. @@ -2710,7 +2809,7 @@
Analysis Options - These options directly affect how the decompiler performs its analysis, either by + These options directly affect how the Decompiler performs its analysis, either by toggling specific analysis passes or changing how it treats various annotations. @@ -2721,16 +2820,16 @@ Alias Blocking - When deciding if an individual stack location has become dead, the decompiler + When deciding if an individual stack location has become dead, the Decompiler must consider aliases, pointers onto the stack that could - be used to modify the location within a called function. One strong heuristic the decompiler - uses is; if the user has explicitly created a variable on the stack between the + be used to modify the location within a called function. One strong heuristic the Decompiler + uses is: if the user has explicitly created a variable on the stack between the base location referenced by the pointer and the individual stack location, then - the decompiler can assume that the pointer is not an alias of the stack location. + the Decompiler can assume that the pointer is not an alias of the stack location. The alias is blocked by the explicit variable. However, if the user's explicit variable is labeling something that isn't - really an explicit variable, like a field within a larger structure for instance, - the decompiler may incorrectly consider the stack location as dead and start removing + really an explicit variable, like a field within a larger structure, for instance, + the Decompiler may incorrectly consider the stack location as dead and start removing live code. @@ -2738,10 +2837,10 @@ setting to specify what data-types should be considered blocking. The four options are: - None - No data-type is considered blocking. - Structures - Only structured data-types are blocking. - Structures and Arrays - All Data-types - All data-types are blocking. 
+ None - No data-type is considered blocking + Structures - Only structures are blocking + Structures and Arrays - Only structures and arrays are blocking + All Data-types - All data-types are blocking Selecting None is the equivalent of turning off the heuristic. Selecting anything @@ -2750,11 +2849,56 @@ + + Eliminate unreachable code + + + When toggled on, the Decompiler eliminates code that it + considers unreachable. This usually happens when, due to constant propagation and other + analysis, the Decompiler decides that a boolean value controlling a conditional branch can + only take one possible value and removes the branch corresponding to the other value. Toggling + this to off lets the user see the dead code, which is typically demarcated + by the control-flow structure: + + if (false) { ... } + + + + + + Ignore unimplemented instructions + + + When toggled on, the Decompiler treats instructions whose semantics + have been formally marked unimplemented as if they do + nothing (no operation). Crucially, control flow falls through to the next instruction. + In this case, the Decompiler inserts the warning "Control flow ignored unimplemented + instructions" as a comment in the function header, but the exact point at which + an instruction was ignored may not be clear. + If this option is toggled off, the Decompiler inserts the built-in + function halt_unimplemented() at the point of the unimplemented instruction, and + control flow does not fall through. + + + + + Infer constant pointers + + + When toggled on, the Decompiler infers a data-type for constants + it determines are likely pointers. In the basic heuristic, + each constant is considered as an address, and if that address starts a known data or function element + in the program, the constant is assumed to be a pointer. The constants are treated like + any other source of data-type information, and the inferred data-types are freely propagated by + the Decompiler to other parts of the function.
+ + + Recover -for- loops - When this is toggle on, the decompiler attempts to pinpoint + When toggled on, the Decompiler attempts to pinpoint variables that control the iteration over specific loops in the function body. When these loop variables are discovered, the loop is rendered using a standard for loop header @@ -2764,72 +2908,30 @@ - If the toggle is off, the loop is displayed using + When toggled off, the loop is displayed using while syntax, with any initializer and iterating statements mixed in with the loop body or preceding basic blocks. - - Eliminate unreachable code - - - When this is toggled on, the decompiler eliminates code that it - considers unreachable. This usually happens when, due to constant propagation and other - analysis, the decompiler decides that a boolean value controlling a conditional branch can - only take one possible value and removes the branch corresponding to the other value. Toggling - this to off lets the user see the dead code, which is typically demarcated - by the control-flow structure -- if (false) { ... }. - - - - - Ignore unimplemented instructions - - - When toggled on, the decompiler treats instructions whose semantics - have been formally marked unimplemented as if they do - nothing (no operation). Crucially, control-flow falls through to the next instruction. - In this case, the decompiler inserts the warning "Control flow ignored unimplemented - instructions" as a comment in the function header, but the exact point at which - instruction was ignored may not be clear. - If this option is toggled off, the decompiler inserts the built-in - function halt_unimplemented() at the point of the unimplemented instruction, and - control-flow does not fall through. - - - - - Infer constant pointers - - - When toggled on, the decompiler infers a data-type for constants - it determines are likely pointers. 
In the basic heuristic, - each constant is considered as an address, and if that address starts a known data or function element - in the program, the constant is assumed to be a pointer. The constants are treated like - any other source of data-type information, and the inferred data-types are freely propagated by - the decompiler to other parts of the function. - - - Respect read-only flags - When toggled on, the decompiler treats any values in memory + When toggled on, the Decompiler treats any values in memory marked read-only as constant. If a read-only memory location is explicitly referenced by the function being decompiled, it is considered to be unchanging, and the initial - value present in the Program is pulled in to the data-flow of the function as a constant. + value present in the Program is pulled into the data flow of the function as a constant. Due to Constant Propagation and other transformations, read-only memory - can have a large effect on decompiler output. + can have a large effect on Decompiler output. - Typically as part of the import process, Ghidra marks memory blocks as read-only if they + Typically, as part of the import process, Ghidra marks memory blocks as read-only if they are tagged as such by a section header or other meta-data in the original binary. Users can actively set whether specific memory regions are considered read-only through the Memory Manager, and individual data elements can be marked as constant via the Mutability setting - (See ). + (see ). @@ -2837,7 +2939,7 @@ Simplify extended integer operations - This toggles whether the decompiler attempts to simplify double precision arithmetic operations, + This toggles whether the Decompiler attempts to simplify double precision arithmetic operations, where a single logical operation is split into two parts, calculating the high and low pieces of the result in separate instructions. 
Decompiler support for this kind of transform is currently limited, and only certain constructions are simplified. @@ -2848,10 +2950,10 @@ Simplify predication - When this option is active, the decompiler simplifies code sequences containing + When this option is active, the Decompiler simplifies code sequences containing predicated instructions. A predicated instruction is executed conditionally based on a boolean value, the predicate, - and a sequence of instructions can share the same predicate. The decompiler merges the + and a sequence of instructions can share the same predicate. The Decompiler merges the resulting if/else blocks that share the same predicate so that the condition is only printed once. @@ -2861,7 +2963,7 @@ Use in-place assignment operators - When toggled on, the decompiler employs in-place assignment operators, + When toggled on, the Decompiler employs in-place assignment operators, such as += and <<=, in its output syntax. @@ -2874,7 +2976,7 @@
Display Options - These options do not change the decompiler's analysis but only affect how the results are presented. + These options do not change the Decompiler's analysis but only affect how the results are presented. @@ -2888,11 +2990,21 @@ + + Color Default + + + Assign the color to any characters emitted by the Decompiler that do not fall into one of the token types + listed below. This includes delimiter characters like commas and parentheses as well as various operator + characters. + + + Color for <token> - Assign colors to the different types of language tokens emitted by the decompiler. + Assign colors to the different types of language tokens emitted by the Decompiler. These include: @@ -2909,21 +3021,11 @@ - - Color Default - - - Assign the color to any characters emitted by the decompiler that do not fall into one of token types - listed above. This includes delimiter characters like commas and parentheses as well as various operator - characters. - - - Color for Current Variable Highlight - Assign the background color used to highlight the token currently under the cursor in a Decompiler Window. + Assign the background color used to highlight the token currently under the cursor in a Decompiler window. @@ -2931,8 +3033,8 @@ Color for Highlighting Find Matches - Assign the background color used to highlight characters matching the current Find pattern. - See . + Assign the background color used to highlight characters matching the current Find pattern + (see ). @@ -2940,7 +3042,7 @@ Comment line indent level - Set the number of characters that comment lines are indented within decompiler output. This applies only + Set the number of characters that comment lines are indented within Decompiler output. This applies only to comments within the body of the function being displayed. Comments at the head of the function are not indented. @@ -2950,7 +3052,7 @@ Comment style - Set the language syntax used to delimit comments emitted as part of decompiler output.
For C and Java, + Set the language syntax used to delimit comments emitted as part of Decompiler output. For C and Java, the choices are /* C style comments */ and // C++ style comments. @@ -2959,7 +3061,7 @@ Disable printing of type casts - Set whether the syntax for type casts is emitted in decompiler output. + Set whether the syntax for type casts is emitted in Decompiler output. If this is toggled on, type cast syntax is never displayed, even when rules of the language require it. So individual statements may no longer be formally accurate. @@ -2969,26 +3071,27 @@ Display <kind-of> Comments - Set whether a specific kind of comment can be incorporated into decompiler output. Comments in - Ghidra are categorized based on their placement within the Listing Window, and the decompiler - in general tries to display comments where appropriate. See the discussion in . - Each kind of comment has its own toggle and can be individually included or excluded from decompiler output. + Set whether a specific type of comment should be incorporated into Decompiler output. + Each type has its own toggle and can be individually included or excluded from Decompiler output. - PLATE - Whether plate comments within the body of the function are displayed + EOL - PRE + PLATE - Whether plate comments within the body of the function are displayed POST - EOL + PRE + A comment's type indicates how it is placed within a Listing window, not how it is placed in + a Decompiler window. All comments within the body of the function are displayed in the same way + by the decompiler, regardless of their type (see the discussion in ). @@ -2996,9 +3099,9 @@ Display Header comment - Toggle whether the decompiler emits comments at the head (before the beginning) of a function. + Toggle whether the Decompiler emits comments at the head (before the beginning) of a function. The header is built from Plate comments placed at the entry point of the - function. See the discussion in . 
+ function (see the discussion in ). The inclusion of other Plate comments is controlled by the Display PLATE comments toggle, described above. @@ -3007,11 +3110,11 @@ Display Line Numbers - Toggle whether line numbers are displayed in any Decompiler Window. If toggled - on, each Decompiler Window reserves space to display a numbers down the left - side of the window, labeling each line of output produced by the decompiler. + Toggle whether line numbers are displayed in any Decompiler window. If toggled + on, each Decompiler window reserves space to display numbers down the left + side of the window, labeling each line of output produced by the Decompiler. Line numbers are associated with the window itself and are not formally part of - the decompiler's output. + the Decompiler's output. @@ -3019,19 +3122,19 @@ Display Namespaces - Control how the decompiler displays namespace information associated + Control how the Decompiler displays namespace information associated with function and variable symbols. The possible settings are: + + Minimally - Display the minimal path that distinguishes the symbol + Always - Always display the entire namespace path Never - Never display the namespace path - - Minimally - Display the minimal path that distinguishes the symbol - @@ -3050,9 +3153,9 @@ Display Warning comments - Toggle whether decompiler generated WARNING comments are displayed as part - of the output. The decompiler generates these comments, independent of those laid down by users, to - indicate unusual conditions or possible errors (See ). + Toggle whether Decompiler generated WARNING comments are displayed as part + of the output. The Decompiler generates these comments, independent of those laid down by users, to + indicate unusual conditions or possible errors (see ). @@ -3060,8 +3163,8 @@ Font - Set the typeface used to render characters in any Decompiler Window.
Indentation is generally clearer - using a monospaced (fixed width) font, but any font available to the system can be used. The size of + Set the typeface used to render characters in any Decompiler window. Indentation is generally clearer + using a monospaced (fixed-width) font, but any font available to Ghidra can be used. The size of the font can also be controlled from this option. @@ -3070,18 +3173,18 @@ Integer format - Set how integer constants are formatted in the decompiler output. + Set how integer constants are formatted in the Decompiler output. The possible settings are: - Best Fit - Select the most natural representation + Force Hexadecimal - Always use a hexadecimal representation Force Decimal - Always use a decimal representation - Force Hexadecimal - Always use a hexadecimal representation + Best Fit - Select the most natural representation @@ -3095,8 +3198,8 @@ Maximum characters in a code line - Set the maximum number of characters in a line of code emitted by the decompiler before a line break - is forced. The decompiler will not split an individual token across lines. So line breaks frequently + Set the maximum number of characters in a line of code emitted by the Decompiler before a line break + is forced. The Decompiler will not split an individual token across lines. So line breaks frequently will come before the maximum number of characters is reached, and technically a single token can extend the line beyond the maximum. @@ -3107,9 +3210,9 @@ Set the amount of indenting used to print statements within a nested scope in the - decompiler output. Each level of nesting (for function bodies, + Decompiler output. Each level of nesting (function bodies, loop bodies, if/else bodies, etc.) - bodies adds this number characters. + adds this number characters. @@ -3117,10 +3220,10 @@ Print 'NULL' for null pointers - Set how null pointers are displayed in decompiler output. 
If this is toggled - on, the decompiler will print a constant pointer value of zero (a null pointer) + Set how null pointers are displayed in Decompiler output. If this is toggled + on, the Decompiler will print a constant pointer value of zero (a null pointer) using the special token NULL. Otherwise the pointer value is represented with the '0' character, - which is then type cast into a pointer. + which is then cast to a pointer. @@ -3129,10 +3232,10 @@ Set whether the calling convention is printed as part of the function - declaration in decompiler output. If this option is turned on, the name of the calling convention + declaration in Decompiler output. If this option is turned on, the name of the calling convention is printed just prior to the return value data-type within the function declaration. All functions in Ghidra have an associated calling convention (or prototype model) that is used during - decompiler analysis. See the discussion in . + Decompiler analysis (see the discussion in ). @@ -3144,7 +3247,7 @@
Program Options - Changes to these options affect only the decompiler and only for + Changes to these options affect only the Decompiler and only for the current Program being analyzed. @@ -3156,9 +3259,9 @@ Sets the calling convention (prototype model) used when decompiling a function where - the convention is not known (i.e. marked as "unknown"). Many architectures have multiple - calling conventions, __stdcall, __thiscall etc. See the - discussion in . + the convention is not known; i.e., marked as unknown. Many architectures have multiple + calling conventions, __stdcall, __thiscall, etc. + (see the discussion in ). @@ -3169,7 +3272,7 @@
Specification Extensions - This tab displays elements from the Program's compiler specification and + This entry displays elements from the Program's compiler specification and processor specification and allows the user to add or remove extensions, including prototype models, call-fixups, and callother-fixups. @@ -3182,12 +3285,12 @@ with it. - Users can change or reimport an extension, if new information points to a better definition. + Users can change or reimport an extension if new information points to a better definition. Users have full control over an extension, and unlike a core element, can tailor it specifically to the Program. - This options tab presents a table of all specification elements. + This options entry presents a table of all specification elements. Each element, whether core or an extension, is displayed on a separate row with three columns: @@ -3197,7 +3300,7 @@ The core elements of the specification have a blank Status column, and any extension - is labeled either as "extension" or "override". + is labeled either as extension or override. Extension Types @@ -3213,14 +3316,14 @@ prototype - This element is a that holds a specific named set - of parameter passing details. It - can be applied to individual functions by name, typically via the "Calling Convention" menu + This element is a named that holds a specific set + of parameter-passing details. It + can be applied to individual functions by name, typically via the Calling Convention menu in the Function Editor Dialog. See the documentation on for how they affect decompilation. - The XML tag, <prototype> always has a name attribute + The XML <prototype> tag always has a name attribute that defines the formal name of the prototype model, which must be unique across all models. @@ -3267,7 +3370,7 @@ This element is a Callother-fixup, which can be used to substitute a specific p-code sequence for CALLOTHER p-code operations. 
A CALLOTHER - is a black-box, or unspecified p-code operation, see . + is a black-box, or unspecified p-code operation (see ). The <callotherfixup> tag has a @@ -3316,7 +3419,7 @@ extension - Indicates that the element is a program specific extension that has been + Indicates that the element is a program-specific extension that has been added to the specification. @@ -3327,7 +3430,7 @@ Indicates that the element, which must be a callotherfixup, is an extension that overrides a core element with the same target. The extension - effectively replaces the p-code injection of the core element with a user supplied one. + effectively replaces the p-code injection of the core element with a user-supplied one. If this type of extension is later removed, the core element becomes active again. @@ -3338,7 +3441,7 @@ If the user has either imported additional extensions or selected an extension for removal but has not yet clicked the Apply button in the Options dialog, the Status column - may show one of the following values, indicating a pending change. + may show one of the following values, indicating a pending change: @@ -3384,7 +3487,7 @@ Importing a New Extension The Import button at the bottom of the - "Specification Extensions" pane allows the user to import one of the + Specification Extensions pane allows the user to import one of the three element types, prototype, callfixup, or callotherfixup, into the program as a new extension. @@ -3404,8 +3507,8 @@ The XML file describing the extension must have one of the tags, <prototype>, <callfixup>, or <callotherfixup>, as its single root element. Users can find numerous examples within the compiler - and processor specification files that come as part of Ghidra's installation. - See . + and processor specification files that come as part of Ghidra's installation + (see ). 
In the case of prototype and callfixup @@ -3420,7 +3523,7 @@ Removing an Extension - The Remove button at the bottom of the "Specification Extensions" pane allows + The Remove button at the bottom of the Specification Extensions pane allows the user to remove a previously installed extension. A row from the table is selected first, which must have a Status of extension or override. Core elements of the specification cannot be removed. @@ -3446,7 +3549,7 @@ Decompiler Window - To display the decompiler window, position the cursor on a + To display the Decompiler window, position the cursor on a function in the Code Browser, then select the @@ -3455,17 +3558,17 @@ -  icon from the tool bar, or the +  icon from the tool bar, or the Decompile option from the Window menu in the tool. - A decompiler window always displays one function at a time. + A Decompiler window always displays one function at a time. The initial window that comes up in the Code Browser is called the Main - window (See ), and it automatically decompiles and displays the function at the + window (see ), and it automatically decompiles and displays the function at the current address, following the user's navigation. Other Snapshot windows can also - be opened that show different functions at the same time (See ). But any window + be opened that show different functions at the same time (see ). But any window only shows one function at a time. @@ -3512,14 +3615,14 @@ one or more warnings during the process. These warnings are integrated into the output as source code comments starting with label WARNING:. They occur either at the beginning of the function as part of the function header or at the point in the code directly - associated with the warning. (See ) + associated with the warning (see ).
Main Window - Initially pushing + Initially pressing @@ -3527,24 +3630,24 @@ -  or selecting +  or selecting Decompile from the Window menu in the tool brings up the main window. The main window always displays the function at the current address within the Code Browser and follows as the user navigates within the Program. Any mouse click, menu option, or other action causing the cursor to move to a new address in the Listing also causes the main window to display the function containing that address. Navigation to new functions is also possible from within the window by double-clicking on function - tokens (See ). + tokens (see ). - Cross Highlighting + Cross-Highlighting The main window maintains a map between the individual variable and operator tokens displayed in - the window and the machine instructions which correspond to them. This can give the user instant - feedback about the correspondence between the decompiler and disassembly views of the function, - and it is frequently useful to have both the Listing window and the Decompiler window side - by side. Clicking on tokens in the Decompiler window causes the Listing window to navigate + the window and the machine instructions which correspond to them. Disassembled machine instructions + are displayed by any Listing window, and having both a Listing and Decompiler window side by side + lets the user see this correspondence between the decompiled and disassembled views of the function. + Clicking on tokens in the Decompiler window causes the Listing window to navigate to the corresponding instruction, and clicking instructions in the Listing window causes the Decompiler window to navigate to the corresponding line. Highlighting a region of code in either window causes the corresponding region in the other window to be highlighted. @@ -3564,7 +3667,7 @@ Comment tokens map to the machine address associated with the comment. 
- In general, the map between machine instructions and tokens is not one to one because the decompiler + In general, the map between machine instructions and tokens is not one-to-one because the Decompiler transforms its underlying representation of the function. An instruction may no longer have any operator that corresponds to it in the decompiled result. Tokens may be transformed from the natural operation of the machine instruction they are associated @@ -3584,32 +3687,35 @@ -  icon - in another Decompiler window's toolbar causes a Snapshot window - to be created, which initially shows decompilation of the same function. Multiple - Snapshot windows can be brought up to show decompilation of different functions +  icon + in any Decompiler window's toolbar causes a Snapshot window + to be created, which shows decompilation of the same function. + Unlike the main window however, the Snapshot window + does not change the function it displays in response to external navigation events. + A Snapshot window can be used to hold a function fixed while the user navigates to + different functions in Listing or other windows. + + + Multiple Snapshot windows can be brought up to show decompilation of different functions simultaneously. Snapshot windows are visually distinguished from the main Decompiler window by their colored outline. - - The Snapshot - window, unlike the main window, is not linked to the Listing window - and does not change the function it displays in response to external navigation events. - A Snapshot window can be used to hold a function fixed while the user navigates to - different functions in the Listing or other windows. - Navigating to new functions within a Snapshot window is possible when the window is active. The window responds to the actions - Go To ... (pressing the 'g' key) + Go To... 
(pressing the 'G' key) Go to previous location (Back) Go to next location (Forward) + + Double-clicking on specific tokens within the Snapshot window may also cause it to navigate + to a new location (see ). +
@@ -3617,7 +3723,7 @@ If the current location within the Code Browser is in disassembled code, but that code is not contained in a Formal Function Body, - then the decompiler window invents a function body on the fly called an + then the Decompiler invents a function body, on the fly, called an Undefined Function. The background color of the window is changed to gray to indicate this special state. @@ -3629,30 +3735,30 @@ The entry point address of the Undefined Function is chosen by - backtracking through the code's control-flow from the current location to the start of + backtracking through the code's control flow from the current location to the start of a basic block that has no flow coming in except possibly from call instructions. During decompilation, a function body is computed from the selected entry point (as with any function) - based on control-flow up to instructions with terminator semantics. + based on control flow up to instructions with terminator semantics. - The current address, as indicated by the cursor in the Listing Window for instance, is - generally not the entry of the invented function, but the current address will be + The current address, as indicated by the cursor in the Listing for instance, is + generally not the entry point of the invented function, but the current address will be contained somewhere in the body. For display purposes in the window, the invented function is given a name based on the computed entry point address with the prefix UndefinedFunction. The function is assigned the default calling convention, and parameters are discovered as part of - the decompiler's analysis. + the Decompiler's analysis.
Tool Bar - This is a group of actions that can be triggered by pressing a button in the tool/title - bar at the top of individual decompiler windows, both main and - Snapshot. The action applies to the function and decompiler results + The following actions are available by pressing the corresponding icon in the title/tool + bar at the top of each individual Decompiler window. + The action applies to the function and Decompiler results displayed in that particular window. @@ -3665,7 +3771,7 @@ -  - button +  - button Exports the decompiled result of the current function to a file. A file chooser @@ -3676,7 +3782,7 @@ This action exports a single function at a time. The user can export all functions simultaneously from the Code Browser, by selecting the menu - File -> Export Program ... and then choosing + File -> Export Program... and then choosing C/C++ from the drop-down menu. See the full documentation for the Export dialog. @@ -3693,13 +3799,13 @@ -  - button +  - button Creates a new Snapshot window. The Snapshot window - initially displays the same function as the decompiler window on which the action was triggered, - but if that window navigates to other functions, the Snapshot does not - follow and continues to display the original function. (See ) + displays the same function as the Decompiler window on which the action was triggered, + and if that window navigates to other functions, the Snapshot does not + follow but continues to display the original function (see ). @@ -3713,7 +3819,7 @@ -  - button +  - button Triggers a re-decompilation of the current function displayed in the window. @@ -3722,8 +3828,8 @@ This action is not necessary for normal reverse engineering tasks. Re-decompilation is automatically triggered for all - decompiler windows by any change to the Program, so the most up-to-date decompilation is - always available to the user without this action. 
This action is a primarily a debugging + Decompiler windows by any change to the Program, so the most up-to-date decompilation is + always available to the user without this action. This action is primarily a debugging aid for plug-in developers. @@ -3738,17 +3844,17 @@ -  - button +  - button - Copies the currently selected text in the decompiler window to the clipboard. + Copies the currently selected text in the Decompiler window to the clipboard. Debug Function Decompilation - This action is located in the drop-down menu on the right side of the decompiler + This action is located in the drop-down menu on the right side of the Decompiler window tool/title bar. @@ -3756,7 +3862,7 @@ the current function is collected and saved to an output file in XML format. A file chooser dialog is presented to the user to choose the output file. The file is useful when submitting bug reports - about the decompiler as it is generally much smaller than + about the Decompiler as it is generally much smaller than the entire Program and only contains information specific to the function. Information is generated by performing the full decompilation of the function and collecting all the data and @@ -3769,12 +3875,13 @@ Graph AST Control Flow - Generate a control-flow graph based upon the results in the active Decompiler Window, + This action is located in the drop-down menu on the right side of the Decompiler + window tool/title bar. + + + Generate a control-flow graph based upon the results in the active Decompiler window, and render it using the current Graph Service. - - If no Graph Service is available then this action will not be present. -
@@ -3782,14 +3889,14 @@
Mouse Actions - Left Click + Left-Click - Moves the decompiler window cursor and highlights the token. Within the + Moves the Decompiler window cursor and highlights the token. Within the main window, if a token has a machine address - associated with it, a left click generates a + associated with it, a left-click generates a navigation event to that address, which may cause other - windows to display code near that address. - (See ) + windows to display code near that address + (see ). Selecting a '(' or ')' token causes it and its matching parenthesis to be @@ -3801,28 +3908,33 @@ - Right Click + Right-Click - Moves the decompiler window cursor, highlights the token, and brings up the menu of - context sensitive actions. Any highlighting and navigation is identical to a - left click. The menu actions presented depend primarily on the token type and + Moves the Decompiler window cursor, highlights the token, and brings up the menu of + context-sensitive actions. Any highlighting and navigation is identical to a + left-click. The menu actions presented depend primarily on the token type and are tailored to the context at that point in the code. - Double Click + Double-Click - Navigates based on the selected symbol or other token (See below). If the selected token represents a formal symbol, - such as a function name or a global variable, double clicking causes a + Navigates based on the selected symbol or other token (see below). + If the selected token represents a formal symbol, + such as a function name or a global variable, double-clicking causes a navigation event to the address associated with the symbol. + + + This action is performed by clicking twice on the desired token with the left + mouse button. 
Function Symbols - Double clicking a called function name causes the + Double-clicking a called function name causes the window itself to navigate away from its current function to the called function, triggering a new decompilation if necessary and changing its display. @@ -3830,24 +3942,24 @@ Global Variables - Double clicking a global - variable name does not have any effect on the decompiler window itself, - but other windows, like the Listing window, may navigate to the + Double-clicking a global + variable name does not have any effect on the Decompiler window itself, + but Listing or other windows may navigate to the storage address of the global variable. Constants - Double clicking a token representing a constant causes the constant to be treated - as an address, and a navigation event to that address is generated. The decompiler + Double-clicking a token representing a constant causes the constant to be treated + as an address, and a navigation event to that address is generated. The Decompiler window itself navigates depending again on whether the address represents a new function or not. Labels - Double clicking the label within a goto statement causes the window to navigate + Double-clicking the label within a goto statement causes the window to navigate to the target of the goto, within the function. The cursor is set and the window view is adjusted if necessary to ensure that the target is visible. @@ -3855,7 +3967,7 @@ Braces - Double clicking a '{' or '}' token, causes the window to navigate to the matching brace + Double-clicking a '{' or '}' token, causes the window to navigate to the matching brace within the window. The cursor is set and the window view is adjusted if necessary to ensure that the matching brace is visible. @@ -3866,32 +3978,40 @@ - Control Double Click + Ctrl-Double-Click Opens a new Snapshot window, navigating it to the selected symbol. 
This is a convenience for immediately decompiling and displaying a called function in a new window, without disturbing the active window. The behavior is similar to the - Double Click action, the selected token must represent a function name symbol or possibly + Double-Click action, the selected token must represent a function name symbol or possibly a constant address, but the navigation occurs in the new Snapshot window. + + This action is performed by clicking twice on the desired token with the left + mouse button, while holding down the Ctrl key. + - Control Shift Click + Ctrl-Shift-Click Generates a navigation event to the address, within the current function, associated with the clicked token. This allows Snapshot windows to do basic - cross-highlighting in the same way as the main decompiler window. - A Control double-click causes the Listing and other windows to navigate to and display the same - portion of code currently being displayed in the Snapshot window. (See ) + cross-highlighting in the same way as the main Decompiler window. + A ctrl-shift-click causes Listing and other windows to navigate to and display the same + portion of code currently being displayed in the Snapshot window (see ). + + + This action is performed by clicking on the desired token with the left mouse + button, while holding down both the Ctrl and Shift keys. - Middle Click + Middle-Click - Highlights every occurrence of a variable, constant, or operator under the current - cursor location, within the decompiler window. + Highlights every occurrence of a variable, constant, or operator represented by the selected + token, within the Decompiler window. @@ -3901,8 +4021,8 @@ Pop-up Menu Actions All the actions described in this section can be activated from the menu that pops up - when right-clicking on a token within the decompiler window. The pop-up menu is context sensitive and - the type of token in particular (See ) determines what actions are available. 
+ when right-clicking on a token within the Decompiler window. The pop-up menu is context sensitive and + the type of token in particular (see ) determines what actions are available. The token clicked provides a local context for the action and may be used to pinpoint the exact variable or operation affected. @@ -3916,7 +4036,7 @@ The structure definition is filled in by examining how the variable is used, assuming it is a pointer to the structure, tracing - data-flow to all the expressions the variable is used in. LOAD and STORE operations + data flow to all the expressions the variable is used in. LOAD and STORE operations trigger new fields and additive offsets are traced to calculate the offset of the fields within the structure definition. @@ -3925,8 +4045,8 @@ retyped to be a pointer to the structure. Within the window, the function is decompiled again and references to new fields in the structure should be immediately apparent. These can be renamed or retyped from the window - to further refine the new structure definition. - (See ) + to further refine the new structure definition + (see ). @@ -3958,31 +4078,39 @@ Set or change a comment at the address of the selected token. - These actions bring up the general Comment dialog (See Comments), - which associates the comment with a specific address in the Program. For the - decompiler actions, this address is of the machine instruction most closely linked to the selected token. - Comments will be visible in the Listing and other Ghidra windows viewing the same + These actions bring up the general Comment dialog (see Comments), + which associates the comment with a specific address in the Program. For comment + actions in the Decompiler, this address is of the machine instruction most closely linked to the selected token. + Any comments generated from a Decompiler window will be visible in Listing and other windows viewing the same section of code. 
- The decompiler windows can display all comment types, but this may be affected by the Display options - (See ). + A Decompiler window can display all comment types, but this may be affected by the Display options + (see ). - Set Plate Comment ... + Set Plate Comment... Brings up the dialog for setting or editing a Plate comment. - Set Pre Comment ... + Set Pre Comment... Brings up the dialog for setting or editing a Pre comment. + + Set... + + Brings up the dialog for setting or editing a comment based on the selected token. + A Plate comment is edited if the token is part + of the function's header. A Pre comment is edited otherwise. + + @@ -3990,49 +4118,49 @@ Commit Local Names - Commit the names of any local variables discovered during the decompiler's analysis + Commit the names of any local variables discovered during the Decompiler's analysis to the Program database as new Variable Annotations. The recovered data-type is not committed as part of the annotation, only the name and storage location. - Parameters are not affected by this command, see . + Parameters are not affected by this command (see ). The purpose of the command is to synchronize the local variables in the - decompiler's view of a function with the formal Variable Annotations in the disassembly view, + Decompiler's view of a function with the formal Variable Annotations in the disassembly view, without otherwise affecting the decompilation. After executing this command, additional changes - to local variable can be performed directly on the corresponding annotations in the Listing Window, - using various methods (See ). + to local variables can be performed directly on the corresponding annotations displayed in Listing windows, + using various methods (see ). 
Data-types are not forced for new annotations, they are created with - an undefined data-type, which allows the decompiler to refine + an undefined data-type, which allows the Decompiler to refine its view of the variable's data-type as new information becomes available - (See ). + (see ). Commit Params/Return - Commit the decompiler's analysis of the input parameters and return value of the current + Commit the Decompiler's analysis of the input parameters and return value of the current function as annotations to the Program database. - In the absence of either imported or user defined - information about a function's prototype, the decompiler performs its own analysis of what + In the absence of either imported or user-defined + information about a function's prototype, the Decompiler performs its own analysis of what the prototype is, determining the storage location and data-type of all parameters and the return value. This action commits this analysis permanently for the current function displayed in the window, creating a matching Variable Annotation for each input - parameter and the return value. The new annotations will be displayed in the - Listing Window as part of the function header, and the action effectively - synchronizes the disassembly view and decompiler's view of the function prototype. + parameter and the return value. The new annotations will be displayed in a + Listing window as part of the function header, and the action effectively + synchronizes the disassembly view and Decompiler's view of the function prototype. Committed prototype information is used both when decompiling the function itself and when - decompiling other functions that call it. - The committed annotations are forcing on the decompiler, and it will - no longer perform prototype recovery analysis for that function. The decompiler assumes the committed parameters, + decompiling other functions that call it. 
The committed annotations are forcing + on the Decompiler (see ), and it will + no longer perform prototype recovery analysis for that function. The Decompiler assumes the committed parameters, and only the committed parameters, exist and will not modify their data-types, with the exception of parameters that are explicitly marked as having an undefined data-type. The user must manually modify individual variables or clear the entire prototype - if they want a change (See ). + if they want a change (see ). @@ -4075,10 +4203,10 @@ This command primarily targets a constant token in the Decompiler window, but if there is a scalar operand in an instruction that corresponds - with the selected constant, the same conversion is also applied to the scalar in the Listing + with the selected constant, the same conversion is also applied to the scalar in any Listing window. This is equivalent to selecting the - Convert command from the - Listing. There may not be a scalar operand directly corresponding to the selected constant, in + Convert command from a + Listing window. There may not be a scalar operand directly corresponding to the selected constant, in which case the conversion will be applied only in the Decompiler window. @@ -4094,13 +4222,13 @@ - Copy/Copy Special ... + Copy - Copy selected code from the decompiler window into the clipboard. + Copy selected code from the Decompiler window into the clipboard. This is part of the standard copy - capabilities for all Ghidra windows and is suitable for copying (sections of) decompiler output + capabilities for all Ghidra windows and is suitable for copying (sections of) Decompiler output into other documents. @@ -4119,7 +4247,7 @@ Enum Editor. - Any change to the definition of the data-type is automatically incorporated by the decompiler into its output + Any change to the definition of the data-type is automatically incorporated by the Decompiler into its output (see ). 
@@ -4131,7 +4259,7 @@ details about how it passes parameters. - The action is available from any token in the decompiler window. Most tokens trigger editing + The action is available from any token in the Decompiler window. Most tokens trigger editing of the current function itself, but a called function can be edited by putting the cursor on its name specifically. @@ -4163,14 +4291,14 @@ See documentation for the Function Editor Dialog. - The decompiler automatically incorporates any changes into its output. + The Decompiler automatically incorporates any changes into its output. - Find ... + Find... - Search for strings within the active window, in the current decompiler output. + Search for strings within the active window, in the current Decompiler output. The command brings up a dialog where a search pattern can entered as a raw string or regular expression. @@ -4184,8 +4312,8 @@ Highlight - All these actions highlight a specific set of variable tokens tracing the data-flow - of the selected variable within the current function. Data-flow is the directed flow + All these actions highlight a specific set of variable tokens tracing the data flow + of the selected variable within the current function, defined as the directed flow of data from input variables through operations that manipulate their value to their output variables. The operations and variables chain together to form data-flow paths. @@ -4275,7 +4403,7 @@ Secondary Highlight - A secondary highlight is a semi-permanent token highlight in the decompiler + A secondary highlight is a semi-permanent token highlight in the Decompiler window that, unlike normal highlights, will not go away as the user clicks other tokens. The color and text being highlighted is controlled by the user and will persist for for the duration of the Ghidra session or until the user @@ -4326,30 +4454,32 @@ Override the function prototype corresponding to the function under the cursor. 
- This action can be triggered at call sites, where the function + This action can be triggered at a call site, where the function being decompiled is calling into another function. Users must select either the token representing - the called function's name or the tokens representing the function pointer at the call site. - A dialog is brought up where the a complete function declaration, specifying - the return data-type along with the name and data-type for each input parameter. Additionally, - the "Calling Convention", "In Line", and "No Return" properties of the function prototype - can be set (See ). + the called function's name or one of the tokens representing the function pointer at the call site. + The action brings up a dialog where the function prototype + corresponding to the call site can be edited. The dialog provides fine-grained control of + the return data-type along with the name and data-type of each input parameter. + The function prototype properties Calling Convention, + In Line, and No Return + can also be set (see ). - Confirming the dialog forces the new function prototype on the decompiler's view of the called function, + Confirming the dialog forces the new function prototype on the Decompiler's view of the called function, but only for the single selected call site. This action is suitable for either indirect calls or direct calls to functions taking a variable number of arguments; situations where a complete description of all parameters is not available. For direct calls with a fixed number of arguments, it is almost always better to provide - parameter information by setting the function's prototype directly. See the - command for instance. In this situation, the "Override Signature" + parameter information by setting the function's prototype directly (see the + command). In this situation, the "Override Signature" command is still possible, but it will bring up a confirmation dialog. - Reference ... 
+ References Find Uses of <data-type> @@ -4363,7 +4493,7 @@ References to parameters and global variables are always listed. If the Dynamic Data Type Discovery option is on (see To Find Location References to Data Types), - the decompiler's propagation analysis is invoked on all functions to discover local + the Decompiler's propagation analysis is invoked on all functions to discover local variables as well. @@ -4420,7 +4550,7 @@ prototype was previously placed by the command. As with this command, users must select either the token representing the called function's name or the tokens representing the function pointer at the call site. The action causes the - override to be removed immediately. Parameter information will be drawn from the decompiler's + override to be removed immediately. Parameter information will be drawn from the Decompiler's normal analysis. @@ -4432,7 +4562,7 @@ The current function can be renamed by selecting the name token within the function's - declaration at the top of the decompiler window, or individual called functions + declaration at the top of the Decompiler window, or individual called functions can be renamed by selecting their name token within a call expression. This action brings up a dialog containing a text field prepopulated with the name to be changed. The current namespace (and any parent namespaces) is @@ -4442,8 +4572,8 @@ A new or child namespace can - be specified by prepending the base name with the namespace using the C++ '::' - separator characters. Any namespace path entered this way is considered relative + be specified by prepending the base name with the namespace using the C++ "::" + delimiter characters. Any namespace path entered this way is considered relative to the namespace set in the drop-down menu, so the Global namespace may need to be selected if the user wants to specify an absolute path. If any path element of the namespace does not exist, it is created. 
@@ -4454,22 +4584,6 @@ - - Rename Label - - Rename the label corresponding to the token under the cursor. - - - A label can be renamed by triggering this action while the corresponding label token is - under the cursor. This action brings up the - Edit Label Dialog. - - - The change will be immediately visible across all references to the label - (including in any Decompiler, Listing, and Functions windows). - - - Rename Field @@ -4483,8 +4597,9 @@ If the initial name looks like field_0x.., it may be that the field offset was - discovered by the decompiler, and the field does not exist in the structure definition. - In this case, a new field is created at that offset, with the new name and a data-type of "undefined". + discovered by the Decompiler, and the field does not exist in the structure definition. + In this case, a new field is created at that offset, with the new name and a data-type of + undefined. The change to the definition is visible globally @@ -4492,7 +4607,7 @@ is triggered again to incorporate the new name, but the output is otherwise unaffected. - Within a decompiler window, field name tokens are presented in context, + Within a Decompiler window, field name tokens are presented in context, showing how they are used within the code flow of the current function. Combined with and , this action allows a @@ -4514,19 +4629,35 @@ A new or child namespace can - be specified by prepending the base name with the namespace using the C++ '::' - separator characters. Any namespace path entered this way is considered relative + be specified by prepending the base name with the namespace using the C++ "::" + delimiter characters. Any namespace path entered this way is considered relative to the namespace set in the drop-down menu, so the Global namespace may need to be selected if the user wants to specify an absolute path. If any path element of the namespace does not exist, it is created. 
The change will be immediately visible across all references to the variable, - including the Decompiler and Listing windows. A new decompilation is triggered + including in Decompiler and Listing windows. A new decompilation is triggered to incorporate the new name, but the output is otherwise unaffected. + + Rename Label + + Rename the label corresponding to the token under the cursor. + + + A label can be renamed by triggering this action while the corresponding label token is + under the cursor. This action brings up the + Edit Label Dialog. + + + The change will be immediately visible across all references to the label + (including in any Decompiler, Listing, and Functions windows). + + + Rename Variable @@ -4543,11 +4674,11 @@ change, but otherwise the output is unaffected. - Local variables and parameters presented by the decompiler may be invented on-the-fly + Local variables and parameters presented by the Decompiler may be invented on-the-fly and don't necessarily have a formal annotation in Ghidra (see ). Performing this action on a variable will create an annotation if one didn't exist previously, which will - generally be visible as part of the function header in the Listing window. + generally be visible as part of the function header in any Listing window. A new annotation will not commit the data-type of the variable, and data-types applied later, and elsewhere in the function, can still propagate into the variable. @@ -4576,7 +4707,7 @@ The change to the definition is visible globally throughout the Program, anywhere the data-type is referenced, and is forcing - on the decompiler (see ). Decompilation is triggered + on the Decompiler (see ). Decompilation is triggered again, and the new data-type is propagated from the point of the field reference(s). Changes to the output may be large and indirect. 
@@ -4595,7 +4726,7 @@ The change is visible globally throughout the Program, anywhere the variable is - referenced, and is forcing on the decompiler + referenced, and is forcing on the Decompiler (see ). Decompilation is triggered again, and the new data-type is propagated from the variable reference(s). Changes to the output may be large and indirect. @@ -4609,7 +4740,7 @@ This action is only available from the data-type token in the function declaration, at the - top of the decompiler's output. It brings up a dialog prepopulated with the current + top of the Decompiler's output. It brings up a dialog prepopulated with the current data-type returned by the function. The user can select any fixed length data-type in the Program. Editing and confirming this dialog immediately changes the data-type. If an annotation for the return value (named <RETURN>) did not exist previously, one is created. @@ -4624,7 +4755,7 @@ Setting a data-type on the return value using this action affects decompilation for the function itself and, additionally, any function that calls this function. Within a calling - function, the decompiler propagates the data-type into the variable or expression incorporating + function, the Decompiler propagates the data-type into the variable or expression incorporating the return value at each call site. @@ -4641,17 +4772,17 @@ the data-type. - The change to the data-type is forcing on the decompiler + The change to the data-type is forcing on the Decompiler (see ). Decompilation is triggered again, and the new data-type is propagated from the variable reference(s). Changes to the output may be large and indirect. - Local variables and parameters presented by the decompiler may be invented on-the-fly + Local variables and parameters presented by the Decompiler may be invented on-the-fly and don't necessarily have a formal annotation in Ghidra (see ). 
Performing this action on a variable will create an annotation if one didn't exist previously, which will generally be - visible as part of the function header in the Listing window. + visible as part of the function header in any Listing window. Performing this action on a function parameter causes a formal annotation to @@ -4668,7 +4799,7 @@ - Set Equate ... + Set Equate... Change the display of the integer or character constant under the cursor to an equate string. @@ -4685,12 +4816,12 @@ OK, completes the action, and the selected equate is substituted for its constant. - This command primarily targets a constant token in the Decompiler window, but + This command primarily targets a constant token in a Decompiler window, but if there is a scalar operand in an instruction that corresponds - with the selected constant, the same equate is also applied to the scalar in the Listing + with the selected constant, the same equate is also applied to the scalar in any Listing window. This is equivalent to selecting the - Set Equate command from the - Listing. There may not be a scalar operand directly corresponding to the selected constant, in + Set Equate command from a + Listing window. There may not be a scalar operand directly corresponding to the selected constant, in which case the equate will be applied only in the Decompiler window. @@ -4706,11 +4837,11 @@ possible range. - The decompiler defines high-level variables in terms of varnodes that + The Decompiler defines high-level variables in terms of varnodes that are merged together to produce the final variable. Some merging is speculative, which reduces the number of variables overall, but is not strictly necessary for valid decompilation. The merged - variable can be represented with two or more variables that have a smaller range. See the - documentation on . + variable can be represented with two or more variables that have a smaller range (see the + documentation on ). 
This command is only available if the selected token is part of a high-level variable that has diff --git a/Ghidra/Features/Decompiler/src/main/doc/decompileplugin_common.xsl b/Ghidra/Features/Decompiler/src/main/doc/decompileplugin_common.xsl index 3bc53ab61b..558da89ea8 100644 --- a/Ghidra/Features/Decompiler/src/main/doc/decompileplugin_common.xsl +++ b/Ghidra/Features/Decompiler/src/main/doc/decompileplugin_common.xsl @@ -32,13 +32,13 @@ - - - - - - - + + + + + + + diff --git a/Ghidra/Features/Decompiler/src/main/doc/decompileplugin_pdf.xsl b/Ghidra/Features/Decompiler/src/main/doc/decompileplugin_pdf.xsl index c9a5c088ac..e315726e5c 100644 --- a/Ghidra/Features/Decompiler/src/main/doc/decompileplugin_pdf.xsl +++ b/Ghidra/Features/Decompiler/src/main/doc/decompileplugin_pdf.xsl @@ -6,6 +6,25 @@ + + + + + + + + + + + + + + + + + + diff --git a/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/DecompilerAnnotations.html b/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/DecompilerAnnotations.html index 101ef19df7..da482eee40 100644 --- a/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/DecompilerAnnotations.html +++ b/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/DecompilerAnnotations.html @@ -22,34 +22,35 @@

Individual machine instructions make up the biggest source of information when the - decompiler analyzes a function. Instructions are translated from their - processor specific form into Ghidra's IR language (see “P-code”), + Decompiler analyzes a function. Instructions are translated from their + processor-specific form into Ghidra's IR language (see P-code), which provides both the control-flow behavior of the instruction and the detailed - semantics describing how the processor and memory state is affected. The translation is controlled by + semantics describing how the processor and memory state are affected. The translation is controlled by the underlying processor model and, except in limited circumstances, cannot be directly altered - from the tool. Flow Overrides (see below) can change how certain control-flow is translated, - and, depending on the processor, context registers may affect p-code (see “Context Registers”). + from the tool. Flow Overrides (see below) can change how certain control flow is translated + and, depending on the processor, how context registers affect p-code (see Context Registers).

Outside of the tool, users can modify the model specification itself. - See the document "SLEIGH: A Language for Rapid Processor Specification". + See the document "SLEIGH: A Language for Rapid Processor Specification."

- Decompiling a function starts by analyzing control-flow starting from the function's - first instruction. Control-flow is traced to additional instructions using flow information - from the underlying processor model. All paths are traced through instructions with - fall through, conditional jump, and other + Decompiling a function starts by analyzing the control flow of machine instructions. + Control flow is traced from the first instruction, through additional instructions depending + on their flow semantics (see P-code Control Flow). All paths are traced through instructions with + any form of fall-through or jump semantics until an instruction with terminator semantics is - reached, which is usually a "return from subroutine" - instruction. Flow is not traced into called functions, in this situation. Instructions + reached, which is usually a formal return (return from subroutine) instruction. + Flow is not traced into called functions, in this situation. Instructions with call semantics are treated only as if they fall through.

- An entry point is the address of the function's first instruction. + An entry point is the address of the instruction first + executed when the function is called.

A function body is the set of addresses reached by control-flow - analysis (and the machine instructions at those addresses). + analysis and the machine instructions at those addresses.
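The tracing rules described above (follow fall-through and jump flow, stop at terminators, treat calls only as fall-through) can be illustrated with a toy sketch. This is not Ghidra's implementation: the instruction table, the flow-kind strings, and the function name `trace_function_body` are all hypothetical, chosen only to show how a function body can be collected from an entry point.

```python
from collections import deque

# Hypothetical toy program: address -> (flow kind, fall-through address, jump/call targets).
TOY_PROGRAM = {
    0x100: ("fallthrough", 0x104, []),
    0x104: ("call", 0x108, [0x200]),       # call: the target is NOT followed
    0x108: ("cond_jump", 0x10C, [0x114]),  # both outcomes are followed
    0x10C: ("fallthrough", 0x110, []),
    0x110: ("jump", None, [0x118]),        # unconditional jump: no fall-through
    0x114: ("fallthrough", 0x118, []),
    0x118: ("terminator", None, []),       # e.g. a return instruction
    0x200: ("terminator", None, []),       # belongs to the called function
}


def trace_function_body(program, entry_point):
    """Return the set of addresses reached by control flow from entry_point."""
    body = set()
    worklist = deque([entry_point])
    while worklist:
        addr = worklist.popleft()
        if addr in body or addr not in program:
            continue
        body.add(addr)
        kind, fall_through, targets = program[addr]
        if kind == "terminator":
            continue  # this path ends here
        if kind == "call":
            worklist.append(fall_through)  # treated only as if it falls through
            continue
        if fall_through is not None:
            worklist.append(fall_through)
        worklist.extend(targets)
    return body


body = trace_function_body(TOY_PROGRAM, 0x100)
```

In this toy model, the address 0x200 never enters `body`, because the call at 0x104 contributes only its fall-through address.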

@@ -57,21 +58,21 @@

The entry point address for a function plays a pivotal role for - analysis using the Ghidra decompiler. Ghidra generally associates + analysis using the Decompiler. Ghidra generally associates a formal Function Symbol and an underlying Function object at this address, which are the key elements that - need to be present to trigger decompilation. - (See Functions) + need to be present to trigger decompilation + (see Functions). The Function object stores the function body, parameters, local variables, and other information critical to the decompilation process.

Function Symbols and Function objects are generally created automatically by a Ghidra - analyzer when initially importing a binary executable and running auto-analysis. - If necessary however, a user can manually create a Function object from the Listing window - by using Create Function command (pressing the 'f' key), when the cursor - is placed on the function's entry point. - (See Create Function) + analyzer when initially importing a binary executable and running Auto Analysis. + If necessary, however, a user can manually create a Function object from a Listing window + by using the Create Function command (pressing the 'F' key), when the cursor + is placed on the function's entry point + (see Create Function).

@@ -92,11 +93,11 @@ - The decompiler does not use the formal function body when it computes - control-flow; it recomputes its own idea of the function body starting from the entry point + The Decompiler does not use the formal function body when it computes + control flow; it recomputes its own idea of the function body starting from the entry point it is handed. If the formal function body was created manually, using a selection for instance, - or in other extreme circumstances, the decompiler's view of the function body may not match - the formal view. This can lead to confusing behavior, where clicking in a decompiler window + or in other extreme circumstances, the Decompiler's view of the function body may not match + the formal view. This can lead to confusing behavior, where clicking in a Decompiler window may unexpectedly navigate the window away from the function.
@@ -108,14 +109,14 @@

Control-flow behavior for a machine instruction is generally determined by its underlying - p-code (see “P-code Control Flow”), but this can be changed by applying a Flow Override. + p-code (see P-code Control Flow), but this can be changed by applying a Flow Override. A Flow Override maintains the overall semantics of a branching instruction but changes how the branch is interpreted. For instance, a JMP instruction, which traditionally represents a branch within a single function, can be overridden to represent a call to a new function. Flow Overrides are applied by Analyzers or manually by the user.

- The decompiler automatically incorporates any relevant Flow Overrides into its + The Decompiler automatically incorporates any relevant Flow Overrides into its analysis of a function. This can have a significant impact on results. The types of possible Flow Overrides include:

@@ -130,7 +131,7 @@ the call target becomes the branch destination, and the instruction is no longer assumed to fall through. RETURN instructions become an - indirect branch, and the decompiler will attempt to recover branch + indirect branch, and the Decompiler will attempt to recover branch destinations using switch analysis.

@@ -181,13 +182,13 @@ Comments

- The decompiler automatically incorporates comments from the Program database into its + The Decompiler automatically incorporates comments from the Program database into its output. Comments in Ghidra are centralized and can be created and displayed by multiple - Program views, including the decompiler. Comments created from a decompiler window will - show up in the Listing window for instance, and vice versa. + Program views, including the Decompiler. Comments created from a Decompiler window will + show up in a Listing window for instance, and vice versa.

- For the purposes of understanding comments within the decompiler, keep in mind that: + For the purposes of understanding comments within the Decompiler, keep in mind that:

    @@ -195,7 +196,7 @@ An individual comment is associated with a specific address in the Program.
  • - There are 5 different kinds of comments. + There are 5 different types of comments:
    • Plate @@ -228,7 +229,7 @@ Display

- The decompiler collects and displays comments associated with any address in the + The Decompiler collects and displays comments associated with any address in the formal function body currently decompiling. The comments are integrated line by line into the decompiled code, and an individual comment is displayed on the line before the @@ -238,24 +239,24 @@

Because a single line of code typically encompasses multiple machine instructions, there is a possibility that multiple comments at different addresses apply to - the same line. In this case, the decompiler displays each comment on its + the same line. In this case, the Decompiler displays each comment on its own line, in address order, directly before the line of code.
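The ordering rule in this paragraph can be sketched in a few lines. This is only an illustration under stated assumptions: the function name, the dictionary of comments keyed by address, and the `/* ... */` rendering are hypothetical and not taken from the Decompiler itself.

```python
def render_line_with_comments(code_line, line_addresses, comments):
    """Emit each comment on its own line, in address order, before the code line.

    line_addresses: addresses of the machine instructions folded into code_line.
    comments: hypothetical mapping of address -> comment text.
    """
    output = [f"/* {comments[addr]} */" for addr in sorted(line_addresses) if addr in comments]
    output.append(code_line)
    return output


rendered = render_line_with_comments(
    "x = a + b;",
    [0x108, 0x100, 0x104],
    {0x108: "second", 0x100: "first"},
)
```

Here the comment at the lower address is emitted first, regardless of the order in which the addresses were supplied.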

- Because the output of the decompiler can be a heavily transformed version compared - to the original machine instructions, its possible that individual instructions + Because the output of the Decompiler can be a heavily transformed version compared + to the original machine instructions, it is possible that individual instructions no longer have explicit tokens representing them in the output. Comments attached - to these instruction will still be displayed in the decompiler output with the + to these instructions will still be displayed in the Decompiler output with the closest associated line of code, usually within the same basic block.

- By default, the decompiler displays only the Pre comments + By default, the Decompiler displays only the Pre comments within the body of the function. It also displays Plate comments, but only if they are attached to the entry point - of the function. In this case, they are displayed first in the decompiler output, + of the function. In this case, they are displayed first in the Decompiler output, along with WARNING comments, before the function declaration. Other comment - types can be configured to display in decompiler output, by changing the - decompiler Display options (See Display <kind-of> Comments). + types can be configured to be part of Decompiler output by changing the + Decompiler display options (see Display <kind-of> Comments).

@@ -263,7 +264,7 @@
- Unlike the Listing window, the decompiler does not alter how a comment is + Unlike a Listing window, the Decompiler does not alter how a comment is displayed based on its type. All enabled types of comment are displayed in the same way, on a separate line before the line of code associated with the address. @@ -276,7 +277,7 @@ Unreachable Blocks

- The decompiler may decide as part of its analysis that individual + The Decompiler may decide as part of its analysis that individual basic blocks are unreachable and not display them in the output. In this case, any comments associated with addresses in the unreachable block will also not be displayed. @@ -288,9 +289,9 @@ Warning Comments

- The decompiler can generate internal warnings during its analysis and will incorporate - them into the output as comments in the same way as the user defined - comments described above. They are not part of Ghidra's comment system however and + The Decompiler can generate internal warnings during its analysis and will incorporate + them into the output as comments in the same way as the user-defined + comments described above. They are not part of Ghidra's comment system, however, and cannot be edited. They can be distinguished from normal comments by the word 'WARNING' at the beginning of the comment.

@@ -308,10 +309,10 @@

Variable annotations are the most important way to get names and data-types - that are meaningful to the user incorporated into the decompiler's output. + that are meaningful to the user incorporated into the Decompiler's output. A variable in this context is loosely defined as any piece of memory that code in the Program treats as a logical entity. - The decompiler works to incorporate all forms of annotation into its output + The Decompiler works to incorporate all forms of annotation into its output for any variable pertinent to the function being analyzed.

@@ -344,9 +345,9 @@ local to a function.

- Global variables annotations are created from the tool by applying a data-type to a memory - location in the Listing window, either by invoking a command from the Data - pop-up menu, or dragging a data-type from the Data Type Manager + Global variable annotations are created from the tool by applying a data-type to a memory + location in a Listing window, either by invoking a command from the Data + pop-up menu or by dragging a data-type from the Data Type Manager window directly onto the memory location. Refer to the documentation:

@@ -357,7 +358,7 @@

- Local variables annotations are created from the Listing from various editor dialogs. See in particular: + Local variable annotations are created from the Listing using various editor dialogs. See, in particular:

    @@ -381,16 +382,16 @@ @@ -408,7 +409,7 @@ In order to widely accommodate different use cases, Ghidra's symbol table has extremely lax naming rules. Ghidra may allow names that conflict with the stricter rules of the language - the decompiler is attempting to produce. The decompiler does not currently + the Decompiler is attempting to produce. The Decompiler does not currently have an option that checks for this. Users should be aware of:

    @@ -426,7 +427,7 @@ Ghidra allows different functions to have the same name, even within the same namespace, in order to model languages that support function overloading. In most languages, such functions would be expected to have distinct prototypes to allow - the symbols to be distinguished in context. Ghidra and the decompiler however do not check + the symbols to be distinguished in context. Ghidra and the Decompiler, however, do not check for this, as prototypes may not be known.

    @@ -440,10 +441,10 @@ Variable Scope

- All variables belong either to a global or local - scope, which directly affects how the variable is treated in the decompiler's data-flow + All variables belong to either a global or local + scope, which directly affects how the variable is treated in the Decompiler's data-flow analysis. - Annotations created by applying a data-type directly to a memory location in the listing + Annotations created by applying a data-type directly to a memory location in the Listing are automatically added to the formal global namespace. Ghidra can create other custom namespaces that are considered global in this sense, and renaming actions provide options that let individual global annotations be moved into @@ -452,7 +453,7 @@ create variable annotations that are local to that function.

- A global variable annotation forces the decompiler to treat the memory location as if its value + A global variable annotation forces the Decompiler to treat the memory location as if its value persists beyond the end of the function. The variable must exist at all points of the function body, generally at the same memory location.

@@ -462,9 +463,9 @@ at the instruction that first writes to them, and then exist only up to the last instruction that reads them. The memory location storing a local variable at one point of the function may be reused for different variables at other points. - This can cause ambiguity in how the decompiler should treat a given memory location used + This can cause ambiguity in how the Decompiler should treat a given memory location used for storing local variables, which the user may want to steer. See the discussion - in “Variable Storage”. + in Variable Storage.

@@ -481,19 +482,19 @@ data-types that are tailored for the Program being analyzed. Data-types that are explicitly part of a variable annotation are, to the extent possible, automatically incorporated - into the decompiler's analysis. + into the Decompiler's analysis.

Data-types Supported by the Decompiler

- The decompiler understands traditional primitive data-types, in all their various sizes, + The Decompiler understands traditional primitive data-types in all their various sizes, like integers, floating-point numbers, booleans, and characters. It also understands pointers, structures, and arrays, letting it support arbitrarily complicated composite data-types. Ghidra provides some data-types with specialized display capabilities that don't have a natural representation - in the high-level language output by the decompiler. The decompiler treats these as + in the high-level language output by the Decompiler. The Decompiler treats these as black-box data-types, preserving the name, but treating the underlying data either as an integer or simply as an array of bytes.

@@ -503,16 +504,16 @@ Undefined

- The undefined data-types are supported, in their various sizes: + The undefined data-types are supported in their various sizes: undefined1, undefined2, undefined4, etc. In Ghidra, the undefined - data-types, let the user specify the size of a variable, while formally declaring that + data-types let the user specify the size of a variable, while formally declaring that other details about the data-type are unknown.

- For the decompiler, undefined data-types as an annotation have the important special meaning - that the decompiler should let its analysis determine the final data-type presented in the - output for the variable (See “Forcing Data-types” below). + For the Decompiler, undefined data-types, as an annotation, have the important special meaning + that the Decompiler should let its analysis determine the final data-type presented in the + output for the variable (see Forcing Data-types below).

@@ -521,10 +522,10 @@

The void data-type is supported but treated specially by - the decompiler, as does Ghidra in general. A void can be + the Decompiler, as it is by Ghidra in general. A void can be used to indicate the absence of a return value in function prototypes, but cannot be used as a general annotation on variables. A void pointer, void *, - is possible; the decompiler treats it as a pointer to an unknown data-type. + is possible; the Decompiler treats it as a pointer to an unknown data-type.

@@ -534,7 +535,7 @@

Integer data-types, both signed and unsigned, are supported up to a size of 8 bytes. Larger sizes are supported internally but are generally represented as an array of bytes in - decompiler output. Odd integer sizes are also supported. + Decompiler output. Nonstandard integer sizes of 3, 5, 6, and 7 bytes are also supported.

The standard C data-type names: int, short, @@ -558,7 +559,7 @@ Floating-point sizes of 4, 8, 10, and 16 are supported, mapping in all cases currently to the float, double, float10, and float16 - data-types respectively. The decompiler currently cannot display floating-point constants + data-types, respectively. The Decompiler currently cannot display floating-point constants that are bigger than 8 bytes.

@@ -567,8 +568,8 @@ Character

- ASCII or Unicode encoded character data-types are supported for sizes of 1, 2, and 4. The size effectively - chooses between the UTF8, UTF16, and UTF32 character encodings respectively. The standard + ASCII- and Unicode-encoded character data-types are supported for sizes of 1, 2, and 4. The size effectively + chooses between the UTF8, UTF16, and UTF32 character encodings, respectively. The standard C data-type names char and wchar_t are mapped to one of these sizes based on the processor and compiler selected when importing the Program. @@ -579,11 +580,11 @@ String

- Terminated strings, encoded either in ASCII or Unicode, are supported. The decompiler converts + Terminated strings, encoded either in ASCII or Unicode, are supported. The Decompiler converts Ghidra's dedicated string data-types like string to - an "array of characters" data-type, such as char[], + an array-of-characters data-type, such as char[], where the character size matches the encoding. - A "pointer to character" data-type like + A pointer-to-character data-type like

    @@ -596,11 +597,11 @@

- is also treated as a potential string reference. The decompiler can infer terminated strings if this + is also treated as a potential string reference. The Decompiler can infer terminated strings if this kind of data-type propagates to constant values during its analysis.

- Strings should be fully rendered in decompiler output, + Strings should be fully rendered in Decompiler output, with non-printable characters escaped using either traditional sequences like '\r', '\n' or using Unicode escape sequences like '\xFF'.
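The escaping rule described in this hunk can be illustrated with a small Python sketch. It is not the Decompiler's implementation: the helper name `escape_for_display` is hypothetical, only the `\r` and `\n` sequences named in the text are treated as traditional escapes, and every other non-printable character is assumed to fall back to a `\xNN` form.

```python
# Hypothetical sketch of the string-rendering rule described in the help text.
# Only '\r' and '\n' come from the documentation; the fallback format is an assumption.
TRADITIONAL_ESCAPES = {"\r": "\\r", "\n": "\\n"}


def escape_for_display(text: str) -> str:
    """Return text with non-printable characters replaced by escape sequences."""
    pieces = []
    for ch in text:
        if ch in TRADITIONAL_ESCAPES:
            # Use the traditional sequence when one is known.
            pieces.append(TRADITIONAL_ESCAPES[ch])
        elif ch.isprintable():
            # Printable characters are rendered unchanged.
            pieces.append(ch)
        else:
            # Fall back to a hexadecimal escape of the code point.
            pieces.append(f"\\x{ord(ch):02X}")
    return "".join(pieces)


rendered = escape_for_display("ok\r\n\x01")
```

Under these assumptions, the carriage return and line feed keep their familiar form, while the control character `0x01` is shown as a hexadecimal escape.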

@@ -611,22 +612,22 @@

Pointer data-types are fully supported. A pointer to any other supported data-type is - possible. The data-type being pointed to, whether its a primitive, structure, or another pointer, - informs how the decompiler renders a dereferenced pointer. - The decompiler assumes that a pointer variable may refer to an array of + possible. The data-type being pointed to, whether it is a primitive, structure, or another pointer, + informs how the Decompiler renders a dereferenced pointer. + The Decompiler assumes that a pointer variable may refer to an array of the underlying data-type and will use array notation if there is evidence of more than one element.

The default pointer size is set based on the processor and compiler selected when the Program is - imported and generally matches the size of the ram (or equivalent) - address space. Different pointer sizes within the same Program are possible. The decompiler generally + imported and generally matches the size of the ram or equivalent + address space. Different pointer sizes within the same Program are possible. The Decompiler generally expects the pointer size to match the size of the address space being pointed to, but individual architectures can model different size pointers into the space (such as near pointers).

For processors with more than one memory address space, pointer data-types currently cannot be directly - annotated to indicate a preferred address space. Where there is ambiguity, the decompiler attempts to + annotated to indicate a preferred address space. Where there is ambiguity, the Decompiler attempts to determine the correct address space from the context of its use within the function.

@@ -644,10 +645,10 @@ Structure

- Structured data-types are fully supported. The decompiler does not automatically infer structures - when analyzing a function; it propagates structured data-types into the function from explicitly - annotated sources, like input parameters or global variables. Decompiler directed creation of - structures can be triggered by the user, see “Auto Create Structure”. + Structure data-types are fully supported. The Decompiler does not automatically infer structures + when analyzing a function; it propagates them into the function from explicitly + annotated sources, like input parameters or global variables. Decompiler-directed creation of + structures can be triggered by the user (see Auto Create Structure).

@@ -655,12 +656,12 @@ Enumeration

- Enumerations are fully supported. The decompiler can propagate enumerations from explicitly + Enumerations are fully supported. The Decompiler can propagate enumerations from explicitly annotated sources throughout a function onto constants, which are then displayed with the appropriate label from the definition of the enumeration. If the constant does not match a - single value in the enumeration definition, the decompiler attempts to build a matching + single value in the enumeration definition, the Decompiler attempts to build a matching value by or-ing together multiple labels. - The decompiler can be made to break out constants representing packed flags, + The Decompiler can be made to break out constants representing packed flags, for instance, by labeling individual bit values within an enumeration.
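As a rough illustration of the labeling behavior described above, the following Python sketch first looks for a single matching label and otherwise tries to cover the constant by or-ing labels together. The function `label_constant` and the example enumeration are hypothetical; the Decompiler's actual matching strategy may differ.

```python
# Hypothetical sketch: label a constant using an enumeration definition.
def label_constant(value: int, members: dict) -> str:
    """Return a single label, an or-ed combination of labels, or the raw hex value."""
    # A constant that matches exactly one enumeration value gets that label.
    for name, member in members.items():
        if member == value:
            return name
    # Otherwise, try to build the value from labels whose bits are all set in it.
    remaining = value
    parts = []
    for name, member in members.items():
        if member != 0 and (member & remaining) == member:
            parts.append(name)
            remaining &= ~member
    if parts and remaining == 0:
        return " | ".join(parts)
    # No combination covers the constant, so leave it as a number.
    return hex(value)


# Example enumeration with individual bit values labeled (illustrative names).
PERMISSIONS = {"PERM_READ": 0x1, "PERM_WRITE": 0x2, "PERM_EXEC": 0x4}
```

With this definition, `label_constant(0x5, PERMISSIONS)` yields `PERM_READ | PERM_EXEC`, which mirrors how labeling individual bits lets packed flags be broken out.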

@@ -671,7 +672,7 @@

A Function Definition in Ghidra is a data-type that encodes information about the parameters and return value for a generic/unspecified function. - A formal function pointer is supported by the decompiler as a pointer + A formal function pointer is supported by the Decompiler as a pointer data-type that points to a Function Definition. A Function Definition specifically encodes:

@@ -686,7 +687,7 @@ The data-type associated with the return value.
  • - An indicator of the prototype model that should be + The name of a generic calling convention associated with the function.
  • @@ -695,26 +696,28 @@

    The Function Definition itself does not encode any storage information. Once the Function - Definition is associated with a Program, the indicator maps to one of the prototype models for the - specific processor and compiler. A Function Definition is currently limited to a prototype model + Definition is associated with a Program, its generic calling convention maps to one of the + specific prototype models for the processor and compiler. The prototype model is then used + to assign storage for parameters and return values, wherever the Function Definition is applied. + A Function Definition is currently limited to a prototype model with one of the following names:

    • - __stdcall + __stdcall
    • - __thiscall + __thiscall
    • - __fastcall + __fastcall
    • - __cdecl + __cdecl
    • - __vectorcall + __vectorcall
    @@ -728,13 +731,13 @@ Forcing Data-types

    - The decompiler performs type propagation as part of its analysis - on functions. Data-type information is collected from variable annotations (and other sources), - which is then propagated via data-flow throughout the function to other variables and + The Decompiler performs type propagation as part of its analysis + on functions. Data-type information is collected from variable annotations and other sources, + which is then propagated via data flow throughout the function to other variables and constants where the data-type may not be immediately apparent.

    - With few exceptions, a variable annotation is forcing on the decompiler in the sense + With few exceptions, a variable annotation is forcing on the Decompiler in the sense that the storage location being annotated is considered an unalterable data-type source. During type propagation, the data-type may propagate to other variables, but the variable representing the storage location being annotated is guaranteed to have @@ -746,15 +749,15 @@

    - Users should be aware that variable annotations are forcing on the decompiler and may directly + Users should be aware that variable annotations are forcing on the Decompiler and may directly override aspects of its analysis. Because of this, variable annotations are the most powerful way - for the user to affect decompiler output, but setting an incomplete (or incorrect) data-type as - part of an annotation may produce poorer decompiler output. + for the user to affect Decompiler output, but setting an incomplete or incorrect data-type as + part of an annotation may produce poorer Decompiler output.

    The major exception to forcing annotations is if the data-type in the annotation is undefined. - Ghidra reserves the following names to represent formally undefined data-types: + Ghidra reserves specific names to represent formally undefined data-types, such as:

      @@ -762,7 +765,6 @@
    • undefined2
    • undefined4
    • undefined8
    • -
    • ...

    @@ -770,29 +772,29 @@ The number in the name only specifies the number of bytes in the variable.

    - The decompiler views a variable annotation with an undefined data-type only as an indication of what name + The Decompiler views a variable annotation with an undefined data-type only as an indication of what name should be used if a variable at that storage address exists. The data-type for the variable is filled in, using type propagation from other sources.

    For annotations that specifically label a function's formal parameters or return value, - the Signature Source also affects how they're treated by the decompiler. + the Signature Source also affects how they're treated by the Decompiler. If the Signature Source is set to anything other than DEFAULT, there is a forced - one-to-one correspondence between variable annotations and actual parameters in the decompiler's - view of the function. This is stronger than just forcing the data-type; the existence (or not) of + one-to-one correspondence between variable annotations and actual parameters in the Decompiler's + view of the function. This is stronger than just forcing the data-type; the existence or nonexistence of the variable itself is forced by the annotation in this case. If the Signature Source is forcing and there are no parameter annotations, a void prototype is forced on the function.

    A forcing Signature Source is set typically if debug symbols for the function are read in during - Program import (IMPORTED), or if the user manually edits the function prototype + Program import (IMPORTED) or if the user manually edits the function prototype directly (USER_DEFINED).

    If an annotation and the Signature Source force a parameter to exist, specifying an - undefined data-type in the annotation still directs the decompiler to fill in + undefined data-type in the annotation still directs the Decompiler to fill in the variable's data-type using type propagation. The same holds true for the return value; an - undefined annotation fixes the size of the return value, but the decompiler + undefined annotation fixes the size of the return value, but the Decompiler fills in its own data-type.
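The interaction between the Signature Source and undefined data-types can be summarized in a short Python sketch. The `Prototype` class and `forcing_plan` function are hypothetical names used only for illustration, and the model covers just the rules stated above: any source other than DEFAULT forces the parameter list, and an undefined data-type still leaves the type to propagation.

```python
from dataclasses import dataclass, field


@dataclass
class Prototype:
    """Hypothetical model of a function prototype's annotations."""
    signature_source: str = "DEFAULT"
    # Each parameter is a (name, data_type_name) pair.
    parameters: list = field(default_factory=list)


def forcing_plan(proto: Prototype):
    """Return None if parameters are discovered, else a per-parameter plan."""
    if proto.signature_source == "DEFAULT":
        # Nothing is forced; parameters are recovered via the calling convention.
        return None
    plan = []
    for name, data_type in proto.parameters:
        # The parameter's existence is forced either way; an undefined
        # data-type leaves the actual type to be filled in by propagation.
        mode = "propagate" if data_type.startswith("undefined") else "force"
        plan.append((name, mode))
    # An empty plan under a forcing source corresponds to a forced void prototype.
    return plan


example = forcing_plan(
    Prototype("IMPORTED", [("count", "int"), ("param_2", "undefined4")])
)
```

Here `example` marks `count` as having a forced data-type, while `param_2` is forced to exist but has its data-type filled in by propagation.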

    @@ -801,9 +803,9 @@
    - The decompiler may still use an undefined data-type to label a variable, + The Decompiler may still use an undefined data-type to label a variable, even after type propagation. If a variable is simply copied around within a function and there - are no other substantive operations or annotations on the variable, the decompiler may decide the undefined + are no other substantive operations or annotations on the variable, the Decompiler may decide the undefined data-type is appropriate.
    @@ -819,10 +821,10 @@ Every variable annotation is associated with a single storage location, where the value of the variable is stored during execution: generally a register, stack location, or an address in the load image of the Program. The storage location does not necessarily hold the value for that - variable at all points of execution, and its possible for the variable value to be held in + variable at all points of execution, and it is possible for the variable value to be held in different storage locations at different points of execution. The set of execution points where the storage location does hold the variable value is called the annotation - scope; this is distinct from (but influenced by) the scope of the + scope; this is distinct from, but influenced by, the scope of the variable itself. The different types of storage location are listed below.

    @@ -833,21 +835,21 @@ A load-image address is a concrete address in the load image of the Program, typically in the ram address space. This kind of storage must be backed by a formal memory block for the Program, which typically corresponds to a specific - program section (such as the .text or .bss section). Because it is in the - load image directly, an annotation with this storage shows up directly in the Listing + program section, such as the .text or .bss section. Because it is in the + load image directly, an annotation with this storage shows up directly in any Listing window and can be directly manipulated there. In much of the Ghidra documentation, these annotations - are referred to as Data. See the section - Data in particular. + are referred to as Data. See the + Data section, in particular.

    - Although specific architectures may vary, generally a storage location at a load image address + Although specific architectures may vary, a storage location at a load image address generally represents a formal global variable, and the annotation is in scope - across all Program execution. For the decompiler, the storage location is treated as a + across all Program execution. For the Decompiler, the storage location is treated as a single persistent variable in all functions that reference it. Within a - function, all distinct references to the storage location (varnodes) are merged. The decompiler + function, all distinct references (varnodes) to the storage location are merged. The Decompiler expects a value at the storage location to exist from before the start of the function, and any change to the value must be explicitly represented as an assignment to - the variable in decompiler output. + the variable in Decompiler output.

    @@ -859,33 +861,33 @@ of a particular function in the Program. Formally, a stack address is defined as an offset relative to the incoming value of the stack pointer and exists in the stack address space associated with the function. See the discussion - in “Address Space”. A stack annotation then is a variable annotation + in Address Space. A stack annotation then is a variable annotation with a stack address as its storage location. It exists only in the scope of a single function and the variable must be local to that function.

    - Within the Listing window, a stack annotation is displayed as part of the function header - (at the entry point address of the function), with a syntax similar to: + Within a Listing window, a stack annotation is displayed as part of the function header + at the entry point address of the function, with a syntax similar to:

    undefined4 Stack[-0x14]:4 local_14

    The middle field (the Variable Location field) indicates that the storage location is on the - stack, and the value in brackets indicates the offset of the storage location, relative to the incoming + stack, and the value in brackets indicates the offset of the storage location relative to the incoming stack pointer. The value after the colon indicates the number of bytes in the storage location.
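The field layout just described can be made concrete with a small Python sketch that splits the example header line into its parts. The regular expression and the name `parse_stack_annotation` are illustrative only, and the sketch assumes the offset is printed in hexadecimal and the byte count in decimal, as in the example above.

```python
import re

# Hypothetical pattern for a line such as: undefined4 Stack[-0x14]:4 local_14
STACK_ANNOTATION = re.compile(
    r"^(?P<datatype>\S+)\s+"
    r"Stack\[(?P<offset>-?0x[0-9a-fA-F]+)\]:(?P<size>\d+)\s+"
    r"(?P<name>\S+)$"
)


def parse_stack_annotation(line: str) -> dict:
    """Split a stack annotation into data-type, offset, size, and name."""
    match = STACK_ANNOTATION.match(line.strip())
    if match is None:
        raise ValueError(f"not a stack annotation: {line!r}")
    return {
        "datatype": match["datatype"],
        # Offset relative to the incoming stack pointer (may be negative).
        "offset": int(match["offset"], 16),
        # Number of bytes in the storage location.
        "size": int(match["size"]),
        "name": match["name"],
    }


parsed = parse_stack_annotation("undefined4 Stack[-0x14]:4 local_14")
```

For the example line, the parsed offset is -20 (that is, `-0x14`) and the size is 4 bytes, matching the `undefined4` data-type.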

    Currently, the entire body of the function is included - in the scope of any stack annotation, and the decompiler will allow only a single variable to exist + in the scope of any stack annotation, and the Decompiler will allow only a single variable to exist at the stack address. A stack annotation can be a formal parameter to the function, but otherwise the - decompiler does not expect to see a value that exists before the start of the function. + Decompiler does not expect to see a value that exists before the start of the function.

    - The decompiler will continue to perform copy propagation and other transforms on - stack locations associated with a variable annotation. In particular, within decompiler output, - a specific write operation to a stack address may not show up as an explicit assignment to its variable, - if the value is simply copied to another location. + The Decompiler will continue to perform copy propagation and other transforms on + stack locations associated with a variable annotation. In particular, within Decompiler output, + if the value is simply copied to another location, + a specific write operation to a stack address may not show up as an explicit assignment to its variable.

    @@ -895,7 +897,7 @@

    A variable annotation can refer to a specific register for the processor associated with the Program. In general, such an annotation will be for a variable local to a particular function. - Within the Listing window, this annotation is displayed as part of the function header, with + Within a Listing window, this annotation is displayed as part of the function header, with syntax like:

    @@ -906,28 +908,28 @@ the annotation, and the value after the colon indicates the number of bytes in the register.

    - For a local variable annotations with a register storage location, there is an expectation that the + For local variable annotations with a register storage location, there is an expectation that the register may be reused for different variables at different points of execution within the function. There may be more than one annotation, for different variables, that share the same register storage location. An annotation is associated with a first use point that describes where - the register first holds a value for the particular variable. (See the discussion - “Varnodes in the Decompiler”) + the register first holds a value for the particular variable (see the discussion - Varnodes in the Decompiler). The entire scope of the annotation is limited to the address regions between the first use point - and any points where the value is read. The decompiler may extend the scope as part of its + and any points where the value is read. The Decompiler may extend the scope as part of its merging process, but the full extent is not stored in the annotation.

    -Temporary Registers

    +Temporary Register

    Variable annotations can have a temporary register as a storage location. A temporary register is not specific to a processor but is produced at various stages of the decompilation process. See the discussion of the unique - space in “Address Space”. These registers do not have a meaningful name, and - the specific storage address may change on successive decompilations. So within the - Listing window, this annotation is displayed as part of the function header, + space in Address Space. These registers do not have a meaningful name, and + the specific storage address may change on successive decompilations. So, within a + Listing window, this annotation is displayed as part of the function header with syntax like:

    @@ -935,7 +937,7 @@

    The Variable Location field displays the internal hash used to uniquely - identify the temporary register within the data-flow of the function. + identify the temporary register within the data flow of the function.

    A temporary register annotation must be for a local variable, and as with an ordinary register, @@ -953,7 +955,7 @@

    Every formal Function in Ghidra is associated with a set of variable annotations and other properties that make up the function prototype. Due to the nature of reverse engineering, - the function prototype may only include partial information and may be built up over time. Individual + the function prototype may include only partial information and may be built up over time. Individual elements include:

    @@ -962,38 +964,53 @@

    Each formal input to the function can have a Variable Annotation that describes its name, data-type, - and storage location, at the moment control-flow enters the function. If annotations exist, they are shown - in the Listing Window as part of the Function header, and they usually correspond directly with symbols in the - function declaration produced by the decompiler. + and storage location. The storage location applies at the moment control flow enters the function. + If annotations exist, they are shown + in a Listing window as part of the Function header, and they usually correspond directly with symbols in the + function declaration produced by the Decompiler.

    Return Value

    The value returned by a function can have a special Variable Annotation that describes its data-type - and storage location, at the moment control-flow exits the function. If it exists, the annotation is shown - in the Listing Window as part of the Function header with the name <RETURN>, and it usually + and storage location. The storage location applies at the moment control flow exits the function. If it exists, the annotation is shown + in a Listing window as part of the Function header with the name <RETURN>, and it usually corresponds directly with the return value in the function declaration produced by - the decompiler. + the Decompiler. +

    +
    +
    Auto-Parameters
    +
    +

    + Specific prototypes may require auto-parameters like this + or __return_storage_ptr__. These are special input parameters + that compilers may use to implement specific high-level language concepts. See the discussion + in Auto-Parameters. Within Ghidra, auto-parameters are automatically created by the + Function Editor Dialog + if the desired prototype requires them. + Within a Listing window, auto-parameters look like other parameter annotations, but the storage field shows the + string (auto). Decompiler output will generally display auto-parameters as explicit variables + rather than hiding them.

    Calling Convention

    - The calling convention used by the function can be specified as part of the function prototype. The convention - is specified by name, referring to the formal “Prototype Model” that describes how storage + The calling convention used by the function is specified as part of the function prototype. The convention + is specified by name, referring to the formal Prototype Model that describes how storage locations are selected for individual parameters along with other information about how the compiler treats - the function. Available models are determined by the processor and compiler, but can be extended by the user. - See “Specification Extensions”. + the function. Available models are determined by the processor and compiler, but may be extended by the user + (see Specification Extensions).

    - In the absence of parameter and return value annotations, the decompiler will use the prototype model as + In the absence of input parameter and return value annotations, the Decompiler will use the prototype model as part of its analysis to discover the input parameters and the return value of the function.

    - The name "unknown" is reserved to indicate that nothing is known about the calling convention. If - set to "unknown", depending on context, the decompiler may assign the calling convention based on - the Prototype Evaluation option (See Prototype Evaluation), or it + The name unknown is reserved to indicate that nothing is known about the calling convention. If + set to unknown, depending on context, the Decompiler may assign the calling convention based on + the Prototype Evaluation option (see Prototype Evaluation), or it may use the default calling convention for the architecture.

    @@ -1001,30 +1018,30 @@

    Functions have a boolean property called variable arguments, which can be turned on - if the function is capable of being passed a variable number of inputs. This property informs the decompiler that + if the function is capable of being passed a variable number of inputs. This property informs the Decompiler that the function may take additional parameters beyond any with an explicit variable annotation. This affects decompilation of any function which calls the variable arguments function, allowing - the decompiler to discover unlisted parameters at a given call site. + the Decompiler to discover unlisted parameters at a given call site.

    No Return

    - A function can be marked explicitly as not returning, meaning that once - a call is made to the function, execution will never return to the caller. The decompiler uses this to - compute the correct control-flow in any calling functions. + A function can be marked with the no return property, meaning that once + a call is made to the function, execution will never return to the caller. The Decompiler uses this to + compute the correct control flow in any calling functions.

    In-Line

    - If the boolean property in-line is turned on for a particular function, - it directs the decompiler to inline the effects of the function into the decompilation of any of its calling functions. - The function will no longer appear as a direct function call in the decompilation, but all of its data-flow + If the in-line property is turned on for a particular function, + it directs the Decompiler to inline the effects of the function into the decompilation of any of its calling functions. + The function will no longer appear as a direct function call in the decompilation, but all of its data flow will be incorporated into the calling function.

    - This is useful for bookkeeping functions, where its important for the decompiler to + This is useful for bookkeeping functions, where it is important for the Decompiler to see its effects on the calling function. Functions that set up the stack frame for a caller or functions that look up or dispatch a switch destination are typical examples that should be marked in-line.

    @@ -1033,7 +1050,7 @@

    This property is similar in spirit to marking a function as in-line. - A call-fixup directs the decompiler to replace any call to the function with a specific + A call-fixup directs the Decompiler to replace any call to the function with a specific chunk of raw p-code. The decompilation of any calling function no longer shows the function call, but the chunk of p-code incorporates the called function's effects.

    @@ -1044,7 +1061,7 @@

    Call-fixups are specified by name. The name and associated p-code chunk are typically defined in the compiler specification for the Program. Users can extend the available set - of call-fixups. See “Specification Extensions”. + of call-fixups (see Specification Extensions).

    @@ -1059,7 +1076,7 @@ Ghidra records a Signature Source for every function, indicating the origin of its prototype information. This is similar to the Symbol Source attached to Ghidra's symbol annotations - (See the documentation for + (see the documentation for Filtering in the Symbol Table). The possible types are:

    @@ -1090,8 +1107,8 @@

    If the Signature Source is set to anything other than DEFAULT, the - function's prototype information is forcing on the decompiler. See the discussion - in “Forcing Data-types” + function's prototype information is forcing on the Decompiler (see the discussion + in Forcing Data-types).

    @@ -1100,8 +1117,8 @@

    The input parameter and return value annotations of the function prototype, like - any variable annotations, can be forcing on the decompiler. - See the complete discussion in “Forcing Data-types”. + any variable annotations, can be forcing on the Decompiler + (see the complete discussion in Forcing Data-types). But keep in mind:

    @@ -1110,13 +1127,13 @@
    - The input parameters and return value are all forced on the decompiler as a unit based on the + The input parameters and return value are all forced on the Decompiler as a unit based on the Signature Source. They are all forced if the type is set to anything other than DEFAULT; otherwise none of them are forced.

    - If the function prototype's annotations are not forced, the decompiler will attempt to discover the parameters + If the function prototype's annotations are not forcing, the Decompiler will attempt to discover the parameters and return value using the calling convention. The prototype model underlying the calling convention dictates which storage locations can be considered as parameters and their formal ordering.

    @@ -1133,7 +1150,7 @@ can be built for the function.

    - The decompiler will disregard the calling convention's rules in this situation and use the custom storage + The Decompiler will disregard the calling convention's rules in this situation and use the custom storage locations for parameters and the return value. Other aspects of the calling convention, like the unaffected list, will still be used.

    @@ -1145,8 +1162,8 @@ Data Mutability

    - Mutability is a description of how values in a specific memory region - (either a single variable or a larger block) can change during Program execution, based either on + Mutability is a description of how values in a specific memory region, + either a single variable or a larger block, can change during Program execution based either on properties or established rules. Ghidra recognizes the mutability settings:

    @@ -1157,17 +1174,17 @@

    - Mutability affects decompiler analysis and can have a large impact the output. + Mutability affects Decompiler analysis and can have a large impact on the output.

    - Most memory has normal mutability, meaning: + Most memory has normal mutability; the value at the memory location may change over the course of executing the Program, but for a given section of code, the value will not change unless an instruction explicitly writes to it.

    Mutability can be set on an entire block of memory in the Program, typically from the Memory Map. - It can also be set as part of a single Variable Annotation. From the Listing Window for instance, + It can also be set as part of a single Variable Annotation. From a Listing window, for instance, use the Settings dialog.

    @@ -1177,9 +1194,9 @@

    The constant mutability setting indicates that values within the memory region are read-only and don't change during Program execution. If a read-only variable is - accessed in a function being analyzed by the decompiler, its constant value, if present in the - Program's load image, replaces the variable within data-flow for the - function. The decompiler may propagate the constant and fold it in to other operations, which + accessed in a function being analyzed by the Decompiler, its constant value, if present in the + Program's load image, replaces the variable within data flow for the + function. The Decompiler may propagate the constant and fold it into other operations, which can have a substantial impact on the final output.

    @@ -1190,18 +1207,18 @@

    The volatile mutability setting indicates that values within the memory region may change unexpectedly, even if the code currently executing does not directly - write to it. If a volatile variable is accessed in a function being analyzed by the decompiler, + write to it. If a volatile variable is accessed in a function being analyzed by the Decompiler, each specific access is replaced with a built-in function call, which prevents constant propagation and other transforms across the access. The built-in functions are named based on - whether the access is a read or write and then the size - of the access. Within decompiler output, the first parameter to a built-in function is a symbol + whether the access is a read or write and on the size + of the access. Within the Decompiler output, the first parameter to a built-in function is a symbol indicating the volatile variable. The function returns a value in the case of a volatile read or takes a second parameter in the case of a volatile write.

    -	  write_volatile_1(DAT_mem_002b,0x20);
     	  X = read_volatile_2(SREG);
    +	  write_volatile_1(DAT_mem_002b,0x20);
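The naming scheme for the volatile built-in functions can be sketched in Python as follows. The helper names are hypothetical, and the sketch only reproduces the pattern visible in the two example lines: the access kind, the word `volatile`, and the access size in bytes, with the volatile symbol as the first parameter.

```python
# Hypothetical sketch of how the volatile built-in calls are spelled.
def volatile_builtin_name(access: str, size: int) -> str:
    """Build the built-in function name from the access kind and size in bytes."""
    if access not in ("read", "write"):
        raise ValueError("access must be 'read' or 'write'")
    if size <= 0:
        raise ValueError("size must be a positive number of bytes")
    return f"{access}_volatile_{size}"


def render_volatile_access(access: str, size: int, symbol: str, value=None) -> str:
    """Render the call expression for a single volatile access."""
    name = volatile_builtin_name(access, size)
    if access == "read":
        # A read takes only the volatile symbol and returns the value.
        return f"{name}({symbol})"
    if value is None:
        raise ValueError("a volatile write needs the value being written")
    # A write takes the symbol first and the written value second.
    return f"{name}({symbol},{value})"


read_call = render_volatile_access("read", 2, "SREG")
write_call = render_volatile_access("write", 1, "DAT_mem_002b", "0x20")
```

The two results reproduce the call expressions in the listing above, `read_volatile_2(SREG)` and `write_volatile_1(DAT_mem_002b,0x20)`.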
     	

    @@ -1214,14 +1231,14 @@ Constant Annotations

    - Ghidra provides numerous actions to control how specific constants are formatted or displayed. - An annotation can be applied directly to a constant in the Decompiler Window, which always affects - decompiler output. Or, an annotation can be applied to the constant operand of a specific machine - instruction displayed in the Listing Window. In this case, to the extent possible, the decompiler - attempts to track the operand and apply the annotation to the matching constant in the decompiler output. - However, the constant may be transformed from its value in the original machine instruction during the decompiler's - analysis. The decompiler will follow the constant through simple transformations, but if the constant strays - too far from its original value, the annotation will not be applied. The transforms followed are: + Ghidra provides numerous actions to control how a specific constant is formatted or displayed. + An annotation can be applied directly to a constant in a Decompiler window, which always affects + Decompiler output. Or, an annotation can be applied to the constant operand of a specific machine + instruction displayed in a Listing window. In this case, to the extent possible, the Decompiler + attempts to track the operand and apply the annotation to the matching constant in the Decompiler output. + However, the constant may be transformed from its value in the original machine instruction during the Decompiler's + analysis. The Decompiler will follow the constant through one of the following simple transformations, but + otherwise the annotation will not be applied.

      @@ -1241,23 +1258,20 @@ Ghidra can create an association between a name and a constant, called an equate. An equate is a descriptive string that is intended to replace the numeric form of the constant, and equates across the entire Program can be viewed from the - Equate Table. + Equates Table.

      An equate can be applied to a machine instruction with a constant operand by using the Set Equate - menu from the Listing Window. If the decompiler successfully follows the operand to a matching constant, - the equate's name is displayed as part of the decompiler's output as well as in the Listing Window. - A transformed operand is displayed as an expression, where the transforming operations are applied to - the equate symbol (representing the original constant). + menu from a Listing window. If the Decompiler successfully follows the operand to a matching constant, + the equate's name is displayed as part of the Decompiler's output as well as in any Listing window. + A transformed operand is displayed as an expression, where the transforming operation is applied to + the equate symbol representing the original constant.

      - Alternately an equate can be applied directly to a constant from the Decompiler Window using its - “Set Equate ...” menu. The constant may or may not have a corresponding instruction - operand but will be displayed in decompiler output using the descriptive string. -

      -

      - + Alternatively, an equate can be applied directly to a constant from a Decompiler window using its + Set Equate... menu. The constant may or may not have a corresponding instruction + operand but will be displayed in Decompiler output using the descriptive string.

    @@ -1265,36 +1279,37 @@ Format Conversions

    - Ghidra can apply a format conversion to integer constants that are displayed - in decompiler output. + Ghidra can apply a format conversion to any integer constant that is displayed + in Decompiler output.

    A conversion can be applied to the machine instruction containing the constant as an operand using the Convert menu option - from the Listing Window. If the decompiler successfully traces the operand to a matching constant, - the format conversion is applied in the decompiler output as well as in the Listing Window. + from a Listing window. If the Decompiler successfully traces the operand to a matching constant, + the format conversion is applied in the Decompiler output as well as in the Listing window.

    - Alternately, a conversion can be applied directly to an integer constant in the - Decompiler Window using its “Convert” menu option. The constant may or may not - have a corresponding instruction operand but is displayed in decompiler output using the conversion. + Alternatively, a conversion can be applied directly to an integer constant in a + Decompiler window using its Convert menu option. The constant may or may not + have a corresponding instruction operand but is displayed in Decompiler output using the conversion.

    - Conversions applied by the decompiler are currently limited to: + Conversions applied by the Decompiler are currently limited to:

    • Binary - 0b01100001
    • - Decimal- 97
    • + Decimal - 97
    • Hexadecimal - 0x61
    • Octal - 0141
    • Char - 'a'

    - An appropriate header matching the format is prepended to the representation string, either "0b", "0x" or just - "0". The decompiler will not switch the signedness of the constant but preserves the signed or unsigned data-type - as determined by analysis. + If necessary, a header matching the format is prepended to the representation string, either "0b", "0x" or just + "0". A conversion will not switch the signedness of the constant; the signed or unsigned data-type associated + with the constant, as determined by analysis, is preserved. If the constant is negative, with a signed data-type, + the representation string will always start with a '-' character.
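As a rough illustration of the five formats listed above, the sketch below renders an integer constant with the matching header and a leading '-' for negative values. The function `convert_constant` is hypothetical and is not Ghidra's implementation; padding binary output to a multiple of eight digits is an assumption taken from the single `0b01100001` example, and restricting Char to printable ASCII is a simplification.

```python
def convert_constant(value: int, fmt: str) -> str:
    """Render `value` in one of the listed formats (illustrative only)."""
    sign = "-" if value < 0 else ""
    mag = abs(value)
    if fmt == "binary":
        digits = format(mag, "b")
        # Assumption: pad to a whole number of bytes, as in 0b01100001.
        width = max(8, -(-len(digits) // 8) * 8)
        return f"{sign}0b{digits.zfill(width)}"
    if fmt == "decimal":
        return f"{sign}{mag}"
    if fmt == "hexadecimal":
        return f"{sign}0x{mag:x}"
    if fmt == "octal":
        # The octal header is just a leading "0".
        return "0" if mag == 0 else f"{sign}0{mag:o}"
    if fmt == "char":
        if not 0x20 <= value <= 0x7E:
            raise ValueError("sketch only handles printable ASCII")
        return f"'{chr(value)}'"
    raise ValueError(f"unknown format: {fmt}")


if __name__ == "__main__":
    for name in ("binary", "decimal", "hexadecimal", "octal", "char"):
        print(name, convert_constant(97, name))
```

Note that the sign is carried by the '-' prefix rather than by reinterpreting the bits, which matches the statement that a conversion does not switch the signedness of the constant.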

    @@ -1307,15 +1322,15 @@ A register value in this context is a region of code in the Program where a specific register holds a known constant value. Ghidra maintains an explicit list of these values for the Program (see the documentation for Register Values), - which the decompiler can use when analyzing a function. - A register value benefits decompiler analysis, especially if the original compiler was aware - of the constant value, as the decompiler can recover address references calculated as offsets relative to the register + which the Decompiler can use when analyzing a function. + A register value benefits Decompiler analysis, especially if the original compiler was aware + of the constant value, as the Decompiler can recover address references calculated as offsets relative to the register and otherwise propagate the constant.

    - A register value is set by highlighting the region of code in the Listing Window and then invoking the - Set Register Values ... command - from the pop-up menu. The beginning and end of a region is indicated in the Listing Window with + A register value is set by highlighting the region of code in a Listing window and then invoking the + Set Register Values... command + from the pop-up menu. The beginning and end of a region is indicated in a Listing window with assume directives, and regions can be generally viewed from the Register Manager window.

    @@ -1323,8 +1338,8 @@ In order for a particular register value to affect decompilation, the region of code associated with the value must contain the entry point of the function, and of course the function must read from the register. Only the initial reads of the register are replaced with the constant value. - The decompiler will continue to respect later instructions that write to the register (even if the - instruction is inside the register value's region) + The Decompiler will continue to respect later instructions that write to the register, even if the + instruction is inside the register value's region. If a register value's region starts in the middle of a function, decompilation is not affected at all.
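The two conditions above (the region must contain the function's entry point, and the function must read the register) can be written as a small predicate. This is a minimal sketch under stated assumptions: `RegisterValueRegion` and `value_seen_by_decompiler` are hypothetical names, addresses are plain integers, and region bounds are treated as inclusive.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class RegisterValueRegion:
    register: str
    start: int   # first address of the region (assumed inclusive)
    end: int     # last address of the region (assumed inclusive)
    value: int   # constant the register holds across the region


def value_seen_by_decompiler(region: RegisterValueRegion,
                             entry_point: int,
                             registers_read: Set[str]) -> Optional[int]:
    """Return the constant if the region can affect decompilation, else None."""
    if not (region.start <= entry_point <= region.end):
        # A region that starts in the middle of the function has no effect.
        return None
    if region.register not in registers_read:
        return None
    return region.value


if __name__ == "__main__":
    region = RegisterValueRegion("gp", 0x1000, 0x10FF, 0x8000)
    print(value_seen_by_decompiler(region, 0x1000, {"gp", "sp"}))
    print(value_seen_by_decompiler(region, 0x0FF0, {"gp", "sp"}))
```

The sketch only decides whether the value applies at all; it does not model the rule that later writes to the register inside the function are still respected.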

    @@ -1333,7 +1348,7 @@ Context Registers

    - There is a special class of registers, called context registers whose + There is a special class of registers called context registers whose values have a different effect on analysis and decompilation than described above.

    @@ -1349,7 +1364,7 @@

    The value in a context register is examined when Ghidra decodes machine instructions from the underlying bytes in the Program. A specific value generally corresponds to a specific execution mode - of the processor. The ARM processor T bit for instance, which selects whether the + of the processor. The ARM processor T bit, for instance, which selects whether the processor is executing ARM or THUMB instructions, is modeled as a context register in Ghidra. The same set of bytes in the Program can be decoded to machine instructions in more than one way, depending on context register values. @@ -1364,17 +1379,17 @@

    If a context register value is changed for a region that has already been disassembled, in order to see the effect of the change, the machine instructions in the region need to be cleared, and disassembly needs - to be triggered again. See the documentation on the - Clear Plugin. + to be triggered again (see the documentation on the + Clear Plugin).

    - Values for a context register are set in the same way as any other register, using the - Set Register Values ... command + Values for a context register are set in the same way as for any other register, using the + Set Register Values... command described above. Within the Register Manager window, - context registers are generally grouped together under the (pseudo-register) heading, contextreg. + context registers are generally grouped together under the contextreg pseudo-register heading. For details about how context registers are used in processor modeling, see - the document "SLEIGH: A Language for Rapid Processor Specification". + the document "SLEIGH: A Language for Rapid Processor Specification."

    Because context registers affect machine instructions, they also affect the underlying p-code and diff --git a/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/DecompilerConcepts.html b/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/DecompilerConcepts.html index b7440a1df2..d1dae701b8 100644 --- a/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/DecompilerConcepts.html +++ b/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/DecompilerConcepts.html @@ -21,10 +21,10 @@

    P-code is Ghidra's Intermediate Representation (IR) language. When analyzing a function, - the decompiler translates every machine instruction into p-code first and performs its - analysis directly on the operators and variables of the language. Output of the decompiler + the Decompiler translates every machine instruction into p-code first and performs its + analysis directly on the operators and variables of the language. Output of the Decompiler is also best understood in terms of p-code. This section presents the key concepts of - p-code. For a more detailed discussion see the document "P-Code Reference Manual". + p-code. For a more detailed discussion see the document "P-Code Reference Manual."

    @@ -54,7 +54,7 @@

    For a p-code model of a specific processor, all elements of the processor state (including RAM, registers, flags, etc.) must be contained in some address space. The model will define multiple address spaces - to accomplish this, and beyond the raw translation of machine instructions to p-code, the decompiler + to accomplish this, and beyond the raw translation of machine instructions to p-code, the Decompiler can add additional spaces. Address space definitions that are common across many different processors include:

    @@ -79,7 +79,7 @@

    A space dedicated to temporary registers. - It is used to hold intermediate values when modeling instruction behavior, and the decompiler + It is used to hold intermediate values when modeling instruction behavior, and the Decompiler uses it to allocate space for variables that don't directly correspond to the low level processor state. The name unique is reserved for this purpose and is present in all processor models. @@ -89,7 +89,7 @@

    A space that represents bytes explicitly indexed through a stack pointer. - This is an example of an address space added by the decompiler beyond what the raw processor + This is an example of an address space added by the Decompiler beyond what the raw processor model defines. The stack space is a logical construction representing the set of bytes a single function might access through its stack pointer. Each stack address represents the offset of a byte in some underlying space (usually ram) relative @@ -135,7 +135,7 @@

    Varnodes by themselves do not necessarily have a data-type associated with them. - The decompiler ultimately assigns a formal data-type, but at the lowest level of p-code, + The Decompiler ultimately assigns a formal data-type, but at the lowest level of p-code, varnodes inherit one of the building block data-types from the p-code operations that act on them:

    @@ -262,10 +262,10 @@

    Most opcodes naturally correspond to a particular C operator token, - and in decompiler output, many of the operator tokens displayed correspond - directly to a p-code operation present in the decompiler's internal + and in Decompiler output, many of the operator tokens displayed correspond + directly to a p-code operation present in the Decompiler's internal representation. The biggest exception is the Branching - operations; the decompiler uses standard high-level language control-flow + operations; the Decompiler uses standard high-level language control-flow structures, like if/else, switch, and do/while blocks, instead of the low-level branching operations. But even here, there is some correspondence @@ -669,7 +669,7 @@ P-code Control Flow

    - P-code has natural control-flow, with the subtlety that flow + P-code has natural control flow, with the subtlety that flow happens both within and across machine instructions. Most p-code operators have fall-through semantics, meaning that flow moves to the next operator in the sequence associated with the instruction, or, if the operator is the @@ -680,7 +680,7 @@

    Ghidra labels a machine instruction with one of the following Flow Types that describe - its overall control-flow. The Flow Type is derived directly from the control-flow of the p-code for the instruction, + its overall control flow. The Flow Type is derived directly from the control flow of the p-code for the instruction, with the basic types corresponding directly with a specific branching p-code operator.

    @@ -737,8 +737,8 @@ not specified.

    - The decompiler treats a CALLOTHER operation as a black box. It will keep track of data - flowing into and out of the operation but won't simplify or transform it. In decompiler + The Decompiler treats a CALLOTHER operation as a black box. It will keep track of data + flowing into and out of the operation but won't simplify or transform it. In Decompiler output, a CALLOTHER is usually displayed using its unique name, with functional syntax showing its inputs and output.

    @@ -748,7 +748,7 @@ Callother-Fixup, which is substituted for the CALLOTHER operation during decompilation, or by other Analyzers that use p-code. Callother-Fixups are applied by Ghidra for specific processor or compiler variants, - and a user can choose to apply them to an individual Program. (See “Specification Extensions”) + and a user can choose to apply them to an individual Program (see Specification Extensions).

    @@ -757,10 +757,10 @@ Internal Decompiler Functions

    - Certain p-code operations can show up in decompiler output that cannot be represented + Certain p-code operations can show up in Decompiler output that cannot be represented as either an operator token, a cast operation, or other depiction that is natural to - the language. The decompiler generally tries to eliminate these, but this isn't always - possible. The decompiler resorts to a functional syntax for these kinds + the language. The Decompiler generally tries to eliminate these, but this isn't always + possible. The Decompiler resorts to a functional syntax for these kinds of p-code operations, displaying them as if they were built-in functions for the language.

    @@ -878,7 +878,7 @@

    A HighFunction is the collection of specific information - produced by the decompiler about a function, referring to the root class in the Ghidra + produced by the Decompiler about a function, referring to the root class in the Ghidra source which holds this information. The HighFunction is made up of the following explicit objects:

    @@ -900,11 +900,11 @@

    - The decompiler's output provides a standalone view of the function which is distinct + The Decompiler's output provides a standalone view of the function which is distinct from any annotations about the function that are present in the Program database - and displayed in the Listing view (although the output may be informed by these annotations). + and displayed in the Listing (although the output may be informed by these annotations). The terms HighFunction, HighVariable, and - HighSymbol refer to this decompiler specific view of the function. + HighSymbol refer to this Decompiler-specific view of the function.

    @@ -913,7 +913,7 @@

    A HighSymbol is one of the explicit symbols recovered by the - decompiler. It is made up of a name and data-type and can describe either: + Decompiler. It is made up of a name and data-type and can describe either:

      @@ -933,10 +933,10 @@

      An important aspect of HighSymbols is that they are distinct from the standard Ghidra symbols stored in the Program database and are part of - the decompiler's separate view of the function. When the decompiler displays + the Decompiler's separate view of the function. When the Decompiler displays declarations for symbols in its output for instance, it is displaying HighSymbols, which may not directly match up with database symbols. - The decompiler is generally + The Decompiler is generally informed by annotations in the database and may copy specific symbols from the database into its view, but it is generally free to invent new symbols discovered during its analysis. @@ -953,12 +953,12 @@ Varnodes in the Decompiler

    - Varnodes are the central variable concept for the decompiler. - They form the individual nodes in the decompiler's data-flow representation + Varnodes are the central variable concept for the Decompiler. + They form the individual nodes in the Decompiler's data-flow representation of functions and are used during all stages of analysis. During the initial stages of analysis, varnodes simply represent specific storage locations that are accessed - in sequence by individual p-code operations. The decompiler immediately converts - the p-code into a graph based data-flow representation, called Static Single + in sequence by individual p-code operations. The Decompiler immediately converts + the p-code into a graph-based data-flow representation, called Static Single Assignment (SSA) form. In this form, the varnodes take on some additional attributes.

    @@ -984,15 +984,15 @@

    - The scope extends via control-flow to each p-code operation that reads the + The scope extends via control flow to each p-code operation that reads the specific varnode as an operand. The value of the varnode between the defining p-code operation and the reading operations does not change. The scope of a varnode can be thought of as a set - of addresses within the function's body connected by control-flow. The address of the defining + of addresses within the function's body connected by control flow. The address of the defining p-code operation is referred to as the varnode's first use point or first use offset.

    - In the decompiler output for a specific high-level language like C or Java, + In the Decompiler output for a specific high-level language like C or Java, a varnode still has a scope and represents a variable in the high-level language only across this connected region of the code. A set of varnodes, with disjoint scopes, provides a complete @@ -1008,7 +1008,7 @@

    A HighVariable is a set of varnodes that, taken together, represent the storage of an entire variable in the high-level language - being output by the decompiler. Each varnode describes where the variable's + being output by the Decompiler. Each varnode describes where the variable's value is stored across some section of code.

    @@ -1038,9 +1038,9 @@

    Merging is the part of the analysis process where - the decompiler decides what varnodes get grouped together to create the final + the Decompiler decides what varnodes get grouped together to create the final HighVariables in the output. Each varnode's scope (see the discussion in - “Varnodes in the Decompiler”) provides the fundamental restriction on this process. + Varnodes in the Decompiler) provides the fundamental restriction on this process. Two varnodes cannot be merged if their scopes intersect. But this leaves a lot of leeway in what varnodes can be merged.

    @@ -1051,14 +1051,14 @@ to as forced merging.

    - The decompiler may also merge varnodes that could just as easily exist as separate + The Decompiler may also merge varnodes that could just as easily exist as separate variables. This is called speculative merging. - In addition to the intersection condition on varnode scopes, the decompiler only - speculatively merges variables that share the same data-type. Beyond this, the decompiler + In addition to the intersection condition on varnode scopes, the Decompiler only + speculatively merges variables that share the same data-type. Beyond this, the Decompiler prioritizes variable pairs that are read and written within the same instruction and - then pairs that are "near" each other in the control-flow of the function. + then pairs that are near each other in the control flow of the function. To a limited extent, users are able to control this kind of merging - (See “Split Out As New Variable”). + (see Split Out As New Variable).
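The two merging restrictions stated above can be expressed as simple predicates. This is a minimal sketch, not the Decompiler's algorithm: the `Varnode` class here is a hypothetical stand-in that models a scope as a set of integer addresses, and it ignores the prioritization of pairs (same instruction first, then nearby in control flow).

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Varnode:
    name: str
    data_type: str
    scope: FrozenSet[int]  # addresses over which the varnode holds its value


def can_merge(a: Varnode, b: Varnode) -> bool:
    """Fundamental restriction: scopes must not intersect."""
    return a.scope.isdisjoint(b.scope)


def can_speculatively_merge(a: Varnode, b: Varnode) -> bool:
    """Speculative merging additionally requires matching data-types."""
    return can_merge(a, b) and a.data_type == b.data_type


if __name__ == "__main__":
    x = Varnode("EAX_1", "int", frozenset({0x10, 0x14}))
    y = Varnode("EBX_1", "int", frozenset({0x18, 0x1C}))
    z = Varnode("ECX_1", "char *", frozenset({0x20}))
    print(can_speculatively_merge(x, y))  # disjoint scopes, same type
    print(can_speculatively_merge(x, z))  # disjoint scopes, different type
```

Both predicates are necessary conditions only; passing them does not mean the Decompiler will actually choose to merge the pair.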

    @@ -1077,12 +1077,12 @@ a calling convention and holds its specific rules and resource details.

    - Prototype models are architecture specific, and depending on the compiler, a single Program may make - use of multiple models. Subsequently, each distinct model has a name like __stdcall or - __thiscall. The decompiler makes use of the prototype model, as assigned to the function by the user or + Prototype models are architecture-specific, and depending on the compiler, a single Program may make + use of multiple models. Consequently, each distinct model has a name like __stdcall or + __thiscall. The Decompiler makes use of the prototype model, as assigned to the function by the user or discovered in some other way, when performing its analysis of parameters. - It is possible for users to extend the set of prototype models available to a Program, - see “Specification Extensions”. + It is possible for users to extend the set of prototype models available to a Program + (see Specification Extensions).

    A prototype model is typically used as a whole and is assigned by name to individual functions. But some of @@ -1103,12 +1103,60 @@ If the parameter is stored on the stack, the storage location is viewed as a constant offset in the stack space, where the offset is relative to the incoming value of stack pointer - (See the discussion in “Address Space”). + (see the discussion in Address Space).

    - The return value for the function, similarly, is stored at a single memory location. It - is guaranteed to be at that location only at points where the function is exited. There may be multiple exit - points, but they all share the same return value storage location. + The return value for the function, unless it is passed back on the stack, is also stored at a single + memory location. It is guaranteed to be at that location only at points where the function is exited. There may be multiple exit + points, but they all share the same return value storage location. For return values passed back on the stack, compilers + generally implement a special input register to hold the location where the value will be stored. See the + discussion of Auto-Parameters and the __return_storage_ptr__ below. +

    + +
    +

    +Auto-Parameters

    + +

    + Compiled binaries may pass values as parameters between functions that aren't in the formal + list of parameters as defined by the original source code for the program. These are referred to + as auto-parameters or sometimes hidden + parameters within the documentation. If the prototype model requires it, Ghidra will automatically + create an auto-parameter for a function to honor a user's request for a specific formal signature. + See Function Editor Dialog. + Because reverse engineers need to see them, the + Decompiler will generally display auto-parameters explicitly in function prototypes as part of its output, even though + they would not be present in the original source. + Ghidra explicitly defines two auto-parameters: +

    +
    +
    +
    this
    +
    +

    + Within object-oriented languages, a function defined as a class method + often has a this parameter pointing to an instantiation of the + class's structure data-type. Within Ghidra, functions with the __thiscall + calling convention are automatically assigned a this parameter. + If the function is part of a class namespace and the class has an associated structure, the + this parameter will be a pointer to the structure; otherwise + it will be a pointer to the void data-type. +

    +
    +
    __return_storage_ptr__
    +
    +

    + Most calling conventions allow the value returned by a function, if it is large enough, to be passed back + on the stack instead of in a register. This is usually implemented by having the calling function + pass an additional input parameter that holds a pointer to the location on + the stack where the return value should be stored. Ghidra labels this special parameter as + __return_storage_ptr__, which will be a pointer to the + data-type of the return value. +

    +
    +
    +
    +
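To make the two auto-parameters concrete, the sketch below builds the parameter list that would be displayed for a function, given its formal parameters. Everything here is hypothetical: the function `displayed_parameters`, its arguments, and in particular the relative order of `this` and `__return_storage_ptr__`, which the text above does not specify and which is an arbitrary choice in this sketch.

```python
def displayed_parameters(formal_params, *, calling_convention,
                         class_struct=None, return_type="void",
                         return_on_stack=False):
    """Return (type, name) pairs including any auto-parameters.

    Assumption: auto-parameters are listed before the formal parameters,
    with `this` first. The real ordering depends on the prototype model.
    """
    params = []
    if calling_convention == "__thiscall":
        # Pointer to the class structure if one exists, else a void pointer.
        this_type = f"{class_struct} *" if class_struct else "void *"
        params.append((this_type, "this"))
    if return_on_stack:
        # Pointer to the data-type of the return value.
        params.append((f"{return_type} *", "__return_storage_ptr__"))
    params.extend(formal_params)
    return params


if __name__ == "__main__":
    shown = displayed_parameters(
        [("int", "count")],
        calling_convention="__thiscall",
        class_struct="Widget",
        return_type="BigStruct",
        return_on_stack=True,
    )
    for ptype, name in shown:
        print(ptype, name)
```

The caller decides `return_on_stack`; the sketch does not attempt to model when a return value is "large enough", since that threshold depends on the calling convention.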

    @@ -1122,7 +1170,7 @@ These encompass a calling convention's saved registers, where a calling function can store values it doesn't want to change unexpectedly, but also may include other registers that are known not to change, like the stack pointer. - The decompiler uses the information to determine which locations can be safely propagated across + The Decompiler uses the information to determine which locations can be safely propagated across a called function.

    @@ -1153,7 +1201,7 @@ Disassemble machine instructions from the underlying bytes and
  • - Produce the raw p-code consumed by the decompiler and other analyzers. + Produce the raw p-code consumed by the Decompiler and other analyzers.
  • @@ -1161,8 +1209,8 @@

    Specification files are selected based on the Language Id - assigned to the Program at the time it is imported into Ghidra. - (See Import Program) + assigned to the Program at the time it is imported into Ghidra + (see Import Program).

      @@ -1180,16 +1228,17 @@
    • Processor family
    • Endianness
    • Size of the address bus
    • - Process variant
    • + Processor variant
    • Compiler producing the Program

    - A field with the value 'default' indicates either the preferred processor variant or the preferred compiler. + A field with the value default indicates either the preferred processor variant or the preferred compiler.
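The five fields listed above can be pulled apart with a small parser. This is a sketch under assumptions: the ':' separator and the sample string `x86:LE:32:default:gcc` are not given in this section and are used here only for illustration, and the `LanguageId` tuple and `parse_language_id` function are hypothetical names.

```python
from typing import NamedTuple


class LanguageId(NamedTuple):
    family: str        # processor family
    endianness: str    # byte order
    address_size: str  # size of the address bus
    variant: str       # processor variant, or "default"
    compiler: str      # compiler producing the Program, or "default"


def parse_language_id(text: str, separator: str = ":") -> LanguageId:
    """Split an id string into its five fields (separator is an assumption)."""
    fields = text.split(separator)
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {text!r}")
    return LanguageId(*fields)


if __name__ == "__main__":
    lang = parse_language_id("x86:LE:32:default:gcc")
    print(lang.family, lang.variant, lang.compiler)
```

A `variant` or `compiler` field equal to `default` corresponds to the preferred processor variant or preferred compiler described above.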

    Within the Ghidra installation, specification files are stored based on the overarching - processor family, such as 'MIPS' or 'x86'. For a specific family, files are located under + processor family, such as MIPS or + x86. For a specific family, files are located under

    <Root>/Ghidra/Processors/<Family>/data/languages @@ -1210,7 +1259,7 @@ These are the human readable SLEIGH language files. A single specification is rooted in one of the *.slaspec files, which may recursively include one or more *.sinc files. The format of these files is described - in the document "SLEIGH: A Language for Rapid Processor Specification". + in the document "SLEIGH: A Language for Rapid Processor Specification."

    Compiled SLEIGH files - *.sla
    @@ -1258,7 +1307,7 @@ Changing any of the specification files described here is not recommended. To make additions to either the compiler specification or the processor specification files, see - “Specification Extensions”, which describes a safe and portable way + Specification Extensions, which describes a safe and portable way to add specific elements.

    diff --git a/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/DecompilerIntro.html b/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/DecompilerIntro.html index d7d9961f39..d3ce6937f5 100644 --- a/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/DecompilerIntro.html +++ b/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/DecompilerIntro.html @@ -39,15 +39,15 @@

    The Decompiler is a full Plug-in within Ghidra and can be configured to be enabled or disabled within any particular tool. Default configurations will have the - plug-in enabled, but if its disabled for some reason, it can be enabled from within - a Code Browser by selecting the menu option + plug-in enabled, but if it is disabled for some reason, it can be enabled from within + a Code Browser by selecting the

    - File -> Configure + File -> Configure...

    - Then click on the Configure link under the - Ghidra Core section and check the box next to + menu option, then clicking on the Configure link under the + Ghidra Core section and checking the box next to DecompilePlugin.

    @@ -77,8 +77,8 @@

    The window automatically decompiles and displays the function at the - current address. The address is set typically by left-clicking in the Listing window, - or invoking the Goto command (pressing the 'g' key) and manually entering + current address. The address is set typically by left-clicking in the Listing, + or invoking the Go To... command (pressing the 'G' key) and manually entering the address or some other label, but the Decompiler window follows any type of navigation in the Code Browser, triggering decompilation of the new function being displayed. @@ -108,7 +108,7 @@ Capabilities

    - Some of the primary capabilities of the decompiler include: + Some of the primary capabilities of the Decompiler include:

    @@ -116,7 +116,7 @@
    Recovering Expressions

    - The decompiler does full data-flow analysis which allows it to + The Decompiler does full data-flow analysis which allows it to perform slicing on functions: complicated expressions, which have been split into distinct operations/instructions and then mixed together with other instructions by the compiling/optimizing process, are @@ -126,7 +126,7 @@

    Recovering High-Level Scoped Variables

    - The decompiler understands how compilers + The Decompiler understands how compilers use processor stacks and registers to implement variables with different scopes within a function. Data-flow analysis allows it to follow what was originally a single variable as it moves from @@ -139,7 +139,7 @@

    Recovering Function Parameters

    - The decompiler understands the parameter passing conventions of + The Decompiler understands the parameter-passing conventions of the compiler and can reconstruct the original form of function calls.

    @@ -147,7 +147,7 @@
    Using Data-type, Name, and Signature Annotations

    - The decompiler automatically pulls in + The Decompiler automatically pulls in all the different data types and variable names that the user has applied to functions, and the C output is altered to reflect this. High-level variables are appropriately named, structure @@ -159,14 +159,14 @@

    Propagating Local Data-types

    - The decompiler infers the data-type of unlabeled variables + The Decompiler infers the data-type of unlabeled variables by propagating information from other sources throughout a function.

    Recovering Structure Definitions

    - The decompiler can be used to create structures that match the usage + The Decompiler can be used to create structures that match the usage pattern of particular functions and variables, automatically discovering component offsets and data-types.

    diff --git a/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/DecompilerOptions.html b/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/DecompilerOptions.html index f7e045916f..8f0a6ba0ff 100644 --- a/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/DecompilerOptions.html +++ b/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/DecompilerOptions.html @@ -17,26 +17,36 @@

    - This lists configuration options that explicitly affect the behavior of the decompiler or - its output, independent of the code that is being decompiled. The bulk of these are + This page lists configuration options that explicitly affect the behavior of the Decompiler or + its output, independent of the code that is being decompiled. The bulk of these options are accessible by selecting the Code Browser menu

    - Edit -> Tool Options + Edit -> Tool Options...

    - and then picking the Decompiler sub-folder. These options are associated - with the particular tool (Code Browser) being used and will apply to decompilation of any Program - being analyzed by that tool. The three categories of options are: + and then picking the Decompiler folder. The options are associated + with the particular Code Browser or other tool being used and will apply to decompilation of any Program + being analyzed by that tool. There are three categories of options, which are listed by clicking either on the + Decompiler folder or one of its two subsections.

    -

    @@ -46,13 +56,13 @@ selecting the Code Browser menu

    - Edit -> Options for <Program> + Edit -> Options for <program>....

    - Picking the Decompiler tab shows “Program Options” - that only affect the decompiler. Picking the “Specification Extensions” tab + Picking the Decompiler section shows Program Options + that only affect the Decompiler. Picking the Specification Extensions section shows a table of the available prototype models, call-fixups, and callother-fixups. These - affect more than just the decompiler but are also documented here. + affect more than just the Decompiler but are also documented here.

    @@ -60,7 +70,7 @@ General Options

    - These options govern what resources are available to the Plug-in and the decompiler engine but do + These options govern what resources are available to the Plug-in and the Decompiler engine but do not affect how analysis is performed or results are displayed.

    @@ -72,7 +82,7 @@

    - Decompilation results for a single function can be compute intensive to produce. + Producing decompilation results for a single function can be computationally intensive. This option specifies the number of functions whose decompilation results can be cached simultaneously. When navigating to a function that has been recently cached, as when navigating back and forth between a few functions, @@ -84,7 +94,7 @@

    - This is a limit on the number of bytes that can be produced by the decompiler process as output + This option limits the number of bytes that can be produced by the Decompiler process as output when decompiling a single function. A payload includes the actual characters to be displayed in the window, additional token markup, symbol information, and other details of the underlying syntax tree. The limit is specified in megabytes of data. If the limit is exceeded for a single @@ -97,11 +107,11 @@

    - This option sets an upper limit on the number of seconds the decompiler spends attempting + This option sets an upper limit on the number of seconds the Decompiler spends attempting to analyze one function before aborting. - It is currently not enforced for the Decompilation - Window. Instead it applies to the DecompilerSwitchAnalyzer, the analyzeHeadless command, scripts, or other - plug-ins that make use of the decompiler service. + It is currently not enforced for a Decompiler + window. Instead it applies to the DecompilerSwitchAnalyzer, the analyzeHeadless command, scripts, and other + plug-ins that make use of the Decompiler service.

    @@ -109,10 +119,10 @@

    - This option sets a maximum number of machine instructions that the decompiler will attempt + This option sets a maximum number of machine instructions that the Decompiler will attempt to analyze for a single function, as a safeguard against analyzing a long sequence - of zeroes or other constant data. The decompiler will quickly throw an exception if it - traces control-flow into more than the indicated number of instructions. + of zeroes or other constant data. The Decompiler will quickly throw an exception if it + traces control flow into more than the indicated number of instructions.
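The safeguard described above can be pictured with a minimal sketch. It is not the Decompiler's implementation: the `successors` mapping, the exception name, and the function name are all hypothetical, and the sketch only shows the idea of aborting a control-flow trace once it has visited more instructions than the configured limit.

```python
class TooManyInstructions(Exception):
    """Raised when the trace exceeds the configured instruction limit."""


def trace_function(entry, successors, max_instructions):
    """Follow control flow from `entry`, counting distinct instructions.

    `successors` is a hypothetical mapping from an instruction address to
    the addresses that control flow can reach next.
    """
    seen = set()
    worklist = [entry]
    while worklist:
        addr = worklist.pop()
        if addr in seen:
            continue
        seen.add(addr)
        # Abort as soon as the limit is crossed instead of tracing further.
        if len(seen) > max_instructions:
            raise TooManyInstructions(
                f"traced more than {max_instructions} instructions")
        worklist.extend(successors.get(addr, []))
    return seen
```

A long run of constant data that disassembles as one straight line of instructions would hit the limit quickly, while an ordinary small function is traced in full.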

    @@ -126,7 +136,7 @@ Analysis Options

    - These options directly affect how the decompiler performs its analysis, either by + These options directly affect how the Decompiler performs its analysis, either by toggling specific analysis passes or changing how it treats various annotations.

    @@ -138,16 +148,16 @@

    - When deciding if an individual stack location has become dead, the decompiler + When deciding if an individual stack location has become dead, the Decompiler must consider aliases, pointers onto the stack that could - be used to modify the location within a called function. One strong heuristic the decompiler - uses is; if the user has explicitly created a variable on the stack between the + be used to modify the location within a called function. One strong heuristic the Decompiler + uses is: if the user has explicitly created a variable on the stack between the base location referenced by the pointer and the individual stack location, then - the decompiler can assume that the pointer is not an alias of the stack location. + the Decompiler can assume that the pointer is not an alias of the stack location. The alias is blocked by the explicit variable. However, if the user's explicit variable is labeling something that isn't - really an explicit variable, like a field within a larger structure for instance, - the decompiler may incorrectly consider the stack location as dead and start removing + really an explicit variable, like a field within a larger structure, for instance, + the Decompiler may incorrectly consider the stack location as dead and start removing live code.

    @@ -157,12 +167,13 @@

    • -None - No data-type is considered blocking.
    • +None - No data-type is considered blocking
    • -Structures - Only structured data-types are blocking.
    • -
    • Structures and Arrays
    • +Structures - Only structures are blocking
    • -All Data-types - All data-types are blocking.
    • +Structures and Arrays - Only structures and arrays are blocking +
    • +All Data-types - All data-types are blocking
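As a rough illustration of the alias-blocking heuristic and of how the four settings narrow it, consider the sketch below. It is an assumption-laden model, not Ghidra code: stack locations are plain integer offsets, each explicit variable is a hypothetical `(offset, kind)` pair, and "between" is taken to mean strictly between the pointer's base location and the stack location in question.

```python
# Hypothetical encoding of the four option values listed above.
BLOCKING_KINDS = {
    "None": set(),
    "Structures": {"struct"},
    "Structures and Arrays": {"struct", "array"},
    "All Data-types": {"struct", "array", "primitive"},
}


def alias_is_blocked(pointer_base, location, explicit_vars, setting):
    """Return True if an explicit variable blocks the pointer from aliasing.

    `explicit_vars` is a list of (stack_offset, kind) pairs for variables
    the user created; `kind` is "struct", "array", or "primitive".
    """
    low, high = sorted((pointer_base, location))
    blocking = BLOCKING_KINDS[setting]
    # A variable blocks the alias only if it lies strictly between the two
    # offsets and its kind is considered blocking under the current setting.
    return any(low < offset < high and kind in blocking
               for offset, kind in explicit_vars)
```

With a less permissive setting such as Structures, a mislabeled primitive variable no longer blocks the alias, which reduces the risk of live code being removed.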

    @@ -172,11 +183,58 @@

    +Eliminate unreachable code +
    +
    +

    + When toggled on, the Decompiler eliminates code that it + considers unreachable. This usually happens when, due to constant propagation and other + analysis, the Decompiler decides that a boolean value controlling a conditional branch can + only take one possible value and removes the branch corresponding to the other value. Toggling + this to off lets the user see the dead code, which is typically demarcated + by the control-flow structure: +

    +
    + if (false) { ... } +
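The effect of the toggle can be sketched with a small, hypothetical rendering function. It assumes the analysis has already reduced the branch condition either to a known boolean or to an unresolved expression given as a string; the function name and the text layout are illustrative only and are not the Decompiler's output format.

```python
def render_branch(condition, then_body, else_body, eliminate_unreachable):
    """Render a two-way branch whose condition may be a known constant.

    `condition` is True or False when analysis has resolved it, or a string
    holding the expression text when it is still unknown.
    """
    if isinstance(condition, bool) and eliminate_unreachable:
        # Only the live side of the branch survives.
        return then_body if condition else else_body
    if isinstance(condition, bool):
        condition = str(condition).lower()  # "true" or "false"
    return f"if ({condition}) {{ {then_body} }} else {{ {else_body} }}"
```

With the toggle off, a condition resolved to false still produces the `if (false)` form, so the dead side remains visible.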
    +

    +

    +
    +
    +Ignore unimplemented instructions +
    +
    +

    + When toggled on, the Decompiler treats instructions whose semantics + have been formally marked unimplemented as if they do + nothing (no operation). Crucially, control flow falls through to the next instruction. + In this case, the Decompiler inserts the warning "Control flow ignored unimplemented + instructions" as a comment in the function header, but the exact point at which an + instruction was ignored may not be clear. + If this option is toggled off, the Decompiler inserts the built-in + function halt_unimplemented() at the point of the unimplemented instruction, and + control flow does not fall through. +

    +
    +
    +Infer constant pointers +
    +
    +

    + When toggled on, the Decompiler infers a data-type for constants + it determines are likely pointers. In the basic heuristic, + each constant is considered as an address, and if that address starts a known data or function element + in the program, the constant is assumed to be a pointer. The constants are treated like + any other source of data-type information, and the inferred data-types are freely propagated by + the Decompiler to other parts of the function. +
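The basic heuristic can be sketched as follows. This is a simplified model under stated assumptions: `elements` is a hypothetical mapping from the start address of each known data or function element to its name, and the function name is illustrative. The real analysis also propagates the inferred data-types, which the sketch does not attempt.

```python
def infer_constant_pointers(constants, elements):
    """Map each pointer-like constant to the element it would point at.

    `elements` is a hypothetical mapping from the start address of a known
    data or function element to that element's name.
    """
    # A constant qualifies only if it matches the *start* of an element.
    return {c: elements[c] for c in constants if c in elements}
```

A constant that falls in the middle of an element, rather than at its start, is not treated as a pointer in this sketch.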

    +
    +
    Recover -for- loops

    - When this is toggle on, the decompiler attempts to pinpoint + When toggled on, the Decompiler attempts to pinpoint variables that control the iteration over specific loops in the function body. When these loop variables are discovered, the loop is rendered using a standard for loop header @@ -188,72 +246,30 @@

    - If the toggle is off, the loop is displayed using + When toggled off, the loop is displayed using while syntax, with any initializer and iterating statements mixed in with the loop body or preceding basic blocks.

    -Eliminate unreachable code -
    -
    -

    - When this is toggled on, the decompiler eliminates code that it - considers unreachable. This usually happens when, due to constant propagation and other - analysis, the decompiler decides that a boolean value controlling a conditional branch can - only take one possible value and removes the branch corresponding to the other value. Toggling - this to off lets the user see the dead code, which is typically demarcated - by the control-flow structure -- if (false) { ... }. -

    -
    -
    -Ignore unimplemented instructions -
    -
    -

    - When toggled on, the decompiler treats instructions whose semantics - have been formally marked unimplemented as if they do - nothing (no operation). Crucially, control-flow falls through to the next instruction. - In this case, the decompiler inserts the warning "Control flow ignored unimplemented - instructions" as a comment in the function header, but the exact point at which - instruction was ignored may not be clear. - If this option is toggled off, the decompiler inserts the built-in - function halt_unimplemented() at the point of the unimplemented instruction, and - control-flow does not fall through. -

    -
    -
    -Infer constant pointers -
    -
    -

    - When toggled on, the decompiler infers a data-type for constants - it determines are likely pointers. In the basic heuristic, - each constant is considered as an address, and if that address starts a known data or function element - in the program, the constant is assumed to be a pointer. The constants are treated like - any other source of data-type information, and the inferred data-types are freely propagated by - the decompiler to other parts of the function. -

    -
    -
    Respect read-only flags

    - When toggled on, the decompiler treats any values in memory + When toggled on, the Decompiler treats any values in memory marked read-only as constant. If a read-only memory location is explicitly referenced by the function being decompiled, it is considered to be unchanging, and the initial - value present in the Program is pulled in to the data-flow of the function as a constant. + value present in the Program is pulled into the data flow of the function as a constant. Due to Constant Propagation and other transformations, read-only memory - can have a large effect on decompiler output. + can have a large effect on Decompiler output.

    - Typically as part of the import process, Ghidra marks memory blocks as read-only if they + Typically, as part of the import process, Ghidra marks memory blocks as read-only if they are tagged as such by a section header or other meta-data in the original binary. Users can actively set whether specific memory regions are considered read-only through the Memory Manager, and individual data elements can be marked as constant via the Mutability setting - (See “Data Mutability”). + (see Data Mutability).

    @@ -261,7 +277,7 @@

    - This toggles whether the decompiler attempts to simplify double precision arithmetic operations, + This toggles whether the Decompiler attempts to simplify double precision arithmetic operations, where a single logical operation is split into two parts, calculating the high and low pieces of the result in separate instructions. Decompiler support for this kind of transform is currently limited, and only certain constructions are simplified. @@ -272,10 +288,10 @@

    - When this option is active, the decompiler simplifies code sequences containing + When this option is active, the Decompiler simplifies code sequences containing predicated instructions. A predicated instruction is executed conditionally based on a boolean value, the predicate, - and a sequence of instructions can share the same predicate. The decompiler merges the + and a sequence of instructions can share the same predicate. The Decompiler merges the resulting if/else blocks that share the same predicate so that the condition is only printed once.

    @@ -285,7 +301,7 @@

    - When toggled on, the decompiler employs in-place assignment operators, + When toggled on, the Decompiler employs in-place assignment operators, such as += and <<=, in its output syntax.

    @@ -300,7 +316,7 @@ Display Options

    - These options do not change the decompiler's analysis but only affect how the results are presented. + These options do not change the Decompiler's analysis but only affect how the results are presented.

    @@ -315,11 +331,21 @@

    +Color Default +
    +
    +

    + Assign the color to any characters emitted by the Decompiler that do not fall into one of the token types + listed below. This includes delimiter characters like commas and parentheses as well as various operator + characters. +

    +
    +
    Color for <token>

    - Assign colors to the different types of language tokens emitted by the decompiler. + Assign colors to the different types of language tokens emitted by the Decompiler. These include:

    @@ -344,21 +370,11 @@

    -Color Default -
    -
    -

    - Assign the color to any characters emitted by the decompiler that do not fall into one of token types - listed above. This includes delimiter characters like commas and parentheses as well as various operator - characters. -

    -
    -
    Color for Current Variable Highlight

    - Assign the background color used to highlight the token currently under the cursor in a Decompiler Window. + Assign the background color used to highlight the token currently under the cursor in a Decompiler window.

    @@ -366,8 +382,8 @@

    - Assign the background color used to highlight characters matching the current Find pattern. - See “Find ...”. + Assign the background color used to highlight characters matching the current Find pattern + (see Find...).

    @@ -375,7 +391,7 @@

    - Set the number of characters that comment lines are indented within decompiler output. This applies only + Set the number of characters that comment lines are indented within Decompiler output. This applies only to comments within the body of the function being displayed. Comments at the head of the function are not indented.

    @@ -385,7 +401,7 @@

    - Set the language syntax used to delimit comments emitted as part of decompiler output. For C and Java, + Set the language syntax used to delimit comments emitted as part of Decompiler output. For C and Java, the choices are /* C style comments */ and // C++ style comments.

    @@ -394,7 +410,7 @@

    - Set whether the syntax for type casts is emitted in decompiler output. + Set whether the syntax for type casts is emitted in Decompiler output. If this is toggled on, type cast syntax is never displayed, even when rules of the language require it. So individual statements may no longer be formally accurate.

    @@ -404,28 +420,29 @@

    - Set whether a specific kind of comment can be incorporated into decompiler output. Comments in - Ghidra are categorized based on their placement within the Listing Window, and the decompiler - in general tries to display comments where appropriate. See the discussion in “Comments”. - Each kind of comment has its own toggle and can be individually included or excluded from decompiler output. + Set whether a specific type of comment should be incorporated into Decompiler output. + Each type has its own toggle and can be individually included or excluded from Decompiler output.

    • - PLATE - Whether plate comments within the body of the function are displayed + EOL
    • - PRE + PLATE - Whether plate comments within the body of the function are displayed
    • POST
    • - EOL + PRE

    + A comment's type indicates how it is placed within a Listing window, not how it is placed in + a Decompiler window. All comments within the body of the function are displayed in the same way + by the Decompiler, regardless of their type (see the discussion in Comments).

    @@ -433,9 +450,9 @@

    - Toggle whether the decompiler emits comments at the head (before the beginning) of a function. + Toggle whether the Decompiler emits comments at the head (before the beginning) of a function. The header is built from Plate comments placed at the entry point of the - function. See the discussion in “Comments”. + function (see the discussion in Comments). The inclusion of other Plate comments is controlled by the Display PLATE comments toggle, described above.

    @@ -444,11 +461,11 @@

    - Toggle whether line numbers are displayed in any Decompiler Window. If toggled - on, each Decompiler Window reserves space to display a numbers down the left - side of the window, labeling each line of output produced by the decompiler. + Toggle whether line numbers are displayed in any Decompiler window. If toggled + on, each Decompiler window reserves space to display numbers down the left + side of the window, labeling each line of output produced by the Decompiler. Line numbers are associated with the window itself and are not formally part of - the decompiler's output. + the Decompiler's output.

    @@ -456,20 +473,20 @@

    - Control how the decompiler displays namespace information associated + Control how the Decompiler displays namespace information associated with function and variable symbols. The possible settings are:

      +
    • + Minimally - Display the minimal path that distinguishes the symbol +
    • Always - Always display the entire namespace path
    • Never - Never display the namespace path
    • -
    • - Minimally - Display the minimal path that distinguishes the symbol -

    @@ -489,9 +506,9 @@

    - Toggle whether decompiler generated WARNING comments are displayed as part - of the output. The decompiler generates these comments, independent of those laid down by users, to - indicate unusual conditions or possible errors (See “Warning Comments”). + Toggle whether Decompiler-generated WARNING comments are displayed as part + of the output. The Decompiler generates these comments, independent of those laid down by users, to + indicate unusual conditions or possible errors (see Warning Comments).

    @@ -499,8 +516,8 @@

    - Set the typeface used to render characters in any Decompiler Window. Indentation is generally clearer - using a monospaced (fixed width) font, but any font available to the system can be used. The size of + Set the typeface used to render characters in any Decompiler window. Indentation is generally clearer + using a monospaced (fixed-width) font, but any font available to Ghidra can be used. The size of the font can also be controlled from this option.

    @@ -509,19 +526,19 @@

    - Set how integer constants are formatted in the decompiler output. + Set how integer constants are formatted in the Decompiler output. The possible settings are:

    • - Best Fit - Select the most natural representation + Force Hexadecimal - Always use a hexadecimal representation
    • Force Decimal - Always use a decimal representation
    • - Force Hexadecimal - Always use a hexadecimal representation + Best Fit - Select the most natural representation
    @@ -536,8 +553,8 @@

    - Set the maximum number of characters in a line of code emitted by the decompiler before a line break - is forced. The decompiler will not split an individual token across lines. So line breaks frequently + Set the maximum number of characters in a line of code emitted by the Decompiler before a line break + is forced. The Decompiler will not split an individual token across lines. So line breaks frequently will come before the maximum number of characters is reached, and technically a single token can extend the line beyond the maximum.
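The line-breaking behavior described above can be illustrated with a greedy sketch. It is only a model of the stated rule (break before the limit, never split a token), assuming tokens arrive as plain strings with whitespace as its own token; the function name is hypothetical, and the Decompiler's actual pretty printer also handles indentation and nesting.

```python
def break_lines(tokens, max_chars):
    """Greedily pack tokens into lines of at most `max_chars` characters.

    Tokens are never split, so a single over-long token still gets a line
    to itself and extends that line beyond the limit.
    """
    lines = []
    current = ""
    for token in tokens:
        # Break before the token if appending it would exceed the limit.
        if current and len(current) + len(token) > max_chars:
            lines.append(current)
            current = ""
        current += token
    if current:
        lines.append(current)
    return lines
```

Here most breaks come before the maximum is reached, and the one token longer than the limit occupies a line by itself.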

    @@ -548,9 +565,9 @@

    Set the amount of indenting used to print statements within a nested scope in the - decompiler output. Each level of nesting (for function bodies, + Decompiler output. Each level of nesting (function bodies, loop bodies, if/else bodies, etc.) - bodies adds this number characters. + adds this number of characters.

    @@ -558,10 +575,10 @@

    - Set how null pointers are displayed in decompiler output. If this is toggled - on, the decompiler will print a constant pointer value of zero (a null pointer) + Set how null pointers are displayed in Decompiler output. If this is toggled + on, the Decompiler will print a constant pointer value of zero (a null pointer) using the special token NULL. Otherwise the pointer value is represented with the '0' character, - which is then type cast into a pointer. + which is then cast to a pointer.

    @@ -570,10 +587,10 @@

    Set whether the calling convention is printed as part of the function - declaration in decompiler output. If this option is turned on, the name of the calling convention + declaration in Decompiler output. If this option is turned on, the name of the calling convention is printed just prior to the return value data-type within the function declaration. All functions in Ghidra have an associated calling convention (or prototype model) that is used during - decompiler analysis. See the discussion in “Prototype Model”. + Decompiler analysis (see the discussion in Prototype Model).

    @@ -587,7 +604,7 @@ Program Options

    - Changes to these options affect only the decompiler and only for + Changes to these options affect only the Decompiler and only for the current Program being analyzed.

    @@ -600,9 +617,9 @@

    Sets the calling convention (prototype model) used when decompiling a function where - the convention is not known (i.e. marked as "unknown"). Many architectures have multiple - calling conventions, __stdcall, __thiscall etc. See the - discussion in “Prototype Model”. + the convention is not known; i.e., marked as unknown. Many architectures have multiple + calling conventions, __stdcall, __thiscall, etc. + (see the discussion in Prototype Model).

    @@ -615,25 +632,25 @@ Specification Extensions

    - This tab displays elements from the Program's compiler specification and + This entry displays elements from the Program's compiler specification and processor specification and allows the user to add or remove extensions, including prototype models, call-fixups, and callother-fixups.

    Every program has a core set of specification elements, - loaded from the “SLEIGH Specification Files”, that cannot + loaded from the SLEIGH Specification Files, that cannot be modified or removed. Extensions, however, can be added to this core specification. Any extension imported from this dialog is directly associated with the active Program and is stored permanently with it.

    - Users can change or reimport an extension, if new information points to a better definition. + Users can change or reimport an extension if new information points to a better definition. Users have full control over an extension, and unlike a core element, can tailor it specifically to the Program.

    - This options tab presents a table of all specification elements. + This options entry presents a table of all specification elements. Each element, whether core or an extension, is displayed on a separate row with three columns:

    @@ -648,7 +665,7 @@

    The core elements of the specification have a blank Status column, and any extension - is labeled either as "extension" or "override". + is labeled either as extension or override.

    @@ -665,14 +682,14 @@
    prototype

    - This element is a “Prototype Model” that holds a specific named set - of parameter passing details. It - can be applied to individual functions by name, typically via the "Calling Convention" menu + This element is a named Prototype Model that holds a specific set + of parameter-passing details. It + can be applied to individual functions by name, typically via the Calling Convention menu in the Function Editor Dialog. - See the documentation on “Function Prototypes” for how they affect decompilation. + See the documentation on Function Prototypes for how they affect decompilation.

    - The XML tag, <prototype> always has a name attribute + The XML <prototype> tag always has a name attribute that defines the formal name of the prototype model, which must be unique across all models.

    @@ -693,7 +710,7 @@
     	    

    This element is a Call-fixup, which can be used to substitute a specific p-code sequence for CALL instructions during decompilation, as described in - “Function Prototypes”. + Function Prototypes.

    The <callfixup> tag has a name @@ -719,7 +736,7 @@

    This element is a Callother-fixup, which can be used to substitute a specific p-code sequence for CALLOTHER p-code operations. A CALLOTHER - is a black-box, or unspecified p-code operation, see “User-defined P-code Operations - CALLOTHER”. + is a black-box, or unspecified p-code operation (see User-defined P-code Operations - CALLOTHER).

    The <callotherfixup> tag has a @@ -769,7 +786,7 @@

    extension

    - Indicates that the element is a program specific extension that has been + Indicates that the element is a program-specific extension that has been added to the specification.

    @@ -778,7 +795,7 @@

    Indicates that the element, which must be a callotherfixup, is an extension that overrides a core element with the same target. The extension - effectively replaces the p-code injection of the core element with a user supplied one. + effectively replaces the p-code injection of the core element with a user-supplied one. If this type of extension is later removed, the core element becomes active again.

    @@ -789,7 +806,7 @@

    If the user has either imported additional extensions or selected an extension for removal but has not yet clicked the Apply button in the Options dialog, the Status column - may show one of the following values, indicating a pending change. + may show one of the following values, indicating a pending change:

    @@ -830,7 +847,7 @@

    The Import button at the bottom of the - "Specification Extensions" pane allows the user to import one of the + Specification Extensions pane allows the user to import one of the three element types, prototype, callfixup, or callotherfixup, into the program as a new extension. @@ -850,8 +867,8 @@ The XML file describing the extension must have one of the tags, <prototype>, <callfixup>, or <callotherfixup>, as its single root element. Users can find numerous examples within the compiler - and processor specification files that come as part of Ghidra's installation. - See “SLEIGH Specification Files”. + and processor specification files that come as part of Ghidra's installation + (see SLEIGH Specification Files).

    In the case of prototype and callfixup @@ -868,7 +885,7 @@ Removing an Extension

    - The Remove button at the bottom of the "Specification Extensions" pane allows + The Remove button at the bottom of the Specification Extensions pane allows the user to remove a previously installed extension. A row from the table is selected first, which must have a Status of extension or override. Core elements of the specification cannot be removed. diff --git a/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/DecompilerWindow.html b/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/DecompilerWindow.html index 8fa2a9fa0c..e2aac0be72 100644 --- a/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/DecompilerWindow.html +++ b/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/DecompilerWindow.html @@ -14,21 +14,21 @@ Decompiler Window

    - To display the decompiler window, position the cursor on a + To display the Decompiler window, position the cursor on a function in the Code Browser, then select the - icon from the tool bar, or the +  icon from the tool bar, or the Decompile option from the Window menu in the tool.

    - A decompiler window always displays one function at a time. + A Decompiler window always displays one function at a time. The initial window that comes up in the Code Browser is called the Main - window (See “Main Window”), and it automatically decompiles and displays the function at the + window (see Main Window), and it automatically decompiles and displays the function at the current address, following the user's navigation. Other Snapshot windows can also - be opened that show different functions at the same time (See “Snapshot Windows”). But any window + be opened that show different functions at the same time (see Snapshot Windows). But any window only shows one function at a time.

    @@ -79,7 +79,7 @@ one or more warnings during the process. These warnings are integrated into the output as source code comments starting with label WARNING:. They occur either at the beginning of the function as part of the function header or at the point in the code directly - associated with the warning. (See “Warning Comments”) + associated with the warning (see Warning Comments).

    @@ -88,29 +88,29 @@ Main Window

    - Initially pushing + Initially pressing - or selecting +  or selecting Decompile from the Window menu in the tool brings up the main window. The main window always displays the function at the current address within the Code Browser and follows as the user navigates within the Program. Any mouse click, menu option, or other action causing the cursor to move to a new address in the Listing also causes the main window to display the function containing that address. Navigation to new functions is also possible from within the window by double-clicking on function - tokens (See “Mouse Actions”). + tokens (see Mouse Actions).

    -Cross Highlighting

    +Cross-Highlighting

    The main window maintains a map between the individual variable and operator tokens displayed in - the window and the machine instructions which correspond to them. This can give the user instant - feedback about the correspondence between the decompiler and disassembly views of the function, - and it is frequently useful to have both the Listing window and the Decompiler window side - by side. Clicking on tokens in the Decompiler window causes the Listing window to navigate + the window and the machine instructions which correspond to them. Disassembled machine instructions + are displayed by any Listing window, and having both a Listing and Decompiler window side by side + lets the user see this correspondence between the decompiled and disassembled views of the function. + Clicking on tokens in the Decompiler window causes the Listing window to navigate to the corresponding instruction, and clicking instructions in the Listing window causes the Decompiler window to navigate to the corresponding line. Highlighting a region of code in either window causes the corresponding region in the other window to be highlighted. @@ -135,7 +135,7 @@

    - In general, the map between machine instructions and tokens is not one to one because the decompiler + In general, the map between machine instructions and tokens is not one-to-one because the Decompiler transforms its underlying representation of the function. An instruction may no longer have any operator that corresponds to it in the decompiled result. Tokens may be transformed from the natural operation of the machine instruction they are associated @@ -153,28 +153,27 @@ Pressing the - icon - in another Decompiler window's toolbar causes a Snapshot window - to be created, which initially shows decompilation of the same function. Multiple - Snapshot windows can be brought up to show decompilation of different functions +  icon + in any Decompiler window's toolbar causes a Snapshot window + to be created, which shows decompilation of the same function. + Unlike the main window however, the Snapshot window + does not change the function it displays in response to external navigation events. + A Snapshot window can be used to hold a function fixed while the user navigates to + different functions in Listing or other windows. +

    +

    + Multiple Snapshot windows can be brought up to show decompilation of different functions simultaneously. Snapshot windows are visually distinguished from the main Decompiler window by their colored outline.

    -

    - The Snapshot - window, unlike the main window, is not linked to the Listing window - and does not change the function it displays in response to external navigation events. - A Snapshot window can be used to hold a function fixed while the user navigates to - different functions in the Listing or other windows. -

    Navigating to new functions within a Snapshot window is possible when the window is active. The window responds to the actions

      -
    • Go To ... (pressing the 'g' key) +
    • Go To... (pressing the 'G' key)
    • Go to previous location (Back)
    • @@ -184,6 +183,10 @@

    +

    + Double-clicking on specific tokens within the Snapshot window may also cause it to navigate + to a new location (see Double-Click). +

    @@ -193,7 +196,7 @@

    If the current location within the Code Browser is in disassembled code, but that code is not contained in a Formal Function Body, - then the decompiler window invents a function body on the fly called an + then the Decompiler invents a function body, on the fly, called an Undefined Function. The background color of the window is changed to gray to indicate this special state.

    @@ -202,21 +205,21 @@

    The entry point address of the Undefined Function is chosen by - backtracking through the code's control-flow from the current location to the start of + backtracking through the code's control flow from the current location to the start of a basic block that has no flow coming in except possibly from call instructions. During decompilation, a function body is computed from the selected entry point (as with any function) - based on control-flow up to instructions with terminator semantics. + based on control flow up to instructions with terminator semantics.

    - The current address, as indicated by the cursor in the Listing Window for instance, is - generally not the entry of the invented function, but the current address will be + The current address, as indicated by the cursor in the Listing for instance, is + generally not the entry point of the invented function, but the current address will be contained somewhere in the body.

    For display purposes in the window, the invented function is given a name based on the computed entry point address with the prefix UndefinedFunction. The function is assigned the default calling convention, and parameters are discovered as part of - the decompiler's analysis. + the Decompiler's analysis.

    @@ -225,9 +228,9 @@ Tool Bar

    - This is a group of actions that can be triggered by pressing a button in the tool/title - bar at the top of individual decompiler windows, both main and - Snapshot. The action applies to the function and decompiler results + The following actions are available by pressing the corresponding icon in the title/tool + bar at the top of each individual Decompiler window. + The action applies to the function and Decompiler results displayed in that particular window.

    @@ -237,7 +240,7 @@

    - - button +  - button

    Exports the decompiled result of the current function to a file. A file chooser @@ -248,7 +251,7 @@

    This action exports a single function at a time. The user can export all functions simultaneously from the Code Browser, by selecting the menu - File -> Export Program ... and then choosing + File -> Export Program... and then choosing C/C++ from the drop-down menu. See the full documentation for the Export dialog. @@ -262,13 +265,13 @@

    - - button +  - button

    Creates a new Snapshot window. The Snapshot window - initially displays the same function as the decompiler window on which the action was triggered, - but if that window navigates to other functions, the Snapshot does not - follow and continues to display the original function. (See “Snapshot Windows”) + displays the same function as the Decompiler window on which the action was triggered, + and if that window navigates to other functions, the Snapshot does not + follow but continues to display the original function (see Snapshot Windows).

    @@ -279,7 +282,7 @@

    - - button +  - button

    Triggers a re-decompilation of the current function displayed in the window. @@ -293,8 +296,8 @@

    This action is not necessary for normal reverse engineering tasks. Re-decompilation is automatically triggered for all - decompiler windows by any change to the Program, so the most up-to-date decompilation is - always available to the user without this action. This action is a primarily a debugging + Decompiler windows by any change to the Program, so the most up-to-date decompilation is + always available to the user without this action. This action is primarily a debugging aid for plug-in developers.
    @@ -307,10 +310,10 @@

    - - button +  - button

    - Copies the currently selected text in the decompiler window to the clipboard. + Copies the currently selected text in the Decompiler window to the clipboard.

    @@ -319,7 +322,7 @@ Debug Function Decompilation

    - This action is located in the drop-down menu on the right side of the decompiler + This action is located in the drop-down menu on the right side of the Decompiler window tool/title bar.

    @@ -327,7 +330,7 @@ the current function is collected and saved to an output file in XML format. A file chooser dialog is presented to the user to choose the output file. The file is useful when submitting bug reports - about the decompiler as it is generally much smaller than + about the Decompiler as it is generally much smaller than the entire Program and only contains information specific to the function. Information is generated by performing the full decompilation of the function and collecting all the data and @@ -342,18 +345,13 @@ Graph AST Control Flow

    - Generate a control-flow graph based upon the results in the active Decompiler Window, + This action is located in the drop-down menu on the right side of the Decompiler + window tool/title bar. +

    +

    + Generate a control-flow graph based upon the results in the active Decompiler window, and render it using the current Graph Service.

    -
    - - - - - -
    [Warning]
    - If no Graph Service is available then this action will not be present. -
    @@ -364,15 +362,15 @@

    -Left Click

    +Left-Click

    - Moves the decompiler window cursor and highlights the token. Within the + Moves the Decompiler window cursor and highlights the token. Within the main window, if a token has a machine address - associated with it, a left click generates a + associated with it, a left-click generates a navigation event to that address, which may cause other - windows to display code near that address. - (See “Cross Highlighting”) + windows to display code near that address + (see Cross-Highlighting).

    Selecting a '(' or ')' token causes it and its matching parenthesis to be @@ -385,55 +383,60 @@

    -Right Click

    +Right-Click

    - Moves the decompiler window cursor, highlights the token, and brings up the menu of - context sensitive actions. Any highlighting and navigation is identical to a - left click. The menu actions presented depend primarily on the token type and + Moves the Decompiler window cursor, highlights the token, and brings up the menu of + context-sensitive actions. Any highlighting and navigation is identical to a + left-click. The menu actions presented depend primarily on the token type and are tailored to the context at that point in the code.

    -Double Click

    +Double-Click

    - Navigates based on the selected symbol or other token (See below). If the selected token represents a formal symbol, - such as a function name or a global variable, double clicking causes a + Navigates based on the selected symbol or other token (see below). + If the selected token represents a formal symbol, + such as a function name or a global variable, double-clicking causes a navigation event to the address associated with the symbol. +

    +

    + This action is performed by clicking twice on the desired token with the left + mouse button.

    Function Symbols

    - Double clicking a called function name causes the + Double-clicking a called function name causes the window itself to navigate away from its current function to the called function, triggering a new decompilation if necessary and changing its display.

    Global Variables

    - Double clicking a global - variable name does not have any effect on the decompiler window itself, - but other windows, like the Listing window, may navigate to the + Double-clicking a global + variable name does not have any effect on the Decompiler window itself, + but Listing or other windows may navigate to the storage address of the global variable.

    Constants

    - Double clicking a token representing a constant causes the constant to be treated - as an address, and a navigation event to that address is generated. The decompiler + Double-clicking a token representing a constant causes the constant to be treated + as an address, and a navigation event to that address is generated. The Decompiler window itself navigates depending again on whether the address represents a new function or not.

    Labels

    - Double clicking the label within a goto statement causes the window to navigate + Double-clicking the label within a goto statement causes the window to navigate to the target of the goto, within the function. The cursor is set and the window view is adjusted if necessary to ensure that the target is visible.

    Braces

    - Double clicking a '{' or '}' token causes the window to navigate to the matching brace + Double-clicking a '{' or '}' token causes the window to navigate to the matching brace within the window. The cursor is set and the window view is adjusted if necessary to ensure that the matching brace is visible.

    @@ -445,37 +448,45 @@

    -Control Double Click

    +Ctrl-Double-Click

    Opens a new Snapshot window, navigating it to the selected symbol. This is a convenience for immediately decompiling and displaying a called function in a new window, without disturbing the active window. The behavior is similar to the - Double Click action, the selected token must represent a function name symbol or possibly + Double-Click action, the selected token must represent a function name symbol or possibly a constant address, but the navigation occurs in the new Snapshot window.

    +

    + This action is performed by clicking twice on the desired token with the left + mouse button, while holding down the Ctrl key. +

    -Control Shift Click

    +Ctrl-Shift-Click

    Generates a navigation event to the address, within the current function, associated with the clicked token. This allows Snapshot windows to do basic - cross-highlighting in the same way as the main decompiler window. - A Control double-click causes the Listing and other windows to navigate to and display the same - portion of code currently being displayed in the Snapshot window. (See “Cross Highlighting”) + cross-highlighting in the same way as the main Decompiler window. + A ctrl-shift-click causes Listing and other windows to navigate to and display the same + portion of code currently being displayed in the Snapshot window (see Cross-Highlighting). +

    +

    + This action is performed by clicking on the desired token with the left mouse + button, while holding down both the Ctrl and Shift keys.

    -Middle Click

    +Middle-Click

    - Highlights every occurrence of a variable, constant, or operator under the current - cursor location, within the decompiler window. + Highlights every occurrence of a variable, constant, or operator represented by the selected + token, within the Decompiler window.

    @@ -487,8 +498,8 @@

    All the actions described in this section can be activated from the menu that pops up - when right-clicking on a token within the decompiler window. The pop-up menu is context sensitive and - the type of token in particular (See “Display”) determines what actions are available. + when right-clicking on a token within the Decompiler window. The pop-up menu is context sensitive and + the type of token in particular (see Display) determines what actions are available. The token clicked provides a local context for the action and may be used to pinpoint the exact variable or operation affected.

    @@ -504,7 +515,7 @@

    The structure definition is filled in by examining how the variable is used, assuming it is a pointer to the structure, tracing - data-flow to all the expressions the variable is used in. LOAD and STORE operations + data flow to all the expressions the variable is used in. LOAD and STORE operations trigger new fields and additive offsets are traced to calculate the offset of the fields within the structure definition.

    @@ -513,8 +524,8 @@ retyped to be a pointer to the structure. Within the window, the function is decompiled again and references to new fields in the structure should be immediately apparent. These can be renamed or retyped from the window - to further refine the new structure definition. - (See “Rename Variable”) + to further refine the new structure definition + (see Rename Variable).

    @@ -543,26 +554,32 @@

    Set or change a comment at the address of the selected token.

    - These actions bring up the general Comment dialog (See Comments), - which associates the comment with a specific address in the Program. For the - decompiler actions, this address is of the machine instruction most closely linked to the selected token. - Comments will be visible in the Listing and other Ghidra windows viewing the same + These actions bring up the general Comment dialog (see Comments), + which associates the comment with a specific address in the Program. For comment + actions in the Decompiler, this address is of the machine instruction most closely linked to the selected token. + Any comments generated from a Decompiler window will be visible in Listing and other windows viewing the same section of code.

    - The decompiler windows can display all comment types, but this may be affected by the Display options - (See “Comments”). + A Decompiler window can display all comment types, but this may be affected by the Display options + (see Comments).

    -
    Set Plate Comment ...
    +
    Set Plate Comment...

    Brings up the dialog for setting or editing a Plate comment.

    -
    Set Pre Comment ...
    +
    Set Pre Comment...

    Brings up the dialog for setting or editing a Pre comment.

    +
    Set...
    +

    + Brings up the dialog for setting or editing a comment based on the selected token. + A Plate comment is edited if the token is part + of the function's header. A Pre comment is edited otherwise. +

    @@ -572,21 +589,21 @@ Commit Local Names

    - Commit the names of any local variables discovered during the decompiler's analysis + Commit the names of any local variables discovered during the Decompiler's analysis to the Program database as new Variable Annotations. The recovered data-type is not committed as part of the annotation, only the name and storage location.

    - Parameters are not affected by this command, see “Commit Params/Return”. + Parameters are not affected by this command (see Commit Params/Return). The purpose of the command is to synchronize the local variables in the - decompiler's view of a function with the formal Variable Annotations in the disassembly view, + Decompiler's view of a function with the formal Variable Annotations in the disassembly view, without otherwise affecting the decompilation. After executing this command, additional changes - to local variable can be performed directly on the corresponding annotations in the Listing Window, - using various methods (See “Variable Annotations”). + to local variables can be performed directly on the corresponding annotations displayed in Listing windows, + using various methods (see Variable Annotations). Data-types are not forced for new annotations, they are created with - an undefined data-type, which allows the decompiler to refine + an undefined data-type, which allows the Decompiler to refine its view of the variable's data-type as new information becomes available - (See “Forcing Data-types”). + (see Forcing Data-types).

    @@ -595,28 +612,28 @@ Commit Params/Return

    - Commit the decompiler's analysis of the input parameters and return value of the current + Commit the Decompiler's analysis of the input parameters and return value of the current function as annotations to the Program database.

    - In the absence of either imported or user defined - information about a function's prototype, the decompiler performs its own analysis of what + In the absence of either imported or user-defined + information about a function's prototype, the Decompiler performs its own analysis of what the prototype is, determining the storage location and data-type of all parameters and the return value. This action commits this analysis permanently for the current function displayed in the window, creating a matching Variable Annotation for each input - parameter and the return value. The new annotations will be displayed in the - Listing Window as part of the function header, and the action effectively - synchronizes the disassembly view and decompiler's view of the function prototype. + parameter and the return value. The new annotations will be displayed in a + Listing window as part of the function header, and the action effectively + synchronizes the disassembly view and Decompiler's view of the function prototype.

    Committed prototype information is used both when decompiling the function itself and when - decompiling other functions that call it. - The committed annotations are forcing on the decompiler, and it will - no longer perform prototype recovery analysis for that function. The decompiler assumes the committed parameters, + decompiling other functions that call it. The committed annotations are forcing + on the Decompiler (see Forcing Data-types), and it will + no longer perform prototype recovery analysis for that function. The Decompiler assumes the committed parameters, and only the committed parameters, exist and will not modify their data-types, with the exception of parameters that are explicitly marked as having an undefined data-type. The user must manually modify individual variables or clear the entire prototype - if they want a change (See “Variable Annotations”). + if they want a change (see Variable Annotations).

    @@ -665,10 +682,10 @@

    This command primarily targets a constant token in the Decompiler window, but if there is a scalar operand in an instruction that corresponds - with the selected constant, the same conversion is also applied to the scalar in the Listing + with the selected constant, the same conversion is also applied to the scalar in any Listing window. This is equivalent to selecting the - Convert command from the - Listing. There may not be a scalar operand directly corresponding to the selected constant, in + Convert command from a + Listing window. There may not be a scalar operand directly corresponding to the selected constant, in which case the conversion will be applied only in the Decompiler window.

    @@ -679,20 +696,20 @@

    The constant's encoding can be changed by selecting a different Convert command, or it can be returned to its default encoding by selecting - the “Remove Convert/Equate” command. + the Remove Convert/Equate command.

    -Copy/Copy Special ...

    +Copy

    - Copy selected code from the decompiler window into the clipboard. + Copy selected code from the Decompiler window into the clipboard.

    This is part of the standard copy - capabilities for all Ghidra windows and is suitable for copying (sections of) decompiler output + capabilities for all Ghidra windows and is suitable for copying (sections of) Decompiler output into other documents.

    @@ -713,8 +730,8 @@ Enum Editor.

    - Any change to the definition of the data-type is automatically incorporated by the decompiler into its output - (see “Variable Data-types”). + Any change to the definition of the data-type is automatically incorporated by the Decompiler into its output + (see Variable Data-types).

    @@ -727,7 +744,7 @@ details about how it passes parameters.

    - The action is available from any token in the decompiler window. Most tokens trigger editing + The action is available from any token in the Decompiler window. Most tokens trigger editing of the current function itself, but a called function can be edited by putting the cursor on its name specifically.

    @@ -758,16 +775,16 @@

    See documentation for the Function Editor Dialog. - The decompiler automatically incorporates any changes into its output. + The Decompiler automatically incorporates any changes into its output.

    -Find ...

    +Find...

    - Search for strings within the active window, in the current decompiler output. + Search for strings within the active window, in the current Decompiler output.

    The command brings up a dialog where a search pattern can be entered as a raw string or regular expression. @@ -783,8 +800,8 @@ Highlight

    - All these actions highlight a specific set of variable tokens tracing the data-flow - of the selected variable within the current function. Data-flow is the directed flow + All these actions highlight a specific set of variable tokens tracing the data flow + of the selected variable within the current function, defined as the directed flow of data from input variables through operations that manipulate their value to their output variables. The operations and variables chain together to form data-flow paths. @@ -863,7 +880,7 @@ Secondary Highlight

    - A secondary highlight is a semi-permanent token highlight in the decompiler + A secondary highlight is a semi-permanent token highlight in the Decompiler window that, unlike normal highlights, will not go away as the user clicks other tokens. The color and text being highlighted is controlled by the user and will persist for the duration of the Ghidra session or until the user @@ -907,31 +924,33 @@ Override the function prototype corresponding to the function under the cursor.

    - This action can be triggered at call sites, where the function + This action can be triggered at a call site, where the function being decompiled is calling into another function. Users must select either the token representing - the called function's name or the tokens representing the function pointer at the call site. - A dialog is brought up where the a complete function declaration, specifying - the return data-type along with the name and data-type for each input parameter. Additionally, - the "Calling Convention", "In Line", and "No Return" properties of the function prototype - can be set (See “Function Prototypes”). + the called function's name or one of the tokens representing the function pointer at the call site. + The action brings up a dialog where the function prototype + corresponding to the call site can be edited. The dialog provides fine-grained control of + the return data-type along with the name and data-type of each input parameter. + The function prototype properties Calling Convention, + In Line, and No Return + can also be set (see Function Prototypes).

    - Confirming the dialog forces the new function prototype on the decompiler's view of the called function, + Confirming the dialog forces the new function prototype on the Decompiler's view of the called function, but only for the single selected call site.

    This action is suitable for either indirect calls or direct calls to functions taking a variable number of arguments; situations where a complete description of all parameters is not available. For direct calls with a fixed number of arguments, it is almost always better to provide - parameter information by setting the function's prototype directly. See the - “Commit Params/Return” command for instance. In this situation, the "Override Signature" + parameter information by setting the function's prototype directly (see the + Commit Params/Return command). In this situation, the "Override Signature" command is still possible, but it will bring up a confirmation dialog.

    -Reference ...

    +References

    @@ -948,7 +967,7 @@ References to parameters and global variables are always listed. If the Dynamic Data Type Discovery option is on (see To Find Location References to Data Types), - the decompiler's propagation analysis is invoked on all functions to discover local + the Decompiler's propagation analysis is invoked on all functions to discover local variables as well.

    @@ -993,8 +1012,8 @@ the constant under the cursor.

    - The selected constant must have had either a “Convert” or a - “Set Equate ...” command applied to it. After applying this command, + The selected constant must have had either a Convert or a + Set Equate... command applied to it. After applying this command, the conversion is no longer applied, and the selected constant will be displayed using the decompiler's default strategy, which depends on the data-type of the constant and other display settings (See Integer format). @@ -1010,10 +1029,10 @@

    This action can only be triggered at call sites, where an overriding - prototype was previously placed by the “Override Signature” command. As with + prototype was previously placed by the Override Signature command. As with this command, users must select either the token representing the called function's name or the tokens representing the function pointer at the call site. The action causes the - override to be removed immediately. Parameter information will be drawn from the decompiler's + override to be removed immediately. Parameter information will be drawn from the Decompiler's normal analysis.

    @@ -1027,7 +1046,7 @@

    The current function can be renamed by selecting the name token within the function's - declaration at the top of the decompiler window, or individual called functions + declaration at the top of the Decompiler window, or individual called functions can be renamed by selecting their name token within a call expression. This action brings up a dialog containing a text field prepopulated with the name to be changed. The current namespace (and any parent namespaces) is @@ -1037,8 +1056,8 @@

    A new or child namespace can - be specified by prepending the base name with the namespace using the C++ '::' - separator characters. Any namespace path entered this way is considered relative + be specified by prepending the base name with the namespace using the C++ "::" + delimiter characters. Any namespace path entered this way is considered relative to the namespace set in the drop-down menu, so the Global namespace may need to be selected if the user wants to specify an absolute path. If any path element of the namespace does not exist, it is created. @@ -1049,26 +1068,6 @@

    - -
    -

    -Rename Label

    - -

    - Rename the label corresponding to the token under the cursor. -

    -

    - A label can be renamed by triggering this action while the corresponding label token is - under the cursor. This action brings up the - Edit Label Dialog. -

    -

    - The change will be immediately visible across all references to the label - (including in any Decompiler, Listing, and Functions windows). -

    -
    - -

    Rename Field

    @@ -1084,8 +1083,9 @@

    If the initial name looks like field_0x.., it may be that the field offset was - discovered by the decompiler, and the field does not exist in the structure definition. - In this case, a new field is created at that offset, with the new name and a data-type of "undefined". + discovered by the Decompiler, and the field does not exist in the structure definition. + In this case, a new field is created at that offset, with the new name and a data-type of + undefined.

    The change to the definition is visible globally @@ -1093,10 +1093,10 @@ is triggered again to incorporate the new name, but the output is otherwise unaffected.

    - Within a decompiler window, field name tokens are presented in context, + Within a Decompiler window, field name tokens are presented in context, showing how they are used within the code flow of the current function. - Combined with “Auto Create Structure” and - “Retype Field”, this action allows a + Combined with Auto Create Structure and + Retype Field, this action allows a structure to be created and filled in based on this context.

    @@ -1117,19 +1117,37 @@

    A new or child namespace can - be specified by prepending the base name with the namespace using the C++ '::' - separator characters. Any namespace path entered this way is considered relative + be specified by prepending the base name with the namespace using the C++ "::" + delimiter characters. Any namespace path entered this way is considered relative to the namespace set in the drop-down menu, so the Global namespace may need to be selected if the user wants to specify an absolute path. If any path element of the namespace does not exist, it is created.

    The change will be immediately visible across all references to the variable, - including the Decompiler and Listing windows. A new decompilation is triggered + including in Decompiler and Listing windows. A new decompilation is triggered to incorporate the new name, but the output is otherwise unaffected.

    +
    +

    +Rename Label

    + +

    + Rename the label corresponding to the token under the cursor. +

    +

    + A label can be renamed by triggering this action while the corresponding label token is + under the cursor. This action brings up the + Edit Label Dialog. +

    +

    + The change will be immediately visible across all references to the label + (including in any Decompiler, Listing, and Functions windows). +

    +
    +

    Rename Variable

    @@ -1148,11 +1166,11 @@ change, but otherwise the output is unaffected.

    - Local variables and parameters presented by the decompiler may be invented on-the-fly + Local variables and parameters presented by the Decompiler may be invented on-the-fly and don't necessarily have a formal annotation in Ghidra - (see “Variable Annotations”). Performing this action on + (see Variable Annotations). Performing this action on a variable will create an annotation if one didn't exist previously, which will - generally be visible as part of the function header in the Listing window. + generally be visible as part of the function header in any Listing window. A new annotation will not commit the data-type of the variable, and data-types applied later, and elsewhere in the function, can still propagate into the variable. @@ -1183,7 +1201,7 @@

    The change to the definition is visible globally throughout the Program, anywhere the data-type is referenced, and is forcing - on the decompiler (see “Forcing Data-types”). Decompilation is triggered + on the Decompiler (see Forcing Data-types). Decompilation is triggered again, and the new data-type is propagated from the point of the field reference(s). Changes to the output may be large and indirect.

    @@ -1204,8 +1222,8 @@

    The change is visible globally throughout the Program, anywhere the variable is - referenced, and is forcing on the decompiler - (see “Forcing Data-types”). Decompilation is triggered + referenced, and is forcing on the Decompiler + (see Forcing Data-types). Decompilation is triggered again, and the new data-type is propagated from the variable reference(s). Changes to the output may be large and indirect.

    @@ -1220,22 +1238,22 @@

    This action is only available from the data-type token in the function declaration, at the - top of the decompiler's output. It brings up a dialog prepopulated with the current + top of the Decompiler's output. It brings up a dialog prepopulated with the current data-type returned by the function. The user can select any fixed length data-type in the Program. Editing and confirming this dialog immediately changes the data-type. If an annotation for the return value (named <RETURN>) did not exist previously, one is created.

    As input parameter annotations and the return value annotation must be committed as a whole - (see the discussion of function prototypes in “Forcing Data-types”), if + (see the discussion of function prototypes in Forcing Data-types), if no prototype existed previously, this action also causes variable annotations for all input parameters to be created as well. In this situation, the action is equivalent to - “Commit Params/Return”, and a confirmation dialog comes up to notify the user. + Commit Params/Return, and a confirmation dialog comes up to notify the user.

      Setting a data-type on the return value using this action affects decompilation for the function itself and, additionally, any function that calls this function. Within a calling
    - function, the decompiler propagates the data-type into the variable or expression incorporating
    + function, the Decompiler propagates the data-type into the variable or expression incorporating
      the return value at each call site.

    @@ -1254,17 +1272,17 @@
      the data-type.

    - The change to the data-type is forcing on the decompiler
    - (see “Forcing Data-types”). Decompilation is triggered again, and the new
    + The change to the data-type is forcing on the Decompiler
    + (see Forcing Data-types). Decompilation is triggered again, and the new
      data-type is propagated from the variable reference(s). Changes to the output may be large and indirect.

    - Local variables and parameters presented by the decompiler may be invented on-the-fly
    + Local variables and parameters presented by the Decompiler may be invented on-the-fly
      and don't necessarily have a formal annotation in Ghidra
    - (see “Variable Annotations”). Performing this action on a variable
    + (see Variable Annotations). Performing this action on a variable
      will create an annotation if one didn't exist previously, which will generally be
    - visible as part of the function header in the Listing window.
    + visible as part of the function header in any Listing window.

      Performing this action on a function parameter causes a formal annotation to
    @@ -1282,7 +1300,7 @@

    -Set Equate ...

    +Set Equate...

      Change the display of the integer or character constant under the cursor to an equate
    @@ -1300,17 +1318,17 @@
      OK, completes the action, and the selected equate is substituted for its constant.

    - This command primarily targets a constant token in the Decompiler window, but
    + This command primarily targets a constant token in a Decompiler window, but
      if there is a scalar operand in an instruction that corresponds
    - with the selected constant, the same equate is also applied to the scalar in the Listing
    + with the selected constant, the same equate is also applied to the scalar in any Listing
      window. This is equivalent to selecting the
    - Set Equate command from the
    - Listing. There may not be a scalar operand directly corresponding to the selected constant, in
    + Set Equate command from a
    + Listing window. There may not be a scalar operand directly corresponding to the selected constant, in
      which case the equate will be applied only in the Decompiler window.

      Once an equate is applied, the constant can be returned to its default display
    - by selecting the “Remove Convert/Equate” command.
    + by selecting the Remove Convert/Equate command.

    @@ -1323,11 +1341,11 @@
      possible range.

    - The decompiler defines high-level variables in terms of varnodes that
    + The Decompiler defines high-level variables in terms of varnodes that
      are merged together to produce the final variable. Some merging is speculative, which reduces the number of variables overall, but is not strictly necessary for valid decompilation. The merged
    - variable can be represented with two or more variables that have a smaller range. See the
    - documentation on “HighVariable”.
    + variable can be represented with two or more variables that have a smaller range (see the
    + documentation on HighVariable).

      This command is only available if the selected token is part of a high-level variable that has
    diff --git a/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/images/document-properties.png b/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/images/document-properties.png
    new file mode 100644
    index 0000000000..ab0e8ea377
    Binary files /dev/null and b/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/images/document-properties.png differ
    diff --git a/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/images/openFolder.png b/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/images/openFolder.png
    new file mode 100644
    index 0000000000..14cf972f07
    Binary files /dev/null and b/Ghidra/Features/Decompiler/src/main/help/help/topics/DecompilePlugin/images/openFolder.png differ