Sparse API

Utilities

Pointer list manipulation

The data structure handled here is designed to hold pointers but two special cases need to be avoided or need special care:

  • NULL is used by {PREPARE,NEXT}_PTR_LIST() to indicate the end-of-list. Thus, NULL can’t be stored in lists using this API but is fine to use with FOR_EACH_PTR() and its variants.
  • VOID is used to replace a removed pseudo ‘usage’. Since phi-nodes (OP_PHI) use a list to store their operands, a VOID in a phi-node list must be ignored since it represents a removed operand. As consequence, VOIDs must never be used as phi-node operand. This is fine since phi-nodes make no sense with void values but VOID is also used for invalid types and in case of errors.
int ptr_list_size(struct ptr_list *head)

Get the size of a ptrlist.

Parameters:
  • head – the head of the list
Returns:

the size of the list given by head.

bool ptr_list_empty(const struct ptr_list *head)

Test if a list is empty.

Parameters:
  • head – the head of the list
Returns:

true if the list is empty, false otherwise.

bool ptr_list_multiple(const struct ptr_list *head)

Test is a list contains more than one element.

Parameters:
  • head – the head of the list
Returns:

true if the list has more than 1 element, false otherwise.

void *first_ptr_list(struct ptr_list *head)

Get the first element of a ptrlist.

Parameters:
  • head – the head of the list
Returns:

the first element of the list or NULL if the list is empty

void *last_ptr_list(struct ptr_list *head)

Get the last element of a ptrlist.

Parameters:
  • head – the head of the list
Returns:

the last element of the list or NULL if the list is empty

void *ptr_list_nth_entry(struct ptr_list *list, unsigned int idx)

Get the nth element of a ptrlist.

Parameters:
  • head – the head of the list
Returns:

the nth element of the list or NULL if the list is too short.

int linearize_ptr_list(struct ptr_list *head, void **arr, int max)

Linearize the entries of a list.

Parameters:
  • head – the list to be linearized
  • arr – a void* array to fill with head’s entries
  • max – the maximum number of entries to store into arr
Returns:

the number of entries linearized.

Linearize the entries of a list up to a total of max, and return the nr of entries linearized.

The array to linearize into (arr) should really be void *x[], but we want to let people fill in any kind of pointer array, so let’s just call it void **.

void pack_ptr_list(struct ptr_list **listp)

Pack a ptrlist.

Parameters:
  • listp – a pointer to the list to be packed.

When we’ve walked the list and deleted entries, we may need to re-pack it so that we don’t have any empty blocks left (empty blocks upset the walking code).

void split_ptr_list_head(struct ptr_list *head)

Split a ptrlist block.

Parameters:
  • head – the ptrlist block to be split

A new block is inserted just after head and the entries at the half end of head are moved to this new block. The goal being to create space inside head for a new entry.

void **__add_ptr_list(struct ptr_list **listp, void *ptr)

Add an entry to a ptrlist.

Parameters:
  • listp – a pointer to the list
  • ptr – the entry to add to the list
Returns:

the address where the new entry is stored.

Note:

code must not use this function and should use add_ptr_list() instead.

void **__add_ptr_list_tag(struct ptr_list **listp, void *ptr, unsigned long tag)

Add a tagged entry to a ptrlist.

Parameters:
  • listp – a pointer to the list
  • ptr – the entry to add to the list
  • tag – the tag to add to ptr
Returns:

the address where the new entry is stored.

Note:

code must not use this function and should use add_ptr_list_tag() instead.

bool lookup_ptr_list_entry(const struct ptr_list *head, const void *entry)

Test if some entry is already present in a ptrlist.

Parameters:
  • list – the head of the list
  • entry – the entry to test
Returns:

true if the entry is already present, false otherwise.

int delete_ptr_list_entry(struct ptr_list **list, void *entry, int count)

Delete an entry from a ptrlist.

Parameters:
  • list – a pointer to the list
  • entry – the item to be deleted
  • count – the minimum number of times entry should be deleted or 0.
int replace_ptr_list_entry(struct ptr_list **list, void *old_ptr, void *new_ptr, int count)

Replace an entry in a ptrlist.

Parameters:
  • list – a pointer to the list
  • old_ptr – the entry to be replaced
  • new_ptr – the new entry
  • count – the minimum number of times entry should be deleted or 0.
void * undo_ptr_list_last(struct ptr_list **head)

Remove the last entry of a ptrlist.

Parameters:
  • head – a pointer to the list
Returns:

the last element of the list or NULL if the list is empty.

Note:

this doesn’t repack the list

void * delete_ptr_list_last(struct ptr_list **head)

Remove the last entry and repack the list.

Parameters:
  • head – a pointer to the list
Returns:

the last element of the list or NULL if the list is empty.

void concat_ptr_list(struct ptr_list *a, struct ptr_list **b)

Concat two ptrlists.

Parameters:
  • a – the source list
  • b – a pointer to the destination list.

The element of a are added at the end of b.

void copy_ptr_list(struct ptr_list **listp, struct ptr_list *src)

Copy the elements of a list at the end of another list.

Parameters:
  • listp – a pointer to the destination list.
  • src – the head of the source list.
void __free_ptr_list(struct ptr_list **listp)

Free a ptrlist.

Parameters:
  • listp – a pointer to the list

Each blocks of the list are freed (but the entries themselves are not freed).

Note:code must not use this function and should use the macro free_ptr_list() instead.

Miscellaneous utilities

unsigned int hexval(unsigned int c)

Return the value coresponding to an hexadecimal digit.

void *xmemdup(const void *src, size_t len)

Duplicate a memory buffer in a newly allocated buffer.

Parameters:
  • src – a pointer to the memory buffer to be duplicated
  • len – the size of the memory buffer to be duplicated
Returns:

a pointer to a copy of src allocated via __alloc_bytes().

char *xstrdup(const char *src)

Duplicate a null-terminated string in a newly allocated buffer.

Parameters:
  • src – a pointer to string to be duplicated
Returns:

a pointer to a copy of str allocated via __alloc_bytes().

char *xasprintf(const char *fmt, ...)

Printf to allocated string.

Parameters:
  • fmt – the format followed by its arguments.
Returns:

the allocated & formatted string.

This function is similar to asprintf() but the resulting string is allocated with __alloc_bytes().

char *xvasprintf(const char *fmt, va_list ap)

Vprintf to allocated string.

Parameters:
  • fmt – the format
  • ap – the variadic arguments
Returns:

the allocated & formatted string.

This function is similar to asprintf() but the resulting string is allocated with __alloc_bytes().

Utilities for flowgraphs

int cfg_postorder(struct entrypoint *ep)

Set the BB’s reverse postorder links.

Each BB will also have its ‘order number’ set.

void domtree_build(struct entrypoint *ep)

Build the dominance tree.

Each BB will then have:
  • a link to its immediate dominator (::idom)
  • the list of BB it immediately dominates (::doms)
  • its level in the dominance tree (::dom_level)
bool domtree_dominates(struct basic_block *a, struct basic_block *b)

Test the dominance between two basic blocks.

Parameters:
  • a – the basic block expected to dominate
  • b – the basic block expected to be dominated
Returns:

true if a dominates b, false otherwise.

Parsing

Constant expression values

int is_zero_constant(struct expression *expr)

Test if an expression evaluates to the constant 0.

Returns:1 if expr evaluate to 0, 0 otherwise.
int expr_truth_value(struct expression *expr)

Test the compile time truth value of an expression.

Returns:
  • -1 if expr is not constant,
  • 0 or 1 depending on the truth value of expr.

Typing

struct symbol *evaluate_expression(struct expression *expr)

Evaluate the type of an expression.

Parameters:
  • expr – the expression to be evaluated
Returns:

the type of the expression or NULL if the expression can’t be evaluated

struct symbol *evaluate_statement(struct statement *stmt)

Evaluate the type of a statement.

Parameters:
  • stmt – the statement to be evaluated
Returns:

the type of the statement or NULL if it can’t be evaluated

void evaluate_symbol_list(struct symbol_list *list)

Evaluate the type of a set of symbols.

Parameters:
  • list – the list of the symbol to be evaluated
int evaluate_arguments(struct symbol_list *argtypes, struct expression_list *args)

Evaluate the arguments of a function.

Parameters:
  • argtypes – the list of the types in the prototype
  • args – the list of the effective arguments

Optimization

Optimization main loop

void optimize(struct entrypoint *ep)

Optimization main loop.

Flow simplification

static struct instruction *get_phinode(struct instruction *phisrc)

Returns the phi-node corresponding to a phi-source.

static int bb_is_trivial(struct basic_block *bb)

Check if the BB is empty or only contains phi-sources.

static bool bb_is_forwarder(struct basic_block *bb)

Does the BB contains ignorable instructions but a final branch?

Note:something could be done for phi-sources but … we’ll see.
static int retarget_bb(struct basic_block *dom, struct basic_block *old, struct basic_block *new)

Do jump threading in dominated BBs.

Parameters:
  • dom – the BB which must dominate the modified BBs.
  • old – the old target BB
  • new – the new target BB
Returns:

0 if no chnages have been made, 1 otherwise.

In all BB dominated by dom, rewrite branches to old into branches to new

static int merge_bb(struct basic_block *top, struct basic_block *bot)

Merge two BBs.

Parameters:
  • top – the first BB to be merged
  • bot – the second BB to be merged
int simplify_cfg_early(struct entrypoint *ep)

Early simplification of the CFG.

Three things are done here:
# inactive BB are removed # branches to a ‘forwarder’ BB are redirected to the forwardee. # merge single-child/single-parent BBs.

Instruction simplification

Notation

The following conventions are used to describe the simplications:

  • Uppercase letters are reserved for constants:
    • M for a constant mask,
    • S for a constant shift,
    • N for a constant number of bits (usually other than a shift),
    • C or ‘K’ for others constants.
  • Lowercase letters a, b, x, y, … are used for non-constants or when it doesn’t matter if the pseudo is a constant or not.
  • Primes are used if needed to distinguish symbols (M, M’, …).
  • Expressions or sub-expressions involving only constants are understood to be evaluated.
  • $mask(N) is used for ((1 << N) -1)
  • $trunc(x, N) is used for (x & $mask(N))
  • Expressions like (-1 << S), (-1 >> S) and others formulae are understood to be truncated to the size of the current instruction (needed, since in general this size is not the same as the one used by sparse for the evaluation of arithmetic operations).
  • TRUNC(x, N) is used for a truncation to a size of N bits
  • ZEXT(x, N) is used for a zero-extension from a size of N bits
  • OP(x, C) is used to represent some generic operation using a constant, including when the constant is implicit (e.g. TRUNC(x, N)).
  • MASK(x, M) is used to respresent a ‘masking’ instruction:
    • AND(x, M)
    • LSR(x, S), with M = (-1 << S)
    • SHL(x, S), with M = (-1 >> S)
    • TRUNC(x, N), with M = $mask(N)
    • ZEXT(x, N), with M = $mask(N)
  • SHIFT(x, S) is used for LSR(x, S) or SHL(x, S).

Utilities

static struct basic_block *phi_parent(struct basic_block *source, pseudo_t pseudo)

Find the trivial parent for a phi-source.

static int get_phisources(struct instruction *sources[], int nbr, struct instruction *insn)

Copy the phi-node’s phisrcs into to given array.

Returns:0 if the the list contained the expected number of element, a positive number if there was more than expected and a negative one if less.
Note:we can’t reuse a function like linearize_ptr_list() because any VOIDs in the phi-list must be ignored here as in this context they mean ‘entry has been removed’.
static pseudo_t trivial_phi(pseudo_t pseudo, struct instruction *insn, struct pseudo_list **list)

Detect trivial phi-nodes.

Parameters:
  • insn – the phi-node
  • pseudo – the candidate resulting pseudo (NULL when starting)
Returns:

the unique result if the phi-node is trivial, NULL otherwise

A phi-node is trivial if it has a single possible result:
  • all operands are the same
  • the operands are themselves defined by a chain or cycle of phi-nodes
    and the set of all operands involved contains a single value not defined by these phi-nodes

Since the result is unique, these phi-nodes can be removed.

int kill_insn(struct instruction *insn, int force)

Kill an instruction.

Parameters:
  • insn – the instruction to be killed
  • force – if unset, the normal case, the instruction is not killed if not free of possible side-effect; if set the instruction is unconditionally killed.

The killed instruction is removed from its BB and the usage of all its operands are removed. The instruction is also marked as killed by setting its ->bb to NULL.

static inline bool is_signed_constant(long long val, unsigned osize, unsigned nsize)

Is this same signed value when interpreted with both size?

static inline pseudo_t is_same_op(pseudo_t src, int op, unsigned osize)

Is @src generated by an instruction with the given opcode and size?

static inline int replace_pseudo(struct instruction *insn, pseudo_t *pp, pseudo_t new)

Replace the operand of an instruction.

Parameters:
  • insn – the instruction
  • pp – the address of the instruction’s operand
  • new – the new value for the operand
Returns:

REPEAT_CSE.

static inline int replace_with_unop(struct instruction *insn, int op, pseudo_t src)

Replace a binop with an unop.

Parameters:
  • insn – the instruction to be replaced
  • op – the instruction’s new opcode
  • src – the instruction’s new operand
Returns:

REPEAT_CSE

static inline int replace_binop_value(struct instruction *insn, int op, long long val)

Replace rightside’s value.

Parameters:
  • insn – the instruction to be replaced
  • op – the instruction’s new opcode
  • src – the instruction’s new operand
Returns:

REPEAT_CSE

static inline int replace_opcode(struct instruction *insn, int op)

Replace the opcode of an instruction.

Returns:REPEAT_CSE
static unsigned int operand_size(struct instruction *insn, pseudo_t pseudo)

Try to determine the maximum size of bits in a pseudo.

Right now this only follow casts and constant values, but we could look at things like AND instructions, etc.

Simplifications

static int simplify_mask_or_and(struct instruction *insn, unsigned long long mask, pseudo_t ora, pseudo_t orb)

Try to simplify MASK(OR(AND(x, M’), b), M).

Parameters:
  • insn – the masking instruction
  • mask – the associated mask (M)
  • ora – one of the OR’s operands, guaranteed to be PSEUDO_REG
  • orb – the other OR’s operand
Returns:

0 if no changes have been made, one or more REPEAT_* flags otherwise.

static int simplify_mask_or(struct instruction *insn, unsigned long long mask, struct instruction *or)

Try to simplify MASK(OR(a, b), M).

Parameters:
  • insn – the masking instruction
  • mask – the associated mask (M)
  • or – the OR instruction
Returns:

0 if no changes have been made, one or more REPEAT_* flags otherwise.

static int simplify_mask_shift_or(struct instruction *sh, struct instruction *or, unsigned long long mask)

Try to simplify MASK(SHIFT(OR(a, b), S), M).

Parameters:
  • sh – the shift instruction
  • or – the OR instruction
  • mask – the mask associated to MASK (M):
Returns:

0 if no changes have been made, one or more REPEAT_* flags otherwise.

static int canonical_order(pseudo_t p1, pseudo_t p2)

Check if the given pseudos are in canonical order.

The canonical order is VOID < UNDEF < PHI < REG < ARG < SYM < VAL The rationale is:

  • VALs at right (they don’t need a definition)
  • REGs at left (they need a defining instruction)
  • SYMs & ARGs between REGs & VALs
  • REGs & ARGs are ordered between themselves by their internal number
  • SYMs are ordered between themselves by address
  • VOID, UNDEF and PHI are uninteresting (but VOID should have type 0)
static int simplify_memop(struct instruction *insn)

Simplify memops instructions.

Note:We walk the whole chain of adds/subs backwards. That’s not only more efficient, but it allows us to find loops.
static int simplify_cond_branch(struct instruction *br, struct instruction *def, pseudo_t newcond)

Simplify SET_NE/EQ $0 + BR.