How do compilers like LLVM compute def-use chains while constructing SSA-form IR?

While working on SSA-based optimizations, I found that def-use and use-def chains are a fundamental issue. The materials I could find basically all just use the traditional data-flow approach (an extended liveness analysis algorithm, cs.rice.edu/~keith/512/), but what I want to know is how the def-use chains of instructions are computed while constructing SSA-form intermediate code from the AST. Thanks, everyone! If it's a lot of material, just a file name in LLVM or a paper would be enough, thanks!
The asker's question is really two questions. Or rather, I know the asker cares about the latter, but the way the question is phrased makes it look like the former:
  1. When creating SSA-form LLVM IR, how is the def-use information between SSA values established?
  2. When promoting local variables expressed via alloca/load/store to SSA values, how does LLVM derive the def-use relationships?

1. How is the def-use information between SSA values established?

Let's look at the former first. Assume our IR is already in SSA form, and all we need is to establish the def-use/use-def relationships between SSA values.

It's quite simple: in LLVM IR, the def-use and use-def information of SSA values is maintained by one and the same set of bidirectional references. So as soon as A uses B, both the use-def and the def-use relationship between A and B are established.
The data structure that maintains these bidirectional references is called "Use". The official LLVM documentation has a very good introduction to it: LLVM Programmer's Manual: The User and owned Use classes' memory layout
Please read that part of the documentation before reading on.
Interestingly, this kind of "Use" structure is also employed in V8's TurboFan and in the Halfmoon compiler of the Adobe Flash VM.

As an example, suppose we want to create an Add instruction via BinaryOperator::CreateAdd(Value *V1, Value *V2, const Twine &Name). How does the newly created Add (BinaryOperator) instruction establish the def-use/use-def links with its two inputs (V1 and V2)? See the code below:
class Value {
  void addUse(Use &U) { U.addToList(&UseList); }

  // ...
};

class Use {
  Value *Val;
  Use *Next;
  PointerIntPair<Use **, 2, PrevPtrTag> Prev;

  // ...
};

void Use::set(Value *V) {
  if (Val) removeFromList();
  Val = V;
  if (V) V->addUse(*this);
}

Value *Use::operator=(Value *RHS) {
  set(RHS);
  return RHS;
}

class User : public Value {
  template <int Idx, typename U> static Use &OpFrom(const U *that) {
    return Idx < 0
      ? OperandTraits<U>::op_end(const_cast<U*>(that))[Idx]
      : OperandTraits<U>::op_begin(const_cast<U*>(that))[Idx];
  }
  template <int Idx> Use &Op() {
    return OpFrom<Idx>(this);
  }
  template <int Idx> const Use &Op() const {
    return OpFrom<Idx>(this);
  }

  // ...
};

class Instruction : public User,
                    public ilist_node_with_parent<Instruction, BasicBlock> {
  // ...
};

class BinaryOperator : public Instruction {
  /// Construct a binary instruction, given the opcode and the two
  /// operands.  Optionally (if InstBefore is specified) insert the instruction
  /// into a BasicBlock right before the specified instruction.  The specified
  /// Instruction is allowed to be a dereferenced end iterator.
  ///
  static BinaryOperator *Create(BinaryOps Op, Value *S1, Value *S2,
                                const Twine &Name = Twine(),
                                Instruction *InsertBefore = nullptr);

  // ...
};

BinaryOperator::BinaryOperator(BinaryOps iType, Value *S1, Value *S2,
                               Type *Ty, const Twine &Name,
                               Instruction *InsertBefore)
  : Instruction(Ty, iType,
                OperandTraits<BinaryOperator>::op_begin(this),
                OperandTraits<BinaryOperator>::operands(this),
                InsertBefore) {
  Op<0>() = S1;
  Op<1>() = S2;
  init(iType);
  setName(Name);
}

BinaryOperator *BinaryOperator::Create(BinaryOps Op, Value *S1, Value *S2,
                                       const Twine &Name,
                                       Instruction *InsertBefore) {
  assert(S1->getType() == S2->getType() &&
         "Cannot create binary operator with two operands of differing type!");
  return new BinaryOperator(Op, S1, S2, S1->getType(), Name, InsertBefore);
}
Starting from BinaryOperator's constructor, you will see:
  Op<0>() = S1;
  Op<1>() = S2;
which looks rather interesting. Tracing it to its source leads to Use::set(Value *V), where the Use object is used to wire up the bidirectional reference between def and use.
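To make the mechanism concrete, here is a minimal standalone sketch of the same idea (my own simplified mock-up, not LLVM's actual classes): each Value keeps an intrusive linked list of Use nodes, and assigning a new Value to an operand slot unlinks the Use from the old def's list and links it into the new one, so def-use and use-def always stay in sync.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Toy sketch of LLVM's bidirectional def-use linking (not real LLVM code).
struct User;

struct Use {
    struct Value *Val = nullptr; // the def this operand currently points at
    User *Parent = nullptr;      // the instruction owning this operand slot
    Use *Next = nullptr;         // next Use in the same Value's use list
    Use **Prev = nullptr;        // address of the pointer that points at us

    void removeFromList() {
        *Prev = Next;
        if (Next) Next->Prev = Prev;
    }
    void set(Value *V);
};

struct Value {
    std::string Name;
    Use *UseList = nullptr; // head of the list of all Uses of this value

    explicit Value(std::string N) : Name(std::move(N)) {}

    void addUse(Use &U) {
        U.Next = UseList;
        if (UseList) UseList->Prev = &U.Next;
        U.Prev = &UseList;
        UseList = &U;
    }
    unsigned getNumUses() const {
        unsigned N = 0;
        for (const Use *U = UseList; U; U = U->Next) ++N;
        return N;
    }
};

void Use::set(Value *V) {
    if (Val) removeFromList(); // drop the old def-use edge, if any
    Val = V;
    if (V) V->addUse(*this);   // register on the new def's use list
}

// A "User" is itself a Value and owns its operand slots.
struct User : Value {
    std::vector<Use> Operands;
    User(std::string N, unsigned NumOps)
        : Value(std::move(N)), Operands(NumOps) {
        for (Use &U : Operands) U.Parent = this;
    }
    void setOperand(unsigned I, Value *V) { Operands[I].set(V); }
};
```

Re-pointing an operand via setOperand updates both ends in one step, which is the behavior the Op<0>() = S1 assignments above rely on.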

2. How do local variables in alloca/load/store form get into SSA form?

Keyword: the mem2reg pass
Source code:
lib/Transforms/Utils/Mem2Reg.cpp
lib/Transforms/Utils/PromoteMemoryToRegister.cpp

The essence of this second part is: assuming our IR is not actually in SSA form yet, how do we convert it into SSA-form IR? This involves computing where to insert Phi nodes (Phi insertion), as well as computing variable versions, i.e., variable renaming.
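For the Phi-insertion half, the key ingredient is the dominance frontier of each defining block. As a refresher, here is the standard Cooper-Harvey-Kennedy formulation in a toy form (my own code, not LLVM's), computing frontiers directly from immediate dominators:

```cpp
#include <cassert>
#include <set>
#include <vector>

// Toy sketch: dominance frontiers via the Cooper-Harvey-Kennedy algorithm.
// Blocks are ints; idom[b] is the immediate dominator of block b (the entry
// block is its own idom). DF(b) is the set of blocks where b's dominance
// ends, i.e. where phis for definitions made in b must be placed.
std::vector<std::set<int>>
dominanceFrontiers(const std::vector<std::vector<int>> &preds,
                   const std::vector<int> &idom) {
    std::vector<std::set<int>> df(preds.size());
    for (int b = 0; b < (int)preds.size(); ++b) {
        if (preds[b].size() < 2) continue; // only join points can be frontiers
        for (int p : preds[b]) {
            // Walk up from each predecessor to b's idom; every block on the
            // way dominates p but not b, so b is in its dominance frontier.
            int runner = p;
            while (runner != idom[b]) {
                df[runner].insert(b);
                runner = idom[runner];
            }
        }
    }
    return df;
}
```

On the foo example below (entry branching to if.then/if.else, joining at if.end), the frontier of both branch blocks is the join block, which is exactly where the phi for inc ends up.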

Although LLVM IR is in SSA form, if every front-end that generates LLVM IR had to work out on its own how to produce SSA form, that would be quite a hassle for front-ends.
So LLVM IR exploits the fact that "memory is not an SSA value" to open a back door as a compromise: when generating LLVM IR, a front-end may choose not to generate real SSA form, and instead emit local variables in alloca/load/store form:
  • use alloca to "declare" a variable, obtaining a pointer to it;
  • use store to write a value into the variable;
  • use load to read the value back out as an SSA value.
This way, reading and writing local variables works just like reading and writing ordinary memory, and does not need to be in SSA form.
Then, in the mem2reg pass, LLVM recognizes allocas that fit this pattern and promotes them to SSA values (eliminating the stores and loads in favor of ordinary SSA-level def-use/use-def relationships, inserting Phis at the appropriate places, and renaming variables).

This is exactly how Clang hands the task of producing SSA form over to LLVM: the Clang front-end only emits certain temporary values as SSA values; for explicit local variables, it just emits LLVM IR in alloca/load/store form.
Only after the IR is handed to LLVM and goes through the mem2reg pass does it truly enter ordinary SSA form.
The LLVM Tutorial mentions this as well: 7. Kaleidoscope: Extending the Language: Mutable Variables

Take, for example, a function like this:
int foo(int x, bool cond) {
  int inc;
  if (cond) {
    inc = 1;
  } else {
    inc = -1;
  }
  return x + inc;
}
The LLVM IR generated by Clang's front-end is:
; Function Attrs: nounwind uwtable
define i32 @_Z3fooib(i32 %x, i1 zeroext %cond) #0 {
entry:
  %x.addr = alloca i32, align 4
  %cond.addr = alloca i8, align 1
  %inc = alloca i32, align 4
  store i32 %x, i32* %x.addr, align 4
  %frombool = zext i1 %cond to i8
  store i8 %frombool, i8* %cond.addr, align 1
  %0 = load i8, i8* %cond.addr, align 1
  %tobool = trunc i8 %0 to i1
  br i1 %tobool, label %if.then, label %if.else

if.then:                                          ; preds = %entry
  store i32 1, i32* %inc, align 4
  br label %if.end

if.else:                                          ; preds = %entry
  store i32 -1, i32* %inc, align 4
  br label %if.end

if.end:                                           ; preds = %if.else, %if.then
  %1 = load i32, i32* %x.addr, align 4
  %2 = load i32, i32* %inc, align 4
  %add = add nsw i32 %1, %2
  ret i32 %add
}
Notice that every local variable is "declared" by a corresponding alloca at the very beginning of the function (the entry block), which reserves space for it on the stack. After that, assignments use store and reads use load, just like operating on ordinary memory.

Only after mem2reg does it truly enter SSA form:
; Function Attrs: nounwind uwtable
define i32 @_Z3fooib(i32 %x, i1 zeroext %cond) #0 {
entry:
  %frombool = zext i1 %cond to i8
  %tobool = trunc i8 %frombool to i1
  br i1 %tobool, label %if.then, label %if.else

if.then:                                          ; preds = %entry
  br label %if.end

if.else:                                          ; preds = %entry
  br label %if.end

if.end:                                           ; preds = %if.else, %if.then
  %inc.0 = phi i32 [ 1, %if.then ], [ -1, %if.else ]
  %add = add nsw i32 %x, %inc.0
  ret i32 %add
}
As you can see, in the SSA-form LLVM IR those local variables have become ordinary SSA values, and alloca/load/store are no longer needed.

In essence, LLVM's mem2reg pass recognizes the "local variable" pattern and constructs SSA form for those variables.
The algorithms it uses are also quite standard: the dominator tree is built with the classic Lengauer-Tarjan algorithm, and Phi placement is computed via iterated dominance frontiers (in the style of Cytron et al.).
A fellow student has already written a summary of mem2reg's algorithm, so I'll be lazy and not repeat it: cs.rice.edu/~mc29/Milin
  1. LLVM assumes that all locals are introduced in the entry basic block of a function with an alloca instruction. LLVM also assumes that all allocas appear contiguously at the start of the entry block. This assumption could easily be violated by the front-end, but it is a reasonable assumption to make.
  2. For each alloca seen in step 1, LLVM checks whether it is promotable based on the uses of this local. It is deemed promotable iff:
    (a) It is not used in a volatile instruction.
    (b) It is loaded or stored directly, i.e., its address is not taken.
  3. For each local selected for promotion in step 2, a list of blocks which use it and a list of blocks which define it are maintained by making a linear sweep over each instruction of each block.
  4. Some basic optimizations are performed:
    (a) Unused allocas are removed.
    (b) If there is only one defining block for an alloca, all loads which are dominated by the definition are replaced with the value.
    (c) Allocas which are read and written only within a single block can avoid the CFG traversal and PHI-node insertion: each load is simply replaced with the value from the nearest store.
  5. A dominator tree is constructed.
  6. For each candidate for promotion, the points at which to insert PHI nodes are computed as follows:
    (a) The list of blocks which use it without defining it (live-in, or upward-exposed, blocks) is determined with the help of the using and defining blocks recorded in step 3.
    (b) A priority queue keyed on dominator tree level is maintained so that the defining blocks are handled from the bottom of the dominator tree upwards. This is done by giving each block a level based on its position in the dominator tree.
    (c) For each node (root) in the priority queue:
      i. The iterated dominance frontier of a definition is computed by walking all dominator tree children of root, inspecting their CFG edges with targets elsewhere in the dominator tree. Only targets whose level is at most the root's level are added to the iterated dominance frontier.
      ii. PHI nodes are inserted at the beginning of each block in the iterated dominance frontier computed in the previous step. At this point each PHI node has one dummy argument per predecessor.
  7. Once all PHI nodes are prepared, a rename phase starts with a worklist containing just the entry block:
    (a) A hash table IncomingVals, a map from each alloca to its most recent name, is created. The most recent name of each alloca starts out as an undef value.
    (b) While (worklist != NULL):
      i. Remove block B from the worklist and mark B as visited.
      ii. For each instruction in B:
        A. If the instruction is a load from location L (where L is a promotable candidate) into value V, delete the load and replace all uses of V with the most recent value of L, i.e., IncomingVals[L].
        B. If the instruction is a store of value V to location L (where L is a promotable candidate), delete the store and set the most recent name of L, i.e., IncomingVals[L] = V.
        C. For each PHI node corresponding to an alloca L in each successor of B, fill in the corresponding PHI-node argument with the most recent name for that location, i.e., IncomingVals[L].
      iii. Add each unvisited successor of B to the worklist.
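The rename phase above can be sketched in miniature. This is my own toy version for a single promoted variable, not LLVM's code, and it walks the dominator tree depth-first instead of using LLVM's CFG worklist: each block holds an ordered list of events ("store:<val>", "load", or a leading "phi"), and we carry the variable's current value down the tree, resolving loads and filling in phi operands along CFG edges.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy sketch of mem2reg's rename phase for one promoted variable.
struct Block {
    std::vector<std::string> events;  // "store:<val>", "load", or "phi" first
    std::vector<int> succs;           // CFG successors
    std::vector<int> domChildren;     // children in the dominator tree
};

void rename(const std::vector<Block> &blocks, int b, std::string cur,
            std::map<int, std::string> &loadValues,              // block -> value its load sees
            std::map<int, std::vector<std::string>> &phiOperands) { // block -> incoming phi args
    for (const std::string &e : blocks[b].events) {
        if (e == "phi")
            cur = "phi." + std::to_string(b);  // the phi itself is the new def
        else if (e.rfind("store:", 0) == 0)
            cur = e.substr(6);                 // a store defines a new value
        else if (e == "load")
            loadValues[b] = cur;               // a load uses the current def
    }
    // For each CFG successor that starts with a phi, this edge contributes
    // the current value as one of the phi's incoming arguments.
    for (int s : blocks[b].succs)
        if (!blocks[s].events.empty() && blocks[s].events[0] == "phi")
            phiOperands[s].push_back(cur);
    // Recurse into dominator tree children, carrying the current value.
    for (int c : blocks[b].domChildren)
        rename(blocks, c, cur, loadValues, phiOperands);
}
```

Run on the foo example (stores of 1 and -1 in the two branches, a phi plus a load in the join block), the load resolves to the phi, and the phi receives 1 and -1 as its incoming arguments, matching the post-mem2reg IR shown earlier.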
OK, that's all for now.