Control Flow Graphs
A control flow graph (often referenced by its acronym, CFG) is a directed graph that models the flow of a program using basic blocks that contain the intermediate representation (abbreviated “IR”, and in this case is Python bytecode) within the blocks. Basic blocks themselves are a block of IR that has a single entry point but possibly multiple exit points. The single entry point is the key to basic blocks; it all has to do with jumps. An entry point is the target of something that changes control flow (such as a function call or a jump) while exit points are instructions that would change the flow of the program (such as jumps and ‘return’ statements). What this means is that a basic block is a chunk of code that starts at the entry point and runs to an exit point or the end of the block.
As an example, consider an ‘if’ statement with an ‘else’ block. The guard on the ‘if’ is a basic block which is pointed to by the basic block containing the code leading to the ‘if’ statement. The ‘if’ statement block contains jumps (which are exit points) to the true body of the ‘if’ and the ‘else’ body (which may be
NULL), each of which are their own basic blocks. Both of those blocks in turn point to the basic block representing the code following the entire ‘if’ statement.
CFGs are usually one step away from final code output. Code is directly generated from the basic blocks (with jump targets adjusted based on the output order) by doing a post-order depth-first search on the CFG following the edges.