https://www.fuzzingbook.org/html/Coverage.html

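Coverage is one of the metrics used in software testing; it measures how much of the code was executed.
Coverage is measured by instrumenting the program.
Instrumentation means inserting, between the lines of the existing code, extra code that records whether those lines are executed.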
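
As a minimal sketch of what instrumentation means, the probes below are inserted by hand into a small function. The function `absolute`, the set `covered`, and the probe numbers are all hypothetical names chosen for illustration; real tools insert such probes automatically.

```python
covered = set()  # probe IDs that have been reached so far

def absolute(x):
    covered.add(1)      # probe: function entry
    if x < 0:
        covered.add(2)  # probe: negative branch
        x = -x
    covered.add(3)      # probe: just before returning
    return x

absolute(5)
after_positive = set(covered)   # probe 2 has not been reached yet
absolute(-5)
after_negative = set(covered)   # now every probe has been reached
print(sorted(after_positive), sorted(after_negative))
```

Comparing the two snapshots shows which part of the function the first input left unexecuted.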

Function coverage: measures whether each function was executed at least once. Because it only checks that the function was called, it cannot tell whether all of the code inside the function was executed.

Statement coverage: measures whether each statement was executed.
- Pro: can be applied directly to object code.
- Con: may fail to detect errors that depend on how a conditional branch is taken.
  int *p = NULL;
  if (condition)
      p = &a;
  *p = 123;

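If condition is false, this code dereferences a null pointer. A test in which condition is true executes every statement, so statement coverage is reported as complete even though the failing case was never tested.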
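
The same gap can be sketched in Python. The function `store` is a hypothetical analogue of the C fragment above, with `None` standing in for the null pointer; it is an illustration, not part of the original notes.

```python
def store(condition):
    p = None
    if condition:
        p = [0]
    p[0] = 123   # fails when p is still None
    return p

# This single call executes every statement in store(),
# so statement coverage is already complete.
result = store(True)

# The untested case still fails.
try:
    store(False)
    failed = False
except TypeError:
    failed = True
print(result, failed)
```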

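Branch coverage: measures whether each conditional branch was executed with both a true and a false outcome.
- Pro: makes up for the weakness of statement coverage.
- Con: does not examine the individual parts of a condition, so it may still miss errors.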
  if(condition1 && (condition2 || function()))

As soon as condition2 is true, function() is not executed (short-circuit evaluation), so both branches can be taken while function() itself goes untested.
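
A small sketch of this effect in Python, whose `and` and `or` short-circuit like `&&` and `||` in C. The names `check`, `function`, and `calls` are made up for the example.

```python
calls = []  # records every time function() actually runs

def function():
    calls.append("called")
    return True

def check(condition1, condition2):
    if condition1 and (condition2 or function()):
        return "then"
    return "else"

# Both outcomes of the if are exercised, so branch coverage is complete...
outcomes = [check(True, True), check(False, False)]
calls_after_branch_tests = list(calls)   # ...yet function() never ran

# Only this combination forces function() to be evaluated.
check(True, False)
calls_after_extra_test = list(calls)
print(outcomes, calls_after_branch_tests, calls_after_extra_test)
```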

Condition coverage: measures whether every boolean sub-expression in a conditional has evaluated to both true and false.
- Pro: makes up for the weakness of branch coverage.
  if(condition1 || condition2 && condition3)

This checks that condition1, condition2, and condition3 each take both the value true and the value false.
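
One way to picture this check is the sketch below. The helper `has_condition_coverage` is hypothetical, and it assumes the three condition values are given up front, so short-circuit evaluation is ignored.

```python
def has_condition_coverage(test_vectors):
    """True if every condition is both True and False somewhere in the tests."""
    columns = zip(*test_vectors)  # one column of values per condition
    return all(set(column) == {True, False} for column in columns)

# Each tuple is (condition1, condition2, condition3).
complete = [(True, True, True), (False, False, False)]
incomplete = [(True, True, True), (True, False, True)]  # condition1 is never False

print(has_condition_coverage(complete))
print(has_condition_coverage(incomplete))
```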

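Path coverage: checks whether every possible path through each function has been executed.

A path is a sequence of consecutive branches from the start of the function to its end.
Each path is unique.
- Pro: allows rigorous testing.
- Con: the number of paths is very large.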
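
To get a feel for how quickly paths multiply, the sketch below counts them for a function that consists of a sequence of independent if statements. This is a simplifying assumption: loops and nested branches change the count. The helper `enumerate_paths` is hypothetical.

```python
from itertools import product

def enumerate_paths(num_branches):
    """Every combination of taken / not taken for a sequence of independent ifs."""
    return list(product([True, False], repeat=num_branches))

paths_3 = enumerate_paths(3)
paths_10 = enumerate_paths(10)
print(len(paths_3), len(paths_10))
```

Each additional independent branch doubles the number of paths, which is why full path coverage is rarely practical.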

Synopsis

Coverage information can be used to steer fuzzing toward uncovered locations (guided fuzzing).

A CGI Decoder

CGI encoding is used in URLs.
Spaces are replaced by '+', and invalid characters are replaced by '%xx', where xx is a two-digit hexadecimal value.
"Hello, world!" -> "Hello%2c+world%21"
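
The encoding can be tried out with Python's standard library. This sketch uses `urllib.parse.quote_plus` and `unquote_plus`, which are not mentioned in the original notes; note that `quote_plus` emits uppercase hex digits, while decoding accepts either case.

```python
from urllib.parse import quote_plus, unquote_plus

encoded = quote_plus("Hello, world!")          # space -> '+', ',' and '!' -> %xx
decoded = unquote_plus("Hello%2c+world%21")    # lowercase hex digits decode too
print(encoded)
print(decoded)
```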

def cgi_decode(s: str) -> str:
    """Decode the CGI-encoded string `s`:
       * replace '+' by ' '
       * replace "%xx" by the character with hex number xx.
       Return the decoded string.  Raise `ValueError` for invalid inputs."""

    # Mapping of hex digits to their integer values
    hex_values = {
        '0': 0, '1': 1, '2': 2, '3': 3, '4': 4,
        '5': 5, '6': 6, '7': 7, '8': 8, '9': 9,
        'a': 10, 'b': 11, 'c': 12, 'd': 13, 'e': 14, 'f': 15,
        'A': 10, 'B': 11, 'C': 12, 'D': 13, 'E': 14, 'F': 15,
    }

    t = ""
    i = 0
    while i < len(s):
        c = s[i]
        if c == '+':
            t += ' '
        elif c == '%':
            digit_high, digit_low = s[i + 1], s[i + 2]
            i += 2
            if digit_high in hex_values and digit_low in hex_values:
                v = hex_values[digit_high] * 16 + hex_values[digit_low]
                t += chr(v)
            else:
                raise ValueError("Invalid encoding")
        else:
            t += c
        i += 1
    return t

There are two ways to test cgi_decode() systematically: black-box testing and white-box testing.

Black-Box Testing

Black-box testing derives tests from the specification.
It checks the behavior of the software without knowledge of its internal structure or workings. For cgi_decode(), this means:
- testing for correct replacement of ‘+’;
- testing for correct replacement of “%xx”;
- testing for non-replacement of other characters; and
- testing for recognition of illegal inputs.
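
The four items above might translate into assertions like the following. To keep the example self-contained, it uses a condensed stand-in for cgi_decode() (an assumption; it is meant to behave like the decoder above but is not the original code), and the test inputs are chosen for illustration.

```python
HEX = "0123456789abcdefABCDEF"

def cgi_decode(s):
    """Condensed stand-in for the decoder above."""
    t, i = "", 0
    while i < len(s):
        c = s[i]
        if c == '+':
            t += ' '
        elif c == '%':
            high, low = s[i + 1], s[i + 2]
            i += 2
            if high in HEX and low in HEX:
                t += chr(int(high + low, 16))
            else:
                raise ValueError("Invalid encoding")
        else:
            t += c
        i += 1
    return t

assert cgi_decode('+') == ' '        # correct replacement of '+'
assert cgi_decode('%20') == ' '      # correct replacement of "%xx"
assert cgi_decode('abc') == 'abc'    # other characters are left alone

try:                                 # illegal input is recognized
    cgi_decode('%?a')
    rejected = False
except ValueError:
    rejected = True
assert rejected
```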

White-Box Testing

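White-box testing derives tests from the internal structure of the implementation.
It is closely tied to the structural features of the code. For cgi_decode(), the tests should cover:
- The block following if c == '+'
- The two blocks following if c == '%' (one for valid input, one for invalid)
- The final else case for all other characters.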

Tracing Executions

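In Python, executions can be traced with the function sys.settrace().
Define a tracing function f and call sys.settrace(f); f is then called for every line executed.
The tracing function takes three parameters: frame, event, and arg.
- The frame parameter is the current frame, giving access to the current location and variables.
  - frame.f_code is the code object being executed in the current frame.
  - frame.f_code.co_name is the name of the function.
  - frame.f_lineno is the current line number.
  - frame.f_locals holds the current local variables and arguments.
- The event parameter is a string.
  - "line" means a new line is about to be executed.
  - "call" means a function is being called.
- The arg parameter is an additional argument for some events.
  - For the "return" event, arg holds the value being returned.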
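
The three parameters can be observed directly with a small experiment. In this sketch, `tracer`, `add`, and `events` are names made up for the example; the tracer simply records the event, the function name, and arg for each call it receives.

```python
import sys

events = []  # (event, function name, arg) for every tracer invocation

def tracer(frame, event, arg):
    events.append((event, frame.f_code.co_name, arg))
    return tracer  # keep tracing inside this frame

def add(a, b):
    total = a + b
    return total

sys.settrace(tracer)
add(1, 2)
sys.settrace(None)

for entry in events:
    print(entry)
```

The recorded list should start with a "call" event, continue with one "line" event per executed line, and end with a "return" event whose arg is the returned value.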

import sys
from types import FrameType
from typing import Any, Callable, Optional

coverage = []  # line numbers executed so far

def traceit(frame: FrameType, event: str, arg: Any) -> Optional[Callable]:
    """Trace program execution. To be passed to sys.settrace()."""
    if event == 'line':
        global coverage
        function_name = frame.f_code.co_name
        lineno = frame.f_lineno
        coverage.append(lineno)

    return traceit

This declares the tracing function traceit. Each time a new line is executed, it appends that line's number to the global list coverage.

def cgi_decode_traced(s: str) -> None:
    global coverage
    coverage = []
    sys.settrace(traceit)  # Turn on
    cgi_decode(s)
    sys.settrace(None)    # Turn off

The call sys.settrace(traceit) turns tracing on before cgi_decode(), the function to be traced, is called.
Afterwards, sys.settrace(None) turns tracing off.
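
If the traced function raises an exception, the "turn off" call above is never reached. One possible variation, sketched here with hypothetical names (`trace_lines`, `sign`, `boom`), wraps the call in try/finally and reports line numbers relative to the function's def line.

```python
import sys

def trace_lines(func, *args):
    """Run func(*args); return the executed lines of func, relative to its def line."""
    lines = []
    code = func.__code__

    def tracer(frame, event, arg):
        if event == 'line' and frame.f_code is code:
            lines.append(frame.f_lineno - code.co_firstlineno)
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)  # always turn tracing off, even on exceptions
    return lines

def sign(x):
    if x < 0:
        return -1
    return 1

def boom():
    raise RuntimeError("failure inside the traced function")

negative_lines = trace_lines(sign, -5)
positive_lines = trace_lines(sign, 5)
print(negative_lines, positive_lines)
```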

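import inspect

# Print the source of cgi_decode(), prefixing every covered line with "# "
source_lines, start_line = inspect.getsourcelines(cgi_decode)
covered_lines = set(coverage)
for offset, line in enumerate(source_lines):
    lineno = start_line + offset
    marker = "# " if lineno in covered_lines else "  "
    print("%s%2d  %s" % (marker, lineno, line), end="")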

Running the code above lists the source with the covered lines marked, so covered and uncovered lines can be told apart.


with Coverage() as cov:
    cgi_decode("a+b")
print(cov)

Given "a+b", the output shows that the if c == '+' block is covered while the elif branch is not.

A Coverage Class

with OBJECT [as VARIABLE]:
    BODY

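The usual pattern for measuring coverage in Python is a with statement: OBJECT is evaluated and BODY is executed,
and OBJECT.__enter__() and OBJECT.__exit__() are called automatically at the start and end of BODY.
Coverage.__enter__() turns tracing on,
and Coverage.__exit__() turns it off.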
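
The order of these automatic calls can be checked with a toy context manager. The class `Logged` below is hypothetical and has nothing to do with tracing; it only records when each method runs.

```python
class Logged:
    """Toy context manager that records the order of the protocol calls."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")   # runs before BODY
        return self                   # becomes the VARIABLE after `as`

    def __exit__(self, exc_type, exc_value, tb):
        self.events.append("exit")    # runs after BODY, even on exceptions
        return None                   # do not swallow exceptions

with Logged() as log:
    log.events.append("body")

print(log.events)
```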


with Coverage() as cov:
    function_to_be_traced()
c = cov.coverage()

Tracing is on while function_to_be_traced() runs and is turned off again after the with block.

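import inspect
import sys
from types import FrameType, TracebackType
from typing import Any, Callable, List, Optional, Set, Tuple, Type

Location = Tuple[str, int]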

class Coverage:

    def __init__(self) -> None:
        """Constructor"""
        self._trace: List[Location] = []
    
    # Trace function
    def traceit(self, frame: FrameType, event: str, arg: Any) -> Optional[Callable]:
        """Tracing function. To be overloaded in subclasses."""
        if self.original_trace_function is not None:
            self.original_trace_function(frame, event, arg)
    
        if event == "line":
            function_name = frame.f_code.co_name
            lineno = frame.f_lineno
            if function_name != '__exit__':  # avoid tracing ourselves:
                self._trace.append((function_name, lineno))
    
        return self.traceit
    
    def __enter__(self) -> Any:
        """Start of `with` block. Turn on tracing."""
        self.original_trace_function = sys.gettrace()
        sys.settrace(self.traceit)
        return self
    
    def __exit__(self, exc_type: Type, exc_value: BaseException, 
                 tb: TracebackType) -> Optional[bool]:
        """End of `with` block. Turn off tracing."""
        sys.settrace(self.original_trace_function)
        return None  # default: pass all exceptions
    
    def trace(self) -> List[Location]:
        """The list of executed lines, as (function_name, line_number) pairs"""
        return self._trace
    
    def coverage(self) -> Set[Location]:
        """The set of executed lines, as (function_name, line_number) pairs"""
        return set(self.trace())
    
    def function_names(self) -> Set[str]:
        """The set of function names seen"""
        return set(function_name for (function_name, line_number) in self.coverage())
    
    def __repr__(self) -> str:
        """Return a string representation of this object.
           Show covered (and uncovered) program code"""
        t = ""
        for function_name in self.function_names():
            # Similar code as in the example above
            try:
                fun = eval(function_name)
            except Exception as exc:
                t += f"Skipping {function_name}: {exc}"
                continue
    
            source_lines, start_line_number = inspect.getsourcelines(fun)
            for lineno in range(start_line_number, start_line_number + len(source_lines)):
                if (function_name, lineno) in self.trace():
                    t += "# "
                else:
                    t += "  "
                t += "%2d  " % lineno
                t += source_lines[lineno - start_line_number]
    
        return t

Coverage of Basic Fuzzing

The goal is to reach the highest possible coverage of cgi_decode() with random fuzzing.

with Coverage() as cov_fuzz:
    try:
        cgi_decode(sample)
    except:
        pass
cov_fuzz.coverage()

One might expect this to already be the maximum coverage, but comparing it against the maximum coverage shows that a few lines are still missed.
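
Repeating the experiment with many random inputs and accumulating the covered lines shows how coverage grows over time. The sketch below is self-contained under several assumptions: `cgi_decode` is a condensed stand-in for the decoder above, `random_input` is a hypothetical generator of short strings made of the characters ' ' through '?', and `covered_lines` is a hypothetical helper built on sys.settrace().

```python
import random
import sys

HEX = "0123456789abcdefABCDEF"

def cgi_decode(s):
    """Condensed stand-in for the decoder above."""
    t, i = "", 0
    while i < len(s):
        c = s[i]
        if c == '+':
            t += ' '
        elif c == '%':
            high, low = s[i + 1], s[i + 2]  # IndexError on truncated input
            i += 2
            if high in HEX and low in HEX:
                t += chr(int(high + low, 16))
            else:
                raise ValueError("Invalid encoding")
        else:
            t += c
        i += 1
    return t

def random_input(max_length=100, char_start=32, char_range=32):
    """Hypothetical fuzzer: a random string of up to max_length characters."""
    length = random.randrange(0, max_length + 1)
    return "".join(chr(random.randrange(char_start, char_start + char_range))
                   for _ in range(length))

def covered_lines(func, arg):
    """Line numbers of func executed while running func(arg)."""
    lines = set()
    code = func.__code__

    def tracer(frame, event, _arg):
        if event == 'line' and frame.f_code is code:
            lines.add(frame.f_lineno)
        return tracer

    sys.settrace(tracer)
    try:
        func(arg)
    except (ValueError, IndexError):
        pass  # invalid inputs are expected while fuzzing
    finally:
        sys.settrace(None)
    return lines

random.seed(0)  # fixed seed so the run is repeatable
all_lines = set()
cumulative = []  # number of distinct lines covered after each trial
for _ in range(100):
    all_lines |= covered_lines(cgi_decode, random_input())
    cumulative.append(len(all_lines))

print(cumulative[0], cumulative[-1])
```

Plotting `cumulative` against the trial number gives a curve that can only rise or stay flat; how quickly it levels off depends on the inputs generated.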

Getting Coverage from External Programs

Almost every programming language has facilities for measuring coverage.

#include <stdlib.h>
#include <string.h>
#include <stdio.h>

int hex_values[256];

void init_hex_values() {
    for (int i = 0; i < sizeof(hex_values) / sizeof(int); i++) {
        hex_values[i] = -1;
    }
    hex_values['0'] = 0; hex_values['1'] = 1; hex_values['2'] = 2; hex_values['3'] = 3;
    hex_values['4'] = 4; hex_values['5'] = 5; hex_values['6'] = 6; hex_values['7'] = 7;
    hex_values['8'] = 8; hex_values['9'] = 9;

    hex_values['a'] = 10; hex_values['b'] = 11; hex_values['c'] = 12; hex_values['d'] = 13;
    hex_values['e'] = 14; hex_values['f'] = 15;

    hex_values['A'] = 10; hex_values['B'] = 11; hex_values['C'] = 12; hex_values['D'] = 13;
    hex_values['E'] = 14; hex_values['F'] = 15;
}

int cgi_decode(char *s, char *t) {
    while (*s != '\0') {
        if (*s == '+')
            *t++ = ' ';
        else if (*s == '%') {
            int digit_high = *++s;
            int digit_low = *++s;
            if (hex_values[digit_high] >= 0 && hex_values[digit_low] >= 0) {
                *t++ = hex_values[digit_high] * 16 + hex_values[digit_low];
            }
            else
                return -1;
        }
        else
            *t++ = *s;
        s++;
    }
    *t = '\0';
    return 0;
}

int main(int argc, char *argv[]) {
    init_hex_values();

    if (argc >= 2) {
        char *s = argv[1];
        char *t = malloc(strlen(s) + 1); /* output is at most as long as input */
        int ret = cgi_decode(s, t);
        printf("%s\n", t);
        return ret;
    }
    else
    {
        printf("cgi_decode: usage: cgi_decode STRING\n");
        return 1;
    }
}

The C program above implements the same decoder as the Python code.

cc --coverage -o cgi_decode cgi_decode.c

The --coverage option is passed at compile time. It instruments the code so that coverage information is collected at run time.
./cgi_decode 'Send+mail+to+me%40fuzzingbook.org'
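When the program runs, the coverage information is automatically written to files.
This information is then processed by the gcov program.
For every source file given, gcov produces a new .gcov file containing the coverage information.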
gcov cgi_decode.c
In the .gcov file, each line is prefixed with the number of times it was executed and its line number.
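
Such a file can be turned back into per-line counts with a few lines of Python. The sketch below assumes the usual `count:line number:source` layout, where '-' marks a non-executable line and '#####' a line that was never executed. The helper `parse_gcov` is hypothetical, and the sample lines are hand-written for illustration, not real gcov output.

```python
def parse_gcov(lines):
    """Map line number -> execution count for the executable lines."""
    counts = {}
    for line in lines:
        fields = line.split(':', 2)      # count, line number, source text
        if len(fields) < 3:
            continue
        count = fields[0].strip()
        lineno = int(fields[1].strip())
        if count == '-' or lineno == 0:
            continue                     # non-executable line or file header
        if count.startswith(('#', '=')):
            counts[lineno] = 0           # executable but never reached
        else:
            counts[lineno] = int(count.rstrip('*'))
    return counts

# Hand-written sample in the assumed format (not actual tool output).
sample = [
    "        -:    0:Source:cgi_decode.c",
    "        -:    1:#include <stdlib.h>",
    "       10:   24:        if (*s == '+')",
    "        3:   25:            *t++ = ' ';",
    "    #####:   33:                return -1;",
]

counts = parse_gcov(sample)
uncovered = sorted(n for n, c in counts.items() if c == 0)
print(counts, uncovered)
```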

rvkhunLab

The Fuzzing Book_02_Coverage
