Apr 4
Finish chap 6 "Memory"
Mention I1 (instruction) vs D1 (data) caches.
Mention upcoming
http://ccscne.org conference
that some of us are going to on Fri Apr 12
(Sam A, Alex Hiam, Chad are presenting posters).
Use valgrind to look at (at least)
execution times and cache hits/misses
of sumarraycols and sumarrayrows.
We can run this on csmarlboro.org or cs.marlboro.edu,
which have slightly different caches.
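A minimal sketch of something to run under valgrind, assuming a 1024 x 1024 array. Only the two function names come from the text; M, N, grid, and fill_grid are made up here for illustration. sumarrayrows walks memory with stride 1 and sumarraycols with stride N, so the two should show quite different D1 miss counts.

```c
#include <stddef.h>

#define M 1024  /* rows: an assumed size, not from the notes */
#define N 1024  /* columns: also assumed */

int grid[M][N]; /* hypothetical test array, about 4 MB */

/* Fill the array with small values so the sums stay well inside an int. */
void fill_grid(void)
{
    for (size_t i = 0; i < M; i++)
        for (size_t j = 0; j < N; j++)
            grid[i][j] = (int)((i + j) % 7);
}

/* Row-major traversal: the inner loop touches consecutive addresses. */
int sumarrayrows(int a[M][N])
{
    int sum = 0;
    for (size_t i = 0; i < M; i++)
        for (size_t j = 0; j < N; j++)
            sum += a[i][j];
    return sum;
}

/* Column-major traversal: the inner loop jumps N ints at a time. */
int sumarraycols(int a[M][N])
{
    int sum = 0;
    for (size_t j = 0; j < N; j++)
        for (size_t i = 0; i < M; i++)
            sum += a[i][j];
    return sum;
}
```

To try it, add a small main that calls fill_grid() and then both functions, compile with something like gcc -g -O1 sums.c -o sums (the file name is hypothetical), and run valgrind --tool=cachegrind --cache-sim=yes ./sums. Some newer valgrind releases only simulate the caches when --cache-sim=yes is given. Running cg_annotate on the resulting cachegrind.out file should then break the counts down per function.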
I've given you a similar exercise, on matrix
transposition, to do as an assignment.
Mention "cache oblivious" approaches :
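One way to picture a cache-oblivious approach is a recursive transpose: keep splitting the larger dimension in half until the piece is small, then copy it directly. This is only a sketch under my own assumptions (flat row-major int arrays, a cutoff called BASE, and the names transpose_rec and transpose_oblivious are all hypothetical). The point is that nothing in the code mentions the cache or block size; BASE only limits recursion overhead.

```c
#include <stddef.h>

#define BASE 16 /* assumed cutoff; not tuned to any particular cache */

/* Transpose rows [r0, r1) and columns [c0, c1) of src into dst.
 * src is row-major with src_cols ints per row;
 * dst is row-major with dst_cols ints per row. */
static void transpose_rec(const int *src, int *dst,
                          size_t r0, size_t r1, size_t c0, size_t c1,
                          size_t src_cols, size_t dst_cols)
{
    size_t dr = r1 - r0;
    size_t dc = c1 - c0;

    if (dr <= BASE && dc <= BASE) {
        /* Small enough: copy the piece directly. */
        for (size_t i = r0; i < r1; i++)
            for (size_t j = c0; j < c1; j++)
                dst[j * dst_cols + i] = src[i * src_cols + j];
    } else if (dr >= dc) {
        /* Split the rows in half and recurse on each half. */
        size_t mid = r0 + dr / 2;
        transpose_rec(src, dst, r0, mid, c0, c1, src_cols, dst_cols);
        transpose_rec(src, dst, mid, r1, c0, c1, src_cols, dst_cols);
    } else {
        /* Split the columns in half and recurse on each half. */
        size_t mid = c0 + dc / 2;
        transpose_rec(src, dst, r0, r1, c0, mid, src_cols, dst_cols);
        transpose_rec(src, dst, r0, r1, mid, c1, src_cols, dst_cols);
    }
}

/* src is rows x cols; dst must have room for cols x rows ints. */
void transpose_oblivious(const int *src, int *dst, size_t rows, size_t cols)
{
    transpose_rec(src, dst, 0, rows, 0, cols, cols, rows);
}
```

At some depth of the recursion the pieces fit in whatever cache the machine has, which is why the same code should behave reasonably on machines with different caches.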
Work through practice 6.18 together. (See below. I've assigned a similar "practice problem" that has an answer in the text.)
We're heading into chapter 7, "Linking" (all about C compilation and system libraries) next week.
I encourage you to get a head start and look at that.
If there's still time, do the same valgrind stuff
on some of the matrix multiplication loop variations
in the textbook.
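As a starting point for that, here is a sketch of two of the loop orders, ijk and kij. DIM and the function names are my own choices for illustration and the textbook's versions may differ in detail. Both add the product into c, so c should start out as all zeros.

```c
#define DIM 256 /* assumed matrix size, not from the notes */

/* ijk: the inner loop walks a row of a (stride 1) and a column of b (stride DIM). */
void matmul_ijk(double a[DIM][DIM], double b[DIM][DIM], double c[DIM][DIM])
{
    for (int i = 0; i < DIM; i++) {
        for (int j = 0; j < DIM; j++) {
            double sum = 0.0;
            for (int k = 0; k < DIM; k++)
                sum += a[i][k] * b[k][j];
            c[i][j] += sum;
        }
    }
}

/* kij: the inner loop walks a row of b and a row of c, both with stride 1. */
void matmul_kij(double a[DIM][DIM], double b[DIM][DIM], double c[DIM][DIM])
{
    for (int k = 0; k < DIM; k++) {
        for (int i = 0; i < DIM; i++) {
            double r = a[i][k];
            for (int j = 0; j < DIM; j++)
                c[i][j] += r * b[k][j];
        }
    }
}
```

Both compute the same product; only the memory access pattern of the inner loop changes, which is the thing to compare in the valgrind miss counts.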
6.18 in the text
/* creates an "array" type */
typedef int array[2][2];

void transpose(array dst, array src){
    int i,j;
    for (i=0; i<2; i++){
        for (j=0; j<2; j++){
            dst[j][i] = src[i][j];
        }
    }
}
------------------------------
sizeof(int) = 4
src starts at address 0
dst starts at address 16 (base 10)
L1 cache, direct mapped, write through,
block size of 8 bytes
cache has 16 data bytes, empty initially
only src, dst accesses cause read/write cache misses
We talked through the answer given
in the text, and confused ourselves
for a minute when we misread
which table was src and which was dst
in the answer. Right at the end of class,
Alex and I saw that the stated
answer is correct. The lone
hit happens on the 4th src read
(src[1][1], bottom row). Its block is
still cached since a) the 3rd src read
(src[1][0]) had just grabbed that block,
and b) the 3rd dst write (dst[0][1]) is
in dst's top row, not the bottom, so it
maps to the other cache line and doesn't
eject the 3rd & 4th src ints.
     dst               src
  miss  miss        miss  miss
  miss  miss        miss  hit
  src   1st  2nd
        3rd  4th     |  cache   a  b   <= 1 block
                     |          c  d   <= 1 block
  dst   1st  3rd
        2nd  4th
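To double check the hit/miss pattern, here is a small sketch that replays the transpose loop against a toy direct-mapped cache with two 8-byte lines. It assumes write-allocate (a dst write miss loads the block), which is what the answer above implies. The names access_cache, simulate_transpose, and print_results are made up for this sketch.

```c
#include <stdio.h>

#define BLOCK_BYTES 8 /* block size from the problem */
#define NUM_SETS 2    /* 16 data bytes / 8-byte blocks, direct mapped */

/* Look up addr in the cache; return 1 on a hit, 0 on a miss.
 * On a miss the block is loaded, replacing whatever was in that set. */
static int access_cache(long tags[NUM_SETS], unsigned addr)
{
    unsigned block = addr / BLOCK_BYTES;
    unsigned set = block % NUM_SETS;
    long tag = (long)(block / NUM_SETS);

    if (tags[set] == tag)
        return 1;
    tags[set] = tag;
    return 0;
}

/* Replay dst[j][i] = src[i][j] for the 2x2 case and record hits (1) or misses (0). */
void simulate_transpose(int src_hit[2][2], int dst_hit[2][2])
{
    long tags[NUM_SETS] = { -1, -1 };          /* -1 marks an empty line */
    const unsigned src_base = 0, dst_base = 16; /* addresses from the problem */

    for (int i = 0; i < 2; i++) {
        for (int j = 0; j < 2; j++) {
            unsigned src_addr = src_base + 4u * (unsigned)(2 * i + j);
            unsigned dst_addr = dst_base + 4u * (unsigned)(2 * j + i);
            src_hit[i][j] = access_cache(tags, src_addr); /* read src[i][j] */
            dst_hit[j][i] = access_cache(tags, dst_addr); /* write dst[j][i] */
        }
    }
}

/* Print both tables side by side, dst on the left as in the notes. */
void print_results(void)
{
    int src_hit[2][2], dst_hit[2][2];

    simulate_transpose(src_hit, dst_hit);
    printf("     dst               src\n");
    for (int r = 0; r < 2; r++)
        printf("  %-4s  %-4s        %-4s  %-4s\n",
               dst_hit[r][0] ? "hit" : "miss", dst_hit[r][1] ? "hit" : "miss",
               src_hit[r][0] ? "hit" : "miss", src_hit[r][1] ? "hit" : "miss");
}
```

Calling print_results() from a main should reproduce the table above, with the only hit at src[1][1].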