CAL: January 2011

Friday, January 21, 2011

Skype discussion

at 7:37 AM Labels: Research

parallelize the 9/7 portion of JasPer
quantization: vectorization? parallelization?
memory, Intel asm vectorization, copying stuff; 64 bits now, up to a factor of 2 by using the full 128 bits. pin down the problems.
packetization, rate/distortion control; print the numbers separated by commas,
modify SimpleScalar to create a 3-D memory access pattern graph

now counting accesses,

count the loads and writes separately: 2 sets of counters,
print the sum of the 2 numbers
spill out as 2 sets of data for graphing
use more resolution: now using 80, pixel based; more pixels for memory and for time.
plot color on 2D; color identifies the density
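
A minimal sketch of the counting scheme above, assuming the simulator can dump a trace of (kind, address) records. The function names, the "L"/"S" record format, and the output columns are hypothetical and are not part of SimpleScalar. Loads and stores are kept in two separate counter sets, the total is printed next to them, and each cell of the time x memory grid becomes one comma-separated line for graphing. The default of 80 bins follows the resolution mentioned above.

```python
from collections import Counter

def bin_accesses(trace, addr_lo, addr_hi, n_addr_bins=80, n_time_bins=80):
    """Bin (kind, address) records into a time x address grid.

    trace: list of ("L" | "S", address) tuples in program order (assumed format).
    Returns two Counters keyed by (time_bin, addr_bin): loads and stores.
    """
    loads, stores = Counter(), Counter()
    n = len(trace)
    span = addr_hi - addr_lo
    for i, (kind, addr) in enumerate(trace):
        if not (addr_lo <= addr < addr_hi):
            continue  # outside the memory region being plotted
        t_bin = i * n_time_bins // n                   # position in the trace
        a_bin = (addr - addr_lo) * n_addr_bins // span  # position in memory
        (loads if kind == "L" else stores)[(t_bin, a_bin)] += 1
    return loads, stores

def to_csv_lines(loads, stores):
    """One line per non-empty cell: time_bin,addr_bin,loads,stores,total."""
    lines = []
    for cell in sorted(set(loads) | set(stores)):
        n_ld, n_st = loads[cell], stores[cell]
        lines.append(f"{cell[0]},{cell[1]},{n_ld},{n_st},{n_ld + n_st}")
    return lines

if __name__ == "__main__":
    # Tiny made-up trace, only to show the output format.
    demo = [("L", 0), ("S", 4), ("L", 8), ("L", 0)]
    ld, st = bin_accesses(demo, 0, 16, n_addr_bins=4, n_time_bins=2)
    print("\n".join(to_csv_lines(ld, st)))
```

Raising n_addr_bins and n_time_bins gives more pixels for memory and for time; the total column is what a density color map would use.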

Other images for testing

memory access pattern, (8k by 8k)
understand the behavior based on the different graphs. print out the loop iterations, graph that?

take a close look at the fusion

cachegrind for certain lines of the code.
by counting accesses and misses in SimpleScalar, from the beginning to the end of the fused loop.
graphical pixel approach to zoom into the fused loop behavior.
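
A rough sketch of what counting accesses and misses across a fused loop could look like, using a toy direct-mapped cache. The cache geometry (64 lines of 32 bytes), the array size, and the two address generators are assumptions for illustration; they do not reproduce the SimpleScalar configuration or the JasPer loops.

```python
def count_misses(addresses, n_lines=64, line_size=32):
    """Return (accesses, misses) for a toy direct-mapped cache."""
    resident = [None] * n_lines       # block number currently held by each line
    accesses = misses = 0
    for addr in addresses:
        block = addr // line_size     # memory block containing this address
        index = block % n_lines       # the single line that block can occupy
        accesses += 1
        if resident[index] != block:
            misses += 1
            resident[index] = block   # evict whatever was there
    return accesses, misses

def separate_loops(base, n, elem=4):
    """Address trace of two back-to-back passes over the same array."""
    for i in range(n):
        yield base + i * elem
    for i in range(n):
        yield base + i * elem

def fused_loop(base, n, elem=4):
    """Address trace of one pass that does both pieces of work per element."""
    for i in range(n):
        yield base + i * elem
        yield base + i * elem

if __name__ == "__main__":
    n = 4096  # 16 KB array, larger than the 2 KB toy cache
    print("separate:", count_misses(separate_loops(0, n)))
    print("fused:   ", count_misses(fused_loop(0, n)))
```

With these assumed sizes both traces make 8192 accesses, but the separate loops miss 1024 times and the fused loop 512 times, because the second pass finds the start of the array already evicted.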

I/O problem - large time fraction

for the large image.
can get better I/O performance

Lab work
2 large FPGA boards
look at clear system installation
try the equipment, and make sure it's working

Wednesday, January 19, 2011

Ideas for the next-step work

at 8:11 AM Labels: Research

parallelize the 9/7 portion of JasPer
Intel asm vectorization
modify SimpleScalar to create a 3-D memory access pattern graph

Thursday, January 13, 2011

lossy compression rate time, increase the rate??
comparison with the original image? [confirmed]

isolate the quantization from previous paper.

fig. 4
a) drop the comment, replace by '...'

fig.5
a)

Index term...
ieee standard index... 3-4
ieee.ca -> next ccece ->author kit...

Wednesday, January 12, 2011

Skype discussion for the paper

at 8:00 AM Labels: Research

Section 3

slightly longer
look at the cache profile to shorten
scale the memory access graph down... [vertical should be smaller]
look at the source code for qmfb.c: colgrp, defined as a constant which is 16. follow up: does 1.9 do this?
working on entire columns, but 16 columns at a time?? or just working on some columns, not all..
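
To make the open question concrete, here is a sketch of one possible reading: every column is visited, but in groups of 16, with each row touching 16 adjacent elements before moving down. The function name and traversal are assumptions for illustration and are not taken from qmfb.c.

```python
COLGRP = 16  # group width from the note above; assumed to mean "columns per pass"

def column_group_order(n_rows, n_cols, group=COLGRP):
    """Yield (row, col) in the order a column-group pass would touch them."""
    for first in range(0, n_cols, group):
        last = min(first + group, n_cols)   # final group may be narrower
        for row in range(n_rows):
            for col in range(first, last):  # adjacent columns share cache lines
                yield row, col

if __name__ == "__main__":
    # Small grid with a group width of 2, just to show the visiting order.
    print(list(column_group_order(2, 3, group=2)))
```

Under this reading all columns are covered; the grouping only changes the order, which is what would show up in the memory access graph.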

Section 5

which image used in discussion
figure 7 -> graph should be smaller, font size larger.
rename the color image -> PCB _original _half _tiled
figure 5 for presentation, remove ...
move the open bracket.
do the same thing for figure 4
retrieve pointer value... -> set
figure changes for fig2.
table 1 should be changed..
image thumbnails in the paper??
table caption goes on the top
heading for section 4 - drop every occurrence of the word "initial"
section 4 heading - keep it as is...
Title - drop the word "execution"

TOMORROW
** create a .zip file containing everything
phone number...

Friday, January 7, 2011

Skype call discussion

at 7:01 AM Labels: Research

Section 2

they used 9/7
we chose to use 5/3, but we tried 9/7

Section 3

cache behavior: instead, start in general, show the cachegrind results and the SimpleScalar results... show the figure (2-D memory graph). memory is not touched frequently; data access is concentrated in certain areas.
cachegrind L2: 7 million misses, 0.34%; ss direct-mapped misses 2% (L1=L2, worst case), best=0.3%; RISC vs. CISC
later, generate the numbers for the memory regions that have been accessed, to produce a 3-D bar graph. numbers separated by commas.

Section 4 is appropriate..

The cluster used is not the latest and greatest with DDR3 memory; make that comment somewhere. we already get a relatively low cache miss rate.

Section 5 [me, adding material]

written comments.
confirming the number of instructions [done]
at the end of section 5.2, add a brief discussion, along with a figure to show the various speedups in one paragraph. /cats/color/color2/color4/galaxy/galaxy_4(4k by 4k)/
9/7 relies on the profile; comment on the previous work [section 3.5 ??]

Experiment

galaxy 4k*4k (and 8k*8k) / color (longest execution time): native run with 9/7, time it. (30% improvement)
profile for the galaxy 4k*4k native run.
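
A figure like "30% improvement" can be read as a reduction in run time or as a speedup ratio, and the two are not the same number. The helpers below are a small sketch for keeping them apart when the timings come in; the function names are hypothetical and the values in the demo are placeholders, not measurements.

```python
def speedup(t_before, t_after):
    """Ratio of old run time to new run time (2.0 means twice as fast)."""
    return t_before / t_after

def time_reduction_pct(t_before, t_after):
    """Percentage of the original run time that was saved."""
    return 100.0 * (t_before - t_after) / t_before

if __name__ == "__main__":
    # Placeholder timings in seconds, NOT measured results.
    t_before, t_after = 10.0, 7.0
    print(f"time reduction: {time_reduction_pct(t_before, t_after):.1f}%")
    print(f"speedup:        {speedup(t_before, t_after):.2f}x")
```

With the placeholder values, a 30% cut in run time corresponds to a speedup of about 1.43x, so the paper should state which of the two is being reported.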

Send

section 5 - pdf and latex
section 3 - pdf and latex

Wednesday, January 5, 2011

Skype call discussion

at 7:16 AM Labels: Research

1. update the caption for the fused loops

caption (a) original loop with pointer index variable.
caption (b) original loop with integer index variable.
add more detail in the loop body
have some explanation in the text for the paper

2. Experiments

confirm that pointer -> integer does not increase the execution time.... can compare the cachegrind results. (original vs. fused loop version)
native hardware run (pointer-based loop (original), and integer-based loop only)
try the 9/7 configuration, and run image encoding for 9/7 and 5/3

single processor, natively; see if there is any significant difference

gprof for 9/7 to see the difference in the cblk part and the dwt portion.