- resolved the segmentation faults in p-cblk()
- for the col_grp() function, use the Intel vector instructions
- use the ASM directive to insert the assembly code into the C source code
- operate on 64 bits (quad word) instead of 32 bits at a time
- compilation: #gcc -O2 -o testcopy testcopy.c vectorcopy.s
- loop unrolling: instead of one MOV and one ADD(4) per iteration, we do 4 MOVs and one ADD(16)
- gcc can do loop unrolling when given a certain argument at compile time; look that up
- Tasks
- - split_col_grp() is mainly data copying; try to enhance it
- - try gcc unrolling on jasper, and profile/record the execution time
- - try vector instructions (ASM directive)
- - publish a paper by the end of the year
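A minimal sketch of the two copy ideas from the notes above (64 bits at a time, and 4 moves per index update), written in portable C rather than assembly. The function name copy_unrolled64 and the sample type int32_t are assumptions for illustration and are not taken from the jasper source. For compiler-driven unrolling, GCC provides the -funroll-loops option.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from jasper): copy n 32-bit samples,
   moving 64 bits per load/store and unrolling the loop 4x.
   The regions must not overlap. */
void copy_unrolled64(int32_t *dst, const int32_t *src, size_t n)
{
    size_t i = 0;
    uint64_t a, b, c, d;

    assert(dst != NULL && src != NULL);

    /* Main loop: 4 quad-word moves (8 samples), then one index update.
       Fixed-size memcpy avoids alignment and aliasing problems. */
    for (; i + 8 <= n; i += 8) {
        memcpy(&a, src + i,     sizeof a);
        memcpy(&b, src + i + 2, sizeof b);
        memcpy(&c, src + i + 4, sizeof c);
        memcpy(&d, src + i + 6, sizeof d);
        memcpy(dst + i,     &a, sizeof a);
        memcpy(dst + i + 2, &b, sizeof b);
        memcpy(dst + i + 4, &c, sizeof c);
        memcpy(dst + i + 6, &d, sizeof d);
    }

    /* Tail: the remaining 0..7 samples, one at a time. */
    for (; i < n; i++)
        dst[i] = src[i];
}
```

Whether this beats a plain loop or memcpy() depends on the compiler and the CPU, so it is worth profiling both before keeping either.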
Friday, October 22, 2010
Skype call
Posted by
Bo
Sunday, October 17, 2010
Research Progress
- Fixed the segmentation faults that occurred when running jasper with more than 2 threads. (The error was caused by incorrect incrementing of the workload-distribution index in 2 nested loops within the enc_cblks() routine.)
- The fused loop also gives some indication of a performance enhancement
- try to fuse more loops if possible
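For reference, a small sketch of what loop fusion means here. The functions and the scale/offset operations are made up for illustration and are not the loops that were fused in jasper.

```c
#include <assert.h>
#include <stddef.h>

/* Unfused: two separate passes over the same array. */
void scale_then_offset(int *v, size_t n, int scale, int offset)
{
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        v[i] *= scale;
    for (size_t i = 0; i < n; i++)
        v[i] += offset;
}

/* Fused: one pass, so each element is loaded and stored only once. */
void scale_offset_fused(int *v, size_t n, int scale, int offset)
{
    assert(v != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        v[i] = v[i] * scale + offset;
}
```

Fusion is only valid when the second loop's iteration i does not depend on results the first loop produces at other indices, so each candidate pair of loops needs that check first.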
Posted by
Bo
Friday, October 15, 2010
Skype Call today
- clarification on the report
- run pc+pd 4 multiple times, see if the result is always bad
- run pc+pd 2 multiple times, see if the result is always good
- quote or paraphrase the literature from other sources
- is the segmentation fault related to the image quality?
- dive into the code to correct the seg. fault for multithreading
- cache issue reported in the previous paper: does the new version of jasper fix the issue, or has it been updated or changed?
- p-dwt: poor performance may be caused by cache reads & writes between processors => overhead
- whether it's possible to improve the dwt performance on a single core at the C code level, with special instructions for example
- look at the 2 dwt loops for horizontal and vertical filtering, come up with some ideas for improvement, and discuss them at next week's meeting
- next Tuesday there will be a discussion of the course material
Posted by
Bo
Saturday, October 9, 2010
Research Progress
- talked about the p-dwt performance, hoping to get a better one
- the elec871 course reading: chapter 3, the cache coherence topic, and the related chapter 5 of the Culler book
- checked the threading in the p-dwt functions. ok
- checked the p-cblk functions.
- thread safety: when running jasper with 4 threads, segmentation faults happened a lot; the run is ok with 2 threads
- updated jasper code in cvs
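Crashes that only show up at higher thread counts are often a sign that the work is being split incorrectly across threads. A sketch of one way to split code blocks, assuming a static partition over a flat index that is advanced exactly once per inner-loop iteration. The names work_range, thread_range and count_assigned are hypothetical and are not taken from jasper.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open range [begin, end) of flat work-item indices for one thread. */
typedef struct { size_t begin, end; } work_range;

/* Hypothetical helper: split `total` items across `nthreads` threads.
   The first (total % nthreads) threads get one extra item. */
work_range thread_range(size_t total, size_t nthreads, size_t tid)
{
    assert(nthreads > 0 && tid < nthreads);
    size_t base = total / nthreads;
    size_t extra = total % nthreads;
    work_range r;
    r.begin = tid * base + (tid < extra ? tid : extra);
    r.end = r.begin + base + (tid < extra ? 1 : 0);
    return r;
}

/* Count the items thread `tid` would handle while walking a rows x cols
   grid with two nested loops and a single flat index. */
size_t count_assigned(size_t rows, size_t cols, size_t nthreads, size_t tid)
{
    work_range r = thread_range(rows * cols, nthreads, tid);
    size_t idx = 0, mine = 0;
    for (size_t y = 0; y < rows; y++) {
        for (size_t x = 0; x < cols; x++) {
            if (idx >= r.begin && idx < r.end)
                mine++;   /* a real encoder would process block (y, x) here */
            idx++;        /* advance exactly once per inner iteration */
        }
    }
    return mine;
}
```

With this layout the ranges of all threads are contiguous and cover every item exactly once, which is easy to check in a unit test for any thread count.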
Posted by
Bo