- resolved the segmentation faults in p-cblk()
- for the col_grp() function, use the Intel vector instructions
- use the ASM directive to insert the assembly code in the C source code
- operate on 64 bits (quad word) instead of 32 bits at a time
- compilation: #gcc -O2 -o testcopy testcopy.c vectorcopy.s
- loop unrolling: instead of one MOV and one ADD(4) per iteration, we do 4 MOVs and one ADD(16)
- gcc can do loop unrolling itself when given a certain argument at compile time; look that up
- Tasks
- - split_col_grp() is mainly data copying; try to speed it up
- - try gcc unrolling on jasper, and profile/record the execution time
- - try vector instructions (ASM directive)
- - publish a paper by the end of the year
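
A rough sketch of the two copy ideas from the notes above, written in plain C rather than assembly. The function names copy32_unrolled4() and copy64() are made up for illustration and are not the actual col_grp() or split_col_grp() code. The first does 4 moves per iteration and then advances by 4 words (16 bytes); the second moves one 64-bit quad word at a time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy n 32-bit words, unrolled by hand 4 times.
 * Each pass of the main loop does 4 moves and one index update of
 * 4 words (16 bytes), instead of 1 move and an update of 4 bytes. */
void copy32_unrolled4(uint32_t *dst, const uint32_t *src, size_t n)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        dst[i]     = src[i];
        dst[i + 1] = src[i + 1];
        dst[i + 2] = src[i + 2];
        dst[i + 3] = src[i + 3];
    }

    /* Leftover words when n is not a multiple of 4. */
    for (; i < n; i++)
        dst[i] = src[i];
}

/* Hypothetical helper: copy n 64-bit quad words, so each move
 * transfers 8 bytes instead of 4. */
void copy64(uint64_t *dst, const uint64_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}
```

For the gcc side of the comparison, -funroll-loops is an existing gcc option that asks the compiler to unroll loops, so the plain loop can be built with and without it and timed against the hand-unrolled version.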