더프의 장치는 어떻게 작동합니까?
나는 더프의 장치에서 Wikipedia 의 기사를 읽었으며 그것을 얻지 못했습니다. 나는 정말로 관심이 있지만, 거기에 대한 설명을 몇 번 읽었지만 여전히 더프의 장치가 어떻게 작동하는지 알 수 없습니다.
더 자세한 설명은 무엇입니까?
다른 곳에서 좋은 설명이 있지만 시도해 보겠습니다. (이것은 화이트 보드에서 훨씬 쉽습니다!) 다음은 몇 가지 표기법이있는 Wikipedia 예입니다.
20 바이트를 복사한다고 가정 해 봅시다. 첫 번째 패스에 대한 프로그램의 흐름 제어는 다음과 같습니다.
int count; // Set to 20
{
int n = (count + 7) / 8; // n is now 3. (The "while" is going
// to be run three times.)
switch (count % 8) { // The remainder is 4 (20 modulo 8) so
// jump to the case 4
case 0: // [skipped]
do { // [skipped]
*to = *from++; // [skipped]
case 7: *to = *from++; // [skipped]
case 6: *to = *from++; // [skipped]
case 5: *to = *from++; // [skipped]
case 4: *to = *from++; // Start here. Copy 1 byte (total 1)
case 3: *to = *from++; // Copy 1 byte (total 2)
case 2: *to = *from++; // Copy 1 byte (total 3)
case 1: *to = *from++; // Copy 1 byte (total 4)
} while (--n > 0); // N = 3 Reduce N by 1, then jump up
// to the "do" if it's still
} // greater than 0 (and it is)
}
이제 두 번째 패스를 시작하면 표시된 코드 만 실행합니다.
int count; //
{
int n = (count + 7) / 8; //
//
switch (count % 8) { //
//
case 0: //
do { // The while jumps to here.
*to = *from++; // Copy 1 byte (total 5)
case 7: *to = *from++; // Copy 1 byte (total 6)
case 6: *to = *from++; // Copy 1 byte (total 7)
case 5: *to = *from++; // Copy 1 byte (total 8)
case 4: *to = *from++; // Copy 1 byte (total 9)
case 3: *to = *from++; // Copy 1 byte (total 10)
case 2: *to = *from++; // Copy 1 byte (total 11)
case 1: *to = *from++; // Copy 1 byte (total 12)
} while (--n > 0); // N = 2 Reduce N by 1, then jump up
// to the "do" if it's still
} // greater than 0 (and it is)
}
이제 세 번째 단계를 시작하십시오.
int count; //
{
int n = (count + 7) / 8; //
//
switch (count % 8) { //
//
case 0: //
do { // The while jumps to here.
*to = *from++; // Copy 1 byte (total 13)
case 7: *to = *from++; // Copy 1 byte (total 14)
case 6: *to = *from++; // Copy 1 byte (total 15)
case 5: *to = *from++; // Copy 1 byte (total 16)
case 4: *to = *from++; // Copy 1 byte (total 17)
case 3: *to = *from++; // Copy 1 byte (total 18)
case 2: *to = *from++; // Copy 1 byte (total 19)
case 1: *to = *from++; // Copy 1 byte (total 20)
} while (--n > 0); // N = 1 Reduce N by 1, then jump up
// to the "do" if it's still
} // greater than 0 (and it's not, so bail)
} // continue here...
이제 20 바이트가 복사됩니다.
참고 : 원래의 더프 장치 (위 그림 참조)는 to주소 의 I / O 장치로 복사되었습니다 . 따라서 포인터를 늘릴 필요가 없습니다 *to. 두 개의 메모리 버퍼 사이를 복사 할 때는을 사용해야 *to++합니다.
Dr. Dobb 's Journal 의 설명은 내가 주제에서 찾은 최고입니다.
이것은 나의 AHA 순간입니다.
for (i = 0; i < len; ++i) {
HAL_IO_PORT = *pSource++;
}
된다 :
int n = len / 8;
for (i = 0; i < n; ++i) {
HAL_IO_PORT = *pSource++;
HAL_IO_PORT = *pSource++;
HAL_IO_PORT = *pSource++;
HAL_IO_PORT = *pSource++;
HAL_IO_PORT = *pSource++;
HAL_IO_PORT = *pSource++;
HAL_IO_PORT = *pSource++;
HAL_IO_PORT = *pSource++;
}
n = len % 8;
for (i = 0; i < n; ++i) {
HAL_IO_PORT = *pSource++;
}
된다 :
int n = (len + 8 - 1) / 8;
switch (len % 8) {
case 0: do { HAL_IO_PORT = *pSource++;
case 7: HAL_IO_PORT = *pSource++;
case 6: HAL_IO_PORT = *pSource++;
case 5: HAL_IO_PORT = *pSource++;
case 4: HAL_IO_PORT = *pSource++;
case 3: HAL_IO_PORT = *pSource++;
case 2: HAL_IO_PORT = *pSource++;
case 1: HAL_IO_PORT = *pSource++;
} while (--n > 0);
}
더프의 장치에는 두 가지 중요한 사항이 있습니다. 첫째, 내가 이해하기 쉬운 부분이라고 생각되는 루프는 풀립니다. 이것은 루프의 완료 여부를 확인하고 루프의 맨 위로 점프하는 것과 관련된 오버 헤드를 피함으로써 더 큰 코드 크기를 더 빠른 속도로 교환합니다. CPU는 점프 대신 직선 코드를 실행할 때 더 빠르게 실행될 수 있습니다.
The second aspect is the switch statement. It allows the code to jump into the middle of the loop the first time through. The surprising part to most people is that such a thing is allowed. Well, it's allowed. Execution starts at the calculated case label, and then it falls through to each successive assignment statement, just like any other switch statement. After the last case label, execution reaches the bottom of the loop, at which point it jumps back to the top. The top of the loop is inside the switch statement, so the switch is not re-evaluated anymore.
The original loop is unwound eight times, so the number of iterations is divided by eight. If the number of bytes to be copied isn't a multiple of eight, then there are some bytes left over. Most algorithms that copy blocks of bytes at a time will handle the remainder bytes at the end, but Duff's device handles them at the beginning. The function calculates count % 8 for the switch statement to figure what the remainder will be, jumps to the case label for that many bytes, and copies them. Then the loop continues to copy groups of eight bytes.
The point of duffs device is to reduce the number of comparisons done in a tight memcpy implementation.
Suppose you want to copy 'count' bytes from a to b, the straight forward approach is to do the following:
do {
*a = *b++;
} while (--count > 0);
How many times do you need to compare count to see if it's a above 0? 'count' times.
Now, the duff device uses a nasty unintentional side effect of a switch case which allows you to reduce the number of comparisons needed to count / 8.
Now suppose you want to copy 20 bytes using duffs device, how many comparisons would you need? Only 3, since you copy eight bytes at a time except the
last
first one where you copy just 4.
UPDATED: You don't have to do 8 comparisons/case-in-switch statements, but it's reasonable a trade-off between function size and speed.
When I read it for the first time, I autoformatted it to this
void dsend(char* to, char* from, count) {
int n = (count + 7) / 8;
switch (count % 8) {
case 0: do {
*to = *from++;
case 7: *to = *from++;
case 6: *to = *from++;
case 5: *to = *from++;
case 4: *to = *from++;
case 3: *to = *from++;
case 2: *to = *from++;
case 1: *to = *from++;
} while (--n > 0);
}
}
and I had no idea what was happening.
Maybe not when this question was asked, but now Wikipedia has a very good explanation
The device is valid, legal C by virtue of two attributes in C:
- Relaxed specification of the switch statement in the language's definition. At the time of the device's invention this was the first edition of The C Programming Language which requires only that the controlled statement of the switch be a syntactically valid (compound) statement within which case labels can appear prefixing any sub-statement. In conjunction with the fact that, in the absence of a break statement, the flow of control will fall-through from a statement controlled by one case label to that controlled by the next, this means that the code specifies a succession of count copies from sequential source addresses to the memory-mapped output port.
- The ability to legally jump into the middle of a loop in C.
1: Duffs device is a particular implimentation of loop unrolling. What is loop unrolling?
If you have an operation to perform N times in a loop you can trade program size for speed by executing the loop N/n times and then in the loop inlining (unrolling) the loop code n times e.g. replacing:
for (int i=0; i<N; i++) {
// [The loop code...]
}
with
for (int i=0; i<N/n; i++) {
// [The loop code...]
// [The loop code...]
// [The loop code...]
...
// [The loop code...] // n times!
}
Which works great if N % n == 0 - no need for Duff! If that is not true then you have to handle the remainder - which is a pain.
2: How does Duffs device differ from this standard loop unrolling?
Duffs device is just a clever way of dealing with the remainder loop cycles when N % n != 0. The whole do / while executes N / n number of times as per standard loop unrolling (because the case 0 applies). On the last run through the loop (the 'N/n+1'th time) the case kicks in and we jump to the N % n case and run the loop code the 'remainder' number of times.
Though I'm not 100% sure what you're asking for, here goes...
The issue that Duff's device addresses is one of loop unwinding (as you'll no doubt have seen on the Wiki link you posted). What this basically equates to is an optimisation of run-time efficiency, over memory footprint. Duff's device deals with serial copying, rather than just any old problem, but is a classic example of how optimisations can be made by reducing the number of times that a comparison needs to be done in a loop.
As an alternative example, which may make it easier to understand, imagine you have an array of items you wish to loop over, and add 1 to them each time... ordinarily, you might use a for loop, and loop around 100 times. This seems fairly logical and, it is... however, an optimisation can be made by unwinding the loop (obviously not too far... or you may as well just not use the loop).
So a regular for loop:
for(int i = 0; i < 100; i++)
{
myArray[i] += 1;
}
becomes
for(int i = 0; i < 100; i+10)
{
myArray[i] += 1;
myArray[i+1] += 1;
myArray[i+2] += 1;
myArray[i+3] += 1;
myArray[i+4] += 1;
myArray[i+5] += 1;
myArray[i+6] += 1;
myArray[i+7] += 1;
myArray[i+8] += 1;
myArray[i+9] += 1;
}
What Duff's device does is implement this idea, in C, but (as you saw on the Wiki) with serial copies. What you're seeing above, with the unwound example, is 10 comparisons compared to 100 in the original - this amounts to a minor, but possibly significant, optimisation.
Here's a non-detailed explanation which is what I feel to be the crux of Duff's device:
The thing is, C is basically a nice facade for assembly language (PDP-7 assembly to be specific; if you studied that you would see how striking the similarities are). And, in assembly language, you don't really have loops - you have labels and conditional-branch instructions. So the loop is just a part of the overall sequence of instructions with a label and a branch somewhere:
instruction
label1: instruction
instruction
instruction
instruction
jump to label1 some condition
and a switch instruction is branching/jumping ahead somewhat:
evaluate expression into register r
compare r with first case value
branch to first case label if equal
compare r with second case value
branch to second case label if equal
etc....
first_case_label:
instruction
instruction
second_case_label:
instruction
instruction
etc...
In assembly it's easily conceivable how to combine these two control structures, and when you think of it that way, their combination in C doesn't seem so weird anymore.
Just experimenting, found another variant getting along without interleaving switch and loop:
int n = (count + 1) / 8;
switch (count % 8)
{
LOOP:
case 0:
if(n-- == 0)
break;
putchar('.');
case 7:
putchar('.');
case 6:
putchar('.');
case 5:
putchar('.');
case 4:
putchar('.');
case 3:
putchar('.');
case 2:
putchar('.');
case 1:
putchar('.');
default:
goto LOOP;
}
참고URL : https://stackoverflow.com/questions/514118/how-does-duffs-device-work
'IT' 카테고리의 다른 글
| 구속 조건을 동시에 만족시킬 수 없으며 구속 조건을 해제하여 복구를 시도합니다. (0) | 2020.06.23 |
|---|---|
| Redis에서 키 수 인쇄 (0) | 2020.06.23 |
| 한 벡터를 다른 벡터로 복사하는 빠른 방법 (0) | 2020.06.23 |
| 댓글 바로 가기 Android Studio (0) | 2020.06.23 |
| Rails에서 DB 사용자 이름, 비밀번호, 데이터베이스 이름을 얻을 수 있습니까? (0) | 2020.06.23 |