Y86代写

assembly | 代做Algorithm – 该题目是一个常规的汇编的练习题目代写, 包括了assembly/Algorithm等方面

SP20 CIS341 final exam

Instructions

There are 16 questions, total 99 points.
- 1 question on C to Y86 (12 points)
- 1 question on cache memory (15 points)
- 1 question on data hazards (15 points)
- 1 question on RAID performance (12 points)
- 1 question on virtual memory (15 points)
- 1 question on instruction processing (10 points)
- 10 short questions (2 points each)
The exam is closed-book, closed-notes.
Although this is an online exam, you should treat it identical to an on-site exam.
You must be in the pre-designated Zoom room during the exam.
If you are using any scratch papers, show them through Zoom first when you begin your exam.
You may refer to the appendix provided for the first midterm (found in the exam content area).
Failure to be monitored through Zoom will be regarded as academic dishonesty, and will result in an automatic zero for the exam.

Question set 1: C to Y

The Euclidean Algorithm finds the greatest common divisor (GCD) between two numbers. Described in circa 300BC, it is one of the oldest algorithms. Fascinating!

Match the Y86 instruction for each blank so that the complete assembly program is functionally identical to the GCD function in C code. Assume that the two inputs to the GCD function, p and q, are non- negative. [4 points for each correct match, -2 points for each incorrect match, minimum 0 points for this problem]

// C program long gcd(long p, long q){ if (p==q){ return q; } else if (p>=q){ gcd(q, p-q); } else { gcd(q, p); } }

Y86 program

gcd: rrmovq %rdi, %rax subq %rsi, %rax jne gcd_elif rrmovq %rsi, %rax ret gcd_elif: ———————– (a) rrmovq %rsi, %rdi rrmovq %rax, %rsi call gcd ret gcd_else: rrmovq %rsi, %rax ———————– (b) ———————– (c) call gcd ret

Question set 2: Cache hit/miss and coherence

Consider the following multi-core architecture. Each core has its own private cache, and both caches are connected to the main memory. Both caches have the same configuration: 16B block size, 16 sets, and 1 way per set (direct-mapped). They also implement a write-back policy on a write hit and write-allocate policy on a write miss.

Now, consider a series of memory accesses from the cores to its respective cache. Each memory reference is labeled so that it indicates the core, the type (load or store), the address, and the size. For example,

A: L 0x0, 4

means that core A is loading 4 bytes from address 0x0.

For the given sequence of accesses below, indicate whether or not each access will result in either a 1) cache hit, 2) cache miss and data shared by the other cache, or 3) cache miss and data read from memory. (Hint: In addition to thinking of cache hits and misses, think of cache coherence protocol.) [ points for each correct result, no incorrect penalty]

Assume that all addresses are physical addresses and caches are initially empty.

A: L 0x0300, 4 B: L 0x0308, 4 B: L 0x0100, 4 A: S 0x0304, 4 B: S 0x030c, 4

Question set 3: Data hazard

Consider a 5-stage single-issue pipelined Y86 processor that goes through fetch, decode, execute, memory, and write-back stages. The processor is able to detect data hazards and implements both stalling (and inserting bubbles) and data forwarding.

For the following example sequence of three instructions, each instruction is annotated whether or not it will detect a data hazard during its pipelined processing.

irmovq $10, %rax # No data hazard detected mrmovq (%rax), %rbx # Data hazard detected with previous irmovq instruction addq %rcx, %rdx # No data hazard detected

Determine whether or not there is a data hazard for the following sequence of instructions. Assume that the pipeline is initially empty. [3 points for each correct answer, no incorrect penalty]

irmovq $16, %rax addq %rax, %rbx mrmovq (%rbx), %r addq %rax, %rcx pushq %rax

Question set 4: RAID 5

Consider a RAID 5 (rotating parity) system with 4 HDDs in total. Each HDD is of the same model and make with a large sequential read/write performance of 80MB/s, and a small random read/write performance of 60 IOPS.

What are the expected performance numbers for the RAID 5 system? [3 points for each correct expected performance, -1.5 point for each incorrect answer, minimum 0 points for this problem]

Question set 5: Virtual memory

Consider the following virtual memory system:

256B page size
Direct-mapped, 4-entry TLB
12 – bit virtual address; 11-bit physical address

The initial states of the TLB and page table are as follows.

Furthermore, the following is the list of recently used physical pages, from most recent (just accessed) to least recent (haven’t accessed in a long time). The page fault handler evicts the least recently used (LRU) page on a page fault.

Physical page number 7 (most recent)
Physical page number 5
Physical page number 3
Physical page number 4
Physical page number 0
Physical page number 2
Physical page number 6
Physical page number 1 (least recent)

For the given sequence of virtual address accesses below, indicate whether or not each will result in either a 1) TLB hit, 2)TLB miss, or 3) page fault. [3 points for each correct result, no incorrect penalty]

Virtual address 0x4FC

Virtual address 0x
Virtual address 0x
Virtual address 0x
Virtual address 0xF

Question set 6: Y86 instruction processing

Among the listed steps involved in instruction processing below, select all the required ones for the following Y86 instruction. [? points for correct selections, -1.25 point for incorrect selection]

subq rA, rB # add

Read register(s) from register file
Perform ALU operation
Set condition code
Determine whether or not to jump
Write to memory
Read from memory
Write value from ALU to register
Write value from memory to register

Question set 7: Short questions

Which of the following statement is NOT true about floating-point representation? [2 points for the correct choice, no penalty for the incorrect choice]

IEEE (Institute of Electrical and Electronics Engineer) established a standard (IEEE 754) as a uniform standard for floating-point representation in 1985. Prior to this, two different computers may represent the same real number (such as the Euler’s number, 2.71828…) in different formats.
Floating-point computations are associative. That is, given three floating-point numbers a, b, and c, (a+b)+c = a+(b+c).
Floating-point numbers are represented with three fields: a sign bit that represents if the number is positive or negative, an exp field that encodes the exponent, and a frac field that encodes the mantissa (also known as significant).
A floating-point number can be either in the denormalized form to represent small numbers, normalized form to represent larger numbers, or special form to represent values such as infinity.

The execution time of 80% of the program can be accelerated by a factor of 4. What is the overall speed- up? [2 points for the correct choice, no penalty for the incorrect choice]

Which of the following is NOT a reason why we use virtual memory? [2 points for the correct choice, no penalty for the incorrect choice]

Virtual memory allows us to efficiently scale programs with respect to the number of cores.
Virtual memory makes it easy for multiple programs to share physical memory.
Virtual memory provides the illusion of large memory even when we don’t have that much physical memory.
Virtual memory isolates memory space for each process, ensuring that one process does not interfere with another.

Which of the following is NOT true about the ISA (instruction set architecture)? [2 points for the correct choice, no penalty for the incorrect choice]

The ISA is the SW-HW interface that, to the software programmer, states the necessary information to write programs on the hardware (processor).
Processor manufacturers such as Intel often make ISA backwards-compatible so that older programs can still run on newer processors.
The ISA exports a set of operations that can run on the processor and a set of programmer- visible states.
Processor manufacturers typically allow modification of the ISA by installing software packages.

Which of the following is NOT true about integer representation in computers? [2 points for the correct choice, no penalty for the incorrect choice]

Adding two positive integer numbers will always result in a greater positive number.
Given limited number of bits, there is a maximum value that the number type can represent. For example, int type range from -2,147,483,648 to 2,147,483,647.
Compared to sign-magnitude or one’s complement representation, two’s complement representation is widely used because arithmetic operations such as addition are easier to implement.
Integer arithmetic is associative. That is, given three integer numbers a, b, and c, (a+b)+c = a+(b+c).

Which of the following is NOT true about a RAID system? [2 points for the correct choice, no penalty for the incorrect choice]

RAID0 (striping) cannot tolerate any failure.
RAID1 (mirroring) is the most cost-effective solution in terms of capacity per dollar.
RAID4 (parity disk) cannot recover from two disk failures.
RAID5 (rotating parity) has higher performance compared to RAID4 (parity disk).

Let’s say that the function foo calls another function bar for it to handle a specific task within the program. When bar finishes, the control is handed back to foo. How is this implemented? [2 points for the correct choice, no penalty for the incorrect choice]

When foo calls bar, the return address is pushed onto the stack, and when bar returns, the return address popped from the stack is loaded into the PC (program counter).
When foo calls bar, the return address is stored into one of the registers, and when bar returns, the return addressed read from the register is loaded into the PC (program counter).
When foo calls bar, the trap handler invokes the operating system that records the return address to foo; when bar returns, the trap handler is again invoked to have the operating system load the return address into the PC (program counter).
When the program is compiled, the compiler places the code for bar in such a way that the call to bar and returning back to foo is trivial.

In lecture, we’ve seen an example of how a 4 – bit parity can protect 8-bits of data. In such case, encoding can correct up to 1-bit error. One of the more popular Hamming ECC is used in DRAM, where 7-bit parity is added to protect 64-bit of data. This totals in 9 bytes of word. How many extra bytes would we need to protect 128 bytes of data from a single bit error? [2 points for the correct choice, no penalty for the incorrect choice]

Which of the following is NOT true about flash memory? [2 points for the correct choice, no penalty for the incorrect choice]

A piece of data in flash memory cannot be overwritten in-place.
Flash memory is non-volatile and retains data even after a power loss.
Flash memory is faster than DRAM.
Although flash memory-based SSDs and HDDs share the same interface, their internal implementation are widely different.

Which of the following is NOT a correct statement about exceptions? [2 points for the correct choice, no penalty for the incorrect choice]

Exceptions can be ignored by the user process.
System calls, faults, and interrupts are share the same mechanism in handling the event through the exception table.
Exceptions are handled by the kernel running in supervisor mode.
Exceptions are raised when the hardware detects an anomalous condition and requires special processing.

Which of the following is TRUE about calling conventions? [2 points for the correct choice, no penalty for the incorrect choice] In the case where foo calls bar, foo is the caller and bar is the callee.

The callee needs to save and restore registers that it plans to use so that the caller would see the same register values between calls.
Regardless of ISA, the caller always passes arguments by writing them to the stack.
Using caller-saved registers over callee-saved registers results in fewer instructions.
The callee should only use callee-saved registers, and should not modify caller-saved registers.

Which of the following is NOT true about high-level languages such as C? [2 points for the correct choice, no penalty for the incorrect choice]

High-level languages can be directly interpreted by the processor.
High-level languages abstract low-level details such as the ISA (instruction set architecture) of the processor.
High-level languages, compared to low-level languages such as assembly, are easier to express algorithmic ideas.
High-level languages are easier to read and understand compared to assembly language.

Which of the following is NOT true about memory hierarchy? [2 points for the correct choice, no penalty for the incorrect choice]

CPU caches that need fast response time use a deterministic way in which data is placed and looked-up to achieve fast hit times. However, software-managed caches such as browser cache, page cache, and web cache use sophisticated algorithms to reduce the miss rate because of high miss penalties.
Memory hierarchy places the small but fast memory closer to the processor, and the large but slow memory further away – this organization works because of programs typically exhibit locality.
Programs benefit from greater spatial locality when the unit of memory access (cache block, memory page) size increases.

Random access pattern greatly benefits from caches.

Snooping cache coherence protocol is a way in which data across different processors are kept coherent. Which of the following violates the protocol? [2 points for the correct choice, no penalty for the incorrect choice]

Data is in neither processor A nor processor B.
Data is in a shared state in processor A, but in a modified state in processor B.
Data is in a shared state for both processors A and B.
Data is in a modified state for processor A, but in an invalid state for processor B.

Which of the following describes the advantage of increasing the page size in the virtual to physical address translation? [2 points for the correct choice, no penalty for the incorrect choice]

A larger page size would reduce the size of the page table.
A larger page size would exploit temporal locality better.
A larger page size would reduce the occurrence of thrashing.
A larger page size increases the virtual address space.

Which of the following is NOT a reason why computer systems separate untrusted processes and trusted processes? [2 points for the correct choice, no penalty for the incorrect choice]

To ensure fairness among running processes
To protect the system from malicious programs
To limit the way in which system-wide resources are arbitrated
To isolate untrusted code because it contains more bugs than trusted code

Which of the following best-describes a TLB (translation lookaside buffer)? [2 points for the correct choice, no penalty for the incorrect choice]

TLB is a hardware cache that keeps frequently used page table entries.
TLB stands for Tomato, Lettuce, and Bacon sandwich.
TLB is not performance-critical, and is often not found in modern systems.
TLB caches virtual addresses, and is managed by the operating system.

Which of the following is TRUE about instruction pipelining? [2 points for the correct choice, no penalty for the incorrect choice]

Pipelining increases the overall throughput of the processor, but does not necessarily decrease the time it takes to process each instruction.
Hazards are typically identified by the compiler, and nop (instruction that does nothing) is inserted to avoid them.
Program behaviors are highly predictable and it is common to see 100% prediction accuracy for handling control hazards.
The pipeline depth scales with Moore’s Law, and the number of pipeline stages in modern processors have grown exponentially.

On a system with 32-bit addressing, 8KB page size, and 8 bytes page table entries, what is the size of the page table? Assume we use a single-level page table. [2 points for the correct choice, no penalty for the incorrect choice]

Which of the following is TRUE about hard disk drives? [2 points for the correct choice, no penalty for the incorrect choice]

HDDs internally have multiple arms that independently move and access data.
Hard disk drives are susceptible to noise and vibration.
Hard disk drives internally contain multiple spinning DVDs to record data.
The advances in technology allow modern hard disk drives to have negligible latency.

Posts

Y86代写 | assembly | 代做Algorithm – Y86代写

Y86代写

Y86 program

Virtual address 0x4FC