Mass storage media are currently pushing the envelope of parallel interfaces. Cable properties, connector legacies and signaling protocols have reached a point where trading off one potential problem spot against another has left no more room for technical and design navigation. Cross-talk, ground bouncing and signal ringing, along with overly tight timing windows, have left no margins for improvement beyond ATA PI-7 (UATA-133) and, thus, the industry is confronted with the paradox that the actual storage media such as hard disk drives outperform the connectivity. Other issues, like the lack of hot swap capability and difficult trace routing on the PCB, have done the rest to call for a radical changing of the guard. Within the next few weeks or months, we will experience a somewhat radical transition from parallel to Serial ATA that will deliver higher speed, improved reliability and easier installation, along with the introduction of some other nifty features like tagged command queuing, to the commodity drive world. What are the real pitfalls of parallelism? How can serialization address the issues, and what can we expect in terms of the industry supporting the new kids on the block? Time to put on those big reading glasses and the thinking cap (it may get a bit technical at times) .... Taking theory to the test lab, we have dissected the Seagate Barracuda SATA V and show you the first prototype of the next generation of drives, including some mind-blowing performance. Any problems with it? You are about to find out. After over 15 years, internal storage media such as hard disk and optical drives are finally meeting the limitations of the currently used parallel interface. Parallel interfaces, in particular the Advanced Technology Attachment (ATA) standard, have evolved since the mid 1980s from transfer rates of 3.3 MB/sec in ATA-1 to currently 100 MB/sec in ATA PI-6, also known as Ultra ATA 100.
Contrary to common belief, Ultra ATA133 is not yet fully standardized; however, finalization of the standard is well on its way, and it will be listed as ATA PI-7 or UATA PI Mode 7 in the future. Serial and parallel connectors side by side. Not only has the data interface changed, the power connectors have also been given a new face. Best of all, there are real reasons behind this, and those reasons are prime examples of extremely smart engineering. We'll have all the details later..... It is clear that parallel interfaces are pushing the limitations of what can be achieved without violating the laws of physics while maintaining a reasonable cost point. It is also clear that storage devices are the one imminent performance bottleneck in overall system performance. Combined, these two points pose a strong argument for embarking on new technology with the goal of enabling faster mass data access and higher overall throughput. Before going into specifics, here is a short outlook on what we'll discuss in the rest of this article: Within the next few weeks, Serial ATA will be introduced to the mainstream PC at an original peak transfer rate of 150 MB/sec, with a roadmap leading to up to 600 MB/sec throughput in 2007. What are the reasons to turn our backs on the established parallel interface, and how is the industry addressing the challenge of faster mass storage media? We'll dissect the technology of the old, discuss possibilities for transition or migration between standards and show the advantages of the new, shall we? Pitfalls in Parallelism For reasons of simplicity and easier calculation of timings, we will leave UATA133 out of the picture and concentrate on UATA 100 or ATAPI-6. Briefly, the ATA standard employs a 16 bit wide bi-directional bus that is capable of transmitting two bytes per transaction. For a 100 MB/sec throughput, this requires a 50 MHz data rate or 50 Mbps (50 megabits per pin and second).
Clock signals are subject to skew and sluggish edges; therefore, from an electrical standpoint, in most cases it is cleaner to use half the frequency with a double data rate protocol that allows transfers not only on the rising but also on the falling edge of the clock. In this particular case, this results in an actual clock frequency of 25 MHz with a cycle time of 40 ns. Let's stick with the 40 ns bus cycle time for another second. Forty ns appears long, but in reality, the important interval is the time between consecutive clock edges, and that translates to ½ of a clock cycle, in this case, 20 ns. These 20 ns need to accommodate setup and hold times as well as the maximum allowable switching times. In other words, every transaction originates at the clock edge and needs to be completed within 20 ns, but the data also need to be available for a certain time after the crossover point. This "Hold Time" is specified as 4.8 ns by the ATAPI-6 protocol. Likewise, there is a time interval before the clock edge during which the data are made ready for transfer; this "Setup Time" is also specified as 4.8 ns. Combined, setup and hold times add up to 9.6 ns within an available 20 ns hemicycle, leaving no more than a 10.4 ns switching interval. Moreover, the 10.4 ns are applicable only if there is no delay in reaching the Vhigh or Vlow targets, a somewhat unrealistic assumption. In any case, given the fact that setup and hold times remain fairly constant, it is easy to see why the UATA 133 standard with its 15 ns hemicycle time (5.4 ns switching time) will be the fastest achievable unless some more fundamental changes happen. The counterintuitive aspect of a narrow bus Moving from a wide bus to a narrow bus is somewhat counterintuitive when it comes to justifying high speed data transfer and overall bandwidth.
In other words, if a 15 ns hemicycle time at a 33 MHz clock frequency is already problematic with respect to signal integrity, how can we expect more bandwidth from a narrower bus that would run at multiples of that frequency? We were just claiming that 15 ns are hard to maintain, but now we are postulating that we can make it work at a 0.333 ns hemicycle time. Needless to say, setup and hold times are shorter, but the maximum available switching time is also reduced from 10.4 ns to 0.273 ns. Further, these numbers conform to SATA150 only, which is just the beginning. But then, serial designs are shedding the chains of master-slave configurations and even more weight. 1. Clocking Scheme and Skew Problems UltraATA uses conventional non-interlocked (source-synchronous) clock signaling. This means that an additional clock signal acting as a strobe is sent along with the data. This source synchronous clocking is necessary because of the high propagation delays caused by cable length and trace impedance. The drawback is that any differences in the electrical properties of the traces can cause a mismatch in timing, i.e., different arrival times for data and strobe signals or even between signals running on separate data lines. This problem is generally referred to as clock skew and directly relates to the signal voltage amplitude. For those concerned about not having the correct power adapter on their power supply, here is the good news: All SATA drives will ship with power adapters like the one shown here. Picture courtesy of Seagate 2. 3.3 V High-Low Signaling The clock skew issues with the 5V signaling used up to the UDMA 33 standard caused the industry to change the standard to 3.3 V signaling. The main advantage is a more symmetrical distribution of the high-low voltages around the 1.5V trip point. Keep in mind, though, that 3.3V still means massive charges traveling down the ribbon cables.
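For the record, the timing figures quoted in the last two sections are simple arithmetic. The following Python sketch is our own illustration, using only numbers stated in the article (4.8 ns setup and hold times, 2 bytes per 16-bit transfer, roughly 20% serial encoding overhead); the function names are ours:

```python
# Timing-budget arithmetic: a 16-bit UATA bus moves 2 bytes per transfer,
# one transfer per clock edge (DDR), and each transfer must fit setup
# (4.8 ns) and hold (4.8 ns) into the edge-to-edge half-cycle.

def uata_window_ns(mb_per_sec, setup_ns=4.8, hold_ns=4.8):
    """Return (edge-to-edge hemicycle, remaining switching window) in ns."""
    transfers_per_us = mb_per_sec / 2.0      # 2 bytes per 16-bit transfer
    hemicycle = 1000.0 / transfers_per_us    # ns between consecutive edges
    return hemicycle, hemicycle - setup_ns - hold_ns

h100, w100 = uata_window_ns(100)   # UATA100: 20.0 ns edges, 10.4 ns window
h133, w133 = uata_window_ns(133)   # UATA133: ~15 ns edges, ~5.4 ns window

# SATA150 for comparison: 150 MB/s of payload becomes 1.5 Gbit/s on the
# wire once the ~20% encoding overhead is added, so one bit lasts about
# 0.667 ns and the "hemicycle" is roughly 0.333 ns.
line_gbit = 150 * 8 / 1000 * 10 / 8        # 1.5 Gbit/s on the wire
sata_hemicycle_ns = 1.0 / line_gbit / 2    # ns per half bit-time
```

Running the numbers confirms the shrinkage: the serial hemicycle is some 60 times shorter than the UATA100 one.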
3. Cable Design Issues: Cross-Talk and Ground Bouncing vs. Ringing Each signal propagating through a data line makes the data line act like the inductor of a transformer. That is, each voltage swing generates a dynamic electromagnetic field that, depending on cable length and proximity, will induce another signal in adjacent data lines. This cross-talk adds noise to data lines and can produce errors by generating false positives or negatives simply by induction of voltage swings in data lines. Another problem with parallel pathways is the phenomenon of simultaneous switching output (SSO) noise. As we explained in detail in our reviews of the i845 and the SiS645 chipsets, SSO noise becomes really problematic if the majority of signals switch from high to low, since this can induce ground bouncing. On the chipset level, a workaround in the form of dynamic bus inversion (DBI) is feasible; that is, instead of switching all bits, only a reference bit is switched simultaneously at the sender and receiver end, which has the same net effect: the system does not see the reference switch but thinks that all other lines have switched. DBI, however, requires an additional latency cycle, and this is where the 40 ns clock cycle time starts to look really ugly. Workaround In the past, the easiest way to remedy cross-talk has been to minimize cable length. However, in most cases, this solution is simply impractical. With older UDMA cables it was still possible to custom cut the cables and add a custom connector. Our own experience with those cables (some 4 inches) has been that they greatly increase the overclocking tolerance, performance and reliability of any drive. From a commercial standpoint, a more feasible solution turned out to be the addition of interlaced ground wires to add shielding between data lines. The standard ribbon cable, therefore, uses an additional 40 wires that connect through the existing seven ground wires within the 40-pin connector.
This interlacing of signal, power and control lines (throttle, SMART etc., since the real control signals are time-muxed over the data lines, as discussed below) with ground wires effectively eliminates electrical cross-talk. 4. Pandora's Box of Connector Legacy There are several caveats associated with the move to 80 wire cabling. First, the required legacy support and backward compatibility also require the carry-over of the existing connector form factor. This, in turn, means that the physical width of the connector has not changed and now needs to accommodate 80 wires instead of 40. As a consequence, the individual wires used in the ribbons need to use a smaller caliber. Smaller caliber means higher resistance and, by extension, lower signal propagation speed. This opens up an entirely new Pandora's Box of problems in that the signal potentially ramps up faster than it is propagated. Again, for easier understanding, we resort to an analogy, and the most intuitive example is a sonic boom. In other words, the signal voltage ramps up faster than it can be moved away from the source of origin. This leads to differential voltages across the data lines, a situation that can be further exacerbated if the feedback circuitry sees too little voltage at the receiver end and the controller keeps pumping up the lines. The result is inflated signal amplitudes that will reflect at the end of the cable and travel back in the opposite direction to collide with the next set of data. UDMA (ATA PI-4) vs. UATA (ATA PI-5) cables. It is easy to see how the additional wires necessitate a finer pitch, with all the electrical trade-offs described on this page.
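Backing up to the DBI workaround mentioned under SSO noise: one common form of dynamic bus inversion can be sketched in a few lines. This is our own illustration, not the chipsets' actual implementation; the explicit flag bit here plays the role of the reference bit, and the 16-bit width and function names are assumptions:

```python
# Dynamic bus inversion (DBI) sketch: if transmitting the next word would
# toggle more than half of the 16 data lines, send the inverted word plus
# a raised flag bit instead, so fewer outputs switch simultaneously.

MASK = (1 << 16) - 1

def dbi_encode(prev_word, word):
    toggles = bin(prev_word ^ word).count("1")
    if toggles > 8:                       # majority of lines would switch
        return word ^ MASK, True          # invert payload, raise DBI flag
    return word, False

def dbi_decode(word, flag):
    return word ^ MASK if flag else word
```

With an all-zero bus, an incoming 0xFFFF would normally toggle all 16 lines at once; DBI transmits 0x0000 with the flag set instead, and the receiver undoes the inversion, at the cost of the extra latency cycle mentioned above.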
As we outlined in an earlier article, those design-induced problems, that is, the trade-off between elimination of cross-talk and avoidance of signal ringing, are the reason for the strict designation of the individual connectors on any UATA ribbon cable as "board", "master" and "slave" that cannot be reversed without disturbing the balance and potentially damaging the drive. Likewise, folded, rounded and spliced cables may be fashionable, but one thing they are definitely not is electrically clean. This is why rounded cables will increase the failure rate of HDDs regardless of their esthetic value. 5. Termination Another issue playing into the signal properties of the parallel path is the termination of signals. Briefly, to avoid signal reflection at the end of a cable, the individual lines can be terminated, that is, tied to ground via a resistor that will eliminate any voltage swings above a certain level. Usually, the drive itself provides the termination and, as mentioned in our earlier article, this termination is the reason why single drives need to sit at the end of the cable even if it involves longer distances. Of course, there is always the possibility of simply cutting off the tail end of the cable, which solves most of the electrical problems but also has the negative side effect of eliminating the use of a master-slave configuration. 6. Tagged Command Queuing Unlike system memory, storage media do not separate command and address lines from the data bus; rather, commands are sent time-multiplexed over the same bus as the data. This can cause bus contention and stalling of data transfer whenever additional commands are needed. A workaround for this situation is known as tagged command queuing. Depending on manufacturer and model, current parallel drives have a limited capability for TCQ; in the commodity market, mostly the current series of IBM GXP drives are capable of TCQ.
Briefly, TCQ means that the device itself can make intelligent decisions regarding the sequence of execution of tasks. That is, instead of being required to send each command by itself, the host can send an entire list of commands to the drive, which can then make decisions regarding the most economic order of command execution. The principle behind TCQ is that the device itself, that is, the drive, is able to make an intelligent decision about the order in which the data will be pulled from the platters. Without TCQ, the heads have to bounce all over the platters to access the data. With TCQ, the drive can determine, based on the head positioning, the most efficient way to access the data with minimal head movement. The simplest analogy is probably a shopping list for the supermarket. Going down the list item by item will, in most cases, be a very uneconomic way of shopping, since it involves a huge amount of legwork. On the other hand, adding things to the shopping cart based on the order in which they are laid out in the store may be easier and more economic, but requires checkmarks behind each item to verify that the task has been completed. The checkmark in this case is equivalent to the "tag" in tagged command queuing. It is easy to see how TCQ can greatly improve the performance of a drive if the application and the operating system can take advantage of it. 7. PCB Issues Cables are one factor posing design constraints, and effectively, the insufficiencies of even the best compromise are at the point where hardly any further evolution is possible. PCB designers are facing a very similar situation, in that the traces between the controller and the connectors need to be matched extremely well, regardless of whether we are looking at an integrated controller on the mainboard or a PCI-based extra controller card.
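Going back to tagged command queuing for a moment: the shopping-list reordering can be modeled with a toy scheduler. This is purely illustrative and not how any particular drive's firmware works; real drives also weigh rotational latency, not just a one-dimensional head position, and the positions below are made up:

```python
# Toy TCQ-style reordering: serve the queued request closest to the current
# head position instead of first-come-first-served, cutting total seek travel.

def reorder_nearest_first(head, requests):
    pending, order = list(requests), []
    while pending:
        nearest = min(pending, key=lambda pos: abs(pos - head))
        pending.remove(nearest)
        order.append(nearest)
        head = nearest
    return order

def seek_travel(head, order):
    """Total head travel (in track units) when serving 'order' in sequence."""
    total = 0
    for pos in order:
        total += abs(pos - head)
        head = pos
    return total

queue = [95, 10, 60, 12]                  # arrival order of track positions
fifo = seek_travel(50, queue)             # arrival order: 228 tracks of travel
tcq = seek_travel(50, reorder_nearest_first(50, queue))  # reordered: 130
```

Even this crude greedy pass nearly halves the head travel for the sample queue, which is the whole point of letting the drive reorder its own workload.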
All in all, each channel uses 32 different signaling lines; that is, a two channel design requires 64 signals to be routed from the I/O controller to the connector. The specifications limit the maximum trace length to 8" with no more than 0.5" difference in trace length between transmission lines, including the strobe, to minimize clock skew. The Transition from Parallel to Serial Signaling A transition from parallel to serial almost sounds like a misnomer; that is, the two standards are mutually exclusive, and a real transition would involve intermediate steps that, instead of the best of two worlds, would most likely just add another can of worms to the snake pit of meandering traces. However, despite the fact that the overall concept and the design rules are totally different, there will be a transitional period. That is, we will be looking at some parallel drives using onboard translators to enable serialization and de-serialization of data at both the sender and receiver end. In addition, we will see some bridge adapters that can be used to connect older parallel drives to serial interfaces. Bridge solutions will not show any performance improvements; however, it should be clear that, at least, they eliminate the electrical problems of ribbon cables and further have the advantage of easier routing, EMI suppression and improved air flow through the case. The importance of the latter point cannot be stressed enough, since, as we reported some 5 years ago, local heating of individual memory chips on a DIMM caused by obstruction of airflow by a ribbon cable can have a detrimental influence on the overall speed of a memory module.
In addition, every other low-life mainboard manufacturer will claim the invention of bridge solutions; names like "seriallel" and other nonsense have been popping up like mushrooms in the infomercials sent out in the form of press releases. True Serial ATA Aside from the neater design of serial cables, what is the new technology that will conquer the mass storage world? As the name indicates, the new interface will be serial and capable of 150 MB/sec throughput. Since we are looking at point-to-point connectivity, the required operating frequency works out to 150 MHz x 8 bit, and if we add the roughly 20% overhead for encoding and error detection mechanisms, we'll end up with 1.5 Gbit/sec or a 1.5 GHz operating frequency. How is this high frequency possible? 1. Embedded Clocking Scheme Instead of relying on an external clock with an elaborate clock forwarding scheme and synchronization of data and strobe signals, Serial ATA uses an "embedded" clock, meaning that the data themselves act as clock signal, or rather as synchronizers for the internal receivers at both ends of the transmission. One potential problem with this scheme is that during periods of no data transfer, the system could get out of sync. This issue can be completely avoided by sending a dummy signal consisting of a regular 1010101 pattern. The result is a high speed interface with internal clocks at both ends that are synchronized by the data flow. Because there are no parallel paths, clock skew is simply impossible. 2. Low Voltage Differential Signaling As we outlined earlier, many of the Parallel ATA problems stem from the 15 years of legacy of a standard 5V-tolerant 3.3V signal protocol. More modern signaling methods use LVDS, that is, a pair of wires is used to send two voltage signals, and the voltage potential between the two wires is the actual data.
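As a toy illustration of what the receiver does with such a pair (the function and its simple zero threshold are our own sketch, not the SATA receiver's actual circuit): the bit is recovered from the sign of the difference, so any noise that hits both wires equally cancels out.

```python
# LVDS receiver sketch: the bit is carried by the voltage difference
# between the two wires of a pair, not by either absolute level, so
# common-mode shifts affect both wires equally and drop out.

def lvds_decode(v_plus_mv, v_minus_mv):
    """Return 1 when the pair encodes a high, 0 for a low."""
    return 1 if v_plus_mv - v_minus_mv > 0 else 0

# With a 500 mV bias and 125 mV swings, (625, 375) is a high and
# (375, 625) is a low; shifting both wires by +100 mV of common-mode
# noise decodes identically.
```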
Since we are no longer talking about reaching a cross-over or trip point with the overhead necessary to ensure signal integrity, ultra-low voltages can be applied, which allows the signaling scheme to be sped up greatly, for the simple reason that the number of electrons pumped into / received from the wires is much lower and therefore the transaction is faster. What is needed for LVDS, though, is a DC bias, that is, a reference voltage. The two differential signals swing 0.125V above and below it. In detail, the reference bias is around 500 mV with ±0.125V amplitudes and nominal Vmin and Vmax points of 400 and 600 mV. As mentioned earlier, any kind of differential signaling is prone to bias shifts that can occur in periods of no transfers or else if the majority of signals are either high or low. To compensate, the earlier mentioned dummy signals (10101010 .....) are used to re-adjust the DC bias. A "drab" red cable next to a metallic cerise shade, flat vs. dual pairs; when will we see glow-in-the-dark SATA cables? Note the L-shaped connector opening and the additional key on the right of the connector, which make it impossible to reverse the orientation. 3. Cabling If only four transmission lines are needed for full duplex (independent unidirectional upstream and downstream) signaling, and the additional ground pins can be used for shielding of the four data wires, we are looking at a reduction of total wires by a factor of 20 compared to the 80 wires in the UATA cables. Realistically, this is not entirely true, since instead of parallel ground wires, the shielding is accomplished by coaxial wrapping of grounds around the data wires. Nonetheless, there are no imminent space constraints and, therefore, the diameter of the individual leads can be increased to reduce the resistance and impedance of the cables and, by extension, speed up signal propagation.
The latter is the really important factor, since the signaling frequency increases by a factor of 30 even in the first implementation of Serial ATA. Another important aspect of SATA cabling is that there are hardly any length constraints. Officially, SATA cable length is limited to 1m; however, we have seen cables as long as 2.5m running in demo systems without any measurable performance loss. Last but not least, it is no longer necessary to destroy signal integrity with rounded IDE cables for esthetic or air flow reasons. Serial ATA cables are sleek, flexible and offer a whole new wealth of cool designs. 4. New Smart Connectors Enable Hot Plug Capabilities Anyone who has ever worked with parallel ribbon cables has cursed the connectors; broken fingernails and bent pins, as well as simple cuts from trying to connect the drive in a crowded environment, are probably the most under-represented reasons for the move to a new interface. Schematic drawing of the data connector on cable and device (or board). There is only one way to insert the cable, and regardless of whether it is done fast or slow, the ground connectors will always touch first and establish electrostatic equilibrium across the board. The new SATA connectors have a number of little devils in the detail that make them much more than meets the eye. First of all, the connectors are keyed, so there is no possibility of accidentally reversing the cables; input will always connect to output and vice versa, but that is only the beginning. The SATA interface uses what is called a staggered pin design, meaning that there are two different lengths of pin used, long and short ones. All in all, this offers a total of three different combinations, i.e., long on long, long on short and short on short.
Since the way of plugging the cable into the device is always the same, this can be used to generate a temporal sequence of connectivity; that is, the long on long pins always connect first, followed by the long on short, and the short on short pins are always the last connections to be closed. The biggest foe of any hot plugging scheme of electronic components is electrostatic discharge, which can effectively zap any device into oblivion. The countermeasure is usually grounding, and as simple as it appears, the idea to use the long on long pin connections for grounding both devices is nothing short of brilliant, since these connections will always be shorted eons (in electron years) before the data and power wires connect. In other words, whenever a drive is connected to a receiver, the first thing that is established is the electrostatic equilibrium necessary to protect the data lines. As soon as power is established, the drive will be able to run through its initialization sequence and establish a handshake with the host. The real beauty is that this can be accomplished without a power-down of the host; in other words, we have hot swap capabilities, and it is all made possible through an extra mm of copper. Seagate Barracuda V Serial ATA is being embraced by the entire hard disk drive industry with the exception of IBM, who have no plans of moving into serial interface technology at this point. Western Digital, Maxtor and Fujitsu all have pre-production drives ready for showcasing; however, all three companies currently use original parallel drives that were upgraded with a bridge adapter built into the drive. The only pure SATA solution currently on the horizon is the Seagate Barracuda V. The "Cuda V" is the first true SATA 150 drive that will hit the market. The 4 pin header next to the power and data interface is there to limit the drive capacity to 32 GBytes.
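As an aside, the make-break order of the staggered connector described above can be modeled in a couple of lines. The relative pin lengths below are purely illustrative, not measurements from the SATA specification; the point is only that sorting by length reproduces the mating sequence:

```python
# Hot-plug mating order sketch: longer pins touch first, so sorting the
# contacts by pin length (longest first) yields the order in which the
# circuits close as the connector is pushed home.
contacts = [("signal", 1), ("power", 2), ("ground", 3)]  # relative lengths
make_order = [name for name, length in sorted(contacts, key=lambda c: -c[1])]
# Grounds mate first, establishing electrostatic equilibrium before the
# power and data lines ever connect; on removal the same logic runs in
# reverse, so grounds are also the last contacts to break.
```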
Compared to most drives, the Cuda looks and feels somewhat different, that is, more like a safe deposit box than a HDD. For noise reduction and shock / vibration protection, a foam layer is added at the bottom of the drive, that is, between the drive itself and the "SeaShield". The only problem we see with this is that the drive runs quite hot. Otherwise, it really is amazingly quiet. On the left, the staggered pin formation of both power and data connector is visible. Features
- 6th generation 7200 rpm drive
- 60 GB / platter
- Formatted drive capacities of 60, 80, 120 GB
- SATA interface on 80 and 120 GB models
- 2 MB buffer (8 MB on SATA models)
- 100% fluid dynamic bearing motor for ultra quiet operation
- 350 G non-operating shock resistance
- Seagate 3D Defense System: Drive Defense (SeaShell, SeaShield, G-Force Protection), Data Defense (ECC, Safe Sparing, Continuous Background Defect Scanning (CBDS), Rotational Vibration), Diagnostic Defense (SeaTools web-based tools, Enhanced SMART)
Barracuda V Performance Barracudas are very fast, at least in their native habitat, the ocean. Seagate's Barracudas have an equally fast reputation, but we are stuck here with a preproduction engineering sample that may still need some tuning to deliver optimal performance. When we received the drive from Seagate, we had to promise we would not run any benchmarks, or at least not publish them. Well, we lied ..... Let the games begin! To match the drives' size differences, both drives were partitioned with a 3 GB primary DOS partition and a separate swapfile in a 1 GB logical drive on the Extended DOS partition. Promise PDC 20367 SATA-150 controller on the ASUS A7V8X. We start out with HDTach to get some basic information on the drive. The sustained transfers are quite impressive and at about the same level as the IBM 120 GXP Deskstar, but we are missing the 150 MB/sec transfers that the drive should theoretically deliver. Keep in mind that the transfers will be capped by the 133 MB/sec limitation of the PCI bus.
Most likely, with some more tuning, we will see higher burst performance in the near future. As we emphasized earlier, HDTach is similar to SiSoft Sandra; just streaming through the platters does not necessarily give any indication of the real world performance of a drive. So once again, we resorted to our all-time classic, WinBench98, to check what is really going on. WinBench98 Business Performance Some of the numbers we were getting here appeared somewhat unbelievable, and we re-ran every test multiple times to make sure that there were no misreportings. It did not help; the results did not change. The results for the Maxtor D740X-6L are within the same range as what we found in our earlier review using a different platform. Throughput in 1000 Bytes/sec: The three candidates are the Barracuda SATA V and the Maxtor D740X-6L using either the on-board VIA IDE controller or the Promise PDC 20367 onboard ATA / SATA controller. The Barracuda V literally destroys the competition, and this pattern will persist throughout the entire rest of the benchmarks. No comment necessary. The numbers speak for themselves. Conclusions Currently, Serial ATA is still in the wastelands; there are still issues between controllers and different drive configurations. Some of these issues appear to be reporting errors of the software used rather than real performance issues, but overall, the new technology looks extremely impressive and promising. The results from HDTach are not exactly mind-blowing; timing marginalities could play into those numbers, as well as some other issues. One issue that might be relevant for the performance, or lack thereof, in HDTach measurements of the burst rate is that, for the first time, we have a drive and interface that is actually faster than the downstream PCI bus. In other words, a possible scenario is that the drive sends data packets to the controller / PCI bus, fills up the buffers there and sends the next string of data. Everything happens at 150 MB/sec.
Potentially, and this is purely speculative, the subsequent burst cannot be handled by the controller because it is still busy trying to funnel the data through the bottleneck of the PCI bus. The consequence would be a retry. By extension, if we spin this a bit further, we would come to expect successful transactions alternating with retries caused by full buffers. This might explain why the bus shows only roughly 50% utilization while, at the same time, other, nominally slower drives achieve higher performance. Again, this is pure speculation; without bus analysis tools we cannot answer the question. Another possibility would have been to repeat the HDTach measurement using a 66 MHz PCI card; unfortunately, at this time we did not have any external 66 MHz PCI-based SATA controller to test our hypothesis. Regardless of whether the hypothesis outlined above is valid or not, the topic opens another issue, which concerns the saturation of the PCI bus even with 150 MB/sec transfers. This calls for an independent data bus or else a 66 MHz PCI bus as implemented in the AMD MPX platform. In other words, will SATA 150 or 300 be useless for the average consumer? The answer is: Absolutely not! The reason is buried in the WinBench results. The highest transfer rates that we get are on the order of 57 MB/sec. Of course, those are average numbers and cut off the peak transfer rates, but the take-home message is that burst rates are burst rates, and even the fastest drives cannot fill up the buffers at the same rate as the buffer can flush the data. Therefore, there will always be enough headroom, at least in the near future. In the far future, that is, sometime next year, we will see SATA controllers with dedicated lines to the NorthBridge, like Intel's ICH5, that run completely uncoupled from the PCI bus.
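The bus arithmetic behind that conclusion is straightforward; this sketch uses nominal clock figures and a function name of our own:

```python
# Peak PCI bandwidth vs. the SATA interface rate: a conventional
# 32-bit / 33 MHz PCI bus tops out below 150 MB/s; a 66 MHz slot would not.

def pci_peak_mb_per_sec(width_bits, clock_mhz):
    """Theoretical peak throughput of a PCI bus in MB/s."""
    return width_bits / 8 * clock_mhz    # bytes per clock times clock rate

pci33 = pci_peak_mb_per_sec(32, 33.33)   # ~133 MB/s, below SATA's 150
pci66 = pci_peak_mb_per_sec(32, 66.66)   # ~267 MB/s, comfortable headroom
```

Which is exactly why a plain 33 MHz slot can cap bursts from a SATA150 drive while a 66 MHz bus, or a dedicated link to the chipset, would not.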
nVidia will have similar technology in their Hammer chipsets, as will probably everybody else. Finally, the last question is whether it is worth migrating to SATA with the new chipsets / boards that are just hitting the shelves, and the answer is an unambiguous yes. Acknowledgements: A number of individuals have been extremely helpful in the making of this article: Marc Noblitt, Seagate; Mark Jackson, Maxtor; Larry Li and Craig Lyons, Promise Technology; Sumit Puri, Fujitsu; and most importantly, Joni Clark, Seagate. Apologies to those whose names have slipped my mind; I know there should be some 20 more acknowledgements and, closing my eyes, I see the faces but cannot remember the names.