I agree that we should get a technology forum.
Maybe then I could learn to understand half of what's been written in this thread.
A project that harnesses the spare processing power of Sony's PlayStation 3 (PS3) to help understand the cause of diseases has entered the record books.
Guinness World Records has recognised folding@home (FAH) as the world's most powerful distributed computing network.
FAH has signed up nearly 700,000 PS3s to examine how the shape of proteins affects diseases such as Alzheimer's.
The network has more than one petaflop of computing power - the equivalent of 1,000 trillion calculations per second.
"To have folding@home recognized by Guinness World Records as the most powerful distributed computing network ever is a reflection of the extraordinary worldwide participation by gamers and consumers around the world and for that we are very grateful," said Professor Vijay Pande of Stanford University and a leader of the FAH project.
Disease link
Distributed computing is a method for solving large complex problems by dividing them between many computers.
They harness the idle processing power of computers to crunch small packets of data, which are then fed back over the internet to a central computer.
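As a rough illustration of that fetch-crunch-report cycle, here is a minimal sketch of a volunteer-computing client loop; the server URL, endpoints and work-unit format are hypothetical stand-ins, not the actual folding@home protocol.

```python
# Minimal sketch of a volunteer-computing client loop (hypothetical server
# and work-unit format; not the actual folding@home protocol).
import json
import time
import urllib.request

SERVER = "http://example.org/work"  # hypothetical work server

def fetch_work_unit():
    """Ask the central server for a small packet of work."""
    with urllib.request.urlopen(f"{SERVER}/next") as resp:
        return json.loads(resp.read())

def compute(work_unit):
    """Crunch the packet locally using otherwise idle CPU cycles."""
    # Placeholder for the real simulation kernel (e.g. a folding trajectory).
    return {"id": work_unit["id"], "result": sum(work_unit["data"])}

def report(result):
    """Send the finished result back to the central computer over the internet."""
    req = urllib.request.Request(f"{SERVER}/result",
                                 data=json.dumps(result).encode(),
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req)

while True:
    unit = fetch_work_unit()
    report(compute(unit))
    time.sleep(1)  # yield so the machine stays responsive for its owner
```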
The technique has been used by several groups to study everything from how malaria spreads to searching for new cancer drugs.
One of the most high profile projects is seti@home, which uses computer cycles to search through thousands of hours of radio telescope signals for signs of extra-terrestrial intelligence.
FAH uses distributed computing to examine protein folding and how it may be linked to diseases.
Proteins that do not fold correctly have been implicated in diseases such as Alzheimer's, Huntington's, BSE and many cancers.
Speed test
Until March this year, FAH only ran on PCs.
Around 200,000 computers were participating in the program, the equivalent of about 250 teraflops (trillion calculations per second).
The addition of 670,000 PS3s has taken the computing power of the network to more than one petaflop.
By comparison BlueGene L, which tops the list of most powerful supercomputers, has a top speed of just 280.6 teraflops.
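For scale, a quick conversion of the figures quoted above (1 petaflop = 1,000 teraflops = 10^15 calculations per second):

```python
# Units: 1 teraflop = 1e12 calc/s, 1 petaflop = 1e15 calc/s = 1,000 teraflops.
fah_network = 1.0e15     # folding@home, > 1 petaflop (article figure)
pcs_only    = 250.0e12   # ~250 teraflops from ~200,000 PCs
bluegene_l  = 280.6e12   # BlueGene/L peak, 280.6 teraflops

print(fah_network / bluegene_l)  # ~3.6x BlueGene/L
print(fah_network / pcs_only)    # ~4x the PC-only network
```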
The boost is in part because of the PS3's powerful processor, known as the "cell", which runs up to 10 times faster than current PC chips.
"It is clear that none of this would be even remotely possible without the power of PS3, it has increased our research capabilities by leaps and bounds," said Prof Pande.
The computing industry is moving rapidly away from exponential scaling of clock frequency toward chip multiprocessors in order to better manage trade-offs among performance, energy efficiency, and reliability. Understanding the most effective hardware design choices and code optimization strategies to enable efficient utilization of these systems is one of the key open questions facing the computational community today.
In this paper we developed a set of multicore optimizations for LBMHD, a lattice Boltzmann method for modeling turbulence in magnetohydrodynamics simulations. We presented an auto-tuning approach, which employs a code generator that produces multiple versions of the computational kernels using a set of optimizations with varying parameter settings. The optimizations include: an innovative approach of phase-space TLB blocking for lattice Boltzmann computations, loop unrolling, code reordering, software prefetching, streaming stores, and use of SIMD instructions. The impact of each optimization varies significantly across architectures, making a machine-independent approach to tuning infeasible. In addition, our detailed analysis reveals the performance bottlenecks for LBMHD in each system.
Results show that the Cell processor offered (by far) the highest raw performance and power efficiency for LBMHD, despite having peak double-precision performance, memory bandwidth, and sustained system power that is comparable to other platforms in our study. The key architectural feature of Cell is explicit software control of data movement between the local store (cache) and main memory. However, this impressive computational efficiency comes with a high price — a difficult programming environment that is a major departure from conventional programming. Nonetheless, these performance disparities point to the deficiencies of existing automatically-managed coherent cache hierarchies, even for architectures with sophisticated hardware and software prefetch capabilities. The programming effort required to compensate for these deficiencies demolishes their initial productivity advantage.
Our study has demonstrated that — for the evaluated class of algorithms — processor designs that emphasize high throughput via sustainable memory bandwidth and large numbers of simpler cores are more effective than complex, monolithic cores that emphasize sequential performance. While prior research has shown that these design philosophies offer substantial benefits for peak computational rates [16], our work quantifies that this approach can offer significant performance benefits on real scientific applications.
Overall the auto-tuned LBMHD code achieved sustained superscalar performance that is substantially higher than any published results to date — over 50% of peak flops on two of our studied architectures, with speedups of up to 14x relative to the original code. Auto-tuning amortizes tuning effort across machines by building software to generate tuned code and using computer time rather than human time to search over versions. It can alleviate some of the compilation problems with rapidly-changing microarchitectures, since the code generator can produce compiler-friendly versions and can incorporate small amounts of compiler- or machine-specific code. We therefore believe that auto-tuning will be an important tool in making use of multicore-based HPC systems of the future. Future work will continue exploring auto-tuning optimization strategies for important numerical kernels on the latest generation of multicore systems, while making these tuning packages publicly available.
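As a rough illustration of the auto-tuning idea described above (generate many parameterized kernel variants, time each on the target machine, keep the fastest), here is a minimal sketch; the toy kernel and parameter space are hypothetical stand-ins, not the actual LBMHD code generator.

```python
# Minimal auto-tuning sketch: enumerate parameterized kernel variants,
# time each on the target machine, and keep the fastest configuration.
# The toy kernel and parameter space are illustrative only.
import itertools
import time

def make_kernel(unroll, block):
    """Stand-in 'code generator': returns a kernel specialized to (unroll, block)."""
    def kernel(data):
        acc = 0.0
        for start in range(0, len(data), block):   # cache/TLB-style blocking
            chunk = data[start:start + block]
            i = 0
            while i + unroll <= len(chunk):         # unrolled inner loop
                for j in range(unroll):
                    acc += chunk[i + j] * 1.0000001
                i += unroll
            for k in range(i, len(chunk)):          # remainder elements
                acc += chunk[k] * 1.0000001
        return acc
    return kernel

def autotune(data, unrolls=(1, 2, 4, 8), blocks=(256, 1024, 4096)):
    """Search the parameter space and return (time, unroll, block) of the winner."""
    best = None
    for unroll, block in itertools.product(unrolls, blocks):
        kernel = make_kernel(unroll, block)
        t0 = time.perf_counter()
        kernel(data)
        elapsed = time.perf_counter() - t0
        if best is None or elapsed < best[0]:
            best = (elapsed, unroll, block)
    return best

print(autotune([0.5] * 100_000))  # fastest configuration on this machine
```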
However, this impressive computational efficiency comes with a high price — a difficult programming environment that is a major departure from conventional programming. Nonetheless, these performance disparities point to the deficiencies of existing automatically-managed coherent cache hierarchies, even for architectures with sophisticated hardware and software prefetch capabilities. The programming effort required to compensate for these deficiencies demolishes their initial productivity advantage.
Yes, agreed, but part of this I think is the mindset and education of programmers, including myself. Parallel processing has always been a very small niche, and mainly done on a very large scale. I think that programmers experienced in that realm would have no problem. The interesting finding of this particular research regarding the Cell is how it scales almost linearly as you add cores, whereas the other multi-core architectures drop off in performance per core.
Does anyone know what this thread is about?
i stand by my original thought..
"Shit Thread"
EE Times: Latest News
IBM shifts Cell to 65nm
Server chip aims at "supercomputing for the masses"
Rick Merritt
EE Times
(05/13/2008 12:01 PM EDT)
SAN JOSE, Calif. — IBM Corp. officially announces today (May 13) a next-generation version of its Cell processor, the first specifically geared for computer servers.
The PowerXCell 8i will drive the Road Runner system now under test at Los Alamos National Labs to see if it can become the world's first supercomputer to deliver sustained petaflops performance. Besides cracking the petaflops barrier, IBM hopes hundreds of users will decide to plug into their IBM servers a two-socket board housing the new Cell chips to deliver what IBM calls "supercomputing for the masses."
The new chip uses 65nm process technology to reduce the power consumption of the previous 90nm chip while maintaining the same 3.2 GHz frequency. That allows IBM to get two of the chips on to a single board while keeping board-level power consumption under the 250 W required by IBM's BladeCenter servers.
The new design now supports mainstream DDR-2 memory rather than the Rambus XDR memories used in the original Cell. It has also expanded total memory capacity of the chip from 2 to 32 Gbytes to support large data sets required in many high-end technical computing applications.
IBM also expanded support for double precision floating point on the eight specialty cores used on Cell. The chip now delivers up to 190 TFlops of double precision floating point performance, five times its previous level, said Jim Comfort, vice president of workload optimized systems in IBM's Systems and Technology Group.
Supercomputer sets record
By John Markoff, Published: June 9, 2008
SAN FRANCISCO: An American military supercomputer, assembled from components originally designed for video game machines, has reached a long-sought-after computing milestone by processing more than 1.026 quadrillion calculations per second.
The new machine is more than twice as fast as the previous fastest supercomputer, the IBM BlueGene/L, which is based at Lawrence Livermore National Laboratory in California.
The new $133 million supercomputer, called Roadrunner in a reference to the state bird of New Mexico, was devised and built by engineers and scientists at IBM and Los Alamos National Laboratory, based in Los Alamos, New Mexico. It will be used principally to solve classified military problems to ensure that the nation's stockpile of nuclear weapons will continue to work correctly as they age. The Roadrunner will simulate the behavior of the weapons in the first fraction of a second during an explosion.
Before it is placed in a classified environment, it will also be used to explore scientific problems like climate change. The greater speed of the Roadrunner will make it possible for scientists to test global climate models with higher accuracy.
To put the performance of the machine in perspective, Thomas D'Agostino, the administrator of the National Nuclear Security Administration, said that if all six billion people on earth used hand calculators and performed calculations 24 hours a day and seven days a week, it would take them 46 years to do what the Roadrunner can in one day.
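A quick back-of-the-envelope check of that comparison, using only the figures quoted in the article, shows the implied workload per person:

```python
# Back-of-the-envelope check of the hand-calculator comparison above.
roadrunner_per_sec = 1.026e15           # calculations per second (article figure)
per_day = roadrunner_per_sec * 86_400   # one day of Roadrunner time

people = 6.0e9
seconds_in_46_years = 46 * 365.25 * 86_400

# Calculations each person would have to perform per second, nonstop for 46 years.
print(per_day / (people * seconds_in_46_years))  # roughly 10 per second
```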
The machine is an unusual blend of chips used in consumer products and advanced parallel computing technologies. The lessons that computer scientists learn by making it calculate even faster are seen as essential to the future of both personal and mobile consumer computing.
The high-performance computing goal, known as a petaflop — one thousand trillion calculations per second — has long been viewed as a crucial milestone by military, technical and scientific organizations in the United States, as well as a growing group including Japan, China and the European Union. All view supercomputing technology as a symbol of national economic competitiveness.
By running programs that find a solution in hours or even less time — compared with as long as three months on older generations of computers — petaflop machines like Roadrunner have the potential to fundamentally alter science and engineering, supercomputer experts say. Researchers can ask questions and receive answers virtually interactively and can perform experiments that would previously have been impractical.
"This is equivalent to the four-minute mile of supercomputing," said Jack Dongarra, a computer scientist at the University of Tennessee who for several decades has tracked the performance of the fastest computers.
Each new supercomputing generation has brought scientists a step closer to faithfully simulating physical reality. It has also produced software and hardware technologies that have rapidly spilled out into the rest of the computer industry for consumer and business products.
Technology is flowing in the opposite direction as well. Consumer-oriented computing began dominating research and development spending on technology shortly after the cold war ended in the late 1980s, and that trend is evident in the design of the world's fastest computers.
The Roadrunner is based on a radical design that includes 12,960 chips that are an improved version of an IBM Cell microprocessor, a parallel processing chip originally created for Sony's PlayStation 3 video-game machine. The Sony chips are used as accelerators, or turbochargers, for portions of calculations.
The Roadrunner also includes a smaller number of more conventional Opteron processors, made by Advanced Micro Devices, which are already widely used in corporate servers.
"Roadrunner tells us about what will happen in the next decade," said Horst Simon, associate laboratory director for computer science at the Lawrence Berkeley National Laboratory. "Technology is coming from the consumer electronics market and the innovation is happening first in terms of cellphones and embedded electronics."
The innovations flowing from this generation of high-speed computers will most likely result from the way computer scientists manage the complexity of the system's hardware.
Roadrunner, which consumes roughly three megawatts of power, or about the power required by a large suburban shopping center, requires three separate programming tools because it has three types of processors. Programmers have to figure out how to keep all of the 116,640 processor cores in the machine occupied simultaneously in order for it to run effectively.
"We've proved some skeptics wrong," said Michael Anastasio, a physicist who is director of the Los Alamos National Laboratory. "This gives us a window into a whole new way of computing. We can look at phenomena we have never seen before."
Solving that programming problem is important because in just a few years personal computers will have microprocessor chips with dozens or even hundreds of processor cores. The industry is now hunting for new techniques for making use of the new computing power. Some experts, however, are skeptical that the most powerful supercomputers will provide useful examples.
"If Chevy wins the Daytona 500, they try to convince you the Chevy Malibu you're driving will benefit from this," said Steve Wallach, a supercomputer designer who is chief scientist of Convey Computer, a start-up firm based in Richardson, Texas.
Those who work with weapons might not have much to offer the video gamers of the world, he suggested.
Many executives and scientists see Roadrunner as an example of the resurgence of the United States in supercomputing.
Although American companies had dominated the field since its inception in the 1960s, in 2002 the Japanese Earth Simulator briefly claimed the title of the world's fastest by executing more than 35 trillion mathematical calculations per second. Two years later, a supercomputer created by IBM reclaimed the speed record for the United States. The Japanese challenge, however, led Congress and the Bush administration to reinvest in high-performance computing.
"It's a sign that we are maintaining our position," said Peter Ungaro, chief executive of Cray, a maker of supercomputers. He noted, however, that "the real competitiveness is based on the discoveries that are based on the machines."
Having surpassed the petaflop barrier, IBM is already looking toward the next generation of supercomputing. "You do these record-setting things because you know that in the end we will push on to the next generation and the one who is there first will be the leader," said Nicholas Donofrio, an IBM executive vice president.
By breaking the petaflop barrier sooner than had been generally expected, the United States' supercomputer industry has been able to sustain a pace of continuous performance increases, improving a thousandfold in processing power in 11 years. The next thousandfold goal is the exaflop, which is a quintillion calculations per second, followed by the zettaflop, the yottaflop and the xeraflop.
Supercomputer sets petaflop pace
By Jonathan Fildes
Science and technology reporter, BBC News
A supercomputer built with components designed for the Sony PlayStation 3 has set a new computing milestone.
The IBM machine, codenamed Roadrunner, has been shown to run at "petaflop speeds", the equivalent of one thousand trillion calculations per second. The benchmark means the computer is twice as nimble as the current world's fastest machine, also built by IBM. It will be installed at a US government laboratory later this year where it will monitor the US nuclear stockpile. It will also be used for research into astronomy, genomics and climate change.
"We are getting closer to simulating the real world," Bijan Davari, vice president of next generation computing systems at IBM, told BBC News. It would be of particular use for calculating risk in financial markets, he said. "The latency of the calculations is so small that for all practical purposes it is real time."
Chip stacks
The current fastest supercomputer is IBM's Blue Gene/L, based at the Lawrence Livermore National Laboratory in California. It is used in the US Department of Energy's Stockpile Stewardship Program, which oversees the country's nuclear weapons.
TOP FIVE SUPERCOMPUTERS
Blue Gene/L, Lawrence Livermore National Laboratory, California. (478.2 teraflops; 212,992 processors)
Blue Gene/P, Forschungszentrum Juelich, Germany. (167.3 teraflops; 65,536 processors)
SGI Altix ICE 8200, SGI/New Mexico Computing Applications Center, Wisconsin, US. (126.9 teraflops; 14,336 processors)
EKA - Cluster Platform 3000 BL460c, Computational Research Laboratories, Pune, India. (117.9 teraflops; 14,240 processors)
Cluster Platform 3000 BL460c, government agency, Sweden. (102.8 teraflops; 13,728 processors)
Source: Top 500 Supercomputers
It was recently upgraded and now runs at a speed of 478.2 teraflops (trillions of calculations per second), using 212,992 processors. By comparison, Roadrunner will use fewer than 20,000 chips. This is because the new computer is a so-called "hybrid" design, using both conventional supercomputer processors and the powerful "cell" chip designed for the PS3. The eight-core chip runs at speeds greater than 4 GHz and was designed by a consortium of companies including IBM, Sony and Toshiba. It has been modified for Roadrunner to allow it to handle a greater bandwidth of data and to carry out more specialist calculations.
Roadrunner packs more than 12,000 of the processors - known as "accelerators" - on top of nearly 7,000 standard processors. The standard processors are used to handle the general computation needed to keep the machine running, whilst the cells are left to crunch vast swathes of unstructured data. "For these kinds of simulations of very complex natural phenomena the cell chip is extremely powerful," said Dr Davari. "It is a lot more effective than combining many, many, many more smaller, general purpose computational engines."
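The division of labour described above follows a general host/accelerator offload pattern; a highly simplified sketch (with a generic process pool standing in for the Cell accelerators, not Roadrunner's actual programming model) looks something like this:

```python
# Simplified host/accelerator pattern: general-purpose cores coordinate and
# hand large, regular chunks of data to accelerator workers for crunching.
# A generic process pool stands in for the Cell accelerators here.
from multiprocessing import Pool

def accelerator_kernel(chunk):
    """Data-parallel work that would be offloaded to an accelerator."""
    return sum(x * x for x in chunk)

def host_program(data, n_accelerators=4, chunk_size=1_000):
    """The 'standard processor' side: split work, offload, combine results."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with Pool(n_accelerators) as accelerators:
        partials = accelerators.map(accelerator_kernel, chunks)  # offload
    return sum(partials)  # host combines results and handles everything else

if __name__ == "__main__":
    print(host_program(list(range(100_000))))
```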
The machine was the first to pass through the petaflop barrier, said Dr Davari. "The exciting part for me as a technical person is that we can now see the recipe for high performance computing for the next 10 to 15 years," he said. It will now be disassembled and moved to New Mexico where it will be housed in 288 refrigerator-sized cases connected by 57 miles (92km) of fibre optic cable. Although Roadrunner will run at extraordinary speeds, other computers could soon challenge its record.
IBM currently has another petaflop machine in the pipeline based on its Blue Gene/P technology. When finished, it will be the world's fastest commercial supercomputer. "Blue Gene/P continues the path of Blue Gene/L," said Dr Davari. The machines share much of the same software and hardware. Blue Gene/P will be installed at the Department of Energy's (DOE) Argonne National Laboratory in Illinois later this year.
Both Sun and Cray have also unveiled plans for petaflop machines in the near future.
Fixstars Accelerates the Mizuho Securities Derivatives System using Cell/B.E.
Press Release
30 May 2008
More than 10-fold improvement in performance compared to the existing exotic derivatives system
Tokyo, Japan; 30 May, 2008 --- Fixstars Corporation (“Fixstars”) announced today that it has succeeded in accelerating the exotic derivatives trading system of Mizuho Securities Co., Ltd. (“Mizuho Securities”) by utilizing the computational power of the Cell Broadband Engine™ (Cell/B.E.). Mizuho Securities expects a more than 10-fold improvement in processing performance by using the Cell/B.E. system compared to the existing exotic derivatives trading system, and the new system will be introduced into its daily operations.
In financial institutions, data volume has been continually rising in various fields led by derivatives and risk calculations, and algorithmic trading for automation, which in turn demands faster and more efficient data processing. Cell/B.E. has superb computational capabilities which are well suited to accelerating financial modeling calculations. System utilization is maximized with software technologies designed to take advantage of Cell/B.E. capabilities, porting of existing programs into parallel or distributed systems, and robust middleware that is absolutely necessary to handle the various usages and demands placed upon the system. Fixstars specializes in programming for the Cell/B.E. platform, and we fully utilized our in-house resources to drastically reduce the execution times of the Mizuho Securities exotic derivatives system calculations by maximizing the capabilities of the Cell/B.E. platform.
“We are extremely pleased that Mizuho Securities, an industry leading financial institution, has chosen us as the partner to lead the way in Cell/B.E. development.” said Satoshi Miki, CEO of Fixstars. “We are hopeful that the Mizuho Securities business experiences further acceleration by introducing the Cell/B.E. system into their daily operations.”
This system acceleration was made possible by the use of system infrastructure consisting of IBM Japan Cell/B.E. based IBM BladeCenter® QS series. Fixstars will continue to offer development support to Mizuho Securities to speed up execution times of their financial computation systems.
Scaled-down Cell CPU beats Intel’s quad-core chip in video transcoding
Hardware
By Theo Valich
Monday, June 09, 2008 00:30
Taipei (Taiwan) – During Computex 2008 we had a chance to visit Corel's suite at the Grand Hyatt hotel, which featured, at least as far as we know, the first third-party demonstration of Toshiba’s SpursEngine 1000 (SE1000), an accelerator board based on the Cell BE processor. Despite the fact that Toshiba has trimmed down the chip, the performance potential is impressive.
Corel demonstrated the SE1000 on a special, Cell-optimized version of its DVD MovieFactory application, transcoding 1080p H.264 video to smaller resolutions such as 480p. The SE1000 board was one of Toshiba’s sample boards, which were announced in April of this year.
The PCI Express x1 card houses a 65 nm Cell BE processor running at 1.5 GHz (compared to the 3.2 GHz in the Playstation 3) as well as four active SPE units (PS3: 8) and 128 MB XDR DRAM memory (PS3: 256 MB). Essentially, the SE1000 has about half the resources of the Cell engine in the PS3.
However, the demonstration results were quite spectacular. The video transcoding process takes about half as long on an SE1000 as on a 3 GHz Intel Core 2 Quad CPU. Keep in mind that this is a very specialized application, while the Core 2 Quad is a much more universal chip, but the raw performance potential is impressive nevertheless. Especially if you consider the fact that the accelerator consumes only 10 to 20 watts.
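A rough performance-per-watt comparison implied by those demo figures, assuming a typical TDP of around 95 W for a 3 GHz Core 2 Quad (a number not given in the article):

```python
# Rough performance-per-watt comparison implied by the demo figures.
# The 95 W figure for the Core 2 Quad is an assumption (a typical TDP for a
# 3 GHz part); the article only gives the 10-20 W range for the SE1000.
se1000_speedup = 2.0           # finishes the transcode in about half the time
se1000_power   = (10.0, 20.0)  # watts, from the article
quad_power     = 95.0          # watts, assumed TDP

for p in se1000_power:
    # Roughly 10x to 19x the performance per watt across the quoted power range.
    print(f"SE1000 at {p:.0f} W: ~{se1000_speedup * quad_power / p:.0f}x perf/watt")
```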
So, how much does this board cost? We don’t know. Following a first demonstration of the SE1000 last September, the product has been sampling, together with a middleware kit, for an undisclosed price since April. Don’t hold your breath that you will be able to buy this board anytime soon: Toshiba has so far said that it is targeted only at consumer electronics applications.
In February of this year Sony announced that scaling the Cell BE chip to 45 nm is underway, with the die-size of the chip expected to shrink by about 34%. The smaller chip is also expected to consume about 40% less power.
HD, Bandwidth Conservation, Linux and Codec Compression
Submitted by Investor Voices on Sat, 04/12/2008 - 7:17pm - TAGS: HD, Bandwidth Conservation, Linux and Codec Compression
An interview with Rod Tiede, CEO of Broadcast International, sheds some light on the CodecSys technology.
By Carl Moebis
The codec industry is something most people don't pay attention to, but it affects everything we watch, from DVDs to cable news and especially web-casts on the Internet. Most people give little thought to how the picture on their TV actually gets there. In today's HD world it's all based on codecs (COder/DECoder). Most of the time the broadcaster has to choose an appliance with a basic encoder chip like MPEG-2 because it's fast and cheap, and there is no shortage of cheap MPEG-2 decoders out there. One sits in every $20 DVD player on the market, and it's not even the most expensive component in the player; that's usually the laser diode. The problem with choosing MPEG-2 is not its cost but its age: the specification (MPEG-2 Parts 1 and 2) was written in 1992, back when processors were expensive and not very capable. That is what led to specialized chips called ASICs, whose sole purpose was to encode and decode MPEG-2 streams; they're not upgradable, because they are too specialized.
The other important factor to understand about MPEG-2 is that it follows the same kind of algorithms as JPEG, and as such it suffers greatly from unwanted artifacts when the compression is turned up high. So it's not very efficient, but it's been with us for 16 years, because once a standard is put in place it's hard to change. CodecSys changes the entire paradigm. Not only is it built on hardware that can handle software and codec changes going years and years into the future, but it was built to switch codecs on the fly, and the ones it primarily uses are variants of what's known as H.264 or AVC (Advanced Video Coding). A typical raw HD stream (uncompressed) takes up between 1 and 1.5Gbps. MPEG-2 can squeeze this down to about 20Mbps, but CodecSys can squeeze this same 1.5Gbps stream down to a paltry 3Mbps, a 500:1 compression ratio, soon to be 1.5Mbps, or 1000:1, by year end. And the quality of the CodecSys stream is outstanding.
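Those ratios follow directly from the quoted bitrates:

```python
# Compression ratios implied by the figures above.
raw_hd   = 1.5e9   # ~1.5 Gbps uncompressed HD (upper end of the quoted range)
mpeg2    = 20e6    # ~20 Mbps
codecsys = 3e6     # ~3 Mbps today
target   = 1.5e6   # ~1.5 Mbps planned by year end

print(raw_hd / mpeg2)     # ~75:1 for MPEG-2
print(raw_hd / codecsys)  # ~500:1
print(raw_hd / target)    # ~1000:1
```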
This is made possible by a collaboration between IBM and Broadcast International, and the cell-blade hardware that does this is like a super-fast programmable ASIC, with each blade having 2 Cell processors that together provide 16 SPEs (Synergistic Processing Elements) and 2 full PowerPC cores. When the cell processor was developed, engineers knew there wasn't much out there that you could throw at it, processing-wise, that it wouldn't be able to handle. In fact, there simply aren't many processing applications that can keep the entire cell processor busy at all times. It seems like it was built just for this task, one of the most processor-intensive tasks there is: codec compression of video, especially when using an advanced one like CodecSys.
People like to speculate. Investors like to speculate; it's what has brought the price of oil to an all-time high. We worry about finite resources, and even more importantly we know we worry about finite resources. This is elementary for any investor to understand. But the really exciting things, the things that get me wondering, are usually surrounding technology. We've all seen the great success stories in the technology sector. It takes innovation and it takes vision, and most importantly it means looking into a crystal ball and seeing the future.
Speaking of "finite" things, bandwidth is one of them, and it applies to more then just your Internet connection, even though more and more it's becoming one and the same. We've made the switch to HD, but it wasn't a 2x quality difference it's more like a 20x quality difference, and you can use that 20x multiple to understand the implications of bandwidth usage on our cable and satellite providers switching from SD (standard definition) to HD. The 20x quality difference is not only derived from having over twice the horizontal resolution, but the fact that HD is not interlaced/scanline, it's progressive scan, and it's in a 16:9 aspect ratio, as opposed to 4:3. What we need is a technology that can compress (what is already compressed using standardized codecs) to something much smaller while retaining the quality of the video. Sounds easy, but it's not. Once you figure out how to do that, now you have to figure out how to do it on the fly. Once again here is where CodecSys steps in and this collaboration between IBM and Broadcast International makes this all possible.
CodecSys is able to watch a full HD stream and make a decision every 3-4 frames on what would be the most efficient codec to use, and then compress this down to a bandwidth within the limits of most DSL and Cable Internet connections. The technology will mostly be used by broadcasters looking to fit more channels within their finite bandwidth, and this means more HD channels as they have already started switching. Not only are they starting to push out some broadcasts in HD, but they have to simulcast the old SD stream as well. This is very taxing on their systems and they have already reached their bandwidth limits. CodecSys can easily alleviate their grief, by giving full quality HD streams at SD bandwidths. But the day is soon approaching when everything will just be a single TCP/IP connection to the home for voice, data and video anyway. Here once again CodecSys is future proof, because it will be just this patented technology that the broadcasters and web-casters will need to turn to in order to push out their programming.
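The actual selection algorithms are proprietary (see the interview below), but the general shape of a per-window codec switcher is easy to sketch; the scene features, thresholds and codec variants here are purely illustrative.

```python
# Hypothetical sketch of per-window codec switching; the real CodecSys
# selection algorithms are a trade secret, so the features and codec
# choices below are illustrative only.
def analyze(frames):
    """Crude scene features for a short window of frames (e.g. 3-4 frames)."""
    brightness = sum(f["luma"] for f in frames) / len(frames)
    motion = max(f["motion"] for f in frames)
    return brightness, motion

def pick_codec(frames):
    """Choose an encoder variant tuned for the content of this window."""
    brightness, motion = analyze(frames)
    if motion > 0.7:
        return "h264_fast_motion"   # variant tuned for quick panning
    if brightness < 0.2:
        return "h264_dark_content"  # variant tuned for dark scenes
    return "h264_general"

def encode_stream(windows):
    """Encode the stream window by window, switching codecs on the fly."""
    return [(pick_codec(w), len(w)) for w in windows]

# Toy input: two 4-frame windows with different characteristics.
demo = [
    [{"luma": 0.1, "motion": 0.1}] * 4,  # dark, static scene
    [{"luma": 0.6, "motion": 0.9}] * 4,  # bright, fast-moving scene
]
print(encode_stream(demo))  # [('h264_dark_content', 4), ('h264_fast_motion', 4)]
```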
Another important consideration for any potential investor, hopefully now that you understand the implications of such a technology, is the fact that Broadcast International has also been hard at work protecting their process for the last 7 years. They have patented the general approach of multiple codec switching so no competitor can come in with a different system and try to accomplish the same thing. The patents are filed in every country in the world, and the first one which was filed in the United States in 2001 was approved in 2007. They have also been approved in many other countries with more to come.
For those that are a little more technically inclined, I've included the following transcript, which covers partly what I already knew about the technology and, more importantly, what talking to Rod Tiede, the CEO of Broadcast International, brought to the table. I had the honor of picking his brain a little, and coming from an engineering background, he was more than prepared to answer my hard-pressing questions:
Carl: Hi Rod, I’m Carl Moebis and I work as a technology consultant. I used to work at AT&T as an IT consultant in the late 90’s. One of the systems I built while I was there was an all-employee video broadcast system delivered over their vast pre-existing intranet. They used to use a k-band satellite which required special dishes for each of the 300 some odd buildings, and the rental of the upstream for the satellite was about $60,000 per hour. We built encoder boxes using the fastest dual-processors at that time, and even for a postage stamp sized video, it would peg out at 100% utilization. I know how processor intensive encoders can be. My question is how well does the cell-blade system work? Is each individual cell-blade capable of encoding only a single live HD stream at a time or more?
Rod: It does both, it does more. If you look at the way the cell-blade is actually constructed, it has the SPEs that are out there as well, so what it's able to do is offload so much of it to the SPEs and the PowerPC, and what it ends up doing is it's so very efficient inside its internal communications bus that it's able to compress multiple versions of the video and it's able to do this in real-time. To give you an idea of what we're doing right now, two cell processors sit on one cell-blade chassis, and that one cell-blade can do 2 HD 1080p 60fps encodings simultaneously.
Carl: Wow. Each cell processor has 8 SPEs in it?
Rod: 8 SPEs, correct.
Carl: And you've written your CodecSys encoder to be able to talk to those?
Rod: That's correct, and that is what we have done with the CodecSys multi-encoding, switching back and forth between optimized codecs for dark/bright, fast motion/slow motion, but there's even another proprietary technology taking place here: the way in which we have ported these codecs onto each of these cells has enabled us to do this in real-time and has enabled us to do things that nobody else has done with the cell in the past.
Carl: So that leads me to my next question, do you have a demo video that shows how the CodecSys system works? Where it would demonstrate fast motion, slow motion, rapid frame cycling and stills, and where each appropriate codec was picked by CodecSys and what the bandwidth savings were by picking that codec?
Rod: What we show, and we'll be showing this at NAB, you can see the MPEG-2 version of a video running at 19Mbits per second and then you can see the CodecSys video running at 3Mbits per second, and you can see the quality has been preserved, this is at 720P HD. As far as which codecs are being selected on a scene-by-scene basis, sometimes we're selecting codecs 2-3 times a second. We have 1080P running at 5Mbits per second, and that 1080P is 60 frames per second.
Carl: That's incredible. What is the buffering time?
Rod: We're delayed 3-4 frames is all. So what we end up doing is doing scene changes every 3-4 frames is all.
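(For reference, a 3-4 frame buffer translates to the following delays at the frame rates mentioned in the interview:)

```python
# What a 3-4 frame buffer means in wall-clock terms at the quoted frame rates.
for frames in (3, 4):
    for fps in (30, 60):
        print(f"{frames} frames at {fps} fps = {1000 * frames / fps:.0f} ms")
```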
...Carl: So what it's doing is picking a codec within that 3-4 frame buffer, to determine what the best codec is.
Rod: That's correct.
Carl: Can the CodecSys cell-blade servers be scaled to encode in non-proprietary codecs such as H.264 for distribution and streaming to a larger install base?
Rod: Let's take a quick step back and just talk about CodecSys in general. The first version of CodecSys that we're doing today is H.264, we are running multiple versions of H.264, the output of our encoder is H.264, we are compatible with H.264 devices in the field.
Carl: That was one of my other questions, is that the majority of the encoding is H.264?
Rod: Right, so think of my encoder as an encoder, but rather than having a single H.264 codec in it, it has 12 H.264s in it, and each one of those H.264s is optimized for specific events like flashes of bright light, dark content, quick panning; all of those different types of variables that, when you watch a bandwidth meter as you're doing the encoding, you see that bandwidth meter keeps spiking. What we have done is we've developed codecs that are able to eliminate those spikes.
Carl: And this is the crux of the CodecSys system that picks the codec on the fly?
Rod: Right. And that's what we're selecting on the fly. Therefore the output coming out of the encoder is H.264 compatible.
Carl: Ok, and this is only applied to the video stream, the audio codec you probably just choose one and go with that?
Rod: That's correct. Audio we're probably only using like 24Kbps and if we cut that in half it's only saving 12kbps.
Carl: Right, it's not even worth it. Alright, another part of the genius of this system is that it’s upgradable, to incorporate more efficient codecs in the future. I understand how this would be easy to roll-out to the encoders, and I understand the choice of the cell-blade processor to make that an easy transition, but how is the client side handled when a new codec is added?
Rod: It can be done online, and as far as what IBM has set up today, IBM is the one of course that is out there today selling the encoder, and it's Broadcast International software that is running on that encoder that does the compression. So essentially we have sold IBM a license to our compression technology. With IBM, what they do in their support structure, they have it set up right now, that they are 1st and 2nd level support and we are 3rd level support, all new upgrades, patches and fixes are included with their annual fees that they charge to their customers that buy the devices. You're 100% right in that here's the H-blades chassis that are hot-swappable for power supplies and fans and all the other kind of components that are running. More important is that our technology is software, so if you were an IPTV broadcaster today, knowing you had to select a codec today, you know in 6 months someone is going to come out with something better. Here's a technology where you don't have to worry about selecting a codec, first of all you're going to get many codecs in there, because if you've seen our press releases we've licensed onto Flash, JPEG2000 and many other different kinds of codecs built into it, so you as a broadcaster don't have to worry about making that decision. What you know is that when you buy this IBM product, you know that over the next couple of years, for instance we know in our product road-map we'll have HD at 720P 30fps running at 1.5Mbps by the end of this year. So you know that you will have that upgrade, so by buying this one piece of hardware over the next 4-5 years you're going to be continually getting evolving new technologies of compression built into your IBM, so you're always getting better. The preliminary marketing that IBM and Broadcast International has done, it's been very well received by the broadcasters because they are so sick and tired today of how every 6 months it's time to throw out that appliance and buy another appliance to fit in there.
Carl: What about the “holy-grail” of codec technology which is fractal compression? In theory it would be the best solution, but it’s incredibly processor intensive and almost everyone has failed at actually implementing an elegant solution. This has been going on since the early 90’s and some of the successes have found their way into modern codecs like H.264, such as predictive motion. This is most often seen when a stream starts to fail and the texture of the video starts to map over or move over what seems to be a grey outline of the object's movement. Have you looked at fractal compression?
Rod: We have 3 fractals that we're experimenting with right now. And we know that we will be able to build fractal codecs into the CodecSys system over the next 12 months. Imagine having a video compression system that might be made up of H.264, maybe even MPEG2 and fractals in there as well, and again it's the processing power of the IBM that gives us the liberty to be able to add those types of codecs in, and it's never been done before because we have this great processing power.
Carl: Great. And I have one last question for you Rod, I don't want to take up too much of your time. As more and more households and businesses start to understand the ubiquity of just having an Internet connection for voice, data and video, systems like CodecSys will become indispensable for video distribution. Do you think investors understand where this is going, or is it a challenge to help them see the inevitable future of having just satellite or fiber to the home, and why it’s so important to conserve bandwidth especially with HD streams?
Rod: I think you hit the nail on the head there Carl, I bet you 1 out of 150-200 guys in the investor community that I talk to "gets this", you are so right there. Many of them think that you know, you just put that bigger Internet connection in there and everything will be fine. It doesn't work that way, you got to have good compression. If you have a 1.5Mbps DSL connection you can't use 1.5Mbps because you'll never get good quality of service, you'd be lucky to get 700-800Kbps coming through. This is where our technology really enables IPTV broadcasters, web, satellite and cable broadcasters. Take a look at the ROI, if one of these direct to home satellite broadcasters were to buy $40-$50 million dollars worth of IBM encoders and populate their head-end with that equipment that would give them additional channel capacity as opposed to launching another $500 million dollar satellite in the sky. So a cable and satellite operator certainly benefits from the technology, and then also your IPTV broadcasters as well as web-casters, everybody is going to need good compression technology. In the next few years there's going to be this standard where everybody is going to have an HD in the home and they're going to have HD on their laptop, and they're going to have HD everywhere, HD is going to become a standard because it's technologies like this that are going to enable HD to be transmitted at economical bit-rates, making it a truly ubiquitous technology.
Carl: You may have heard the news that came out over the last week that Comcast is finally working with BitTorrent as a distribution system. Can CodecSys work with distribution systems like BitTorrent?
Rod: Absolutely. Again, all we are doing is replacing the compression algorithms that have gone into creating that compressed video already, so that's the only part that we're doing. As far as how the distribution takes place, whether it's P2P or streaming or whatever it might be, again the encapsulation process is not what we're specialized in, what we're specialized in is the codecs.
Carl: Are these Blade systems running some flavor of UNIX like Solaris?
Rod: I'm pretty sure it's Fedora or Redhat.
Carl: Yeah, you couldn't do this in Windows, I'm glad to hear it's Linux which is a "UNIX-like" OS. Of course IBM has a history with Linux and it doesn't surprise me if it is indeed Redhat.
Rod: It's all Linux.
Carl: Good. What is the patent situation with the multiple codec implementation? Can anybody else come along and do this?
Rod: No. We went out and patented the process of doing multiple codecs. I could put 80 engineers in 80 different rooms and say create a system that switches between multiple optimized codecs to provide the highest quality video at the lowest bitrate and they would come up with 80 different ways to do it. So what we have done is we've patented the process of doing this, not how we do it. The actual algorithms, we have specific algorithms that the video is being viewed by, and that's part of how we're doing our codec selection. Those algorithms are being kept in trade secret and the patent actually covers the fact that we do this optimized codec switching process. The patent was filed in September of 2001, we received it here in the United States in August of 2007. Over the last couple of years we've actually received the patent in places like Malaysia and Russia, Singapore and many, many other countries, and it's filed in every country in the world and it took a full 6 years to get it here in the U.S.
Carl: That's great news. Have you thought about talking to Apple or Microsoft about incorporating the codec into QuickTime or Windows Media Player?
Rod: Sure those are all great opportunities for us. Right now we're focused on making sure we have a product with IBM on the IBM platform and then we will branch out into other opportunities as well. All of their codecs can be easily ported into this so that it would be Apple and Microsoft compatible as well.
End of interview. I hope everyone has a better understanding of how CodecSys works, and what this means in the broadcast industry. More and more, the broadcast industry and the Internet community are merging and melding together. CodecSys is there waiting and ready, and in all likelihood it will find its way into your home, and you probably won't even be aware of it.
I'm not reading all of this, just wanted to say the thing in that video didn't look like a real car, so this is still shit.
Sony Unveils New Hybrid Multi-Core Cell Platform
Tuesday August 12, 2008
Sony Electronics is unveiling a new workflow solution for faster processing of high-resolution effects and computer graphics. This new technology platform, named ZEGO, is based on the Cell/B.E. (Cell Broadband Engine) and RSX(r) technologies, and is designed to eliminate bottlenecks that can occur during post production, especially during the creation and rendering of visual effects.
The BCU-100 computing unit is the first product to utilize this new ZEGO hybrid multi-core cell platform.
The ZEGO platform, and the BCU-100 unit, will be shown at SIGGRAPH in conjunction with Sony's 4K SXRD projector, to demonstrate Sony's vision for providing enhanced end-to-end workflows -- from acquisition through display.
"Sony's ZEGO platform represents the melding of both Cell and GPU technologies to provide efficient and accelerated workflows for HD content and beyond," said Satoshi Kanemura, VP of B2B of America. "This technology not only has the power to achieve new efficiencies in visual effects pipelines, it also represents our commitment to designing solutions with customers' needs in mind."
Production professionals are working with increasingly higher-resolution video content, from HD to 4K and beyond. As a result, the time required for post production work has also increased significantly, driving the need for faster workflows.
The Cell-type architecture, known as on-chip parallelism, packages a collection of processors and co-processors in a combination designed to optimize specific types of applications. It incorporates a low-power consumption design with the added benefit of reduced operational costs when many units are run in a clustered configuration.
Sony has been working with software companies that specialize in video production, including mental images of Berlin and Side Effects Software of Toronto, Ontario to create applications which take advantage of the ZEGO platform.
"Computer graphics remain one of the world's most computationally demanding activities and there seems to be no limit to our industry's desire for innovative hardware and software solutions," said Dr. Paul Salvini, chief technology officer and VP of Canadian operations at Side Effects Software. "The BCU-100's hybrid Cell/B.E. and RSX architecture has provided us with a tremendous opportunity to develop new algorithms and techniques for parallel, multi-core, and stream computing."
"Our collaboration with Sony has yielded important insights into the development of next-generation options for high-end rendering technology focused on high-bandwidth data transfer," said Ludwig von Reiche, chief operating officer of mental images. "The porting of mental ray software and MetaSL to the Cell platform using Sony toolkits has demonstrated the promising potential for accelerating rendering by utilizing on-chip parallelism."
The BCU-100 computing unit is expected to ship later this year, with pricing to be determined.
The new brand name “ZEGO” from Sony originates from the concept “Zest to Go”, emphasizing Sony’s dynamic advancement towards the future of video production with this new, hybrid multi-core processing platform. BCU-100 is a 1U (unit) size and can be mounted in a 19-inch rack. Sony BCU-100 is scheduled for launch in the U.S. market this year. Sony is additionally exhibiting a prototype 56-inch QFHD (Quad Full High Definition, 3840 x 2160 resolution) LCD video monitor.
Sony to Showcase Innovative "Beyond HD" Content Creation Workflows at "SIGGRAPH 2008"
"ZEGO" Computing Unit incorporating Cell/B.E. microprocessor and RSX® graphics processor to be launched for U.S. market within this year
Sony is exhibiting new "Beyond HD" content creation workflow solutions at "SIGGRAPH 2008" (held from August 12-14 at Los Angeles Convention Center, Los Angeles, California, USA). Based on a new hybrid multi-core processing platform, these solutions are designed to provide production professionals with increased efficiency and accelerated workflows across every stage of production, from shooting and acquisition to editing and display.
New Product Launch for U.S. Market
To coincide with SIGGRAPH 2008, Sony Electronics Inc. has also announced a new computing unit. The BCU-100 is the first product to be launched as part of Sony's new multi-core processing platform, "ZEGO". BCU-100 incorporates the high-performance Cell Broadband Engine™ microprocessor (Cell/B.E.) and RSX® graphics processor to realize high speed computational performance. BCU-100 is scheduled for launch in the U.S. market within this year.
The new brand name "ZEGO" originates from the concept "Zest to Go", emphasizing Sony's dynamic advancement towards the future of video production with this new, hybrid multi-core processing platform.
Features of "ZEGO" BCU-100
1. High-Speed Processing
- RSX® delivers high-performance graphics processing while Cell/B.E. realizes arithmetic operation speeds of up to 230 GFLOPS.
- Equipped with onboard high speed memory, XDR™.
2. Miniature Size
BCU-100 is a 1U (unit) size and can be mounted in a 19-inch rack.
3. Reduced Energy Consumption
BCU-100 delivers high computational performance while reducing power consumption to 330W or less.
Exhibits
· 56-inch QFHD LCD Video Monitor (prototype)
Sony is exhibiting a prototype 56 inch QFHD (Quad Full High Definition, 3840 x 2160 resolution) LCD video monitor.
· Demonstrations
Sony is demonstrating the following applications at SIGGRAPH 2008, including "4K" image rendering using the new BCU-100, that showcase seamless integration between cameras, VTRs and CG technology to realize effective workflows in "Beyond HD" content creation.
1. Demonstration of CG production
This application demonstrates Sony's vision of highly efficient CG production using the BCU-100's advanced processing capacity to pre-visualize metadata within the camera unit.
2. Demonstration of high speed DPX & Cineon file import from "HDCAM-SR" VTR using Side Effects Software Inc.'s "Houdini® Batch" application
Demonstration of DPX and Cineon file import from the VTR to CG workstation. Direct import of HD and 4K files as uncompressed data at up to 4K full aperture (4096 x 3112) enables users to easily visualize lighting patterns and match color data.
3. Introduction of application software for BCU-100, Side Effects Software Inc.'s "Houdini® Batch"
Demonstration of Side Effects Software Inc.'s "Houdini® Batch" for physics simulation and effects creation using the advantages of BCU-100's multi-thread processing capabilities.
4. Introduction of mental images' rendering software "mental ray®"
An introduction of mental images' rendering software "mental ray®" and the "MetaSL™" shading language.
5. Technology demonstration of 4K applications
Real time, lossless encoding and decoding of 4K images, harnessing the advanced computational performance of Cell technology.
I'm as lost as those guys of Lost.
What we saw demo’d was the BCU-100 ZEGO, expected to ship by the end of the year with a tab under $10K. The 1RU device (here stacked together in a rack of some nine units) consists of a Cell processor and an RSX ‘Reality Synthesizer’ GPU, a graphics chip co-developed by Nvidia and Sony for the PlayStation 3.
The ZEGO is coming out of the newly formed Sony B2B of America division of Sony Electronics. Announced earlier this year, the LA-based unit was conceived to offer integrated support to the digital cinema industry; the gear shepherded by the unit includes capture with Sony CineAlta systems, storage on SRW series tape decks, codecs, and projection on the CineAlta 4K SXRD systems, according to “Toshi” Ohnisi, senior VP at Sony B2B.
Sony’s large-scale roll-out of its 4K projectors, which has seemed an on-again/off-again affair struggling to take off (Sony’s not alone of course, as the rest of the industry has struggled with a range of technical and economic issues over the past few years), is now “very close to finishing” as the company completes contracts with studios and cinemas, says Lance Kelson, business development manager, Beyond HD Workflow Development, Sony B2B.
Kelson led the assembled reporters through a demo of the rack of BCU-100 units, lashed together via Gigabit Ethernet, working with the first two apps ported to the ZEGO: a rendering program from mental images and Side Effects Software’s Houdini server tool, offering modeling, lighting, advanced physical simulations, particle effects, compositing, and rendering.
The operating system is Yellow Dog Linux, provided by Colorado-based Terra Soft Solutions.
The BCU-100 scales from small 1RU units designed to meet the needs of a single CG artist to many dozens combined into rendering farms. Sony’s on the right side of the argument, as the rig is small, and its performance/watt efficiency will save on increasingly pricey electricity charges. Unlike most gear that might be used in render farms, the OpenGL-based RSX chip can also run a hi-res monitor or projector out of the box without any added graphics card, enabling less expensive set-ups.
Dr Paul Salvini, CTO and VP of Canadian Operations, Side Effects Software, answered questions after his demo artist sped through 4K compositing and effects sequences using footage from Sony’s F35 CineAlta.
Wasn't PS3 supposed to have Gigabit Ethernet?
Is the IBM PowerXCell what they'll use in the PS4?
I've just seen your piece of text between the images. Does a console really need that much power though? And could MS use Cell, or do Sony have exclusivity rights?