Syber Group

ARM Launches Juno

July 18, 2014
Filed under Computing

ARM has announced two programs to assist Android’s ascent into the 64-bit architecture market.

The first is a port of the Android Open Source Project (AOSP) to the 64-bit ARMv8-A architecture, produced by Linaro. ARM said the port was done on a development board codenamed “Juno”, which is the second initiative to help Android reach the 64-bit market.

The Juno hardware development platform includes a system on chip (SoC) powered by a quad-core ARM Cortex-A53 CPU and a dual-core ARM Cortex-A57 CPU in an ARM big.LITTLE processing configuration.

Juno is said to be an “open, vendor neutral ARMv8 development platform” that will also feature an ARM Mali-T624 graphics processor.

Alongside the news of the 64-bit initiatives, ARM also announced that Actions Semiconductor of China signed a license agreement for the 64-bit ARM Cortex-A50 processor family.

“Actions provides SoC solutions for portable consumer electronics,” ARM said. “With this IP license, Actions will develop 64-bit SoC solutions targeting the tablet and over-the-top (OTT) set top box markets.”

The announcements from ARM come at an appropriate time, as it was only last week that Google announced the latest version of its Android mobile operating system, Android L, which comes with support for 64-bit processors. Android developers are likely to use ARM’s port and the Juno board in the push to take Android to the 64-bit market.

Despite speculation that it would launch as Android 5.0 Lollipop, Google outed its next software iteration on Wednesday last week as simply Android L, touting the oddly-named iteration as “the largest update to the operating system yet”.

nVidia Releases CUDA

July 10, 2014
Filed under Computing

Nvidia has released CUDA, its platform for running general-purpose code on GPUs, to server vendors in order to get 64-bit ARM cores into the high performance computing (HPC) market.

The firm said today that ARM64 server processors, which are designed for microservers and web servers because of their energy efficiency, can now process HPC workloads when paired with GPU accelerators using the Nvidia CUDA 6.5 parallel programming framework, which supports 64-bit ARM processors.

“Nvidia’s GPUs provide ARM64 server vendors with the muscle to tackle HPC workloads, enabling them to build high-performance systems that maximise the ARM architecture’s power efficiency and system configurability,” the firm said.

The first GPU-accelerated ARM64 software development servers will be available in July from Cirrascale and E4 Computer Engineering, with production systems expected to ship later this year. The Eurotech Group also plans to ship production systems later this year.

Cirrascale’s system will be the RM1905D, a high density two-in-one 1U server with two Tesla K20 GPU accelerators, which the firm claims provides high performance and low total cost of ownership for private cloud, public cloud, HPC and enterprise applications.

E4’s EK003 is a production-ready, low-power 3U dual-motherboard server appliance with two Tesla K20 GPU accelerators designed for seismic, signal and image processing, video analytics, track analysis, web applications and MapReduce processing.

Eurotech’s system is an “ultra-high density”, energy efficient and modular Aurora HPC server configuration, based on proprietary Brick Technology and featuring direct hot liquid cooling.

Featuring Applied Micro X-Gene ARM64 CPUs and Nvidia Tesla K20 GPU accelerators, the new ARM64 servers will provide customers with an expanded range of efficient, high-performance computing options to drive compute-intensive HPC and enterprise data centre workloads, Nvidia said.

Nvidia added, “Users will immediately be able to take advantage of hundreds of existing CUDA-accelerated scientific and engineering HPC applications by simply recompiling them to ARM64 systems.”

ARM said that it is working with Nvidia to “explore how we can unite GPU acceleration with novel technologies” and drive “new levels of scientific discovery and innovation”.

ARM To Focus On 64-bit SoC

May 15, 2014
Filed under Computing

ARM announced its first 64-bit cores a while ago and SoC makers have already rolled out several 64-bit designs. However, apart from Apple nobody has consumer oriented 64-bit ARM devices on the market just yet. They are slowly starting to show up and ARM says the transition to 64-bit parts is accelerating. However, the first wave of 64-bit ARM parts is not going after the high-end market.

Is 64-bit support on entry-level SoCs just a gimmick?

This trend raises a rather obvious question – are low end ARMv8 parts just a marketing gimmick, or do they really offer a significant performance gain? There is no straight answer at this point. It will depend on Google and chipmakers themselves, as well as phonemakers.

Qualcomm announced its first 64-bit part late last year. The Snapdragon 410 won’t turn many heads. It is going after $150 phones and it is based on Cortex A53 cores. It also has LTE, which makes it rather interesting.

MediaTek is taking a similar approach. Its quad-core MT6732 and octa-core MT6752 parts are Cortex A53 designs, too. Both sport LTE connectivity.

Qualcomm and MediaTek appear to be going after the same market – $100 to $150 phones with LTE and quad-core 64-bit stickers on the box. Marketers should like the idea, as they’re getting a few good buzzwords for entry-level gear.

However, we still don’t know much about their real-world performance. Don’t expect anything spectacular. The Cortex A53 is basically the 64-bit successor to the frugal Cortex A7. The A53 has a bit more cache, 40-bit physical addresses and it ends up a bit faster than the A7, but not by much. ARM says the A7 delivers 1.9DMIPS/MHz per core, while the A53 churns out 2.3DMIPS/MHz. That puts it in the ballpark of the good old Cortex A9. The first consumer oriented quad-core Cortex A9 part was Nvidia’s Tegra 3, so in theory a Cortex A53 quad-core could be as fast as a Tegra 3 clock-for-clock, but at 28nm we should see somewhat higher clocks, along with better graphics.
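To put ARM’s per-core figures in perspective, the short Python sketch below multiplies them out for a hypothetical quad-core part. The 1.5GHz clock and the idea that Dhrystone throughput scales perfectly across four cores are illustrative assumptions, not vendor figures, and the function name is made up for this example.

```python
# DMIPS/MHz per core, as quoted from ARM in the article.
DMIPS_PER_MHZ = {"Cortex-A7": 1.9, "Cortex-A53": 2.3}

def total_dmips(core, clock_mhz, cores=1):
    """Theoretical Dhrystone throughput, assuming perfect scaling across cores."""
    return DMIPS_PER_MHZ[core] * clock_mhz * cores

CLOCK_MHZ = 1500  # assumed clock, chosen only for illustration

a7_quad = total_dmips("Cortex-A7", CLOCK_MHZ, cores=4)
a53_quad = total_dmips("Cortex-A53", CLOCK_MHZ, cores=4)
gain = a53_quad / a7_quad - 1  # relative advantage of the A53 at equal clocks

print(f"Quad A7:  {a7_quad:.0f} DMIPS")
print(f"Quad A53: {a53_quad:.0f} DMIPS")
print(f"A53 advantage: {gain:.1%}")
```

On those assumptions the A53 comes out roughly 21 percent ahead of an A7 at the same clock, which squares with the “a bit faster, but not by much” verdict. Dhrystone is a synthetic benchmark, so real-world gaps may differ.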

That’s not bad for $100 to $150 devices. LTE support is just the icing on the cake. Keep in mind that the Cortex A7 is ARM’s most efficient 32-bit core, hence we expect nothing less from the Cortex A53.

The Cortex A57 conundrum

Speaking to CNET’s Brooke Crothers, ARM executive vice president of corporate strategy Tom Lantzsch said the company was surprised by strong demand for 64-bit designs.

“Certainly, we’ve had big uptick in demand for mobile 64-bit products. We’ve seen this with our [Cortex] A53, a high-performance 64-bit mobile processor,” Lantzsch told CNET.

He said ARM has been surprised by the pace of 64-bit adoption, with mobile parts coming from Qualcomm, MediaTek and Marvell. He said he hopes to see 64-bit phones by Christmas, although we suspect the first entry-level products will appear much sooner.

Lantzsch points out that even 32-bit code will run more efficiently on 64-bit ARMv8 parts. As software support improves, the performance gains will become more evident.

But where does this leave the Cortex A57? It is supposed to replace the Cortex A15, which had a few teething problems. Like the A15 it is a relatively big core. The A15 was simply too big and impractical on the 32nm node. On 28nm it’s better, but not perfect. It is still a huge core and its market success has been limited.

As a result, it’s highly unlikely that we will see any 28nm Cortex A57 parts. Qualcomm’s upcoming Snapdragon 810 is the first consumer oriented A57 SoC. It is a 20nm design and it is coming later this year, just in time for Christmas as ARM puts it. However, although the Snapdragon 810 will be ready by the end of the year, the first phones based on the new chip are expected to ship in early 2015.

While we will be able to buy 64-bit Android (and possibly Windows Phone) devices before Christmas, most if not all of them will be based on the A53. That’s not necessarily a bad thing. Consumers won’t have to spend $500 to get a 64-bit ARM device, so the user base could start growing long before high-end parts start shipping, thus forcing developers and Google to speed up 64-bit development.

If rumours are to be believed, Google is doing just that and it is not shying away from small 64-bit cores. The search giant is reportedly developing a $100 Nexus phone for emerging markets. It is said to be based on MediaTek’s MT6732 clocked at 1.5GHz. Sounds interesting, provided the rumour turns out to be true.

nVidia Outs CUDA 6

March 19, 2014
Filed under Computing

Nvidia has made the release candidate of CUDA 6, the latest version of its GPU programming language, available for developers to download for free.

The release arrives with several new features and improvements to make parallel programming “better, faster and easier” for developers creating next generation scientific, engineering, enterprise and other applications.

Nvidia has aggressively promoted its CUDA programming language as a way for developers to exploit the floating point performance of its GPUs. Available now, the CUDA 6 Release Candidate adds unified memory, a major new feature that lets CUDA applications access CPU and GPU memory without the need to manually copy data from one to the other.

“This is a major time saver that simplifies the programming process, and makes it easier for programmers to add GPU acceleration in a wider range of applications,” Nvidia said in a blog post on Thursday.

There’s also the addition of “drop-in libraries”, which Nvidia said will accelerate applications by up to eight times.

“The new drop-in libraries can automatically accelerate your BLAS and FFTW calculations by simply replacing the existing CPU-only BLAS or FFTW library with the new, GPU-accelerated equivalent,” the chip designer added.

Multi-GPU scaling has also been added in CUDA 6, introducing re-designed BLAS and FFT GPU libraries that automatically scale performance across up to eight GPUs in a single node. Nvidia said this provides over nine teraflops of double-precision performance per node and supports workloads of up to 512GB in size, larger than it has supported before.
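Nvidia does not detail here how the libraries divide the work. As a rough illustration only, the Python sketch below shows one common way to spread a matrix multiply over several devices: split the rows of the first matrix into near-equal blocks and hand each block to a different GPU. The function names are hypothetical, the “devices” are simulated with a plain loop, and none of this is presented as Nvidia’s implementation.

```python
def split_rows(n_rows, n_devices):
    """Return one (start, stop) row range per device; sizes differ by at most one."""
    base, extra = divmod(n_rows, n_devices)
    ranges = []
    start = 0
    for device in range(n_devices):
        stop = start + base + (1 if device < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges

def matmul(a, b):
    """Plain matrix multiply on lists of lists (stands in for one GPU's work)."""
    columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in columns] for row in a]

def multi_device_matmul(a, b, n_devices):
    """Multiply a by b, processing each row block as if on a separate device."""
    result = []
    for start, stop in split_rows(len(a), n_devices):
        # In a real system each block would be dispatched to its own GPU.
        result.extend(matmul(a[start:stop], b))
    return result

a = [[1, 2], [3, 4], [5, 6]]
identity = [[1, 0], [0, 1]]
print(split_rows(10, 8))
print(multi_device_matmul(a, identity, 2))
```

Because each row block of the result depends only on its own rows of the first matrix, the blocks can be computed independently, which is what makes this kind of workload a natural fit for several GPUs in one node.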

“In addition to the new features, the CUDA 6 platform offers a full suite of programming tools, GPU-accelerated math libraries, documentation and programming guides,” Nvidia said.

The previous CUDA 5.5 Release Candidate was issued last June, and added support for ARM based processors.

Aside from ARM support, Nvidia also improved Hyper-Q support in CUDA 5.5, which allowed developers to use MPI workload prioritisation. The firm also touted improved performance analysis and improved performance for cross-compilation on x86 processors.

nVidia Launching New Cards

September 10, 2013
Filed under Computing

We weren’t expecting this and it is just a rumour, but reports are emerging that Nvidia is readying two new cards for the winter season. AMD of course is launching new cards four weeks from now, so it is possible that Nvidia would try to counter it.

The big question is: with what?

VideoCardz claims one of the cards is an Ultra, possibly the GTX Titan Ultra, while the second one is a dual-GPU job, the GeForce GTX 790. The Ultra is supposedly GK110 based, but it has 2880 unlocked CUDA cores, about 7 percent more than the 2688 on the Titan.

The GTX 790 is said to feature two GK110 GPUs, but Nvidia will probably have to clip their wings to get a reasonable TDP.

We’re not entirely sure this is legit. It is plausible, but that doesn’t make it true. It would be good for Nvidia’s image, especially if the revamped GK110 products manage to steal the performance crown from AMD’s new Radeons. However, with such specs, they would end up quite pricey and Nvidia wouldn’t sell that many of them – most enthusiasts would probably be better off waiting for Maxwell.

nVidia’s CUDA 5.5 Available

June 25, 2013
Filed under Computing

Nvidia has made its CUDA 5.5 release candidate supporting ARM based processors available for download.

Nvidia has been aggressively pushing its CUDA programming language as a way for developers to exploit the floating point performance of its GPUs. Now the firm has announced the availability of a CUDA 5.5 release candidate, the first version of the language that supports ARM based processors.

Aside from ARM support, Nvidia has improved Hyper-Q support and now allows developers to use MPI workload prioritisation. The firm also touted improved performance analysis and improved performance for cross-compilation on x86 processors.

Ian Buck, GM of GPU Computing Software at Nvidia said, “Since developers started using CUDA in 2006, successive generations of better, exponentially faster CUDA GPUs have dramatically boosted the performance of applications on x86-based systems. With support for ARM, the new CUDA release gives developers tremendous flexibility to quickly and easily add GPU acceleration to applications on the broadest range of next-generation HPC platforms.”

Nvidia’s support for ARM processors in CUDA 5.5 is an indication that it will release CUDA enabled Tegra processors in the near future. However, outside of the firm’s own Tegra processors, CUDA support is largely useless, as almost all other chip designers have chosen OpenCL as the programming language for their GPUs.

Nvidia did not say when it will release CUDA 5.5, but in the meantime the firm’s release candidate supports Windows, Mac OS X and just about every major Linux distribution.

Are CUDA Applications Limited?

March 29, 2013
Filed under Computing

Acceleware said at Nvidia’s GPU Technology Conference (GTC) today that most algorithms that run on GPGPUs are bound by GPU memory size.

Acceleware is partly funded by Nvidia to provide developer training for CUDA to help sell the language to those that are used to traditional C and C++ programming. The firm said that most CUDA algorithms are now limited by GPU local memory size rather than GPU computational performance.

Both AMD and Nvidia provide general purpose GPU (GPGPU) accelerator parts that provide significantly faster computational processing than traditional CPUs. However, they have only between 6GB and 8GB of local memory, which constrains the size of the dataset the GPU can process. While developers can push more data from system main memory, the latency cost negates the raw performance benefit of the GPU.
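A rough Python sketch of that trade-off follows. The 6GB card size comes from the figures above; the 20GB dataset, the 8GB/s host-to-GPU link and the 200GB/s on-card memory bandwidth are placeholder assumptions chosen only to show the arithmetic.

```python
import math

GPU_MEMORY_GB = 6.0      # lower end of the range quoted in the article
DATASET_GB = 20.0        # hypothetical dataset size
HOST_LINK_GB_S = 8.0     # assumed host-to-GPU transfer rate
LOCAL_MEM_GB_S = 200.0   # assumed on-card memory bandwidth

# The dataset does not fit on the card, so it has to be streamed in pieces.
chunks = math.ceil(DATASET_GB / GPU_MEMORY_GB)

# Time to move the whole dataset once over each path.
transfer_seconds = DATASET_GB / HOST_LINK_GB_S
local_read_seconds = DATASET_GB / LOCAL_MEM_GB_S
slowdown = transfer_seconds / local_read_seconds

print(f"Chunks needed: {chunks}")
print(f"Host-to-GPU transfer: {transfer_seconds:.2f} s")
print(f"Read from local memory: {local_read_seconds:.2f} s")
print(f"Transfer is about {slowdown:.0f}x slower")
```

With these placeholder numbers the data has to be fed to the card in four pieces, and shipping it from main memory takes about 25 times longer than reading it from local memory, before any computation is done.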

Kelly Goss, training program manager at Acceleware, said that “most algorithms are memory bound rather than GPU bound” and “maximising memory usage is key” to optimising GPGPU performance.

She further said that developers need to understand and take advantage of the memory hierarchy of Nvidia’s Kepler GPU and look at ways of reducing the number of memory accesses for every line of GPU computing.

The point Goss was making is that GPU computing is cheap in terms of clock cycles compared with the time it takes to fetch data from local memory, let alone loading GPU memory from system main memory.
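One standard way to reason about this is the roofline model, which is brought in here for illustration and is not something Goss is quoted as using. It caps a kernel’s throughput at the lower of the chip’s peak arithmetic rate and its memory bandwidth multiplied by the number of floating point operations performed per byte moved. The peak and bandwidth figures in the Python sketch below are round placeholder numbers, not the specification of any particular Kepler card.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_s, flops_per_byte):
    """Roofline bound: the lower of the compute ceiling and the memory ceiling."""
    return min(peak_gflops, bandwidth_gb_s * flops_per_byte)

PEAK_GFLOPS = 1000.0     # placeholder peak arithmetic rate
BANDWIDTH_GB_S = 200.0   # placeholder local memory bandwidth

# c[i] = a[i] + b[i] in double precision: one operation per element,
# with two 8-byte loads and one 8-byte store, so 24 bytes moved.
vector_add_intensity = 1 / 24
vector_add_gflops = attainable_gflops(PEAK_GFLOPS, BANDWIDTH_GB_S, vector_add_intensity)

# Intensity at which a kernel stops being limited by memory.
ridge_point = PEAK_GFLOPS / BANDWIDTH_GB_S

print(f"Vector add bound: {vector_add_gflops:.2f} GFLOPS")
print(f"Ridge point: {ridge_point:.1f} flops per byte")
```

With these placeholder numbers a simple vector addition tops out at roughly 8 GFLOPS, under one percent of the assumed peak, and a kernel needs at least 5 floating point operations per byte fetched before the arithmetic units become the limit. That is why cutting memory accesses matters more than trimming arithmetic.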

Goss, talking to a room full of developers, proceeded to outline some of the performance characteristics of the memory hierarchy in Nvidia’s Kepler GPU architecture, showing the level of detail that CUDA programmers need to pay attention to if they want to extract the full performance potential from Nvidia’s GPGPU computing architecture.

Given Goss’s observation that algorithms running on Nvidia’s GPGPUs are often constrained by local memory size rather than by the GPU itself, the firm might want to look at simplifying the tiers of memory involved and increasing the amount of GPU local memory so that CUDA software developers can process larger datasets.

nVidia Speaks On Performance Issue

December 5, 2012
Filed under Computing

Nvidia has said that most of the outlandish performance increase figures touted by GPGPU vendors were down to poor original code rather than the sheer brute force computing power provided by GPUs.

Both AMD and Nvidia have been using real-world code examples and projects to promote the performance of their respective GPGPU accelerators for years, but now it seems some of the eye-popping figures, including speed-ups of 100x or 200x, were not down to just the computing power of GPGPUs. Sumit Gupta, GM of Nvidia’s Tesla business, said that such figures were generally down to starting with unoptimized CPU code.

During Intel’s Xeon Phi pre-launch press conference call, the firm cast doubt on some of the orders-of-magnitude speed-up claims that had been bandied about for years. Gupta has now told The INQUIRER that while those large speed-ups did happen, they were possible because the original code was poorly optimized, so the bar was set very low.

Gupta said, “Most of the time when you saw the 100x, 200x and larger numbers those came from universities. Nvidia may have taken university work and shown it and it has an 100x on it, but really most of those gains came from academic work. Typically we find when you investigate why someone got 100x [speed up] is because they didn’t have good CPU code to begin with. When you investigate why they didn’t have good CPU code you find that typically they are domain scientists not computer science guys – biologists, chemists, physics – and they wrote some C code and it wasn’t good on the CPU. It turns out most of those people find it easier to code in CUDA C or CUDA Fortran than they do to use MPI or Pthreads to go to multi-core CPUs, so CUDA programming for a GPU is easier than multi-core CPU programming.”
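Gupta’s argument comes down to which baseline the GPU is measured against. The Python sketch below uses made-up run times, chosen only so that the naive comparison lands on the 100x figure he mentions; the 20x gain assumed from tuning the CPU code is likewise hypothetical.

```python
def speedup(baseline_seconds, accelerated_seconds):
    """How many times faster the accelerated run is than the baseline."""
    return baseline_seconds / accelerated_seconds

# Hypothetical run times for the same job.
naive_cpu_seconds = 1000.0   # original single-threaded, unoptimized C code
tuned_cpu_seconds = 50.0     # assumed result of multi-core and vectorised tuning
gpu_seconds = 10.0           # CUDA port

headline = speedup(naive_cpu_seconds, gpu_seconds)
like_for_like = speedup(tuned_cpu_seconds, gpu_seconds)
cpu_tuning_gain = speedup(naive_cpu_seconds, tuned_cpu_seconds)

print(f"GPU vs naive CPU code: {headline:.0f}x")
print(f"GPU vs tuned CPU code: {like_for_like:.0f}x")
print(f"Gain from tuning the CPU code alone: {cpu_tuning_gain:.0f}x")
```

On these invented numbers the headline 100x shrinks to 5x once the CPU code is properly optimized, with the other factor of 20 coming from fixing the baseline rather than from the GPU.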
