The acronym GPGPU stands for General Purpose GPU, that is, using a GPU (Graphics Processing Unit) not just as a graphics processor but as a general purpose processor, taking advantage of the hundreds or even thousands of processing cores available on modern GPUs to accomplish computations with massive parallelism and speed. Introduced and steadily expanded by NVIDIA, the use of GPUs for general purpose calculation allows desktop computers to do in seconds what would take hours or days without a GPU.
Both Manifold System Release 9 and Manifold Viewer provide GPGPU parallelism. In the past, Manifold Viewer was CPU parallel, but not GPU parallel like Release 9. Now, both Release 9 and Manifold Viewer automatically provide GPGPU parallelism. Fabulous!
It is not an exaggeration to say that GPGPU technology could well be the most revolutionary thing to happen in computing since the invention of the microprocessor. It's that fast, that inexpensive and has that much potential. GPGPU is so important that all Manifold users should insist that the computer hardware they procure includes a reasonably recent NVIDIA GPU, so it can be used for GPGPU parallel processing by Manifold.
That is especially important considering the extremely low price of such GPU add-ins compared to the phenomenal performance gains they can provide. For under $100 we can buy a modern GPU card with hundreds of GPU cores. For under $200 we can purchase a modern GPU card with over 1500 cores.
The NVIDIA RTX 3090 card illustrated above provides 10,496 GPU cores for general purpose, parallel processing on a single card. The RTX 3090 is an expensive card, but it delivers genuine supercomputer performance for GPGPU processing. A less expensive option in a smaller card that is easier to fit into rigs is the RTX 3070, at 5,888 GPU cores for under $500. Plug in two such cards to get 11,776 cores for under $1000. Wow!
GPUs have become so fast that even an inexpensive GPU card, costing well under $100, can reduce the time required for some GPU-enabled analytics from over an hour without GPU parallelism to a few tens of seconds using GPGPU. For that matter, every modern system used for GIS, even very inexpensive ones, will have a GPU, just for reasonable graphics display performance if nothing else. That is a baseline part of the cost of any system used for GIS. If we choose an NVIDIA GPU for that basic capability, we almost certainly will also get a GPGPU-capable GPU at zero additional cost.
Manifold provides GPGPU capability using NVIDIA GPUs. NVIDIA's GPUs evolve rapidly, as do NVIDIA drivers for GPGPU, which NVIDIA calls CUDA. Manifold includes and automatically loads the necessary CUDA drivers for GPUs supported by Manifold. Manifold includes CUDA drivers for NVIDIA GPUs as far back as the Fermi generation, which NVIDIA refers to as compute capability 2.0, as well as all more recent devices such as compute capability 3.0 (Kepler), compute capability 5.0 (Maxwell), compute capability 6.0 (Titan and upper-level GeForce GTX), and later, supporting CUDA 11 and more recent releases. Manifold tracks NVIDIA releases and updates frequently, with new CUDA levels appearing in Cutting Edge Manifold builds. See the Third Party Release Levels topic for the latest CUDA level supported.
Manifold GPGPU modules are loaded from the GPGPU.DAT file included within installations of 64-bit versions of Manifold and Manifold Viewer. The file contains multiple versions of each module, with the GPGPU interface automatically loading the latest version that can be used with GPU devices that are installed. If the system has multiple GPGPU-capable devices, the GPGPU interface will load the latest version that can be used on all devices, to enforce consistency and avoid recompilations. Performance differences between versions are mostly small, with later versions performing very slightly better than earlier ones.
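The selection rule described above can be sketched in a few lines (hypothetical names and capability numbers, not Manifold's actual loader): given the compute capability of each installed device and the minimum capability each module build requires, load the newest build that every device can run.

```python
# Sketch of the "latest version usable on all devices" rule. The names and
# numbers here are illustrative assumptions, not Manifold internals.

def pick_module(device_capabilities, module_builds):
    """device_capabilities: compute capability per installed GPU, e.g. [5.0, 7.5].
    module_builds: list of (version, min_capability) pairs.
    Returns the highest version whose requirement every device satisfies."""
    weakest = min(device_capabilities)
    usable = [(ver, req) for ver, req in module_builds if req <= weakest]
    if not usable:
        return None  # no GPGPU module runs on every installed device
    return max(usable)[0]

builds = [(1, 2.0), (2, 3.0), (3, 5.0), (4, 6.0)]
# Two cards of different generations: the older card limits the choice.
print(pick_module([6.1, 5.0], builds))  # -> 3 (version 4 needs 6.0 on all devices)
print(pick_module([7.5], builds))       # -> 4
```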
For the very newest cards, if the NVIDIA CUDA drivers installed with the card are more recent than those built into Manifold, Manifold will automatically use the more recent drivers.
To ensure compatibility with NVIDIA hardware, Manifold uses NVIDIA drivers and software such as NVIDIA CUDA. NVIDIA discontinued GPGPU support for 32-bit Windows editions in 2018. Therefore, 32-bit versions of Manifold, both Release 9 and Viewer, also no longer support GPGPU. Please run 64-bit Windows to utilize GPGPU within Manifold Release 9 and Viewer.
We usually know if we have an NVIDIA GPU card installed in our systems, but sometimes we might not remember how many cards are installed in a particular system or what they are. When using portable devices that have a modern GPU integrated onto them, we might want to confirm what GPU can be found.
If a GPGPU-capable device is installed, Manifold will report it in the Help - About dialog as a CUDA device in the GPU line, as marked with a magenta arrow in the illustration above. One card is reported with the type of card and the CUDA driver number. If we have two cards installed they are reported with less detail, for example, as GPU: CUDA (2 Devices).
For more detail, we can use GPU functions in the Command Window.
We choose View - New Command Window - SQL to launch the Command Window. If we cannot remember the name of the GPU functions we enter GPU into the filter box to reduce the long list of functions to only the GPU functions. We can then double-click on the SystemGpgpus function to use it. We enter:
? CALL SystemGpgpus()
... and then we press the ! run button in the main toolbar. We wrote ? CALL because the function returns a table, so we evaluate that using CALL.
The Results table reports all of the GPGPU-capable GPUs in our system and what each GPU reports about what it is. In the above illustration, we see we have two GPGPU capable GPUs in our system, of different generations, which Manifold can handle. Since GPU plug-in cards typically have one GPU per card the table normally reports the number of GPGPU capable cards in the system.
Manifold is inherently a parallel processing system. Whenever it makes sense to do so, Manifold will automatically utilize multiple processors or multiple processor cores by parallelizing a task into multiple threads for execution on more than one core. Given hyperthreading plus multi-core CPUs it is now routine to encounter desktop systems with 8, 16, 32 or even more CPU cores available.
In addition to this basic, parallel processing capability using multiple CPU cores Manifold also includes the ability to utilize massively parallel multiprocessing utilizing GPUs, potentially launching tasks on thousands of processing cores at once for true supercomputer computational performance, far beyond what can be achieved with CPUs.
Manifold automatically parallelizes and dispatches as many tasks as make sense to GPGPU, with automatic fallback to parallelized tasks dispatched to multiple CPU cores if a GPU is not available.
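The dispatch-with-fallback behavior described above can be pictured with a minimal sketch (hypothetical helper names; Manifold's real dispatcher is internal): try the GPU path first, and if no capable device is present, run the same task across CPU worker threads instead.

```python
# Minimal sketch of GPU dispatch with automatic CPU fallback. gpu_available()
# is a stand-in; a real check would query the CUDA runtime.

from concurrent.futures import ThreadPoolExecutor
import os

def gpu_available():
    return False  # assumption for this sketch: no GPGPU device present

def run_task(chunks, kernel):
    if gpu_available():
        return [kernel(c) for c in chunks]  # placeholder for a GPU launch
    # Fallback: parallelize the same chunks across all available CPU cores.
    with ThreadPoolExecutor(max_workers=os.cpu_count()) as pool:
        return list(pool.map(kernel, chunks))

print(run_task([[1, 2], [3, 4]], sum))  # -> [3, 7]
```

The point of the sketch is that the caller never changes: the same task description runs either way, which is exactly the behavior a .map file sees when moved between machines with and without a GPU.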
The California graphics chip company NVIDIA has long been known for producing outstanding graphics processors (GPUs) that have become popular as the basis for graphics cards. Huge financial investments by NVIDIA and other GPU makers have been made possible by the runaway popularity and size of the computer gaming market.
As an entertainment blockbuster, computer gaming is a far larger business financially than the worldwide movie industry. Over the years that vast flow of money has financed ever more computationally demanding games, which in turn have financed the creation of ever more powerful graphics processors to handle the ever greater resolution and ever more complex calculations required to give modern computer games a greater sense of reality and action.
The speed and processing power required cannot be achieved using CPUs but require supercomputer architectures in which hundreds or thousands of processing units work together. In the quest for maximum speed, NVIDIA GPUs have evolved far beyond single processors. Modern NVIDIA GPUs are not single processors but rather are parallel supercomputers on a chip that consist of very many, very fast processing cores with potentially thousands of processing cores per chip. They are so much faster at processing than whatever CPU is running Windows on the motherboard that in comparison the CPU seems no more capable than a digital coffee pot.
As GPUs increased in power it very quickly became obvious to computer scientists that programs other than graphics microcode for games could be uploaded into a GPU for parallel execution by the thousands of processing units in the GPU. Although the market impetus behind the creation of such supercomputers on a GPU chip was the computational demands of the PC gaming market, the scientific computing community began using GPUs for general purpose computing having nothing to do with games. That GPU cards were absurdly inexpensive (compared to supercomputers) because of the vast economies of scale of computer gaming was icing on the cake.
It turns out that many mathematical computations, such as matrix multiplication and transposition, which are required for complex visual and physics simulations in games are also exactly the same computations that must be performed in a wide variety of computing applications, including GIS, data mining, simulations, image processing, complex statistical analyses and many other tasks where conventional CPUs are too slow.
At first it was almost an accidental discovery that programs other than graphics microcode could be uploaded for execution into the many processing units of a modern GPU. But once it was realized there might be a market for such an approach that could provide far faster performance than running programs on the main CPU, NVIDIA took the chance of supporting the trend by investing resources into ensuring that NVIDIA GPUs could be used for GPGPU applications and by supporting such use with software and with architectural features in their GPUs to support GPGPU use.
NVIDIA created the CUDA (Compute Unified Device Architecture) interface library to allow applications developers to write code that can be uploaded into an NVIDIA GPU card for massively parallel execution by the many processing cores in the GPU. The CUDA library allows applications developers to write applications that will work with a very wide variety of NVIDIA GPUs, and it ensured that NVIDIA chips got off to an early lead among GPU vendors for use in GPGPU applications.
GPGPU offers such tremendous performance gains that all Manifold products now are designed to exploit GPGPU whenever feasible. Manifold automates this process at a breadth and depth never before seen in a commercial product, with automatic use of GPGPU throughout Manifold and Manifold SQL. If we have a reasonably recent NVIDIA GPU installed in our system, Manifold can take advantage of the phenomenal power of massively parallel processing to execute many tasks at much greater speed.
Because NVIDIA technology benefits from enormous economies of scale in the gaming market, GPGPU-capable cards have become very inexpensive for the performance they provide with a wide range of GPGPU-capable graphics cards that can be purchased at various prices and performance levels. It is easy and inexpensive to choose a card with the balance between performance and cost desired (more stream processors running at faster clock rate with more memory gives better performance).
Based on experience from GPGPU-enabled Manifold products it is clear that GPGPU will revolutionize computation. GPGPU processing is so fast that developers routinely say GPGPU renders the main processor almost superfluous, as if even the fastest multi-core Intel chip is relegated to being nothing but an accessory processor to handle the keyboard and mouse. That is not hyperbole given that GPUs can routinely run many computations hundreds of times faster than even the fastest Intel CPUs.
The first appearance of GPGPU code in Manifold products was in Manifold System Release 8.00, which has a limited but nonetheless extremely powerful ability to utilize GPGPU without requiring users to write low-level CUDA code or otherwise deal with the intimidating complexity of parallel programming.
Manifold 8 includes a Surface - Transform dialog that enables users to write expressions which perform computations on surfaces using straightforward expression syntax that is similar to how expressions can be written in SQL. Expressions written in the Surface - Transform dialog in 8 can utilize a wide range of Manifold 8 functions and operators, including 38 functions that were parallelized to utilize GPGPU automatically for computations if an NVIDIA GPU is available.
The Surface - Transform dialog in Manifold 8 takes an expression that can reference one or more surfaces, parses that expression, evaluates it using CPU computations together with automatic dispatch to GPGPU if functions supported for GPGPU are utilized in the expression and then saves the result into a new or existing surface.
Since GPGPUs can perform computations for hundreds or thousands of pixels at once, the Surface - Transform tool in Manifold 8 provides a significant performance benefit over doing the same computations on CPU, performing some computations hundreds of times faster on GPGPU than possible on the CPU.
The Surface - Transform tool in Manifold 8 works best when subexpressions are relatively bulky functions such as Aspect or Slope. That is because GPGPUs operate on data within the GPU device's local memory, so each subexpression evaluated by the tool must explicitly copy data from main memory into GPGPU device memory, perform its computations in GPGPU, and then copy the result back from GPGPU device memory into main memory.
The overhead in Manifold 8 to and from GPGPU for small operations such as + or sin is so big that it does not make sense for the Surface - Transform tool to do them in GPGPU. It's faster just to do small operations on the main CPU. Therefore GPGPU in Manifold 8 is used only for more complex functions such as filters where the gain from using GPGPU is beyond the break-even time required to move data to and from the GPGPU device.
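The break-even trade-off described above can be captured in a back-of-the-envelope model (all numbers here are illustrative assumptions, not measured Manifold figures): dispatching to the GPU pays off only when the compute time saved exceeds the fixed cost of copying data both ways plus launch overhead.

```python
# Toy break-even model for GPU dispatch. Bandwidth and overhead figures are
# illustrative assumptions only.

def worth_dispatching(bytes_moved, cpu_compute_s, gpu_speedup,
                      pcie_bw=12e9, launch_overhead_s=1e-4):
    transfer_s = 2 * bytes_moved / pcie_bw        # copy in + copy out
    gpu_total_s = transfer_s + launch_overhead_s + cpu_compute_s / gpu_speedup
    return gpu_total_s < cpu_compute_s

# A tiny operation like a single + on a small tile: not worth the round trip.
print(worth_dispatching(bytes_moved=1e6, cpu_compute_s=1e-4, gpu_speedup=100))
# A bulky filter such as Slope over a large surface: clearly worth it.
print(worth_dispatching(bytes_moved=1e8, cpu_compute_s=5.0, gpu_speedup=100))
```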
Manifold 8 makes no use of GPGPU besides the Surface - Transform dialog and its ability to utilize GPGPU within a set of 38 functions that operate on surfaces. There is no general GPGPU utilization within SQL in Manifold 8, for example, and no use of GPGPU for other tasks such as reprojecting images, manipulating vector drawings and so on.
In contrast, the Radian engine and Manifold products such as Manifold System GIS that are based on Radian are designed to utilize GPGPU automatically within SQL. That implies effective use of GPGPU throughout the entire system since SQL in Manifold underlies virtually everything in Manifold.
To use GPGPU effectively throughout the entire system Manifold must significantly reduce the overhead of utilizing GPGPU in virtually all situations, enabling expressions with many small calls to be accelerated as well as expressions with only a few bulky calls. Being able to accelerate a vast range of small calls has many big benefits, most importantly allowing the acceleration of bulky expressions that are constructed in queries which in turn utilize many small calls. This allows users to write bulky, complex queries in SQL that can benefit from GPGPU performance even if within the overall complexity of such queries there are no significantly complex or bulky built-in functions that are used.
To accomplish such unprecedented effectiveness in situations large and small, GPGPU utilization within Manifold went through numerous iterations during development with each iteration improving on the previous one. Highlights included:
Manifold started with nodes communicating with each other so that one node could leave its result on GPGPU and pass a pointer to it to other nodes, which avoided the inefficiency of downloading and then immediately re-uploading data. This gave some immediate results and reduced traffic to and from GPGPU significantly.
Manifold then merged all nodes for the same GPGPU expression into a single mega-node, which would basically take the entire expression and pass it, along with all arguments, to a mega-function on GPGPU, which would then interpret the expression for each item, typically a pixel. This eliminated all traffic to and from GPGPU that could be eliminated.
Finally, instead of taking the expression and packing it into a form parsable by the mega-function on GPGPU, Manifold compiled the expression directly into GPGPU code and ran that code instead of the mega-function. This eliminates unnecessary overhead on GPGPU.
The result of such continuous improvement is that now when we write something like SELECT tilea + tileb * 5 + tilec * 8 FROM ..., the Manifold engine takes the expression with three additions and two multiplications, generates GPGPU code for that function in a Just In Time (JIT) manner, uploads the resulting code to GPGPU, then uses that code to execute the computations.
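The idea behind that JIT step can be illustrated with a pure Python stand-in (not Manifold internals): instead of evaluating tilea + tileb * 5 + tilec * 8 as separate passes with intermediate results, the whole expression is compiled into one fused function that is then applied per pixel.

```python
# Sketch of JIT expression fusion: build one compiled function for the entire
# expression, then run it once per item. Real GPGPU JIT emits device code;
# Python's compile() stands in for that step here.

source = "lambda a, b, c: a + b * 5 + c * 8"
fused = eval(compile(source, "<jit>", "eval"))  # one compiled "kernel"

tilea, tileb, tilec = [1, 2], [10, 20], [100, 200]
result = [fused(a, b, c) for a, b, c in zip(tilea, tileb, tilec)]
print(result)  # -> [851, 1702]
```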
To save execution time and boost efficiency, JIT code generation for GPGPU functions is cache-friendly for the driver. Running the same query again, or even running different queries for which the GPGPU expressions are sufficiently similar to each other, will engage the compilation cache maintained by the driver.
GPGPU acceleration works everywhere in Manifold SQL where worthwhile work arises: in the SELECT list, in WHERE, in EXECUTE, ...everywhere. For example, if we add to a table a computed field that combines multiple tiles together, that computed field will use GPGPU. If we do some tile math in a FUNCTION, that FUNCTION will use GPGPU as well.
If we save the project using that computed field or FUNCTION into a Manifold .map file and then bring that .map file onto a machine running Manifold that has no GPGPU, the computed field will still be executed: Manifold automatically falls back to its CPU parallelism, taking advantage of as many CPU cores as are available instead of GPGPU. If we bring the .map file back onto a machine that has a GPGPU, Manifold will automatically use the GPGPU.
Other optimizations play along transparently. If a particular subexpression inside of an expression that runs on GPGPU is a constant in the context of that expression, it will only be evaluated once. If an expression that can run on GPGPU refers to data from multiple tables and has parts that only reference one of these tables, the join optimizer will split the GPGPU expression into pieces according to dependencies and will run these pieces separately and at different times, minimizing work. A SELECT with more than one thread will run multiple copies of GPGPU expressions simultaneously. There are many other similar optimizations automatically integrated with GPGPU utilization.
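The constant-subexpression rule above can be shown with a toy illustration (hypothetical, not Manifold internals): a part of the expression that does not vary per record is computed once outside the per-record loop, not once per record.

```python
# Toy illustration of constant hoisting: the constant subexpression is
# evaluated once, not once per record. The call counter makes the saving
# visible.

import math

calls = 0
def constant_part():
    global calls
    calls += 1
    return math.sqrt(2.0) * 100  # a stand-in for any constant subexpression

records = [1.0, 2.0, 3.0]

# Naive: the constant subexpression recomputed for every record.
naive = [r * constant_part() for r in records]
naive_calls = calls

# Optimized: the constant subexpression evaluated once, outside the loop.
calls = 0
k = constant_part()
hoisted = [r * k for r in records]

print(naive == hoisted, naive_calls, calls)  # -> True 3 1
```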
Note that some operations are so trivial in terms of computational requirements it makes no sense to dispatch them to GPGPU, the classic case being scalars (trivial) as opposed to tiles (more bulk). CASE expressions, conditionals and similar constructions or functions that operate on scalar values stay on the CPU while functions that operate on tile values generally go to GPGPU unless they use tiles in a trivial fashion, such as making a simple comparison.
Abs(v) takes a number and returns a number, it stays on CPU.
TileAbs(t) takes a tile and returns a tile, it can go to GPGPU.
TileContrast(t, c, p) takes a tile and two numbers, and returns a tile, it can go to GPGPU.
TileToValues stays on CPU since it simply splits pixels out of a tile, with no need for GPGPU for something so simple. If the operation first did a computation on the pixels and then split them, it might be the sort of operation sent to GPGPU.
CASE conditions are scalar, so they stay on CPU. When CASE is used with tiles whether it is faster to dispatch the task to GPGPU depends on exactly how the tiles are used. Some examples where vXX are scalar values and tXX are tiles:
CASE WHEN v=2 THEN t1 ELSE t2 END
In the above not much is being done with the tiles so the entire construction stays on CPU.
CASE v WHEN 3 THEN TileAbs(t1)+ t2*t3 + TileSqrt(t4) ELSE t1 END
In the above, the expression in THEN will go to GPGPU while the rest of CASE will stay on CPU.
CASE WHEN t1 < t2 THEN 0 ELSE 8 END
In the above the comparison in WHEN does use tiles but it uses them like raw binary values, similar to how ORDER works, so it is more efficient to leave it on CPU.
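The dispatch rules above condense into a small decision sketch (a hypothetical classifier, not Manifold's actual planner): only non-trivial work on tiles goes to the GPU, while scalar work and trivial tile use stay on CPU.

```python
# Sketch of the scalar-vs-tile dispatch rule described above.

def dispatch(operates_on_tiles, trivial):
    """trivial: simple comparison or raw byte-level use of a tile."""
    if operates_on_tiles and not trivial:
        return "GPGPU"
    return "CPU"

print(dispatch(operates_on_tiles=False, trivial=True))   # Abs(v)          -> CPU
print(dispatch(operates_on_tiles=True,  trivial=False))  # TileAbs(t)      -> GPGPU
print(dispatch(operates_on_tiles=True,  trivial=True))   # t1 < t2 in CASE -> CPU
```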
While the automatic parallelization and dispatch of tasks to GPGPU is certainly cool, doing so with the automatic breadth and depth of Manifold is entirely unprecedented in a commercial software product. As a new technology extensively implemented on Windows for the first time, Manifold's extensive use of GPGPU has revealed some interesting new effects in how Windows interacts with GPGPU computation.
The key new effect of interest is that Windows does not always play well with GPU when the same GPU is used both for display functions as well as for GPGPU computations. When the same GPU is used both for display and GPGPU computation, if the code dispatched to the GPU for GPGPU computations either fails or runs for a long time, a Windows watchdog service that monitors how long it takes the GPU to process display requests will reboot the graphics stack, flushing the GPGPU computation. That will then cause Manifold to fall back to re-doing the calculation on parallelized CPU cores as if GPGPU had not been available. That's a "fail safe" action by Manifold but one which can result in much longer computation time than expected given the slower performance of CPU cores in tasks that can be faster run in GPGPU.
While bugs that cause failure of GPGPU code should, of course, either be eliminated during pre-production testing or become increasingly rare as they are eliminated in updates, the possibility of long-running GPGPU tasks will always remain because computation times depend upon the amount of data involved and the complexity of computation.
If we have only a single GPU in our system it will be used both for display by Windows and for GPGPU computations by Manifold. Manifold and GPGPU calculations in general are so fast that Windows rebooting the graphics stack due to longer running computations on GPGPU should be rare, so rare that almost all users will never encounter such effects. There are ways of turning the watchdog service off in Windows, but doing so may result in the display becoming obviously less responsive when long-running GPGPU tasks are being executed.
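The watchdog in question is the Windows Timeout Detection and Recovery (TDR) mechanism, controlled by registry values that Microsoft documents under the GraphicsDrivers key. As an illustration only (registry edits are at your own risk, require administrator rights, and take effect after a reboot), the timeout can be lengthened rather than disabled outright:

```shell
:: Illustrative only: raise the TDR timeout from its 2-second default to 10
:: seconds via the documented TdrDelay value, leaving detection enabled.
reg add "HKLM\SYSTEM\CurrentControlSet\Control\GraphicsDrivers" /v TdrDelay /t REG_DWORD /d 10 /f
```

Lengthening the delay gives long-running GPGPU kernels headroom without removing the protection the watchdog provides against a genuinely hung graphics stack.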
It should be emphasized that performance-robbing interactions between Windows using a GPU for display and code using the same GPU for GPGPU computations should be rare. It also makes sense to expect that anyone running such sophisticated tasks that they intensively utilize GPGPU is unlikely to have only a single GPU in their systems, given the insignificant cost (relatively speaking) to configure a system with multiple GPUs. If we have multiple GPUs in a system we can use PRAGMA to specify which are used for GPGPU to choose a GPU not used by Windows for display. But that ends up requiring a pragma directive for each such query.
A brute force strategy to force Windows not to touch GPUs utilized for computation is to install an inexpensive, non-GPGPU-capable card to run displays and to plug in additional GPGPU-capable cards for computations. For example, we could use an older, pre-Kepler NVIDIA card, an AMD card, or a built-into-the-motherboard Intel graphics chip for our display, none of which will be recognized as GPGPU capable by Manifold, and then plug in one or more Kepler or later NVIDIA GPU cards to run GPGPU calculations. Windows will then be able to happily watch the non-GPGPU graphics unit it is using for display, while the GPGPU devices can take as long as they want to do their work.
Such a strategy may be taking too much of a "belt and suspenders" approach to eliminating the possibility that Windows will interfere with long-running GPGPU calculations on the card used for displays, but it could be a useful plan if massive GPGPU utilization is expected.
Does GPU parallelism speed up everything? - No. It only speeds up computational tasks that are big enough to be worth dispatching to the GPU. It does nothing at all for tasks that involve no computation. Consider a thought experiment, for example: suppose we want to copy a 10 GB file in Windows from one disk drive to another disk drive. We could have a hundred GPUs in our system and the job will not go any faster, because it involves no computation. It simply involves moving bytes between disk drives, which is a task that basically requires waiting around for bytes to be read from a terribly slow, rotating disk platter and then written onto the destination, terribly slow disk platter. Reading a large shapefile, for example, is not going to go any faster with GPU because that task also is all about waiting to get information off disk. There is no thought involved for the processor and no computation to speed up, just the very slow wait for bytes to come in from disk.
In contrast, a complex calculation doing much sophisticated mathematics for every pixel in a raster data set quite likely will gain significant performance from using GPU. The rule of thumb is that if the job is top-heavy with lots of computation, GPU parallelism will help. However, even then, the competition for GPU is CPU parallelism.
Keep in mind that Manifold automatically parallelizes both for CPU and GPU. Modern CPUs will often provide eight, twelve or more cores which can execute complex calculations with astonishing speed. When all of a CPU's cores are engaged in a parallelized task by Manifold, a modern eight-core CPU providing sixteen hypercores can execute many tasks so fast that it will be done before the job could be dispatched to GPU. In that case, Manifold will run the job using fully parallel CPU since dispatching it to GPU would be slower.
GPUs are so inexpensive it always is a good idea to toss an NVIDIA GPU into our systems. If we have bigger computational tasks, we can install more than one GPU and we can spend a bit more for a faster GPU with more CUDA cores in it. For most people, there is not much point in buying the latest, most expensive GPU since even inexpensive GPUs are so fast the number of jobs that will go significantly faster on a super-expensive GPU are few and far between. Even with very computationally intensive jobs usually a mid-range GPU is plenty.
Is an Intel graphics chip or AMD chip GPGPU capable? - Not within Manifold, and not within almost all other GPGPU-capable applications. AMD and Intel both make fine products, including graphics chips. However NVIDIA pulled ahead of both AMD and Intel in GPGPU capabilities by making a big early investment, years before AMD and Intel, in supporting GPGPU on NVIDIA GPUs with the CUDA library and with specific features within GPU silicon to support GPGPU operation. As a result, by probably a thousand to one margin, applications which do GPGPU run on NVIDIA GPUs but not on AMD or Intel GPUs. AMD and Intel are now playing catch-up, but given how far behind they are in GPGPU applications it is unclear when or if it will be possible for Manifold to provide GPU parallelism using AMD or Intel GPUs. For now, if we have an AMD or Intel graphics chip we cannot use it for GPU parallelism with Manifold.
Use the latest code - As always, it is a good idea when using GPGPU devices to update the video driver for the device from NVIDIA. This will ensure using the latest iteration of NVIDIA updates for the device, including for GPGPU.
Big GPGPU Code - GPGPU code synthesized by Manifold on the fly can use more than 4 GB of GPGPU memory per processing batch, an increase from the former limit of 4 GB per batch. The former limit was difficult to hit: To hit the limit, the query had to have a SQL expression that was dispatched entirely to GPGPU and that expression had to be extremely large. The increase in memory size is aimed at future scenarios, to allow splitting operations into bigger chunks in the future, enabling use of very large memory for processing. Actual code dispatched, of course, will be sized to memory available in whatever GPU cards are installed.
Should I spend more on GPU, on CPU, or on other hardware? - Always have at least one GPGPU capable card in your computer that delivers a few hundred CUDA cores. GPU computation is so fast that very quickly other parts of the system will become the bottleneck, so it usually makes no sense to spend many thousands on one or more GPU cards to plug into a four core CPU machine that has limited memory and slow hard disk. A single, mid-range GPU card costing $100 to $500 has so much GPGPU power that spending more is not necessary for almost all GIS work. Even after we invest into an eight to twelve core CPU with plenty of memory and multi-terabyte SSD drives, in most GIS work we might not see any difference between spending $500 on a GPU card and $2000 on a GPU card. Specialized applications that do very intensive mathematical calculations, of course, will show greater differences sooner. Because Manifold is very efficient at wringing the most out of GPGPU, the rule of thumb is to not overspend on GPU while under-spending on many-core CPU, SSD and main memory.
Must I use Quadro or Tesla or other special brands? - NVIDIA's higher end cards are sold under Quadro and Tesla branding, indicating features such as ECC memory for automatic error checking and correction. Such versions cost more than the analogous cards sold in NVIDIA's "gaming" brands such as GeForce and Titan. Manifold is happy to work with all NVIDIA cards that are designed to be plugged into standard PCs and are supported with Windows CUDA drivers.
Can I mix various GPU generations and use multiple cards? - Yes. Manifold will identify all GPGPU-capable cards in the system and will take advantage of all of them, even older cards that may have fewer cores. A mix of GPU generations is very common in systems used for GIS, since GIS people often run with three display monitors while most GPU cards can only drive two monitors at a time. That means at least two GPU cards will be installed in the computer.
A popular strategy for users on a budget is to buy one more expensive card, for example for $250, to get many cores from a newer GPU generation, say 1000 or 2000 cores, and then to buy an inexpensive card, for example for $40 or $80, using an older GPU generation to power additional monitors. The high end card may have 1280 cores while the inexpensive card contributes 192 or 384 cores as well. Manifold will use them all when parallelizing a task for dispatch to GPGPU: some parts of the task will run on the higher end card and some parts will run on the inexpensive card.
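One plausible way to picture splitting a task across such mixed cards (illustrative only; Manifold's real scheduler is internal) is to give each card a share of the work proportional to its core count.

```python
# Sketch of proportional work splitting across cards with different core
# counts. Core counts and item counts are illustrative.

def split_work(total_items, cores_per_card):
    total_cores = sum(cores_per_card)
    shares = [total_items * c // total_cores for c in cores_per_card]
    shares[0] += total_items - sum(shares)  # hand any rounding remainder to one card
    return shares

# A 1280-core card paired with an older 384-core card.
print(split_work(10000, [1280, 384]))  # -> [7693, 2307]
```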
Can I use a mining rig GPU card? - If the mining cards are supported by NVIDIA for CUDA and GPGPU, sure, although it might not make sense to do so. "Mining" versions of GPU cards were introduced by some vendors in 2017 and later in response to immense demand for GPUs to run cryptocurrency mining tasks.
Such mining rig cards may have limitations built into them, either into the GPU chip or into the board, that were originally intended to keep the mining market separate from gaming and scientific GPGPU markets, but which now make them unsuited for GPGPU use with Manifold or other GPGPU applications. For example, cards may be limited to slower PCI bus speeds, which matters less for mining applications, or they may not be supported by CUDA drivers from NVIDIA.
Used mining rig cards may also be less stable than new cards, since they often have been overclocked or run at high temperatures. Mining rig GPUs usually have been used continuously 24 hours a day and thus may be nearing the end of their service life, at least for fans if not other components. The bottom line is that some GPU cards sold for mining or previously used for mining rigs may work and be suitable for GPGPU, and some may not.
What does the SystemGpgpuCount() function do? - The SystemGpgpuCount function shown in the Command Window illustration below reports the number of GPGPU-capable GPUs in the system.
In the Command window we enter:

? SystemGpgpuCount()
... and then we press the ! run button in the main toolbar. We wrote ? without a CALL because the function returns a number, not a table, so it is evaluated without using CALL.
The report is a straightforward count of how many GPUs we have that are usable for GPGPU.
To get a listing of cards and the CUDA level they use, enter and execute in the Command Window:
? CALL SystemGpgpus()
Windows Task Manager's Performance tab may not report actual GPGPU utilization using default settings. In the illustrations below, we show Task Manager screens in action during the Example: Enhance Terrain with Curvatures topic.
When the Slope : mean curvature template is being computed for the Crater Lake surface used in the example, the CPU runs at 100% utilization, with all 24 hypercores of the Ryzen 9 3900X being utilized 100%. Wow! However, note that the GPU 0 report in the lower left corner of the dialog reports only 1% use of the GPU.
If we switch to the GPU report, depending on how we have the GPU displays configured we may not see any GPU usage. GPUs have many uses, and the Task Manager displays might be configured to show specialized functions such as 3D, or Video Decode, and not GPGPU parallel computation. Generally, we have to choose Compute_0, Compute_1, or Cuda in the pull down lists to get a display of GPGPU function. Cuda is not always available for all GPUs, even if they are CUDA capable. If Cuda is not available, normally Compute_0 or Compute_1 will show parallel GPU computation.
In the illustration above choosing Cuda in the upper display shows 100% utilization of the GPU, all 1280 CUDA cores in the GTX 1060 GPU working at full saturation to perform the computation. That is an amazing level of utilization, only made possible by Manifold's use of parallel CPU together with massively parallel GPU to keep all of the GPU cores fully occupied. Note, however, that despite the GPU being used 100% with all cores running 100%, the readout in the lower left corner of the display still shows 0% GPU. If that is all we look at, we might wrongly conclude the GPU is not being utilized.