I thought of publishing this post to serve as a note of understanding on Smart Flash Cache in an Exadata database system. It outlines the general behavior and working mechanism of flash in co-ordination with storage intelligence. Here we go –
Exadata Smart Flash Cache
The Oracle Exadata is meant to deliver high performance through its smart features. One of its smartly carved out feature is intelligent and very smart flash cache. The flash cache is a hardware component configured in the exadata storage cell server which delivers high performance in read and write operations. There are four flash cards in each of the storage cell and each flash card contains a flash disk. In total, there are 16 flash disk contained by a single storage cell server.
In Exadata x3 version, the flash cache capacity has been increased by four times, 40 times the response speed and 100GB/sec scan rates. In X3 version, the Sun flash F40 PCIe cards total upto 22.4 TB of flash compared to 5.4TB from the X2 version. In addition, they are capable of scanning the data 1.4 times faster than X2 version. The flash belongs to eMLC category (enterprise grade multi-level cell) which employs different techniques to improve flash health, endurance and most importantly, accommodate more write cycles (20k to 30k). (Write cycle – transparent process of making room for the new incoming data by flushing out old one)
Flash Cache Hardware details
Exadata X2 version – [Sun F20 cards]
Capacity = 4(no of flash cards in a cell) * 4 (no of flash disk per card) * 24GB (single flash disk capacity) = 384GB per storage cell
Exadata X3 version – [Sun F40 cards]
Capacity = 4(no of flash cards in a cell) * 4 (no of flash disk per card) * 100GB (single flash disk capacity) = 1600GB per storage cell
The motivation and how the Flash works?
The IO operations on the disk are mechanical as a block has to travel a specified path and sit at a specific location. Being a static memory, the disk operation to place a block are significantly costlier. Flash cache is a volatile memory which enables huge reduction in IO operations by replacing them fast and rapid cache operations. In an OLTP system, the database is read write intensive and data grows at an uneven rate. The flash cache is designed to reduce the IO bottlenecks and yield atleast 20x benefits by speeding up IO operations. The cache operations are fully redundant and transparent to the end user except the statistics. The flash cache can also be worked in co ordination with IORM to control the use of flash in case of multiple databases. This feature enables the customers to reserve flash for the critical databases and ensure transparent performance benefits.
The smartness of the flash comes with the fact that it has a peculiar ability to understand the type of IO(s). Its because of this intelligence that it knows what IO(s) to cache and which one to skip. Flash cache caches IO pertaining to control files, file headers, data blocks and index blocks while it skips the IO(s) incurred in backups, mirroring, data pumps, and large table scans.
Important: Do not confuse the exadata smart flash cache with the flash cache option in Oracle database 11gr2 on solaris or linux. The “database flash cache” is an extension to SGA to expand buffer cache area on the database server. But the exadata smart flash cache resides on the storage server to cache the frequently accessed data and speed up the read (also write with X3) operations by reducing disk IO(s).
The exadata X3 series makes a considerable software level changes to the flash cache by speeding up not only read operations but write operations as well. It is capable of supporting 1 million 8k flash write per second and reading 1.5 million 8k blocks. This makes the new flash 20x more efficient than the X2 or V2 series flash. So now the cache is not only capable of keeping the hot data but in fact, all the data whatever comes in from the database. The credit goes to the new working policy adopted by the flash – known as Write Back Cache.
Make the flash persistent by partitioning flash disk into grid disk
The flash memory dumps the cachable hot data which can be lost once the power goes off (by virtue of being a volatile memory). Optionally, a portion of flashcache can be partitioned and utilized as persistent logical flash disks. Thereafter, the flash disks can be used to store (not cache) the hot data and shield it from power risks. Grid disks can be carved out of the flash based cell disks which can be further assigned to an ASM disk group. The partitioning and assigning process is very similar to the physical disks partitioning.
Though the flash based grid disks defeat the purpose of cache, but it can be used as a reserve disks in case of highly intensive write operations on the disk where the existing configuration of the disk is not sufficient. Another points which discourages partitioning the disk is that the flash storage will be trimmed to half to respect the mirroring.
Flash cache working policies – WTC and WBC
There are two working mechanisms for flash cache – write through and write back. The exadata systems before X3 release worked with write through policy. It was during the announcement of X3 exadata systems in OOW 12, it was learnt that the flash cache will also support write back caching mechanism too.
Write Through Cache (Read/Write)
The older mechanism was pretty direct and straight. The data is written directly to the disk without the intervention of flash in any ways. The acknowledgment is sent back to the database by CELLSRV via iDB. Thereafter, if the data block qualifies the caching criteria – it is written to flash as well.
While reading the data blocks, the cellsrv maintains a hash lookup to map the data blocks agains the target destination i.e. flash or disk. If flash is hit, requested data is sent to the database. If cache miss, the data is read from the disk, and once again it is validated against the caching criteria. If the block qualifies to be “hot”, the block is retained in the flash cache.
What is the caching criteria? An IO comes with two metadata values – 1) the
CELL_FLASH_CACHE parameter setting defined at the segment or partition level 2) the CACHE hint associated by the database.
The metadata contains the
CELL_FLASH_CACHE parameter setting for the object. Based on its value, the data block can be cached. DEFAULT means the smart flash cache has the authority to decide whether to cache it or not. KEEP means the smart flash cache must cache the data on priority. NONE means the data block caching is not required. A huge object with DEFAULT setting will not be cached. The KEEP and DEFAULT have different retention policies. Also, the upper ceiling limit for KEEP cached blocks is 80% of the total flash cache size. In addition, the unused KEEP cached blocks are flushed off from the cache if they fail the aging criteria.
The database adds another caching hint based on purpose of the IO. It can be CACHE, NOCACHE or EVICT. The first two are pretty straight and direct. The third one EVICT hints that the specific block has to be flushed out of cache.
Write Back Cache (Read/Write)
With Exadata x3 announcement, the smart flash cache adopts WBC mechanism to speed up the read as well as write operations with backward compatibility support. This implies that WBC feature can be enabled on earlier exadata systems (X2 or V2) as well by upgrading the storage cell software version and db version. The WBC feature is supported for cell storage version is 126.96.36.199 onwards and db version 188.8.131.52 BP 9 onwards. By default, it is disabled.
The flash cache can directly service the write operations on the database. During first time inserts, a block written to the flashcache is marked as “dirty” to signify the latest copy of the block. If the database requests to update a block which doesn’t resides in flash, it is pulled from the disk into the cache, updated and marked as “dirty”. If the database requests for a block, it is read directly from the cache, thus reducing the heavy IO operations. A block written in flash and frequently accessed can stay upto years in the flash. However, if the block is rarely accessed, only its primary copy can be retained in the flash. The rest of the data can be copied back to the disk.
Steps to enable the flash in Write Back mode –
CellCli> drop flashcache
CellCLI> alter cell shutdown services cellsrv
CellCLI> alter cell flashCacheMode = WriteBack
CellCLI> alter cell startup services cellsrv
CellCLI> create flashcache all
The Write Back Cache mode can be reverted back to the Write Through Cache mode by manually flushing all the dirty blocks back to the disk
CellCLI> alter flashcache all flush
CellCLI> drop flashcache
CellCLI> alter cell shutdown services cellsrv
CellCLI> alter cell flashCacheMode=Writethrough
CellCLI> alter cell startup services cellsrv
As I am finishing the blog post, I am realizing that I have scope of publishing one more post detailing on Write Back Cache. I shall be back with more details on Write Back Cache working and some hands on