diff --git a/Documentation/guides/index.rst b/Documentation/guides/index.rst
index 86bb8d25f65..40c261921f9 100644
--- a/Documentation/guides/index.rst
+++ b/Documentation/guides/index.rst
@@ -49,3 +49,4 @@ Guides
   integrate_newlib.rst
   protected_build.rst
   platform_directories.rst
+  port_drivers_to_stm32f7.rst
diff --git a/Documentation/guides/port_drivers_to_stm32f7.rst b/Documentation/guides/port_drivers_to_stm32f7.rst
new file mode 100644
index 00000000000..aa6ec9c2d8a
--- /dev/null
+++ b/Documentation/guides/port_drivers_to_stm32f7.rst
@@ -0,0 +1,442 @@
+===============================
+Porting Drivers to the STM32 F7
+===============================
+
+.. warning:: 
+    Migrated from: 
+    https://cwiki.apache.org/confluence/display/NUTTX/Porting+Drivers+to+the+STM32+F7
+
+Problem Statement
+=================
+
+I recently completed a port to the STMicro STM32F746G Discovery board. 
+That MCU is clearly a derivative of the STM32 F3/F4 and many peripherals 
+are, in fact, essentially identical to the STM32F429. The biggest 
+difference is that the STM32F746 sports a Cortex-M7 which includes 
+several improvements over the Cortex-M4 and including, most relevant 
+to this discussion, a fully integrated data cache (`D-Cache`).
+
+Because of this one difference, I chose to provide the STM32 F7 code its 
+own directories separate from the STM32 F1, F2, F3, and F4.
+
+Porting Simple Drivers
+======================
+
+Some of the STM32 F4 drivers can be used with the STM32 F7 can be ported 
+very simply; many ports would just be a matter of copying files and some 
+search-and-replacement. Like:
+
+* Compare the two register definitions files; make sure that the STM32 
+  F4 peripheral is identical (or nearly identical) to the F7 peripheral. 
+  If so then,
+* Copy the register definition file from the ``stm32/hardware`` to the 
+  ``stm32f7/hardware`` directory, making name changes as appropriate and 
+  updating any minor register differences.
+* Copy the corresponding C file (and possibly a ``.h`` file) from the 
+  ``stm32/`` directory to the ``stm32f7/`` directory, again making any naming 
+  changes and modifications for any register differences.
+* Update the ``Make.defs`` file to include the new C file in the build.
+
+Porting Complex Drivers
+=======================
+
+The Cortex-M7 D-Cache, however, does raise issues with the compatibility 
+of most complex STM32 F4 and F7 drivers. Even though the peripheral 
+registers may be essentially the same between the STM32F429 and the 
+the STM32F746, many drivers for the STM32F429 will not be directly 
+compatible with the STM32F746, particularly drivers that use DMA. 
+And that includes most complex STM32 drivers!
+
+Cache Coherency
+===============
+
+With DMA, physical RAM memory contents is accessed directly by peripheral 
+hardware without intervention from the CPU. The CPU itself deals only the 
+indirectly with RAM through the D-Cache: When you read data from RAM, it 
+is first loaded in the D-Cache then accessed by the CPU. If the RAM 
+contents is already in the D-Cache, then physical RAM is not accessed 
+at all! Similarly, when you write data into RAM (with write buffering 
+enabled), it may actually not be written to physical RAM but may just 
+remain in the D-Cache in a `dirty` cache line until that cache line is 
+flushed to memory. Thus, there may be inconsistencies in the contents 
+of the D-Cache and in the contents of contents of physical RAM due 
+related to DMA. Such issues are referred to as `Cache Coherency` problems.
+
+DMA
+===
+
+DMA Read Accesses
+-----------------
+
+A DMA read access occurs when we program DMA hardware to read data 
+from a peripheral and store that data into RAM. This happens, for 
+example, when we read a packet from the network, when we read a 
+serial byte of data from a UART, when we read a block from an 
+MMC/SD card, and so on.
+
+In this case, the DMA hardware will change the contents of physical 
+RAM without knowledge of the CPU. So if that same memory that was 
+modified by the DMA read operation is also in the D-Cache, then 
+the contents of the D-Cache will no longer be valid; it will no 
+longer match the physical contents of the memory. In order to fix 
+this, the Cortex-M7 supports a special `cache operation` that can be 
+used to `invalidate` the D-Cache contents associate with the read DMA 
+buffer address range. Invalidation simply means discarding the 
+currently cached D-Cache lines so that they will be refetched 
+from physical RAM. **Rule 1a**: Always invalidate RX DMA buffers 
+sometime before or after starting the read DMA but certainly `before` 
+accessing the read buffer data. **Rule 1b**: Never read from the read 
+DMA buffer before the read DMA buffer completes, or otherwise you 
+will re-cache the DMA buffer content.
+
+`What if the D-Cache line is also dirty? What if we have writes to 
+the DMA buffer that were never flushed to physical RAM?` Those writes 
+will then never make it to physical memory if the D-Cache is 
+invalidated. **Rule 2**: Never write to read DMA buffer memory! 
+**Rule 3**: Make sure that all DMA read buffers are aligned to the 
+D-Cache line size so that there are no spill-over cache effects 
+at the boarders of the invalidated cache line.
+
+DMA Write Accesses
+------------------
+
+A DMA write access occurs when we program DMA hardware to write data from 
+RAM into a peripheral. This happen for example, when we send a packet on 
+a network or when we write a block of data to an MMC/SD card. In this, 
+the hardware expects the correct data to be in physical RAM when write 
+DMA is performed. If not then, the wrong data will be sent.
+
+We assure that we do not have pending writes in a `dirty` cache line by 
+`cleaning` (or `flushing`) the `dirty` cache lines; i.e., for forcing any 
+pending writes in the D-Cache lines to be written to physical RAM. 
+**Rule 4**: Always `clean` (or `flush`) the D-Cache to force all data to 
+be written from the D-Cache into physical RAM.
+
+`What if you had two adjacent DMA buffers side-by-side? Couldn't the 
+cleaning of the write buffer force writing into the adjacent read 
+buffer?`` Yes! **Rule 5**: Make sure that all DMA write buffers are 
+aligned to the D-Cache line size so that there are no spill-over 
+cache effects at the borders of the cleaned cache line.
+
+Write-back vs. Write-through D-Cache
+------------------------------------
+
+The Cortex-M7 supports both `write-back` and `write-through` data cache 
+configurations. The write-back D-Cache works just as described above: 
+`dirty` cache lines are not written to physical memory until the cache 
+line is flushed. But write-through D-Cache works just as without the 
+D-Cache. Writes always go directly to physical RAM.
+
+`If I am using a write-through D-Cache, can't I just forget about 
+cleaning the D-Cache?` No, because you don't know how a user is going 
+to configuration the D-Cache. **Rule 6**: Always assume that `write-back` 
+caching is being performed; otherwise, your driver will not be portable.
+
+You may notice in ``/arch/arm/src/armv7-m/cache.h``:
+
+.. code-block:: c
+
+    #if defined(CONFIG_ARMV7M_DCACHE) && !defined(CONFIG_ARMV7M_DCACHE_WRITETHROUGH)
+    void arch_clean_dcache(uintptr_t start, uintptr_t end);
+    #else
+    #  define arch_clean_dcache(s,e)
+    #endif
+
+NOTE: I have experienced other cases (on the SAMV7) where write buffering 
+`must` be disabled: In one case, a certain peripheral used 16-byte DMA 
+descriptors in an array. Clearly it is impossible to manage the 
+caching of the 16-byte DMA descriptors with a 32-byte cache line in 
+this case: I think that the only option is to disabled the write buffer.
+
+And what if the driver receives arbitrarily aligned buffers from the 
+application? Then what? Should write buffering be disabled in that 
+case too? And what is the performance cost for disabling the write 
+buffer?
+
+
+DMA Module
+----------
+
+Some STM32 F7 peripherals have built in DMA. The STM32 F7 Ethernet 
+driver discussed below is a good example of such a peripheral with 
+built in DMA capability. Most STM32 F7 peripherals, however, have 
+no built-in DMA capability and, instead, must use a common STM32 
+F7 DMA module to perform DMA data transfers. The interfaces to that 
+common DMA module are described in ``arch/arm/src/stm32f7/stm32_dma.h``.
+
+The DMA modules `does not do any cache operations`. Rather, the client 
+of the DMA module must perform the cache operations. Here are the 
+basic rules:
+
+* TX DMA Transfers. Before calling ``stm32_dmastart()`` to start an TX 
+  transfer, the DMA client must clean the DMA buffer so that the 
+  content to be DMA'ed is present in physical memory.
+* RX DMA transfers. At the completion of all DMAs, the DMA client 
+  will receive a callback providing the final status of the DMA 
+  transfer. For the case of RX DMA completion callbacks, logic in 
+  the callback handler should invalidate the RX buffer before any 
+  attempt is made to access new RX buffer content.
+
+Converting an STM32F429 Driver for the STM32F746
+================================================
+
+Since the STM32 F7 is so similar to the STM32 F4, we have a wealth 
+of working drivers to port from. Only a little effort is required. 
+Below is a summary of the kinds of things that you would have to do 
+to convert an STM32F429 driver to the STM32F746.
+
+An Example
+----------
+
+There is a good example in the STM32 Ethernet driver. The STM32 F7 
+Ethernet driver (``arch/arm/src/stm32f7/stm32_ethernet.c``) derives 
+directly from the STM32 F4 Ethernet driver 
+(``arch/arm/src/stm32/stm32_eth.c``). These two Ethernet MAC peripherals 
+are nearly identical. Only changes that are a direct consequence of the 
+STM32 F7 D-Cache were required to make the driver work on the STM32 F7. 
+Those changes are summarized below.
+
+Reorganize DMA Data Structure
+-----------------------------
+
+The STM32 Ethernet driver has four different kinds DMA buffers:
+
+* RX DMA descriptor,
+* TX DMA descriptors,
+* RX packet buffers, and
+* TX packet buffers,
+
+In the STM32F429 driver, these are simply implemented as part of the 
+driver data structure:
+
+.. code-block:: c
+
+    struct stm32_ethmac_s
+    {
+        ...
+        /* Descriptor allocations */
+        
+        struct eth_rxdesc_s rxtable[CONFIG_STM32_ETH_NRXDESC];
+        struct eth_txdesc_s txtable[CONFIG_STM32_ETH_NTXDESC];
+        
+        /* Buffer allocations */
+        
+        uint8_t rxbuffer[CONFIG_STM32_ETH_NRXDESC*CONFIG_STM32_ETH_BUFSIZE];
+        uint8_t alloc[STM32_ETH_NFREEBUFFERS*CONFIG_STM32_ETH_BUFSIZE];
+    };
+
+There are potentially three problems with this: (1) We don't know what 
+kind of memory the data structure will be defined in. What if it is 
+DTCM memory? Then the DMAs will fail. (2) We don't know the alignment 
+of the DMA buffers. They must be aligned on D-Cache line boundaries. 
+(3a) The size of RX or TX descriptor is either 16- or 32-bytes. In 
+order to individually clean or invalidate the cache line, they must 
+be sized in multiples of the cache line size and (3b) the same applies 
+to the DMA buffers.
+
+To fix this, several things were done:
+
+* The buffer allocations were moved from the device structure into 
+  separate declarations that can have attributes.
+* One attribute that could be added would be a section name to assure 
+  that the structures are linked into DMA-able memory (via definitions 
+  in the linker script).
+* Another attribute is that we can force the alignment of the structure 
+  to the D-Cache line size.
+
+The following definitions were added to support aligning the sizes of 
+the buffers to the Cortex-M7 D-Cache line size:
+
+.. code-block:: c
+
+    /* Buffers use fro DMA access must begin on an address aligned with the
+   * D-Cache line and must be an even multiple of the D-Cache line size.
+   * These size/alignment requirements are necessary so that D-Cache flush
+   * and invalidate operations will not have any additional effects.
+   *
+   * The TX and RX descriptors are normally 16 bytes in size but could be
+   * 32 bytes in size if the enhanced descriptor format is used (it is not).
+   */
+    
+    #define DMA_BUFFER_MASK    (ARMV7M_DCACHE_LINESIZE - 1)
+    #define DMA_ALIGN_UP(n)    (((n) + DMA_BUFFER_MASK) & ~DMA_BUFFER_MASK)
+    #define DMA_ALIGN_DOWN(n)  ((n) & ~DMA_BUFFER_MASK)
+    
+    #ifndef CONFIG_STM32F7_ETH_ENHANCEDDESC
+    #  define RXDESC_SIZE       16
+    #  define TXDESC_SIZE       16
+    #else
+    #  define RXDESC_SIZE       32
+    #  define TXDESC_SIZE       32
+    #endif
+    
+    #define RXDESC_PADSIZE      DMA_ALIGN_UP(RXDESC_SIZE)
+    #define TXDESC_PADSIZE      DMA_ALIGN_UP(TXDESC_SIZE)
+    #define ALIGNED_BUFSIZE     DMA_ALIGN_UP(ETH_BUFSIZE)
+    
+    #define RXTABLE_SIZE        (STM32F7_NETHERNET * CONFIG_STM32F7_ETH_NRXDESC)
+    #define TXTABLE_SIZE        (STM32F7_NETHERNET * CONFIG_STM32F7_ETH_NTXDESC)
+    
+    #define RXBUFFER_SIZE       (CONFIG_STM32F7_ETH_NRXDESC * ALIGNED_BUFSIZE)
+    #define RXBUFFER_ALLOC      (STM32F7_NETHERNET * RXBUFFER_SIZE)
+    
+    #define TXBUFFER_SIZE       (STM32_ETH_NFREEBUFFERS * ALIGNED_BUFSIZE)
+    #define TXBUFFER_ALLOC      (STM32F7_NETHERNET * TXBUFFER_SIZE)
+
+The RX and TX descriptor types are replace with a union type 
+that assures that the allocations will be aligned in size:
+
+.. code-block:: c
+
+    /* This union type forces the allocated size of RX descriptors to be the
+    * padded to a exact multiple of the Cortex-M7 D-Cache line size.
+    */
+     
+    union stm32_txdesc_u
+    {
+      uint8_t             pad[TXDESC_PADSIZE];
+      struct eth_txdesc_s txdesc;
+    };
+     
+    union stm32_rxdesc_u
+    {
+      uint8_t             pad[RXDESC_PADSIZE];
+      struct eth_rxdesc_s rxdesc;
+    };
+
+Then, finally, the new buffers are defined by the following globals:
+
+.. code-block:: c
+
+    /* DMA buffers.  DMA buffers must:
+    *
+    * 1. Be a multiple of the D-Cache line size.  This requirement is assured
+    *    by the definition of RXDMA buffer size above.
+    * 2. Be aligned a D-Cache line boundaries, and
+    * 3. Be positioned in DMA-able memory (*NOT* DTCM memory).  This must
+    *    be managed by logic in the linker script file.
+    *
+    * These DMA buffers are defined sequentially here to best assure optimal
+    * packing of the buffers.
+    */
+    
+    /* Descriptor allocations */
+    
+    static union stm32_rxdesc_u g_rxtable[RXTABLE_SIZE]
+    __attribute__((aligned(ARMV7M_DCACHE_LINESIZE)));
+    static union stm32_txdesc_u g_txtable[TXTABLE_SIZE]
+    __attribute__((aligned(ARMV7M_DCACHE_LINESIZE)));
+    
+    /* Buffer allocations */
+    
+    static uint8_t g_rxbuffer[RXBUFFER_ALLOC]
+    __attribute__((aligned(ARMV7M_DCACHE_LINESIZE)));
+    static uint8_t g_txbuffer[TXBUFFER_ALLOC]
+    __attribute__((aligned(ARMV7M_DCACHE_LINESIZE)));
+
+This does, of course, force additional changes to the functions 
+that initialize the buffer chains, but I will leave that to the 
+interested reader to discover.
+
+Add Cache Operations
+--------------------
+
+The Cortex-M7 cache operations are available the following file is included:
+
+
+.. code-block:: c
+
+    #include "cache.h"
+
+Here is an example where the RX descriptors are invalidated:
+
+.. code-block:: c
+
+    static int stm32_recvframe(struct stm32_ethmac_s *priv)
+    {
+    ...
+    /* Scan descriptors owned by the CPU.  */
+    
+    rxdesc = priv->rxhead;
+    
+    /* Forces the first RX descriptor to be re-read from physical memory */
+    
+    arch_invalidate_dcache((uintptr_t)rxdesc,
+                            (uintptr_t)rxdesc + sizeof(struct eth_rxdesc_s));
+    
+    for (i = 0;
+        (rxdesc->rdes0 & ETH_RDES0_OWN) == 0 &&
+            i < CONFIG_STM32F7_ETH_NRXDESC &&
+            priv->inflight < CONFIG_STM32F7_ETH_NTXDESC;
+        i++)
+        {
+        ...
+        /* Try the next descriptor */
+    
+        rxdesc = (struct eth_rxdesc_s *)rxdesc->rdes3;
+    
+        /* Force the next RX descriptor to be re-read from physical memory */
+    
+        arch_invalidate_dcache((uintptr_t)rxdesc,
+                                (uintptr_t)rxdesc + sizeof(struct eth_rxdesc_s));
+        }
+    ...
+    }
+
+Here is an example where a TX descriptor is cleaned:
+
+.. code-block:: c
+
+    static int stm32_transmit(struct stm32_ethmac_s *priv)
+    {
+    ...
+            /* Give the descriptor to DMA */
+    
+            txdesc->tdes0 |= ETH_TDES0_OWN;
+    
+            /* Flush the contents of the modified TX descriptor into physical
+            * memory.
+            */
+    
+            arch_clean_dcache((uintptr_t)txdesc,
+                                (uintptr_t)txdesc + sizeof(struct eth_txdesc_s));
+    ...
+    }
+
+Here is where the read buffer is invalidated just after 
+completed a read DMA:
+
+.. code-block:: c
+
+    static int stm32_recvframe(struct stm32_ethmac_s *priv)
+    {
+    ...
+        /* Force the completed RX DMA buffer to be re-read from
+        * physical memory.
+        */
+    
+        arch_invalidate_dcache((uintptr_t)dev->d_buf,
+                            (uintptr_t)dev->d_buf + dev->d_len);
+    
+        nllvdbg("rxhead: %p d_buf: %p d_len: %d\n",
+                priv->rxhead, dev->d_buf, dev->d_len);
+    
+        /* Return success*/
+    
+        return OK;
+    ...
+    }
+
+Here is where the write buffer in clean prior to starting a write DMA:
+
+.. code-block:: c
+
+    static int stm32_transmit(struct stm32_ethmac_s *priv)
+    {
+    ...
+    /* Flush the contents of the TX buffer into physical memory */
+    
+    arch_clean_dcache((uintptr_t)priv->dev.d_buf,
+                        (uintptr_t)priv->dev.d_buf + priv->dev.d_len);
+    ...
+    }
\ No newline at end of file