From 919609d3cba52431b16501a63e6a5d45266a25c6 Mon Sep 17 00:00:00 2001 From: Roland Dreier Date: Wed, 20 Jul 2011 06:22:21 -0700 Subject: intel-iommu: Fix AB-BA lockdep report commit 3e7abe2556b583e87dabda3e0e6178a67b20d06f upstream. When unbinding a device so that I could pass it through to a KVM VM, I got the lockdep report below. It looks like a legitimate lock ordering problem: - domain_context_mapping_one() takes iommu->lock and calls iommu_support_dev_iotlb(), which takes device_domain_lock (inside iommu->lock). - domain_remove_one_dev_info() starts by taking device_domain_lock then takes iommu->lock inside it (near the end of the function). So this is the classic AB-BA deadlock. It looks like a safe fix is to simply release device_domain_lock a bit earlier, since as far as I can tell, it doesn't protect any of the stuff accessed at the end of domain_remove_one_dev_info() anyway. BTW, the use of device_domain_lock looks a bit unsafe to me... it's at least not obvious to me why we aren't vulnerable to the race below: iommu_support_dev_iotlb() domain_remove_dev_info() lock device_domain_lock find info unlock device_domain_lock lock device_domain_lock find same info unlock device_domain_lock free_devinfo_mem(info) do stuff with info after it's free However I don't understand the locking here well enough to know if this is a real problem, let alone what the best fix is. Anyway here's the full lockdep output that prompted all of this: ======================================================= [ INFO: possible circular locking dependency detected ] 2.6.39.1+ #1 ------------------------------------------------------- bash/13954 is trying to acquire lock: (&(&iommu->lock)->rlock){......}, at: [] domain_remove_one_dev_info+0x121/0x230 but task is already holding lock: (device_domain_lock){-.-...}, at: [] domain_remove_one_dev_info+0x208/0x230 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (device_domain_lock){-.-...}: [] lock_acquire+0x9d/0x130 [] _raw_spin_lock_irqsave+0x55/0xa0 [] domain_context_mapping_one+0x600/0x750 [] domain_context_mapping+0x3f/0x120 [] iommu_prepare_identity_map+0x1c5/0x1e0 [] intel_iommu_init+0x88e/0xb5e [] pci_iommu_init+0x16/0x41 [] do_one_initcall+0x45/0x190 [] kernel_init+0xe3/0x168 [] kernel_thread_helper+0x4/0x10 -> #0 (&(&iommu->lock)->rlock){......}: [] __lock_acquire+0x195e/0x1e10 [] lock_acquire+0x9d/0x130 [] _raw_spin_lock_irqsave+0x55/0xa0 [] domain_remove_one_dev_info+0x121/0x230 [] device_notifier+0x72/0x90 [] notifier_call_chain+0x8c/0xc0 [] __blocking_notifier_call_chain+0x78/0xb0 [] blocking_notifier_call_chain+0x16/0x20 [] __device_release_driver+0xbc/0xe0 [] device_release_driver+0x2f/0x50 [] driver_unbind+0xa3/0xc0 [] drv_attr_store+0x2c/0x30 [] sysfs_write_file+0xe6/0x170 [] vfs_write+0xce/0x190 [] sys_write+0x54/0xa0 [] system_call_fastpath+0x16/0x1b other info that might help us debug this: 6 locks held by bash/13954: #0: (&buffer->mutex){+.+.+.}, at: [] sysfs_write_file+0x44/0x170 #1: (s_active#3){++++.+}, at: [] sysfs_write_file+0xcd/0x170 #2: (&__lockdep_no_validate__){+.+.+.}, at: [] driver_unbind+0x9b/0xc0 #3: (&__lockdep_no_validate__){+.+.+.}, at: [] device_release_driver+0x27/0x50 #4: (&(&priv->bus_notifier)->rwsem){.+.+.+}, at: [] __blocking_notifier_call_chain+0x5f/0xb0 #5: (device_domain_lock){-.-...}, at: [] domain_remove_one_dev_info+0x208/0x230 stack backtrace: Pid: 13954, comm: bash Not tainted 2.6.39.1+ #1 Call Trace: [] print_circular_bug+0xf7/0x100 [] __lock_acquire+0x195e/0x1e10 [] ? trace_hardirqs_off+0xd/0x10 [] ? trace_hardirqs_on_caller+0x13d/0x180 [] lock_acquire+0x9d/0x130 [] ? domain_remove_one_dev_info+0x121/0x230 [] _raw_spin_lock_irqsave+0x55/0xa0 [] ? domain_remove_one_dev_info+0x121/0x230 [] ? trace_hardirqs_off+0xd/0x10 [] domain_remove_one_dev_info+0x121/0x230 [] device_notifier+0x72/0x90 [] notifier_call_chain+0x8c/0xc0 [] __blocking_notifier_call_chain+0x78/0xb0 [] blocking_notifier_call_chain+0x16/0x20 [] __device_release_driver+0xbc/0xe0 [] device_release_driver+0x2f/0x50 [] driver_unbind+0xa3/0xc0 [] drv_attr_store+0x2c/0x30 [] sysfs_write_file+0xe6/0x170 [] vfs_write+0xce/0x190 [] sys_write+0x54/0xa0 [] system_call_fastpath+0x16/0x1b Signed-off-by: Roland Dreier Signed-off-by: David Woodhouse Cc: Steven Rostedt Signed-off-by: Greg Kroah-Hartman --- drivers/pci/intel-iommu.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'drivers/pci/intel-iommu.c') diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c index 0ec8930..0b4dbcd 100644 --- a/drivers/pci/intel-iommu.c +++ b/drivers/pci/intel-iommu.c @@ -3573,6 +3573,8 @@ static void domain_remove_one_dev_info(struct dmar_domain *domain, found = 1; } + spin_unlock_irqrestore(&device_domain_lock, flags); + if (found == 0) { unsigned long tmp_flags; spin_lock_irqsave(&domain->iommu_lock, tmp_flags); @@ -3589,8 +3591,6 @@ static void domain_remove_one_dev_info(struct dmar_domain *domain, spin_unlock_irqrestore(&iommu->lock, tmp_flags); } } - - spin_unlock_irqrestore(&device_domain_lock, flags); } static void vm_domain_remove_all_dev_info(struct dmar_domain *domain) -- cgit v1.1 From 5b8692bf7ca5beed9bb75cf97b47dc1d3e5691e2 Mon Sep 17 00:00:00 2001 From: "Woodhouse, David" Date: Wed, 19 Dec 2012 13:25:35 +0000 Subject: intel-iommu: Free old page tables before creating superpage commit 6491d4d02893d9787ba67279595990217177b351 upstream. The dma_pte_free_pagetable() function will only free a page table page if it is asked to free the *entire* 2MiB range that it covers. So if a page table page was used for one or more small mappings, it's likely to end up still present in the page tables... but with no valid PTEs. This was fine when we'd only be repopulating it with 4KiB PTEs anyway but the same virtual address range can end up being reused for a *large-page* mapping. And in that case were were trying to insert the large page into the second-level page table, and getting a complaint from the sanity check in __domain_mapping() because there was already a corresponding entry. This was *relatively* harmless; it led to a memory leak of the old page table page, but no other ill-effects. Fix it by calling dma_pte_clear_range (hopefully redundant) and dma_pte_free_pagetable() before setting up the new large page. Signed-off-by: David Woodhouse Tested-by: Ravi Murty Tested-by: Sudeep Dutt Signed-off-by: Linus Torvalds Signed-off-by: CAI Qian Signed-off-by: Greg Kroah-Hartman --- drivers/pci/intel-iommu.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) (limited to 'drivers/pci/intel-iommu.c') diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c index 0b4dbcd..3f9a891 100644 --- a/drivers/pci/intel-iommu.c +++ b/drivers/pci/intel-iommu.c @@ -1793,10 +1793,17 @@ static int __domain_mapping(struct dmar_domain *domain, unsigned long iov_pfn, if (!pte) return -ENOMEM; /* It is large page*/ - if (largepage_lvl > 1) + if (largepage_lvl > 1) { pteval |= DMA_PTE_LARGE_PAGE; - else + /* Ensure that old small page tables are removed to make room + for superpage, if they exist. */ + dma_pte_clear_range(domain, iov_pfn, + iov_pfn + lvl_to_nr_pages(largepage_lvl) - 1); + dma_pte_free_pagetable(domain, iov_pfn, + iov_pfn + lvl_to_nr_pages(largepage_lvl) - 1); + } else { pteval &= ~(uint64_t)DMA_PTE_LARGE_PAGE; + } } /* We don't need lock here, nobody else -- cgit v1.1 From e16037a0ab5939064c95635f44e0753ecc520516 Mon Sep 17 00:00:00 2001 From: Tom Mingarelli Date: Tue, 20 Nov 2012 19:43:17 +0000 Subject: intel-iommu: Prevent devices with RMRRs from being placed into SI Domain commit ea2447f700cab264019b52e2b417d689e052dcfd upstream. This patch is to prevent non-USB devices that have RMRRs associated with them from being placed into the SI Domain during init. This fixes the issue where the RMRR info for devices being placed in and out of the SI Domain gets lost. Signed-off-by: Thomas Mingarelli Tested-by: Shuah Khan Reviewed-by: Donald Dutile Reviewed-by: Alex Williamson Signed-off-by: Joerg Roedel Signed-off-by: CAI Qian Signed-off-by: Greg Kroah-Hartman --- drivers/pci/intel-iommu.c | 31 +++++++++++++++++++++++++++++++ 1 file changed, 31 insertions(+) (limited to 'drivers/pci/intel-iommu.c') diff --git a/drivers/pci/intel-iommu.c b/drivers/pci/intel-iommu.c index 3f9a891..ae762ec 100644 --- a/drivers/pci/intel-iommu.c +++ b/drivers/pci/intel-iommu.c @@ -2289,8 +2289,39 @@ static int domain_add_dev_info(struct dmar_domain *domain, return 0; } +static bool device_has_rmrr(struct pci_dev *dev) +{ + struct dmar_rmrr_unit *rmrr; + int i; + + for_each_rmrr_units(rmrr) { + for (i = 0; i < rmrr->devices_cnt; i++) { + /* + * Return TRUE if this RMRR contains the device that + * is passed in. + */ + if (rmrr->devices[i] == dev) + return true; + } + } + return false; +} + static int iommu_should_identity_map(struct pci_dev *pdev, int startup) { + + /* + * We want to prevent any device associated with an RMRR from + * getting placed into the SI Domain. This is done because + * problems exist when devices are moved in and out of domains + * and their respective RMRR info is lost. We exempt USB devices + * from this process due to their usage of RMRRs that are known + * to not be needed after BIOS hand-off to OS. + */ + if (device_has_rmrr(pdev) && + (pdev->class >> 8) != PCI_CLASS_SERIAL_USB) + return 0; + if ((iommu_identity_mapping & IDENTMAP_AZALIA) && IS_AZALIA(pdev)) return 1; -- cgit v1.1