
Commit 0001740

johnhubbard authored and hnaz committed
mm/gup: factor out duplicate code from four routines
Patch series "mm/gup: prereqs to track dma-pinned pages: FOLL_PIN", v12.

Overview:

This is a prerequisite to solving the problem of proper interactions between file-backed pages, and [R]DMA activities, as discussed in [1], [2], [3], and in a remarkable number of email threads since about 2017. :)

A new internal gup flag, FOLL_PIN, is introduced, and thoroughly documented in the last patch's Documentation/vm/pin_user_pages.rst.

I believe that this will provide a good starting point for doing the layout lease work that Ira Weiny has been working on. That's because these new wrapper functions provide a clean, constrained, systematically named set of functionality that, again, is required in order to even know if a page is "dma-pinned".

In contrast to earlier approaches, the page tracking can be incrementally applied to the kernel call sites that, until now, have been simply calling get_user_pages() ("gup"). In other words, opt-in by changing from this:

    get_user_pages()    (sets FOLL_GET)
    put_page()

to this:

    pin_user_pages()    (sets FOLL_PIN)
    unpin_user_page()

Testing:

* I've done some overall kernel testing (LTP, and a few other goodies), and some directed testing to exercise some of the changes. And as you can see, gup_benchmark is enhanced to exercise this. Basically, I've been able to runtime test the core get_user_pages() and pin_user_pages() and related routines, but not so much on several of the call sites--but those are generally just a couple of lines changed, each.

Not much of the kernel is actually using this, which on one hand reduces risk quite a lot. But on the other hand, testing coverage is low. So I'd love it if, in particular, the Infiniband and PowerPC folks could do a smoke test of this series for me.

Runtime testing for the call sites so far is pretty light:

* io_uring: Some directed tests from liburing exercise this, and they pass.
* process_vm_access.c: A small directed test passes.
* gup_benchmark: the enhanced version hits the new gup.c code, and passes.
* infiniband: Ran rdma-core tests: rdma-core/build/bin/run_tests.py
* VFIO: compiles (I'm vowing to set up a run time test soon, but it's not ready just yet)
* powerpc: it compiles...
* drm/via: compiles...
* goldfish: compiles...
* net/xdp: compiles...
* media/v4l2: compiles...

[1] Some slow progress on get_user_pages() (Apr 2, 2019): https://lwn.net/Articles/784574/
[2] DMA and get_user_pages() (LPC: Dec 12, 2018): https://lwn.net/Articles/774411/
[3] The trouble with get_user_pages() (Apr 30, 2018): https://lwn.net/Articles/753027/

This patch (of 22):

There are four locations in gup.c that have a fair amount of code duplication. This means that changing one requires making the same changes in four places, not to mention reading the same code four times, and wondering if there are subtle differences.

Factor out the common code into static functions, thus reducing the overall line count and the code's complexity. Also, take the opportunity to slightly improve the efficiency of the error cases, by doing a mass subtraction of the refcount, surrounded by get_page()/put_page().

Also, further simplify (slightly), by waiting until the successful end of each routine to increment *nr.

Link: http://lkml.kernel.org/r/[email protected]
Signed-off-by: John Hubbard <[email protected]>
Reviewed-by: Christoph Hellwig <[email protected]>
Reviewed-by: Jérôme Glisse <[email protected]>
Reviewed-by: Jan Kara <[email protected]>
Cc: Kirill A. Shutemov <[email protected]>
Cc: Ira Weiny <[email protected]>
Cc: Christoph Hellwig <[email protected]>
Cc: Aneesh Kumar K.V <[email protected]>
Cc: Alex Williamson <[email protected]>
Cc: Björn Töpel <[email protected]>
Cc: Daniel Vetter <[email protected]>
Cc: Dan Williams <[email protected]>
Cc: Hans Verkuil <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jason Gunthorpe <[email protected]>
Cc: Jens Axboe <[email protected]>
Cc: Jonathan Corbet <[email protected]>
Cc: Leon Romanovsky <[email protected]>
Cc: Mauro Carvalho Chehab <[email protected]>
Cc: Mike Rapoport <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
1 parent 7be7621 commit 0001740

1 file changed

mm/gup.c

+40-55
@@ -1978,6 +1978,29 @@ static int __gup_device_huge_pud(pud_t pud, pud_t *pudp, unsigned long addr,
 }
 #endif
 
+static int record_subpages(struct page *page, unsigned long addr,
+			   unsigned long end, struct page **pages)
+{
+	int nr;
+
+	for (nr = 0; addr != end; addr += PAGE_SIZE)
+		pages[nr++] = page++;
+
+	return nr;
+}
+
+static void put_compound_head(struct page *page, int refs)
+{
+	VM_BUG_ON_PAGE(page_ref_count(page) < refs, page);
+	/*
+	 * Calling put_page() for each ref is unnecessarily slow. Only the last
+	 * ref needs a put_page().
+	 */
+	if (refs > 1)
+		page_ref_sub(page, refs - 1);
+	put_page(page);
+}
+
 #ifdef CONFIG_ARCH_HAS_HUGEPD
 static unsigned long hugepte_addr_end(unsigned long addr, unsigned long end,
 				      unsigned long sz)
@@ -2007,32 +2030,20 @@ static int gup_hugepte(pte_t *ptep, unsigned long sz, unsigned long addr,
 	/* hugepages are never "special" */
 	VM_BUG_ON(!pfn_valid(pte_pfn(pte)));
 
-	refs = 0;
 	head = pte_page(pte);
-
 	page = head + ((addr & (sz-1)) >> PAGE_SHIFT);
-	do {
-		VM_BUG_ON(compound_head(page) != head);
-		pages[*nr] = page;
-		(*nr)++;
-		page++;
-		refs++;
-	} while (addr += PAGE_SIZE, addr != end);
+	refs = record_subpages(page, addr, end, pages + *nr);
 
 	head = try_get_compound_head(head, refs);
-	if (!head) {
-		*nr -= refs;
+	if (!head)
 		return 0;
-	}
 
 	if (unlikely(pte_val(pte) != pte_val(*ptep))) {
-		/* Could be optimized better */
-		*nr -= refs;
-		while (refs--)
-			put_page(head);
+		put_compound_head(head, refs);
 		return 0;
 	}
 
+	*nr += refs;
 	SetPageReferenced(head);
 	return 1;
 }
@@ -2079,28 +2090,19 @@ static int gup_huge_pmd(pmd_t orig, pmd_t *pmdp, unsigned long addr,
 		return __gup_device_huge_pmd(orig, pmdp, addr, end, pages, nr);
 	}
 
-	refs = 0;
 	page = pmd_page(orig) + ((addr & ~PMD_MASK) >> PAGE_SHIFT);
-	do {
-		pages[*nr] = page;
-		(*nr)++;
-		page++;
-		refs++;
-	} while (addr += PAGE_SIZE, addr != end);
+	refs = record_subpages(page, addr, end, pages + *nr);
 
 	head = try_get_compound_head(pmd_page(orig), refs);
-	if (!head) {
-		*nr -= refs;
+	if (!head)
 		return 0;
-	}
 
 	if (unlikely(pmd_val(orig) != pmd_val(*pmdp))) {
-		*nr -= refs;
-		while (refs--)
-			put_page(head);
+		put_compound_head(head, refs);
 		return 0;
 	}
 
+	*nr += refs;
 	SetPageReferenced(head);
 	return 1;
 }
@@ -2120,28 +2122,19 @@ static int gup_huge_pud(pud_t orig, pud_t *pudp, unsigned long addr,
 		return __gup_device_huge_pud(orig, pudp, addr, end, pages, nr);
 	}
 
-	refs = 0;
 	page = pud_page(orig) + ((addr & ~PUD_MASK) >> PAGE_SHIFT);
-	do {
-		pages[*nr] = page;
-		(*nr)++;
-		page++;
-		refs++;
-	} while (addr += PAGE_SIZE, addr != end);
+	refs = record_subpages(page, addr, end, pages + *nr);
 
 	head = try_get_compound_head(pud_page(orig), refs);
-	if (!head) {
-		*nr -= refs;
+	if (!head)
 		return 0;
-	}
 
 	if (unlikely(pud_val(orig) != pud_val(*pudp))) {
-		*nr -= refs;
-		while (refs--)
-			put_page(head);
+		put_compound_head(head, refs);
 		return 0;
 	}
 
+	*nr += refs;
 	SetPageReferenced(head);
 	return 1;
 }
@@ -2157,28 +2150,20 @@ static int gup_huge_pgd(pgd_t orig, pgd_t *pgdp, unsigned long addr,
 		return 0;
 
 	BUILD_BUG_ON(pgd_devmap(orig));
-	refs = 0;
+
 	page = pgd_page(orig) + ((addr & ~PGDIR_MASK) >> PAGE_SHIFT);
-	do {
-		pages[*nr] = page;
-		(*nr)++;
-		page++;
-		refs++;
-	} while (addr += PAGE_SIZE, addr != end);
+	refs = record_subpages(page, addr, end, pages + *nr);
 
 	head = try_get_compound_head(pgd_page(orig), refs);
-	if (!head) {
-		*nr -= refs;
+	if (!head)
 		return 0;
-	}
 
 	if (unlikely(pgd_val(orig) != pgd_val(*pgdp))) {
-		*nr -= refs;
-		while (refs--)
-			put_page(head);
+		put_compound_head(head, refs);
 		return 0;
 	}
 
+	*nr += refs;
 	SetPageReferenced(head);
 	return 1;
 }
