Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libc++][modules] Rewrite the modulemap to have fewer top-level modules #107638

Merged
merged 6 commits into from
Sep 26, 2024

Conversation

ldionne
Copy link
Member

@ldionne ldionne commented Sep 6, 2024

This patch rewrites the modulemap to have fewer top-level modules. Previously, our modulemap had one top level module for each header in the library, including private headers. This had the well-known problem of making compilation times terrible, in addition to being somewhat against the design principles of Clang modules.

This patch provides almost an order of magnitude compilation time improvement when building modularized code (certainly subject to variations). For example, including without a module cache went from 22.4 seconds to 1.6 seconds, a 14x improvement.

To achieve this, one might be tempted to simply put all the headers in a single top-level module. Unfortunately, this doesn't work because libc++ provides C compatibility headers (e.g. stdlib.h) which create cycles when the C Standard Library headers are modularized too. This is especially tricky since base systems are usually not modularized: as far as I know, only Xcode 16 beta contains a modularized SDK that makes this issue visible. To understand it, imagine we have the following setup:

   // in libc++'s include/c++/v1/module.modulemap
   module std {
      header stddef.h
      header stdlib.h
   }

   // in the C library's include/module.modulemap
   module clib {
      header stddef.h
      header stdlib.h
   }

Now, imagine that the C library's <stdlib.h> includes <stddef.h>, perhaps as an implementation detail. When building the std module, libc++'s <stdlib.h> header does #include_next <stdlib.h> to get the C library's <stdlib.h>, so libc++ depends on the clib module.

However, remember that the C library's <stdlib.h> header includes <stddef.h> as an implementation detail. Since the header search paths for libc++ are (and must be) before the search paths for the C library, the C library ends up including libc++'s <stddef.h>, which means it depends on the std module. That's a cycle.

To solve this issue, this patch creates one top-level module for each C compatibility header. The rest of the libc++ headers are located in a single top-level std module, with two main exceptions. First, the module containing configuration headers (e.g. <__config>) has its own top-level module too, because those headers are included by the C compatibility headers.

Second, we create a top-level std_core module that contains several dependency-free utilities used (directly or indirectly) from the __math subdirectory. This is needed because __math pulls in a bunch of stuff, and __math is used from the C compatibility header <math.h>.

As a direct benefit of this change, we don't need to generate an artificial __std_clang_module header anymore to provide a monolithic std module, since our modulemap does it naturally by construction.

A next step after this change would be to look into whether math.h really needs to include the contents of __math, and if so, whether libc++'s math.h truly needs to include the C library's math.h header. Removing either dependency would break this annoying cycle.

Thanks to Eric Fiselier for pointing out this approach during a recent meeting. This wasn't viable before some recent refactoring, but wrapping everything (except the C headers) in a large module is by far the simplest and the most effective way of doing this.

Fixes #86193

@ldionne ldionne requested a review from a team as a code owner September 6, 2024 20:49
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Sep 6, 2024
@llvmbot
Copy link
Collaborator

llvmbot commented Sep 6, 2024

@llvm/pr-subscribers-libcxx

Author: Louis Dionne (ldionne)

Changes

This patch rewrites the modulemap to have fewer top-level modules. Previously, our modulemap had one top level module for each header in the library, including private headers. This had the well-known problem of making compilation times terrible, in addition to being somewhat against the design principles of Clang modules.

This patch provides almost an order of magnitude compilation time improvement when building modularized code (certainly subject to variations). For example, including <ccomplex> without a module cache went from 22.4 seconds to 1.6 seconds, a 14x improvement.

To achieve this, one might be tempted to simply put all the headers in a single top-level module. Unfortunately, this doesn't work because libc++ provides C compatibility headers (e.g. stdlib.h) which create cycles when the C Standard Library headers are modularized too. This is especially tricky since base systems are usually not modularized: as far as I know, only Xcode 16 beta contains a modularized SDK that makes this issue visible. To understand it, imagine we have the following setup:

   // in libc++'s include/c++/v1/module.modulemap
   module std {
      header stddef.h
      header stdlib.h
   }

   // in the C library's include/module.modulemap
   module clib {
      header stddef.h
      header stdlib.h
   }

Now, imagine that the C library's <stdlib.h> includes <stddef.h>, perhaps as an implementation detail. When building the std module, libc++'s <stdlib.h> header does #include_next &lt;stdlib.h&gt; to get the C library's <stdlib.h>, so libc++ depends on the clib module.

However, remember that the C library's <stdlib.h> header includes <stddef.h> as an implementation detail. Since the header search paths for libc++ are (and must be) before the search paths for the C library, the C library ends up including libc++'s <stddef.h>, which means it depends on the std module. That's a cycle.

To solve this issue, this patch creates one top-level module for each C compatibility header. The rest of the libc++ headers are located in a single top-level std module, with two main exceptions. First, the module containing configuration headers (e.g. <__config>) has its own top-level module too, because those headers are included by the C compatibility headers.

Second, we create a top-level std_core module that contains several dependency-free utilities used (directly or indirectly) from the __math subdirectory. This is needed because __math pulls in a bunch of stuff, and __math is used from the C compatibility header <math.h>.

As a direct benefit of this change, we don't need to generate an artificial __std_clang_module header anymore to provide a monolithic std module, since our modulemap does it naturally by construction.

A next step after this change would be to look into whether math.h really needs to include the contents of __math, and if so, whether libc++'s math.h truly needs to include the C library's math.h header. Removing either dependency would break this annoying cycle.

Thanks to Eric Fiselier for pointing out this approach during a recent meeting. This wasn't viable before some recent refactoring, but wrapping everything (except the C headers) in a large module is by far the simplest and the most effective way of doing this.


Patch is 168.75 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/107638.diff

6 Files Affected:

  • (modified) libcxx/include/CMakeLists.txt (-1)
  • (removed) libcxx/include/__std_clang_module (-215)
  • (modified) libcxx/include/module.modulemap (+1536-2038)
  • (modified) libcxx/test/libcxx/clang_modules_include.gen.py (+1-1)
  • (modified) libcxx/utils/CMakeLists.txt (-4)
  • (removed) libcxx/utils/generate_std_clang_module_header.py (-63)
diff --git a/libcxx/include/CMakeLists.txt b/libcxx/include/CMakeLists.txt
index a571832ab724d4..d98e05a64f5ec6 100644
--- a/libcxx/include/CMakeLists.txt
+++ b/libcxx/include/CMakeLists.txt
@@ -684,7 +684,6 @@ set(files
   __ranges/views.h
   __ranges/zip_view.h
   __split_buffer
-  __std_clang_module
   __std_mbstate_t.h
   __stop_token/atomic_unique_lock.h
   __stop_token/intrusive_list_view.h
diff --git a/libcxx/include/__std_clang_module b/libcxx/include/__std_clang_module
deleted file mode 100644
index 18d6ce6b46c1f6..00000000000000
--- a/libcxx/include/__std_clang_module
+++ /dev/null
@@ -1,215 +0,0 @@
-// -*- C++ -*-
-//===----------------------------------------------------------------------===//
-//
-// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
-// See https://llvm.org/LICENSE.txt for license information.
-// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-//
-//===----------------------------------------------------------------------===//
-
-// WARNING, this entire header is generated by
-// utils/generate_std_clang_module_header.py
-// DO NOT MODIFY!
-
-// This header should not be directly included, it's exclusively to import all
-// of the libc++ public clang modules for the `std` clang module to export. In
-// other words, it's to facilitate `@import std;` in Objective-C++ and `import std`
-// in Swift to expose all of the libc++ interfaces. This is generally not
-// recommended, however there are some clients that need to import all of libc++
-// without knowing what "all" is.
-#if !__building_module(std)
-#  error "Do not include this header directly, include individual headers instead"
-#endif
-
-#include <__config>
-
-#if !defined(_LIBCPP_HAS_NO_PRAGMA_SYSTEM_HEADER)
-#  pragma GCC system_header
-#endif
-
-#include <algorithm>
-#include <any>
-#include <array>
-#if !defined(_LIBCPP_HAS_NO_ATOMIC_HEADER)
-#  include <atomic>
-#endif
-#if !defined(_LIBCPP_HAS_NO_THREADS)
-#  include <barrier>
-#endif
-#include <bit>
-#include <bitset>
-#include <cassert>
-#include <ccomplex>
-#include <cctype>
-#include <cerrno>
-#include <cfenv>
-#include <cfloat>
-#include <charconv>
-#include <chrono>
-#include <cinttypes>
-#include <ciso646>
-#include <climits>
-#if !defined(_LIBCPP_HAS_NO_LOCALIZATION)
-#  include <clocale>
-#endif
-#include <cmath>
-#if !defined(_LIBCPP_HAS_NO_LOCALIZATION)
-#  include <codecvt>
-#endif
-#include <compare>
-#include <complex.h>
-#include <complex>
-#include <concepts>
-#include <condition_variable>
-#include <coroutine>
-#include <csetjmp>
-#include <csignal>
-#include <cstdarg>
-#include <cstdbool>
-#include <cstddef>
-#include <cstdint>
-#include <cstdio>
-#include <cstdlib>
-#include <cstring>
-#include <ctgmath>
-#include <ctime>
-#include <ctype.h>
-#include <cuchar>
-#if !defined(_LIBCPP_HAS_NO_WIDE_CHARACTERS)
-#  include <cwchar>
-#endif
-#if !defined(_LIBCPP_HAS_NO_WIDE_CHARACTERS)
-#  include <cwctype>
-#endif
-#include <deque>
-#include <errno.h>
-#include <exception>
-#include <execution>
-#include <expected>
-#include <experimental/iterator>
-#include <experimental/memory>
-#include <experimental/propagate_const>
-#include <experimental/simd>
-#include <experimental/type_traits>
-#include <experimental/utility>
-#include <fenv.h>
-#include <filesystem>
-#include <float.h>
-#include <format>
-#include <forward_list>
-#if !defined(_LIBCPP_HAS_NO_LOCALIZATION)
-#  include <fstream>
-#endif
-#include <functional>
-#if !defined(_LIBCPP_HAS_NO_THREADS)
-#  include <future>
-#endif
-#include <initializer_list>
-#include <inttypes.h>
-#if !defined(_LIBCPP_HAS_NO_LOCALIZATION)
-#  include <iomanip>
-#endif
-#if !defined(_LIBCPP_HAS_NO_LOCALIZATION)
-#  include <ios>
-#endif
-#include <iosfwd>
-#if !defined(_LIBCPP_HAS_NO_LOCALIZATION)
-#  include <iostream>
-#endif
-#if !defined(_LIBCPP_HAS_NO_LOCALIZATION)
-#  include <istream>
-#endif
-#include <iterator>
-#if !defined(_LIBCPP_HAS_NO_THREADS)
-#  include <latch>
-#endif
-#include <limits>
-#include <list>
-#if !defined(_LIBCPP_HAS_NO_LOCALIZATION)
-#  include <locale.h>
-#endif
-#if !defined(_LIBCPP_HAS_NO_LOCALIZATION)
-#  include <locale>
-#endif
-#include <map>
-#include <math.h>
-#include <mdspan>
-#include <memory>
-#include <memory_resource>
-#include <mutex>
-#include <new>
-#include <numbers>
-#include <numeric>
-#include <optional>
-#if !defined(_LIBCPP_HAS_NO_LOCALIZATION)
-#  include <ostream>
-#endif
-#include <print>
-#include <queue>
-#include <random>
-#include <ranges>
-#include <ratio>
-#if !defined(_LIBCPP_HAS_NO_LOCALIZATION)
-#  include <regex>
-#endif
-#include <scoped_allocator>
-#if !defined(_LIBCPP_HAS_NO_THREADS)
-#  include <semaphore>
-#endif
-#include <set>
-#if !defined(_LIBCPP_HAS_NO_THREADS)
-#  include <shared_mutex>
-#endif
-#include <source_location>
-#include <span>
-#if !defined(_LIBCPP_HAS_NO_LOCALIZATION)
-#  include <sstream>
-#endif
-#include <stack>
-#if !defined(_LIBCPP_HAS_NO_ATOMIC_HEADER)
-#  include <stdatomic.h>
-#endif
-#include <stdbool.h>
-#include <stddef.h>
-#include <stdexcept>
-#include <stdint.h>
-#include <stdio.h>
-#include <stdlib.h>
-#if !defined(_LIBCPP_HAS_NO_THREADS)
-#  include <stop_token>
-#endif
-#if !defined(_LIBCPP_HAS_NO_LOCALIZATION)
-#  include <streambuf>
-#endif
-#include <string.h>
-#include <string>
-#include <string_view>
-#if !defined(_LIBCPP_HAS_NO_LOCALIZATION)
-#  include <strstream>
-#endif
-#if !defined(_LIBCPP_HAS_NO_LOCALIZATION)
-#  include <syncstream>
-#endif
-#include <system_error>
-#include <tgmath.h>
-#if !defined(_LIBCPP_HAS_NO_THREADS)
-#  include <thread>
-#endif
-#include <tuple>
-#include <type_traits>
-#include <typeindex>
-#include <typeinfo>
-#include <uchar.h>
-#include <unordered_map>
-#include <unordered_set>
-#include <utility>
-#include <valarray>
-#include <variant>
-#include <vector>
-#include <version>
-#if !defined(_LIBCPP_HAS_NO_WIDE_CHARACTERS)
-#  include <wchar.h>
-#endif
-#if !defined(_LIBCPP_HAS_NO_WIDE_CHARACTERS)
-#  include <wctype.h>
-#endif
diff --git a/libcxx/include/module.modulemap b/libcxx/include/module.modulemap
index 65df579b8d6dd7..277c8fd965690c 100644
--- a/libcxx/include/module.modulemap
+++ b/libcxx/include/module.modulemap
@@ -1,2111 +1,1609 @@
-// Main C++ standard library interfaces
-module std_algorithm [system] {
-  header "algorithm"
-  export *
-}
-module std_any [system] {
-  header "any"
-  export *
-}
-module std_array [system] {
-  header "array"
-  export *
-}
-module std_atomic [system] {
-  header "atomic"
-  export *
-}
-module std_barrier [system] {
-  header "barrier"
-  export *
-}
-module std_bit [system] {
-  header "bit"
-  export *
-}
-module std_bitset [system] {
-  header "bitset"
-  export *
-}
-module std_charconv [system] {
-  header "charconv"
-  module chars_format            { header "__charconv/chars_format.h" }
-  module from_chars_integral     { header "__charconv/from_chars_integral.h" }
-  module from_chars_result       { header "__charconv/from_chars_result.h" }
-  module tables                  { header "__charconv/tables.h" }
-  module to_chars                { header "__charconv/to_chars.h" }
-  module to_chars_base_10        { header "__charconv/to_chars_base_10.h" }
-  module to_chars_floating_point { header "__charconv/to_chars_floating_point.h" }
-  module to_chars_integral       { header "__charconv/to_chars_integral.h" }
-  module to_chars_result         { header "__charconv/to_chars_result.h" }
-  module traits                  { header "__charconv/traits.h" }
-  export *
-}
-module std_chrono [system] {
-  header "chrono"
-  export *
-}
-module std_codecvt [system] {
-  header "codecvt"
-  export *
-}
-module std_compare [system] {
-  header "compare"
-  export *
-}
-module std_complex [system] {
-  header "complex"
-  export *
-}
-module std_concepts [system] {
-  header "concepts"
-  export *
-}
-module std_condition_variable [system] {
-  header "condition_variable"
-  module condition_variable { header "__condition_variable/condition_variable.h" }
-  export *
-}
-module std_coroutine [system] {
-  header "coroutine"
-  module coroutine_handle      { header "__coroutine/coroutine_handle.h" }
-  module coroutine_traits      { header "__coroutine/coroutine_traits.h" }
-  module noop_coroutine_handle { header "__coroutine/noop_coroutine_handle.h" }
-  module trivial_awaitables    { header "__coroutine/trivial_awaitables.h" }
-  export *
-}
-module std_deque [system] {
-  header "deque"
-  export *
-}
-module std_exception [system] {
-  header "exception"
-  export *
-}
-module std_execution [system] {
-  header "execution"
+// This module contains headers related to the configuration of the library. These headers
+// are free of any dependency on the rest of libc++.
+module std_config [system] {
+  textual header "__config"
+  textual header "__configuration/abi.h"
+  textual header "__configuration/availability.h"
+  textual header "__configuration/compiler.h"
+  textual header "__configuration/language.h"
+  textual header "__configuration/platform.h"
+  textual header "version"
   export *
 }
-module std_expected [system] {
-  header "expected"
+
+module std_core [system] {
+  module cstddef {
+    header "__cstddef/byte.h"
+    header "__cstddef/max_align_t.h"
+    header "__cstddef/nullptr_t.h"
+    header "__cstddef/ptrdiff_t.h"
+    header "__cstddef/size_t.h"
+    export *
+  }
+
+  module cstdint {
+    header "cstdint"
+    export *
+  }
+
+  module fwd {
+    header "__fwd/byte.h"
+    header "__fwd/functional.h"
+    header "__fwd/pair.h"
+    header "__fwd/tuple.h"
+    export *
+  }
+
+  module limits {
+    header "limits"
+    export *
+  }
+
+  module math {
+    header "__math/abs.h"
+    header "__math/copysign.h"
+    header "__math/error_functions.h"
+    header "__math/exponential_functions.h"
+    header "__math/fdim.h"
+    header "__math/fma.h"
+    header "__math/gamma.h"
+    header "__math/hyperbolic_functions.h"
+    header "__math/hypot.h"
+    header "__math/inverse_hyperbolic_functions.h"
+    header "__math/inverse_trigonometric_functions.h"
+    header "__math/logarithms.h"
+    header "__math/min_max.h"
+    header "__math/modulo.h"
+    header "__math/remainder.h"
+    header "__math/roots.h"
+    header "__math/rounding_functions.h"
+    header "__math/special_functions.h"
+    header "__math/traits.h"
+    header "__math/trigonometric_functions.h"
+    export *
+  }
+
+  module type_traits {
+    header "type_traits"
+    header "__type_traits/add_const.h"
+    header "__type_traits/add_cv.h"
+    header "__type_traits/add_lvalue_reference.h"
+    header "__type_traits/add_pointer.h"
+    header "__type_traits/add_rvalue_reference.h"
+    header "__type_traits/add_volatile.h"
+    header "__type_traits/aligned_storage.h"
+    header "__type_traits/aligned_union.h"
+    header "__type_traits/alignment_of.h"
+    header "__type_traits/can_extract_key.h"
+    header "__type_traits/common_reference.h"
+    header "__type_traits/common_type.h"
+    header "__type_traits/conditional.h"
+    header "__type_traits/conjunction.h"
+    header "__type_traits/copy_cv.h"
+    header "__type_traits/copy_cvref.h"
+    header "__type_traits/datasizeof.h"
+    header "__type_traits/decay.h"
+    header "__type_traits/dependent_type.h"
+    header "__type_traits/desugars_to.h"
+    header "__type_traits/disjunction.h"
+    header "__type_traits/enable_if.h"
+    header "__type_traits/extent.h"
+    header "__type_traits/has_unique_object_representation.h"
+    header "__type_traits/has_virtual_destructor.h"
+    header "__type_traits/integral_constant.h"
+    header "__type_traits/invoke.h"
+    header "__type_traits/is_abstract.h"
+    header "__type_traits/is_aggregate.h"
+    header "__type_traits/is_allocator.h"
+    header "__type_traits/is_always_bitcastable.h"
+    header "__type_traits/is_arithmetic.h"
+    header "__type_traits/is_array.h"
+    header "__type_traits/is_assignable.h"
+    header "__type_traits/is_base_of.h"
+    header "__type_traits/is_bounded_array.h"
+    header "__type_traits/is_callable.h"
+    header "__type_traits/is_char_like_type.h"
+    header "__type_traits/is_class.h"
+    header "__type_traits/is_compound.h"
+    header "__type_traits/is_const.h"
+    header "__type_traits/is_constant_evaluated.h"
+    header "__type_traits/is_constructible.h"
+    header "__type_traits/is_convertible.h"
+    header "__type_traits/is_core_convertible.h"
+    header "__type_traits/is_destructible.h"
+    header "__type_traits/is_empty.h"
+    header "__type_traits/is_enum.h"
+    header "__type_traits/is_equality_comparable.h"
+    header "__type_traits/is_execution_policy.h"
+    header "__type_traits/is_final.h"
+    header "__type_traits/is_floating_point.h"
+    header "__type_traits/is_function.h"
+    header "__type_traits/is_fundamental.h"
+    header "__type_traits/is_implicitly_default_constructible.h"
+    header "__type_traits/is_integral.h"
+    header "__type_traits/is_literal_type.h"
+    header "__type_traits/is_member_pointer.h"
+    header "__type_traits/is_nothrow_assignable.h"
+    header "__type_traits/is_nothrow_constructible.h"
+    header "__type_traits/is_nothrow_convertible.h"
+    header "__type_traits/is_nothrow_destructible.h"
+    header "__type_traits/is_null_pointer.h"
+    header "__type_traits/is_object.h"
+    header "__type_traits/is_pod.h"
+    header "__type_traits/is_pointer.h"
+    header "__type_traits/is_polymorphic.h"
+    header "__type_traits/is_primary_template.h"
+    header "__type_traits/is_reference_wrapper.h"
+    header "__type_traits/is_reference.h"
+    header "__type_traits/is_referenceable.h"
+    header "__type_traits/is_same.h"
+    header "__type_traits/is_scalar.h"
+    header "__type_traits/is_signed_integer.h"
+    header "__type_traits/is_signed.h"
+    header "__type_traits/is_specialization.h"
+    header "__type_traits/is_standard_layout.h"
+    header "__type_traits/is_swappable.h"
+    header "__type_traits/is_trivial.h"
+    header "__type_traits/is_trivially_assignable.h"
+    header "__type_traits/is_trivially_constructible.h"
+    header "__type_traits/is_trivially_copyable.h"
+    header "__type_traits/is_trivially_destructible.h"
+    header "__type_traits/is_trivially_lexicographically_comparable.h"
+    header "__type_traits/is_trivially_relocatable.h"
+    header "__type_traits/is_unbounded_array.h"
+    header "__type_traits/is_union.h"
+    header "__type_traits/is_unsigned_integer.h"
+    header "__type_traits/is_unsigned.h"
+    header "__type_traits/is_valid_expansion.h"
+    header "__type_traits/is_void.h"
+    header "__type_traits/is_volatile.h"
+    header "__type_traits/lazy.h"
+    header "__type_traits/make_32_64_or_128_bit.h"
+    header "__type_traits/make_const_lvalue_ref.h"
+    header "__type_traits/make_signed.h"
+    header "__type_traits/make_unsigned.h"
+    header "__type_traits/maybe_const.h"
+    header "__type_traits/nat.h"
+    header "__type_traits/negation.h"
+    header "__type_traits/promote.h"
+    header "__type_traits/rank.h"
+    header "__type_traits/remove_all_extents.h"
+    header "__type_traits/remove_const_ref.h"
+    header "__type_traits/remove_const.h"
+    header "__type_traits/remove_cv.h"
+    header "__type_traits/remove_cvref.h"
+    header "__type_traits/remove_extent.h"
+    header "__type_traits/remove_pointer.h"
+    header "__type_traits/remove_reference.h"
+    header "__type_traits/remove_volatile.h"
+    header "__type_traits/result_of.h"
+    header "__type_traits/strip_signature.h"
+    header "__type_traits/type_identity.h"
+    header "__type_traits/type_list.h"
+    header "__type_traits/underlying_type.h"
+    header "__type_traits/unwrap_ref.h"
+    header "__type_traits/void_t.h"
+    export *
+  } // module type_traits
+
+  // Only the truly dependency-free parts of __utility are here
+  module utility {
+    header "__utility/declval.h"
+    header "__utility/forward.h"
+    export *
+  }
+
   export *
-}
-module std_filesystem [system] {
-  header "filesystem"
-  module copy_options                 { header "__filesystem/copy_options.h" }
-  module directory_entry              { header "__filesystem/directory_entry.h" }
-  module directory_iterator           { header "__filesystem/directory_iterator.h" }
-  module directory_options            { header "__filesystem/directory_options.h" }
-  module file_status                  { header "__filesystem/file_status.h" }
-  module file_time_type               { header "__filesystem/file_time_type.h" }
-  module file_type                    { header "__filesystem/file_type.h" }
-  module filesystem_error             {
-    header "__filesystem/filesystem_error.h"
-    export std_private_memory_shared_ptr
+} // module std_core
+
+module std [system] {
+  module algorithm {
+    header "algorithm"
+    header "__algorithm/adjacent_find.h"
+    header "__algorithm/all_of.h"
+    header "__algorithm/any_of.h"
+    header "__algorithm/binary_search.h"
+    header "__algorithm/clamp.h"
+    header "__algorithm/comp_ref_type.h"
+    header "__algorithm/comp.h"
+    header "__algorithm/copy_backward.h"
+    header "__algorithm/copy_if.h"
+    header "__algorithm/copy_move_common.h"
+    header "__algorithm/copy_n.h"
+    header "__algorithm/copy.h"
+    header "__algorithm/count_if.h"
+    header "__algorithm/count.h"
+    header "__algorithm/equal_range.h"
+    header "__algorithm/equal.h"
+    header "__algorithm/fill_n.h"
+    header "__algorithm/fill.h"
+    header "__algorithm/find_end.h"
+    header "__algorithm/find_first_of.h"
+    header "__algorithm/find_if_not.h"
+    header "__algorithm/find_if.h"
+    header "__algorithm/find_segment_if.h"
+    header "__algorithm/find.h"
+    header "__algorithm/fold.h"
+    header "__algorithm/for_each_n.h"
+    header "__algorithm/for_each_segment.h"
+    header "__algorithm/for_each.h"
+    header "__algorithm/generate_n.h"
+    header "__algorithm/generate.h"
+    header "__algorithm/half_positive.h"
+    header "__algorithm/in_found_result.h"
+    header "__algorithm/in_fun_result.h"
+    header "__algorithm/in_in_out_result.h"
+    header "__algorithm/in_in_result.h"
+    header "__algorithm/in_out_out_result.h"
+    header "__algorithm/in_out_result.h"
+    header "__algorithm/includes.h"
+    header "__algorithm/inplace_merge.h"
+    header "__algorithm/is_heap_until.h"
+    header "__algorithm/is_heap.h"
+    header "__algorithm/is_partitioned.h"
+    header "__algorithm/is_permutation.h"
+    header "__algorithm/is_sorted_until.h"
+    header "__algorithm/is_sorted.h"
+    header "__algorithm/iter_swap.h"
+    header "__algorithm/iterator_operations.h"
+    header "__algorithm/lexicographical_compare_three_way.h"
+    header "__algorithm/lexicographical_compare.h"
+    header "__algorithm/lower_bound.h"
+    header "__algorithm/make_heap.h"
+    header "__algorithm/make_projected.h"
+    header "__algorithm/max_element.h"
+    header "__algorithm/max.h"
+    header "__algorithm/merge.h"
+    header "__algorithm/min_element.h"
+    header "__algorithm/min_max_result.h"
+    header "__algorithm/min.h"
+    header "__algorithm/minmax_element.h"
+    header "__algorithm/minmax.h"
+    header "__algorithm/mismatch.h"
+    header "__algorithm/move_backward.h"
+    header "__algorithm/move.h"
+    header "__algorithm/next_permutation.h"
+    header "__algorithm/none_of.h"
+    header "__algorithm/nth_element.h"
+    header "__algorithm/partial_sort_copy.h"
+    header "__algorithm/partial_sort.h"
+    header "__algorithm/partition_copy.h"
+    header "__algorithm/partition_point.h"
+    header "__algorithm/partition.h"
+    header "__algorithm/pop_heap.h"
+    header "__algorithm/prev_permutation.h"
+    header "__algorithm/pstl.h"
+    header "__algorithm/push_heap.h"
+    header "__algorithm/ranges_adjacent_find.h"
+    header "__algorithm/ranges_all_of.h"
+    header "__algorithm/ranges_any_of.h"
+    header "__algorithm/ranges_binary_search.h"
+    header "__algorithm/ranges_clamp.h"
+    header "__algorithm/ranges_contains_subrange.h"
+    header "__algorithm/ranges_contains.h"
+    header "__algorithm/ranges_copy_backward.h"
+    header "__algorithm/ranges_copy_if.h"
+    header "__algorithm/ranges_copy_n.h"
+    header "__algorithm/ranges_copy.h"
+    header "__algorithm/ranges_count_if.h"
+    header "__algorithm/ranges_count.h"
+    header "__algorithm/ranges_ends_with.h"
+    header "__algorith...
[truncated]

@ldionne
Copy link
Member Author

ldionne commented Sep 6, 2024

CC interested folks: @vgvassilev @EricWF @ian-twilightcoder

Copy link

github-actions bot commented Sep 6, 2024

⚠️ Python code formatter, darker found issues in your code. ⚠️

You can test this locally with the following command:
darker --check --diff -r 1c1bb7749860b4265c002528cbfe4b6c623b934c...e02cd479df633907a68fe5505210d4035a50a265 libcxx/test/libcxx/clang_modules_include.gen.py
View the diff from darker here.
--- clang_modules_include.gen.py	2024-09-26 13:41:47.000000 +0000
+++ clang_modules_include.gen.py	2024-09-26 13:44:53.862469 +0000
@@ -18,11 +18,12 @@
 import sys
 sys.path.append(sys.argv[1])
 from libcxx.header_information import lit_header_restrictions, public_headers
 
 for header in public_headers:
-  print(f"""\
+    print(
+        f"""\
 //--- {header}.compile.pass.cpp
 // RUN: %{{cxx}} %s %{{flags}} %{{compile_flags}} -fmodules -fcxx-modules -fmodules-cache-path=%t -fsyntax-only
 
 // GCC doesn't support -fcxx-modules
 // UNSUPPORTED: gcc
@@ -41,11 +42,12 @@
 // UNSUPPORTED: LIBCXX-PICOLIBC-FIXME
 
 {lit_header_restrictions.get(header, '')}
 
 #include <{header}>
-""")
+"""
+    )
 
 print(
     f"""\
 //--- import_std.compile.pass.mm
 // RUN: %{{cxx}} %s %{{flags}} %{{compile_flags}} -fmodules -fcxx-modules -fmodules-cache-path=%t -fsyntax-only

@ldionne ldionne force-pushed the review/modularization-top-level-std branch from 40c47c5 to 6b17118 Compare September 6, 2024 20:57
export iterator.aliasing_iterator
export iterator.iterator_traits
export type_traits.is_trivially_copyable
export utility.pair
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've run into weird issues when I do export specific_thing and export * at the same time. In theory the specific exports are redundant and already included by *, but in practice it's caused some random behavior where not everything gets exported and it doesn't even seem related to the specific exports. If we're need to export * then we do already get everything, and so we should remove or at least comment out the specific exports.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re-exports are useful when two header files are dependent on each other and we do not know which one is included first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but export * will export everything the module imported, and export whatever doesn't do anything if you didn't import whatever. So if you have export *, that's all you need, and in fact adding specific modules will mess clang up.

Copy link
Contributor

@ian-twilightcoder ian-twilightcoder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow this is great! It was what I tried to do initially but I wasn't intimate enough with the libc++ headers to be able to do all the cleanup you did to make this possible. Thank you!

Requesting changes for the submodule thing.

}

module math {
header "__math/abs.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could use umbrella headers if we are enumerating all headers in a subdirectory. That will make the modulemap future proof and we won’t have to change it on removing or adding new files.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't even need a header, you can use umbrella directories. At least until you need to add an export to a particular submodule.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually you could probably use an umbrella directory and then spell out the submodules that need exports.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umbrella directory is what I meant indeed. I forgot there is also an umbrella headers in apple’s ecosystem.

libcxx/include/module.modulemap Outdated Show resolved Hide resolved
libcxx/include/module.modulemap Outdated Show resolved Hide resolved
// This module contains headers related to the configuration of the library. These headers
// are free of any dependency on the rest of libc++.
module std_config [system] {
textual header "__config"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think any of these should actually be textual. None of them care about the preprocessor environment of the includer do they?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think they are. Some macros in __config are intended to pick up definitions that could be in the file that includes it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do those have to come from the includers or could they be set on the command line? The problem with __config being textual is that its declarations are going to be copied into every pcm that sees it. That can confuse Swift because it'll see the same declarations with different names since it uses the module name as part of the declaration name. It's probably ok since nobody's going to write anything declared in __config in code, but if we could change these to not be textual that would be ideal

Copy link
Contributor

@vgvassilev vgvassilev Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some files cannot be properly in pcms and require textual inclusion. A notable example is the cassert which needs to be expanded in every translation unit according to the current PP state. I believe these header files have similar constraints. Clang sees multiple declarations but it is usually able to merge them into a redeclaration chain. You are right that we should avoid it because longer redeclaration chains result in inefficiency but sometimes we just can’t.

Copy link
Contributor

@ian-twilightcoder ian-twilightcoder Sep 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell these don't have the same constraint that assert does. I don't think they need to support being include multiple times with different defines, they look like they can just take a single set of defines on the command line.

Type merging basically doesn't work for Swift. In Swift the module name is part of the type name, so even if e.g. int32_t has the exact same definition in modules A and B, Swift treats A.int32_t and B.int32_t as separate and incompatible types. Say what you will about that, but it's the state we're in and so we need to be very careful that declarations don't get put into multiple modules via textual headers.

libcxx/include/module.modulemap Outdated Show resolved Hide resolved
libcxx/include/module.modulemap Outdated Show resolved Hide resolved
} // module type_traits

// Only the truly dependency-free parts of __utility are here
module utility_core {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why utility_core? It's already std_core.utility, does it really need to be std_core.utility_core?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was @philnik777 's comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When updating the module map you (or at least I) usually just grep for the header you want to add an extra include to. It's not at all obvious that it's not actually the utility you want without the _core at the end.

Copy link
Contributor

@ian-twilightcoder ian-twilightcoder Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess? Still results in a weird module name and you could grep for "utility" instead. It's not the weirdest one ever, however the rest of you want it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not exactly concerned about the module name, since you never actually use it anywhere AFAIK.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could write @import std_core.utility.declval; if you wanted to in ObjC++, or similar from Swift with C++ interop enabled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But then again, you aren't supposed to include/import any of the implementation headers directly so maybe it matters even less. Still strikes me as weirdly against the pattern for one very particular use of grep. That said, I still don't feel that strongly about it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could perhaps rename all these internal modules to start with __ to make it clear that they should not be imported directly from user code?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would be a good idea for the submodules. The top level modules need to start with std_ but we could keep with the std_private_ name that we've been using?

}

module math {
header "__math/abs.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually you could probably use an umbrella directory and then spell out the submodules that need exports.


module std [system] {
module algorithm {
module adjacent_find { header "__algorithm/adjacent_find.h" }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should all of these be explicit? Otherwise including algorithm will include all of these headers even if algorithm itself doesn't.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that would make sense. However, I would suggest tackling that as a follow-up. I finally got to the point where I am able to mostly succeed the test suite after putting everything in submodules (which broke a ton of stuff that was missing includes & more). The same kind of thing seems to be happening with explicit where it's a lot stricter than without explicit.

Since almost all the benefits of this change don't require using explicit, I'd like to move forward without explicit and I can tackle that as a follow-up change.

@ldionne ldionne force-pushed the review/modularization-top-level-std branch from 3c26eb2 to 051869a Compare September 12, 2024 19:51
Copy link

github-actions bot commented Sep 12, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

export *
}

module fwd {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we should prefix top-level modules that don't expose user-facing modules with __?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could keep std_private_ like we currently have

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I believe Swift relies on the name starting with std_)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only thing that Swift requires is a top-level module named std which imports everything.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should verify that with @egorzhdan but I'm pretty sure Swift needs to find all of the libc++ modules and not just std

Copy link
Contributor

@egorzhdan egorzhdan Sep 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe Swift relies on the name starting with std_

@ian-twilightcoder this is correct. Swift relies on the name of the top-level module to start with std. We can certainly change this in Swift, I'm just wondering if there is rationale for this change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well we haven't changed anything yet, right now all the top level modules are still std_, we were just exploring our options for top level module names. So I think std_private_ is the path of least resistance.

@ldionne ldionne force-pushed the review/modularization-top-level-std branch from 051869a to 4bc2d89 Compare September 13, 2024 13:19
libcxx/include/module.modulemap Show resolved Hide resolved
libcxx/include/module.modulemap Outdated Show resolved Hide resolved
libcxx/include/module.modulemap Outdated Show resolved Hide resolved
libcxx/include/module.modulemap Outdated Show resolved Hide resolved
libcxx/include/module.modulemap Outdated Show resolved Hide resolved
libcxx/include/module.modulemap Outdated Show resolved Hide resolved
@ldionne ldionne force-pushed the review/modularization-top-level-std branch 5 times, most recently from ecca9ca to b71b975 Compare September 16, 2024 19:06
@ldionne
Copy link
Member Author

ldionne commented Sep 16, 2024

Okay folks, the CI should be passing on this version and all the pre-requisite patches should have landed already.

I know there are some things we can still improve with this version of the modulemap, however unless there are actual blocking issues, I would favor merging this once the CI is green because this patch has a high merge conflict potential. I think we should land this ASAP and then keep improving the modulemap in follow-ups.

@ldionne ldionne force-pushed the review/modularization-top-level-std branch from b71b975 to 29ec842 Compare September 17, 2024 12:13
@vgvassilev
Copy link
Contributor

@ldionne, thank you for your efforts. This is a step in the direction where we would like to be. Do you plan further simplification of the modulemap and switching to even less modules?

@ldionne
Copy link
Member Author

ldionne commented Sep 17, 2024

@ldionne, thank you for your efforts. This is a step in the direction where we would like to be. Do you plan further simplification of the modulemap and switching to even less modules?

If we can, yes. I'd like to investigate umbrella headers to simplify the maintenance of the modulemap, but if you have other suggestions for simplifications and improvements I'd be all ears since I don't know what else we could do.

@ldionne
Copy link
Member Author

ldionne commented Sep 17, 2024

@ldionne, thank you for your efforts. This is a step in the direction where we would like to be. Do you plan further simplification of the modulemap and switching to even less modules?

If we can, yes. I'd like to investigate umbrella headers to simplify the maintenance of the modulemap, but if you have other suggestions for simplifications and improvements I'd be all ears since I don't know what else we could do.

I think umbrella headers would help us reduce the size of the modulemap and help with removal/addition of header files.

I agree.

Another super useful thing would be to introduce a CI check which essentially enforces self-contained header files -- every header file should be able to compile on its own. For instance, cat header.h header.h | gcc -fsyntax-only -xc++ -. This command concatenates header.h twice before compiling it to make sure it has proper include protectors and compiles on its own. Once our codebase becomes compliant to this simple rules we can do pretty much what we want in the modulemap.

We already have basically that: libcxx/test/libcxx/clang_modules_include.gen.py

Are you thinking about something slightly different?

@ian-twilightcoder
Copy link
Contributor

@ldionne, thank you for your efforts. This is a step in the direction where we would like to be. Do you plan further simplification of the modulemap and switching to even less modules?

If we can, yes. I'd like to investigate umbrella headers to simplify the maintenance of the modulemap, but if you have other suggestions for simplifications and improvements I'd be all ears since I don't know what else we could do.

I think umbrella headers would help us reduce the size of the modulemap and help with removal/addition of header files.

I agree.

You should be able to use umbrella directories for most of these without having to add a bunch of headers that have to be generated/maintained. The only ones where you can't are where the directory is split between different modules like __utility

@vgvassilev
Copy link
Contributor

@ldionne, thank you for your efforts. This is a step in the direction where we would like to be. Do you plan further simplification of the modulemap and switching to even less modules?

If we can, yes. I'd like to investigate umbrella headers to simplify the maintenance of the modulemap, but if you have other suggestions for simplifications and improvements I'd be all ears since I don't know what else we could do.

I think umbrella headers would help us reduce the size of the modulemap and help with removal/addition of header files.

I agree.

Another super useful thing would be to introduce a CI check which essentially enforces self-contained header files -- every header file should be able to compile on its own. For instance, cat header.h header.h | gcc -fsyntax-only -xc++ -. This command concatenates header.h twice before compiling it to make sure it has proper include protectors and compiles on its own. Once our codebase becomes compliant to this simple rules we can do pretty much what we want in the modulemap.

We already have basically that: libcxx/test/libcxx/clang_modules_include.gen.py

Are you thinking about something slightly different?

I think I am proposing a complementary check which may be a superset of what clang_modules_include.gen.py. It's independent on if we have modules or not and it should pass on every header file (private included) of libcxx.

This patch rewrites the modulemap to have fewer top-level modules.
Previously, our modulemap had one top level module for each header
in the library, including private headers. This had the well-known
problem of making compilation times terrible, in addition to being
somewhat against the design principles of Clang modules.

This patch provides almost an order of magnitude compilation time
improvement when building modularized code (certainly subject to
variations). For example, including <ccomplex> without a module
cache went from 22.4 seconds to 1.6 seconds, a 14x improvement.

To achieve this, one might be tempted to simply put all the headers in
a single top-level module. Unfortunately, this doesn't work because
libc++ provides C compatibility headers (e.g. stdlib.h) which create
cycles when the C Standard Library headers are modularized too. This
is especially tricky since base systems are usually not modularized:
as far as I know, only Xcode 16 beta contains a modularized SDK that
makes this issue visible. To understand it, imagine we have the
following setup:

   // in libc++'s include/c++/v1/module.modulemap
   module std {
      header stddef.h
      header stdlib.h
   }

   // in the C library's include/module.modulemap
   module clib {
      header stddef.h
      header stdlib.h
   }

Now, imagine that the C library's <stdlib.h> includes <stddef.h>,
perhaps as an implementation detail. When building the `std` module,
libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the
C library's <stdlib.h>, so libc++ depends on the `clib` module.

However, remember that the C library's <stdlib.h> header includes
<stddef.h> as an implementation detail. Since the header search paths
for libc++ are (and must be) before the search paths for the C library,
the C library ends up including libc++'s <stddef.h>, which means it
depends on the `std` module. That's a cycle.

To solve this issue, this patch creates one top-level module for each
C compatibility header. The rest of the libc++ headers are located in
a single top-level `std` module, with two main exceptions. First, the
module containing configuration headers (e.g. <__config>) has its own
top-level module too, because those headers are included by the C
compatibility headers.

Second, we create a top-level std_core module that contains several
dependency-free utilities used (directly or indirectly) from the __math
subdirectory. This is needed because __math pulls in a bunch of stuff,
and __math is used from the C compatibility header <math.h>.

As a direct benefit of this change, we don't need to generate an
artificial __std_clang_module header anymore to provide a monolithic
`std` module, since our modulemap does it naturally by construction.

A next step after this change would be to look into whether math.h
really needs to include the contents of __math, and if so, whether
libc++'s math.h truly needs to include the C library's math.h header.
Removing either dependency would break this annoying cycle.

Thanks to Eric Fiselier for pointing out this approach during a recent
meeting. This wasn't viable before some recent refactoring, but wrapping
everything (except the C headers) in a large module is by far the simplest
and the most effective way of doing this.
@ldionne ldionne force-pushed the review/modularization-top-level-std branch from 985b3f1 to 7e0718a Compare September 25, 2024 16:46
@ldionne
Copy link
Member Author

ldionne commented Sep 26, 2024

Okay folks, so the CI is back online and it looks like it's going to pass after my last few tweaks. I will merge this momentarily.

@ldionne ldionne merged commit bc6bd3b into llvm:main Sep 26, 2024
59 of 62 checks passed
@ldionne ldionne deleted the review/modularization-top-level-std branch September 26, 2024 17:19
augusto2112 pushed a commit to augusto2112/llvm-project that referenced this pull request Sep 26, 2024
…es (llvm#107638)

This patch rewrites the modulemap to have fewer top-level modules.
Previously, our modulemap had one top level module for each header in
the library, including private headers. This had the well-known problem
of making compilation times terrible, in addition to being somewhat
against the design principles of Clang modules.

This patch provides almost an order of magnitude compilation time
improvement when building modularized code (certainly subject to
variations). For example, including <ccomplex> without a module cache
went from 22.4 seconds to 1.6 seconds, a 14x improvement.

To achieve this, one might be tempted to simply put all the headers in a
single top-level module. Unfortunately, this doesn't work because libc++
provides C compatibility headers (e.g. stdlib.h) which create cycles
when the C Standard Library headers are modularized too. This is
especially tricky since base systems are usually not modularized: as far
as I know, only Xcode 16 beta contains a modularized SDK that makes this
issue visible. To understand it, imagine we have the following setup:

   // in libc++'s include/c++/v1/module.modulemap
   module std {
      header stddef.h
      header stdlib.h
   }

   // in the C library's include/module.modulemap
   module clib {
      header stddef.h
      header stdlib.h
   }

Now, imagine that the C library's <stdlib.h> includes <stddef.h>,
perhaps as an implementation detail. When building the `std` module,
libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C
library's <stdlib.h>, so libc++ depends on the `clib` module.

However, remember that the C library's <stdlib.h> header includes
<stddef.h> as an implementation detail. Since the header search paths
for libc++ are (and must be) before the search paths for the C library,
the C library ends up including libc++'s <stddef.h>, which means it
depends on the `std` module. That's a cycle.

To solve this issue, this patch creates one top-level module for each C
compatibility header. The rest of the libc++ headers are located in a
single top-level `std` module, with two main exceptions. First, the
module containing configuration headers (e.g. <__config>) has its own
top-level module too, because those headers are included by the C
compatibility headers.

Second, we create a top-level std_core module that contains several
dependency-free utilities used (directly or indirectly) from the __math
subdirectory. This is needed because __math pulls in a bunch of stuff,
and __math is used from the C compatibility header <math.h>.

As a direct benefit of this change, we don't need to generate an
artificial __std_clang_module header anymore to provide a monolithic
`std` module, since our modulemap does it naturally by construction.

A next step after this change would be to look into whether math.h
really needs to include the contents of __math, and if so, whether
libc++'s math.h truly needs to include the C library's math.h header.
Removing either dependency would break this annoying cycle.

Thanks to Eric Fiselier for pointing out this approach during a recent
meeting. This wasn't viable before some recent refactoring, but wrapping
everything (except the C headers) in a large module is by far the
simplest and the most effective way of doing this.

Fixes llvm#86193
@hekota
Copy link
Member

hekota commented Sep 27, 2024

Hello @ldionne, me and @tex3d are seeing PR build failures in libc++ since yesterday. Could this be related to your change?

#110187
#110079

FAIL: llvm-libc++-shared.cfg.in :: std/thread/futures/futures.shared_future/get.pass.cpp (7633 of 9745)
******************** TEST 'llvm-libc++-shared.cfg.in :: std/thread/futures/futures.shared_future/get.pass.cpp' FAILED ********************
# .---command stderr------------
--
  | # \| In module 'std' imported from /var/lib/buildkite-agent/builds/linux-56-59b8f5d88-f4kc6-1/llvm-project/github-pull-requests/libcxx/test/std/thread/futures/futures.shared_future/get.pass.cpp:20:
  | # \| /var/lib/buildkite-agent/builds/linux-56-59b8f5d88-f4kc6-1/llvm-project/github-pull-requests/build-runtimes/libcxx/test-suite-install/include/c++/v1/__chrono/duration.h:90:101: error: no type named 'type' in 'std::common_type<long double, long double, long>'
  | # \|    90 \|     typedef typename common_type<typename _ToDuration::rep, typename _FromDuration::rep, intmax_t>::type _Ct;
  | # \|       \|             ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~
  | # \| /var/lib/buildkite-agent/builds/linux-56-59b8f5d88-f4kc6-1/llvm-project/github-pull-requests/build-runtimes/libcxx/test-suite-install/include/c++/v1/__chrono/duration.h:107:10: note: in instantiation of member function 'std::chrono::__duration_cast<std::chrono::duration<long double>, std::chrono::duration<long double, std::ratio<1, 1000>>>::operator()' requested here
  | # \|   107 \|   return __duration_cast<duration<_Rep, _Period>, _ToDuration>()(__fd);
  | # \|       \|          ^
  | # \| /var/lib/buildkite-agent/builds/linux-56-59b8f5d88-f4kc6-1/llvm-project/github-pull-requests/build-runtimes/libcxx/test-suite-install/include/c++/v1/__chrono/duration.h:220:24: note: in instantiation of function template specialization 'std::chrono::duration_cast<std::chrono::duration<long double, std::ratio<1, 1000>>, long double, std::ratio<1>, 0>' requested here
  | # \|   220 \|       : __rep_(chrono::duration_cast<duration>(__d).count()) {}
  | # \|       \|                        ^
  | # \| /var/lib/buildkite-agent/builds/linux-56-59b8f5d88-f4kc6-1/llvm-project/github-pull-requests/build-runtimes/libcxx/test-suite-install/include/c++/v1/__chrono/duration.h:385:33: note: in instantiation of function template specialization 'std::chrono::duration<long double, std::ratio<1, 1000>>::duration<long double, std::ratio<1>, 0>' requested here
  | # \|   385 \|   return _Ct(__lhs).count() <=> _Ct(__rhs).count();
  | # \|       \|                                 ^
  | # \| /var/lib/buildkite-agent/builds/linux-56-59b8f5d88-f4kc6-1/llvm-project/github-pull-requests/build-runtimes/libcxx/test-suite-install/include/c++/v1/__thread/this_thread.h:45:13: note: in instantiation of function template specialization 'std::chrono::operator<=><long long, std::ratio<1, 1000>, long double, std::ratio<1>>' requested here
  | # \|    45 \|     if (__d < __max) {
  | # \|       \|             ^
  | # \| /var/lib/buildkite-agent/builds/linux-56-59b8f5d88-f4kc6-1/llvm-project/github-pull-requests/libcxx/test/std/thread/futures/futures.shared_future/get.pass.cpp:28:23: note: in instantiation of function template specialization 'std::this_thread::sleep_for<long long, std::ratio<1, 1000>>' requested here
  | # \|    28 \|     std::this_thread::sleep_for(std::chrono::milliseconds(500));
  | # \|       \|                       ^
  | # \| 1 error generated.
  | # `-----------------------------
  | # error: command failed with exit status: 1

Sterling-Augustine pushed a commit to Sterling-Augustine/llvm-project that referenced this pull request Sep 27, 2024
…es (llvm#107638)

This patch rewrites the modulemap to have fewer top-level modules.
Previously, our modulemap had one top level module for each header in
the library, including private headers. This had the well-known problem
of making compilation times terrible, in addition to being somewhat
against the design principles of Clang modules.

This patch provides almost an order of magnitude compilation time
improvement when building modularized code (certainly subject to
variations). For example, including <ccomplex> without a module cache
went from 22.4 seconds to 1.6 seconds, a 14x improvement.

To achieve this, one might be tempted to simply put all the headers in a
single top-level module. Unfortunately, this doesn't work because libc++
provides C compatibility headers (e.g. stdlib.h) which create cycles
when the C Standard Library headers are modularized too. This is
especially tricky since base systems are usually not modularized: as far
as I know, only Xcode 16 beta contains a modularized SDK that makes this
issue visible. To understand it, imagine we have the following setup:

   // in libc++'s include/c++/v1/module.modulemap
   module std {
      header stddef.h
      header stdlib.h
   }

   // in the C library's include/module.modulemap
   module clib {
      header stddef.h
      header stdlib.h
   }

Now, imagine that the C library's <stdlib.h> includes <stddef.h>,
perhaps as an implementation detail. When building the `std` module,
libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C
library's <stdlib.h>, so libc++ depends on the `clib` module.

However, remember that the C library's <stdlib.h> header includes
<stddef.h> as an implementation detail. Since the header search paths
for libc++ are (and must be) before the search paths for the C library,
the C library ends up including libc++'s <stddef.h>, which means it
depends on the `std` module. That's a cycle.

To solve this issue, this patch creates one top-level module for each C
compatibility header. The rest of the libc++ headers are located in a
single top-level `std` module, with two main exceptions. First, the
module containing configuration headers (e.g. <__config>) has its own
top-level module too, because those headers are included by the C
compatibility headers.

Second, we create a top-level std_core module that contains several
dependency-free utilities used (directly or indirectly) from the __math
subdirectory. This is needed because __math pulls in a bunch of stuff,
and __math is used from the C compatibility header <math.h>.

As a direct benefit of this change, we don't need to generate an
artificial __std_clang_module header anymore to provide a monolithic
`std` module, since our modulemap does it naturally by construction.

A next step after this change would be to look into whether math.h
really needs to include the contents of __math, and if so, whether
libc++'s math.h truly needs to include the C library's math.h header.
Removing either dependency would break this annoying cycle.

Thanks to Eric Fiselier for pointing out this approach during a recent
meeting. This wasn't viable before some recent refactoring, but wrapping
everything (except the C headers) in a large module is by far the
simplest and the most effective way of doing this.

Fixes llvm#86193
llvm-beanz added a commit to llvm-beanz/llvm-project that referenced this pull request Sep 28, 2024
modules (llvm#107638)"

This reverts commit bc6bd3b.

# Conflicts:
#	libcxx/include/module.modulemap
llvm-beanz added a commit to llvm-beanz/llvm-project that referenced this pull request Sep 28, 2024
llvm-beanz added a commit that referenced this pull request Sep 29, 2024
…el modules (#107638)" (#110384)

This reverts 3 commits:
45a09d1
24bc324
bc6bd3b

The GitHub pre-merge CI has been broken since this PR went in. This
change reverts it to see if I can get the pre-merge CI working again.
ldionne added a commit to ldionne/llvm-project that referenced this pull request Sep 30, 2024
…es (llvm#107638)

This is a re-application of bc6bd3b which was reverted in f11abac
because it broke the Clang pre-commit CI.

Original commit message:

This patch rewrites the modulemap to have fewer top-level modules.
Previously, our modulemap had one top level module for each header in
the library, including private headers. This had the well-known problem
of making compilation times terrible, in addition to being somewhat
against the design principles of Clang modules.

This patch provides almost an order of magnitude compilation time
improvement when building modularized code (certainly subject to
variations). For example, including <ccomplex> without a module cache
went from 22.4 seconds to 1.6 seconds, a 14x improvement.

To achieve this, one might be tempted to simply put all the headers in a
single top-level module. Unfortunately, this doesn't work because libc++
provides C compatibility headers (e.g. stdlib.h) which create cycles
when the C Standard Library headers are modularized too. This is
especially tricky since base systems are usually not modularized: as far
as I know, only Xcode 16 beta contains a modularized SDK that makes this
issue visible. To understand it, imagine we have the following setup:

   // in libc++'s include/c++/v1/module.modulemap
   module std {
      header stddef.h
      header stdlib.h
   }

   // in the C library's include/module.modulemap
   module clib {
      header stddef.h
      header stdlib.h
   }

Now, imagine that the C library's <stdlib.h> includes <stddef.h>,
perhaps as an implementation detail. When building the `std` module,
libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C
library's <stdlib.h>, so libc++ depends on the `clib` module.

However, remember that the C library's <stdlib.h> header includes
<stddef.h> as an implementation detail. Since the header search paths
for libc++ are (and must be) before the search paths for the C library,
the C library ends up including libc++'s <stddef.h>, which means it
depends on the `std` module. That's a cycle.

To solve this issue, this patch creates one top-level module for each C
compatibility header. The rest of the libc++ headers are located in a
single top-level `std` module, with two main exceptions. First, the
module containing configuration headers (e.g. <__config>) has its own
top-level module too, because those headers are included by the C
compatibility headers.

Second, we create a top-level std_core module that contains several
dependency-free utilities used (directly or indirectly) from the __math
subdirectory. This is needed because __math pulls in a bunch of stuff,
and __math is used from the C compatibility header <math.h>.

As a direct benefit of this change, we don't need to generate an
artificial __std_clang_module header anymore to provide a monolithic
`std` module, since our modulemap does it naturally by construction.

A next step after this change would be to look into whether math.h
really needs to include the contents of __math, and if so, whether
libc++'s math.h truly needs to include the C library's math.h header.
Removing either dependency would break this annoying cycle.

Thanks to Eric Fiselier for pointing out this approach during a recent
meeting. This wasn't viable before some recent refactoring, but wrapping
everything (except the C headers) in a large module is by far the
simplest and the most effective way of doing this.

Fixes llvm#86193
@ldionne
Copy link
Member Author

ldionne commented Sep 30, 2024

For information, this was the exact thing that broke when combined with this change: #99473 (comment)

ldionne added a commit to ldionne/llvm-project that referenced this pull request Sep 30, 2024
…es (llvm#107638)

This is a re-application of bc6bd3b which was reverted in f11abac
because it broke the Clang pre-commit CI.

Original commit message:

This patch rewrites the modulemap to have fewer top-level modules.
Previously, our modulemap had one top level module for each header in
the library, including private headers. This had the well-known problem
of making compilation times terrible, in addition to being somewhat
against the design principles of Clang modules.

This patch provides almost an order of magnitude compilation time
improvement when building modularized code (certainly subject to
variations). For example, including <ccomplex> without a module cache
went from 22.4 seconds to 1.6 seconds, a 14x improvement.

To achieve this, one might be tempted to simply put all the headers in a
single top-level module. Unfortunately, this doesn't work because libc++
provides C compatibility headers (e.g. stdlib.h) which create cycles
when the C Standard Library headers are modularized too. This is
especially tricky since base systems are usually not modularized: as far
as I know, only Xcode 16 beta contains a modularized SDK that makes this
issue visible. To understand it, imagine we have the following setup:

   // in libc++'s include/c++/v1/module.modulemap
   module std {
      header stddef.h
      header stdlib.h
   }

   // in the C library's include/module.modulemap
   module clib {
      header stddef.h
      header stdlib.h
   }

Now, imagine that the C library's <stdlib.h> includes <stddef.h>,
perhaps as an implementation detail. When building the `std` module,
libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C
library's <stdlib.h>, so libc++ depends on the `clib` module.

However, remember that the C library's <stdlib.h> header includes
<stddef.h> as an implementation detail. Since the header search paths
for libc++ are (and must be) before the search paths for the C library,
the C library ends up including libc++'s <stddef.h>, which means it
depends on the `std` module. That's a cycle.

To solve this issue, this patch creates one top-level module for each C
compatibility header. The rest of the libc++ headers are located in a
single top-level `std` module, with two main exceptions. First, the
module containing configuration headers (e.g. <__config>) has its own
top-level module too, because those headers are included by the C
compatibility headers.

Second, we create a top-level std_core module that contains several
dependency-free utilities used (directly or indirectly) from the __math
subdirectory. This is needed because __math pulls in a bunch of stuff,
and __math is used from the C compatibility header <math.h>.

As a direct benefit of this change, we don't need to generate an
artificial __std_clang_module header anymore to provide a monolithic
`std` module, since our modulemap does it naturally by construction.

A next step after this change would be to look into whether math.h
really needs to include the contents of __math, and if so, whether
libc++'s math.h truly needs to include the C library's math.h header.
Removing either dependency would break this annoying cycle.

Thanks to Eric Fiselier for pointing out this approach during a recent
meeting. This wasn't viable before some recent refactoring, but wrapping
everything (except the C headers) in a large module is by far the
simplest and the most effective way of doing this.

Fixes llvm#86193
xgupta pushed a commit to xgupta/llvm-project that referenced this pull request Oct 4, 2024
…es (llvm#107638)

This patch rewrites the modulemap to have fewer top-level modules.
Previously, our modulemap had one top level module for each header in
the library, including private headers. This had the well-known problem
of making compilation times terrible, in addition to being somewhat
against the design principles of Clang modules.

This patch provides almost an order of magnitude compilation time
improvement when building modularized code (certainly subject to
variations). For example, including <ccomplex> without a module cache
went from 22.4 seconds to 1.6 seconds, a 14x improvement.

To achieve this, one might be tempted to simply put all the headers in a
single top-level module. Unfortunately, this doesn't work because libc++
provides C compatibility headers (e.g. stdlib.h) which create cycles
when the C Standard Library headers are modularized too. This is
especially tricky since base systems are usually not modularized: as far
as I know, only Xcode 16 beta contains a modularized SDK that makes this
issue visible. To understand it, imagine we have the following setup:

   // in libc++'s include/c++/v1/module.modulemap
   module std {
      header stddef.h
      header stdlib.h
   }

   // in the C library's include/module.modulemap
   module clib {
      header stddef.h
      header stdlib.h
   }

Now, imagine that the C library's <stdlib.h> includes <stddef.h>,
perhaps as an implementation detail. When building the `std` module,
libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C
library's <stdlib.h>, so libc++ depends on the `clib` module.

However, remember that the C library's <stdlib.h> header includes
<stddef.h> as an implementation detail. Since the header search paths
for libc++ are (and must be) before the search paths for the C library,
the C library ends up including libc++'s <stddef.h>, which means it
depends on the `std` module. That's a cycle.

To solve this issue, this patch creates one top-level module for each C
compatibility header. The rest of the libc++ headers are located in a
single top-level `std` module, with two main exceptions. First, the
module containing configuration headers (e.g. <__config>) has its own
top-level module too, because those headers are included by the C
compatibility headers.

Second, we create a top-level std_core module that contains several
dependency-free utilities used (directly or indirectly) from the __math
subdirectory. This is needed because __math pulls in a bunch of stuff,
and __math is used from the C compatibility header <math.h>.

As a direct benefit of this change, we don't need to generate an
artificial __std_clang_module header anymore to provide a monolithic
`std` module, since our modulemap does it naturally by construction.

A next step after this change would be to look into whether math.h
really needs to include the contents of __math, and if so, whether
libc++'s math.h truly needs to include the C library's math.h header.
Removing either dependency would break this annoying cycle.

Thanks to Eric Fiselier for pointing out this approach during a recent
meeting. This wasn't viable before some recent refactoring, but wrapping
everything (except the C headers) in a large module is by far the
simplest and the most effective way of doing this.

Fixes llvm#86193
xgupta pushed a commit to xgupta/llvm-project that referenced this pull request Oct 4, 2024
…el modules (llvm#107638)" (llvm#110384)

This reverts 3 commits:
45a09d1
24bc324
bc6bd3b

The GitHub pre-merge CI has been broken since this PR went in. This
change reverts it to see if I can get the pre-merge CI working again.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[libc++] Per-header header modules
8 participants