Fix type instability of closures capturing types (2) #40985

Merged
aviatesk merged 5 commits into master from sds/tkf/type-capturing
Oct 12, 2024

Conversation

simeonschaub
Member

@simeonschaub simeonschaub commented May 28, 2021

Instead of closures lowering to typeof for the types of captured fields, this introduces a new function _typeof_captured_variable that returns Type{T} if T is a type (w/o free typevars). Also adds special inference support for that function to make it a no-op in most cases.

replaces/closes #35970
fixes #23618
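
To illustrate the difference, here is a minimal sketch. It is not the code in this PR: `typeof_captured_sketch`, `make_converter`, and `make_converter_static` are hypothetical names used only for this example, and the sketch only mirrors the behaviour described above.

```julia
# `typeof` applied to a type forgets which type it was; `Core.Typeof` keeps it.
@assert typeof(Float64) === DataType
@assert Core.Typeof(Float64) === Type{Float64}
@assert Core.Typeof(1.0) === Float64

# Hypothetical stand-in for the described behaviour: return `Type{T}` when the
# captured value is a type without free typevars, and plain `typeof` otherwise.
function typeof_captured_sketch(@nospecialize(x))
    if x isa Type && !Base.has_free_typevars(x)
        return Type{x}
    end
    return typeof(x)
end

# A closure that captures a type passed as an ordinary argument.
make_converter(T) = x -> convert(T, x)

# A workaround that does not depend on this change: bind the type as a
# static parameter so it is known when the closure is created.
make_converter_static(::Type{T}) where {T} = x -> convert(T, x)
```

Both closures return the same values; the difference this PR targets is only in how precisely the captured field is typed.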

@simeonschaub simeonschaub added performance Must go faster compiler:lowering Syntax lowering (compiler front end, 2nd stage) labels May 28, 2021
@oscardssmith oscardssmith added this to the 1.7 milestone May 28, 2021
base/compiler/compiler.jl (outdated, resolved)
base/Base.jl (outdated, resolved)
@tkf
Member

tkf commented May 28, 2021

I just ran ]test ChainRules. Just eyeballing the test results, I don't see a major regression, though rulesets/LinearAlgebra/symmetric.jl may be slower (307 s in this PR vs. 251 s on master). (Edit: never mind, the compilation time is < 1%)

This PR

Testing ChainRules.jl
Testing rulesets/Base/base.jl:
...
113.387431 seconds (194.65 M allocations: 11.406 GiB, 4.15% gc time, 0.18% compilation time)
Testing rulesets/Base/fastmath_able.jl:
135.469393 seconds (201.52 M allocations: 11.869 GiB, 3.89% gc time, 0.03% compilation time)
Testing rulesets/Base/evalpoly.jl:
 35.726451 seconds (64.08 M allocations: 3.702 GiB, 3.45% gc time, 76.07% compilation time)
Testing rulesets/Base/array.jl:
 10.367831 seconds (17.88 M allocations: 1.074 GiB, 3.64% gc time, 21.27% compilation time)
Testing rulesets/Base/arraymath.jl:
...
188.849576 seconds (341.39 M allocations: 19.978 GiB, 3.86% gc time, 92.38% compilation time)
Testing rulesets/Base/indexing.jl:
 17.777968 seconds (26.13 M allocations: 1.505 GiB, 3.51% gc time, 98.34% compilation time)
Testing rulesets/Base/mapreduce.jl:
113.682090 seconds (221.61 M allocations: 12.675 GiB, 5.68% gc time, 0.00% compilation time)
Testing rulesets/Base/sort.jl:
  9.309037 seconds (14.99 M allocations: 894.253 MiB, 4.40% gc time, 10.71% compilation time)

Testing rulesets/Statistics/statistics.jl:
  3.154018 seconds (5.54 M allocations: 337.662 MiB, 6.72% gc time, 18.29% compilation time)

Testing rulesets/LinearAlgebra/dense.jl:
 90.476702 seconds (144.34 M allocations: 8.792 GiB, 4.83% gc time, 20.87% compilation time)
Testing rulesets/LinearAlgebra/norm.jl:
153.307127 seconds (241.14 M allocations: 14.354 GiB, 4.56% gc time, 0.00% compilation time)
Testing rulesets/LinearAlgebra/matfun.jl:
 16.050503 seconds (27.39 M allocations: 2.706 GiB, 8.82% gc time, 0.37% compilation time)
Testing rulesets/LinearAlgebra/structured.jl:
 81.032083 seconds (112.20 M allocations: 6.698 GiB, 3.87% gc time, 21.03% compilation time)
Testing rulesets/LinearAlgebra/symmetric.jl:
307.083321 seconds (385.80 M allocations: 45.909 GiB, 6.67% gc time, 0.46% compilation time)
...
Testing rulesets/LinearAlgebra/blas.jl:
 57.792782 seconds (93.93 M allocations: 7.993 GiB, 5.22% gc time, 0.00% compilation time)
Testing rulesets/LinearAlgebra/lapack.jl:
 18.828087 seconds (29.57 M allocations: 1.831 GiB, 4.03% gc time, 17.00% compilation time)

Testing rulesets/Random/random.jl:
  0.783869 seconds (821.83 k allocations: 50.719 MiB, 90.65% compilation time)

Testing rulesets/packages/NaNMath.jl:
  0.000138 seconds (56 allocations: 4.266 KiB)

...

julia> Base.GIT_VERSION_INFO
Base.GitVersionInfo("75086967ef7e5385b54369cb0f99b8eb9cc7480b", "75086967ef", "sds/tkf/type-capturing", 1189, "2021-05-28 18:48 UTC", false, 3, 1.622195605e9)

master

Testing ChainRules.jl
Testing rulesets/Base/base.jl:
...
110.912486 seconds (191.40 M allocations: 11.192 GiB, 3.72% gc time, 0.18% compilation time)
Testing rulesets/Base/fastmath_able.jl:
133.547750 seconds (199.73 M allocations: 11.757 GiB, 3.86% gc time, 0.03% compilation time)
Testing rulesets/Base/evalpoly.jl:
 34.510045 seconds (63.45 M allocations: 3.660 GiB, 3.54% gc time, 75.96% compilation time)
Testing rulesets/Base/array.jl:
 10.106647 seconds (17.67 M allocations: 1.061 GiB, 3.55% gc time, 20.78% compilation time)
Testing rulesets/Base/arraymath.jl:
...
185.997513 seconds (339.63 M allocations: 19.862 GiB, 4.00% gc time, 92.22% compilation time)
Testing rulesets/Base/indexing.jl:
 17.365063 seconds (25.89 M allocations: 1.489 GiB, 3.21% gc time, 98.30% compilation time)
Testing rulesets/Base/mapreduce.jl:
110.458438 seconds (220.21 M allocations: 12.584 GiB, 4.98% gc time, 0.00% compilation time)
Testing rulesets/Base/sort.jl:
  9.054407 seconds (14.83 M allocations: 882.215 MiB, 4.29% gc time, 11.54% compilation time)

Testing rulesets/Statistics/statistics.jl:
  3.022534 seconds (5.51 M allocations: 335.324 MiB, 4.24% gc time, 16.47% compilation time)

Testing rulesets/LinearAlgebra/dense.jl:
 88.285892 seconds (143.68 M allocations: 8.751 GiB, 4.53% gc time, 20.88% compilation time)
Testing rulesets/LinearAlgebra/norm.jl:
150.491677 seconds (239.33 M allocations: 14.238 GiB, 4.41% gc time, 0.00% compilation time)
Testing rulesets/LinearAlgebra/matfun.jl:
 16.358479 seconds (27.57 M allocations: 2.718 GiB, 9.26% gc time, 0.36% compilation time)
Testing rulesets/LinearAlgebra/structured.jl:
 76.466074 seconds (108.54 M allocations: 6.501 GiB, 4.32% gc time, 22.90% compilation time)
Testing rulesets/LinearAlgebra/symmetric.jl:
251.823712 seconds (348.90 M allocations: 44.452 GiB, 7.87% gc time, 0.61% compilation time)
...
Testing rulesets/LinearAlgebra/blas.jl:
 56.097692 seconds (93.35 M allocations: 7.957 GiB, 5.91% gc time, 0.00% compilation time)
Testing rulesets/LinearAlgebra/lapack.jl:
 18.680355 seconds (29.96 M allocations: 1.859 GiB, 4.90% gc time, 17.17% compilation time)

Testing rulesets/Random/random.jl:
  0.771533 seconds (817.91 k allocations: 50.452 MiB, 90.70% compilation time)

Testing rulesets/packages/NaNMath.jl:
  0.000109 seconds (56 allocations: 4.266 KiB)

...

julia> Base.GIT_VERSION_INFO
Base.GitVersionInfo("61701d7a84be62beaf129b2178ede42882a7a46a", "61701d7a84", "master", 1186, "2021-05-28 09:53 UTC", false, 0, 1.622195605e9)

@simeonschaub simeonschaub marked this pull request as ready for review May 29, 2021 22:24
@KristofferC
Member

@oscardssmith Why is this added to the 1.7 milestone?

@vtjnash vtjnash removed this from the 1.7 milestone May 31, 2021
@vtjnash vtjnash added the feature Indicates new feature / enhancement requests label May 31, 2021
@vtjnash
Member

vtjnash commented May 31, 2021

This is not going into v1.7, but is something to review and consider for a next release

@simeonschaub simeonschaub added needs nanosoldier run This PR should have benchmarks run on it needs pkgeval Tests for all registered packages should be run with this change labels Jul 1, 2021
@simeonschaub
Member Author

Triage echoed worries about significantly increased latency in some cases. We should at least run all Base benchmarks on this and we also talked about potentially doing a PkgEval run and analyzing whether there are any outliers in terms of run times.

@simeonschaub
Member Author

@nanosoldier runbenchmarks(ALL, vs=":master")

@nanosoldier runtests(ALL, vs = ":master")

@nanosoldier
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here. cc @maleadt

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @christopher-dG

@simeonschaub
Member Author

Sorry it took me so long to get back to this! Overall, I couldn't measure any significant mean increase in run time for the PkgEval jobs. Just to be safe, I'll rerun PkgEval for all packages whose tests took more than 20% longer.

@nanosoldier runtests(["HealthBase", "P4est", "Pitaya", "Geophysics", "Synthesizer", "LatinHypercubeSampling", "SequentialMonteCarlo", "Batsrus", "InterProcessCommunication", "MixedModels", "RandomizedNMF", "EasyConfig", "Crayons", "CooperativeGames", "DashBootstrapComponents", "Word2Vec", "PatModules", "IntervalTrees", "IncrementalInference", "VT100", "TraitWrappers", "URIParser", "KeywordCalls", "ProgrammableAPI", "LikelihoodProfiler", "DataFitting", "FeatureDescriptors", "StatsDiscretizations", "LittleEndianBase128", "ConventionalApp", "RealContinuedFractions", "PredictMD", "Avalon", "FromDigits", "H3", "SLEEFInline", "DashTable", "MultivariateOrthogonalPolynomials", "Vinyl", "CIAOAlgorithms", "Zabbix", "RenoiseOSC", "UnitTestDesign", "Remark", "Scalpels", "MixedModelsExtras", "RegularizationTools", "ChainLadder", "Morton", "CPLEX", "LCMGL", "LibDeflate", "SuperLU", "QuadraticToBinary", "AbstractIndices", "Gloria", "GLNS", "ArgParse", "TriangleMesh", "BlockBootstrap", "ForecastEval", "SalesForceBulkApi", "MLDataUtils", "SuiteSparseGraphBLAS", "ARFFFiles", "FeatherLib", "StippleLatex", "GtkMarkdownTextView", "ProgressMeter", "LinearFractionalTransformations", "SimpleLife", "Plots", "Glob", "BDF", "CurrenciesBase", "Skyler", "TrackingTimers", "BioServices", "BoltzmannMachines", "ClassicalOrthogonalPolynomials", "Taro", "TensorKit", "ReusePatterns", "SymArrays", "RiskAdjustedLinearizations", "YahooFinance", "JuliaPetra", "PooledArrays", "StaticNumbers", "ExponentialUtilities", "LogCompose", "GitCommand", "Pilot", "FastRounding", "TSAnalysis", "ConstantArrays", "Intervals", "NeuroCore", "Parameters", "AMRVW", "HighFrequencyCovariance", "ComputabilityTheory", "GenFlux", "Shoco", "KiteConnect", "EditionBuilders", "FastFloat16s", "ImGuiOpenGLBackend", "Yao", "ConstructiveGeometry", "CircoCore", "Vortice", "AbaqusReader", "RelocatableFolders", "BayesianNonparametrics", "NearestNeighbors", "ElementaryChemistry", "NLPModelsJuMP", "IPNets", "Tectonic", "JacobiDavidson", 
"Multisets", "PropertyUtils", "BitInformation", "TensArrays", "ValidatedNumerics", "EffectiveWaves", "MitosisStochasticDiffEq", "Chron", "ApproxFunFourier", "Quadrature", "StateSpaceReconstruction", "HOODESolver", "EcoSISTEM", "MPIMapReduce"], vs = ":master")

@simeonschaub simeonschaub added this to the 1.8 milestone Oct 18, 2021
@nanosoldier
Collaborator

Your package evaluation job has completed - no new issues were detected. A full report can be found here.

@simeonschaub
Member Author

@nanosoldier runtests(["JacobiDavidson"], vs = ":master")

@nanosoldier
Collaborator

Your package evaluation job has completed - possible new issues were detected. A full report can be found here.

@simeonschaub
Member Author

Yeah, doesn't seem like there are any significant outliers, so I will mark this for triage.

@simeonschaub simeonschaub added triage This should be discussed on a triage call and removed needs nanosoldier run This PR should have benchmarks run on it needs pkgeval Tests for all registered packages should be run with this change labels Oct 18, 2021
base/Base.jl Outdated
# for closures
function _typeof_captured_variable(Core.@nospecialize x)
    if x isa DataType
        if x.layout === C_NULL
Member

Are we able to optimize this out? And what is the reason for widening only this specific kind of abstract type?

Member

Apparently not. This is from my #35970, but I don't remember why I checked it. It could just be a silly artifact of my blind trial and error back then.

For inference, it looks like

_typeof_captured_variable(Core.@nospecialize x) =
    isconcretetype(x) ? Core.Typeof(x) : DataType

works. But I wonder if we can just do

const _typeof_captured_variable = Core.Typeof

?
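
For comparison, here is a small sketch of how the two candidates above behave when given types. The names `variant_concrete_only` and `variant_typeof` are hypothetical; only the right-hand sides come from the comment.

```julia
# Candidate 1: keep `Type{T}` only for concrete types, widen otherwise.
variant_concrete_only(@nospecialize(x)) =
    isconcretetype(x) ? Core.Typeof(x) : DataType

# Candidate 2: always defer to `Core.Typeof`.
const variant_typeof = Core.Typeof

# Print what each candidate returns for a concrete and an abstract type.
for T in (Int, Integer, Vector{Int})
    println(T, " => ", variant_concrete_only(T), " vs. ", variant_typeof(T))
end
```

The two agree on concrete types such as `Int` and differ on abstract ones such as `Integer`, where the first widens to `DataType` and the second keeps `Type{Integer}`.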

rt = Const(has_free_typevars(tv) ? typeof(tv) : Core.Typeof(tv))
elseif isType(t)
tv = t.parameters[1]
rt = Const(has_free_typevars(tv) ? typeof(tv) : Core.Typeof(tv))
Member

This seems most likely incorrect: it changes an uncertain value to a guaranteed one.

Member Author

You mean just the isType branch? Or is there an issue with this optimization as a whole?

Member Author

Bump. @vtjnash What would a correct version of this look like?

Member

Can you just delete it?

Member Author

Let me try that again. I remember there being cases where the call wouldn't be eliminated otherwise, but that was also a while ago so things might have changed

@vtjnash vtjnash removed the merge me PR is reviewed. Merge when all tests are passing label Sep 17, 2024
@simeonschaub
Member Author

OK, this seems to work: all tests still pass, and I've confirmed locally that the call is still eliminated in all cases I care about. Good to merge?

base/compiler/abstractinterpretation.jl (outdated, resolved)
test/core.jl (resolved)
@aviatesk aviatesk force-pushed the sds/tkf/type-capturing branch 2 times, most recently from e6eef3c to 1b749b9 Compare October 10, 2024 16:04
@aviatesk aviatesk force-pushed the sds/tkf/type-capturing branch from 1b749b9 to 7230e78 Compare October 11, 2024 03:56
@aviatesk
Member

I tweaked the implementation of _typeof_captured_variable a bit to remove the special case in inference. I think this PR is fine now, so I'll go ahead and merge it as is.

@aviatesk aviatesk merged commit dc34428 into master Oct 12, 2024
5 of 7 checks passed
@aviatesk aviatesk deleted the sds/tkf/type-capturing branch October 12, 2024 05:07
Zentrik pushed a commit to Zentrik/julia that referenced this pull request Oct 13, 2024
Instead of closures lowering to `typeof` for the types of captured
fields, this introduces a new function `_typeof_captured_variable` that
returns `Type{T}` if `T` is a type (w/o free typevars).

- replaces/closes JuliaLang#35970
- fixes JuliaLang#23618

---------

Co-authored-by: Takafumi Arakaki <[email protected]>
Co-authored-by: Shuhei Kadowaki <[email protected]>
udesou added a commit to mmtk/julia that referenced this pull request Oct 22, 2024
* Add filesystem func to transform a path to a URI (#55454)

In a few places across Base and the stdlib, we emit paths that we'd like
people to be able to click on in their terminal and editor. Up to this
point, we have relied on auto-filepath detection, but this does not
allow for alternative link text, such as contracted paths.

Doing so (via OSC 8 terminal links for example) requires filepath URI
encoding.

This functionality was previously part of a PR modifying stacktrace
printing (#51816), but after that became held up for unrelated reasons
and another PR appeared that would benefit from this utility (#55335),
I've split out this functionality so it can be used before the
stacktrace printing PR is resolved.

* constrain the path argument of `include` functions to `AbstractString` (#55466)

Each `Module` defined with `module` automatically gets an `include`
function with two methods. Each of those two methods takes a file path
as its last argument. Even though the path argument is unconstrained by
dispatch, it's documented as constrained with `::AbstractString`:

https://docs.julialang.org/en/v1.11-dev/base/base/#include

Furthermore, I think that any invocation of `include` with a
non-`AbstractString` path will necessarily throw a `MethodError`
eventually. Thus this change should be harmless.

Adding the type constraint to the path argument is an improvement
because any possible exception would be thrown earlier than before.

Apart from modules defined with `module`, the same issue is present with
the anonymous modules created by `evalfile`, which is also addressed.

Sidenote: `evalfile` seems to be completely untested apart from the test
added here.

Co-authored-by: Florian <[email protected]>

* Mmap: fix grow! for non file IOs (#55849)

Fixes https://github.com/JuliaLang/julia/issues/54203
Requires #55641

Based on
https://github.com/JuliaLang/julia/pull/55641#issuecomment-2334162489
cc. @JakeZw @ronisbr

---------

Co-authored-by: Jameson Nash <[email protected]>

* codegen: split gc roots from other bits on stack (#55767)

In order to help avoid memory provenance issues, and better utilize
stack space (somewhat), and use FCA less, change the preferred
representation of an immutable object to be a pair of
`<packed-data,roots>` values. This packing requires some care at the
boundaries and if the expected field alignment exceeds that of a
pointer. The change is expected to eventually make codegen more flexible
at representing unions of values with both bits and pointer regions.

Eventually we can also have someone improve the late-gc-lowering pass to
take advantage of this increased information accuracy, but currently it
will not be any better than before at laying out the frame.

* Refactoring to be considered before adding MMTk

* Removing jl_gc_notify_image_load, since it's a new function and not part of the refactoring

* Moving gc_enable code to gc-common.c

* Addressing PR comments

* Push resolution of merge conflict

* Removing jl_gc_mark_queue_obj_explicit extern definition from scheduler.c

* Don't need the getter function since it's possible to use jl_small_typeof directly

* WIP: Adding support for MMTk/Immix

* Refactoring to be considered before adding MMTk

* Adding fastpath allocation

* Fixing removed newlines

* Refactoring to be considered before adding MMTk

* Adding a few comments; Moving some functions to be closer together

* Fixing merge conflicts

* Applying changes from refactoring before adding MMTk

* Update TaskLocalRNG docstring according to #49110 (#55863)

Since #49110, which is included in 1.10 and 1.11, spawning a task no
longer advances the parent task's RNG state, so this statement in the
docs was incorrect.

* Root globals in toplevel exprs (#54433)

This fixes #54422. The code here assumes that top-level exprs are always
rooted, but I don't see that referenced or guaranteed anywhere else, so
conservatively always root objects that show up in code.

* codegen: fix alignment typos (#55880)

So easy to type jl_datatype_align to get the natural alignment instead
of julia_alignment to get the actual alignment. This should fix the
Revise workload.

Change is visible with
```
julia> code_llvm(Random.XoshiroSimd.forkRand, (Random.TaskLocalRNG, Base.Val{8}))
```

* Fix some corner cases of `isapprox` with unsigned integers (#55828)

* 🤖 [master] Bump the Pkg stdlib from ef9f76c17 to 51d4910c1 (#55896)

* Profile: fix order of fields in heapsnapshot & improve formatting (#55890)

* Profile: Improve generation of clickable terminal links (#55857)

* inference: add missing `TypeVar` handling for `instanceof_tfunc` (#55884)

I thought these sort of problems had been addressed by d60f92c, but it
seems some were missed. Specifically, `t.a` and `t.b` from `t::Union`
could be `TypeVar`, and if they are passed to a subroutine or recursed
without being unwrapped or rewrapped, errors like JuliaLang/julia#55882
could occur.

This commit resolves the issue by calling `unwraptv` in the `Union`
handling within `instanceof_tfunc`. I also found a similar issue inside
`nfields_tfunc`, so that has also been fixed, and test cases have been
added. While I haven't been able to make up a test case specifically for
the fix in `instanceof_tfunc`, I have confirmed that this commit
certainly fixes the issue reported in JuliaLang/julia#55882.

- fixes JuliaLang/julia#55882

* Install terminfo data under /usr/share/julia (#55881)

Just like all other libraries, we don't want internal Julia files to
mess with system files.

Introduced by https://github.com/JuliaLang/julia/pull/55411.

* expose metric to report reasons why full GCs were triggered (#55826)

Additional GC observability tool.

This will help us to diagnose why some of our servers are triggering so
many full GCs in certain circumstances.

* Revert "Improve printing of several arguments" (#55894)

Reverts JuliaLang/julia#55754 as it overrode some performance heuristics
which appeared to be giving a significant gain/loss in performance:
Closes https://github.com/JuliaLang/julia/issues/55893

* Do not trigger deprecation warnings in `Test.detect_ambiguities` and `Test.detect_unbound_args` (#55869)

#55868

* do not intentionally suppress errors in precompile script from being reported or failing the result (#55909)

I was slightly annoyed that the build was set up to succeed if this
step failed, so I removed the error suppression and fixed up the script
slightly.

* Remove eigvecs method for SymTridiagonal (#55903)

The fallback method does the same, so this specialized method isn't
necessary

* add --trim option for generating smaller binaries (#55047)

This adds a command line option `--trim` that builds images where code
is only included if it is statically reachable from methods marked using
the new function `entrypoint`. Compile-time errors are given for call
sites that are too dynamic to allow trimming the call graph (however
there is an `unsafe` option if you want to try building anyway to see
what happens).

The PR has two other components. One is changes to Base that generally
allow more code to be compiled in this mode. These changes will either
be merged in separate PRs or moved to a separate part of the workflow
(where we will build a custom system image for this purpose). The branch
is set up this way to make it easy to check out and try the
functionality.

The other component is everything in the `juliac/` directory, which
implements a compiler driver script based on this new option, along with
some examples and tests. This will eventually become a package "app"
that depends on PackageCompiler and provides a CLI for all of this
stuff, so it will not be merged here. To try an example:

```
julia contrib/juliac.jl --output-exe hello --trim test/trimming/hello.jl
```

When stripped the resulting executable is currently about 900kb on my
machine.

Also includes a lot of work by @topolarity

---------

Co-authored-by: Gabriel Baraldi <[email protected]>
Co-authored-by: Tim Holy <[email protected]>
Co-authored-by: Cody Tapscott <[email protected]>

* fix rawbigints OOB issues (#55917)

Fixes issues introduced in #50691 and found in #55906:
* use `@inbounds` and `@boundscheck` macros in rawbigints, for catching
OOB with `--check-bounds=yes`
* fix OOB in `truncate`
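
As a generic illustration of the `@boundscheck`/`@inbounds` pattern named
above (a minimal sketch, not the rawbigints code; `WordBuffer` and `getword`
are hypothetical names):

```julia
# Hypothetical wrapper type, used only for this illustration.
struct WordBuffer
    words::Vector{UInt64}
end

# The check lives in a `@boundscheck` block, so a caller may elide it with
# `@inbounds`, while `--check-bounds=yes` forces it to run.
Base.@propagate_inbounds function getword(buf::WordBuffer, i::Int)
    @boundscheck checkbounds(buf.words, i)
    return @inbounds buf.words[i]
end

buf = WordBuffer(UInt64[1, 2, 3])
getword(buf, 2)      # in bounds
# getword(buf, 4)    # throws a BoundsError when bounds checks are active
```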

* prevent loading other extensions when precompiling an extension (#55589)

The current way of loading extensions when precompiling an extension
very easily leads to cycles. For example, if you have more than one
extension and you happen to transitively depend on the triggers of one
of your extensions you will immediately hit a cycle where the extensions
will try to load each other indefinitely. This is an issue because you
cannot directly influence your transitive dependency graph, so from this
point of view the current system of loading extensions is "unsound".

The test added here checks this scenario and we can now precompile and
load it without any warnings or issues.

Would have made https://github.com/JuliaLang/julia/issues/55517 a
non-issue.

Fixes https://github.com/JuliaLang/julia/issues/55557

---------

Co-authored-by: KristofferC <[email protected]>

* TOML: Avoid type-pirating `Base.TOML.Parser` (#55892)

Since stdlibs can be duplicated but Base never is, `Base.require_stdlib`
makes type piracy even more complicated than it normally would be.

To adapt, this changes `TOML.Parser` to be a type defined by the TOML
stdlib, so that we can define methods on it without committing
type-piracy and avoid problems like Pkg.jl#4017

Resolves
https://github.com/JuliaLang/Pkg.jl/issues/4017#issuecomment-2377589989

* [FileWatching] fix PollingFileWatcher design and add workaround for a stat bug

What started as an innocent fix for a stat bug on Apple (#48667) turned
into a full-blown investigation into the design problems with the libuv
backend for PollingFileWatcher, and into writing my own implementation of
it instead, which could avoid those single-threaded concurrency bugs.

* [FileWatching] fix FileMonitor similarly and improve pidfile reliability

Previously pidfile used the same poll_interval as sleep to detect if
this code made any concurrency mistakes, but we do not really need to do
that once FileMonitor is fixed to be reliable in the presence of
parallel concurrency (instead of using watch_file).

* [FileWatching] reorganize file and add docs

* Add `--trace-dispatch` (#55848)

* relocation: account for trailing path separator in depot paths (#55355)

Fixes #55340

* change compiler to be stackless (#55575)

This change ensures the compiler uses very little stack, making it
compatible with running on any arbitrary system stack size and depths
much more reliably. It also could be further modified now to easily add
various forms of pause-able/resumable inference, since there is no
implicit state on the stack--everything is local and explicit now.

Before, fewer than 900 frames would crash in less than a second:
```
$ time ./julia -e 'f(::Val{N}) where {N} = N <= 0 ? 0 : f(Val(N - 1)); f(Val(1000))'
Warning: detected a stack overflow; program state may be corrupted, so further execution might be unreliable.
Internal error: during type inference of
f(Base.Val{1000})
Encountered stack overflow.
This might be caused by recursion over very long tuples or argument lists.

[23763] signal 6: Abort trap: 6
in expression starting at none:1
__pthread_kill at /usr/lib/system/libsystem_kernel.dylib (unknown line)
Allocations: 1 (Pool: 1; Big: 0); GC: 0
Abort trap: 6

real	0m0.233s
user	0m0.165s
sys	0m0.049s
```

Now it is effectively unlimited, as long as you are willing to wait for
it:
```
$ time ./julia -e 'f(::Val{N}) where {N} = N <= 0 ? 0 : f(Val(N - 1)); f(Val(50000))'
info: inference of f(Base.Val{50000}) from f(Base.Val{N}) where {N} exceeding 2500 frames (may be slow).
info: inference of f(Base.Val{50000}) from f(Base.Val{N}) where {N} exceeding 5000 frames (may be slow).
info: inference of f(Base.Val{50000}) from f(Base.Val{N}) where {N} exceeding 10000 frames (may be slow).
info: inference of f(Base.Val{50000}) from f(Base.Val{N}) where {N} exceeding 20000 frames (may be slow).
info: inference of f(Base.Val{50000}) from f(Base.Val{N}) where {N} exceeding 40000 frames (may be slow).
real	7m4.988s

$ time ./julia -e 'f(::Val{N}) where {N} = N <= 0 ? 0 : f(Val(N - 1)); f(Val(1000))'
real	0m0.214s
user	0m0.164s
sys	0m0.044s

$ time ./julia -e '@noinline f(::Val{N}) where {N} = N <= 0 ? GC.safepoint() : f(Val(N - 1)); f(Val(5000))'
info: inference of f(Base.Val{5000}) from f(Base.Val{N}) where {N} exceeding 2500 frames (may be slow).
info: inference of f(Base.Val{5000}) from f(Base.Val{N}) where {N} exceeding 5000 frames (may be slow).
real	0m8.609s
user	0m8.358s
sys	0m0.240s
```
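
As a generic illustration of the idea (not the compiler's actual code; both
function names are hypothetical), recursion whose depth is bounded by the
native stack can be rewritten to keep its pending work in an explicit,
heap-allocated stack:

```julia
# Recursive form: one native stack frame per level, so very deep inputs
# can overflow the system stack.
count_depth_recursive(n::Int) = n <= 0 ? 0 : 1 + count_depth_recursive(n - 1)

# Explicit-stack form: pending work lives in a Vector on the heap, so the
# depth is limited by available memory rather than by the stack size.
function count_depth_stackless(n::Int)
    pending = Int[n]
    depth = 0
    while !isempty(pending)
        k = pop!(pending)
        if k > 0
            depth += 1
            push!(pending, k - 1)
        end
    end
    return depth
end
```

Because all state is held in `pending` and `depth`, the loop could in
principle be paused and resumed between iterations, which is the kind of
flexibility the description above alludes to.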

* optimizer: simplify the finalizer inlining pass a bit (#55934)

Minor adjustments have been made to the algorithm of the finalizer
inlining pass. Previously, it required that the finalizer registration
dominate all uses, but this is not always necessary as far as the
finalizer inlining point dominates all the uses. So the check has been
relaxed. Other minor fixes have been made as well, but their importance
is low.

* Limit `@inbounds` to indexing in the dual-iterator branch in `copyto_unaliased!` (#55919)

This simplifies the `copyto_unaliased!` implementation where the source
and destination have different `IndexStyle`s, and limits the `@inbounds`
to only the indexing operation. In particular, the iteration over
`eachindex(dest)` is not marked as `@inbounds` anymore. This seems to
help with performance when the destination uses Cartesian indexing.
Reduced implementation of the branch:
```julia
function copyto_proposed!(dest, src)
    axes(dest) == axes(src) || throw(ArgumentError("incompatible sizes"))
    iterdest, itersrc = eachindex(dest), eachindex(src)
    for (destind, srcind) in zip(iterdest, itersrc)
        @inbounds dest[destind] = src[srcind]
    end
    dest
end

function copyto_current!(dest, src)
    axes(dest) == axes(src) || throw(ArgumentError("incompatible sizes"))
    iterdest, itersrc = eachindex(dest), eachindex(src)
    ret = iterate(iterdest)
    @inbounds for a in src
        idx, state = ret::NTuple{2,Any}
        dest[idx] = a
        ret = iterate(iterdest, state)
    end
    dest
end

function copyto_current_limitinbounds!(dest, src)
    axes(dest) == axes(src) || throw(ArgumentError("incompatible sizes"))
    iterdest, itersrc = eachindex(dest), eachindex(src)
    ret = iterate(iterdest)
    for isrc in itersrc
        idx, state = ret::NTuple{2,Any}
        @inbounds dest[idx] = src[isrc]
        ret = iterate(iterdest, state)
    end
    dest
end
```
```julia
julia> a = zeros(40000,4000); b = rand(size(a)...);

julia> av = view(a, UnitRange.(axes(a))...);

julia> @btime copyto_current!($av, $b);
  617.704 ms (0 allocations: 0 bytes)

julia> @btime copyto_current_limitinbounds!($av, $b);
  304.146 ms (0 allocations: 0 bytes)

julia> @btime copyto_proposed!($av, $b);
  240.217 ms (0 allocations: 0 bytes)

julia> versioninfo()
Julia Version 1.12.0-DEV.1260
Commit 4a4ca9c8152 (2024-09-28 01:49 UTC)
Build Info:
  Official https://julialang.org release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 8 × Intel(R) Core(TM) i5-10310U CPU @ 1.70GHz
  WORD_SIZE: 64
  LLVM: libLLVM-18.1.7 (ORCJIT, skylake)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)
Environment:
  JULIA_EDITOR = subl
```
I'm not quite certain why the proposed implementation here
(`copyto_proposed!`) is even faster than
`copyto_current_limitinbounds!`. In any case, `copyto_proposed!` is
easier to read, so I'm not complaining.

This fixes https://github.com/JuliaLang/julia/issues/53158

* Strong zero in Diagonal triple multiplication (#55927)

Currently, triple multiplication with a `LinearAlgebra.BandedMatrix`
sandwiched between two `Diagonal`s isn't associative, as this is
implemented using broadcasting, which doesn't assume a strong zero,
whereas the two-term matrix multiplication does.
```julia
julia> D = Diagonal(StepRangeLen(NaN, 0, 3));

julia> B = Bidiagonal(1:3, 1:2, :U);

julia> D * B * D
3×3 Matrix{Float64}:
 NaN  NaN  NaN
 NaN  NaN  NaN
 NaN  NaN  NaN

julia> (D * B) * D
3×3 Bidiagonal{Float64, Vector{Float64}}:
 NaN    NaN       ⋅ 
    ⋅   NaN    NaN
    ⋅      ⋅   NaN

julia> D * (B * D)
3×3 Bidiagonal{Float64, Vector{Float64}}:
 NaN    NaN       ⋅ 
    ⋅   NaN    NaN
    ⋅      ⋅   NaN
```
This PR ensures that the 3-term multiplication is evaluated as a
sequence of two-term multiplications, which fixes this issue. This also
improves performance, as only the bands need to be evaluated now.
```julia
julia> D = Diagonal(1:1000); B = Bidiagonal(1:1000, 1:999, :U);

julia> @btime $D * $B * $D;
  656.364 μs (11 allocations: 7.63 MiB) # v"1.12.0-DEV.1262"
  2.483 μs (12 allocations: 31.50 KiB) # This PR
```
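
For readers unfamiliar with the term, a "strong zero" is a zero that stays zero even when multiplied by `NaN` or `Inf`. The following is a minimal sketch of that idea, not the code from this PR. `scale_entry` is a hypothetical helper that treats entries outside the stored bands as structural zeros.

```julia
# `false` acts as a strong zero in Julia: it annihilates NaN.
strong = false * NaN     # 0.0
weak = 0.0 * NaN         # NaN, an ordinary floating-point zero does not

# Hypothetical helper: multiply an entry only if it lies on a stored band,
# and return an exact zero of the right type otherwise.
scale_entry(onband::Bool, d1, b, d2) = onband ? d1 * b * d2 : zero(d1 * b * d2)

off_band = scale_entry(false, NaN, 0, NaN)   # 0.0, the zero is structural
on_band = scale_entry(true, NaN, 1, NaN)     # NaN, the value is really computed
```

Broadcasting has no notion of which entries are structural, so every entry is computed like `weak`, which is how the dense `NaN` matrix above arises.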

* Fix dispatch on `alg` in Float16 Hermitian eigen (#55928)

Currently,
```julia
julia> using LinearAlgebra

julia> A = Hermitian(reshape(Float16[1:16;], 4, 4));

julia> eigen(A).values |> typeof
Vector{Float16} (alias for Array{Float16, 1})

julia> eigen(A, LinearAlgebra.QRIteration()).values |> typeof
Vector{Float32} (alias for Array{Float32, 1})
```
This PR moves the specialization on the `eltype` to an internal method,
so that firstly all `alg`s dispatch to that method, and secondly, there
are no ambiguities introduced by specializing the top-level `eigen`. The
latter currently causes test failures in `StaticArrays`
(https://github.com/JuliaArrays/StaticArrays.jl/actions/runs/11092206012/job/30816955210?pr=1279),
and should be fixed by this PR.

* Remove specialized `ishermitian` method for `Diagonal{<:Real}` (#55948)

The fallback method for `Diagonal{<:Number}` handles this already by
checking that the `diag` is real, so we don't need this additional
specialization.
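
As a quick illustration of the behavior described above (a sketch using only public `LinearAlgebra` functions, with made-up example values):

```julia
using LinearAlgebra

real_diag = Diagonal([1.0, 2.0])
complex_diag = Diagonal([1.0 + 0.0im, 2.0 + 1.0im])

h_real = ishermitian(real_diag)        # real diagonal entries, so Hermitian
h_complex = ishermitian(complex_diag)  # 2.0 + 1.0im is not real, so not Hermitian
```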

* Fix logic in `?` docstring example (#55945)

* fix `unwrap_macrocalls` (#55950)

The implementation of `unwrap_macrocalls` has assumed that what
`:macrocall` wraps is always an `Expr` object, but that is not
necessarily correct:
```julia
julia> Base.@assume_effects :nothrow @show 42
ERROR: LoadError: TypeError: in typeassert, expected Expr, got a value of type Int64
Stacktrace:
 [1] unwrap_macrocalls(ex::Expr)
   @ Base ./expr.jl:906
 [2] var"@assume_effects"(__source__::LineNumberNode, __module__::Module, args::Vararg{Any})
   @ Base ./expr.jl:756
in expression starting at REPL[1]:1
```
This commit addresses this issue.
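
One way to make such a helper tolerant of non-`Expr` payloads is sketched below. This is an assumption about the general shape of the fix, not the code that landed in Base. `unwrap_macrocalls_sketch` is a hypothetical name.

```julia
# Hypothetical stand-in for the internal helper; not the actual Base code.
function unwrap_macrocalls_sketch(@nospecialize(x))
    inner = x
    # Peel off nested macro calls and stop at the first value that is not a
    # `:macrocall` expression (an Expr, a literal such as 42, a Symbol, ...).
    while inner isa Expr && inner.head === :macrocall
        inner = inner.args[end]
    end
    return inner
end

literal = unwrap_macrocalls_sketch(:(@show 42))                     # the literal 42
definition = unwrap_macrocalls_sketch(:(@inline @noinline f(x) = x)) # the `f(x) = x` Expr
```

The key difference from the failing version is that the loop checks `inner isa Expr` instead of asserting it.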

* make faster BigFloats (#55906)

We can coalesce the two required allocations for the MPFR BigFloat API
design into one allocation, hopefully giving an easy performance boost.
It would have been slightly easier and more efficient if MPFR BigFloat
was already a VLA instead of containing a pointer here, but that does
not prevent the optimization.

* Add propagate_inbounds_meta to atomic genericmemory ops (#55902)

`memoryref(mem, i)` will otherwise emit a boundscheck.

```
; │ @ /home/vchuravy/WorkstealingQueues/src/CLL.jl:53 within `setindex_atomic!` @ genericmemory.jl:329
; │┌ @ boot.jl:545 within `memoryref`
    %ptls_field = getelementptr inbounds i8, ptr %tls_pgcstack, i64 16
    %ptls_load = load ptr, ptr %ptls_field, align 8
    %"box::GenericMemoryRef" = call noalias nonnull align 8 dereferenceable(32) ptr @ijl_gc_small_alloc(ptr %ptls_load, i32 552, i32 32, i64 23456076646928) #9
    %"box::GenericMemoryRef.tag_addr" = getelementptr inbounds i64, ptr %"box::GenericMemoryRef", i64 -1
    store atomic i64 23456076646928, ptr %"box::GenericMemoryRef.tag_addr" unordered, align 8
    store ptr %memoryref_data, ptr %"box::GenericMemoryRef", align 8
    %.repack8 = getelementptr inbounds { ptr, ptr }, ptr %"box::GenericMemoryRef", i64 0, i32 1
    store ptr %memoryref_mem, ptr %.repack8, align 8
    call void @ijl_bounds_error_int(ptr nonnull %"box::GenericMemoryRef", i64 %7)
    unreachable
```

For the Julia code:

```julia
function Base.setindex_atomic!(buf::WSBuffer{T}, order::Symbol, val::T, idx::Int64) where T
    @inbounds Base.setindex_atomic!(buf.buffer, order, val,((idx - 1) & buf.mask) + 1)
end
```

from
https://github.com/gbaraldi/WorkstealingQueues.jl/blob/0ebc57237cf0c90feedf99e4338577d04b67805b/src/CLL.jl#L41

* fix rounding mode in construction of `BigFloat` from pi (#55911)

The default argument of the method was outdated, reading the global
default rounding directly, bypassing the `ScopedValue` stuff.

* fix `nonsetable_type_hint_handler` (#55962)

The current implementation is wrong, causing it to display inappropriate
hints like the following:
```julia
julia> s = Some("foo");

julia> s[] = "bar"
ERROR: MethodError: no method matching setindex!(::Some{String}, ::String)
The function `setindex!` exists, but no method is defined for this combination of argument types.
You attempted to index the type String, rather than an instance of the type. Make sure you create the type using its constructor: d = String([...]) rather than d = String
Stacktrace:
 [1] top-level scope
   @ REPL[2]:1
```

* REPL: make UndefVarError aware of imported modules (#55932)

* fix test/staged.jl (#55967)

In particular, the implementation of `overdub_generator54341` was
dangerous. This fixes it up.

* Explicitly store a module's location (#55963)

Revise wants to know what file a module's `module` definition is in.
Currently it does this by looking at the source location for the
implicitly generated `eval` method. This is terrible for two reasons:

1. The method may not exist if the module is a baremodule (which is not
particularly common, which is probably why we haven't seen it).
2. The fact that the implicitly generated `eval` method has this
location information is an implementation detail that I'd like to get
rid of (#55949).

This PR adds explicit file/line info to `Module`, so that Revise doesn't
have to use the hack anymore.

* mergewith: add single argument example to docstring (#55964)

I ran into this edge case. I thought it should be documented.

---------

Co-authored-by: Lilith Orion Hafner <[email protected]>

* [build] avoid libedit linkage and align libccalllazy* SONAMEs (#55968)

While building the 1.11.0-rc4 in Homebrew[^1] in preparation for 1.11.0
release (and to confirm Sequoia successfully builds) I noticed some odd
linkage for our Linux builds, which included:

1. LLVM libraries were linking to `libedit.so`, e.g.
    ```
    Dynamic Section:
      NEEDED       libedit.so.0
      NEEDED       libz.so.1
      NEEDED       libzstd.so.1
      NEEDED       libstdc++.so.6
      NEEDED       libm.so.6
      NEEDED       libgcc_s.so.1
      NEEDED       libc.so.6
      NEEDED       ld-linux-x86-64.so.2
      SONAME       libLLVM-16jl.so
    ```
    CMakeCache.txt showed
    ```
    //Use libedit if available.
    LLVM_ENABLE_LIBEDIT:BOOL=ON
    ```
Which might be overriding `HAVE_LIBEDIT` at
https://github.com/JuliaLang/llvm-project/blob/julia-release/16.x/llvm/cmake/config-ix.cmake#L222-L225.
So just added `LLVM_ENABLE_LIBEDIT`

2. Wasn't sure if there was a reason for this but `libccalllazy*` had
mismatched SONAME:
    ```console
    ❯ objdump -p lib/julia/libccalllazy* | rg '\.so'
    lib/julia/libccalllazybar.so:	file format elf64-x86-64
      NEEDED       ccalllazyfoo.so
      SONAME       ccalllazybar.so
    lib/julia/libccalllazyfoo.so:	file format elf64-x86-64
      SONAME       ccalllazyfoo.so
    ```
    Modifying this, but can drop if intentional.

---

[^1]: https://github.com/Homebrew/homebrew-core/pull/192116

* Add missing `copy!(::AbstractMatrix, ::UniformScaling)` method (#55970)

Hi everyone! First PR to Julia here.

It was noticed in a Slack thread yesterday
that `copy!(A, I)` doesn't work, but `copyto!(A, I)` does. This PR adds
the missing method for `copy!(::AbstractMatrix, ::UniformScaling)`,
which simply defers to `copyto!`, and corresponding tests.

I added a `compat` notice for Julia 1.12.
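
A small usage sketch, guarded so that it also runs on Julia versions that predate this PR. The `hasmethod` check and the variable names are illustrative choices, not part of the PR.

```julia
using LinearAlgebra

A = rand(3, 3)
copyto!(A, I)        # long-standing method: overwrite A with the identity

B = rand(3, 3)
# `copy!(::AbstractMatrix, ::UniformScaling)` only exists on Julia versions
# that include this PR, so check for it before calling.
if hasmethod(copy!, Tuple{Matrix{Float64}, UniformScaling{Bool}})
    copy!(B, I)
else
    copyto!(B, I)    # fallback with the same effect
end
```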

---------

Co-authored-by: Lilith Orion Hafner <[email protected]>

* Add forward progress update to NEWS.md (#54089)

Closes #40009 which was left open because of the needs news tag.

---------

Co-authored-by: Ian Butterworth <[email protected]>

* Fix an intermittent test failure in `core` test (#55973)

The test wants to assert that `Module` is not resolved in `Main`, but
other tests do resolve this identifier, so the test can fail depending
on test order (and I've been seeing such failures on CI recently). Fix
that by running the test in a fresh subprocess.

* fix comma logic in time_print (#55977)

Minor formatting fix

* optimizer: fix up the inlining algorithm to use correct `nargs`/`isva` (#55976)

It appears that inlining.jl was not updated in JuliaLang/julia#54341.
Specifically, using `nargs`/`isva` from `mi.def::Method` in
`ir_prepare_inlining!` causes the following error to occur:
```julia
function generate_lambda_ex(world::UInt, source::LineNumberNode,
                            argnames, spnames, @nospecialize body)
    stub = Core.GeneratedFunctionStub(identity, Core.svec(argnames...), Core.svec(spnames...))
    return stub(world, source, body)
end
function overdubbee54341(a, b)
    return a + b
end
const overdubee_codeinfo54341 = code_lowered(overdubbee54341, Tuple{Any, Any})[1]
function overdub_generator54341(world::UInt, source::LineNumberNode, selftype, fargtypes)
    if length(fargtypes) != 2
        return generate_lambda_ex(world, source,
            (:overdub54341, :args), (), :(error("Wrong number of arguments")))
    else
        return copy(overdubee_codeinfo54341)
    end
end
@eval function overdub54341(args...)
    $(Expr(:meta, :generated, overdub_generator54341))
    $(Expr(:meta, :generated_only))
end
topfunc(x) = overdub54341(x, 2)
```
```julia
julia> topfunc(1)
Internal error: during type inference of
topfunc(Int64)
Encountered unexpected error in runtime:
BoundsError(a=Array{Any, 1}(dims=(2,), mem=Memory{Any}(8, 0x10632e780)[SSAValue(2), SSAValue(3), #<null>, #<null>, #<null>, #<null>, #<null>, #<null>]), i=(3,))
throw_boundserror at ./essentials.jl:14
getindex at ./essentials.jl:909 [inlined]
ssa_substitute_op! at ./compiler/ssair/inlining.jl:1798
ssa_substitute_op! at ./compiler/ssair/inlining.jl:1852
ir_inline_item! at ./compiler/ssair/inlining.jl:386
...
```

This commit updates the abstract interpretation and inlining algorithm
to use the `nargs`/`isva` values held by `CodeInfo`. Similar
modifications have also been made to EscapeAnalysis.jl.

@nanosoldier `runbenchmarks("inference", vs=":master")`

* Add `.zed` directory to `.gitignore` (#55974)

Similar to the `vscode` config directory, we may ignore the `zed`
directory as well.

* typeintersect: reduce unneeded allocations from `merge_env`

`merge_env` and `final_merge_env` could be skipped
for emptiness test or if we know there's only 1 valid Union state.

* typeintersect: trunc env before nested `intersect_all` if valid.

This only covers the simplest cases. We might want a full dependence analysis and keep env length minimum in the future.

* `@time` actually fix time report commas & add tests (#55982)

https://github.com/JuliaLang/julia/pull/55977 looked simple but wasn't
quite right because of a bad pattern in the lock conflicts report
section.

So fix and add tests.

* adjust EA to JuliaLang/julia#52527 (#55986)

`EnterNode.catch_dest` can now be `0` after the `try`/`catch` elision
feature implemented in JuliaLang/julia#52527, and we actually need to
adjust `EscapeAnalysis.compute_frameinfo` too.

* Improvements to JITLink

Seeing what this will look like, since it has a number of features
(delayed compilation, concurrent compilation) that are starting to
become important, so it would be nice to switch to only supporting one
common implementation of memory management.

Refs #50248

I am expecting https://github.com/llvm/llvm-project/issues/63236 may
cause some problems, since we reconfigured some CI machines to minimize
that issue, but it is still likely relevant.

* rewrite catchjmp asm to use normal relocations instead of manual editing

* add logic to prefer loading modules that are already loaded (#55908)

Iterate over the list of existing loaded modules for PkgId whenever
loading a new module for PkgId, so that we will use that existing
build_id content if it otherwise passes the other stale_checks.

* Apple: fix bus error on smaller readonly file in unix (#55859)

Enables the fix for #28245 in #44354 for Apple now that the Julia bugs are
fixed by #55641 and #55877.

Closes #28245

* Add `Float16` to `Base.HWReal` (#55929)

* docs: make mod an operator (#55988)

* InteractiveUtils: add `@trace_compile` and `@trace_dispatch` (#55915)

* Profile: document heap snapshot viewing tools (#55743)

* [REPL] Fix #55850 by using `safe_realpath` instead of `abspath` in `projname` (#55851)

* optimizer: enable load forwarding with the `finalizer` elision (#55991)

When the finalizer elision pass is used, load forwarding is not
performed currently, regardless of whether the pass succeeds or not. But
this is not necessary, and by keeping the `setfield!` call, we can
safely forward `getfield` even if finalizer elision is tried.

* Avoid `stat`-ing stdlib path if it's unreadable (#55992)

* doc: manual: cmd: fix Markdown in table entry for `--trim` (#55979)

* Avoid conversions to `Float64` in non-literal powers of `Float16` (#55994)

Co-authored-by: Alex Arslan <[email protected]>

* Remove unreachable error branch in memset calls (and in repeat) (#55985)

Some places use the pattern memset(A, v, length(A)), which requires a
conversion UInt(length(A)). This is technically fallible, but can't
actually fail when A is a Memory or Array.
Remove the dead error branch by casting to UInt instead.

Similarly, in repeat(x, r), r is first checked to be nonnegative, then
converted to UInt, then used in multiple calls where it is converted to
UInt each time. Here, only do it once.
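
The difference between the checked conversion and the cast can be seen in isolation. This is only an illustration of the two operations, not the code changed in this PR.

```julia
n = 5
checked = UInt(n)        # checked conversion: throws InexactError if n < 0
unchecked = n % UInt     # wrapping cast: never throws, so there is no error branch

# For a negative value the two differ: the checked form throws, while the cast
# wraps around. The cast is therefore only safe when n is known to be >= 0.
wrapped = (-1) % UInt    # typemax(UInt)
threw = try
    UInt(-1)
    false
catch err
    err isa InexactError
end
```

Since `length(A)` of a `Memory` or `Array` is never negative, the cast gives the same result as the conversion without the dead branch.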

* fix up docstring of `mod` (#56000)

* fix typos (#56008)

these are all in markdown files

Co-authored-by: spaette <[email protected]>

* Vectorise random vectors of `Float16` (#55997)

* Clarify `div` docstring for floating-point input (#55918)

Closes #55837

This is a variant of the warning found in the `fld` docstring clarifying
floating-point behaviour.
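
The kind of surprise such a warning is about can be reproduced with the same inputs the `fld` docstring uses. This is a sketch of the behaviour, not the docstring text itself.

```julia
q = 6.0 / 0.1        # 60.0 after the quotient is rounded to Float64
d = div(6.0, 0.1)    # 59.0, because the Float64 value 0.1 is slightly above 1//10
r = rem(6.0, 0.1)    # just under 0.1 rather than 0.0
```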

* improve getproperty(Pairs) warnings (#55989)

- Only call `depwarn` if the field is `itr` or `data`; otherwise let the field error happen as normal
- Give a more specific deprecation warning.

* Document type-piracy / type-leakage restrictions for `require_stdlib` (#56005)

I was a recent offender in
https://github.com/JuliaLang/Pkg.jl/issues/4017#issuecomment-2377589989

This PR tries to lay down some guidelines for the behavior that stdlibs
and the callers of `require_stdlib` must adhere to in order to avoid
"duplicate stdlib" bugs.

These bugs are particularly nasty because they are experienced
semi-rarely and under pretty specific circumstances (they only occur
when `require_stdlib` loads another copy of a stdlib, often in a
particular order and/or with a particular state of your pre-compile /
loading cache) so they may make it a long way through a pre-release
cycle without an actionable bug report.

* [LinearAlgebra] Remove unreliable doctests (#56011)

The exact textual representation of the output of these doctests depend
on the specific kernel used by the BLAS backend, and can vary between
versions of OpenBLAS (as it did in #41973), or between different CPUs,
which makes these doctests unreliable.

Fix #55998.

* cleanup functions of Hermitian matrices (#55951)

The functions of Hermitian matrices are a bit of a mess. For example, if
we have a Hermitian matrix `a` with negative eigenvalues, `a^0.5`
doesn't produce the `Symmetric` wrapper, but `sqrt(a)` does. On the
other hand, if we have a positive definite `b`, `b^0.5` will be
`Hermitian`, but `sqrt(b)` will be `Symmetric`:
```julia
using LinearAlgebra
a = Hermitian([1.0 2.0;2.0 1.0])
a^0.5
sqrt(a)
b = Hermitian([2.0 1.0; 1.0 2.0])
b^0.5
sqrt(b)
```
This sort of arbitrary assignment of wrappers happens with pretty much
all functions defined there. There are also some oddities, such as `cis`
being the only function defined for `SymTridiagonal`, even though all
`eigen`-based functions work, and `cbrt` being the only function not
defined for complex Hermitian matrices.

I did a cleanup: I defined all functions for `SymTridiagonal` and
`Hermitian{<:Complex}`, and always assigned the appropriate wrapper,
preserving the input one when possible.

There's an inconsistency remaining that I didn't fix, that only `sqrt`
and `log` accept a tolerance argument, as changing that is probably
breaking.

There were also hardly any tests that I could find (only `exp`, `log`,
`cis`, and `sqrt`). I'm happy to add them if it's desired.

* Fix no-arg `ScopedValues.@with` within a scope (#56019)

Fixes https://github.com/JuliaLang/julia/issues/56017

* LinearAlgebra: make matprod_dest public (#55537)

Currently, in a matrix multiplication `A * B`, we use `B` to construct
the destination. However, this may not produce the optimal destination
type, and is essentially single-dispatch. Letting packages specialize
`matprod_dest` would help us obtain the optimal type by dispatching on
both the arguments. This may significantly improve performance in the
matrix multiplication. As an example:
```julia
julia> using LinearAlgebra, FillArrays, SparseArrays

julia> F = Fill(3, 10, 10);

julia> s = sprand(10, 10, 0.1);

julia> @btime $F * $s;
  15.225 μs (10 allocations: 4.14 KiB)

julia> typeof(F * s)
SparseMatrixCSC{Float64, Int64}

julia> nnz(F * s)
80

julia> VERSION
v"1.12.0-DEV.1074"
```
In this case, the destination is a sparse matrix with 80% of its
elements filled and being set one-by-one, which is terrible for
performance. Instead, if we specialize `matprod_dest` to return a dense
destination, we may obtain
```julia
julia> LinearAlgebra.matprod_dest(F::FillArrays.AbstractFill, S::SparseMatrixCSC, ::Type{T}) where {T} = Matrix{T}(undef, size(F,1), size(S,2))

julia> @btime $F * $s;
  754.632 ns (2 allocations: 944 bytes)

julia> typeof(F * s)
Matrix{Float64}
```
Potentially, this may be improved further by specializing `mul!`, but
this is a 20x improvement just by choosing the right destination.

Since this is being made public, we may want to bikeshed on an
appropriate name for the function.

* Sockets: Warn when local network access not granted. (#56023)

Works around https://github.com/JuliaLang/julia/issues/56022

* Update test due to switch to intel syntax by default in #48103 (#55993)

* add require_lock call to maybe_loaded_precompile (#56027)

If we expect this to be a public API
(https://github.com/timholy/Revise.jl for some reason is trying to
access this state), we should lock around it for consistency with the
other similar functions.

Needed for https://github.com/timholy/Revise.jl/pull/856

* fix `power_by_squaring`: use `promote` instead of type inference (#55634)

Fixes #53504

Fixes #55633

* Don't show keymap `@error` for hints (#56041)

It's too disruptive to show errors for hints. The error will still be
shown if tab is pressed.

Helps issues like https://github.com/JuliaLang/julia/issues/56037

* Refactoring to be considered before adding MMTk

* Removing jl_gc_notify_image_load, since it's a new function and not part of the refactoring

* Moving gc_enable code to gc-common.c

* Addressing PR comments

* Push resolution of merge conflict

* Removing jl_gc_mark_queue_obj_explicit extern definition from scheduler.c

* Don't need the getter function since it's possible to use jl_small_typeof directly

* Remove extern from free_stack declaration in julia_internal.h

* Putting everything that is common GC tls into gc-tls-common.h

* Typo

* Adding gc-tls-common.h to Makefile as a public header

* Removing gc-tls-common fields from gc-tls-mmtk.h

* Fix typo in sockets tests. (#56038)

* EA: use `is_mutation_free_argtype` for the escapability check (#56028)

EA has been using `isbitstype` for type-level escapability checks, but a
better criterion (`is_mutation_free`) is available these days, so we
would like to use that instead.

* effects: fix `Base.@_noub_meta` (#56061)

This had the incorrect number of arguments to `Expr(:purity, ...)`
causing it to be silently ignored.

* effects: improve `:noub_if_noinbounds` documentation (#56060)

Just a small touch-up

* Disallow assigning asymmetric values to SymTridiagonal (#56068)

Currently, we can assign an asymmetric value to a `SymTridiagonal`,
which goes against what `setindex!` is expected to do. This is because
`SymTridiagonal` symmetrizes the values along the diagonal, so setting a
diagonal entry to an asymmetric value would lead to a subsequent
`getindex` producing a different result.
```julia
julia> s = SMatrix{2,2}(1:4);

julia> S = SymTridiagonal(fill(s,4), fill(s,3))
4×4 SymTridiagonal{SMatrix{2, 2, Int64, 4}, Vector{SMatrix{2, 2, Int64, 4}}}:
 [1 3; 3 4]  [1 3; 2 4]      ⋅           ⋅     
 [1 2; 3 4]  [1 3; 3 4]  [1 3; 2 4]      ⋅     
     ⋅       [1 2; 3 4]  [1 3; 3 4]  [1 3; 2 4]
     ⋅           ⋅       [1 2; 3 4]  [1 3; 3 4]

julia> S[1,1] = s
2×2 SMatrix{2, 2, Int64, 4} with indices SOneTo(2)×SOneTo(2):
 1  3
 2  4

julia> S[1,1] == s
false

julia> S[1,1]
2×2 Symmetric{Int64, SMatrix{2, 2, Int64, 4}} with indices SOneTo(2)×SOneTo(2):
 1  3
 3  4
```
After this PR,
```julia
julia> S[1,1] = s
ERROR: ArgumentError: cannot set a diagonal entry of a SymTridiagonal to an asymmetric value
```

* Remove unused matrix type params in diag methods (#56048)

These parameters are not used in the method, and are unnecessary for
dispatch.

* LinearAlgebra: diagzero for non-OneTo axes (#55252)

Currently, the off-diagonal zeros for a block-`Diagonal` matrix are
computed using `diagzero`, which calls `zeros` for the sizes of the
elements. This returns an `Array`, unless one specializes `diagzero` for
the custom `Diagonal` matrix type.

This PR defines a `zeroslike` function that dispatches on the axes of
the elements, which lets packages specialize on the axes to return
custom `AbstractArray`s. Choosing to specialize on the `eltype` avoids
the need to specialize on the container, and allows packages to return
appropriate types for custom axis types.

With this,
```julia
julia> LinearAlgebra.zeroslike(::Type{S}, ax::Tuple{SOneTo, Vararg{SOneTo}}) where {S<:SMatrix} = SMatrix{map(length, ax)...}(ntuple(_->zero(eltype(S)), prod(length, ax)))

julia> D = Diagonal(fill(SMatrix{2,3}(1:6), 2))
2×2 Diagonal{SMatrix{2, 3, Int64, 6}, Vector{SMatrix{2, 3, Int64, 6}}}:
 [1 3 5; 2 4 6]        ⋅       
       ⋅         [1 3 5; 2 4 6]

julia> D[1,2] # now an SMatrix
2×3 SMatrix{2, 3, Int64, 6} with indices SOneTo(2)×SOneTo(3):
 0  0  0
 0  0  0

julia> LinearAlgebra.zeroslike(::Type{S}, ax::Tuple{SOneTo, Vararg{SOneTo}}) where {S<:MMatrix} = MMatrix{map(length, ax)...}(ntuple(_->zero(eltype(S)), prod(length, ax)))

julia> D = Diagonal(fill(MMatrix{2,3}(1:6), 2))
2×2 Diagonal{MMatrix{2, 3, Int64, 6}, Vector{MMatrix{2, 3, Int64, 6}}}:
 [1 3 5; 2 4 6]        ⋅       
       ⋅         [1 3 5; 2 4 6]

julia> D[1,2] # now an MMatrix
2×3 MMatrix{2, 3, Int64, 6} with indices SOneTo(2)×SOneTo(3):
 0  0  0
 0  0  0
```
The reason this can't be the default behavior is that we are not
guaranteed that there exists a `similar` method that accepts the
combination of axes. This is why we have to fall back to using the
sizes, unless a specialized method is provided by a package.

One positive outcome of this is that indexing into such a block-diagonal
matrix will now usually be type-stable, which mitigates
https://github.com/JuliaLang/julia/issues/45535 to some extent (although
it doesn't resolve the issue).

I've also updated the `getindex` for `Bidiagonal` to use `diagzero`,
instead of the similarly defined `bidiagzero` function that it was
using. Structured block matrices may now use `diagzero` uniformly to
generate the zero elements.

* Multi-argument `gcdx(a, b, c...)` (#55935)

Previously, `gcdx` only worked for two arguments - but the underlying
idea extends to any (nonzero) number of arguments. Similarly, `gcd`
already works for 1, 2, 3+ arguments.

This PR implements the 1 and 3+ argument versions of `gcdx`, following
the [wiki
page](https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm#The_case_of_more_than_two_numbers)
for the Extended Euclidean algorithm.
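
The reduction to the two-argument case can be sketched as follows. `gcdx_sketch` is a hypothetical helper written for illustration. It is restricted to `Int` and returns the coefficients as a vector, so it is not the implementation or the return shape used in Base.

```julia
# Hypothetical helper, not Base's implementation: fold the two-argument `gcdx`
# over the inputs, rescaling the earlier coefficients at each step.
function gcdx_sketch(a::Int, b::Int, cs::Int...)
    d, u, v = gcdx(a, b)          # d == u*a + v*b
    coeffs = [u, v]
    for c in cs
        d, s, t = gcdx(d, c)      # new d == s*(old d) + t*c
        coeffs .*= s              # fold s into the earlier coefficients
        push!(coeffs, t)
    end
    return d, coeffs
end

d, coeffs = gcdx_sketch(6, 10, 15)
bezout_holds = d == sum(coeffs .* [6, 10, 15])   # Bézout identity
```

Each step only needs the running gcd and one new input, which is why the two-argument routine is enough.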

* Refactoring to be considered before adding MMTk

* Removing jl_gc_notify_image_load, since it's a new function and not part of the refactoring

* Moving gc_enable code to gc-common.c

* Addressing PR comments

* Push resolution of merge conflict

* Removing jl_gc_mark_queue_obj_explicit extern definition from scheduler.c

* Don't need the getter function since it's possible to use jl_small_typeof directly

* Remove extern from free_stack declaration in julia_internal.h

* Putting everything that is common GC tls into gc-tls-common.h

* Typo

* Adding gc-tls-common.h to Makefile as a public header

* Adding jl_full_sweep_reasons since timing.jl depends on it

* Fixing issue with jl_full_sweep_reasons (missing constants)

* fix `_growbeg!` unnecessary resizing (#56029)

This was very explicitly designed such that if there was a bunch of
extra space at the end of the array, we would copy rather than
allocating, but by making `newmemlen` be at least
`overallocation(memlen)` rather than `overallocation(len)`, this branch
was never hit. Found by https://github.com/JuliaLang/julia/issues/56026

* REPL: hide any prints to stdio during `complete_line` (#55959)

* teach llvm-alloc-helpers about `gc_loaded` (#56030)

Combined with https://github.com/JuliaLang/julia/pull/55913, the
compiler is smart enough to fully remove
```julia
function f()
    m = Memory{Int}(undef, 3)
    @inbounds m[1] = 2
    @inbounds m[2] = 2
    @inbounds m[3] = 4
    @inbounds return m[1] + m[2] + m[3]
end
```

* mpfr: prevent changing precision (#56049)

Changing precision requires reallocating the data field, which is better
done by making a new BigFloat (since they are conceptually immutable
anyways). Also do a bit of cleanup while here.

Closes #56044
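
From the user's side, getting a value at a different precision by constructing a new `BigFloat` looks like the sketch below. It uses the public `precision` keyword of the `BigFloat` constructor, and the variable names are illustrative.

```julia
x = BigFloat(1) / 3                  # uses the current default precision
y = BigFloat(x; precision = 64)      # a new value, rounded to 64 bits

px, py = precision(x), precision(y)  # x keeps its precision; y has 64
```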

* stackwalk: fix jl_thread_suspend_and_get_state race (#56047)

There was a missing re-assignment of old = -1; at the end of that loop
which means in the ABA case, we accidentally actually acquire the lock
on the thread despite not actually having stopped the thread; or in the
counter-case, we try to run through this logic with old==-1 on the next
iteration, and that isn't valid either (jl_thread_suspend_and_get_state
should return failure and the loop will abort too early).

Fix #56046

* irrationals: restrict assume effects annotations to known types (#55886)

Other changes:
* replace `:total` with the less powerful `:foldable`
* add an `<:Integer` dispatch constraint on the `rationalize` method,
closes #55872
* replace `Rational{<:Integer}` with just `Rational`, they're equal

Other issues, related to `BigFloat` precision, are still present in
irrationals.jl, to be fixed by followup PRs, including #55853.

Fixes #55874

* update `hash` doc string: `widen` not required any more (#55867)

Implementing `widen` isn't a requirement any more, since #26022.

* Merge `diag` methods for triangular matrices (#56086)

* slightly improve inference in precompilation code (#56084)

Avoids the

```
11: signature Tuple{typeof(convert), Type{String}, Any} triggered MethodInstance for Base.Precompilation.ExplicitEnv(::String) (84 children)
```

shown in
https://github.com/JuliaLang/julia/issues/56080#issuecomment-2404765120

Co-authored-by: KristofferC <[email protected]>

* avoid defining `convert(Vector{String}, ...)` in LibGit2 (#56082)

This is a weird conversion function to define. Seems cleaner to use the
iteration interface for this. Also avoids some invalidations
(https://github.com/JuliaLang/julia/issues/56080#issuecomment-2404765120)

Co-authored-by: KristofferC <[email protected]>

* array: inline `convert` where possible (#56034)

This improves a common scenario, where someone wants to `push!` a
poorly-typed object onto a well-typed Vector.

For example:
```julia
const NT = @NamedTuple{x::Int,y::Any}
foo(v::Vector{NT}, x::Int, @nospecialize(y)) = push!(v, (; x, y))
```

The `(; x, y)` is slightly poorly-typed here. It could have any type for
its `.y` field before it is converted inside the `push!` to a NamedTuple
with `y::Any`

Without this PR, the dispatch for this `push!` cannot be inferred:
```julia
julia> code_typed(foo, (Vector{NT}, Int, Any))[1]
 CodeInfo(
1 ─ ...
│   %4 = %new(%3, x, y)::NamedTuple{(:x, :y), <:Tuple{Int64, Any}}
│   %5 = Main.push!(v, %4)::Vector{@NamedTuple{x::Int64, y}}
└──      return %5
) => Vector{@NamedTuple{x::Int64, y}}
```

With this PR, the above dynamic call is fully statically resolved and
inlined (and therefore `--trim` compatible)

* Remove some unnecessary `real` specializations for structured matrices (#56083)

The `real(::AbstractArray{<:Real})` fallback method should handle these
cases correctly.

* Combine `diag` methods for `SymTridiagonal` (#56014)

Currently, there are two branches, one for an `eltype` that is a
`Number`, and the other that deals with generic `eltype`s. They do
similar things, so we may combine these, and use branches wherever
necessary to retain the performance. We also may replace explicit
materialized arrays by generators in `copyto!`. Overall, this improves
performance in `diag` for matrices of matrices, whereas the performance
in the common case of matrices of numbers remains unchanged.
```julia
julia> using StaticArrays, LinearAlgebra

julia> s = SMatrix{2,2}(1:4);

julia> S = SymTridiagonal(fill(s,100), fill(s,99));

julia> @btime diag($S);
  1.292 μs (5 allocations: 7.16 KiB) # nightly, v"1.12.0-DEV.1317"
  685.012 ns (3 allocations: 3.19 KiB) # This PR
```
This PR also allows computing the `diag` for more values of the band
index `n`:
```julia
julia> diag(S,99)
1-element Vector{SMatrix{2, 2, Int64, 4}}:
 [0 0; 0 0]
```
This would work as long as `getindex` works for the `SymTridiagonal` for
that band, and the zero element may be converted to the `eltype`.

* fix `Vararg{T,T} where T` crashing `code_typed` (#56081)

Not sure this is the right place to fix this error, perhaps
`match.spec_types` should always be a tuple of valid types?

fixes #55916

---------

Co-authored-by: Jameson Nash <[email protected]>

* [libblastrampoline_jll] Upgrade to v5.11.1 (#56094)

v5.11.1 is a patch release with a couple of RISC-V fixes.

* Revert "REPL: hide any prints to stdio during `complete_line`" (#56102)

* Remove warning from c when binding is ambiguous (#56103)

* make `Base.ANSIIterator` have a concrete field (#56088)

Avoids the invalidation

```
   backedges: 1: superseding sizeof(s::AbstractString) @ Base strings/basic.jl:177 with MethodInstance for sizeof(::AbstractString) (75 children)
```

shown in
https://github.com/JuliaLang/julia/issues/56080#issuecomment-2404765120.

Co-authored-by: KristofferC <[email protected]>

* Subtype: some performance tuning. (#56007)

The main motivation of this PR is to fix #55807.
dc689fe8700f70f4a4e2dbaaf270f26b87e79e04 tries to remove the slow
`may_contain_union_decision` check by re-organizing the code path. Now
the fast path has been removed and most of its optimization has been
integrated into the preserved slow path.
Since the slow path stores all inner ∃ decisions on the outermost R
stack, there might be an overflow risk.
aee69a41441b4306ba3ee5e845bc96cb45d9b327 should fix that concern.

The reported MWE now becomes
```julia
  0.000002 seconds
  0.000040 seconds (105 allocations: 4.828 KiB, 52.00% compilation time)
  0.000023 seconds (105 allocations: 4.828 KiB, 49.36% compilation time)
  0.000026 seconds (105 allocations: 4.828 KiB, 50.38% compilation time)
  0.000027 seconds (105 allocations: 4.828 KiB, 54.95% compilation time)
  0.000019 seconds (106 allocations: 4.922 KiB, 49.73% compilation time)
  0.000024 seconds (105 allocations: 4.828 KiB, 52.24% compilation time)
```

Local bench also shows that 72855cd slightly accelerates
`OmniPackage.jl`'s loading
```julia
julia> @time using OmniPackage
# v1.11rc4
 20.525278 seconds (25.36 M allocations: 1.606 GiB, 8.48% gc time, 12.89% compilation time: 77% of which was recompilation)
# v1.11rc4+aee69a4+72855cd 
 19.527871 seconds (24.92 M allocations: 1.593 GiB, 8.88% gc time, 15.13% compilation time: 82% of which was recompilation)
```

* rearrange jl_delete_thread to be thread-safe (#56097)

Prior to this, especially on macOS, the gc-safepoint here would cause
the process to segfault as we had already freed the current_task state.
Rearrange this code so that the GC interactions (except for the atomic
store to current_task) are all handled before entering GC safe, and then
signaling the thread is deleted (via setting current_task = NULL,
published by jl_unlock_profile_wr to other threads) is last.

```
ERROR: Exception handler triggered on unmanaged thread.
Process 53827 stopped
* thread #5, stop reason = EXC_BAD_ACCESS (code=2, address=0x100018008)
    frame #0: 0x0000000100b74344 libjulia-internal.1.12.0.dylib`jl_delete_thread [inlined] jl_gc_state_set(ptls=0x000000011f8b3200, state='\x02', old_state=<unavailable>) at julia_threads.h:272:9 [opt]
   269 	    assert(old_state != JL_GC_CONCURRENT_COLLECTOR_THREAD);
   270 	    jl_atomic_store_release(&ptls->gc_state, state);
   271 	    if (state == JL_GC_STATE_UNSAFE || old_state == JL_GC_STATE_UNSAFE)
-> 272 	        jl_gc_safepoint_(ptls);
   273 	    return old_state;
   274 	}
   275 	STATIC_INLINE int8_t jl_gc_state_save_and_set(jl_ptls_t ptls,
Target 0: (julia) stopped.
(lldb) up
frame #1: 0x0000000100b74320 libjulia-internal.1.12.0.dylib`jl_delete_thread [inlined] jl_gc_state_save_and_set(ptls=0x000000011f8b3200, state='\x02') at julia_threads.h:278:12 [opt]
   275 	STATIC_INLINE int8_t jl_gc_state_save_and_set(jl_ptls_t ptls,
   276 	                                              int8_t state)
   277 	{
-> 278 	    return jl_gc_state_set(ptls, state, jl_atomic_load_relaxed(&ptls->gc_state));
   279 	}
   280 	#ifdef __clang_gcanalyzer__
   281 	// these might not be a safepoint (if they are no-op safe=>safe transitions), but we have to assume it could be (statically)
(lldb)
frame #2: 0x0000000100b7431c libjulia-internal.1.12.0.dylib`jl_delete_thread(value=0x000000011f8b3200) at threading.c:537:11 [opt]
   534 	    ptls->root_task = NULL;
   535 	    jl_free_thread_gc_state(ptls);
   536 	    // then park in safe-region
-> 537 	    (void)jl_gc_safe_enter(ptls);
   538 	}
```

(test incorporated into https://github.com/JuliaLang/julia/pull/55793)

* OpenBLAS: Use dynamic architecture support on AArch64. (#56107)

We already do so on Yggdrasil, so this just makes both source and binary
builds behave similarly.

Closes https://github.com/JuliaLang/julia/issues/56075

* IRShow: label builtin / intrinsic / dynamic calls in `code_typed` (#56036)

This makes it much easier to spot dynamic dispatches

* 🤖 [master] Bump the Pkg stdlib from 51d4910c1 to fbaa2e337 (#56124)

* Fix type instability of closures capturing types (2) (#40985)

Instead of closures lowering to `typeof` for the types of captured
fields, this introduces a new function `_typeof_captured_variable` that
returns `Type{T}` if `T` is a type (w/o free typevars).

- replaces/closes #35970
- fixes #23618
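
As a rough illustration of the kind of closure this change targets, here is a minimal sketch. The name `make_converter` is hypothetical and does not come from the PR. The closure captures the type `T`; the last line inspects the field types of the generated closure struct. What it reports depends on the Julia version, so no expected output is shown.

```julia
# Hypothetical example: a closure that captures a type rather than a value.
make_converter(T) = x -> convert(T, x)

to_float = make_converter(Float64)

# The closure itself behaves the same before and after the change.
result = to_float(1)

# The captured variable is stored as a field of the closure's struct.
# How precisely that field is typed is what the lowering change affects.
captured_field_types = fieldtypes(typeof(to_float))
```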

---------

Co-authored-by: Takafumi Arakaki <[email protected]>
Co-authored-by: Shuhei Kadowaki <[email protected]>

* Remove debug error statement from Makefile. (#56127)

* align markdown table (#56122)

@gbaraldi `#51197`
@spaette `#56008`

Fix an innocuous misalignment of the table after those PRs were merged.

* Improve IOBuffer docs (#56024)

Based on the discussion in #55978, I have tried to clarify the
documentation of `IOBuffer`.

* Comment out url and fix typo in stackwalk.c (#56131)

Introduced in #55623

* libgit2: Always use the bundled PCRE library. (#56129)

This is how Yggdrasil builds the library.

* Update JLL build versions (#56133)

This commit encompasses the following changes:
- Updating the JLL build version for Clang, dSFMT, GMP, LibUV,
LibUnwind, LLD, LLVM, libLLVM, MbedTLS, MPFR, OpenBLAS, OpenLibm, p7zip,
PCRE2, SuiteSparse, and Zlib.
- Updating CompilerSupportLibraries to v1.2.0. The library versions
contained in this release of CSL don't differ from v1.1.1, the only
difference is that v1.2.0 includes FreeBSD AArch64.
- Updating nghttp2 from 1.60.0 to 1.63.0. See
[here](https://github.com/nghttp2/nghttp2/releases) for changes between
these versions.
- Adding `aarch64-unknown-freebsd` to the list of triplets to check when
refreshing checksums.

Note that dependencies that link to MbedTLS (Curl, LibSSH2, LibGit2) are
excluded here. They'll be updated once a resolution is reached for the
OpenSSL switching saga. Once that happens, FreeBSD AArch64 should be
able to be built without any dependency source builds.

* typo in `Compiler.Effects` doc string: `checkbounds` -> `boundscheck` (#56140)

Follows up on #56060

* HISTORY: fix missing links (#56137)

* OpenBLAS: Fix cross-compilation detection for source build. (#56139)

We may be cross-compiling Linux-to-Linux, in which case `BUILD_OS` ==
`OS`, so look at `XC_HOST` to determine whether we're cross compiling.

* `diag` for `BandedMatrix`es for off-limit bands (#56065)

Currently, one can only obtain the `diag` for a `BandedMatrix` (such as
a `Diagonal`) when the band index is bounded by the size of the matrix.
This PR relaxes this requirement to match the behavior for arrays, where
`diag` returns an empty vector for a large band index instead of
throwing an error.
```julia
julia> D = Diagonal(ones(4))
4×4 Diagonal{Float64, Vector{Float64}}:
 1.0   ⋅    ⋅    ⋅ 
  ⋅   1.0   ⋅    ⋅ 
  ⋅    ⋅   1.0   ⋅ 
  ⋅    ⋅    ⋅   1.0

julia> diag(D, 10)
Float64[]

julia> diag(Array(D), 10)
Float64[]
```
Something similar for `SymTridiagonal` is being done in
https://github.com/JuliaLang/julia/pull/56014

* Port progress bar improvements from Pkg (#56125)

Includes changes from https://github.com/JuliaLang/Pkg.jl/pull/4038 and
https://github.com/JuliaLang/Pkg.jl/pull/4044.

Co-authored-by: Kristoffer Carlsson <[email protected]>

* Add support for LLVM 19 (#55650)

Co-authored-by: Zentrik <[email protected]>

* 🤖 [master] Bump the Pkg stdlib from fbaa2e337 to 27c1b1ee5 (#56146)

* HISTORY entry for deletion of `length(::Stateful)` (#55861)

xref #47790

xref #51747

xref #54953

xref #55858

* ntuple: ensure eltype is always `Int` (#55901)

Fixes #55790

* Improve remarks of the alloc opt pass slightly. (#55995)

The `Value` printer that LLVM uses prints only the kind of instruction,
so the remark just shows `call`.

---------

Co-authored-by: Oscar Smith <[email protected]>

* Implement Base.fd() for TCPSocket, UDPSocket, and TCPServer (#53721)

This is quite handy if you want to pass off the file descriptor to a C
library. I also added a warning to the `fd()` docstring to warn folks
about duplicating the file descriptor first.

* Fix `JULIA_CPU_TARGET` being propagated to workers precompiling stdlib pkgimages (#54093)

Apparently (thanks ChatGPT) each line in a makefile is executed in a
separate shell so adding an `export` line on one line does not propagate
to the next line.

* Merge tr methods for triangular matrices (#56154)

Since the methods do identical things, we don't need multiple of these.

* Reduce duplication in triangular indexing methods (#56152)

This uses an orthogonal design to reduce code duplication in the
indexing methods for triangular matrices.

* update LLVM docs (#56162)

Dump with `raw=true` so you don't get random errors, and show how to run
single modules.

---------

Co-authored-by: Valentin Churavy <[email protected]>
Co-authored-by: Mosè Giordano <[email protected]>
Co-authored-by: Jameson Nash <[email protected]>

* Fix zero elements for block-matrix kron involving Diagonal (#55941)

Currently, it's assumed that the zero element is identical for the
matrix, but this is not necessary if the elements are matrices
themselves and have different sizes. This PR ensures that `kron` for a
`Diagonal` has the correct zero elements.
Current:
```julia
julia> D = Diagonal(1:2)
2×2 Diagonal{Int64, UnitRange{Int64}}:
 1  ⋅
 ⋅  2

julia> B = reshape([ones(2,2), ones(3,2), ones(2,3), ones(3,3)], 2, 2);

julia> size.(kron(D, B))
4×4 Matrix{Tuple{Int64, Int64}}:
 (2, 2)  (2, 3)  (2, 2)  (2, 2)
 (3, 2)  (3, 3)  (2, 2)  (2, 2)
 (2, 2)  (2, 2)  (2, 2)  (2, 3)
 (2, 2)  (2, 2)  (3, 2)  (3, 3)
``` 
This PR
```julia
julia> size.(kron(D, B))
4×4 Matrix{Tuple{Int64, Int64}}:
 (2, 2)  (2, 3)  (2, 2)  (2, 3)
 (3, 2)  (3, 3)  (3, 2)  (3, 3)
 (2, 2)  (2, 3)  (2, 2)  (2, 3)
 (3, 2)  (3, 3)  (3, 2)  (3, 3)
```
Note the differences e.g. in the `CartesianIndex(4,1)`,
`CartesianIndex(3,2)` and `CartesianIndex(3,3)` elements.

* Call `MulAddMul` instead of multiplication in _generic_matmatmul! (#56089)

Fix https://github.com/JuliaLang/julia/issues/56085 by calling a newly
created `MulAddMul` object that only wraps the `alpha` (with `beta` set
to `false`). This avoids the explicit multiplication if `alpha` is known
to be `isone`.
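
For context, the public five-argument `mul!(C, A, B, alpha, beta)` computes `A*B*alpha + C*beta` in place. The sketch below uses only that public API (the internal `MulAddMul` object is not touched directly) and passes `beta = false`, so the previous contents of `C` are discarded:

```julia
using LinearAlgebra

A = [1.0 2.0; 3.0 4.0]
B = [0.0 1.0; 1.0 0.0]
C = ones(2, 2)          # previous contents, ignored because beta == false

# In-place C = A*B*alpha + C*beta with alpha = 2 and beta = false.
mul!(C, A, B, 2, false)

expected = 2 .* (A * B)
```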

* improve `allunique`'s type stability (#56161)

Caught by https://github.com/aviatesk/JET.jl/issues/667.

* Add invalidation barriers for `displaysize` and `implicit_typeinfo` (#56159)

These are invalidated by our own stdlibs (Dates and REPL) unfortunately
so we need to put this barrier in.

This fix is _very_ un-satisfying, because it doesn't do anything to
solve this problem for downstream libraries that use e.g. `displaysize`.
To fix that, I think we need a way to make sure callers get these
invalidation barriers by default...

* Fix markdown list in installation.md (#56165)

Documenter.jl requires all trailing list content to follow the same
indentation as the header. So, in the current view
(https://docs.julialang.org/en/v1/manual/installation/#Command-line-arguments)
the list appears broken.

* [Random] Add more comments and a helper function in Xoshiro code (#56144)

Follow up to #55994 and #55997. This should basically be a
non-functional change and I see no performance difference, but the
comments and the definition of a helper function should make the code
easier to follow (I initially struggled in #55997) and extend to other
types.

* add objects to concisely specify initialization

- PerProcess: once per process
- PerThread: once per thread id
- PerTask: once per task object

* add precompile support for recording fields to change

Somewhat generalizes our support for changing Ptr to C_NULL. Not
particularly fast, since it is just using the builtins implementation of
setfield, and delaying the actual stores, but it should suffice.

* improve OncePer implementation

Address reviewer feedback, add more fixes and more tests,
rename to add Once prefix.
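
A minimal usage sketch, assuming the renamed object is available as `Base.OncePerProcess` (that name is inferred from the "Once prefix" rename and may differ on a given build). The code is guarded so it does nothing on Julia versions without the feature; `get_table` and `init_calls` are hypothetical names.

```julia
# Sketch only: guarded so it is a no-op on Julia versions without this feature.
init_calls = Ref(0)

if isdefined(Base, :OncePerProcess)
    # Hypothetical lazily initialized table; the initializer should run once.
    get_table = Base.OncePerProcess{Vector{Int}}() do
        init_calls[] += 1
        collect(1:3)
    end

    first_call = get_table()
    second_call = get_table()   # reuses the cached value
end
```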

* fix use-after-free in test (detected in win32 CI)

* Make loading work when stdlib deps are missing in the manifest (#56148)

Closes https://github.com/JuliaLang/julia/issues/56109 

Simulating a bad manifest by having `LibGit2_jll` missing as a dep of
`LibGit2` in my default env, say because the manifest was generated by a
different julia version or different master julia commit.

## This PR, it just works
```
julia> using Revise

julia>
```
i.e.
```
% JULIA_DEBUG=loading ./julia --startup-file=no
julia> using Revise
...
┌ Debug: Stdlib LibGit2 [76f85450-5226-5b5a-8eaa-529ad045b433] is trying to load `LibGit2_jll`
│ which is not listed as a dep in the load path manifests, so resorting to search
│ in the stdlib Project.tomls for true deps
└ @ Base loading.jl:387
┌ Debug: LibGit2 [76f85450-5226-5b5a-8eaa-529ad045b433] indeed depends on LibGit2_jll in project /Users/ian/Documents/GitHub/julia/usr/share/julia/stdlib/v1.12/LibGit2/Project.toml
└ @ Base loading.jl:395
...

julia>
```

## Master
```
julia> using Revise
Info Given Revise was explicitly requested, output will be shown live
ERROR: LoadError: ArgumentError: Package LibGit2 does not have LibGit2_jll in its dependencies:
- Note that the following manifests in the load path were resolved with a potentially
  different DEV version of the current version, which may be the cause of the error.
  Try to re-resolve them in the current version, or consider deleting them if that fails:
    /Users/ian/.julia/environments/v1.12/Manifest.toml
- You may have a partially installed environment. Try `Pkg.instantiate()`
  to ensure all packages in the environment are installed.
- Or, if you have LibGit2 checked out for development and have
  added LibGit2_jll as a dependency but haven't updated your primary
  environment's manifest file, try `Pkg.resolve()`.
- Otherwise you may need to report an issue with LibGit2
...
```

* Remove llvm-muladd pass and move its functionality to llvm-simdloop (#55802)

Closes https://github.com/JuliaLang/julia/issues/55785

I'm not sure whether we want to backport this as is, because it removes
some functionality (the pass itself), so LLVM.jl and friends might need
annoying version-specific code. For a backport, we could keep the code
and just not run the pass.

* Fix implicit `convert(String, ...)` in several places (#56174)

This removes several `convert(String, ...)` from this code, which really
shouldn't be something we invalidate on in the first place (see
https://github.com/JuliaLang/julia/issues/56173) but this is still an
improvement in code quality so let's take it.

* Change annotations to use a NamedTuple (#55741)

Due to popular demand, the type of annotations is to be changed from a
`Tuple{UnitRange{Int}, Pair{Symbol, Any}}` to a `NamedTuple{(:region,
:label, :value), Tuple{UnitRange{Int}, Symbol,
Any}}`.

This requires the expected code churn to `strings/annotated.jl`, and
some changes to the StyledStrings and JuliaSyntaxHighlighting libraries.

Closes #55249 and closes #55245.
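
To make the shape change concrete, here is a small sketch built only from the two types quoted above. The alias names `OldAnnotation` and `NewAnnotation` and the helper `to_new` are hypothetical and are not part of `strings/annotated.jl`.

```julia
# Type aliases written out from the description above (alias names are hypothetical).
const OldAnnotation = Tuple{UnitRange{Int}, Pair{Symbol, Any}}
const NewAnnotation = NamedTuple{(:region, :label, :value),
                                 Tuple{UnitRange{Int}, Symbol, Any}}

# Hypothetical helper converting the old representation to the new one.
to_new(a::OldAnnotation) = NewAnnotation((a[1], a[2].first, a[2].second))

old = (1:4, Pair{Symbol, Any}(:face, :bold))
new = to_new(old)

# Fields are now accessed by name instead of by position.
new.region, new.label, new.value
```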

* Getting rid of mmtk_julia.c in the binding and moving it to gc-mmtk.c

* Trying to organize and label the code in gc-mmtk.c

* Remove redundant `convert` in `_setindex!` (#56178)

Follow up to #56034, ref:
https://github.com/JuliaLang/julia/pull/56034#discussion_r1798573573.

---------

Co-authored-by: Cody Tapscott <[email protected]>

* Improve type inference of Artifacts.jl (#56118)

This also includes some changes that, together with
https://github.com/JuliaPackaging/JLLWrappers.jl/commit/45cc04963f3c99d4eb902f97528fe16fc37002cc,
move the platform selection to compile time.

(this helps juliac a ton)

* Initial support for RISC-V (#56105)

Rebase and extension of @alexfanqi's initial work on porting Julia to
RISC-V. Requires LLVM 19.

Tested on a VisionFive2, built with:

```make
MARCH := rv64gc_zba_zbb
MCPU := sifive-u74

USE_BINARYBUILDER:=0

DEPS_GIT = llvm
override LLVM_VER=19.1.1
override LLVM_BRANCH=julia-release/19.x
override LLVM_SHA1=julia-release/19.x
```

```julia-repl
❯ ./julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.12.0-DEV.1374 (2024-10-14)
 _/ |\__'_|_|_|\__'_|  |  riscv/25092a3982* (fork: 1 commits, 0 days)
|__/                   |

julia> versioninfo(; verbose=true)
Julia Version 1.12.0-DEV.1374
Commit 25092a3982* (2024-10-14 09:57 UTC)
Platform Info:
  OS: Linux (riscv64-unknown-linux-gnu)
  uname: Linux 6.11.3-1-riscv64 #1 SMP Debian 6.11.3-1 (2024-10-10) riscv64 unknown
  CPU: …
```
vtjnash added a commit that referenced this pull request Oct 31, 2024
Hopefully there aren't any others like this hiding around? Not useful to
make a new closure for every method that we inline, since we just called
`===` inside it
udesou added a commit to mmtk/julia that referenced this pull request Dec 6, 2024
* Implement faster `issubset` for `CartesianIndices{N}` (#56282)

Co-authored-by: xili <[email protected]>

* Improve doc example: Extracting the type parameter from a super-type (#55983)

Documentation describes the correct way of extracting the element type
of a supertype:

https://docs.julialang.org/en/v1/manual/methods/#Extracting-the-type-parameter-from-a-super-type

However, one of the examples to showcase this is nonsensical since it is
a union of multiple element types.
I have replaced this example with a union over the dimension.
Now, the `eltype_wrong` function still gives a similar error, yet the
correct way returns the unambiguous answer.

---------

Co-authored-by: Lilith Orion Hafner <[email protected]>

* llvmpasses: force vector width for compatibility with non-x86 hosts. (#56300)

The pipeline-prints test currently fails when running on an
aarch64-macos device:

```
/Users/tim/Julia/src/julia/test/llvmpasses/pipeline-prints.ll:309:23: error: AFTERVECTORIZATION: expected string not found in input
; AFTERVECTORIZATION: vector.body
                      ^
<stdin>:2:40: note: scanning from here
; *** IR Dump Before AfterVectorizationMarkerPass on julia_f_199 ***
                                       ^
<stdin>:47:27: note: possible intended match here
; *** IR Dump Before AfterVectorizationMarkerPass on jfptr_f_200 ***
                          ^

Input file: <stdin>
Check file: /Users/tim/Julia/src/julia/test/llvmpasses/pipeline-prints.ll

-dump-input=help explains the following input dump.

Input was:
<<<<<<
             1: opt: WARNING: failed to create target machine for 'x86_64-unknown-linux-gnu': unable to get target for 'x86_64-unknown-linux-gnu', see --version and --triple.
             2: ; *** IR Dump Before AfterVectorizationMarkerPass on julia_f_199 ***
check:309'0                                            X~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ error: no match found
             3: define i64 @julia_f_199(ptr addrspace(10) noundef nonnull align 16 dereferenceable(40) %0) #0 !dbg !4 {
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             4: top:
check:309'0     ~~~~~
             5:  %1 = call ptr @julia.get_pgcstack()
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             6:  %ptls_field = getelementptr inbounds ptr, ptr %1, i64 2
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             7:  %ptls_load45 = load ptr, ptr %ptls_field, align 8, !tbaa !8
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             .
             .
             .
            42:
check:309'0     ~
            43: L41: ; preds = %L41.loopexit, %L17, %top
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            44:  %value_phi10 = phi i64 [ 0, %top ], [ %7, %L17 ], [ %.lcssa, %L41.loopexit ]
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            45:  ret i64 %value_phi10, !dbg !52
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            46: }
check:309'0     ~~
            47: ; *** IR Dump Before AfterVectorizationMarkerPass on jfptr_f_200 ***
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
check:309'1                               ?                                           possible intended match
            48: ; Function Attrs: noinline optnone
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            49: define nonnull ptr addrspace(10) @jfptr_f_200(ptr addrspace(10) %0, ptr noalias nocapture noundef readonly %1, i32 %2) #1 {
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            50: top:
check:309'0     ~~~~~
            51:  %3 = call ptr @julia.get_pgcstack()
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
            52:  %4 = getelementptr inbounds ptr addrspace(10), ptr %1, i32 0
check:309'0     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
             .
             .
             .
>>>>>>

--

********************
Failed Tests (1):
  Julia :: pipeline-prints.ll
```

The problem is that these tests assume x86_64, which fails because the
target isn't available, so it presumably uses the native target which
has different vectorization characteristics:

```
❯ ./usr/tools/opt --load-pass-plugin=libjulia-codegen.dylib -passes='julia' --print-before=AfterVectorization -o /dev/null ../../test/llvmpasses/pipeline-prints.ll
./usr/tools/opt: WARNING: failed to create target machine for 'x86_64-unknown-linux-gnu': unable to get target for 'x86_64-unknown-linux-gnu', see --version and --triple.
```

There are other tests that assume this (e.g. the `fma` cpufeatures one),
but they don't fail, so I've left them as is.

* Reduce generic matrix*vector latency (#56289)

```julia
julia> using LinearAlgebra

julia> A = rand(Int,4,4); x = rand(Int,4); y = similar(x);

julia> @time mul!(y, A, x, 2, 2);
  0.330489 seconds (792.22 k allocations: 41.519 MiB, 8.75% gc time, 99.99% compilation time) # master
  0.134212 seconds (339.89 k allocations: 17.103 MiB, 15.23% gc time, 99.98% compilation time) # This PR
```
Main changes:
- `generic_matvecmul!` and `_generic_matvecmul!` now accept `alpha` and
`beta` arguments instead of `MulAddMul(alpha, beta)`. The methods that
accept a `MulAddMul(alpha, beta)` are also retained for backward
compatibility, but these now forward `alpha` and `beta`, instead of the
other way around.
- Narrow the scope of the `@stable_muladdmul` applications. We now
construct the `MulAddMul(alpha, beta)` object only where it is needed in
a function call, and we annotate the call site with `@stable_muladdmul`.
This leads to smaller branches.
- Create a new internal function with methods for the `'N'`, `'T'` and
`'C'` cases, so that firstly, there's less code duplication, and
secondly, the `_generic_matvecmul!` method is now simple enough to
enable constant propagation. This eliminates the unnecessary branches,
and only the one that is taken is compiled.

Together, this reduces the TTFX substantially.

* Type `Base.is_interactive` as `Bool` (#56303)

Before, typing `Base.is_interactive = 7` would cause weird internal REPL
failures down the line. Now, it throws an InexactError and has no
impact.
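
The mechanism can be sketched with an ordinary typed global (Julia 1.8 or later). `demo_is_interactive` and `set_demo!` are hypothetical stand-ins and do not touch `Base.is_interactive` itself.

```julia
# Hypothetical typed global standing in for `Base.is_interactive`.
global demo_is_interactive::Bool = false

set_demo!(x) = (global demo_is_interactive = x; nothing)

set_demo!(true)            # fine: already a Bool

# Assignment converts to the declared type; 7 cannot be converted exactly to Bool.
err = try
    set_demo!(7)
    nothing
catch e
    e
end
```

The failed assignment leaves the previous value in place, which matches the "has no impact" behaviour described above.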

* REPL: don't complete str and cmd macros when the input matches the internal name like `r_` to `r"` (#56254)

* fix REPL test if a "juliadev" directory exists in home (#56218)

* Fix trampoline warning on x86 as well (#56280)

* typeintersect: more fastpath to skip intersect under circular env (#56304)

fix #56040

* Preserve type in `first` for `OneTo` (#56263)

With this PR,
```julia
julia> first(Base.OneTo(10), 4)
Base.OneTo(4)
```
Previously, this would have used indexing to return a `UnitRange`. This
is probably the only way to slice a `Base.OneTo` and obtain a
`Base.OneTo` back.

* Matmul: dispatch on specific blas paths using an enum  (#55002)

This expands on the approach taken by
https://github.com/JuliaLang/julia/pull/54552.

We pass on more type information to `generic_matmatmul_wrapper!`, which
lets us convert the branches to method dispatches. This helps spread the
latency around, so that instead of compiling all the branches in the
first call, we now compile the branches only when they are actually
taken. While this reduces the latency in individual branches, there is
no reduction in latency if all the branches are reachable.
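
The general idea of turning branches into method dispatch can be sketched as follows. This is an illustration only: the enum `MulPath`, the `kernel` methods, and `choose_path` are hypothetical and much simpler than what `generic_matmatmul_wrapper!` does.

```julia
# Hypothetical enum and kernels, for illustration only.
@enum MulPath SymmetricPath GeneralPath

# Each path is its own method, so it is compiled only when first called.
kernel(::Val{SymmetricPath}, A, B) = :symmetric
kernel(::Val{GeneralPath}, A, B) = :general

# The choice is made once at runtime, then forwarded as a type via `Val`.
choose_path(A, B) = B === A ? SymmetricPath : GeneralPath
multiply(A, B) = kernel(Val(choose_path(A, B)), A, B)

M = rand(2, 2)
same_operand = multiply(M, M)
different_operand = multiply(M, copy(M))
```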

```julia
julia> A = rand(2,2);

julia> @time A * A;
  0.479805 seconds (809.66 k allocations: 40.764 MiB, 99.93% compilation time) # 1.12.0-DEV.806
  0.346739 seconds (633.17 k allocations: 31.320 MiB, 99.90% compilation time) # This PR

julia> @time A * A';
  0.030413 seconds (101.98 k allocations: 5.359 MiB, 98.54% compilation time) # v1.12.0-DEV.806
  0.148118 seconds (219.51 k allocations: 11.652 MiB, 99.72% compilation time) # This PR
```
The latency is spread between the two calls here.

In fresh sessions:
```julia
julia> A = rand(2,2);

julia> @time A * A';
  0.473630 seconds (825.65 k allocations: 41.554 MiB, 99.91% compilation time) # v1.12.0-DEV.806
  0.490305 seconds (774.87 k allocations: 38.824 MiB, 99.90% compilation time) # This PR
```
In this case, both the `syrk` and `gemm` branches are reachable, so
there is no reduction in latency.

Analogously, there is a reduction in latency in the second set of matrix
multiplications where we call `symm!/hemm!` or `_generic_matmatmul`:

```julia
julia> using LinearAlgebra

julia> A = rand(2,2);

julia> @time Symmetric(A) * A;
  0.711178 seconds (2.06 M allocations: 103.878 MiB, 2.20% gc time, 99.98% compilation time) # v1.12.0-DEV.806
  0.540669 seconds (904.12 k allocations: 43.576 MiB, 2.60% gc time, 97.36% compilation time) # This PR
```

* Scaling `mul!` for generic `AbstractArray`s (#56313)

This improves performance in the scaling `mul!` for `StridedArray`s by
using loops instead of broadcasting.
```julia
julia> using LinearAlgebra

julia> A = zeros(200,200); C = similar(A);

julia> @btime mul!($C, $A, 1, 2, 2);
  19.180 μs (0 allocations: 0 bytes) # nightly v"1.12.0-DEV.1479"
  11.361 μs (0 allocations: 0 bytes) # This PR
```
The latency is reduced as well for the same reason.
```julia
julia> using LinearAlgebra

julia> A = zeros(2,2); C = similar(A);

julia> @time mul!(C, A, 1, 2, 2);
  0.203034 seconds (522.94 k allocations: 27.011 MiB, 14.95% gc time, 99.97% compilation time) # nightly
  0.034713 seconds (59.16 k allocations: 2.962 MiB, 99.91% compilation time) # This PR
```
Thirdly, I've replaced the `.*ₛ` calls by explicit branches. This fixes
the following:
```julia
julia> A = [zeros(2), zeros(2)]; C = similar(A);

julia> mul!(C, A, 1)
ERROR: MethodError: no method matching +(::Vector{Float64}, ::Bool)
```
After this,
```julia
julia> mul!(C, A, 1)
2-element Vector{Vector{Float64}}:
 [0.0, 0.0]
 [0.0, 0.0]
```
Also, I've added `@stable_muladdmul` annotations to the `generic_mul!`
call, but moved it within the loop to narrow its scope. This doesn't
increase the latency, while making the call type-stable.

```julia
julia> D = Diagonal(1:2); C = similar(D);

julia> @time mul!(C, D, 1, 2, 2);
  0.248385 seconds (898.18 k allocations: 47.027 MiB, 12.30% gc time, 99.96% compilation time) # nightly
  0.249940 seconds (919.80 k allocations: 49.128 MiB, 11.36% gc time, 99.99% compilation time) # This PR
```

* InteractiveUtils.jl: fixes issue where subtypes resolves bindings and causes deprecation warnings  (#56306)

The current version of `subtypes` will emit deprecation warnings even if
no one is using the deprecated bindings.

A similar bug was fixed in Aqua.jl -
https://github.com/JuliaTesting/Aqua.jl/pull/89/files

See discussion here: 

- https://github.com/JuliaIO/ImageMagick.jl/issues/235 (for identifying
the problem)
- https://github.com/simonster/Reexport.jl/issues/42 (for pointing to
the issue in Aqua.jl)
- https://github.com/JuliaTesting/Aqua.jl/pull/89/files (for the fix in
Aqua.jl)

This adds the `isbindingresolved` test to the `subtypes` function to
avoid throwing deprecation warnings. It also adds a test to check that
this doesn't happen.

---

On the current master branch (before the fix), the added test shows: 
 
```
WARNING: using deprecated binding InternalModule.MyOldType in OuterModule.
, use MyType instead.
Subtypes and deprecations: Test Failed at /home/dgleich/devextern/julia/usr/share/julia/stdlib/v1.12/Test/src/Test.jl:932
  Expression: isempty(stderr_content)
   Evaluated: isempty("WARNING: using deprecated binding InternalModule.MyOldType in OuterModule.\n, use MyType instead.\n")
Test Summary:             | Fail  Total  Time
Subtypes and deprecations |    1      1  2.8s
ERROR: LoadError: Some tests did not pass: 0 passed, 1 failed, 0 errored, 0 broken.
in expression starting at /home/dgleich/devextern/julia/stdlib/InteractiveUtils/test/runtests.jl:841
ERROR: Package InteractiveUtils errored during testing
```

---

Using the results of this pull request:

```
@test_nowarn subtypes(Integer);
```

passes without error. The other tests pass too.

* [CRC32c] Support AbstractVector{UInt8} as input (#56164)

This is a similar PR to https://github.com/JuliaIO/CRC32.jl/pull/12

I added a generic fallback method for `AbstractVector{UInt8}` similar to
the existing generic `IO` method.
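
A hedged usage sketch: `codeunits` returns an `AbstractVector{UInt8}` that is not an `Array`, so it should exercise the generic fallback where it exists. The call is guarded with `applicable` because older Julia versions may not have that method.

```julia
using CRC32c

bytes = codeunits("123456789")      # an AbstractVector{UInt8} that is not an Array
reference = crc32c("123456789")     # String method, available on all versions

# Guarded: the generic method may not exist on older Julia versions.
generic = applicable(crc32c, bytes) ? crc32c(bytes) : nothing
```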

Co-authored-by: Steven G. Johnson <[email protected]>

* Put `jl_gc_new_weakref` in a header file again (#56319)

* use textwidth for string display truncation (#55442)

It makes a big difference when displaying strings that have width-2 or
width-0 characters.
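
The difference between character count and display width is easy to see with `length` and `textwidth`:

```julia
wide = "日本語"          # three characters, each two columns wide
combining = "e\u0301"    # 'e' followed by a zero-width combining accent

# Character count vs. number of terminal columns.
wide_counts = (length(wide), textwidth(wide))
combining_counts = (length(combining), textwidth(combining))
```

Truncating by `length` would therefore cut the first string too late and the second too early.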

* Use `pwd()` as the default directory to walk in `walkdir` (#55550)

* Reset mtime of BOLTed files to prevent make rebuilding targets (#55587)

This simplifies the `finish_stage` rule.

Co-authored-by: Zentrik <[email protected]>

* add docstring note about `displaysize` and `IOContext` with `context` (#55510)

* LinearAlgebra: replace some hardcoded loop ranges with axes (#56243)

These are safer in general, as well as easier to read.

Also, narrow the scopes of some `@inbounds` annotations.
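
As a small sketch of the pattern (the function `column_sums` is hypothetical and not taken from the PR), iterating over `axes` avoids assuming that indices start at 1:

```julia
# Hypothetical helper: sum each column using `axes` instead of `1:size(A, 2)`.
function column_sums(A::AbstractMatrix)
    sums = zeros(eltype(A), size(A, 2))
    for (k, j) in enumerate(axes(A, 2))
        for i in axes(A, 1)
            sums[k] += A[i, j]
        end
    end
    return sums
end

sums = column_sums([1 2; 3 4])
```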

* inference: fix `[modifyfield!|replacefield!]_tfunc`s (#56310)

Currently the following code snippet results in an internal error:
```julia
julia> func(x) = @atomic :monotonic x[].count += 1;

julia> let;Base.Experimental.@force_compile
           x = Ref(nothing)
           func(x)
       end
Internal error: during type inference of
...
```

This issue is caused by the incorrect use of `_fieldtype_tfunc(𝕃, o, f)`
within `modifyfield!_tfunc`, specifically because `o` should be
`widenconst`ed, but it isn’t. By using `_fieldtype_tfunc` correctly, we
can avoid the error through error-catching in `abstract_modifyop!`. This
commit also includes a similar fix for `replacefield!_tfunc` as well.

* inference: don't allow `SSAValue`s in assignment lhs (#56314)

In `InferenceState` the lhs of a `:=` expression should only contain
`GlobalRef` or `SlotNumber` and no other IR elements. Currently when
`SSAValue` appears in `lhs`, the invalid assignment effect is somehow
ignored, but this is incorrect anyway, so this commit removes that
check. Since `SSAValue` should not appear in `lhs` in the first place,
this is not a significant change though.

* Fix `unsafe_read` for `IOBuffer` with non dense data (#55776)

Fixes one part of #54636 

It was only safe to use the following if `from.data` was a dense vector
of bytes.
```julia
GC.@preserve from unsafe_copyto!(p, pointer(from.data, from.ptr), adv)
```

This PR adds a fallback suggested by @matthias314 in
https://discourse.julialang.org/t/copying-bytes-from-abstractvector-to-ptr/119408/7

* support `isless` for zero-dimensional `AbstractArray`s (#55772)

Fixes #55771

* inference: don't add backedge when `applicable` inferred to return `Bool` (#56316)

Also just as a minor backedge reduction optimization, this commit avoids
adding backedges when `applicable` is inferred to return `::Bool`.

* Mark `require_one_based_indexing` and `has_offset_axes` as public (#56196)

The discussion here mentions `require_one_based_indexing` being part of
the public API: https://github.com/JuliaLang/julia/pull/43263

Both functions are also documented (albeit in the dev docs): 
* `require_one_based_indexing`:
https://docs.julialang.org/en/v1/devdocs/offset-arrays/#man-custom-indices
* `has_offset_axes`:
https://docs.julialang.org/en/v1/devdocs/offset-arrays/#For-objects-that-mimic-AbstractArray-but-are-not-subtypes

Towards https://github.com/JuliaLang/julia/issues/51335.

---------

Co-authored-by: Matt Bauman <[email protected]>

* Avoid some allocations in various `println` methods (#56308)

* Add a developer documentation section to the `LinearAlgebra` docs (#56324)

Functions that are meant for package developers may go here, instead of
the main section that is primarily for users.

* drop require lock when not needed during loading to allow parallel precompile loading (#56291)

Fixes `_require_search_from_serialized` to first acquire all
start_loading locks (using a deadlock-free batch-locking algorithm)
before doing stalechecks and the rest, so that all the global
computations happen behind the require_lock, then the rest can happen
behind module-specific locks, then (as before) extensions can be loaded
in parallel eventually after `require` returns.

* Make `String(::Memory)` copy (#54457)

A more targeted fix of #54369 than #54372

Preserves the performance improvements added in #53962 by creating a new
internal `_unsafe_takestring!(v::Memory{UInt8})` function that does what
`String(::Memory{UInt8})` used to do.

* 🤖 [master] Bump the Pkg stdlib from 799dc2d54 to 116ba910c (#56336)

Stdlib: Pkg
URL: https://github.com/JuliaLang/Pkg.jl.git
Stdlib branch: master
Julia branch: master
Old commit: 799dc2d54
New commit: 116ba910c
Julia version: 1.12.0-DEV
Pkg version: 1.12.0
Bump invoked by: @IanButterworth
Powered by:
[BumpStdlibs.jl](https://github.com/JuliaLang/BumpStdlibs.jl)

Diff:
https://github.com/JuliaLang/Pkg.jl/compare/799dc2d54c4e809b9779de8c604564a5b3befaa0...116ba910c74ab565d348aa8a50d6dd10148f11ab

```
$ git log --oneline 799dc2d54..116ba910c
116ba910c fix Base.unreference_module call (#4057)
6ed1d2f40 do not show right hand progress without colors (#4047)
```

Co-authored-by: Dilum Aluthge <[email protected]>

* Wall-time/all tasks profiler (#55889)

One limitation of sampling CPU/thread profiles, as is currently done in
Julia, is that they primarily capture samples from CPU-intensive tasks.

If many tasks are performing IO or contending for concurrency primitives
like semaphores, these tasks won’t appear in the profile, as they aren't
scheduled on OS threads sampled by the profiler.

A wall-time profiler, like the one implemented in this PR, samples tasks
regardless of OS thread scheduling. This enables profiling of IO-heavy
tasks and detecting areas of heavy contention in the system.

Co-developed with @nickrobinson251.

* recommend explicit `using Foo: Foo, ...` in package code (was: "using considered harmful") (#42080)

I feel we are heading up against a "`using` crisis" where any new
feature that is implemented by exporting a new name (either in Base or a
package) becomes a breaking change. This is already happening
(https://github.com/JuliaGPU/CUDA.jl/pull/1097,
https://github.com/JuliaWeb/HTTP.jl/pull/745) and as projects get bigger
and more names are exported, the likelihood of this rapidly increases.

The flaw in `using Foo` is fundamental in that you cannot lexically see
where a name comes from so when two packages export the same name, you
are screwed. Any code that relies on `using Foo` and then using an
exported name from `Foo` is vulnerable to another dependency exporting
the same name.
Therefore, I think we should start to strongly discourage the use of
`using Foo` and only recommend `using Foo` for ephemeral work (e.g. REPL
work).
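
A minimal sketch of the recommended style, assuming a hypothetical package module `Demo` that depends on `LinearAlgebra`. Every name used unqualified is listed in the `using` line, and everything else goes through the module name, so a new export elsewhere cannot silently collide.

```julia
module Demo  # hypothetical package module, for illustration only

# Name the module itself plus exactly the names that are used unqualified.
using LinearAlgebra: LinearAlgebra, norm, dot

f(v) = norm(v) + dot(v, v)

# Anything not listed above is reached through the module name.
g(A) = LinearAlgebra.det(A)

end # module

Demo.f([3.0, 4.0])   # norm is 5, dot is 25
```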

---------

Co-authored-by: Dilum Aluthge <[email protected]>
Co-authored-by: Mason Protter <[email protected]>
Co-authored-by: Max Horn <[email protected]>
Co-authored-by: Matt Bauman <[email protected]>
Co-authored-by: Alex Arslan <[email protected]>
Co-authored-by: Ian Butterworth <[email protected]>
Co-authored-by: Neven Sajko <[email protected]>

* Change some hardcoded loop ranges to axes in dense linalg functions (#56348)

These should be safer in general, and are also easier to reason about.

* Make `LinearAlgebra.haszero` public (#56223)

The trait `haszero` is used to check if a type `T` has a unique zero
defined using `zero(T)`. This lets us dispatch to optimized paths
without losing generality. This PR makes the function public so that
this may be extended by packages (such as `StaticArrays`).

* remove spurious parens in profiler docs (#56357)

* Fix `log_quasitriu` for internal scaling `s=0` (#56311)

This PR is a potential fix for #54833.

## Description
The function
https://github.com/JuliaLang/julia/blob/2a06376c18afd7ec875335070743dcebcd85dee7/stdlib/LinearAlgebra/src/triangular.jl#L2220
computes $\boldsymbol{A}^{\dfrac{1}{2^s}} - \boldsymbol{I}$ for a
real-valued $2\times 2$ matrix $\boldsymbol{A}$ using Algorithm 5.1 in
[R1]. However, the algorithm in [R1] as well as the above function do
not handle the case $s=0.$ This fix extends the function to compute
$\boldsymbol{A}^{\dfrac{1}{2^s}} - \boldsymbol{I} \Bigg|_{s=0} =
\boldsymbol{A} - \boldsymbol{I}.$

## Checklist
- [X] Fix code: `stdlib\LinearAlgebra\src\triangular.jl` in function
`_sqrt_pow_diag_block_2x2!(A, A0, s)`.
- [X] Add test case: `stdlib\LinearAlgebra\test\triangular.jl`.
- [X] Update `NEWS.md`.
- [X] Testing and self review.

|  Tag  | Reference |
| --- | --- |
| <nobr>[R1]</nobr> | Al-Mohy, Awad H. and Higham, Nicholas J. "Improved Inverse Scaling and Squaring Algorithms for the Matrix Logarithm", 2011, url: https://eprints.maths.manchester.ac.uk/1687/1/paper11.pdf |

---------

Co-authored-by: Daniel Karrasch <[email protected]>
Co-authored-by: Oscar Smith <[email protected]>

* loading: clean up more concurrency issues (#56329)

Guarantee that `__init__` runs before `using` returns. Could be slightly
breaking for people that do crazy things inside `__init__`, but just
don't do that. Since extensions then probably load after `__init__` (or
at least, run their `__init__` after), this is a partial step towards
changing things so that, when all of their triggers are loaded, extensions
are guaranteed to load before the corresponding `using` returns.

Fixes #55556

* make `_unsetindex` fast for isbits eltype (#56364)

fixes
https://github.com/JuliaLang/julia/issues/56359#issuecomment-2441537634
```
using Plots

function f(n)
    a = Vector{Int}(undef, n)
    s = time_ns()
    resize!(a, 8)
    time_ns() - s
end

x = 8:10:1000000
y = f.(x)

plot(x, y)
```

![image](https://github.com/user-attachments/assets/5a1fb963-7d44-4cac-bedd-6f0733d4cf56)

* improved `eltype` for `flatten` with tuple argument (#55946)

We have always had
```
julia> t = (Int16[1,2], Int32[3,4]); eltype(Iterators.flatten(t))
Any
```
With this PR, the result is `Signed` (`promote_typejoin` applied to the
element types of the tuple elements).

The same applies to `NamedTuple`:
```
julia> nt = (a = [1,2], b = (3,4)); eltype(Iterators.flatten(nt))
Any     # old
Int64   # new
```

* Reland "Reroute (Upper/Lower)Triangular * Diagonal through __muldiag #55984" (#56270)

This relands #55984 which was reverted in #56267. Previously, in #55984,
the destination in multiplying triangular matrices with diagonals was
also assumed to be triangular, which is not necessarily the case in
`mul!`. Tests for this case, however, were being run
non-deterministically, so this wasn't caught by the CI runs.

This improves performance:
```julia
julia> U = UpperTriangular(rand(100,100)); D = Diagonal(rand(size(U,2))); C = similar(U);

julia> @btime mul!($C, $D, $U);
  1.517 μs (0 allocations: 0 bytes) # nightly
  1.116 μs (0 allocations: 0 bytes) # This PR
```

* Add one-arg `norm` method (#56330)

This reduces the latency of `norm` calls, as the single-argument method
lacks branches and doesn't use aggressive constant propagation, and is
therefore simpler to compile. Given that a lot of `norm` calls use
`p==2`, it makes sense for us to reduce the latency on this call.
```julia
julia> using LinearAlgebra

julia> A = rand(2,2);

julia> @time norm(A);
  0.247515 seconds (390.09 k allocations: 19.993 MiB, 33.57% gc time, 99.99% compilation time) # master
  0.067201 seconds (121.24 k allocations: 6.067 MiB, 99.98% compilation time) # this PR
```
An example of an improvement in ttfx because of this:
```julia
julia> A = rand(2,2);

julia> @time A ≈ A;
  0.556475 seconds (1.16 M allocations: 59.949 MiB, 24.14% gc time, 100.00% compilation time) # master
  0.333114 seconds (899.85 k allocations: 46.574 MiB, 8.11% gc time, 99.99% compilation time) # this PR
```

* fix a forgotten rename `readuntil`  -> `copyuntil` (#56380)

Fixes https://github.com/JuliaLang/julia/issues/56352, with the repro in
that issue:

```
Master:
  1.114874 seconds (13.01 M allocations: 539.592 MiB, 3.80% gc time)

After:
   0.369492 seconds (12.99 M allocations: 485.031 MiB, 10.73% gc time)

1.10:
    0.341114 seconds (8.36 M allocations: 454.242 MiB, 2.69% gc time)
```

* remove unnecessary operations from `typejoin_union_tuple` (#56379)

Removes the unnecessary call to `unwrap_unionall` and type assertion.

* precompile: fix performance issues with IO (#56370)

The string API here rapidly becomes unusably slow if dumping much debug
output during precompile. Fix the design here to use an intermediate IO
instead to prevent that.

* cache the `find_all_in_cache_path` call during parallel precompilation (#56369)

Before (in an environment with DifferentialEquations.jl):

```julia
julia> @time Pkg.precompile()
  0.733576 seconds (3.44 M allocations: 283.676 MiB, 6.24% gc time)

julia> isfile_calls[1:10]
10-element Vector{Pair{String, Int64}}:
        "/home/kc/.julia/juliaup/julia-nightly/share/julia/compiled/v1.12/Printf/3FQLY_zHycD.ji" => 178
        "/home/kc/.julia/juliaup/julia-nightly/share/julia/compiled/v1.12/Printf/3FQLY_xxrt3.ji" => 178
         "/home/kc/.julia/juliaup/julia-nightly/share/julia/compiled/v1.12/Dates/p8See_xxrt3.ji" => 158
         "/home/kc/.julia/juliaup/julia-nightly/share/julia/compiled/v1.12/Dates/p8See_zHycD.ji" => 158
          "/home/kc/.julia/juliaup/julia-nightly/share/julia/compiled/v1.12/TOML/mjrwE_zHycD.ji" => 155
          "/home/kc/.julia/juliaup/julia-nightly/share/julia/compiled/v1.12/TOML/mjrwE_xxrt3.ji" => 155
                                     "/home/kc/.julia/compiled/v1.12/Preferences/pWSk8_4Qv86.ji" => 152
                                     "/home/kc/.julia/compiled/v1.12/Preferences/pWSk8_juhqb.ji" => 152
 "/home/kc/.julia/juliaup/julia-nightly/share/julia/compiled/v1.12/StyledStrings/UcVoM_zHycD.ji" => 144
 "/home/kc/.julia/juliaup/julia-nightly/share/julia/compiled/v1.12/StyledStrings/UcVoM_xxrt3.ji" => 144
```

After:

```julia
julia> @time Pkg.precompile()
  0.460077 seconds (877.59 k allocations: 108.075 MiB, 4.77% gc time)

julia> isfile_calls[1:10]
10-element Vector{Pair{String, Int64}}:
 "/tmp/jl_a5xFWK/Project.toml" => 15
 "/tmp/jl_a5xFWK/Manifest.toml" => 7
 "/home/kc/.julia/registries/General.toml" => 6
 "/home/kc/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Markdown/src/Markdown.jl" => 3
 "/home/kc/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Serialization/src/Serialization.jl" => 3
 "/home/kc/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/Distributed/src/Distributed.jl" => 3
 "/home/kc/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/UUIDs/src/UUIDs.jl" => 3
 "/home/kc/.julia/juliaup/julia-nightly/share/julia/stdlib/v1.12/LibCURL/src/LibCURL.jl" => 3
```

Performance is improved, and we no longer call `isfile` hundreds of times on the same `.ji` files.

The benchmark was run on a Linux machine; the improvement should be considerably larger on Windows, where the `isfile_casesensitive` call is much more expensive.

Fixes https://github.com/JuliaLang/julia/issues/56366

---------

Co-authored-by: KristofferC <[email protected]>
Co-authored-by: Ian Butterworth <[email protected]>

* [docs] Fix note admonition in llvm-passes.md (#56392)

At the moment this is rendered incorrectly:
https://docs.julialang.org/en/v1.11.1/devdocs/llvm-passes/#JuliaLICM

* structure-preserving broadcast for `SymTridiagonal` (#56001)

With this PR, certain broadcasting operations preserve the structure of
a `SymTridiagonal`:
```julia
julia> S = SymTridiagonal([1,2,3,4], [1,2,3])
4×4 SymTridiagonal{Int64, Vector{Int64}}:
 1  1  ⋅  ⋅
 1  2  2  ⋅
 ⋅  2  3  3
 ⋅  ⋅  3  4

julia> S .* 2
4×4 SymTridiagonal{Int64, Vector{Int64}}:
 2  2  ⋅  ⋅
 2  4  4  ⋅
 ⋅  4  6  6
 ⋅  ⋅  6  8
```
This was deliberately disabled on master, but I couldn't find any test
that fails if this is enabled.

* 🤖 [master] Bump the Pkg stdlib from 116ba910c to 9f8e11a4c (#56386)

Stdlib: Pkg
URL: https://github.com/JuliaLang/Pkg.jl.git
Stdlib branch: master
Julia branch: master
Old commit: 116ba910c
New commit: 9f8e11a4c
Julia version: 1.12.0-DEV
Pkg version: 1.12.0
Bump invoked by: @IanButterworth
Powered by:
[BumpStdlibs.jl](https://github.com/JuliaLang/BumpStdlibs.jl)

Diff:
https://github.com/JuliaLang/Pkg.jl/compare/116ba910c74ab565d348aa8a50d6dd10148f11ab...9f8e11a4c0efb3b68a1e25a33f372f398c89cd66

```
$ git log --oneline 116ba910c..9f8e11a4c
9f8e11a4c strip out tree_hash for stdlibs that have have been freed in newer julia versions (#4062)
c0df25a47 rm dead code (#4061)
```

Co-authored-by: Dilum Aluthge <[email protected]>

* load extensions with fewer triggers earlier (#49891)

Aimed to support the use case in
https://github.com/JuliaLang/julia/issues/48734#issuecomment-1554626135.

https://github.com/KristofferC/ExtSquared.jl is an example, see
specifically
https://github.com/KristofferC/ExtSquared.jl/blob/ded7c57d6f799674e3310b8174dfb07591bbe025/ext/BExt.jl#L4.

I think this makes sense, happy for a second pair of eyes though.

cc @termi-official

---------

Co-authored-by: KristofferC <[email protected]>
Co-authored-by: Cody Tapscott <[email protected]>

* Dispatch in generic_matmatmul (#56384)

Replacing the branches by dispatch reduces latency, presumably because
there's less dead code in the method.
```julia
julia> using LinearAlgebra

julia> A = rand(Int,2,2); B = copy(A); C = similar(A);

julia> @time mul!(C, A, B, 1, 2);
  0.363944 seconds (1.65 M allocations: 84.584 MiB, 37.57% gc time, 99.99% compilation time) # master
  0.102676 seconds (176.55 k allocations: 8.904 MiB, 27.04% gc time, 99.97% compilation time) # this PR
```
The latency is now distributed between the different branches:
```julia
julia> @time mul!(C, A, B, 1, 2);
  0.072441 seconds (176.55 k allocations: 8.903 MiB, 99.97% compilation time)

julia> @time mul!(C, A', B, 1, 2);
  0.085817 seconds (116.44 k allocations: 5.913 MiB, 99.96% compilation time: 4% of which was recompilation)

julia> @time mul!(C, A', B', 1, 2);
  0.345337 seconds (1.07 M allocations: 54.773 MiB, 25.77% gc time, 99.99% compilation time: 40% of which was recompilation)
```
It would be good to look into why there's recompilation in the last
case, but the branch is less commonly taken than the others that have
significantly lower latency after this PR.

* Add `atol` to addmul tests (#56210)

This avoids the issues as in
https://github.com/JuliaLang/julia/issues/55781 and
https://github.com/JuliaLang/julia/issues/55779 where we compare small
numbers using a relative tolerance. Also, in this PR, I have added an
extra test, so now we compare both `A * B * alpha + C * beta` and `A * B
* alpha - C * beta` with the corresponding in-place versions. The idea
is that if the terms `A * B * alpha` and ` C * beta` have similar
magnitudes, at least one of the two expressions will usually result in a
large enough number that may be compared using a relative tolerance.

I am unsure if the `atol` chosen here is optimal, as I have ballparked
it to use the maximum `eps` by looking at all the `eltype`s involved.
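
The problem with a purely relative tolerance near zero can be seen in a small sketch. The values below are made up for illustration and are not taken from the test suite.

```julia
# Illustrative values only; not taken from the actual addmul tests.
small, zero_val = 1.0e-17, 0.0

# With the default relative tolerance, the allowed error scales with the
# magnitude of the operands, so a tiny number never "approximately equals" 0.
rel_only = isapprox(small, zero_val)                        # false

# An absolute tolerance accepts any difference below `atol`.
with_atol = isapprox(small, zero_val; atol = eps(Float64))  # true

# For values of ordinary magnitude the relative tolerance works as expected.
ordinary = isapprox(1.0, 1.0 + 1.0e-10)                     # true
```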

Fixes #55781
Fixes #55779

* Export jl_gc_new_weakref again via julia.h (#56373)

This is how it was exported in at least Julia 1.0 through 1.11.

Closes #56367

* InteractiveUtils: define `InteractiveUtils.@code_ircode` (#56390)

* Fix some missing write barriers and add some helpful comments (#56396)

I was trying some performance optimization which didn't end up working
out, but in the process I found two missing write barriers and added
some helpful comments for future readers, so that part is probably still
useful.

* compiler: fix specialization mistake introduced by #40985 (#56404)

Hopefully there aren't any others like this hiding around? Not useful to
make a new closure for every method that we inline, since we just called
`===` inside it

* Avoid racy double-load of binding restriction in `import_module` (#56395)

Fixes #56333

* define `InteractiveUtils.@infer_[return|exception]_type` (#56398)

Also simplifies the definitions of `@code_typed` and the other similar
macros.

* irinterp: set `IR_FLAG_REFINED` for narrowed `PhiNode`s (#56391)

`adce_pass!` can transform a `Union`-type `PhiNode` into a narrower
`PhiNode`, but in such cases, the `IR_FLAG_REFINED` flag isn’t set on
that `PhiNode` statement. By setting this flag, irinterp can perform
statement reprocessing using the narrowed `PhiNode`, enabling type
stability in cases like JuliaLang/julia#56387.

- fixes JuliaLang/julia#56387

* document isopen(::Channel) (#56376)

This PR has two purposes -- 
1) Add some documentation for public API
2) Add a small note about a footgun I've hit a few times: `!isopen(ch)`
does not mean that you are "done" with the channel because buffered
channels can still have items left in them that need to be taken.
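
The footgun in point 2 can be reproduced with a short sketch (the channel size and values are arbitrary):

```julia
ch = Channel{Int}(2)   # buffered channel with room for two items
put!(ch, 1)
put!(ch, 2)
close(ch)

closed  = !isopen(ch)    # true: nothing more can be put into the channel
pending = isready(ch)    # true: buffered items are still waiting to be taken

# Iterating (here via `collect`) drains what is left in the buffer.
remaining = collect(ch)  # [1, 2]
```

So a consumer that stops as soon as `!isopen(ch)` holds would drop both items here.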

---------

Co-authored-by: CY Han <[email protected]>

* Make build system respect `FORCE_COLOR` and `NO_COLOR` settings (#56346)

Follow up to #53742, but for the build system.  CC: @omus.

* Add `edges` vector to CodeInstance/CodeInfo to keep backedges as edges (#54894)

Appears to add about 11MB (128MB to 139MB) to the system image, and to 
decrease the stdlib size by 55 MB (325MB to 270MB), so seems overall 
favorable right now. The edges are computed following the encoding 
<https://hackmd.io/sjPig55kS4a5XNWC6HmKSg?both#Edges-Encoding> to
correctly reflect the backedges.

Co-authored-by: Shuhei Kadowaki <[email protected]>

* docs: remove `dirname.c` from THIRDPARTY file (#56413)

- `dirname.c` was removed by
https://github.com/JuliaLang/julia/commit/c2cec7ad57102e4fbb733b8fb79d617a9524f0ae

* Allow ext → ext dependency if triggers are a strict superset (#56368) (#56402)

Forward port of #56368 - this was a pretty clean port, so it should be
good to go once tests pass.

* [docs] Fix rendering of warning admonition in llvm passes page (#56412)

Follow up to #56392: also the warning in
https://docs.julialang.org/en/v1.11.1/devdocs/llvm-passes/#Multiversioning
is rendered incorrectly because of a missing space.

* Fix dispatch for `rdiv!` with `LU` (#55764)

* Remove overwritten method of OffsetArray (#56414)

This is overwritten three definitions later in
`Base.reshape(A::OffsetArray, inds::Colon)`.

Should remove warnings I saw when testing a package that uses it.

* Add a missing GC root in constant declaration (#56408)

As pointed out in
https://github.com/JuliaLang/julia/pull/56224#discussion_r1816974147.

* Teach compiler about partitioned bindings (#56299)

This commit teaches the compiler to update its world bounds whenever it
looks at a binding partition, making the compiler sound in the presence
of a partitioned binding. The key adjustment is that the compiler is no
longer allowed to directly query the binding table without recording the
world bounds, so all the various abstract evaluations that look at
bindings need to be adjusted and are no longer pure tfuncs. We used to
look at bindings a lot more, but thanks to earlier prep work to remove
unnecessary binding-dependent code (#55288, #55289 and #55271), these
changes become relatively straightforward.

Note that as before, we do not create any binding partitions by default,
so this commit is mostly preparatory.

---------

Co-authored-by: Shuhei Kadowaki <[email protected]>

* Restore JL_NOTSAFEPOINT in jl_stderr_obj (#56407)

This is not a function we're really using, but it's used in the
embedding examples, so I'm sure somebody would complain if I deleted it
or made it a safepoint, so let's just give the same best-effort result
as before.

* reland "Inlining: Remove outdated code path for GlobalRef movement (#46880)" (#56382)

From the description of the original PR:
> We used to not allow `GlobalRef` in `PhiNode` at all (because they
> could have side effects). However, we then change the IR to make
> side-effecting `GlobalRef`s illegal in statement position in general,
> so now `PhiNode`s values are just regular value position, so there's
> no reason any more to try to move `GlobalRef`s out to statement
> position in inlining. Moreover, doing so introduces a bunch of
> unnecessary `GlobalRef`s that weren't being moved back. We could fix
> that separately by setting appropriate flags, but it's simpler to just
> get rid of this special case entirely.

This change itself does not appear to have any issues, and in fact, it is
very useful for keeping the IR slim, especially in code generated by
Cassette-like systems, so I would like to reland it.

However, the original PR was reverted in JuliaLang/julia#46951 due to
bugs like JuliaLang/julia#46940 and JuliaLang/julia#46943. I could not
reproduce these bugs on my end (maybe they have been fixed by some
GC-side fixes?), so I believe relanding the original PR’s changes would
not cause any issues, but it is necessary to confirm that similar
problems do not arise before merging this PR.

* copy effects key to `Base.infer_effects` (#56363)

Copied from the docstring of `Core.Compiler.Effects`, this makes it
easier to figure out what the output of `Base.infer_effects` is actually
telling you.

* Fix `make install` for asan build (#56347)

Now the makescript finds libclang_rt.asan-x86_64.so for example.

The change from `-0` to `-1` is needed because, with `-1`, `libclang_rt.asan-*`
is searched for in `usr/lib/julia` instead of `usr/lib`.

* Add dims check to triangular mul (#56393)

This adds a dimension check to triangular matrix multiplication methods.
While such checks already exist in the individual branches (occasionally
within `BLAS` methods), having these earlier would permit certain
optimizations, as we are assured that the axes are compatible. This
potentially duplicates the checks, but this is unlikely to be a concern
given how cheap the checks are.

I've also reused the `check_A_mul_B!_sizes` function that is defined in
`bidiag.jl`, instead of hard-coding the checks.

Further, I've replaced some hard-coded loop ranges by the corresponding
`axes` and `first/lastindex` calls. These are identical under the
1-based indexing assumption, but the `axes` variants are easier to read
and reason about.
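
A minimal sketch of the two loop styles, using a hypothetical diagonal-sum function rather than the actual `LinearAlgebra` code:

```julia
# Hypothetical helpers for illustration; not the LinearAlgebra implementation.
function diagsum_hardcoded(A::AbstractMatrix)
    t = zero(eltype(A))
    for i in 1:size(A, 1)      # assumes indices start at 1
        t += A[i, i]
    end
    return t
end

function diagsum_axes(A::AbstractMatrix)
    t = zero(eltype(A))
    for i in axes(A, 1)        # iterates over whatever indices A actually has
        t += A[i, i]
    end
    return t
end

A = [1 2; 3 4]
diagsum_hardcoded(A) == diagsum_axes(A)   # true for a 1-based matrix
```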

* clarify short-circuit && and || docs (#56420)

This clarifies the docs to explain that `a && b` is equivalent to `a ? b
: false` and that `a || b` is equivalent to `a ? true : b`.

In particular, this explains why the second argument does not need to be
a boolean value, which is a common point of confusion. (See e.g. [this
discourse
thread](https://discourse.julialang.org/t/internals-of-assignment-when-doing-short-circuit-evaluation/122178/2?u=stevengj).)
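
A few illustrative expressions (not taken from the docs change itself) show the equivalence, including a non-boolean second operand:

```julia
# `a && b` behaves like `a ? b : false`
r1 = true  && "hello"    # "hello": the second operand is returned as-is
r2 = false && "hello"    # false: the second operand is never evaluated

# `a || b` behaves like `a ? true : b`
r3 = true  || 42         # true
r4 = false || 42         # 42

# The same results written with the ternary operator.
t1 = true  ? "hello" : false
t4 = false ? true : 42
```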

* docs: replace 'leaf types' with 'concrete types' (#56418)

Fixes #55044

---------

Co-authored-by: inkydragon <[email protected]>

* Remove aggressive constprop annotation on generic_matmatmul_wrapper! (#56400)

This annotation seems unnecessary, as the method gets inlined and
there's no computation being carried out using the value of the
constant.

* Clarify the FieldError docstring (#55222)

* Allow `Time`s to be rounded to `Period`s (#52629)

Co-authored-by: CyHan <[email protected]>
Co-authored-by: Curtis Vogt <[email protected]>

* Replace unconditional store with cmpswap to avoid deadlocking in jl_fptr_wait_for_compiled_addr (#56444)

That unconditional store could overwrite the actual compiled code in
that pointer, so make it a cmpswap

* Correct nothrow modeling of `get_binding_type` (#56430)

As pointed out in
https://github.com/JuliaLang/julia/pull/56299#discussion_r1826509185,
although the bug predates that PR.

* add tip for module docstrings before load (#56445)

* compiler: Strengthen some assertions and fix a couple small bugs (#56449)

* inference: minor follow-ups to JuliaLang/julia#56299 (#56450)

* Ensure that String(::Memory) returns only a String, not any owner (#56438)

Fixes #56435

* Take safepoint lock before going to sleep in the scheduler. (#56443)

This avoids a deadlock during exit, between a thread going to sleep and
the thread exiting.

* Profile: mention `kill -s SIGUSR1 julia_pid` for Linux (#56441)

Currently this route is mentioned in the docs
(https://docs.julialang.org/en/v1/stdlib/Profile/#Triggered-During-Execution)
but is missing from the module docstring. Adding it should help users who
have little idea how to "send a kernel signal to a process" get started.

---------

Co-authored-by: Ian Butterworth <[email protected]>

* Fix and test an overflow issue in `searchsorted` (#56464)

And remove `searchsorted` special cases for offset arrays in tests that
had the impact of bypassing actually testing `searchsorted` behavior on
offset arrays

To be clear, after this bugfix the function is still broken, just a little bit less so.

* Update docs of calling convention arg in `:foreigncall` AST node (#56417)

* `step(::AbstractUnitRange{Bool})` should return `Bool` (#56405)

The issue was introduced by #27302, as
```julia
julia> true-false
1
```

By the definitions below, `AbstractUnitRange{Bool} <: OrdinalRange{Bool,
Bool}`, whose step type is `Bool`.


https://github.com/JuliaLang/julia/blob/da74ef1933b12410b217748e0f7fbcbe52e10d29/base/range.jl#L280-L299
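
The two facts the argument rests on can be checked directly. This sketch deliberately does not call `step` on a `Bool` range, since that result depends on whether this fix is present.

```julia
# Subtracting two Bools promotes to Int, which is how a non-Bool step arose.
d = true - false
d_is_int = d isa Int     # true

# The abstract type hierarchy fixes the step type of a Bool unit range to Bool.
declared = AbstractUnitRange{Bool} <: OrdinalRange{Bool, Bool}   # true
```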

---------

Co-authored-by: Matt Bauman <[email protected]>

* fixup! JuliaLang/julia#56028, fix up the type-level escapability check

In JuliaLang/julia#56028, the type-level escapability check was changed
to use `is_mutation_free_argtype`, but this was a mistake because EA no
longer runs for structs like
`mutable struct ForeignBuffer{T}; const ptr::Ptr{T}; end`.
This commit changes it to use `is_identity_free_argtype` instead, which
can be used to detect whether a type may contain any mutable allocations
or not.

* add `show(::IO, ::ArgEscapeInfo)`

* EA: disable finalizer inlining for allocations that are edges of `PhiNode`s (#56455)

The current EA-based finalizer inlining implementation can create
invalid IR when the target object is later aliased as a `PhiNode`, which
was causing #56422.
In such cases, finalizer inlining for the allocations that are edges of
each `PhiNode` should be avoided, and instead, finalizer inlining should
ideally be applied to the `PhiNode` itself, but implementing that is
somewhat complex. As a temporary fix, this commit disables inlining in
those cases.

- fixes #56422

* make `verify_ir` error messages more informative (#56452)

Currently, when `verify_ir` finds an error, the `IRCode` is printed, but
it's not easy to determine which method instance generated that
`IRCode`. This commit adds method instance and code location information
to the error message, making it easier to identify the problematic code.

E.g.:
```julia
[...]
610 │    %95 =   builtin Core.tuple(%48, %94)::Tuple{GMT.Gdal.IGeometry, GMT.Gdal.IGeometry}
    └───       return %95

ERROR: IR verification failed.
  Code location:   ~/julia/packages/GMT/src/gdal_extensions.jl:606
  Method instance: MethodInstance for GMT.Gdal.helper_2geoms(::Matrix{Float64}, ::Matrix{Float64})
Stacktrace:
  [1] error(::String, ::String, ::String, ::Symbol, ::String, ::Int32, ::String, ::String, ::Core.MethodInstance)
    @ Core.Compiler ./error.jl:53
  [...]
```

* [GHA] Explicitly install Julia for whitespace workflow (#56468)

So far we relied on the fact that Julia comes in the default Ubuntu
images on GitHub Actions runners, but this may change in the future
(although there's apparently no plan in this direction for the time
being). To make the workflow more future-proof, we now explicitly
install Julia using a dedicated workflow.

* Allow taking Matrix slices without an extra allocation (#56236)

Since changing Array to use Memory as the backing, we had the option of
making non-Vector arrays more flexible, but had instead preserved the
restriction that they must be zero offset and equal in length to the
Memory. However, this results in extra complexity, restrictions, and
allocations, without many known benefits. Nanosoldier shows a
decrease in performance on linear eachindex loops, which we theorize is
due to a minor failure to CSE before SCEV or a lack of NUW/NSW on the
length multiplication calculation.

* [late-gc-lowering] null-out GC frame slots for dead objects (#52935)

Should fix https://github.com/JuliaLang/julia/issues/51818.

MWE:

```julia
function testme()
    X = @noinline rand(1_000_000_00)
    Y = @noinline sum(X)
    X = nothing
    GC.gc()
    return Y
end
```

Note that it now stores a `NULL` in the GC frame before calling
`jl_gc_collect`.

Before:

```llvm
; Function Signature: testme()
;  @ /Users/dnetto/Personal/test.jl:3 within `testme`
define double @julia_testme_535() #0 {
top:
  %gcframe1 = alloca [3 x ptr], align 16
  call void @llvm.memset.p0.i64(ptr align 16 %gcframe1, i8 0, i64 24, i1 true)
  %pgcstack = call ptr inttoptr (i64 6595051180 to ptr)(i64 262) #10
  store i64 4, ptr %gcframe1, align 16
  %task.gcstack = load ptr, ptr %pgcstack, align 8
  %frame.prev = getelementptr inbounds ptr, ptr %gcframe1, i64 1
  store ptr %task.gcstack, ptr %frame.prev, align 8
  store ptr %gcframe1, ptr %pgcstack, align 8
;  @ /Users/dnetto/Personal/test.jl:4 within `testme`
  %0 = call nonnull ptr @j_rand_539(i64 signext 100000000)
  %gc_slot_addr_0 = getelementptr inbounds ptr, ptr %gcframe1, i64 2
  store ptr %0, ptr %gc_slot_addr_0, align 16
;  @ /Users/dnetto/Personal/test.jl:5 within `testme`
  %1 = call double @j_sum_541(ptr nonnull %0)
;  @ /Users/dnetto/Personal/test.jl:7 within `testme`
; ┌ @ gcutils.jl:132 within `gc` @ gcutils.jl:132
   call void @jlplt_ijl_gc_collect_543_got.jit(i32 1)
   %frame.prev4 = load ptr, ptr %frame.prev, align 8
   store ptr %frame.prev4, ptr %pgcstack, align 8
; └
;  @ /Users/dnetto/Personal/test.jl:8 within `testme`
  ret double %1
}
```

After:

```llvm
; Function Signature: testme()
;  @ /Users/dnetto/Personal/test.jl:3 within `testme`
define double @julia_testme_752() #0 {
top:
  %gcframe1 = alloca [3 x ptr], align 16
  call void @llvm.memset.p0.i64(ptr align 16 %gcframe1, i8 0, i64 24, i1 true)
  %pgcstack = call ptr inttoptr (i64 6595051180 to ptr)(i64 262) #10
  store i64 4, ptr %gcframe1, align 16
  %task.gcstack = load ptr, ptr %pgcstack, align 8
  %frame.prev = getelementptr inbounds ptr, ptr %gcframe1, i64 1
  store ptr %task.gcstack, ptr %frame.prev, align 8
  store ptr %gcframe1, ptr %pgcstack, align 8
;  @ /Users/dnetto/Personal/test.jl:4 within `testme`
  %0 = call nonnull ptr @j_rand_756(i64 signext 100000000)
  %gc_slot_addr_0 = getelementptr inbounds ptr, ptr %gcframe1, i64 2
  store ptr %0, ptr %gc_slot_addr_0, align 16
;  @ /Users/dnetto/Personal/test.jl:5 within `testme`
  %1 = call double @j_sum_758(ptr nonnull %0)
  store ptr null, ptr %gc_slot_addr_0, align 16
;  @ /Users/dnetto/Personal/test.jl:7 within `testme`
; ┌ @ gcutils.jl:132 within `gc` @ gcutils.jl:132
   call void @jlplt_ijl_gc_collect_760_got.jit(i32 1)
   %frame.prev6 = load ptr, ptr %frame.prev, align 8
   store ptr %frame.prev6, ptr %pgcstack, align 8
; └
;  @ /Users/dnetto/Personal/test.jl:8 within `testme`
  ret double %1
}
```

* Added test for resolving array references in exprresolve (#56471)

Adds a test for the handling of non-real indices when resolving array
references in `exprresolve`, covering lines 427 to 432 of
`base/cartesian.jl`.

* Fix and test searchsorted for arrays whose first index is `typemin(Int)` (#56474)

This fixes the issue reported in
https://github.com/JuliaLang/julia/issues/56457#issuecomment-2457223264
which, combined with #56464 which fixed the issue in the OP, fixes #56457.

`searchsortedfirst` was fine all along, but I added it to tests regardless.

* Move Core.Compiler into Base

This is the first step in what I am hoping will eventually result in making
the compiler itself an upgradable stdlib. Over time, we've gained several
non-Base consumers of `Core.Compiler`, and we've reached a bit of a breaking
point where maintaining those downstream dependencies is getting more difficult
than the close coupling of Core.Compiler to the runtime is worth.

In this first step, I am moving Core.Compiler into Base, ending the duplication
of common data structure and generic functions between Core.Compiler and Base.
This split goes back quite far (although not all the way) to the early days of
Julia and predates the world-age mechanism.

The extant Base and Core.Compiler environments have some differences
(other than the duplication). I think the primary ones are the following (I will
add more here if somebody points one out):

- `Core.Compiler` does not use `getproperty`
- `Core.Compiler` does not have extensible `==` equality

In this, I decided to retain the former by setting `getproperty = getfield`
for Core.Compiler itself (though of course not for the data structures shared
with Base). I don't think it's strictly necessary, but might as well.

For equality, I decided the easiest thing to do would be to try to merge
the equalities and see what happens. In general, Core.Compiler is relatively
restricted in the kinds of equality comparisons it can make, so I think it'll
work out fine, but we can revisit this.

This seems to be fully working and most of this is just moving code around.
I think most of that refactoring is independently useful, so I'll pull some
of it out into separate PRs to make this PR more manageable.

* Delete buggy `stat(::Integer)` method (#54855)

"Where did someone get a RawFD as an integer anyway?" -@stefankarpinski

See also #51711

Fixes #51710

* missing gc-root store in subtype (#56472)

Fixes #56141
Introduced by #52228 (a624d445c02c)

* further defer jl_insert_backedges after loading (#56447)

Finish fully breaking the dependency between method insertions and
inferring whether the cache is valid. The cache should be inferable in
parallel and in aggregate after all loading is finished. This prepares
us for moving this code into Julia (Core.Compiler) next.

* count bytes allocated through malloc more precisely (#55223)

Should make the accounting for memory allocated through malloc a bit
more accurate.

Should also simplify the accounting code by eliminating the use of
`jl_gc_count_freed` in `jl_genericmemory_to_string`.

* Fix external IO loop thread interaction and add function to Base.Experimental to facilitate its use. Also add a test. (#55529)

While looking at https://github.com/JuliaLang/julia/issues/55525 I found
that the implementation wasn't working correctly.
I added it to Base.Experimental so people don't need to handroll their
own and am also testing a version of what the issue was hitting.

* [REPL] raise default implicit `show` limit to 1MiB (#56297)

https://github.com/JuliaLang/julia/pull/53959#issuecomment-2426946640

I would like to understand more where these issues are coming from; it
would be easy to exempt some types from Base or Core with
```julia
REPL.show_limited(io::IO, mime::MIME, x::SomeType) = show(io, mime, x)
```
but I'm not sure which are causing problems in practice.

But meanwhile I think raising the limit makes sense.

* Add a docstring for `Base.divgcd` (#53769)

Co-authored-by: Sukera <[email protected]>

* Fix compilation warning on aarch64-linux (#56480)

This fixes the warning:
```
/cache/build/default-aws-aarch64-ci-1-3/julialang/julia-master/src/stackwalk.c: In function 'jl_simulate_longjmp':
/cache/build/default-aws-aarch64-ci-1-3/julialang/julia-master/src/stackwalk.c:995:22: warning: initialization of 'mcontext_t *' {aka 'struct sigcontext *'} from incompatible pointer type 'struct unw_sigcontext *' [-Wincompatible-pointer-types]
  995 |     mcontext_t *mc = &c->uc_mcontext;
      |                      ^
```

This is the last remaining warning during compilation on aarch64-linux.

* Make Compiler an independent package

This is a further extension to #56128 to make the compiler into a proper
independent package, usable outside of `Base` as `using Compiler` in the same
way that `JuliaSyntax` works already. InteractiveUtils gains a new `@activate`
macro that can be used to activate an outside Compiler package, either for
reflection only or for codegen also.

* Make heap size hint available as an env variable (#55631)

This makes `JULIA_HEAP_SIZE_HINT` the environment variable version of
the `--heap-size-hint` command-line flag. Seems like there was interest
in
https://github.com/JuliaLang/julia/pull/45369#issuecomment-1544204022.

The same syntax is used as for the command-line version with, for
example, `2G` => 2 GB and `200M` => 200 MB.
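
As a rough illustration of the environment-variable form, the sketch below starts a child Julia process with `JULIA_HEAP_SIZE_HINT` set and has the child echo the value back. It assumes a build that includes this change; older builds would simply ignore the variable. The child command is only a placeholder.

```julia
# Sketch: pass the heap size hint through the environment instead of the
# `--heap-size-hint` command-line flag. The child process only echoes the
# variable back; a build without this change would ignore it.
child = `$(Base.julia_cmd()) --startup-file=no -e 'println(get(ENV, "JULIA_HEAP_SIZE_HINT", "unset"))'`

# `addenv` returns a copy of the command with the extra variable set.
output = read(addenv(child, "JULIA_HEAP_SIZE_HINT" => "2G"), String)
print(output)
```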

@oscardssmith want to take a look?

* Allow indexing `UniformScaling` with `CartesianIndex{2}` (#56461)

Since indexing with two `Integer`s is defined, we might as well define
indexing with a `CartesianIndex`. This makes certain loops convenient
where the index is obtained using `eachindex`.
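
A small sketch of the kind of loop this is meant to simplify. Because `CartesianIndex` indexing of a `UniformScaling` exists only on builds that include this change, the example checks for the method and otherwise falls back to the two-integer form.

```julia
using LinearAlgebra

J = 2I                      # UniformScaling{Int}
A = zeros(Int, 3, 3)

# Only builds that include this change define the CartesianIndex method,
# so detect it and fall back to two-integer indexing elsewhere.
has_cartesian = hasmethod(getindex, Tuple{typeof(J), CartesianIndex{2}})

for idx in CartesianIndices(A)
    A[idx] = has_cartesian ? J[idx] : J[Tuple(idx)...]
end
```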

* Simplify first index in `FastContiguousSubArray` definition (#56491)

Since `Slice <: AbstractUnitRange` and `Union{Slice, AbstractUnitRange}
== AbstractUnitRange`, we may simplify the first index.
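
The two type relations that justify the simplification can be checked directly:

```julia
# `Base.Slice` is itself a subtype of `AbstractUnitRange`, so the union
# collapses to `AbstractUnitRange`.
slice_is_range = Base.Slice <: AbstractUnitRange
union_collapses = Union{Base.Slice, AbstractUnitRange} == AbstractUnitRange
@show slice_is_range union_collapses
```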

* Make `popat!` support `@inbounds` (#56323)

Co-authored-by: Jishnu Bhattacharya <[email protected]>

* NEWS.md: clarify `--trim` (#56460)

Co-authored-by: Matt Bauman <[email protected]>

* Remove aggressive constprop annotation from 2x2 and 3x3 matmul (#56453)

Removing these annotations reduces ttfx slightly.
```julia
julia> using LinearAlgebra

julia> A = rand(2,2);

julia> @time mul!(similar(A), A, A, 1, 2);
  0.296096 seconds (903.49 k allocations: 44.313 MiB, 4.25% gc time, 99.98% compilation time) # nightly
  0.286009 seconds (835.88 k allocations: 40.732 MiB, 3.29% gc time, 99.98% compilation time) # this PR
```

* `sincos` for non-float symmetric matrices (#56484)

Ensures that the `eltype` of the array to which the result of `sincos`
is written is a floating-point one, even if the argument doesn't have a
floating-point `eltype`.

After this, the following works:
```julia
julia> A = diagm(0=>1:3)
3×3 Matrix{Int64}:
 1  0  0
 0  2  0
 0  0  3

julia> sincos(A)
([0.8414709848078965 0.0 0.0; 0.0 0.9092974268256817 0.0; 0.0 0.0 0.1411200080598672], [0.5403023058681398 0.0 0.0; 0.0 -0.4161468365471424 0.0; 0.0 0.0 -0.9899924966004454])
```

* Specialize 2-arg `show` for `LinearIndices` (#56482)

After this,
```julia
julia> l = LinearIndices((1:3, 1:4));

julia> show(l)
LinearIndices((1:3, 1:4))
```
The printed form is a valid constructor.

* Avoid constprop in `syevd!` and `syev!` (#56442)

This improves compilation times slightly:
```julia
julia> using LinearAlgebra

julia> A = rand(2,2);

julia> @time eigen!(Hermitian(A));
  0.163380 seconds (180.51 k allocations: 8.760 MiB, 99.88% compilation time) # master
  0.155285 seconds (163.77 k allocations: 7.971 MiB, 99.87% compilation time) # This PR
```
The idea is that the constant propagation is only required to infer the
return type, and isn't necessary in the body of the method. We may
therefore annotate the body with a `@constprop :none`.
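
A minimal sketch of that idea, not the actual LAPACK wrappers: the names `eig_sketch` and `_eig_body_sketch` are hypothetical. The outer method is the only place where the return type depends on the value of `jobz`, so constant propagation is disabled for the inner body.

```julia
using LinearAlgebra

# Hypothetical inner routine: its result type does not depend on the value
# of `jobz`, so propagating constants into it would only add compile time.
Base.@constprop :none function _eig_body_sketch(jobz::Char, A::Matrix{Float64})
    F = eigen(Symmetric(A))
    return F.values, (jobz == 'V' ? F.vectors : similar(A, 0, 0))
end

# Hypothetical outer wrapper: this is where a constant `jobz` decides
# between returning a tuple and returning only the eigenvalues.
function eig_sketch(jobz::Char, A::Matrix{Float64})
    vals, vecs = _eig_body_sketch(jobz, A)
    return jobz == 'V' ? (vals, vecs) : vals
end

vals_only = eig_sketch('N', [2.0 0.0; 0.0 1.0])
```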

* make: define `basecompiler.ji` target (#56498)

For easier experimentation with just the bootstrap process.

Additionally, as a follow-up to JuliaLang/julia#56409, this commit also
includes some minor cosmetic changes.

* speed up bootstrapping by compiling few optimizer subroutines earlier (#56501)

Speeds up the bootstrapping process by about 30 seconds.

* remove top-level branches checking for Base (#56507)

These are no longer needed, now that the files are no longer included
twice.

* Undo the decision to publish incomplete types to the binding table (#56497)

This effectively reverts #36121 and replaces it with #36111, which was
the originally proposed alternative to fix #36104. To recap, the
question is what should happen for
```
module Foo
    struct F
        v::Foo.F
    end
end
```
i.e. where the type reference tries to refer to the newly defined type
via its global path. In #36121 we adjusted things so that we first
assign the type to its global binding and then evaluate the field type
(leaving the type in an incomplete state in the meantime). The primary
reason for this choice was that we would have to deal with incomplete
types assigned to global bindings anyway if we ever did #32658. However,
I think this was the wrong choice. There is a difference between
allowing incomplete types and semantically forcing incomplete types to
be globally observable every time a new type is defined.

The situation was a little different four years ago, but with more
extensive threading (which can observe the incompletely constructed
type) and the upcoming completion of bindings partition, the trade-off
has changed. For bindings partition in particular, this would require
two invalidations on re-definition, one to the new incomplete type and
then back to the complete type. I don't think this is worth it, for the
(somewhat niche and possibly-should-be-deprecated-in-the-future) case of
referring to incompletely defined types by their global names.

So let's instead try the hack in #36111, which does a frontend rewrite
of the global path. This should be sufficient to at least address the
obvious cases.

* Merge identical methods for Symmetric/Hermitian and SymTridiagonal (#56434)

Since the methods do identical things, we may define each method once
for a union of types instead of defining methods for each type.
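
The pattern looks roughly like this. `stored_triangle` is a hypothetical helper, not one of the merged methods, and it relies on both wrappers having a `uplo` field.

```julia
using LinearAlgebra

# One method for both wrapper types instead of two identical definitions.
# `stored_triangle` is a made-up name for illustration.
stored_triangle(A::Union{Symmetric, Hermitian}) = A.uplo == 'U' ? :upper : :lower

M = [1.0 2.0; 2.0 3.0]
upper = stored_triangle(Symmetric(M))        # default is the upper triangle
lower = stored_triangle(Hermitian(M, :L))
```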

* Specialize findlast for integer AbstractUnitRanges and StepRanges (#54902)

For monotonic ranges, `findfirst` and `findlast` with `==(val)` as the
predicate should be identical, as each value appears only once in the
range. Since `findfirst` is specialized for some ranges, we may define
`findlast` as well analogously.

On v"1.12.0-DEV.770"
```julia
julia> @btime findlast(==(1), $(Ref(1:1_000))[])
  1.186 μs (0 allocations: 0 bytes)
1
```
This PR
```julia
julia> @btime findlast(==(1), $(Ref(1:1_000))[])
  3.171 ns (0 allocations: 0 bytes)
1
```

I've also specialized `findfirst(iszero, r::AbstractRange)` to make this
be equivalent to `findfirst(==(0), ::AbstractRange)` for numerical
ranges. Similarly, for `isone`. These now take the fast path as well.
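
The equivalences relied on here can be checked on a small range. They hold with or without the specialization, which only changes how fast the answer is found.

```julia
r = -3:3

# Each value occurs once in a range, so the first and last match coincide.
same_position = findfirst(==(0), r) == findlast(==(0), r)

# `iszero` and `isone` agree with the `==(0)` and `==(1)` predicates.
zero_matches = findfirst(iszero, r) == findfirst(==(0), r)
one_matches = findfirst(isone, r) == findfirst(==(1), r)
```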

Thirdly, I've added some `convert` calls to address issues like
```julia
julia> r = Int128(1):Int128(1):Int128(4);

julia> findfirst(==(Int128(2)), r) |> typeof
Int128

julia> keytype(r)
Int64
```
This PR ensures that the return type always corresponds to `keytype`,
which is what the docstring promises.

This PR also fixes
```julia
julia> findfirst(==(0), UnitRange(-0.5, 0.5))
ERROR: InexactError: Int64(0.5)
Stacktrace:
 [1] Int64
   @ ./float.jl:994 [inlined]
 [2] findfirst(p::Base.Fix2{typeof(==), Int64}, r::UnitRange{Float64})
   @ Base ./array.jl:2397
 [3] top-level scope
   @ REPL[1]:1
```
which now returns `nothing`, as expected.

* Loop over `Iterators.rest` in `_foldl_impl` (#56492)

For reasons that I don't understand, this improves performance in
`mapreduce` in the following example:
```julia
julia> function g(A)
           for col in axes(A,2)
               mapreduce(iszero, &, view(A, UnitRange(axes(A,1)), col), init=true) || return false
           end
           return true
       end
g (generic function with 2 methods)

julia> A = zeros(2, 10000);

julia> @btime g($A);
  28.021 μs (0 allocations: 0 bytes) # nightly v"1.12.0-DEV.1571"
  12.462 μs (0 allocations: 0 bytes) # this PR

julia> A = zeros(1000,1000);

julia> @btime g($A);
  372.080 μs (0 allocations: 0 bytes) # nightly
  321.753 μs (0 allocations: 0 bytes) # this PR
```
It would be good to understand what the underlying issue is, as the two
seem equivalent to me. Perhaps this form makes it clear that it's not,
in fact, an infinite loop?
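
For readers unfamiliar with the pattern, here is a minimal sketch of a left fold written as a `for` loop over `Iterators.rest`. The function name is hypothetical and this is not the actual body of `_foldl_impl`.

```julia
# Hypothetical sketch: take the first element with `iterate`, then loop
# over the remaining elements through `Iterators.rest`.
function foldl_rest_sketch(op, init, itr)
    y = iterate(itr)
    y === nothing && return init
    acc = op(init, y[1])
    for x in Iterators.rest(itr, y[2])
        acc = op(acc, x)
    end
    return acc
end

total = foldl_rest_sketch(+, 0, 1:10)
```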

* better error message for rpad/lpad with zero-width padding (#56488)

Closes #45339 — throw a more informative `ArgumentError` message from
`rpad` and `lpad` if a zero-`textwidth` padding is passed (not a
`DivideError`).

If the padding character has `ncodeunits == 1`, suggests that maybe they
want `str * pad^max(0, npad - ncodeunits(str))` instead.
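
The suggested replacement pads by code units rather than by text width. In this sketch the NUL character stands in for the single-code-unit padding character the message has in mind.

```julia
# Pad by code units instead of text width, as the new message suggests.
str, pad, npad = "ab", "\0", 5
padded = str * pad^max(0, npad - ncodeunits(str))

# A string that is already long enough is left unchanged.
unchanged = "abcdef" * pad^max(0, npad - ncodeunits("abcdef"))
```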

* Safer indexing in dense linalg methods (#56451)

Ensure that `eachindex` is used consistently alongside `@inbounds`, and
use `diagind` to obtain indices along a diagonal.
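
A sketch of the two patterns, with hypothetical function names. The indices come from the arrays themselves, so the `@inbounds` annotations are justified by construction.

```julia
using LinearAlgebra

# `diagind` yields the linear indices of the diagonal of `A`.
function scale_diag_sketch!(A::Matrix, c)
    for i in diagind(A)
        @inbounds A[i] *= c
    end
    return A
end

# `eachindex(dest, src)` throws if the two arrays do not share indices,
# so the loop body never runs with an out-of-range index.
function add_sketch!(dest::Matrix, src::Matrix)
    for i in eachindex(dest, src)
        @inbounds dest[i] += src[i]
    end
    return dest
end

scaled = scale_diag_sketch!([1.0 2.0; 3.0 4.0], 2)
```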

* The `info` in LAPACK calls should be a Ref instead of a Ptr (#56511)

Co-authored-by: Viral B. Shah <[email protected]>

* Scaling loop instead of broadcasting in strided matrix exp (#56463)

Firstly, this is easier to read. Secondly, this merges the two loops
into one. Thirdly, this avoids the broadcasting latency.
```julia
julia> using LinearAlgebra

julia> A = rand(2,2);

julia> @time LinearAlgebra.exp!(A);
  0.952597 seconds (2.35 M allocations: 116.574 MiB, 2.67% gc time, 99.01% compilation time) # master
  0.877404 seconds (2.17 M allocations: 106.293 MiB, 2.65% gc time, 99.99% compilation time) # this PR
```
The performance also improves as there are fewer allocations in the
first branch (`opnorm(A, 1) <= 2.1`):
```julia
julia> B = diagm(0=>im.*(float.(1:200))./200, 1=>(1:199)./400, -1=>(1:199)./400);

julia> opnorm(B,1)
1.9875

julia> @btime exp($B);
  5.066 ms (30 allocations: 4.89 MiB) # nightly v"1.12.0-DEV.1581"
  4.926 ms (27 allocations: 4.28 MiB) # this PR
```

* codegen: Respect binding partition (#56494)

Minor changes to make codegen correct in the face of partitioned
constant bindings. Does not yet handle the envisioned semantics for
globals that change restriction type, which will require a fair bit of
additional work.

* Profile: fix Compiler short path (#56515)

* Check `isdiag` in dense trig functions (#56483)

This improves performance for dense diagonal matrices, as we may apply
the function only to the diagonal elements.
```julia
julia> A = diagm(0=>rand(100));

julia> @btime cos($A);
  349.211 μs (22 allocations: 401.58 KiB) # nightly v"1.12.0-DEV.1571"
  16.215 μs (7 allocations: 80.02 KiB) # this PR
```
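
The shortcut can be sketched as follows. `cos_diag_sketch` is a hypothetical name, and the real methods handle more element types than this.

```julia
using LinearAlgebra

# For a diagonal matrix, cos(A) is diagonal with cos applied entrywise to
# the diagonal, so the general dense algorithm can be skipped.
function cos_diag_sketch(A::Matrix{Float64})
    if isdiag(A)
        B = zero(A)
        for i in diagind(B)
            B[i] = cos(A[i])
        end
        return B
    end
    return cos(A)   # general dense path
end

C = cos_diag_sketch(diagm(0 => [0.0, pi]))
```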

---------

Co-authored-by: Daniel Karrasch <[email protected]>

* Profile: add helper method for printing profile report to file (#56505)

The IOContext part isn't obvious, because otherwise the IO is assumed
to be 80 chars wide, which makes for bad reports.
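
Written by hand, the pattern the helper wraps looks roughly like this. The file name and display size are arbitrary choices for illustration, and the helper's own name and signature are not shown here.

```julia
using Profile

# Collect a few samples from some throwaway work.
Profile.clear()
@profile for _ in 1:200
    sum(rand(10^5))
end

path = joinpath(mktempdir(), "profile_report.txt")
open(path, "w") do io
    # Without the :displaysize hint the report is laid out for 80 columns.
    Profile.print(IOContext(io, :displaysize => (1000, 500)))
end
```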

* Change in-place exp to out-of-place in matrix trig functions (#56242)

This makes the functions work for arbitrary matrix types that support
`exp`, but not necessarily the in-place `exp!`. For example, the
following works after this:
```julia
julia> m = SMatrix{2,2}(1:4);

julia> cos(m)
2×2 SMatrix{2, 2, Float64, 4} with indices SOneTo(2)×SOneTo(2):
  0.855423  -0.166315
 -0.110876   0.689109
```
There's a slight performance improvement as well because we don't
compute `im*A` and `-im*A` separately, but we negate the first to obtain
the second.
```julia
julia> A = rand(ComplexF64,100,100);

julia> @btime sin($A);
  2.796 ms (48 allocations: 1.84 MiB) # nightly v"1.12.0-DEV.1571"
  2.304 ms (48 allocations: 1.84 MiB) # this PR
```
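
The negation trick can be sketched with the out-of-place `exp`. `sin_sketch` is a hypothetical name and not the LinearAlgebra implementation.

```julia
using LinearAlgebra

# sin(A) = (exp(im*A) - exp(-im*A)) / 2im, using only out-of-place `exp`.
function sin_sketch(A::AbstractMatrix)
    M = im * A          # computed once
    X = exp(M)
    Y = exp(-M)         # negate M instead of forming -im*A separately
    return (X - Y) / (2im)
end

A = [0.1 0.2; 0.3 0.4]
S = sin_sketch(A)
```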

* Test: Don't change scope kind in `test_{warn,nowarn}` (#56524)

This was part of #56509, but is an independent bugfix. The basic issue
is that these macros were using a `do` block internally. This is
undesirable for test macros, because we would like them not to affect
the behavior of what they're testing. E.g. right now:
```
julia> using Test

julia> const x = 1
1

julia> @test_nowarn const x = 1
ERROR: syntax: `global const` declaration not allowed inside function around /home/keno/julia/usr/share/julia/stdlib/v1.12/Test/src/Test.jl:927
Stacktrace:
 [1] top-level scope
   @ REPL[3]:1
```

This PR just writes out the try/finally manually, so the above works
fine after this PR.

* For loop instead of while in generic `copyto!` (#56517)

This appears to improve performance.
```julia
j…