-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce new "Dbgi" chunk #1367
Conversation
The discussion has been very quiet so far... We think that this idea should be discussed on erlang-questions. We think that there are probably many readers of erlang-questions that don't look at pull requests. Could you post on erlang-question to get the discussion started? |
Done! |
We have discussed this PR in the OTP team. We are cautiously positive that this is the right direction to go. We are now ready for the next step with more of the documentation and implementation details filled in. |
Thank you! I have pinged @kostis for more feedback as well. I am glad to proceed meanwhile. |
I am not sure I have more feedback to give at this point. FWIW, I also think that this proposal is in the right direction, if not already good enough to be considered for inclusion into 20 already. |
Wouldn't there be problems going backwards with code in a sensible fashion? I mean if Core erlang is stored and from that getting say either elixir of LFE would be possible but the resultant code would be strange to say the least. |
@josevalim If the Erlang abstract format was stored as (from description above):
Then the If |
@rvirding I don't think we can "go back" in format. For example, if the Erlang compiler stores:
The However, the proposal does allow LFE to store its own AST in BEAM and compile it down to any format you can, such as Core AST or even Erlang AST. If you want to store Core AST and transform that back into LFE, that's also a choice you can take in your backend. But, as you said, it will be rather weird. @okeuday just to clarify, the Metadata is completely opaque, except for the backend itself. The only way to get Erlang Abstract Code is by asking the backend. That said, I don't think we need to add the version to Metadata because it can be handled internally in many different ways. For example, imagine we start store it as a list. Now we want to slightly change it, we can transform it to a tuple Our preference is always to have the abstract code in metadata though since loading it from source is a fallback option used only by certain tools. Changes in the abstract format will end-up being reflected on the backend API. Today we call
The point of this change is to not have to store multiple ASTs. So I don't think we should store the abstract code alongside the new chunk. The beam_lib high-level API will remain backwards compatible. |
@josevalim There is a potential in this situation for breakage to occur due to the versions involved, where version A of the opaque Metadata is stored with the Backend module name with the version of the Backend module changing based on when the Backend module is executed (the Backend module version can be called version B). If the abstract code is not stored and we call its format version C we would need to depend on Backend module version B to support the abstract code format version C which may not be a priority for that Backend. That would then potentially cause problems with anything that attempts to depend on using the abstract code from non-Erlang BEAM modules. So your proposal should only work if the Backend module implements everything perfectly though there is no enforcement of the Backend module supporting any particular format because it exists outside of Erlang/OTP. I can understand the desire to avoid storing multiple ASTs though this appears like it should lead to format inconsistencies as versions of these formats change over time. |
@okeuday sorry but I can't quite understand the scenario you are referring to. Can you give a more detailed example, explaining why we would have so many versions, how they are introduced and how they would conflict?
That's exactly how LFE/Elixir works today. We need to implement everything perfectly otherwise the compiler won't accept the Core Erlang or the Erlang Abstract Format that we feed into it. And sometimes it requires us checking the Erlang/OTP version we are are compiling to. But as a rule: if we were able to emit a .beam file for that Erlang/OTP version, we are able to emit a valid Core AST or valid Erlang AST. So this requirement is already in place and this proposal does not change it. It is also worth pointing out that any new OTP version can introduce elements to the AST in a way that a module or an AST generated on Erlang/OTP 21 does not compile on Erlang/OTP 20. That's regardless of the language of choice.
I fail to see how those format inconsistencies would arise since:
Could you please clarify? |
@josevalim The concrete situation I am thinking of is where Elixir uses the "Dbgi" chunk to store the opaque Metadata the Backend module understands though the Erlang Abstract Format doesn't exist in the BEAM output. I find it helpful to use the script https://github.com/okeuday/reltool_util/blob/master/ex2erl (I have discussed it with you briefly in the past) which converts Elixir source code to Erlang source code while using the Erlang Abstract Format to do so. It seems like this process can break if the Backend module does not support the Erlang Abstract Format at the current version that is required. My concern is due to being unsure if Elixir really requires the Erlang Abstract Format in the future after this change. The |
If you compile it with Elixir v1.5 and you use Elixir 1.6 in order to compute the AST, we should remain compatible, otherwise that's a backwards incompatible change at the AST level, a bug. However, if you compile it with Elixir v1.6 and you use the Elixir v1.5 backend, it is not guaranteed to work, in the same way a .beam you defined in Erlang 20 may not work on Erlang 19. But that is true even with the Erlang Abstract Format, since Elixir v1.6 may use features introduced in later OTP versions, not used by Elixir v1.5, such as the new AST annotations, and your tool may not be ready to handle those changes. And in all cases, the Elixir compiler should be able to emit the AST for the Erlang versions it supports, otherwise it means the Elixir compiler is broken. Before and after this proposal, the conditions that will lead your code to break is if if you use a more recent Elixir version than the one you compiled with. The Elixir compiler is always guaranteed to emit AST that works from a minimum OTP version. If your tool works on a more recent version than that, you should be golden, otherwise the tool needs to be updated. |
@josevalim The question that still seems unclear is how the Backend module will always support the Erlang Abstract Format, even after potential changes to the Erlang Abstract Format that change its version? Is there a direct conversion from the Core Erlang AST that any Backend module may use to get the Erlang Abstract Format (using source code already in Erlang/OTP that makes this support trivial), or is this a potential problem in the future? |
@okeuday if you assume Erlang will break previous versions, then it will all fall apart. Then it should no longer be called Erlang Abstract Format, but rather Erlang Abstract Format v2 or something else. For example, instead of asking If that is the case, maybe Erlang will want to provide an API that translates erlang_v2 to v1 and so on, to make it easier for backends to support multiple formats. |
@josevalim If we ignore the problem of the Erlang Abstract Format changing in the future and requiring a version to identify the change, how would the Backend module convert Core Erlang to the Erlang Abstract Format now? Does source code currently exist in Erlang/OTP to convert Core Erlang to the Erlang Abstract Format? I do not see where it might be, though I know the cerl modules in the compiler application lack documentation. |
@okeuday why would the backend convert from Core to the Erlang Abstract Format? I don't think that's possible. The backend should store a high level format and convert down as necessary. If you store a low-level format, the backend will be limited on what it provides. |
@josevalim Ok, so then the opaque Metadata is specific to the non-Erlang programming language and each conversion from the high-level format stored in the opaque Metadata to the Erlang Abstract Format is unique. That means it is a maintenance burden if the Erlang Abstract Format changes in the future for all non-Erlang programming languages and that should also mean that non-Erlang programming languages may have no motivation to provide a conversion to the Erlang Abstract Format. If this is true, that means this change helps to discard information (the Erlang Abstract Format) for a small file size reduction while being unable to ensure the same information can be obtained in the future. If this is correct, I think this is a dangerous change. |
This is true today. If Erlang Abstract Format changes, it is a maintenance burden for all languages that target the Erlang Abstract Format, including Elixir (as well as parse transforms).
Sorry but I don't understand this logic. If a language has no motivation to convert to Erlang Abstract Format, then why should it? Languages will likely stick with the Erlang Abstract Format to be able to leverage the existing tooling. Languages will likely target Core instead of the Erlang Abstract Format if they need more flexibility and they don't want to be constrained by Erlang decisions. Elixir will continue to emit the Erlang Abstract Format, because we find it is important, but there is nothing blocking us today, before or after this pull request, to stop doing that. If there is a feature we believe is essential to the language and we need to compile to Core to implement it, then we will no longer support the Erlang Abstract Format regardless of this pull request. It seems you are worried that some languages will stop supporting the Erlang Abstract Format but it is not your decision to take (nor mine). That's a design decision that should be left to the language. And that already happens today in LFE and Alpaca and we should support them the best we can. If a language can emit the Erlang Abstract Format, then it obviously will as there are clear benefits. But if a language cannot, then it already does not do so today, and we should still support such languages. |
@josevalim Ok, then this change is neutral and doesn't impact the ability to get the Erlang Abstract Format which makes me happy. I regret creating noise about this but wanted to make sure information is not being lost. Thanks! |
Yes, I believe languages will choose to emit Erlang Abstract Format or Core for design reasons, regardless of this pull request. There are already big trade-offs in this decision and I don't think this PR will tip the scales. LFE compiles to core even if it means no access to debugger, cover, etc. With this PR, we can at least improve compatibility with some of the tooling.
No worries! @bjorng wanted a busy discussion, now we have it! :D |
6dcb09d
to
3ffa8d6
Compare
I have pushed a working implementation of the new chunk.
At the moment, this PR allows any compiler that is able to emit Erlang Abstract Code to store a higher level AST and emit the Erlang Abstract Code when necessary. A test that shows a module with custom backend being compiled and having the abstract_code retrieved can be found here: https://github.com/josevalim/otp/blob/3ffa8d6ec/lib/compiler/test/compile_SUITE.erl#L604-L622 Notice I have decided to version the abstract code formats, so the backend is asked for In the next days, I want to start changing parts of OTP to rely on the new chunk, removing code duplication and allowing languages such as LFE and Alpaca to better integrate with the tooling. |
@bjorng I have just pushed a fix. |
d580be0
to
db78709
Compare
lib/compiler/src/compile.erl
Outdated
%% Do not affect compilation result on future calls. | ||
keep_compile_option({outdir, _}) -> false; | ||
keep_compile_option(warnings_as_errors) -> false; | ||
keep_compile_option(Option) -> effects_code_generation(Option). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bjorng should I add outdir
and warnings_as_errors
to effects_code_generation
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. I think that is a good idea. Please do it in separate commit before your main commit so that it will be easier to notice that change if it ever turns out to be a problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did it as a commit after the current changes just in case someone reverts it. This way, reverting it will also add the two clauses above back (and won't break dialyzer).
That only solves one of the problems. Note that
You must also remove any (By the way, while you are at it, you can remove the |
lib/dialyzer/src/dialyzer_plt.erl
Outdated
{ok, Abs} -> | ||
erlang:md5(term_to_binary(Abs)) | ||
end | ||
{ok, {_, MD5}} = beam_lib:md5(File), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not seem right.
The new code computes the MD5 from the Beam code in the file. If a type or spec has been changed in the source code and the file has been recompiled dialyzer will think that nothing has changed. It seems that you should retrieve the Core Erlang code and compute the MD5 from that.
@uabboli Any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, the Dbgi chunk is not considered on md5. I will fix it accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed to retrieve Core Erlang as proposed.
lib/stdlib/doc/src/beam_lib.xml
Outdated
<c>Backend:debug_info/4</c> must return <c>{ok, Code}</c> or | ||
<c>{error, Term}</c>.</p> | ||
|
||
<p>Developers must always invoked the <c>debug_info/4</c> function |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"invoked" => "invoke"
lib/stdlib/doc/src/beam_lib.xml
Outdated
<c>{error, Term}</c>.</p> | ||
|
||
<p>Developers must always invoked the <c>debug_info/4</c> function | ||
and never rely on the <c>Data</c> stored in the <c>debug_info</c> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The word "chunk" is missing at the end of the line.
db78709
to
0f78f26
Compare
I have addressed all feedback and force pushed. |
You can't use
Also, I came to think of that also is a I have pushed a fixup commit. Please review it and rebase. |
The new Dbgi chunk returns data in the following format: {debug_info_v1, Backend, Data} This allows compilers to store the debug info in different formats. In order to retrieve a particular format, for instance, Erlang Abstract Format, one may invoke: Backend:debug_info(erlang_v1, Module, Data, Opts) Besides introducing the chunk above, this commit also: * Changes beam_lib:chunk(Beam, [:abstract_code]) to read from the new Dbgi chunk while keeping backwards compatibility with old .beams * Adds the {debug_info, {Backend, Data}} option to compile:file/2 and friends that are stored in the Dbgi chunk. This allows the debug info encryption mechanism to work across compilers * Improves dialyzer to work directly on Core Erlang, allowing languages that do not have the Erlang Abstract Format to be dialyzer as long as they emit the new chunk and their backend implementation is available Backwards compatibility is kept across the board except for those calling beam_lib:chunk(Beam, ["Abst"]), as the old chunk is no longer available. Note however the "Abst" chunk has always been optional. Future OTP versions may remove parsing the "Abst" chunk altogether from beam_lib once Erlang 19 and earlier is no longer supported. The current Dialyzer implementation still supports earlier .beam files and such may also be removed in future versions.
By moving to effects_code_generation/1, there is no need to explicitly remove those options when storing compile information in the DebugInfo chunk.
7bbd4eb
to
9dfb4ed
Compare
@bjorng that answers why the compiler was not using |
Will test in our daily builds (with updated primary bootstrap). |
Seems to work fine in our daily builds now. I will merge this PR later today, since our deadline for inclusion in the first release candidate is rapidly approaching. OK? |
@bjorng sounds perfect. Please ping me if anything new comes up. Also, once this is merged, I will send another PR to be merged after OTP 20 is out, that cleans up the dialyzer code (but removes support for older beams). |
This commit removes the old code paths that attempted to translate Erlang Abstract Format to Core and relies exclusively on the new debug_info chunk. This is a follow up to erlang#1367.
This commit removes the old code branches that attempted to translate Erlang Abstract Format to Core and relies exclusively on the new debug_info chunk. This is a follow up to erlang#1367.
This commit removes the old code branches that attempted to translate Erlang Abstract Format to Core and relies exclusively on the new debug_info chunk. This is a follow up to #1367.
Many tools in the Erlang ecosystem expect the Erlang Abstract Code chunk to exist or, if it doesn't exist, it automatically generates one from the source code, if it can find the source. This restricts the usage of the tooling for some languages and leads to code duplication in different tools.
To solve this issue for languages that compile directly to Core, such as LFE, I have earlier proposed a chunk that stores Core AST. However, even if we add such chunk, I could see the following problems:
Storing the Core AST chunk still does not provide a feature where chunks are calculated on the fly from the source if the chunk is not present
Adding a new chunk could potentially make the situation worse because tools in the future may work directly on those new chunks, forcing compilers to add both Erlang Abstract Format and Code AST chunks to the .beam file. In fact, if we go in this direction, I could see we ending-up in a situation where Elixir has to store Elixir, Erlang and Core ASTs on .beam.
Therefore, I would like the ability to store abstract code on beam such that:
Therefore I propose us to introduce a new chunk which allows any abstract code to be stored but also includes a backend module that will be used for performing conversions or source lookup.
Proposal: The "Dbgi" chunk
I propose to introduce a new chunk, called "Dbgi", with the following format:
Once the chunk is fetched, we can retrieve the actual debug information as follows:
Where
Format
is the desired format (erlang, core, elixir, lfe, etc),Module
is the module we are fetching the debug info for andOpts
may include[allow_source_lookup]
, which would instruct the backend to lookup the debug information from source if it is not available.As an example, an implementation for the Erlang abstract format would return something like:
Where
erl_abstract_code
could be defined as:As you can see, this approach provides many benefits:
The logic that retrieves, converts and potentially fetches the abstract code from source is encapsulated in a single module instead of scattered around different projects like dbg, dialyzer, cover, etc
It allows the
debug_info
to be stored in any format desired by a given compiler. This way LFE can store LFE AST and translate to core accordingly, Elixir can store Elixir AST and convert to either core or erlang, etc.Finally, by allowing
{debug_info, {Backend, Metadata}}
to be given tocompile:forms/2
, we can allow all language compilers to leverage the current infra-structure for debug_info and debug_info_key instead of rolling their ownThis approach is also backwards compatible.
beam_lib:chunks(Module, [abstract_code])
can look into "Dbgi" and invoke the proper backend conversion in case the "Abst" chunk is no longer available. If a tool is attempting to access "Abst" chunk directly, the fact the chunk no longer exists should not be an issue as the chunk is optional.The PR includes the changes to the
compile
module. If the general direction is agreed on, I will improve the docs, add tests and do the relevant changes tobeam_lib
and change some of the tooling to prove the proposal is sound. Later we will submit pull requests to update the remaining of the tooling as necessary (updating the rest of the tooling can be done in another step as this PR aims to be backwards compatible)./cc @j14159 @rvirding