Skip to content

Conversation

@withinboredom
Copy link
Member

Inspired by the discussion in #1080, this is an ultra-risky implementation that should fix #915. Basically, Go uses RWLocks to prevent multiple threads from writing to the environment, while PHP uses TSRM. TSRM and Go don't know about each other and if one happens to be reading the environment while the other is writing to it, a segfault is bound to happen.

So... this PR replaces the built-in getenv and putenv with ones that defer to Go. See #1080 discussions for why this may not be the best solution, but maybe it is a 'good enough' solution to prevent segfaults from concurrent environmental accesses in Go and PHP. If we are happy with this approach, this will need many tests to ensure the behavior is exactly the same as the built-in ones.

For example, a missing feature is setting the TZ environment variable which should trigger a reset of the current timezone... however, that would bleed into other threads currently running, so it isn't ideal. In fact, it might be good that it doesn't work.

@withinboredom withinboredom requested a review from dunglas October 11, 2024 18:16
@withinboredom
Copy link
Member Author

In theory, we could call the go-side to fill the track_vars array. However, I believe this would end up conflicting with #1080 and in any case, would be an optimization for after that is merged.

@withinboredom
Copy link
Member Author

withinboredom commented Oct 12, 2024

huh. I guess I will have to do that since ASAN found go/php stepping on each other.

Copy link
Member

@dunglas dunglas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also rebase please?

This seems almost ready to merge.

@AlliBalliBaba
Copy link
Contributor

I think I realized, why $_ENV sometimes behaves weird #1061 . We can fix that in one swoop or a separate PR after this one.

Basically:

  • PHP has the following superglobals: _GET _POST _COOKIE _SERVER _ENV _REQUEST and _FILES.

  • _SERVER, _ENV and _REQUEST are special in that they have a 'jit' functionality, where they only get loaded when they are encountered on script compilation (this can be disabled via auto_globals_jit = Off)

  • Since in worker mode we do not recompile scripts, these 3 globals are never reloaded.

FrankenPHP does actually reload $_SERVER here. The other 2 are destroyed here, but their reference as a global variable in the EG(symbol_table) remains.

We should probably not destroy $_ENV and reset it in the EG(symbol_table) after each request.

We could also fix $_REQUEST as it's currently always empty in worker mode with auto_globals_jit = On. That will come at a performance penalty though (we can also just document the behaviour).

@withinboredom
Copy link
Member Author

Yes @AlliBalliBaba, I think after this PR, we are in a place were we can implement the env in a php-fpm compatible way (if it isn't already). We should do some testing on fpm + frankenphp and identify any differences between them. Your PR gets us most of the way there, but this one just unifies Go/PHP environment access. I think there is still more to be done that is out of this PR's scope.

@withinboredom
Copy link
Member Author

In other words, after this PR, the environment is entirely managed in go (whether it actually gets the environment or from a map), so we can do whatever we need to. There might be a slight performance penalty, due to the cgo call, but getting it right in a higher level language will probably be easier before we optimize back to c (if we even need to).

@withinboredom
Copy link
Member Author

Not actually sure how to test $_ENV due to test pollution (previous tests causing $_ENV to be populated when we reach the test where we want to test $_ENV), so I've removed that part of the test.

frankenphp.c Outdated
}

PHPAPI void get_full_env(zval *track_vars_array) {
struct go_getfullenv_return full_env = go_getfullenv(thread_index);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can probably cache the environment somewhere here as a thread local (or even global) zend_array, so it will only get imported once. I'll check later this week again, but I think that's how FPM also behaves with since putenv doesn't alter $_ENV or $_SERVER

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably do it on the go side and not mix source of truths, but yeah.

Copy link
Contributor

@AlliBalliBaba AlliBalliBaba Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah even in the manual someone mentions how putenv doesn't affect $_ENV. Since $_ENV is unchanging for the duration of the process I think it would be most efficient to directly store it as a zend_array at the start of the process. I can also do that in a follow-up PR if you want.

putenv and getenv should always access the real environment though like in your implementation 👍

Copy link
Member Author

@withinboredom withinboredom Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is working as it is supposed to. This only gets called in the following scenarios:

  1. on each request in cgi-mode when $_SERVER is filled
  2. at worker startup in worker mode to fill $_SERVER
  3. getenv() without any parameters
  4. jit-ENV when $_ENV is first accessed

There shouldn't be any reason to cache this.

@withinboredom withinboredom requested a review from dunglas October 18, 2024 10:49
@withinboredom
Copy link
Member Author

This should be good to merge!

@withinboredom withinboredom merged commit e812473 into main Oct 18, 2024
@withinboredom withinboredom deleted the go-env branch October 18, 2024 11:47
@ghost ghost mentioned this pull request Nov 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Segmentation violation when running parallel E2E tests

4 participants