Skip to content
This repository has been archived by the owner on Aug 23, 2021. It is now read-only.

Include cloud-init-vmware-guestinfo in core cloud-init package #2

Closed
MnrGreg opened this issue Jan 28, 2019 · 45 comments
Closed

Include cloud-init-vmware-guestinfo in core cloud-init package #2

MnrGreg opened this issue Jan 28, 2019 · 45 comments

Comments

@MnrGreg
Copy link

MnrGreg commented Jan 28, 2019

Hi @akutz,

This module works really well. Have you considered submitting a PR for it to be added to the core cloud-init package? https://git.launchpad.net/cloud-init

I'm sure more people would use it if they were aware of it. Many could benefit.

@akutz
Copy link
Contributor

akutz commented Jan 28, 2019

Hi Greg,

That’s always been my intent, but I just haven’t had the time. Is the process simply to file a ticket via the URL you provided?

@FireBall1725
Copy link

I also agree with @MnrGreg, this is something that should be in cloud-init, I have found this pretty useful as well.

@powersj
Copy link

powersj commented Feb 27, 2020

Hi! Upstream cloud-init here :)

We have received a number of requests to get this datasource added to cloud-init and would be happy to see that done.

Since this bug was filed, we moved upstream cloud-init to our Github project. In order to get this added to upstream, we would ask you to submit a merge request and sign the Canonical CLA. Further details are in our hacking document.

If you have quesions feel free to drop in out #cloud-init on Freenode or join our mailing list.

I would also be interested in reaching out directly to VMware for some additional assistance with VMware and cloud-init compatibility. How to use cloud-init with the larger variety of VMware products is a very common question and getting both sides on the same page would be great!

Thanks!

@akutz
Copy link
Contributor

akutz commented Feb 27, 2020

Hi @powersj,

Thank you so much for reaching out! I am flattered that so many people have an interest in what started as a personal project. I am akutz at vmware if you would like to e-mail me directly. As I said above, it has always been my intent to add this datasource to cloud-init proper, but it is also the case that it being added after the fact enables greater flexibility and more frequent releases. That said, I am lazy and do not do releases that often :)

@karstenjakobsen
Copy link

This would be absolutely amazing.

@janitha
Copy link

janitha commented Jun 22, 2020

I'm planning on spending some cycles on this next quarter, will link to the cloud-init PR once I have it ready (~Aug)

Some initial changes in progress was removing dependency on netifaces and deepmerge and using utils in cloud-init itself. Also, I haven't wrapped my head fully around the networking related bits this plugin does and seemed like there was some overlap with cloud-init's own (probably lack of my understanding on both sides).

@powersj
Copy link

powersj commented Jun 22, 2020

@janitha awesome, please do feel free to stop by #cloud-init on Freenode and ask questions.

@akutz
Copy link
Contributor

akutz commented Jul 6, 2020

Hi @janitha,

Removing the netifaces dependency will be difficult. Let's set up some time to discuss this. There is a very good reason it was added in the first place.

@weikinhuang
Copy link

Has there been any movement on this feature?

@skyscooby
Copy link

@akutz Is this ever going to happen? What can we do to push this along?

This module just broke on our machines with the latest Ubuntu 18.04 update.. I don't have time to figure out why so I disabled cloud-init and static configured the network on the important boxes. Hopefully I'll get some cycles to dig deeper on the failure sometime in the next month... Can we please get this moved into the community cloud-init project where it will be able to be maintained and tested by the greater community?

It's a very useful data bridge between Vsphere and VM's but when I reboot my machines and the datasource fails completely it creates complete downtime for the box until someone goes in on the console and manually fixes the network. It really doesn't do well to the 'VMware' brand to have this here in the corporate repository as a pet project either..

@akutz
Copy link
Contributor

akutz commented Sep 19, 2020

Hi @skyscooby,

I understand your concerns, and I sympathize with you. I am not on this project full time -- it's something I created when I needed it, but I should certainly be doing more to try and keep it up to date. If someone can run with it, then I'd be more than happy. The primary impediments are the two external dependencies, and there is no good way around the netifaces one.

I was speaking with someone the other day -- what I should do is get CI set up to automatically run this against multiple versions of multiple distros, but again, it's a matter of time that I do not currently have :(

@skyscooby
Copy link

@akutz Much appreciate the honesty. I see a large part of the breakages happening due to changes in the upstream project being done out of band.. if this module was sitting in their tree I would assume it would be kept up to date when they change the class interface or module import paths. I'm not sure the Datasource interface has been fully designed with third party stability in mind.. that said my first question is really about what is the special case with this datasource that requires it to have dependencies on module that none of the other cloud init data sources seem to require (otherwise the dep would already be available)?

Perhaps I am not using the full functionality of this datasource so I may not understand the edge case here.. but all we do is pump some encoded data into the Vpshere API using our automation tooling... which we want passed into cloud-init... cloud-init itself should mostly be handling the system level details to apply the desired configuration. There is a long list of other datasources which just work when passed a cloud-config...

@akutz
Copy link
Contributor

akutz commented Sep 19, 2020

On mobile, but it boils down to DHCP and instance data. Go look for the commit where i introduced instance data support. EC2, GCE, those don’t rely on DHCP. We do. There’s not metadata service or IPAM native to vSphere, so if we want to avail ourselves of instance data that includes the network config, we have to write that ourselves. And that is discovered with netifaces.

I’ll Add more when at a laptop

@skyscooby
Copy link

skyscooby commented Sep 19, 2020

@akutz OK that's one area that differs for our use case.. we are passing static network configuration through guestinfo.metadata ... but even with the DHCP case why do we need the instance-data.json to be retroactively updated with actual IP information should it not just contain the metadata as supplied by the user.. (ie that the interface should be configured with DHCP?)... in either case cloud-init would then take that and do whatever distro-specific actions are necessary to configure the interface for DHCP.. and the DCHP server would configure the interface down the line. Are you collecting IP / inventory information by going back and using the cloud-init file? or am I missing something? Is there some functionality in cloud-init that does not work correctly if we don't update instance-data.json with the allocated DHCP IP information?

I checked on our failed nodes and when I run the datasource from the command line it hangs.. when I break out I can see it's gotten stuck in the wait_for_network sleep loop (even tho we are not passing a request to do so through the metadata).. so I am guessing something in this logic is what is causing this issue for us... when I manually re-configure the network and run the datasource it works again.

@skyscooby
Copy link

I would gander that if there was expected cloud-init functionality that instance metadata be dynamically updated with a 'live' view of the systems network state that would and should be implemented in cloud-init core once and wouldn't be provider specific... From all I gather the cloud init metadata is only to be passed in one direction from the "cloud" -> instance.

@akutz
Copy link
Contributor

akutz commented Sep 19, 2020

From all I gather the cloud init metadata is only to be passed in one direction from the "cloud" -> instance.

The expectation is that a cloud platform is supposed to provide the cloud instances with their network information -- and again, that doesn't exist or happen on vSphere environments with DHCP. This functionality simulates that behavior as it is required for many systems that rely on cloud-init instance metadata queries for discovery of network information.

Every other cloud provider is able to indicate in advance what the networking information is and include it in the instance data where it is queryable and you can use Jinja to interpolate your user data with the instance data. Please see https://cloudinit.readthedocs.io/en/latest/topics/instancedata.html#using-instance-data for an example.

Because vSphere often relies on DHCP, that information is not available in advance. This is not something that cloud-init provides. This is something the data source is supposed to provide. This functionality was introduced to allow Kubernetes images that are bootstrapped with cloud-init and need to know information that is only available after the network is online to still be used with the same templates every other cloud platform uses.

@skyscooby
Copy link

Thanks very much for the explanation, that last bit helps me understand the goal you had in mind when building it, basically shimming in functionality to support existing templates that might require IP information at boot time.Indeed DHCP is not a typical use case in the cloud (unless it's supported in some VPC implementations.. not sure).

I found the exception in the debug output that is causing me troubles.. are you aware of this

2020-09-19 00:24:38,823 - util.py[DEBUG]: Getting data from <class 'cloudinit.sources.DataSourceVMwareGuestInfo.DataSourceVMwareGuestInfo'> failed
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceVMwareGuestInfo.py", line 320, in get_guestinfo_value
    [VMWARE_RPCTOOL, "info-get guestinfo." + key])
TypeError: 'module' object is not callable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 770, in find_source
    if s.update_metadata([EventType.BOOT_NEW_INSTANCE]):
  File "/usr/lib/python3/dist-packages/cloudinit/sources/__init__.py", line 659, in update_metadata
    result = self.get_data()
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceVMwareGuestInfo.py", line 125, in get_data
    self.metadata = load_metadata()
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceVMwareGuestInfo.py", line 425, in load_metadata
    data = load(guestinfo('metadata'))
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceVMwareGuestInfo.py", line 399, in guestinfo
    data = get_guestinfo_value(key)
  File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceVMwareGuestInfo.py", line 327, in get_guestinfo_value
    except util.ProcessExecutionError as error:
AttributeError: module 'cloudinit.util' has no attribute 'ProcessExecutionError'

@skyscooby
Copy link

Disregard I see a Fix in #53...

I will have to re-bake all my base images or pin cloud-init from being updated from upstream I guess.

@skyscooby
Copy link

Actually #54 fixes this... master is still broken on latest ubuntu 18.04 cloud-init version at the moment.

@mcmcghee
Copy link

Disregard I see a Fix in #53...

I will have to re-bake all my base images or pin cloud-init from being updated from upstream I guess.

I'm in the same boat as you until the fix gets merged. I've created a number of images that utilize this and our infrastructure is dependent on those images now.

I assumed that this had the full support of VMWare and wasn't one person's side project. Knowing that now makes me uneasy. I might pin cloud-init regardless from here on out. (This is not a knock on you at all @akutz, the work you've done is appreciated and this module has been very useful.)

@akutz
Copy link
Contributor

akutz commented Sep 22, 2020

I assumed that this had the full support of VMWare and wasn't one person's side project.

I totally understand. We do have coverage for OSs like Photon as this datasource is included with Photon OS, but as @skyscooby mentioned above, this datasource exists specifically to address gaps, such as DHCP on vSphere as opposed to static IPs via IPAM like other cloud providers. This makes it difficult to test with a wide range of operating systems.

As I noted above, I've looked into setting up CI pipelines internally to test against cloud-init HEAD.

This is not a knock on you at all @akutz

Of course. People need software to be reliable. I do not take these concerns personally, and I appreciate everyone's patience.

I am totally open to this going upstream, but I do not see a path forward for that without the inclusion of the netifaces dependency. That is the foundation of the feature set used by projects like Cluster API and VMware's own Project Pacific for bootstrapping machines.

@weikinhuang
Copy link

Would it be possible to split this into two different datasources in this case, where one of the datasources just queries for the static configs (like setting static addresses in metadata) found in the vmx file using vmware-rpctool, and a separate datasource that handles the dhcp nature of the above issues.

That way, it would be possible to upstream the section that specifically reads from the vmx configs, the portion needed for reading dhcp net can remain here in this datasource.

@akutz
Copy link
Contributor

akutz commented Sep 22, 2020

Hi @weikinhuang,

Before I did something like that I would rather modify the datasource to only import the netifaces module when/if that feature was activated, making activation conditional on some configuration element.

@weikinhuang
Copy link

but wouldn't that still mean it would depend on netifaces, and therefore still make it hard to include upstream?

@akutz
Copy link
Contributor

akutz commented Sep 22, 2020

Hi @weikinhuang,

Python's imports only matter when they are actually loaded, and that can happen:

  • Anywhere in the file
  • Based on a conditional

Python 2

$ python2 - <<EOF
if True:
  import netifaces
else:
  import netifaces2
print('hi')
EOF
hi

Python 3

$ python3 - <<EOF
import os
if True:
  import netifaces
else:
  import netifaces2
print('hi')
EOF
hi

@weikinhuang
Copy link

oh sorry, I meant with regards to the coding style of the upstream cloud-init project.

@akutz
Copy link
Contributor

akutz commented Sep 22, 2020

oh sorry, I meant with regards to the coding style of the upstream cloud-init project.

Could you please clarify this? Your statement was:

but wouldn't that still mean it would depend on netifaces, and therefore still make it hard to include upstream?

I took that to mean the dependency as the issue. How does that relate to the coding style? Perhaps you mean project rules and governance, such as may be related to external dependencies?

@weikinhuang
Copy link

yep

@akutz
Copy link
Contributor

akutz commented Sep 22, 2020

Hi all,

I just opened #56 to remove the dependency on deepmerge. I'd appreciate people taking a look. Thank you!

@skyscooby
Copy link

skyscooby commented Sep 22, 2020

@akutz was the objection from upstream not related to pulling netifaces from pip vs using it in general? It seems that rhel7 / rhel8 and ubuntu and others for some time has included python-netifaces packages in the distro itself.. https://pkgs.org/search/?q=netifaces. I wouldn't think they would have issues declaring a dependency on software that is part of the distro proper.

I also thought they were open to vendoring in the pip dependency as well or was that ruled out?

It would mean ensuring the use of netifaces in the module is compatible with all the different versions of netifaces each distro includes tho..

@akutz
Copy link
Contributor

akutz commented Sep 22, 2020

Hi @skyscooby,

The pip thing is a non-factor. That's just how it's installed with curl bash. I don't really care where netifaces originates.

@skyscooby
Copy link

@akutz
Perhaps with this new CI it could be setup then to install the OS specific netifaces package vs the version declared in requirements.. I'm somewhat confident the upstream would either be happy to add a dependency on this OS package to cloud init package, they could also create a new subpackage for just this data source module that does depend on the OS package if they strongly oppose doing it for the entire cloud init package.. and barring that failing we would at least be in a position to build deb/rpm's that have been tested with the varying python-netifaces versions included with the OS's that are supported.

@powersj
Copy link

powersj commented Sep 23, 2020

First, great to see the conversation here :)

Next, this week is the cloud-init summit and it is open to the community: https://discourse.ubuntu.com/t/cloud-init-online-summit-sept-23-24/17867 It would be very, very nice to have someone there to talk about their usage of cloud-init + VMware. Last year we were not able to have anyone from VMware and it is something that many of the other clouds and distro maintainers have requested.

Finally, if you do go down this route, make sure it is python3-netifaces. Not saying we would add the dependency, but upstream cloud-init only supports Python 3.4 and newer as of right now. I am sure this will be re-evaluated at the summit.

@akutz
Copy link
Contributor

akutz commented Sep 23, 2020

Hi @powersj,

That’s the dep I use when building the RPM, so that should work. Thanks for reaching out; I’m on mobile, so my response will be brief for now.

@akutz
Copy link
Contributor

akutz commented Sep 23, 2020

@powersj I am happy to attend the summit as long as remote attendance is acceptable. I could discuss the purpose of this data source and how it helps address a gap with cloud-init bootstrapped systems on vSphere.

@powersj
Copy link

powersj commented Sep 23, 2020

@akutz yep this is 100% remote!

@akutz
Copy link
Contributor

akutz commented Sep 23, 2020

@powersj Awesome! I reached out to some people a few days ago about this project, and there does appear to be interest in moving ahead with getting this contributed to cloud-init proper. Maybe if I have a chance to discuss the reasoning behind the netifaces dependency at the summit, others will be on board with allowing that. I opened #56 to remove the dependency on deepmerge, so netifaces is the only one left standing.

@akutz
Copy link
Contributor

akutz commented Sep 23, 2020

@powersj What are the next steps for me to attend the event? I'm akutz at the company for which I work if you would like to e-mail me. Thanks!

@powersj
Copy link

powersj commented Sep 23, 2020

Mail sent! Thanks!

@islandbee
Copy link

Hi, has there been anymore progress on this? Seemed to be picking up steam.

@ggiesen
Copy link

ggiesen commented Mar 12, 2021

Would love to see this move forward as well...

@johnypony3
Copy link

+1 on this

@akutz
Copy link
Contributor

akutz commented Jul 23, 2021

canonical/cloud-init#953

@akutz
Copy link
Contributor

akutz commented Jul 27, 2021

FYI, I'm renaming this DatasourceVMware as it merges into cloud-init. This is consistent with the pattern that the DS have a suffix of the vendor name.

@akutz
Copy link
Contributor

akutz commented Aug 10, 2021

Merged upstream into cloud-init

@akutz akutz closed this as completed Aug 10, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests