I spent a day trying to understand why I could not log in as root on GCE with the Red Hat and CentOS images, whereas it worked fine with the Debian image. I am posting my findings here hoping they will save other people some time (I also opened a bug report: https://code.google.com/p/google-compute-engine/issues/detail?id=114).
For my application I wanted to be able to connect to my new CentOS VM as root. I don’t want to get into the “allowing SSH as root is dangerous” debate here.
I made all the necessary changes to the SSH configuration on the newly created CentOS image, but I still could not log in to the VM. I knew the setup was correct because the exact same steps worked fine with a Debian image, so I started to suspect there was an issue.
I started investigating and found out that some Google scripts run in the background on each VM to replicate the SSH keys defined in the GCE console to the VM. The scripts are available here:
https://github.com/GoogleCloudPlatform/compute-image-packages
I added some logging and saw the following pattern:
[AuthorizeSshKeys] user : charles ssh_keys : ['ssh-dss ….vQBN7nAVg== charles@ip-172-31-45-251'] UID : 500
[WriteAuthorizedSshKeysFile] Original File : /root/.ssh/authorized_keys
[WriteAuthorizedSshKeysFile] New_keys_path : /tmp/tmpjwGRv8
[WriteAuthorizedSshKeysFile] UID : 0 GID : 0
[AuthorizeSshKeys] user : root ssh_keys : ['ssh-dss …..nC4RvQBN7nAVg== root@ip-172-31-45-251'] UID : 0
[WriteAuthorizedSshKeysFile] Original File : /root/.ssh/authorized_keys
[WriteAuthorizedSshKeysFile] New_keys_path : /tmp/tmp15enQW
[WriteAuthorizedSshKeysFile] UID : 11 GID : 0
[AuthorizeSshKeys] user : operator ssh_keys : [] UID : 11
[WriteAuthorizedSshKeysFile] Original File : /home/charles/.ssh/authorized_keys
[WriteAuthorizedSshKeysFile] New_keys_path : /tmp/tmpDB6knL
[WriteAuthorizedSshKeysFile] UID : 500 GID : 500
The script updates the SSH keys for the following users: charles, root, and then “operator”. I understand the first two, “charles” and “root”, which are the users I defined in the GCE console. The “operator” user is stranger, especially because processing it ends up changing the “root” SSH keys.
I dug deeper into the Python code and understood this last flow, the one updating the “operator” user. The Google Python daemon uses /etc/passwd to determine the path to the SSH key file, and on Red Hat (or CentOS) the operator user is defined as follows:
User | UID | GID | Home Directory | Shell |
root | 0 | 0 | /root | /bin/bash |
operator | 11 | 0 | /root | /sbin/nologin |
The home directory of “operator” is thus /root, which leads the daemon to update /root/.ssh/auth… when working on the “operator” user. It thus removes the real “root” file, which prevents root login. Looking closer, this flow is designed to clean the SSH key file of users that are not present in the GCE console (which is the case for “operator”).
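To check whether an image is exposed to this kind of collision, one can look for accounts that share a home directory. The following is only a small sketch, not part of the Google scripts: the function name shared_homes is made up for illustration, and the sample entries are reconstructed from the table above (the shell of “charles” and the comment fields are assumptions).

```python
from collections import defaultdict


def shared_homes(passwd_text):
    """Return {home_dir: [users]} for every home used by more than one account."""
    homes = defaultdict(list)
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(":")
        if len(fields) < 7:
            continue  # not a well-formed passwd entry
        user, home = fields[0], fields[5]
        homes[home].append(user)
    return {home: users for home, users in homes.items() if len(users) > 1}


# Sample entries modelled on the table above; the shell and comment
# fields for "charles" are assumptions made for this example.
SAMPLE = """\
root:x:0:0:root:/root:/bin/bash
operator:x:11:0:operator:/root:/sbin/nologin
charles:x:500:500::/home/charles:/bin/bash
"""

if __name__ == "__main__":
    print(shared_homes(SAMPLE))
```

On a real VM the same function can be fed the content of /etc/passwd, for example shared_homes(open("/etc/passwd").read()).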
I would have liked to have more time to propose a patch, but I am running late on my project. As a workaround, I created a new CentOS image with /sbin as the home directory of the operator user:
usermod -m -d /sbin operator
and then everything works fine!
As a final solution, I would suggest either changing the script to check whether a home directory is shared and do nothing in that case, or changing the Red Hat image.
Update (Nov 2014): The issue has been fixed on the GCE side in the latest version of the images.
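The first option could look roughly like the sketch below. This is not the code of the Google daemon and not the fix that was eventually shipped; the function accounts_safe_to_clean and its arguments are hypothetical, and the “games” entry is only there as an example of an account with its own home directory.

```python
from collections import Counter


def accounts_safe_to_clean(accounts, console_users):
    """Pick the accounts whose authorized_keys file may be cleaned.

    accounts:      dict mapping user name -> home directory (from /etc/passwd)
    console_users: set of user names that have keys defined in the console

    An account is cleaned only if it is absent from the console AND no
    other account shares its home directory.
    """
    home_counts = Counter(accounts.values())
    return [
        user
        for user, home in accounts.items()
        if user not in console_users and home_counts[home] == 1
    ]


if __name__ == "__main__":
    # Illustrative data: root and operator share /root, as in the table above.
    accounts = {
        "root": "/root",
        "operator": "/root",
        "charles": "/home/charles",
        "games": "/usr/games",
    }
    print(accounts_safe_to_clean(accounts, {"root", "charles"}))
```

With this guard “operator” is skipped because /root is shared, so the keys written for “root” are left alone.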