Add known_hosts on startup via config.yml configuration


#1

tl;dr

  • add an option for appending host keys to ~/.ssh/known_hosts or /etc/ssh/ssh_known_hosts when the primary container starts
  • this lets job scripts (e.g. Capistrano in a Rails app) open SSH connections without the “Are you sure you want to continue connecting (yes/no)?” prompt

Syntax Suggestions

ssh_known_hosts: 
  - bastion.example.com
  - app-server.example.com

This is inspired by: https://docs.travis-ci.com/user/ssh-known-hosts/
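Until such an option exists, its effect can be approximated in a run step for hosts the primary container can reach directly. The helper below is a minimal sketch, not CircleCI's or Travis CI's implementation. append_new_lines is a hypothetical name. It appends known_hosts lines read from stdin to a file and skips lines that are already present, so re-running the step does not pile up duplicates.

```shell
# Hypothetical helper: append each known_hosts line read from stdin to FILE,
# skipping blank lines, comments, and lines FILE already contains.
# Usage: some-command-that-prints-host-keys | append_new_lines FILE
append_new_lines() {
  local file="$1" line
  mkdir -p "$(dirname "$file")"
  touch "$file"
  # "|| [ -n "$line" ]" also handles a final line without a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      '' | '#'*) continue ;;
    esac
    # -x: whole-line match, -F: fixed string (key data is not a regex).
    grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
  done
}
```

For example, `ssh-keyscan bastion.example.com | append_new_lines ~/.ssh/known_hosts`. Keep in mind that ssh-keyscan records whatever key the network returns at scan time, so this only helps as much as that scan can be trusted.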

Background

We have a Ruby on Rails web app. In the deploy: section of .circleci/config.yml, we would like to deploy to multiple app servers (both staging and production) via Capistrano v3, which requires SSH connections.

Here is the server structure:

the Primary Container on Circle CI
  \==> ssh to the bastion server
        \==> ssh to the app server A (prod)
        \==> ssh to the app server B (prod)
        \==> ssh to the app server (staging)
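For a layout like this, an OpenSSH client on the primary container can be told to reach the app servers through the bastion with ProxyJump (available since OpenSSH 7.3). The sketch below is only an illustration: write_jump_config and the app-a/app-b hostnames are hypothetical, and it says nothing about how Capistrano itself is configured.

```shell
# Hypothetical helper: append an ssh_config stanza that routes HOSTs through BASTION.
# Usage: write_jump_config FILE BASTION HOST...
write_jump_config() {
  local file="$1" bastion="$2"
  shift 2
  mkdir -p "$(dirname "$file")"
  # A single "Host" line may list several patterns separated by spaces.
  printf 'Host %s\n  ProxyJump %s\n' "$*" "$bastion" >> "$file"
  # ssh requires ~/.ssh/config not to be writable by other users.
  chmod 600 "$file"
}
```

For example, after `write_jump_config ~/.ssh/config bastion.example.com app-a.example.com app-b.example.com`, running `ssh app-a.example.com` hops through the bastion. Each hop still checks its host key against known_hosts, which is why the keys for the app servers matter.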

In Circle CI 1.0, there was a global configuration setting StrictHostKeyChecking no, which we implicitly relied on:

# while debugging the Circle CI container over SSH
ubuntu@box573:~/.ssh$ cat /etc/ssh/ssh_config
Host *
  StrictHostKeyChecking no # here
  HashKnownHosts no
  SendEnv LANG LC_*

In Circle CI 2.0, this setting depends on which image we choose for the primary container. We could set StrictHostKeyChecking no manually, but that is not ideal because of the security risks.

Therefore, I think it would be great to have an ssh_known_hosts: option that adds known_hosts entries on startup.

Q&A

Q. Why not StrictHostKeyChecking no?

Because it leaves the connection open to man-in-the-middle attacks, which in this case could be mounted via DNS spoofing or IP-address spoofing.

Q. How about ssh-keyscan {{ hostname }} >> ~/.ssh/known_hosts?

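This often solves the problem, but ssh-keyscan does not suffice for remote servers behind a proxy server (a.k.a. a bastion server): it connects to each host directly and cannot hop through the bastion, so it cannot fetch keys for hosts that are reachable only from there.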
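One possible workaround, sketched here under stated assumptions, is to run ssh-keyscan on the bastion itself, since the bastion can reach the app servers directly. scan_behind_bastion is a hypothetical name. The sketch assumes the bastion's own host key is already trusted on the primary container and that ssh-keyscan is installed on the bastion. The keys are only as trustworthy as the bastion's network path, and the resulting lines are keyed by the hostnames passed in, so later connections should use the same names.

```shell
# Hypothetical helper: print known_hosts lines for HOSTs that only the bastion
# can reach, by running ssh-keyscan on the bastion.
# Usage: scan_behind_bastion BASTION HOST...
scan_behind_bastion() {
  local bastion="$1"
  shift
  # BatchMode=yes makes ssh fail instead of prompting if a key or credential is missing.
  ssh -o BatchMode=yes "$bastion" ssh-keyscan "$@"
}
```

For example: `scan_behind_bastion bastion.example.com app-server.example.com >> ~/.ssh/known_hosts`.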


#2

This is great feedback - thank you!

Just to clarify, you’d like to have the ability to specify SSH fingerprints in your config.yml and have it automatically append to ~/.ssh/known_hosts?

Also, while this feature is unsupported, does appending a file to ~/.ssh/known_hosts work? To have the fingerprints committed in a file within your repo?


#3

Yes, exactly!


Also, while this feature is unsupported, does appending a file to ~/.ssh/known_hosts work? To have the fingerprints committed in a file within your repo?

Well, yes and no. It may work in most cases, but not in ours.

There are 3 ways to produce entries in the known_hosts format *1:

  1. Copy and paste them from the known_hosts of another local machine that has already connected to the web app servers
  2. Use the ssh-keyscan command to fetch each server's public host keys dynamically
  3. Write a workaround script that connects to every app server via ssh when the primary container starts

As for case 1, our web apps’ IP addresses can change dynamically, so maintaining such a copied known_hosts file would be painful and over-engineered.

As for case 2, ssh-keyscan can fetch the key for the bastion server, but not for the app servers behind it.

Finally, case 3, well… this solution is acceptable for us this time. However, it would be best if config.yml supported the feature.
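Workaround 3 could look roughly like the sketch below, assuming OpenSSH 7.6 or later on the primary container (for StrictHostKeyChecking=accept-new) and that the bastion's host key is already in known_hosts. warm_known_hosts is a hypothetical name. Note the trade-off: on a fresh container every app server is new, so accept-new trusts whatever key is presented on that first connection. It removes the prompt and still rejects keys that change later in the job, but it is trust on first use, not verification.

```shell
# Hypothetical helper for workaround 3: make one no-op connection to each HOST
# through BASTION so that ssh records each host key in ~/.ssh/known_hosts.
# Usage: warm_known_hosts BASTION HOST...
warm_known_hosts() {
  local bastion="$1" host
  shift
  for host in "$@"; do
    # BatchMode=yes: fail instead of prompting.
    # accept-new: store keys for unknown hosts, refuse keys that changed.
    ssh -o BatchMode=yes -o StrictHostKeyChecking=accept-new -J "$bastion" "$host" true || return 1
  done
}
```

The function stops at the first host that fails, so a misconfigured server breaks the build early instead of surfacing later inside Capistrano.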


For now, we temporarily set StrictHostKeyChecking no with a TODO comment, and we are working on separating deploy jobs from CI (because of the vulnerability mentioned above).


*1: FYI, for those interested, the known_hosts file format is described in man sshd (section SSH_KNOWN_HOSTS FILE FORMAT).
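As a rough illustration of that format: sshd(8) describes each plain line as an optional marker, comma-separated hostnames (non-default ports written as [host]:port), the key type, the base64-encoded key, and an optional comment. The hypothetical helper below prints the key types recorded for a host in an unhashed known_hosts file. It deliberately ignores hashed entries (|1|...), wildcard patterns, and @cert-authority/@revoked lines; for hashed files, `ssh-keygen -F HOST -f FILE` can do the lookup instead.

```shell
# Hypothetical helper: print the key type of each plain (unhashed) entry for
# HOST (and optional PORT, default 22) in a known_hosts FILE.
# Usage: known_host_key_types FILE HOST [PORT]
known_host_key_types() {
  local file="$1" host="$2" port="${3:-22}" name
  # Non-default ports are stored as [host]:port.
  if [ "$port" = 22 ]; then name="$host"; else name="[$host]:$port"; fi
  awk -v name="$name" '
    NF == 0 || $1 ~ /^#/ { next }   # blank lines and comments
    $1 ~ /^@/ { next }              # this sketch skips marker lines
    {
      n = split($1, hosts, ",")     # field 1: comma-separated hostnames
      for (i = 1; i <= n; i++)
        if (hosts[i] == name) { print $2; break }   # field 2: key type
    }' "$file"
}
```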


#4

Thank you so much for the detailed explanation. I’ve opened a feature request internally for you.


#6

Thank you very much for opening a feature request. Really looking forward to it.


#7

Great proposal!

We had to do similar workarounds for two scenarios:

  1. npm install pointing to private git repos. For added security, we created a custom DNS name that points to github.com so that we can attach a specific SSH key to it (e.g. reponame.git.our_company.com)
  2. running git fetch -n right before deploying master to make sure the build currently running has the latest code.

@kenju, were you able to test these settings locally? We set up a couple of environment variables and SSH keys through the UI, but I don’t know how to pass that information to circleci build.

The build specifically says:

====>> Installing additional ssh keys
There are no configured ssh keys to install

#8

No, I did not test it locally.
I get the same message for add_ssh_keys when it is run locally:

====>> Installing additional ssh keys
There are no configured ssh keys to install

I do not think you can retrieve environment variables’ values in a local build. Currently the circleci command does not support add_ssh_keys locally.


#9

Is this being considered for implementation, or already being implemented?
Is there any way we can track the progress either way?


#10

My workaround was to explicitly add the fingerprint. For some reason using ssh-keyscan wasn’t working :\

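# ... lots of stuff ...

  # 'THE FINGERPRINT' stands for a full known_hosts line (host, key type, base64 key),
  # not just a SHA256 fingerprint
  - run: echo 'THE FINGERPRINT' >> ~/.ssh/known_hosts
  - run: ssh user@host -p port 'some command'

# ... final stuff ...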

Works for me! I tried putting the fingerprint in an environment variable instead, but CircleCI kept reporting a failure. I would rather use env vars, so I’ll move as much configuration over to env vars on CircleCI as I can.
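Reading the entry from an environment variable can work as long as the variable holds a full known_hosts line and the expansion is quoted. A minimal sketch, with add_known_host_entry as a hypothetical name; it does not diagnose the failure described above, it only guards against an empty value and against word splitting.

```shell
# Hypothetical helper: append one known_hosts entry (e.g. the value of a
# project environment variable) to FILE, default ~/.ssh/known_hosts.
# Usage: add_known_host_entry "$SOME_VARIABLE" [FILE]
add_known_host_entry() {
  local entry="$1" file="${2:-$HOME/.ssh/known_hosts}"
  if [ -z "$entry" ]; then
    echo "add_known_host_entry: empty entry (is the variable set?)" >&2
    return 1
  fi
  mkdir -p "$(dirname "$file")"
  # Quoted expansion keeps the line intact; printf adds the trailing newline.
  printf '%s\n' "$entry" >> "$file"
}
```

With a hypothetical project variable KNOWN_HOSTS_ENTRY, a run step that defines the function could call `add_known_host_entry "$KNOWN_HOSTS_ENTRY"`; the equivalent one-liner is `printf '%s\n' "$KNOWN_HOSTS_ENTRY" >> ~/.ssh/known_hosts`.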


#11

Why an environment variable?


#12

Putting configuration in code (i.e. config.yml) means the code has to change whenever the configuration does. Loading configuration (SSH keys, user, host, port, etc.) from another source (env vars) makes those values trivial to change (edit the variable in the CircleCI UI) and reduces the reasons to change the code, which makes it more maintainable. That’s why environment variables should be preferred.

It’s basically applying the Twelve-Factor App methodology to these YAML files. Even though they are called configs, they’re actually code: they are a set of instructions that a machine interprets and executes.
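Applied to the workaround in #10, the user, host, and port could come from environment variables instead of config.yml. A minimal sketch; deploy_ssh, DEPLOY_USER, DEPLOY_HOST, and DEPLOY_PORT are hypothetical names, and the host's key still has to be present in known_hosts.

```shell
# Hypothetical helper: run a command on the deploy target, taking connection
# details from environment variables so they can be changed in the CircleCI
# project settings without editing config.yml.
# Usage: deploy_ssh COMMAND...
deploy_ssh() {
  # ${VAR:?message} makes a non-interactive shell exit with an error when VAR
  # is unset or empty; DEPLOY_PORT falls back to 22.
  ssh -p "${DEPLOY_PORT:-22}" "${DEPLOY_USER:?DEPLOY_USER is not set}@${DEPLOY_HOST:?DEPLOY_HOST is not set}" "$@"
}
```

For example, `deploy_ssh 'some command'`. If DEPLOY_USER or DEPLOY_HOST is missing, the step fails with a clear message instead of trying to connect to a half-empty address.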


#13