The Host-it Anywhere GitLab Deployment

I run a GitLab server.

Sounds easy, right? All you need to do is run the GitLab omnibus installer on your target system of choice and be on your merry way. Not so fast. What if you’re like me and you’re:

  • running your own server at home in a VM
  • stuck with an ISP whose port filtering you don’t control
  • likely to need to move your GitLab installation to a new network or server at a moment’s notice?

The easy solution isn’t enough. You need to do more.

What these limitations really mean is that I cannot directly expose my GitLab server to the Internet. But, if I cannot just open a port on my router and direct my subdomain to the GitLab server, how can I make the service publicly available? In my specific case, I’m working with external developers, so I want the server to be available not only through a VPN connection to my home network, but also through a standard HTTPS connection.

The high-level answer is: a relay server.

Relays are fairly common in peer-to-peer applications. They’re well suited to cases where a computer sits on a network without the ability to forward ports, so nothing can connect to it directly. BitTorrent Sync, for example, relies on relays whenever two peers cannot negotiate a direct connection to each other; this is usually due to a firewall between the two machines blocking the incoming ports they’re trying to use.

Going back to the GitLab server, I need something that supports relaying an HTTP connection from a machine that is not public facing to another machine that can readily accept connections from the Internet. The solution: SSH tunnels.

SSH tunneling (specifically, remote port forwarding) works by having the client open an outbound SSH connection to a server and asking the server to listen on a port. Anything that connects to that port on the server is forwarded back through the tunnel to a port on the client. At that point, you can simply access the resource at localhost:someport on the server. If you needed to expose something simple, say a file syncing program or remote administration tool, your work would be done here.
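
As a rough sketch of what such a tunnel looks like with OpenSSH, the script below builds the command that would run on the machine behind the firewall. The relay host name, the tunnel user, and both port numbers are placeholders made up for illustration (it assumes the web service listens on port 80 locally); substitute your own. It only prints the command, so you can review it before running anything.

```shell
#!/bin/sh
# All four values are hypothetical placeholders, not taken from a real setup.
RELAY_USER="tunnel"
RELAY_HOST="relay.example.com"
REMOTE_PORT=12345   # port the relay will listen on (loopback only by default)
LOCAL_PORT=80       # port the web service listens on locally (assumed)

# -N  do not run a remote command, just forward ports
# -R  ask the relay to listen on REMOTE_PORT and send connections back to
#     localhost:LOCAL_PORT on this machine
TUNNEL_CMD="ssh -N -R ${REMOTE_PORT}:localhost:${LOCAL_PORT} ${RELAY_USER}@${RELAY_HOST}"

# Print the command instead of running it, so it can be reviewed first.
echo "$TUNNEL_CMD"
```

With a tunnel like this up, a request to http://localhost:12345/ made on the relay should reach the service running at home.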

But what if you want that resource to be reachable by anything that can connect to the SSH server? One option is to make the forwarded port globally available (that is, make the SSH server listen for incoming connections on all interfaces, not only the local loopback). This is a simple solution – all it involves is changing the SSH server’s configuration file to allow these types of connections – but it comes with a few major disadvantages:

  • you’re limited to ports 1024 and above, because regular (non-root) users cannot bind to ports below 1024. The only way around this is for your SSH client to log in to the server as root, and since root is the most common target of automated brute-force logins, allowing remote root logins is a bad idea.
  • you can only run one service on any given port. Only one service can have the standard ports 80 and 443, so if you have three different web services to relay, users of the other two must connect over non-standard ports. This is rarely good practice.
  • you probably don’t want to expose all of your SSH tunnels to the world. Yes, your firewall can block ports that you don’t want just anybody to connect to, but why take the risk? Relying on a firewall alone creates a single point of failure, and making SSH tunnels accessible on all interfaces violates the Principle of Least Privilege.

Again, this is a solvable problem. But before I share the answer, I need to take a quick detour to talk about the general unreliability of the SSH protocol.

Actually, the SSH protocol is well-implemented and reliable. The problem is that any network interruption tends to terminate an SSH connection. This is especially true if you ever plan on changing the network that your SSH client (i.e. your GitLab server) is running on without fully restarting the server or manually restarting the SSH client. You’re also bound to face issues if your ISP suffers from frequent outages.

AutoSSH to the rescue. AutoSSH is basically a watchdog for an SSH connection. If the connection breaks, AutoSSH automatically attempts to reconnect. It does a great job of recovering from a network disruption and can recover minutes (or even hours) after a failure.
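
Here is a minimal sketch of how a reverse tunnel might be wrapped in autossh. The relay address and port numbers are hypothetical placeholders, and the keepalive values are just reasonable starting points, not tuned recommendations. As written, the script only prints the command for review.

```shell
#!/bin/sh
# Hypothetical placeholders; replace with your own relay and ports.
RELAY="tunnel@relay.example.com"
FORWARD="12345:localhost:80"

# ServerAliveInterval=30    ssh probes the server every 30 seconds
# ServerAliveCountMax=3     ssh gives up after 3 unanswered probes
# ExitOnForwardFailure=yes  ssh exits if the remote port cannot be bound
SSH_OPTS="-o ServerAliveInterval=30 -o ServerAliveCountMax=3 -o ExitOnForwardFailure=yes"

# -M 0 turns off autossh's own monitoring port, so autossh simply restarts
# ssh whenever ssh exits (which the keepalive options above make happen
# promptly when the link dies).
AUTOSSH_CMD="autossh -M 0 -N ${SSH_OPTS} -R ${FORWARD} ${RELAY}"

# Print the command instead of running it, so it can be reviewed first.
echo "$AUTOSSH_CMD"
```

In practice you would probably run the printed command from a service manager or at boot, using key-based authentication so that reconnects need no password.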

Now, returning to the original discussion: how can we make our HTTP service accessible to the Internet in a way that isn’t prone to the aforementioned issues? One solution: a reverse HTTP proxy, like Apache’s mod_proxy. Nginx can do the same job and works great, but it is not the focus of this post.

mod_proxy allows you to use all of the power of a bona fide HTTP server, including VirtualHosts, which let you host multiple services on one relay server. So, you end up with the ability to control access to a resource based on what domain is used to reach the server. This makes it simple to point two different domains at two different backend servers, while both remain accessible via the same IP on your relay server.

To make this work, you essentially set up proxy rules that forward the connection to the forwarded port, so you might have rules that look like:

ProxyPass / "http://localhost:12345/service/"
ProxyPassReverse / "http://localhost:12345/service/"

These rules transparently forward the connection to the server that’s inaccessible from the Internet. Your users won’t even be able to tell where the server is actually running; they’ll only know the IP of the relay server. (Just don’t rely on this for anonymity; anybody with the ability to observe your relay server’s connections will be able to tell the origin of these SSH connections.)
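
To make the VirtualHost idea concrete, here is a sketch of a script that writes a configuration with two name-based virtual hosts, each proxying to a different forwarded port. The domain names, the second port (12346), and the output file name are all made up for illustration, and it assumes mod_proxy and mod_proxy_http are enabled on the relay. It writes to a local file rather than into Apache’s configuration directory, so nothing changes until you copy it into place yourself.

```shell
#!/bin/sh
# Hypothetical domains, ports, and output path; adjust to your own setup.
VHOST_FILE="${VHOST_FILE:-./relay-vhosts.conf}"

# The quoted 'EOF' keeps the shell from expanding anything inside the config.
cat > "$VHOST_FILE" <<'EOF'
# Requires mod_proxy and mod_proxy_http to be enabled.
<VirtualHost *:80>
    ServerName gitlab.example.com
    # Pass the original Host header through to the backend
    ProxyPreserveHost On
    ProxyPass        / "http://localhost:12345/"
    ProxyPassReverse / "http://localhost:12345/"
</VirtualHost>

<VirtualHost *:80>
    ServerName jira.example.com
    ProxyPreserveHost On
    ProxyPass        / "http://localhost:12346/"
    ProxyPassReverse / "http://localhost:12346/"
</VirtualHost>
EOF

echo "Wrote $VHOST_FILE"
```

Each VirtualHost matches on the requested host name, so both services can share ports 80 and 443 on the relay while the forwarded ports themselves stay bound to the loopback interface.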

Before I wrap up, you may have noticed that although I mention GitLab, this approach can work with any HTTP-based web service. You may run into some minor issues with specific configuration settings, especially those related to the URLs used in the application. The good news is that I’ve been able to get this to work with GitLab, Apache, Jira, and Confluence. I’m sure it works with other applications as well – just make sure you change the base URL that your application uses to the one that is publicly accessible.

If you plan on using SSL/TLS to encrypt the connection between GitLab and your users, make sure you install the SSL certificate on both your relay server and your GitLab server. I found this to be true with Confluence and Jira as well.

One last thing: GitLab also allows users to access repositories via an SSH connection. While it’s easy to proxy HTTP content, you’re out of luck with SSH – you have to find another way to access that service. I don’t consider this a critical downside, because you always have the ability to access GitLab repositories via HTTPS, but it can be a source of frustration for some users.

Why Open Standards Are Essential to Your Software Project

Many software projects, both open-source and proprietary, like to rehash solutions to the same old problems that have been solved 20 times before. Case in point: XACML authorization libraries. A cursory Google search shows that there are at least a half dozen of these in existence; all are written in Java and all do pretty much the same things, with some minor exceptions. Some are open-source.

I’ve recently started working on the challenge of creating a large, scalable SaaS solution that has a lot of moving parts. It must do user authentication. And authorization. And have a web interface. And have an SDN layer. And several other components that are specific to its own feature set. Sounds like a lot of work, and it is.

So what’s the best way of tackling these challenges?

Option #1 — Do-it-yourself

This is always an option. Fortunately, each of these components has fairly well documented and implemented solutions that are publicly available. From a purely academic standpoint, there might be a lot to gain (in terms of knowledge and skill) in building your own implementations of these components.

There are several caveats to doing this, however:

  • Time and money. This can be unforgiving, especially in a startup, where time and money are especially limited (don’t think time is limited? Think again.)
  • Expertise. Can a smart person learn how to do pretty much whatever they want? Sure. But are they going to get it right the first time? Probably not. Even if they do, could they be doing something better instead? Most likely.
  • Messing it up. This might be the most dangerous part of doing it yourself. Chances are you’ll get it wrong the first time. The second time. Even the third time. Ever hear someone tell you to “never roll your own security”? That rule applies here too.

Option #2 — Use Commercial (& Proprietary) Solutions

This can be a great option. You certainly avoid the issues surrounding investing human capital into solving the problem, which is a major caveat of doing it yourself.

You’ll also likely have a company to stand behind the application, which can mean either great, proactive support that really cares or a group of less-than-qualified support reps that know very little about the product and don’t really care about the customer’s experience. It’s usually somewhere between those two extremes.

Still, it’s not ideal:

  • Money. In a startup, it’s often hard to justify spending money on software, especially if it involves costly licenses.
  • Compatibility. Good commercial solutions are agnostic to their surrounding environment (languages, operating systems, APIs, and frameworks) or have extensive support. But, what if you run into issues and the company behind it doesn’t care enough to fix it? You might have been better off DIYing it.
  • Long-term Maintainability. What happens when the company behind it discontinues the codebase? What if they go out of business? Who’s going to keep it alive?
  • Trust. Where novice mistakes and incompetence plague DIY solutions, malice and carelessness plague commercial solutions. While most software is reputable, without seeing the code, there’s no way of truly knowing whether the code you’re running actually works correctly and honestly.

Option #3 — Use Open-Source (& Standards-based) Solutions

This is really the ideal approach to the problem. Commercial solutions typically give you something that works, because that’s how they get you to buy their solutions.

Open-source solves the issues with:

  • Cost. Often open-source software is free or has an upstream version that’s free. You can start with a gratis option and, when your requirements justify it, move to a paid option, which often adds support and prioritization of your needs (fixes, feature requests, updates, etc.).
  • Trust. With open-source, you don’t need to worry about trusting other people, because the code is completely transparent to you. No longer are you running someone’s secretive binary that may or may not be safe.
  • Indefinite maintainability. This is really an issue of whether you’re willing to keep it going in the event that it dies, but since you have the source code, you can continue to improve the application even if the original creator disappears.

[Note: In this article, my discussion is focused around rethinking how we create features, capabilities and components in a large software project, which is different from the libraries, APIs, frameworks, languages, and databases we use. These are tools. I’m referring to pre-packaged solutions, like JBoss Keycloak, which are standalone applications.]