Author Archives: William Roush

About William Roush

William Roush is currently employed as a Senior Software Developer and independent contractor in Chattanooga, Tennessee. He has more than 12 years of experience in the IT field, with a wide range of exposure including software development, deploying and maintaining virtual infrastructure, storage administration and Windows administration.

Solid State Drives Are More Robust Than Spinning Rust

Written by William Roush on March 20, 2014 at 7:37 pm

A numbers breakdown of why the claim that "SSDs are unreliable" is silly.

I’ve been hearing some silly assumptions that magnetic drives are more "reliable" than solid state drives (SSDs). I’ve even heard questions like "can I mirror my SSDs to regular magnetic disks?" Not only does that configuration completely defeat the purpose of having the SSDs (every disk in the mirror must flush its writes before additional writes can be serviced), but I’ll show you why, in this configuration, the traditional magnetic drives will fail first.

For the sake of being picky about numbers, I’m going to point out that a few of these are "back of a napkin" type calculations. Getting all the numbers I need from a single benchmark is difficult (most benchmarks report total bytes read/written, not operations served), and I don’t have months to throw a couple of SSDs at this right now.

A Very Liberal Lifetime Of A Traditional Magnetic Disk Drive

So we’re going to assume the most extreme possibilities for a magnetic disk drive: a high-performance enterprise-grade drive (15k RPM), running at 100% load 24/7/365 for 10 years. This is borderline insane, and the drive would likely be toast under that much workload long before then, but it helps illustrate my point. The high end of what these drives can put out is about 210 IOPS. So what we see, per day and then over the drive’s life, is this:

210 * 60 * 60 * 24 =     18,144,000
18,144,000 * 365   =  6,622,560,000

x 10               = 66,225,600,000

So even at the most insane levels of load, performance and longevity, we expect the disk to perform about 66 billion operations in its lifetime.

The Expected Lifetime Of A Solid State Drive

Now I’m going to do (for the most part) the opposite: I’m going to go with a consumer-grade triple-level cell (TLC) SSD. These drives have some of the shortest lifespans you can expect out of an SSD you can purchase off the shelf. Specifically, we’re going to look at a Samsung 250GB TLC drive, which wrote 707TB of data before its first failed sector, at over 2,900 writes per sector.

250GB drive

250,000,000,000 / 4096 = ~61,000,000 sectors.
x2900 writes/sector = 176,900,000,000 write operations.

Keep in mind: the newer Corsair Force 240GB MLC-E drives claim a whopping 30,000 cycles before failure, but I’m going to keep this to "I blindly chose a random consumer-grade drive to compete with an enterprise-level drive", and not even look at the SSDs aimed at longer lifespans, which include enterprise-level SLC flash memory that can handle over 100,000 cycles per cell!

So What Do You Mean More Robust?

The modern TLC drive from Samsung performed nearly three times the total work of the enterprise-level 15k SAS drive before dying. So if that is the case, why do people see SSDs as "unreliable"? The answer is simple: the Samsung drive will perform up to 61,000 write IOPS, whereas the magnetic disk will perform at best 210. It would take an array of roughly 290 magnetic disks, in a theoretically optimal performance configuration (no failover), to match the performance of this single SSD.
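The napkin math behind that 290-drive figure, using the numbers above:

61,000 / 210 ≈ 290 drives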

Because of this additional throughput, the SSD burns through its lifespan much faster.

So Should I Just Replace My HDDs With SSDs?

Whoa, slow down there, not quite. Magnetic storage still has a solid place everywhere from your home to your data center. The $/GB of magnetic storage is still far preferable to the $/GB of SSD storage. For home users this means the new hybrid (SSD/HDD) drives that have been showing up are an excellent choice; for enterprise systems you may want to look at storage platforms that let you use flash storage for read/write caching and data tiering.

PCI Compliant ScreenConnect Setup Using Nginx

Written by William Roush on February 19, 2014 at 9:26 pm

ScreenConnect’s Mono server fails PCI compliance scans from Qualys out of the box, for a list of reasons. We’re going to configure an Nginx proxy to make it compliant!

There are a few things we’ll want before configuring ScreenConnect: two public IP addresses (one for the ScreenConnect web interface, one for the relay server) and a third-party certificate from your favorite certificate provider. I’m also going to assume you’re administering this from Windows, so I’ll include some extra instructions; skip those if you know what you’re doing and just need to get to the Nginx configuration.

Get Your Certificate

mkdir /opt/certs
cd /opt/certs

# Generate your server's private key.
openssl genrsa -out screenconnect.example.com.key 2048

# Make a new request.
openssl req -new -key screenconnect.example.com.key -out screenconnect.example.com.csr

Go ahead and log into your server using WinSCP, copy your .csr file to your desktop, get a certificate (.crt) from your certificate authority, and upload it back to the server.
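Before wiring the certificate into Nginx, it’s worth a quick sanity check that the .crt your CA handed back actually matches the private key you generated. The file names below assume you kept the key name from the commands above and saved the certificate as screenconnect.example.com.crt:

# Both commands should print the same hash if the cert and key match.
openssl x509 -noout -modulus -in screenconnect.example.com.crt | openssl md5
openssl rsa -noout -modulus -in screenconnect.example.com.key | openssl md5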

Recommended ScreenConnect Configuration

In your ScreenConnect directory you have a "web.config" file. You’ll want to edit (or add, if not found) the following properties under the "appSettings" section of the configuration file.

<add key="WebServerListenUri" value="http://127.0.0.1:8040/" />
<add key="WebServerAddressableUri" value="https://screenconnect.example.com" />

We want the ScreenConnect web server to listen only on localhost, on a port we’ll use as the internal proxy target; I went ahead with the default port 8040. You’ll also need to set the addressable URI to the domain pointed at your first IP (it should match the domain on your certificate).

<add key="RelayListenUri" value="relay://[2nd IP]:443/" />
<add key="RelayAddressableUri" value="relay://screenconnectrelay.example.com:443/" />

Additionally, we’ll configure our relay server to listen on the second IP. We’ll set it to use port 443, which helps us punch through most firewalls, and we’ll set the addressable URI to a second domain name pointed at the IP address we specified.

Nginx Configuration

# Defining our ScreenConnect server.
upstream screenconnect {
  server 127.0.0.1:8040;
}

server {
  # Bindings
  listen [1st IP]:80;
  server_name screenconnect.example.com;

  location / {
    # Redirect all non-SSL to SSL-only.
    rewrite ^ https://screenconnect.example.com/ permanent;
  }
}

server {
  # Bindings
  listen [1st IP]:443 default_server ssl;
  server_name screenconnect.example.com;

  # Certificate information
  ssl_certificate /etc/ssl/certs/private/screenconnect.example.com.crt;
  ssl_certificate_key /etc/ssl/certs/private/screenconnect.example.com.key;

  # Limit ciphers to PCI DSS compliant ciphers.
  ssl_ciphers RC4:HIGH:!aNULL:!MD5:!kEDH;
  ssl_protocols TLSv1 TLSv1.1 TLSv1.2;
  ssl_prefer_server_ciphers on;

  location / {
    # Redirect to local screenconnect
    proxy_pass http://screenconnect;
    proxy_redirect off;
    proxy_buffering off;

    # We're going to set some proxy headers.
    proxy_set_header        Host            $host;
    proxy_set_header        X-Real-IP       $remote_addr;
    proxy_set_header        X-Forwarded-For $proxy_add_x_forwarded_for;

    # If we get these errors, we want to move to the next upstream.
    proxy_next_upstream error timeout invalid_header http_500 http_502 http_503 http_504;

    # If there are errors we're going to intercept them.
    proxy_intercept_errors  on;

    # If there are any 400/500 errors, we'll redirect to the root page instead of exposing the Mono error page.
    error_page 401 402 403 404 405 500 501 502 503 504 /;
  }
}

I’ve run a server with a similar setup through a Qualys PCI compliance scan (which the ScreenConnect server failed horribly prior to the changes), and it passed with flying colors.

Additionally, remember to lock down iptables so you’re only open where you absolutely need to be: mainly 80 and 443 on your primary IP and 443 on your secondary IP. Add SSH into the mix if you use it to remotely connect to your servers (but make it accessible only from inside your company network!). A sketch of what that might look like is below.
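Here’s a minimal sketch of those rules, assuming a stock iptables setup where you’re appending to the INPUT chain; [1st IP] and [2nd IP] are the same placeholders used in the Nginx configuration above, and 10.0.0.0/24 is a hypothetical stand-in for your internal company network:

# Keep loopback and already-established connections working.
iptables -A INPUT -i lo -j ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

# Web interface: HTTP (redirects to HTTPS) and HTTPS on the primary IP.
iptables -A INPUT -d [1st IP] -p tcp --dport 80 -j ACCEPT
iptables -A INPUT -d [1st IP] -p tcp --dport 443 -j ACCEPT

# Relay server on the secondary IP.
iptables -A INPUT -d [2nd IP] -p tcp --dport 443 -j ACCEPT

# SSH only from the internal network (placeholder range).
iptables -A INPUT -s 10.0.0.0/24 -p tcp --dport 22 -j ACCEPT

# Drop everything else by default.
iptables -P INPUT DROP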

1/19/2015 Update: No more SSLv3 due to POODLE.
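If you want to verify that yourself, and your openssl build still ships SSLv3 client support, the handshake below should now fail against the proxy:

# Expect a handshake failure now that only TLS 1.0-1.2 is enabled.
openssl s_client -connect screenconnect.example.com:443 -ssl3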

Statically Compiled LINQ Queries Are Broken In .NET 4.0

Written by William Roush on January 19, 2014 at 5:31 pm

Diving into how a minor change in error handling in .NET 4.0 broke using compiled LINQ queries the way the MSDN documentation shows.

Query was compiled for a different mapping source than the one associated with the specified DataContext.

When working on high-performance LINQ code, this error can cause massive headaches. This StackOverflow post blames the problem on using multiple LINQ mappings (and the same mapping used from different DataContexts counts as "different mappings"). In the example below, we’re going to use the same mapping but different DataContext instances, which is extremely common with short-lived DataContexts (and reusing DataContexts comes with a long list of problematic side effects).

namespace ConsoleApplication1
{
    using System;
    using System.Data.Linq;
    using System.Linq;

    class Program
    {
        protected static Func<MyContext, Guid, IQueryable<Post>> Query =
            CompiledQuery.Compile<MyContext, Guid, IQueryable<Post>>(
                (dc, id) =>
                    dc.Posts
                        .Where(p => p.AuthorID == id)
            );

        static void Main(string[] args)
        {
            Guid id = new Guid("340d5914-9d5c-485b-bb8b-9fb97d42be95");
            Guid id2 = new Guid("2453b616-739f-458f-b2e5-54ec7d028785");

            using (var dc = new MyContext("Database.sdf"))
            {
                Console.WriteLine("{0} = {1}", id, Query(dc, id).Count());
            }

            using (var dc = new MyContext("Database.sdf"))
            {
                Console.WriteLine("{0} = {1}", id2, Query(dc, id2).Count());
            }

            Console.WriteLine("Done");
            Console.ReadKey();
        }
    }
}

This example follows MSDN’s examples, yet I’ve seen people recommending you do this to resolve the changes in .NET 4.0:

protected static Func<MyContext, Guid, IQueryable<Post>> Query
{
    get
    {
        return
            CompiledQuery.Compile<MyContext, Guid, IQueryable<Post>>(
                 (dc, id) =>
                    dc.Posts
                        .Where(p => p.AuthorID == id)
            );
    }
}

Wait a second! Aren’t I recompiling on every get? I’ve seen claims that it doesn’t recompile. However, peeking at the IL doesn’t hint at any caching; the process is as follows:

  • Check if the query is assignable from ITable; if so, let the lambda function compile it.
  • Create a new CompiledQuery object (which just stores the lambda expression in a field called "query").
  • Compile the query using the provider specified by the DataContext (always arg0).

At no point is there a cache check. The only place a cache could live is in the provider (and SqlProvider doesn’t have one), and it would be a complete maintenance mess if it were done that way.

Using a test application (code is available at https://bitbucket.org/StrangeWill/blog-csharp-static-compiled-linq-errors/; use the db.sql file to generate the database, and please use a local installation of MSSQL Server for the best speed possible so we’re really measuring query compilation time), we’re going to force CompiledQuery.Compile to be invoked on every iteration (10,000 by default) by passing in delegates instead of the resulting compiled query.

QueryCompiled Average: 0.5639ms
QueryCompiledGet Average: 1.709ms
Individual Queries Average: 2.1312ms
QueryCompiled Different Context (.NET 3.5 only) Average: 0.6051ms
QueryCompiledGet Different Context Average: 1.7518ms
Individual Queries Different Context Average: 2.0723ms

We’re no longer seeing the roughly one-quarter runtime you get with a properly cached compiled query. The primary problem lies in this block of code found in CompiledQuery:

if (context.Mapping.MappingSource != this.mappingSource)
{
    throw Error.QueryWasCompiledForDifferentMappingSource();
}

This is where CompiledQuery checks and enforces that you’re using the same mapping source. The problem is that System.Data.Linq.Mapping.AttributeMappingSource doesn’t provide an Equals override (and the check uses a raw != anyway), so it’s just comparing whether the two references point at the same object instance, as opposed to whether they’re equal.

There are a few fixes for this:

  • Use the getter method, and understand that the performance benefits will mainly show up where the result of the property is cached and reused with the same context.
  • Implement your own version of the CompiledQuery class.
  • Reuse DataContexts (typically not recommended! You really shouldn’t…).
  • Stick with .NET 3.5 (ick).
  • Update: RyanF details sharing a MappingSource below in the comments. This is by far the best solution; a minimal sketch of that approach follows.
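Here’s a minimal sketch of that approach, assuming a MyContext (from the example above) whose constructors you control; the designer-generated LINQ to SQL contexts already do something very similar with their static mappingSource field. The point is that every MyContext instance hands the exact same MappingSource object to the base DataContext, so CompiledQuery’s reference check always passes:

namespace ConsoleApplication1
{
    using System.Data.Linq;
    using System.Data.Linq.Mapping;

    public partial class MyContext : DataContext
    {
        // One mapping source shared by every instance of this context, so
        // CompiledQuery sees the same MappingSource reference every time.
        private static readonly MappingSource SharedMappingSource =
            new AttributeMappingSource();

        public MyContext(string connection)
            : base(connection, SharedMappingSource)
        {
        }
    }
}

With the mapping source shared, the static Query field from the first example works across short-lived contexts without throwing the exception.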

You May Pay Even If You Do Everything Right (CryptoLocker)

Written by William Roush on January 13, 2014 at 7:14 pm

Many people in the IT field are depending on various products to protect them from CryptoLocker and similar malware, but how realistic is that really?

Seth Hall over at tunnl.in wrote an article detailing how many parts of your system must fail in order for CryptoLocker to infect your network. The major problem I have with the article is that this level of trust in your systems, and the idea that "if you have these bases covered, you’re OK", is exactly how a lot of companies got bit by the CryptoLocker ransomware.

You’ll need an email server willing to send you infected executable attachments.

This assumes that CryptoLocker is going to come in a form your email server will catch. One of the easiest ways to keep an email server from blocking a piece of malware attached to an email is to password protect it, which CryptoLocker has been known to do [1] [2] [3]. That leaves a handful of options for detecting the email: either have a signature for the encrypted zip file (which won’t work if unique passwords are being used per email), or attempt to decrypt every zip by searching the body of the email for the password (which I don’t think any mail filtering service does).

And that all depends on the assumption that you’re being hit by an already-detected derivative of CryptoLocker.

Your perimeter security solution will have to totally fail to spot the incoming threat.

Here Seth is talking about firewall-based anti-malware scanning. Again, this runs into all of the same problems as relying on your email server to protect you.

Your desktop security solution will have to totally fail.

This is the one everyone relies on: your desktop antivirus catching the malware, and this is by far what bit almost everyone infected by CryptoLocker. In my previous post about CryptoLocker I talk about how it wasn’t until 2013-11-11 that antivirus products were catching it. With PowerLocker on the horizon, these assumptions are dangerous.

Your user education program will have to be proven completely ineffective.

Now this is one of the most important parts of security, and by far one of the things that irks me most in IT. I’ll go into this more in a business-oriented post, but it comes down to this: what happens when I let someone without an access card into the building? Human Resources would have my head, and I could very well lose my job (and rightfully so!). So why do IT’s policies get such lackluster enforcement at most places?

In general, IT policies and training are fairly weak. Users often forget (in my opinion, because there is no risk in not committing them to memory), and training initiatives are rarely taken seriously. People who "don’t get computers" are often put into positions where they’ll be on one for 8 hours a day (I’m not talking about IT-level proficiency, I’m talking about "don’t open that attachment").

I feel this is mostly due to the infancy of IT in many workplaces, and it will change as damages continue to climb.

Your perimeter security solution will have to totally fail, a second time.

It really depends on how your perimeter security is set up. Some companies block large swaths of the internet to reduce the noise from countries they do not do business with and from which they only see break-in attempts. That is pretty much the only circumstance in which your perimeter security will stop this problem.

Your intrusion prevention system […] will have to somehow miss the virus loudly and constantly calling out to Russia or China or wherever the bad guys are.

This is a dangerous assumption. CryptoLocker only communicates with a command and control server to fetch a public key to encrypt your files with. I’d be thoroughly impressed by a system that can catch a few kilobytes of encrypted data being requested from a foreign server without constantly triggering false alerts on normal internet use.

Your backup solution will have to totally fail.

In my opinion, this is the only item on the list that is realistically "100% your responsibility with a nearly 100% chance of success". Backups with multiple copies, stored cold and off-site, have nearly no chance of being damaged, lost or tampered with. Tested backups have nearly no chance of failing. Malware can’t touch what it can’t physically access, and this will always be your ace in the hole.

In Conclusion

Don’t take this post the wrong way! The list that Seth gives is a great list of security infrastructure, procedures and policies that should be in place. However, I think it reads as if you won’t get infected as long as you follow his list, and that is not entirely accurate.