To Lump or to Split?

I used to be a splitter.  I have recently begun the process of converting to a lumper.  What are a lumper and a splitter?  Wikipedia defines lumping and splitting in software modeling as follows:

A lumper is always keen to generalize, and produces models with a small number of broadly defined objects. A splitter is reluctant to generalize, and produces models with a large number of narrowly defined objects. For example, according to the lumpers, a subcontractor could be basically the same as any other supplier, and is therefore the same class; meanwhile the splitters would probably argue that there are significant differences between different groups of suppliers, justifying separate classes in the model

Merriam-Webster defines lumper as a noun meaning:

1 : a laborer who handles freight or cargo

2 : one who classifies organisms into large often variable taxonomic groups based on major characters

and splitter as a noun meaning:

1 : one that splits

2 : one who classifies organisms into numerous named groups based on relatively minor variations or characters

I believe each group accuses the other, more or less equally, of being ridiculous.  I have found over my short existence as a software developer that I tend to complicate issues, meaning that I am a splitter.  I had a manager once who told me there are two types of software developers in the world – those that over-simplify and those that over-complicate.  And he proceeded to tell me that I fell into the over-complicate category.  That was four years ago.  I have seen myself time and time again end up in the over-complicate camp with my software design solutions and with my approaches to issues in my personal life, including something as simple as cooking!  About a year ago I made the connection between the over-complication/over-simplification concepts and lumping/splitting.

Lumping is to over-simplifying as splitting is to over-complicating, and both camps have their downsides.  Over-lumping results in architectures that are too generic to scale well or adapt easily to future requirements.  Over-splitting results in far too many discrete functional parts making up a whole.  While this may allow for ultimate flexibility, pluggability, and whateverability, there are two rather significant consequences:

  1. things become complicated and difficult to maintain
  2. things slow down with lots of different processes running (if you have split across assemblies that is)

I now see the wisdom in Einstein’s view that

Everything should be made as simple as possible, but not simpler

or, reworded in lumping/splitting language:

Lumping should be used whenever possible, but not more than necessary

I now take the approach, in situations where I have influence over architecture and system design, of keeping things lumped together until there is good reason to split them out.  There is a well-understood concept in the software development world called separation of concerns – splitting pieces of a system out into separate layers to allow the system to change over time with less pain.  If you do any amount of reading on design patterns you will have heard about the three-tiered approach containing presentation, business logic and data layers.  Lumping and splitting decisions come into play here when you need to decide on the structure of certain objects within any of these layers.  Splitting within a layer should always require a good reason.
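To make the supplier example from the Wikipedia quote concrete, here is a minimal sketch of the two styles side by side.  Every type and member name in it (Supplier, SupplierKind, Vendor, MaterialVendor, SubcontractorVendor, InsuranceExpires) is hypothetical and invented for illustration; the insurance rule is simply an assumed "good reason" to split.

```csharp
using System;

// Lumped: one broadly defined class covers every kind of supplier.
// The kind is just data, so adding a new kind costs one enum value.
public enum SupplierKind { General, Subcontractor }

public class Supplier
{
    public string Name { get; set; }
    public SupplierKind Kind { get; set; }
}

// Split: separate narrowly defined classes, justified here only because
// the (assumed) subcontractor rule needs data other vendors do not have.
public abstract class Vendor
{
    public string Name { get; set; }
    public abstract bool CanBeAssignedWork(DateTime on);
}

public class MaterialVendor : Vendor
{
    // No extra rule: a material vendor can always be assigned work.
    public override bool CanBeAssignedWork(DateTime on) { return true; }
}

public class SubcontractorVendor : Vendor
{
    public DateTime InsuranceExpires { get; set; }

    // A subcontractor may only work while its insurance is still valid.
    public override bool CanBeAssignedWork(DateTime on)
    {
        return on <= InsuranceExpires;
    }
}
```

Until a rule like the insurance check actually shows up in the requirements, the lumped Supplier class is the cheaper design to maintain.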

So what are you: a lumper, a splitter, or a mix of the two?  To some degree I believe we all have tendencies to do both depending on any number of factors in our current situation – deadlines, mood (have you had your caffeine yet this morning?), team environment, mentors available, influences, etc.

Visual Studio Solution Configurations and Assembly References

Visual Studio 2008/2010 by default sets up two solution configurations – debug and release.  The difference between these two configurations is that the debug configuration contains the full symbolic debug information and no optimization, while the release configuration contains no debug information and full optimization.  You can read more about that over at MSDN.


The issue I run into with this is with setting references to other assemblies.  In my workplace we have 50+ projects that make up our architecture.  Those projects do not live under one solution.  When setting a reference from one project to another you have these options (again, more about this over at MSDN):

  • .NET: .NET framework components
  • COM: COM components
  • Projects: projects local to the current solution
  • Browse: browse the file system for a component
  • Recent: recently added components


When working with a project in the same solution I used to always use the Projects tab in the Add Reference dialog because it automatically switches between the debug and release folders that are configured for each project during its creation.  This is actually a really nice feature…  at least until you need to set a reference to a project outside of your current solution.  In this case you must use the Browse tab and point directly to an assembly.  The issue I have with this is that if that assembly was compiled in debug mode then you must point to the assembly in the debug folder, and if in release mode then in the release folder.  If another developer working in the project that produces the assembly you are referencing switches to the configuration that compiles to the folder you did not reference, then you are no longer referencing the latest version of the assembly.  And come build time – kabam!  There is a possibility you broke the build!  The solution I see to this problem is to have all assemblies compiled directly into the /bin folder rather than into separate /debug and /release folders – you can set this in the project properties.


I’m sure Microsoft has good reasons for the separate folders, but I do not understand what they are.  What are your experiences in your organization?  Do you follow this same approach or do you use the debug/release folders?

 

Update 10/23/2010:

I posted a question about this over at Stack Overflow to see what the experts would say, and my idea was rejected flat out by four different responders.  The main theme of the responses was that it is better to manage compilation to different folders than to one folder because, among other reasons, a clean + build would be needed for every compile to ensure debug and release versions of the assemblies did not get mixed.

The piece of advice from the Stack Overflow post that is now in my toolbag is to always reference release versions of the assemblies and, if debugging is needed, to bring the project into the solution you are working on.

PowerShell and Source Control Metadata

I often find myself merging code from one path in a Visual SourceSafe (VSS) source control repository to another path.  If I am not careful in this merge process the .scc and .vspscc files will copy over to the path I am merging to.  This can become quite problematic if I do not catch it before checking files out, applying changes and checking them in.  Imagine for a minute you have two paths in your source control repository:

  • Path Main
  • Path Isolated

You want to merge code from Path Isolated back into Path Main.  Currently both paths are mapped to a particular directory on your local machine (i.e. a working directory has been set), so when a file is checked out VSS updates the local .scc file and marks the source code file in the repository as being checked out.  When said file is checked in VSS will, quite expectedly, update the local .scc file and also the source code file in the repository.  VSS relies on the .scc file to determine in which path (Main or Isolated) the update is to be applied.  If this .scc file is copied from Path Isolated to Path Main then guess what happens when the file is checked out, changed, and checked in?  That’s right, it updates the source code file in Path Isolated and NOT in Path Main as you intended!  This can be a huge problem because you think that you are checking code into Path Main but in actuality you are still checking into Path Isolated.

Now, this may not seem like a big deal if you are merging one or two projects with a shallow folder tree, but if you have multiple projects and some of those projects have deeply nested folder trees then it can become quite tedious taking care not to copy the .scc files from Path Isolated into Path Main.  So I consulted Google, and found (on a number of sites) a PowerShell command to recursively delete files with certain extensions.  It accepts a comma-separated list of file names or patterns, so it allows you to eliminate both the .scc and .vspscc files in the same command.

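    # -Force also picks up (and removes) hidden or read-only files, which source control metadata files can be
    Get-ChildItem * -Include *.scc, *.vspscc -Recurse -Force | Remove-Item -Force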

In the event that you are working with Subversion as a source control provider then you need to be able to recursively delete the _svn (or .svn) directories (instead of files) and to do this replace the *.extension parameter(s) with the folder name(s).

    Get-ChildItem -Include _svn, .svn -Recurse -Force | Remove-Item -Force -Recurse

And that’s it!  Now you have the power of the shell (pun intended) at your fingertips to recursively delete those pesky source control metadata files.
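If you would rather run the same cleanup from a .NET utility (a pre-merge step in a build tool, say), the sketch below does roughly what the first PowerShell command does and adds a dry-run switch so you can see what would be removed first.  This is not from the scripts I found; the SccCleaner class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SccCleaner
{
    // Extensions of the VSS metadata files mentioned above.
    static readonly string[] Extensions = { ".scc", ".vspscc" };

    // Returns every file under root (recursively) with one of the extensions.
    public static List<string> FindMetadataFiles(string root)
    {
        return Directory
            .EnumerateFiles(root, "*", SearchOption.AllDirectories)
            .Where(path => Extensions.Contains(
                Path.GetExtension(path), StringComparer.OrdinalIgnoreCase))
            .ToList();
    }

    // Deletes the files found, or only lists them when dryRun is true.
    // Returns the number of matching files.
    public static int DeleteMetadataFiles(string root, bool dryRun)
    {
        List<string> files = FindMetadataFiles(root);
        foreach (string file in files)
        {
            if (dryRun)
            {
                Console.WriteLine("Would delete " + file);
            }
            else
            {
                // Clear hidden/read-only attributes so the delete cannot fail on them.
                File.SetAttributes(file, FileAttributes.Normal);
                File.Delete(file);
            }
        }
        return files.Count;
    }
}
```

Calling SccCleaner.DeleteMetadataFiles(path, true) first and reading the list is a cheap safeguard before running it for real.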

Dropbox nails it!

I recently installed Dropbox to solve a problem of synchronizing a password file between machines.  I use KeePass on at least two computers.  I sometimes add passwords to KeePass on one machine, and sometimes on the other – it just depends where I am at the moment which is usually either at my laptop on the couch (or some other corner of the world if I’m travelling) or my wife’s computer at the desk in the family room.  I was quickly growing tired of manually reconciling differences in the password files.  In comes the aforementioned Dropbox, which is a simple file sharing program that creates a folder on your local drive (wherever you choose) and synchronizes any files in that folder to any other machine on which you have installed the Dropbox client using the same account.  The real genius in this is the automation of it.  It seemed as if I just barely thought about using it and it worked!

According to the features page on the Dropbox site, Secure Sockets Layer (SSL) is used to secure the transmission of your files, and 256-bit AES encryption is used on their servers where your files are stored.  Since I am storing a strongly encrypted password file I am not overly concerned about someone hacking into my data, but if I were to store, say, personal financial data or confidential business documents on Dropbox I would do additional research to ensure the safety of my data.  Dropbox runs as a Windows service (for the Windows version) and consumes a total of ~36MB of RAM, and it installed fine on my Windows 7 x64 machines.


Dropbox also has versions available for Mac, Linux and various mobile devices.  KeePass has a Linux port available, KeePassX, which I used for a number of years, but I have since switched back to the Windows version as I no longer have a need for password management on a Linux system.  I found the idea for this on Joel Spolsky’s blog.

DeltaCopy

I was looking for a backup tool recently with the following requirements:

  • onsite backup (including over LAN)
  • free/open source
  • scheduling
  • differential backup

I solicited feedback from coworkers and received these recommendations:

  • Dropbox
  • Windows 7 Backup
  • SyncToy
  • Clonezilla
  • 2BrightSparks
  • SmartSync Pro

I looked at each of these recommendations, but ended up selecting DeltaCopy as it seemed to fit all of my requirements.  I really like that DeltaCopy is built on rsync, which is open source.  This means that in the future I can set up a Linux file server with rsync installed and have my client machines around the house back up to that server over the network.

So what exactly am I backing up?  At the moment just my wife’s desktop computer, which is a Windows 7 x64 build and has this drive configuration:

[Screenshot: drive configuration of the desktop PC]

As you can see, Disk 0 and Disk 2 are separate drives devoted to video and picture backup.  Of course this only covers onsite backup, but our pictures and videos are shared with relatives often enough that we do have offsite copies from that.  Otherwise it’s a time- and cost-intensive matter to back up 300+GB of stuff.

You can grab a copy at AboutMyX or at CNET Download.com.  AboutMyX has a great how-to guide including screenshots.  DeltaCopy is a bit different from what I was expecting as it is a server-based backup program.  So you set up virtual directories similar to how you do so for a web site in IIS.  Then when configuring the client portion you pick a virtual directory you have already set up.  Here are wiki articles on Rsync and DeltaCopy.

Returning

Returning to what?  Well, blogging of course!  It’s been two years and five months since my last post, wow!  My intention over two years ago when I started this blog was to post at least once or twice a month.  I clearly have not met that goal.  Several things have occupied my attention since that time almost three long years ago – to name a few:  our third child came along, we bought and moved into a new house, I was elected to co-chair a committee that develops healthcare standards (technically we profile existing standards), my full-time job has been requiring more and more of my time, etc., etc.

I am excited to be back – and why, you may ask, do I attempt to resurrect this blog?  Several reasons actually.  It provides:

1)  an outlet for my constant technical rambling

2) a record of problems I have encountered and their solutions

3) broadening of my perspective via reader feedback

4) opportunity to improve on my current communication skills – both verbal and written

One other note is that I have moved from Blogger to WordPress.  I like that WordPress is open source and that, if I choose to host on my own server one day, I can easily do so.  Besides that I am a big fan of open source software – at some point in my future I would like to contribute to or start an open source project.

New Job, Same Direction

I started working for Greenway Medical Technologies last Monday. I will be doing the same type of work that I did for digiChart, just much more of it. I am excited about the opportunities that abound at my new company. I will be working out of my home most of the time, traveling to Carrollton, GA once a month to have some face-to-face time with the team. I will also remain on the IHE PCC Committee as one of the authors of the Antepartum Summary profile. I feel privileged to be working on this profile; it is a healthy exercise not only for me but also for my company, as it will help to keep us in the loop in the interoperability world.

How to remove whitespace from xml serialized from a custom object

Recently I came across a problem with whitespace in XML that I was serializing from a custom entity class. The situation is this – I create my custom object, apply the XmlSerializer to it, generate the XML and put it in a memory stream without a hitch. I then convert to a string and save to a database. While verifying the data being saved I found that about 30% of the XML was whitespace, which can become quite considerable when you take into account hundreds of thousands of transactions. My first thought was to use good old regular expressions, but I was concerned about unintentionally removing whitespace that I may want to keep – such as inside an element or attribute. What I finally came up with was to load the XML string into an XmlDocument whose PreserveWhitespace property is false. See below for a simplified example.

Create and populate a custom object:

    Car c = new Car();
    c.Make = "Jeep";
    c.Model = "Wrangler";
    c.Year = "1981";

Use the XmlSerializer class to serialize the object as xml to a System.IO.MemoryStream:

    System.IO.MemoryStream ms = new System.IO.MemoryStream();
    System.Xml.Serialization.XmlSerializer xs = new System.Xml.Serialization.XmlSerializer(c.GetType());
    xs.Serialize(ms, c);


Convert to string for storage in database or other use:


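    // The serializer writes UTF-8 to a stream, so decode as UTF-8;
    // decoding as ASCII would turn every non-ASCII character into '?'.
    string str = System.Text.Encoding.UTF8.GetString(ms.ToArray());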

You will now have the following XML in your string variable (the XML declaration and the namespace attributes on the root element are left out here for brevity). So this is nice, right?

    <Car>
      <Make>Jeep</Make>
      <Model>Wrangler</Model>
      <Year>1981</Year>
    </Car>

Yes and no. The XmlSerializer class is very useful in that it is easy to implement, but it indents its output by default. This is good for presentation, but bad for data storage and transmission. The easiest way I have found to strip the whitespace is to do the following:

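Create an XmlDocument, make sure its PreserveWhitespace property is false, and then load the string into it. The property has to be set before the load to affect how the document is read; false is also the default, so the assignment mainly documents the intent:

    System.Xml.XmlDocument xmlDoc = new System.Xml.XmlDocument();
    xmlDoc.PreserveWhitespace = false;
    xmlDoc.LoadXml(str);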

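Now the .OuterXml property of the XmlDocument will have this (XML declaration and namespace attributes omitted):

    <Car><Make>Jeep</Make><Model>Wrangler</Model><Year>1981</Year></Car>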

All done! Xml without the whitespace!
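The steps above can be collected into one small helper. As a sketch of an alternative that is not part of my original solution, the whitespace can also be avoided at serialization time by handing the serializer an XmlWriter whose settings turn indentation off, so there is nothing to strip afterwards. The Car class here is a stand-in for the one used above, and CompactXml and its method names are hypothetical.

```csharp
using System;
using System.IO;
using System.Text;
using System.Xml;
using System.Xml.Serialization;

// Stand-in for the custom entity class used in this post.
public class Car
{
    public string Make { get; set; }
    public string Model { get; set; }
    public string Year { get; set; }
}

public static class CompactXml
{
    // Alternative: serialize without indentation in the first place.
    public static string Serialize(object value)
    {
        XmlWriterSettings settings = new XmlWriterSettings();
        settings.Indent = false;                     // no line breaks or indentation
        settings.Encoding = new UTF8Encoding(false); // UTF-8 without a byte order mark

        using (MemoryStream ms = new MemoryStream())
        {
            using (XmlWriter writer = XmlWriter.Create(ms, settings))
            {
                new XmlSerializer(value.GetType()).Serialize(writer, value);
            }
            return Encoding.UTF8.GetString(ms.ToArray());
        }
    }

    // The approach from this post: reload already-indented XML and let
    // XmlDocument drop the insignificant whitespace.
    public static string StripWhitespace(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.PreserveWhitespace = false; // must be set before LoadXml
        doc.LoadXml(xml);
        return doc.OuterXml;
    }
}
```

StripWhitespace is still the handier of the two when the indented XML comes from somewhere you do not control; Serialize saves the extra parse when you own the serialization step.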

Linux Finally!

I have tried to install Linux twice in my life before now. The first attempt was with Slackware about 5 years ago – let’s just say that didn’t go so well. The second was about 4 months ago on a computer I was fixing for family – it was Ubuntu, and although it installed successfully I quickly gave up because I really wasn’t looking to spend all the configuration time necessary for a Linux newbie.

A few days ago I decided to install Ubuntu on my current desktop as a dual boot alongside Vista. My goal is to run everything (outside of my professional computer life as a .NET developer) from Linux. Although it hasn’t been without its bumps, it has gone relatively well. I have it up and running on dual monitors, listening to music, browsing the web, watching videos, etc. And it’s sooooo much faster than Vista. I still have not committed to going fully to Linux as that will take much more work.

My favorite part of this project is that I am really having fun exploring this technology that is new to me. I have always known Microsoft OS’s, from way back in the DOS days, and it’s healthy to see other perspectives of how an OS can be run effectively. What has been challenging is the lack of idiot proofing Ubuntu provides. Windows has always excelled at this and tries to prevent you from irreversibly damaging your install (or at least gives you that impression). I reinstalled Ubuntu at least 3 times after hosing the video drivers – but then maybe that is just because I did not know how to unhose them, and it was much quicker to reinstall at that time.

Some Ubuntu tasks I have ahead of me include:
- install Wine so I can run certain Windows apps (Quicken comes to mind first)
- set up Picasa for Linux
- research the new package that allows read/write access to NTFS

IHE Connectathon

Last week I attended an event called the IHE Connectathon. IHE stands for Integrating the Healthcare Enterprise. The Connectathon is an event that is put on by IHE in different countries at various times during the year. I attended the IHE North America Connectathon in Chicago. My purpose for attending was to implement some of the available IHE profiles in my company’s EMR application. The goal of our implementation was to be able to exchange XML documents across many different EMR systems developed on different platforms with different languages. Anyone in the software development industry can appreciate the enormous complexity of this feat.

For several months leading up to the event my colleague and I prepared vigorously, working long hours. We constructed a database model I was proud of, and the code was not too shabby either. So we packed up and headed off to snowy Chicago ready for the Connectathon. We were quite surprised when Monday morning rolled around and all 300 attendees were there and started buzzing around like insects when the start bell rang. We sat in quiet confusion as we tried to digest the testing process that was happening all around us. By the end of the first day we had figured out how to accomplish what we were supposed to accomplish. Mind you, we knew that we had to pass certain tests, but knowing what our task was and knowing how to accomplish it were two different things that first day.

As the week progressed we fine-tuned our system and dropped a few profiles (which meant fewer tests to pass), and by the end of the week we had our system functioning properly and had passed twenty tests in total. Some companies passed more, some fewer – all in all it was a successful week and we have had the opportunity to really lay the groundwork for interoperability in my company’s application. I discovered that Connectathon really wasn’t about passing the tests as much as it was about learning how to become interoperable. In fact, “gold stars” or certification-type awards were given out the first few years, but heavy marketing of these types of merits quickly negated the intention of the Connectathon. It created too much competition between the participating vendors, which had a negative effect on the cooperation of those normally competitive companies. You see, at Connectathon, it pays to work with your competitors – if they win then you win and vice versa. While there is a large sense of respect for each company’s privacy, there definitely is a focus on working toward win-win situations. I spoke with one infrastructure vendor (one who supports repositories of patient data) who said they do not require anyone working with them to implement and pass the Connectathon tests – they simply have to be able to complete the tasks necessary for the types of transactions they wish to implement.

Where these tests will come into play is when government-backed health initiatives push certifications forward using the IHE profiles. This is coming – it’s a fact! The tests may be in a different form – administered differently, or what have you – but some of the same interoperability functionality we programmed for the Connectathon will be required in the future to have a successful EMR application.

Whatever the future holds for me and health care interoperability, I am excited to be a part of what is happening. I think it is revolutionary and I think that it will affect all our lives sooner than we realize.
