CSV file parser and writer in C# (Part 1)

An issue that comes up quite frequently is how to read and write comma seperated value (CSV) files in C#. Surprisingly the .NET libraries have no built-in support for this, and the usual solution to use an OleDb connection to the CSV file with Microsoft Excel's database driver is convoluted and not cross-platform.

CSV files have a very simple structure (source):

  • Each record is one line (with exceptions)
  • Fields are separated with commas
  • Leading and trailing space-characters adjacent to comma field separators are ignored
  • Fields with embedded commas must be delimited with double-quote characters
  • Fields that contain double quote characters must be surounded by double-quotes, and the embedded double-quotes must each be represented by a pair of consecutive double quotes.
  • A field that contains embedded line-breaks must be surounded by double-quotes
  • Fields with leading or trailing spaces must be delimited with double-quote characters
  • Fields may always be delimited with double quotes
  • The first record in a CSV file may be a header record containing column (field) names

In this article, I'll provide simple, but fully functional code to read and write a CSV file according to these rules. In memory, data will be represented as a DataTable, which makes it easy to process; for storage in the file system or transfer over a network the CSV data will be stored as a String or in a Stream.


First, the easy part: writing a DataTable to a CSV file.

public class CsvWriter
{
        public static string WriteToString(DataTable table, bool header, bool quoteall)
        {
                StringWriter writer = new StringWriter();
                WriteToStream(writer, table, header, quoteall);
                return writer.ToString();
        }

        public static void WriteToStream(TextWriter stream, DataTable table, bool header, bool quoteall)
        {
                if (header)
                {
                        for (int i = 0; i < table.Columns.Count; i++)
                        {
                                WriteItem(stream, table.Columns[i].Caption, quoteall);
                                if (i < table.Columns.Count1)
                                        stream.Write(',');
                                else
                                        stream.Write('\n');
                        }
                }
                foreach (DataRow row in table.Rows)
                {
                        for (int i = 0; i < table.Columns.Count; i++)
                        {
                                WriteItem(stream, row[i], quoteall);
                                if (i < table.Columns.Count1)
                                        stream.Write(',');
                                else
                                        stream.Write('\n');
                        }
                }
        }

        private static void WriteItem(TextWriter stream, object item, bool quoteall)
        {
                if (item == null)
                        return;
                string s = item.ToString();
                if (quoteall || s.IndexOfAny("\",\x0A\x0D".ToCharArray()) > –1)
                        stream.Write("\"" + s.Replace("\"", "\"\"") + "\"");
                else
                        stream.Write(s);
        }
}

The methods are static since the whole conversion is done in one method call, there is no need to create object instances etc.

WriteToString will return the CSV file in a string; it is just a wrapper around the more generic WriteToStream method. Both methods take a DataTable and two boolean flags to indicate if you want to write a header line (which would use the column headers of the DataTable) and if you want to quote all values instead of only values that need to be quoted.

Since CSV files do not work well for binary data, your DataTable should not contain any, although the resulting file would still be valid and could be read back in.

WriteToStream just loops through all rows and columns of the DataTable and writes the individual data items to the output stream.

The WriteItem method finally encodes an individual data item and, if necessary or requested, adds quotes around it.

In upcoming parts of this article series, I'll provide and explain code to read a CSV file back into a DataTable, and how to use NUnit to test everything.