windows:aboutexcelcsvformat
Differences
This shows you the differences between two versions of the page.
Next revision | Previous revision | ||
windows:aboutexcelcsvformat [2014/10/29 08:59] – created rlunaro | windows:aboutexcelcsvformat [2022/12/02 21:02] (current) – external edit 127.0.0.1 | ||
---|---|---|---|
Line 7: | Line 7: | ||
===== The CSV format ===== | ===== The CSV format ===== | ||
- | The [[http:// | + | The [[http:// |
+ | What I've find out researching into the way Excel stores this information and making some tries is this: | ||
+ | |||
+ | The general format is this: | ||
+ | |||
+ | < | ||
+ | value11< | ||
+ | value21< | ||
+ | </ | ||
+ | |||
+ | As ''< | ||
+ | |||
+ | If the character ''< | ||
+ | | ||
+ | < | ||
+ | " | ||
+ | value21; | ||
+ | </ | ||
+ | | ||
+ | **How the carriage returns in the values are treated??? | ||
+ | | ||
+ | |||
+ | < | ||
+ | " | ||
+ | value21; | ||
+ | </ | ||
+ | |||
+ | **What happens with the lenght of the values???** The maximum length per record is 32,768 **bytes** (sounds familiar, isn't it?). Pay attention to the " | ||
+ | | ||
+ | | ||
+ | ===== The best way to encode a CSV format ===== | ||
+ | |||
+ | After many tries, I've discovered that: | ||
+ | |||
+ | * You can use Unicode as an enconding format for your CSV files | ||
+ | * If you use Unicode, the best way to encode is to use <tab> as a field separator: the rest of separators (semicolon, comma) are not recognized for Excel as a proper CSV file, and then will force the user to use the " | ||
+ | |||
+ | ===== A reference implementation in visual basic script ===== | ||
+ | |||
+ | Here is my reference implementation in visual basic script (provided the input comes from a recordset), which can be easily converted into other programming language like java. | ||
+ | |||
+ | As a safety precaution, I've put behind double quotes any value that contains a " | ||
+ | |||
+ | <code vbs> | ||
+ | ' | ||
+ | ' | ||
+ | ' csv.vbs - helper functions for writing csv files | ||
+ | ' | ||
+ | ' | ||
+ | |||
+ | function openCsv( byval filename ) | ||
+ | |||
+ | dim fso | ||
+ | dim textFile | ||
+ | dim overwrite | ||
+ | dim useUnicode | ||
+ | |||
+ | overwrite = true | ||
+ | useUnicode = true | ||
+ | set fso = createobject( " | ||
+ | set openCsv = fso.createTextFile( filename, overWrite, useUnicode ) | ||
+ | |||
+ | end function ' | ||
+ | |||
+ | |||
+ | function closeCsv( byval fileObject ) | ||
+ | |||
+ | fileObject.close | ||
+ | |||
+ | set fileObject = nothing | ||
+ | |||
+ | end function ' fileObject | ||
+ | |||
+ | function writeHeaderCsv( byval fileObject, byval recordset ) | ||
+ | |||
+ | dim nCont | ||
+ | |||
+ | nCont = 0 | ||
+ | do while nCont < recordset.fields.count | ||
+ | fileObject.write item(recordset.fields(nCont).name) & recordSep() | ||
+ | nCont = nCont + 1 | ||
+ | loop ' nCont | ||
+ | fileObject.write lineSep() | ||
+ | |||
+ | end function ' writeHeaderCsv | ||
+ | |||
+ | ' prints the current line of the | ||
+ | ' recorset as a cvs line | ||
+ | function writeLineCsv( byval fileObject, byval recordset ) | ||
+ | |||
+ | dim nCont | ||
+ | |||
+ | nCont = 0 | ||
+ | do while nCont < recordset.fields.count | ||
+ | fileObject.write item(recordset.fields(nCont).value) & recordSep() | ||
+ | nCont = nCont + 1 | ||
+ | loop ' nCont | ||
+ | fileObject.writeline | ||
+ | |||
+ | end function ' writeLine | ||
+ | |||
+ | function lineSep() | ||
+ | lineSep = vbLf | ||
+ | end function ' lineSetp | ||
+ | |||
+ | ' after trying several alternatives, | ||
+ | ' separator that makes the excel correctly interpret | ||
+ | ' the differents cells is vbTab, so I left as | ||
+ | ' the default separator | ||
+ | function recordSep() | ||
+ | recordSep = vbTab ' ";" | ||
+ | end function ' recordSep | ||
+ | |||
+ | |||
+ | ' escape of items: | ||
+ | ' value es;to must be set as " | ||
+ | ' value es" | ||
+ | function item( byval value ) | ||
+ | |||
+ | if isempty( value ) then | ||
+ | value = "" | ||
+ | end if ' isempty | ||
+ | if isnull( value ) then | ||
+ | value = "" | ||
+ | end if ' isnull | ||
+ | |||
+ | item = value | ||
+ | ' cut items longer than 32768 bytes long | ||
+ | ' as you can imagine " | ||
+ | ' to " | ||
+ | ' to a prudent 30767 char (2000 bytes less) | ||
+ | if len( item ) > 30767 then | ||
+ | item = mid( item, 1, 30767 ) & " | ||
+ | end if ' length( item ) | ||
+ | item = replace( item, """", | ||
+ | if instr( item, vbCr ) > 0 or instr( item, vbLf ) > 0 then | ||
+ | item = """" | ||
+ | end if ' instr( item | ||
+ | if instr( item, recordSep() ) > 0 then | ||
+ | item = """" | ||
+ | end if ' instr( item | ||
+ | end function ' item | ||
+ | |||
+ | |||
+ | |||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | ---- | ||
+ | |||
+ | ~~~DISQUS~~~ | ||
windows/aboutexcelcsvformat.1414573140.txt.gz · Last modified: 2022/12/02 21:02 (external edit)