¿Cómo puedo convertir un archivo .TXT en un archivo .CSV sin perder ninguno de los contenidos de ese documento?

0

He estado intentando durante unos 4 días completar la aparentemente simple actividad de convertir un volcado de texto del llavero OS X en un archivo .csv con el formato correcto sin progreso, y mucho menos éxito. Publiqué una pregunta diferente sobre mi situación ayer, que se ha ido sin comentarios o sugerencia.

Simplemente estoy intentando convertir un archivo .txt en un archivo .csv formateado, sin perder ninguno de los contenidos del documento original en el proceso.

Lo que ya he probado:

(1) RWest's Gist y Ben-Oni y christophervalles variaciones de los mismos. El ingrediente principal de este script es un archivo keychain.rb y creo que esto funcionaría si solo supiera cómo agregar campos de entrada para que coincidan con todos los campos del documento original. Sin embargo, no conozco la fórmula y no he podido encontrar una plantilla. Como tal, la ejecución de este script da como resultado cientos de entradas faltantes porque las entradas carecen del campo de salida correspondiente.

#!/usr/bin/env ruby                                                                                                                                                                                                          
#                                                                                                                                                                                                                            
# Usage:                                                                                                                                                                                                                     
#   security dump-keychain -d login.keychain > keychain_logins.txt                                                                                                                                                           
#   # Lots of clicking 'Always Allow', or just 'Allow', until it's done...                                                                                                                                                   
#   curl -O curl -O https://raw.github.com/gist/1224792/06fff24412311714ad6534ab700a7d603c0a56c9/keychain.rb
#   chmod a+x ./keychain.rb                                                                                                                                                                                                  
#   ./keychain.rb keychain_logins.txt | sort > logins.csv                                                                                                                                                                    
#                                                                                                                                                                                                                            
# Then import logins.csv in 1Password using the format:                                                                                                                                                                      
# Title, URL/Location, Username, Password                                                                                                                                                                                    
# Remember to check 'Fields are quoted', and the Delimiter character of 'Comma'.                                                                                                                                             
require 'date'

class KeychainEntry
  attr_accessor :fields

  def initialize(keychain)
    last_key = nil
    @fields = {}
    data = nil
    aggregate = nil
    lines = keychain.split("\n")
    lines.each do |line|
      # Everything after the 'data:' statement is data.

      if data != nil
        data << line
      elsif aggregate != nil
        if ( line[0] == 32 || line[0] == " " )
          keyvalue = line.split('=', 2).collect { |kv| kv.strip }
          aggregate[keyvalue.first] = keyvalue.last
        else
          @fields[last_key] = aggregate
          aggregate = nil
        end
      end

      if aggregate == nil
        parts = line.split(':').collect { |piece| piece.strip }
        if parts.length > 1
          @fields[parts.first] = parts.last
        else
          last_key = parts.first
          data = [] if parts.first == "data"
          aggregate = {}
        end
      end
    end
    @fields["data"] = data.join(" ") if data
  end
end

def q(string)
  "\"#{string}\""
end

def process_entry(entry_string)
  entry = KeychainEntry.new(entry_string)


  if entry.fields['class'] == '"genp"'
    title = entry.fields['attributes']['0x00000007 <blob>'].gsub!('"', '')
    user = entry.fields['attributes']['"acct"<blob>'].gsub!('"', '')
    location = entry.fields['attributes']['"svce"<blob>'].gsub!('"', '')
    pass = entry.fields['data'][1..-2]
    puts "\"#{title}\"\t\"#{location}\"\t\"#{user}\"\t\"#{pass}\""
  end

  if entry.fields['class'] == '"inet"' && ['"form"', '"dflt"'].include?(entry.fields['attributes']['"atyp"<blob>'])
    site = entry.fields['attributes']['"srvr"<blob>'].gsub!('"', '')
    path = entry.fields['attributes']['"path"<blob>'].gsub!('"', '')
    proto= entry.fields['attributes']['"ptcl"<uint32>'].gsub!('"', '')
    proto.gsub!('htps', 'https');
    user = entry.fields['attributes']['"acct"<blob>'].gsub!('"', '')
    #user = entry.fields['attributes']['0x00000007 <blob>'].gsub!('"', '')
    date_string = entry.fields['attributes']['"mdat"<timedate>'].gsub(/0x[^ ]+[ ]+/, '').gsub!('"', '')
    date = DateTime.parse(date_string)
    pass = entry.fields['data'][1..-2]
    path = '' if path == '<NULL>'
    url = "#{proto}://#{site}#{path}"

    puts "#{site}\t#{url}\t#{user}\t#{pass}\t#{date}"
    #puts "#{user}\t #{pass}\t #{date}"
  end
end

accum = ''
ARGF.each_line do |line|
  if line =~ /^keychain: /
    unless accum.empty?
      process_entry(accum)
      accum = ''
    end
  end
  accum += line
end

(2) RMondello's Gist , que funcionó para mí una vez, aunque faltaban entradas, pero No he podido volver a trabajar en absoluto sin una razón explicable.

set the logFile to ((path to desktop) as string) & "Passwords"
set keychainPath to "/Users/Dad/Desktop/dad.keychain"

-- write_to_file taken from http://www.macosxautomation.com/applescript/sbrt/sbrt-09.html
on write_to_file(this_data, target_file, append_data)
    try
        set the target_file to the target_file as string
        set the open_target_file to open for access file target_file with write permission
        if append_data is false then set eof of the open_target_file to 0
        write this_data to the open_target_file starting at eof
        close access the open_target_file
        return true
    on error
        try
            close access file target_file
        end try
        return false
    end try
end write_to_file

tell application "Usable Keychain Scripting"
    set keychainItems to get every keychain item of keychain keychainPath
    repeat with keychainItem in keychainItems
        set aServer to server in keychainItem
        set anAccount to account in keychainItem
        set aPassword to password in keychainItem

        set csvEntry to aServer & "," & anAccount & "," & aPassword & "
"

        my write_to_file(csvEntry, logFile, true)
    end repeat
end tell

(3) El convertidor de llavero de AgileBits OS X a CSV , lo que ha resultado en archivos .csv con más entradas faltantes de lo que Gist de RWest hizo.

# OS X Keychain text export converter
#
# Copyright 2014 Mike Cappella ([email protected])

package Converters::Keychain 1.02;

our @ISA    = qw(Exporter);
our @EXPORT     = qw(do_init do_import do_export);
our @EXPORT_OK  = qw();

use v5.14;
use utf8;
use strict;
use warnings;
#use diagnostics;

binmode STDOUT, ":utf8";
binmode STDERR, ":utf8";

use Encode;
use Utils::PIF;
use Utils::Utils qw(verbose debug bail pluralize myjoin unfold_and_chop print_record);
use Time::Local qw(timelocal);
use Time::Piece;

my $max_password_length = 50;

my %card_field_specs = (
    login =>            { textname => undef, fields => [
    [ 'username',       0, qr/^username$/ ],
    [ 'password',       0, qr/^password$/ ],
    [ 'url',        0, qr/^url$/ ],
    ]},
    note =>         { textname => undef, fields => [
    ]},
);

my (%entry, $itype);

# The following table drives transformations or actions for an entry's attributes, or the class or
# data section (all are collected into a single hash).  Each ruleset is evaluated in order, as are
# each of the rules within a set.  The key 'c' points to a code reference, which is passed the data
# value for the given type being tested.  It can transform the value in place, or simply test it and
# return a string (for debug output).  When the key 'action' is set to 'SKIP', the entry being tested
# will be rejected from consideration for export when the 'c' code reference returns a TRUE value.
# And in that case, the code ref pointed to by 'msg' will be run to produce debug output, used to
# indicate the reason for the rejection.
#
# The table facilitates adding new transformations and rejection rules, as necessary,
# through empirical discover based on user feedback.
my @rules = (
    CLASS => [
        { c => sub { $_[0] !~ /^inet|genp$/ }, action => 'SKIP', msg => sub { debug "\tskipping non-password class: ", $_[0] } },
    ],
    svce => [
        { c => sub { $_[0] =~ s/^0x([A-F\d]+)\s+".*"$/pack "H*", $1/ge } },
        { c => sub { $_[0] =~ s/^"(.*)"$/$1/ } },
        { c => sub { $_[0] =~ /^Apple Persistent State Encryption$/ or 
                 $_[0] =~ /^Preview Signature Privacy$/ or
                 $_[0] =~ /^Safari Session State Key$/ or
                 $_[0] =~ /^Call History User Data Key$/}, action => 'SKIP',
            msg => sub { debug "\t\tskipping non-password record: $entry{'CLASS'}: ", $_[0] } },
    ],
    srvr => [
        { c => sub { $_[0] =~ s/^"(.*)"$/$1/ } },
        { c => sub { $_[0] =~ s/\.((?:_afpovertcp|_smb)\._tcp\.)?local// } },
    ],
    path => [
        { c => sub { $_[0] =~ s/^"(.*)"$/$1/ } },
        { c => sub { $_[0] =~ s/^<NULL>$// } },
    ],
    ptcl => [
        { c => sub { $_[0] =~ s/htps/https/ } },
        { c => sub { $_[0] =~ s/^"(\S+)\s*"$/$1/ } },
    ],
    acct => [
        { c => sub { $_[0] =~ s/^0x([A-F\d]+)\s+".*"$/pack "H*", $1/ge } },
        { c => sub { $_[0] =~ s/^"(.*)"$/$1/ } },
    ],
    mdat => [
        { c => sub { $_[0] =~ s/^0x\S+\s+"(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})Z.+"$/$1-$2-$3 $4:$5:$6/g } },
    ],
    cdat => [
        { c => sub { $_[0] =~ s/^0x\S+\s+"(\d{4})(\d{2})(\d{2})(\d{2})(\d{2})(\d{2})Z.+"$/$1-$2-$3 $4:$5:$6/g } },
    ],
    # desc must come before DATA so that 'secure note' type can be used as a condition in DATA below
    desc => [
        { c => sub { $_[0] =~ s/^"(.*)"$/$1/; $itype = 'note' if $_[0] eq 'secure note'; $_[0] } },
    ],
    DATA => [
        # secure note data, early terminates rule list testing
        { c => sub { $itype eq 'note' and $_[0] =~ s/^.*<key>NOTE<\/key>\012\011<string>(.+?)<\/string>.*$/$1/ }, action => 'BREAK',
            msg => sub { debug "\t\tskipping non-password record: $entry{'CLASS'}: ", $entry{'svce'} // $entry{'srvr'} } },

        { c => sub { $_[0] !~ s/^"(.+)"$/$1/ }, action => 'SKIP',
            msg => sub { debug "\t\tskipping non-password record: $entry{'CLASS'}: ", $entry{'svce'} // $entry{'srvr'} } },
        { c => sub { $_[0] =~ /^[A-Z\d]{8}-[A-Z\d]{4}-[A-Z\d]{4}-[A-Z\d]{4}-[A-Z\d]{12}$/ }, action => 'SKIP',
            msg => sub { debug "\t\tskipping non-password record: $entry{'CLASS'}: ", $entry{'svce'} // $entry{'srvr'} } },
        { c => sub { length $_[0] > $max_password_length }, action => 'SKIP',
            msg => sub { debug "\t\tskipping record with improbably long password: $entry{'CLASS'}: ", $entry{'svce'} // $entry{'srvr'} } },
        { c => sub { join '', "\trecord: class = $entry{'CLASS'}: ", $entry{'svce'} // $entry{'srvr'} } },  # debug output only
    ],
);

$DB::single = 1;                    # triggers breakpoint when debugging

sub do_init {
    return {
    'specs'     => \%card_field_specs,
    'imptypes'      => undef,
    'opts'      => [ [ q{-m or --modified           # set item's last modified date },
                   'modified|m' ],
               ],
    };
}

sub do_import {
    my ($file, $imptypes) = @_;
    my (%Cards, %dup_check);
    my $contents = $_;;

    {
    local $/;
    open my $fh, '<:encoding(utf8)', $file or bail "Unable to open file: $file\n$!";
    $contents = <$fh>;
    close $fh;
    }

    my ($n, $examined, $skipped, $duplicates) = (1, 0, 0, 0);
    my ($npre_explode, $npost_explode);

KEYCHAIN_ENTRY:
    while ($contents) {
    if ($contents =~ s/\Akeychain: (.*?)\n+(?=$|^keychain: ")//ms) {
        local $_ = $1; my $orig = $1;
        $itype = 'login';

        $examined++;
        debug "Entry ", $examined;

        s/\A"(.*?)"\n^(.+)/$2/ms;
        my $keychain = $1;
        #debug 'Keychain: ', $keychain;

        s/\Aclass: "?(.*?)"? ?\n//ms;
        my $class = $1;

        # attributes
        s/\Aattributes:\n(.*?)(?=^data:)//ms;
        %entry = map { clean_attr_name(split /=/, $_) } split /\n\s*/, $1 =~ s/^\s+//r;

        $entry{'CLASS'} = $class;

        # data
        s/\Adata:\n(.+)\z//ms;
        $entry{'DATA'}  = defined $1 ? $1 : '';

        # run the rules in the rule set above
        # for each set of rules for an entry key...
RULE:
        for (my $i = 0;  $i < @rules; $i += 2) {
        my ($key, $ruleset) = ($rules[$i], $rules[$i + 1]);

        debug "  considering rules for ", $key;
        next if not exists $entry{$key};

        # run the entry key's rules...
        my $rulenum = 1;
        for my $rule (@$ruleset) {
            debug "\t    rule $rulenum: called with ", unfold_and_chop $entry{$key};

            my $ret = ($rule->{'c'})->($entry{$key});

            debug "\t    rule $rulenum: returns ", $ret || 0, '   ', unfold_and_chop $entry{$key};

            if (exists $rule->{'action'}) {
            if ($ret) {
                if ($rule->{'action'} eq 'SKIP') {
                $skipped++;
                ($rule->{'msg'})->($entry{$key})        if exists $rule->{'msg'};
                next KEYCHAIN_ENTRY;
                }
                elsif ($rule->{'action'} eq 'BREAK') {
                debug "\t    breaking out of rule chain";
                next RULE;
                }
            }
            }

            $rulenum++;
        }
        }

        for (keys %entry) {
        debug sprintf "\t    %-12s : %s", $_, $entry{$_}    if exists $entry{$_};
        }

        #my $itype = find_card_type(\%entry);

        my %h;
        my ($notes, $card_modified);
        if ($itype eq 'login') {
        $h{'password'}  = $entry{'DATA'};
        $h{'username'}  = $entry{'acct'}                        if exists $entry{'acct'};
        $h{'url'}   = $entry{'ptcl'} . '://' . $entry{'srvr'} . $entry{'path'}  if exists $entry{'srvr'};
        }
        elsif ($itype eq 'note') {
        # convert ascii string DATA, which contains \### octal escapes, into UTF-8
        my $octets = encode("ascii", $entry{'DATA'});
        $octets =~ s/\(\d{3})/"qq|\$1|"/eeg;
        $notes = decode("UTF-8", $octets);
        }
        else {
        die "Unexpected itype: $itype";
        }

        # will be added to notes
        $h{'protocol'}  = $entry{'ptcl'}                    if exists $entry{'ptcl'} and $entry{'ptcl'} =~ /^afp|smb$/;
        $h{'created'}   = $entry{'cdat'}                    if exists $entry{'cdat'};
        if (exists $entry{'mdat'}) {
        if ($main::opts{'modified'}) {
            $card_modified = date2epoch($entry{'mdat'});
        }
        else {
            $h{'modified'}  = $entry{'mdat'};
        }
        }

        for (keys %h) {
        debug sprintf "\t    %-12s : %s", $_, $h{$_}                if exists $h{$_};
        }

        # don't set/use $sv before $entry{''svce'} is removed of _afp*, _smb*, and .local, since it defeats dup detection
        my $sv = $entry{'svce'} // $entry{'srvr'};

        my $s = join ':::', 'sv', $sv,
        map { exists $h{$_} ? "$_ => $h{$_}" : 'URL => none' } qw/url username password/;

        if (exists $dup_check{$s}) {
        debug "  *skipping duplicate entry for ", $sv;
        $duplicates++;
        next
        }
        $dup_check{$s}++;


        # From the card input, place it in the converter-normal format.
        # The card input will have matched fields removed, leaving only unmatched input to be processed later.
        my $normalized = normalize_card_data($itype, \%h, 
        { title     => $sv,
          tags      => undef,
          notes     => $notes,
          folder    => undef,
          modified  => $card_modified });

        # Returns list of 1 or more card/type hashes; one input card may explode into multiple output cards
        my $cardlist = explode_normalized($itype, $normalized);

        my @k = keys %$cardlist;
        if (@k > 1) {
        $npre_explode++; $npost_explode += @k;
        debug "\tcard type $itype expanded into ", scalar @k, " cards of type @k"
        }
        for (@k) {
        print_record($cardlist->{$_});
        push @{$Cards{$_}}, $cardlist->{$_};
        }
        $n++;
    }
    else {
        bail "Keychain parse failed, after entry $examined; unexpected: ", substr $contents, 0, 2000;
    }
    }

    $n--;
    verbose "Examined $examined record", pluralize($examined);
    verbose "Skipped $skipped non-login record", pluralize($skipped);
    verbose "Skipped $duplicates duplicate record", pluralize($duplicates);

    verbose "Imported $n record", pluralize($n) ,
    $npre_explode ? " ($npre_explode card" . pluralize($npre_explode) .  " expanded to $npost_explode cards)" : "";
    return \%Cards;
}

sub do_export {
    create_pif_file(@_);
}

sub find_card_type {
    my $eref = shift;

    my $type = (exists $eref->{'desc'} and $eref->{'desc'} eq 'secure note') ? 'note' : 'login';
    debug "\t\ttype set to '$type'";
    return $type;
}

# Places card data into a normalized internal form.
#
# Basic card data passed as $norm_cards hash ref:
#    title
#    notes
#    tags
#    folder
#    modified
# Per-field data hash {
#    inkey  => imported field name
#    value  => field value after callback processing
#    valueorig  => original field value
#    outkey => exported field name
#    outtype    => field's output type (may be different than card's output type)
#    keep   => keep inkey:valueorig pair can be placed in notes
#    to_title   => append title with a value from the narmalized card
# }
sub normalize_card_data {
    my ($type, $carddata, $norm_cards) = @_;

    for my $def (@{$card_field_specs{$type}{'fields'}}) {
    my $h = {};
    for my $key (keys %$carddata) {
        if ($key =~ /$def->[2]/) {
        next if not defined $carddata->{$key} or $carddata->{$key} eq '';
        my ($inkey, $value) = ($key, $carddata->{$key});
        my $origvalue = $value;

        if (exists $def->[3] and exists $def->[3]{'func'}) {
            #         callback(value, outkey)
            my $ret = ($def->[3]{'func'})->($value, $def->[0]);
            $value = $ret   if defined $ret;
        }
        $h->{'inkey'}       = $inkey;
        $h->{'value'}       = $value;
        $h->{'valueorig'}   = $origvalue;
        $h->{'outkey'}      = $def->[0];
        $h->{'outtype'}     = $def->[3]{'type_out'} || $card_field_specs{$type}{'type_out'} || $type; 
        $h->{'keep'}        = $def->[3]{'keep'} // 0;
        $h->{'to_title'}    = ' - ' . $h->{$def->[3]{'to_title'}}   if $def->[3]{'to_title'};
        push @{$norm_cards->{'fields'}}, $h;
        delete $carddata->{$key};
        }
    }
    }

    # map remaining keys to notes
    $norm_cards->{'notes'} .= "\n"        if defined $norm_cards->{'notes'} and length $norm_cards->{'notes'} > 0 and keys %$carddata;
    for my $key (keys %$carddata) {
    $norm_cards->{'notes'} .= "\n"  if defined $norm_cards->{'notes'} and length $norm_cards->{'notes'} > 0;
    $norm_cards->{'notes'} .= join ': ', $key, $carddata->{$key};
    }

    return $norm_cards;
}

# sort logins as the last to check
sub by_test_order {
    return  1 if $a eq 'login';
    return -1 if $b eq 'login';
    $a cmp $b;
}

sub clean_attr_name {
    return ($_[0] =~ /"?([^<"]+)"?<\w+>$/, $_[1]);
}

# Date converters
# LastModificationTime field:    yyyy-mm-dd hh:mm:ss
sub parse_date_string {
    local $_ = $_[0];
    my $when = $_[1] || 0;                  # -1 = past only, 0 = assume this century, 1 = future only, 2 = 50-yr moving window

    if (my $t = Time::Piece->strptime($_, "%Y-%m-%d %H:%M:%S")) {
    return $t;
    }

    return undef;
}

sub date2epoch {
    my $t = parse_date_string @_;
    return defined $t->year ? 0 + timelocal($t->sec, $t->minute, $t->hour, $t->mday, $t->mon - 1, $t->year): $_[0];
}

1;

(4) Swiss File Knife parecía prometedor ya que tiene un comando incorporado para convertir el texto separado por tabulaciones en coma texto separado. Sin embargo, ejecutar el comando dio lugar a docenas y docenas y docenas de errores de "formato incorrecto". Supongo que mi archivo .txt debería ser un archivo .tsv para que el comando realmente funcione y no he podido encontrar ninguna forma de convertir un .txt en un .tsv.

Entonces, aquí es donde estoy ahora, días más tarde, en el mismo lugar donde empecé. ¿Puede alguien, por favor, decirme cómo modificar los campos de entrada del primer guión de RWest para que todo el contenido pase al .csv o sugerir un método alternativo, guión, aplicación, baile de lluvia, etc. que ¿Me permitirá simplemente que todos los datos del archivo de texto aparezcan en un archivo csv, con las columnas y filas adecuadas que separan cada entrada?

EDIT: esto es lo que obtienes cuando solo cambias la extensión del archivo:

    
pregunta Kerlix 30.06.2015 - 22:51

1 respuesta

2

Simplemente estoy tratando de convertir un archivo .txt a .csv, sin perder el contenido de ese documento

Puedes simplemente cambiar la extensión .txt en .csv .

Los archivos de 'valores separados por comas' son solo archivos de texto plano con las columnas de datos separadas por comas (o ; o tab ). Los archivos 'txt' están diseñados para texto y los archivos 'csv' para el almacenamiento de datos, pero ambos son solo archivos de texto.

Agregue: Los datos parecen organizados como un tipo de diccionario (o como una matriz multidimensional). Un diccionario no se puede convertir a una estructura de marco de datos 2D (como csv o excel ) ya que puede tener más dimensiones que 2. Lo que desea no es posible. Sin embargo, reemplazar csv con txt lo hace legible ...

Este texto es ilegible: @1?舋H????A?!A?̃?-ƅm??I?؅?uҸ?

    
respondido por el CousinCocaine 30.06.2015 - 23:00

Lea otras preguntas en las etiquetas