How to read the last line of a text file in Delphi



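I need to read the last line of some very large text files (to get the timestamp from the data). TStringList would be the simple approach, but it fails with an out-of-memory error. I'm trying to use Seek and BlockRead, but the characters in the buffer are all nonsense. Is this related to Unicode?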

   Function TForm1.ReadLastLine2(FileName: String): String;
   var
     FileHandle: File;
     s,line: string;
     ok: 0..1;
     Buf: array[1..8] of Char;
     k: longword;
     i,ReadCount: integer;
   begin
     AssignFile (FileHandle,FileName);
     Reset (FileHandle);           // or for binary files: Reset (FileHandle,1);
     ok := 0;
     k := FileSize (FileHandle);
     Seek (FileHandle, k-1);
     s := '';
     while ok<>1 do begin
       BlockRead (FileHandle, buf, SizeOf(Buf)-1, ReadCount);  //BlockRead ( var FileHandle : File; var Buffer; RecordCount : Integer {; var RecordsRead : Integer} ) ;
       if ord (buf[1]) <>13 then         //Arg to integer
         s := s + buf[1]
       else
         ok := ok + 1;
       k := k-1;
       seek (FileHandle,k);
     end;
     CloseFile (FileHandle);

     // Reverse the order in the line read
     setlength (line,length(s));
     for i:=1 to length(s) do
       line[length(s) - i+1 ] := s[i];
     Result := Line;
   end;

Based on www.delphipages.com/forum/showthread.php?t=102965

The test file is a simple CSV I created in Excel (this is not the 100 MB file I ultimately need to read).

   a,b,c,d,e,f,g,h,i,j,blank
   A,B,C,D,E,F,G,H,I,J,blank
   1,2,3,4,5,6,7,8,9,0,blank
   Mary,had,a,little,lamb,His,fleece,was,white,as,snow
   And,everywhere,that,Mary,went,The,lamb,was,sure,to,go



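You really have to read the file in LARGE chunks, from the tail towards the head. Since the file is too large to fit in memory, reading it line by line from start to end would be very slow. With ReadLn, twice as slow.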
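As a minimal sketch of the "read a chunk from the tail" idea, the following console program reads up to one page of raw bytes from the end of a file. The names TailChunkDemo and ReadTailBytes are hypothetical and not part of the original answer, and the 4 KB chunk size is just the same page size the answer's code uses below.

```pascal
program TailChunkDemo;
{$APPTYPE CONSOLE}

uses System.SysUtils, System.Classes, System.Math;

const
  PageSize = 4 * 1024; // chunk size; an assumption, tune as needed

// Hypothetical helper: returns up to MaxBytes raw bytes from the end of the file.
function ReadTailBytes(const FileName: string; MaxBytes: Integer): TBytes;
var
  FS: TFileStream;
  Count: Integer;
begin
  // fmShareDenyNone lets other processes keep writing to the file
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyNone);
  try
    Count := Integer(Min(FS.Size, Int64(MaxBytes)));
    SetLength(Result, Count);
    if Count > 0 then
    begin
      FS.Position := FS.Size - Count;   // jump straight to the tail
      FS.ReadBuffer(Result[0], Count);  // raw bytes, no text decoding yet
    end;
  finally
    FS.Free;
  end;
end;

begin
  if ParamCount < 1 then
    Writeln('usage: TailChunkDemo <file>')
  else
    Writeln('Read ', Length(ReadTailBytes(ParamStr(1), PageSize)),
      ' bytes from the end of the file');
end.
```

The bytes are returned undecoded on purpose: whether they are ANSI, UTF-8 or UTF-16 has to be decided before they are treated as characters.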

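You also have to be prepared for the last line to end with an EOL, or not.

Personally, I would also account for three possible EOL sequences:

CR/LF, aka #13#10 = ^M^J: DOS/Windows style

CR without LF, just #13 = ^M: Classic MacOS files

LF without CR, just #10 = ^J: UNIX style, including MacOS version 10

If you are sure your CSV files will only ever be generated by native Windows programs, it is safe to assume full CR/LF. But if there may be other producers (Java programs, non-Windows platforms, mobile programs), I would be less sure. Of course, a bare CR without LF would be the least likely case.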
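The EOL rules above can be sketched on an in-memory chunk before any file handling is involved. This is an illustration only: LastLineOf is a hypothetical helper, and it assumes a single-byte encoding and that the chunk already holds the tail of the file.

```pascal
program LastLineInChunk;
{$APPTYPE CONSOLE}

uses System.SysUtils;

// Hypothetical helper: extracts the last line from a chunk holding the file's tail.
// Accepts CR/LF, bare CR and bare LF, with or without a trailing EOL.
function LastLineOf(const Chunk: AnsiString): AnsiString;
var
  Start, Stop: Integer;
begin
  Stop := Length(Chunk);
  // Drop one trailing EOL if present: LF, CR, or CR followed by LF.
  if (Stop > 0) and (Chunk[Stop] = #10) then Dec(Stop);
  if (Stop > 0) and (Chunk[Stop] = #13) then Dec(Stop);
  // Walk back to the previous EOL character, or to the start of the chunk.
  Start := Stop;
  while (Start > 0) and not (Chunk[Start] in [#10, #13]) do
    Dec(Start);
  Result := Copy(Chunk, Start + 1, Stop - Start);
end;

begin
  Writeln(LastLineOf('a,b'#13#10'c,d'#13#10)); // CR/LF with trailing EOL
  Writeln(LastLineOf('a,b'#10'c,d'));          // LF, no trailing EOL
  Writeln(LastLineOf('a,b'#13'c,d'#13));       // bare CR with trailing EOL
end.
```

All three calls print c,d. If the loop reaches the start of the chunk without finding an EOL, the chunk does not yet contain a complete last line and a larger chunk has to be read, unless the chunk already covers the whole file.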


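uses System.SysUtils, System.Types, System.IOUtils, System.Math, System.Classes;
// System.SysUtils: FreeAndNil, TEncoding; System.Types: TStringDynArray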

type FileChar = AnsiChar; FileString = AnsiString; // for non-Unicode files
// type FileChar = WideChar; FileString = UnicodeString;// for UTF16 and UCS-2 files
const FileCharSize = SizeOf(FileChar);
// somewhere later in the code add: Assert(FileCharSize = SizeOf(FileString[1]));

function ReadLastLine(const FileName: String): FileString; overload; forward;

const PageSize = 4*1024;
// the minimal read atom of most modern HDD and the memory allocation atom of Win32
// since the chances your file would have lines longer than 4Kb are very small - I would not increase it to several atoms.

function ReadLastLine(const Lines: TStringDynArray): FileString; overload;
var i: integer;
begin
 Result := '';
 i := High(Lines);
 if i < Low(Lines) then exit; // empty array - empty file

 Result := Lines[i];
 if Result > '' then exit; // we got the line

 Dec(i); // skip the empty ghost line, in case last line was CRLF-terminated
 if i < Low(Lines) then exit; // that ghost was the only line in the empty file
 Result := Lines[i];
end;

// scan for EOLs in not-yet-scanned part
function FindLastLine(buffer: TArray<FileChar>; const OldRead : Integer;
    const LastChunk: Boolean; out Line: FileString): boolean;
var i, tailCRLF: integer; c: FileChar;
begin
 Result := False;
 if Length(Buffer) = 0 then exit;

 i := High(Buffer);
 tailCRLF := 0; // test for trailing CR/LF
 if Buffer[i] = ^J then begin // LF - single, or after CR
    Dec(i);
    Inc(tailCRLF);
 end;
 if (i >= Low(Buffer)) and (Buffer[i] = ^M) then begin // CR, alone or before LF
    Inc(tailCRLF);
 end;

 i := High(Buffer) - Max(OldRead, tailCRLF);
 if i - Low(Buffer) < 0 then exit; // no new data to read - results would be like before

 if OldRead > 0 then Inc(i); // the CR/LF pair could be sliced between new and previous buffer - so need to start a bit earlier

 for i := i downto Low(Buffer) do begin
     c := Buffer[i];
     if (c=^J) or (c=^M) then begin // found EOL
        SetString( Line, @Buffer[i+1], High(Buffer) - tailCRLF - i);
        exit(True);
     end;
 end;  

 // we did not find non-terminating EOL in the buffer (except maybe trailing),
 // now we should ask for more file content, if there is still left any
 // or take the entire file (without trailing EOL if any)

 if LastChunk then begin
    SetString( Line, @Buffer[ Low(Buffer) ], Length(Buffer) - tailCRLF);
    Result := true;
 end;
end;

function ReadLastLine(const FileName: String): FileString; overload;
var Buffer, tmp: TArray<FileChar>;
   // dynamic arrays - eases memory management and protect from stack corruption
   FS: TFileStream; FSize, NewPos: Int64;
   OldRead, NewLen : Integer; EndOfFile: boolean;
begin
 Result := '';
 FS := TFile.OpenRead(FileName);
 try
   FSize := FS.Size;
   if FSize <= PageSize then begin // small file, we can be lazy!
      FreeAndNil(FS);  // free the handle and avoid double-free in finally
      Result := ReadLastLine( TFile.ReadAllLines( FileName, TEncoding.ANSI ));
         // or TEncoding.UTF16
         // warning - TFile is not share-aware, if the file is being written to by another app
      exit;
   end;

   SetLength( Buffer, PageSize div FileCharSize);
   OldRead := 0;
   repeat
     NewPos := FSize - Length(Buffer)*FileCharSize;
     EndOfFile := NewPos <= 0;
     if NewPos < 0 then NewPos := 0;
     FS.Position := NewPos;

     FS.ReadBuffer( Buffer[Low(Buffer)], (Length(Buffer) - OldRead)*FileCharSize);

     if FindLastLine(Buffer, OldRead, EndOfFile, Result) then
        exit; // done !

     tmp := Buffer; Buffer := nil; // flip-flop: preparing to broaden our mouth

     OldRead := Length(tmp); // need not to re-scan the tail again and again when expanding our scanning range
     NewLen := Min( 2*Length(tmp), FSize div FileCharSize );

     SetLength(Buffer, NewLen); // this may trigger EOutOfMemory...
     Move( tmp[Low(tmp)], Buffer[High(Buffer)-OldRead+1], OldRead*FileCharSize);
     tmp := nil; // free old buffer
   until EndOfFile;
 finally
   FS.Free;
 end;
end;

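P.S. Note one extra special case: if you use Unicode chars (two bytes per char) and are given an odd-length file (3 bytes, 5 bytes, etc.), you will never be able to scan the starting single byte (half a wide char). Perhaps you should add an extra guard there, like Assert( 0 = FS.Size mod FileCharSize)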
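One possible shape for that guard is sketched below. Assert calls are compiled out when assertions are disabled ({$C-}), so this variant raises an exception instead. IsWholeCharCount and CheckWholeChars are hypothetical names, and FileChar is set to WideChar here to model the UTF-16 case.

```pascal
program OddSizeGuard;
{$APPTYPE CONSOLE}

uses System.SysUtils, System.Classes;

type
  FileChar = WideChar; // the UTF-16 / UCS-2 variant from the code above

const
  FileCharSize = SizeOf(FileChar);

// Hypothetical helper: True when ByteCount holds only complete characters.
function IsWholeCharCount(ByteCount: Int64): Boolean;
begin
  Result := ByteCount mod FileCharSize = 0;
end;

// Hypothetical guard: rejects streams that end in half a character.
procedure CheckWholeChars(Stream: TStream);
begin
  if not IsWholeCharCount(Stream.Size) then
    raise Exception.CreateFmt('Stream size %d is not a multiple of %d bytes',
      [Stream.Size, FileCharSize]);
end;

var
  MS: TMemoryStream;
begin
  MS := TMemoryStream.Create;
  try
    MS.Size := 3; // odd length: one and a half wide chars
    try
      CheckWholeChars(MS);
    except
      on E: Exception do
        Writeln(E.Message);
    end;
  finally
    MS.Free;
  end;
end.
```

In ReadLastLine such a check would go right after the file is opened, before any buffer sizes are computed.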

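P.P.S. As a rule of thumb, you had better keep those functions out of the form class, because why mix them? In general, you should separate concerns into small blocks. Reading a file has nothing to do with user interaction, so it is better offloaded to a separate unit. Then you can use the functions from that unit in one form or in ten, in the main thread or in a multi-threaded application. Like LEGO parts: they give you flexibility by being small and separate.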

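P.P.P.S. Another approach here would be to use memory-mapped files. Google for MMF implementations for Delphi and for articles about the benefits and problems of the MMF approach. Personally, I think rewriting the code above to use MMF would simplify it greatly, removing several "special cases" and the troublesome memory-copying flip-flop. OTOH, it would demand that you be very strict with pointer arithmetic.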

https://en.wikipedia.org/wiki/Memory-mapped_file

https://msdn.microsoft.com/en-us/library/ms810613.aspx

http://torry.net/quicksearchd.php?String=memory+map&Title=No
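A rough sketch of the memory-mapped variant, calling the Win32 API directly, might look like the following. It is not the answer's code: ReadLastLineMapped is a hypothetical name, it is Windows only, it assumes a single-byte encoding, and it maps the whole file at once, so in a 32-bit process it will fail for files that do not fit in the address space (mapping only a tail window would need offsets aligned to the allocation granularity).

```pascal
program MmfLastLine;
{$APPTYPE CONSOLE}

uses Winapi.Windows, System.SysUtils, System.Classes;

// Hypothetical helper: returns the last line of an ANSI text file via a file mapping.
function ReadLastLineMapped(const FileName: string): AnsiString;
var
  FS: TFileStream;
  Mapping: THandle;
  Base: PAnsiChar;
  Start, Stop: NativeInt;
begin
  Result := '';
  FS := TFileStream.Create(FileName, fmOpenRead or fmShareDenyNone);
  try
    if FS.Size = 0 then Exit; // an empty file cannot be mapped
    Mapping := CreateFileMapping(FS.Handle, nil, PAGE_READONLY, 0, 0, nil);
    if Mapping = 0 then RaiseLastOSError;
    try
      Base := MapViewOfFile(Mapping, FILE_MAP_READ, 0, 0, 0); // 0 = whole file
      if Base = nil then RaiseLastOSError;
      try
        Stop := NativeInt(FS.Size); // number of bytes to keep
        // Drop one trailing EOL: LF, CR, or CR followed by LF.
        if (Stop > 0) and (Base[Stop - 1] = #10) then Dec(Stop);
        if (Stop > 0) and (Base[Stop - 1] = #13) then Dec(Stop);
        // Scan backwards for the EOL that precedes the last line.
        Start := Stop;
        while (Start > 0) and not (Base[Start - 1] in [#10, #13]) do
          Dec(Start);
        SetString(Result, Base + Start, Integer(Stop - Start));
      finally
        UnmapViewOfFile(Base);
      end;
    finally
      CloseHandle(Mapping);
    end;
  finally
    FS.Free;
  end;
end;

begin
  if ParamCount < 1 then
    Writeln('usage: MmfLastLine <file>')
  else
    Writeln(ReadLastLineMapped(ParamStr(1)));
end.
```

With the file mapped, the buffer doubling and copying disappear and the scan becomes a single backwards loop, at the price of the pointer arithmetic mentioned above.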


