
jjones

Level 1
  • Content Count

    4
  • Joined

  • Last visited

Community Reputation

1 Neutral

About jjones

  1. Kind of odd that my replies to posts are limited when I have an active issue and am responding to multiple posts/suggestions.

     I too use the text for searching, not for reading. I need to find a short text string (7 characters or fewer) that relates to an exploded view of an assembly. The exploded view is either one page ahead of, or sometimes one page behind, the text that was searched.

     My issue is twofold.

     1) I have raw scans going into EN that, because of the file rules or the inability of EN to OCR properly, remain totally unsearchable. So they are no better than their paper counterparts at this point, except for portability. Mind you, the quality of the print being scanned is ideal: clean and crisp, in black, white and some greyscale. I have even tried converting the smaller PDFs to .JPG, just to see if the EN OCR would identify the text well enough to be workable. Again, not as expected. The failure rate of text recognition on .jpg images of black-on-white printed text is at least 75%.

     2) I have digitally created PDFs with a text layer, but those are also not handled well in EN, because it cannot pinpoint a search string to a page within the PDF.

     I really wish I had known the PDF-related limitations of EN before investing so much time trying to figure this out. I can combine (and have combined) a good number of my digital PDFs into a single PDF. By doing so, I open the "BIG" PDF, enter a search string, and bam... I can bang through each instance right then and there. Unfortunately, I have a significant amount of non-digital information to digitize, catalog and make somewhat reliably searchable.

     Thanks for your help and suggestions! JJ
  2. Ok, understood. But here are my questions about your suggestion/explanation above:

     1) Is the consistent experience you describe also found on the Windows client? I know there are differences between the Mac and Windows clients. Which one are you basing your experience on?

     2) The scanner adds another several hundred dollars, when I already have a fully capable, networked, 100-page duplex ADF scanner.

     3) I also have PDFs output by various software packages. How is the search experience you describe different from my current experience if my PDF already has a text index or layer created by the software that produced it?

     4) I have tried to use the OCR functions of Acrobat 8 when scanning technical documents, but I end up with all sorts of goofy formatting. The end result of Acrobat's OCR is just an unusable mess. Not to mention that my documents are a mixture of English and Italian text. I think that throws off the Acrobat OCR.

     I don't mind spending several hundred dollars to get the right tools together, but if the end result is still mediocre search results (notes search only gets me to the first page of a PDF), then I have just laid out some dough for nothing. Not something I like to do.

     Thanks, JJ
  3. Thanks for the suggestion. Unfortunately, I am dealing with technical documents, schematics and the like, with a mixture of exploded views of mechanical assemblies, pictures, and other useful images. Raw text won't work in my case. JJ
  4. I too am quite disappointed, since I've now paid for the premium membership only to find that the main feature of interest to me, PDF searching, borders on useless. I feel as if I was misled by the generalization of "searchable PDFs".

     So I have been able to determine the following facts from my short experience. I hope this helps another user decide whether to invest $45.00 in a SaaS that will not meet their needs, with "features" that are described only in general terms.

     1) There are two types of PDFs: one type is created by scanning paper documents, and the other by "printing" or saving a document as a PDF. When a document is scanned without OCR, EN will reject it for recognition if any of the following is TRUE:

       • The PDF contains more than 100 pages.
       • The PDF file is more than 25MB.
       • The PDF does not contain at least one "scanned" page, defined as follows:
           • A "scanned" page contains at least 1025 pixels of image data.
           • A "scanned" page contains no more than 512 characters of regular, searchable text (e.g. this is enough for a text-based fax header or similar). PDF files that have already been processed by a separate OCR system will not satisfy this condition and will be rejected.
       • The PDF contains more than one non-scanned page. (I.e. the doc may have one "cover" page without any image data, but if there's more than one, then it's not a real scan and we reject it.)
       • The analysis crashes or fails for some technical reason, typically due to a malformed PDF from some crazy source, or because the PDF is password protected (encrypted).
       • The analysis process takes more than 30 seconds to complete.

     Assuming none of the above is true for a particular document, EN will recognize the text and create a "searchable" (I use that term very loosely) PDF. However, when you search all notes for a particular string of text, EN will only show the note that contains this PDF. If this is a multi-page PDF, EN will NOT jump to, or further filter down to, the first place in the document that contains your search string, which makes this particular feature more or less useless.

     2) The other type of PDF, mentioned above, is created by some other software from a document, for instance a multi-page Word document saved as a PDF. A PDF of this type has a hidden layer built in that contains an index of all the text in the PDF. When this type of PDF is imported into EN, EN seems to index and recognize only some of the text. So again, when you search for a string of text from your list of notes, EN will pin down the PDF but, as before, does not go to the exact page or highlight the text. The only caveat with this type of file is that a Windows user can press CTRL-F to invoke a search box located at the bottom of the screen. You have to RE-ENTER the search string, and then the first instance of the string will be found within the document. More usable, but still a LOT less functionality than I had expected.

     So in summary, the PDF search feature is far from acceptable in my experience. FYI, I am using a Windows client and have identical functionality on the web client, with the exception of the CTRL-F option: on the web client I have to open the PDF and use its search function to re-enter my search string.
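
For anyone trying to work out ahead of time whether a scan is likely to be rejected, the rules listed in post 4 can be roughly sketched as a checklist in Python. This is only an illustration of the rules as quoted there, not EN's actual code: every name below (PageInfo, is_scanned, rejection_reason) is hypothetical, the per-page pixel and character counts are assumed to come from some other tool, "25MB" is assumed to mean 25 * 1024 * 1024 bytes, and the last rule is read the way its parenthetical explains it (at most one non-scanned "cover" page is allowed). The crash and 30-second timeout rules are left out because they depend on the analysis itself.

```python
from dataclasses import dataclass
from typing import List, Optional

# Thresholds copied from the rules quoted in the post.
MAX_PAGES = 100
MAX_FILE_BYTES = 25 * 1024 * 1024  # assumption: "25MB" means 25 MiB
MIN_IMAGE_PIXELS = 1025            # a "scanned" page has at least this much image data
MAX_TEXT_CHARS = 512               # ...and no more than this much searchable text
MAX_NON_SCANNED_PAGES = 1          # one text-only "cover" page is tolerated


@dataclass
class PageInfo:
    """Hypothetical per-page summary, measured by some other tool."""
    image_pixels: int
    text_chars: int


def is_scanned(page: PageInfo) -> bool:
    """A page counts as 'scanned' if it is mostly image and carries little text."""
    return (page.image_pixels >= MIN_IMAGE_PIXELS
            and page.text_chars <= MAX_TEXT_CHARS)


def rejection_reason(pages: List[PageInfo], file_bytes: int,
                     encrypted: bool = False) -> Optional[str]:
    """Return the first rule that would reject the PDF, or None if it passes."""
    if encrypted:
        return "password protected"
    if len(pages) > MAX_PAGES:
        return "more than 100 pages"
    if file_bytes > MAX_FILE_BYTES:
        return "larger than 25MB"
    scanned = sum(1 for p in pages if is_scanned(p))
    if scanned == 0:
        return "no scanned page"
    if len(pages) - scanned > MAX_NON_SCANNED_PAGES:
        return "more than one non-scanned page"
    return None


if __name__ == "__main__":
    scan = PageInfo(image_pixels=2_000_000, text_chars=0)
    already_ocred = PageInfo(image_pixels=2_000_000, text_chars=3_000)
    print(rejection_reason([scan] * 10, 5_000_000))
    print(rejection_reason([already_ocred] * 10, 5_000_000))
```

Under these assumptions, a PDF that has already been run through a separate OCR tool fails the "scanned page" test on every page, which matches the statement in the post that such files are rejected.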