Linux, FreeBSD, Juniper, Cisco / Network security articles and troubleshooting guides

LaR3
Post subject: How to get the word frequency in a text  |  Posted: Wed Aug 05, 2009 7:01 am


How to get the word frequency in a text

In my previous post (http://forum.ivorde.ro/tr-how-to-convert-a-text-into-a-list-of-words-one-per-line-t17.html) I explained how to convert a text into a list of words, one word per line.

Building on that, this is how to count how often each word occurs in a text:
Code:
# cat test.file
FreeBSD 7.2-RELEASE is now available for the amd64, i386, ia64, pc98, powerpc, and sparc64 architectures.

FreeBSD 7.2 can be installed from bootable ISO images or over the network; the required files can be downloaded via FTP or BitTorrent as described in the sections below. While some of the smaller FTP mirrors may not carry all architectures, they will all generally contain the more common ones, such as i386 and amd64.

# tr -d '[:punct:]' < test.file | tr -s '[:space:]' '\n' | tr '[:upper:]' '[:lower:]' | sort | uniq -c | sort -rn
   6 the
   2 or
   2 i386
   2 ftp
   2 freebsd
   2 can
   2 be
   2 as
   2 architectures
   2 and
   2 amd64
   2 all
   1 will
   1 while
   1 via
   1 they
   1 such
   1 sparc64
   1 some
   1 smaller
   1 sections
   1 required
   1 powerpc
   1 pc98
   1 over
   1 ones
   1 of
   1 now
   1 not
   1 network
   1 more
   1 mirrors
   1 may
   1 iso
   1 is
   1 installed
   1 in
   1 images
   1 ia64
   1 generally
   1 from
   1 for
   1 files
   1 downloaded
   1 described
   1 contain
   1 common
   1 carry
   1 bootable
   1 bittorrent
   1 below
   1 available
   1 72release
   1 72


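What the pipeline does, step by step: it deletes all punctuation, puts each word on its own line, and converts upper case letters to lower case, so that the same word written in a different case is not counted as a separate word. The words are then sorted, which places identical words on adjacent lines; this matters because uniq only collapses adjacent duplicates. uniq -c precedes each output line with the number of times that line occurred in the input, followed by a single space. The final sort -rn orders the result numerically, in descending order, so the most frequent words come first.

Note that deleting punctuation also removes dots and hyphens inside a token, which is why "7.2-RELEASE" shows up as "72release" and "7.2" as "72" in the output above.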
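If you need this often, the same steps can be wrapped in a small shell function. This is only a sketch: the name word_freq is made up for this example and is not part of the original post, and the extra sed step (dropping empty lines, which appear when the input starts with whitespace) is an addition to the pipeline described above.

```shell
#!/bin/sh
# word_freq: hypothetical helper name, chosen for this example.
# Reads text on stdin and prints "count word" lines, most frequent first.
word_freq() {
    tr -d '[:punct:]' |              # delete punctuation
    tr -s '[:space:]' '\n' |         # one word per line, squeezing runs of whitespace
    sed '/^$/d' |                    # drop empty lines (assumption: not in the original)
    tr '[:upper:]' '[:lower:]' |     # fold case so "The" and "the" count together
    sort |                           # group identical words on adjacent lines
    uniq -c |                        # prefix each distinct word with its count
    sort -rn                         # highest count first
}
```

Usage would look like "word_freq < test.file | head -n 10" to show only the ten most frequent words. The exact width of the count column depends on the uniq implementation.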

_________________
Humble user
http://www.ivorde.ro




