How to count unique number of words using fgetc() then printing the count in C

Translated from a post by Superman · 2014/11/23 · tags: c, arrays, fgetc

I have already asked a question related to this program, but now, after much research and headbutting, I am stuck... again.

I am trying to write a program that takes the user's input, stores it, and then prints every unique word together with the number of times it occurred.

For example:

Please enter something: Hello#@Hello# hs,,,he,,whywhyto[then the user hits enter] 

hello 2 
hs 1 
he 1 
whywhyto 1

The above should be the output. Of course whywhyto isn't a word, but that doesn't matter here: I am treating any run of letters separated by anything that isn't a letter (spaces, 0-9, #$(@ etc.) as a word. I need to use 2D arrays because I can't use linked lists yet and don't understand them.

This is all I have so far:

#include <stdio.h>
#include <ctype.h>

int main()
{
    char array[64];
    int i=0, j, input;

    printf("Please enter an input:");

    input=fgetc(stdin);

    while(input != '\n')
    {
        if(isalpha(input))
        {
            array[i]=input;
            i++;
        }

        input=fgetc(stdin);
    }

    for(j=0;j<i;j++)
    {
        // printf("%c ",j,array[j]);
        printf("%c",array[j]);
    }
    printf("\n");
}

I am using isalpha to keep only the letters, but all this does is throw away anything that isn't a letter, store the rest, and print it back. I have no clue how to store each word once, on its first occurrence, and then just increment a count every time it appears again. I can only use fgetc(), which is hard for me; I only have about 3-4 months of C experience. I know I will have to use two-dimensional arrays and have been reading up on them, but I have not been able to work out how to implement them. Please help me out a bit.

3 answers

#1

Here is code that seems to work:

#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

enum { MAX_WORDS = 64, MAX_WORD_LEN = 20 };

int main(void)
{
    char words[MAX_WORDS][MAX_WORD_LEN];
    int  count[MAX_WORDS] = { 0 };
    int w = 0;
    char word[MAX_WORD_LEN];
    int c;
    int l = 0;

    while ((c = getchar()) != EOF)
    {
        if (isalpha(c))
        {
            if (l < MAX_WORD_LEN - 1)
               word[l++] = c;
            else
            {
                /* word is not null-terminated yet, so limit the output with a precision (%.*s) */
                fprintf(stderr, "Word too long: %.*s%c...\n", l, word, c);
                break;
            }
        }
        else if (l > 0)
        {
            word[l] = '\0';
            printf("Found word <<%s>>\n", word);
            assert(strlen(word) < MAX_WORD_LEN);
            int found = 0;
            for (int i = 0; i < w; i++)
            {
                if (strcmp(word, words[i]) == 0)
                {
                    count[i]++;
                    found = 1;
                    break;
                }
            }
            if (!found)
            {
                if (w >= MAX_WORDS)
                {
                    fprintf(stderr, "Too many distinct words (%s)\n", word);
                    break;
                }
                strcpy(words[w], word);
                count[w++] = 1;
            }
            l = 0;
        }
    }

    for (int i = 0; i < w; i++)
        printf("%3d: %s\n", count[i], words[i]);

    return 0;
}

Sample output:

$ ./wordfreq <<< "I think, therefore I am, I think, or maybe I do not think after all, and therefore I am not."
Found word <<I>>
Found word <<think>>
Found word <<therefore>>
Found word <<I>>
Found word <<am>>
Found word <<I>>
Found word <<think>>
Found word <<or>>
Found word <<maybe>>
Found word <<I>>
Found word <<do>>
Found word <<not>>
Found word <<think>>
Found word <<after>>
Found word <<all>>
Found word <<and>>
Found word <<therefore>>
Found word <<I>>
Found word <<am>>
Found word <<not>>
  5: I
  3: think
  2: therefore
  2: am
  1: or
  1: maybe
  1: do
  2: not
  1: after
  1: all
  1: and
$ ./wordfreq <<< "I think thereforeIamIthinkormaybeI do not think after all, and therefore I am not."
Found word <<I>>
Found word <<think>>
Word too long: thereforeIamIthinkor...
  1: I
  1: think
$ ./wordfreq <<< "a b c d e f g h i j k l m n o p q r s t u v w x y z
>                 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
>                 aa ab ac ad ae af ag ah ai aj ak al am
>                 an ao ap aq ar as at au av aw ax ay az
>                "
Found word <<a>>
Found word <<b>>
Found word <<c>>
Found word <<d>>
Found word <<e>>
Found word <<f>>
Found word <<g>>
Found word <<h>>
Found word <<i>>
Found word <<j>>
Found word <<k>>
Found word <<l>>
Found word <<m>>
Found word <<n>>
Found word <<o>>
Found word <<p>>
Found word <<q>>
Found word <<r>>
Found word <<s>>
Found word <<t>>
Found word <<u>>
Found word <<v>>
Found word <<w>>
Found word <<x>>
Found word <<y>>
Found word <<z>>
Found word <<A>>
Found word <<B>>
Found word <<C>>
Found word <<D>>
Found word <<E>>
Found word <<F>>
Found word <<G>>
Found word <<H>>
Found word <<I>>
Found word <<J>>
Found word <<K>>
Found word <<L>>
Found word <<M>>
Found word <<N>>
Found word <<O>>
Found word <<P>>
Found word <<Q>>
Found word <<R>>
Found word <<S>>
Found word <<T>>
Found word <<U>>
Found word <<V>>
Found word <<W>>
Found word <<X>>
Found word <<Y>>
Found word <<Z>>
Found word <<aa>>
Found word <<ab>>
Found word <<ac>>
Found word <<ad>>
Found word <<ae>>
Found word <<af>>
Found word <<ag>>
Found word <<ah>>
Found word <<ai>>
Found word <<aj>>
Found word <<ak>>
Found word <<al>>
Found word <<am>>
Too many distinct words (am)
  1: a
  1: b
  1: c
  1: d
  1: e
  1: f
  1: g
  1: h
  1: i
  1: j
  1: k
  1: l
  1: m
  1: n
  1: o
  1: p
  1: q
  1: r
  1: s
  1: t
  1: u
  1: v
  1: w
  1: x
  1: y
  1: z
  1: A
  1: B
  1: C
  1: D
  1: E
  1: F
  1: G
  1: H
  1: I
  1: J
  1: K
  1: L
  1: M
  1: N
  1: O
  1: P
  1: Q
  1: R
  1: S
  1: T
  1: U
  1: V
  1: W
  1: X
  1: Y
  1: Z
  1: aa
  1: ab
  1: ac
  1: ad
  1: ae
  1: af
  1: ag
  1: ah
  1: ai
  1: aj
  1: ak
  1: al
$

The tests for 'word too long' and 'too many distinct words' help reassure me that the code is sound. Devising such tests is good practice.
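The expected output in the question shows `hello 2` for `Hello#@Hello#`, while the code above counts `I` and `i` as different words, reads until EOF rather than until the end of the line, and would miss a final word that is followed directly by EOF. The following is only a sketch of one way to handle those points with nothing but fgetc(), assuming case-insensitive counting is what is wanted. `read_word` is a hypothetical helper name, not part of the answer above.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical helper (not from the answer above).
 * Reads characters from fp with fgetc() until one word has been collected
 * or the line/input ends. Letters are lower-cased and stored in buf, which
 * must have room for at least 2 chars; letters that do not fit are read
 * and discarded. buf is always null-terminated and is "" if no word was
 * found. Returns the character that stopped the scan: the delimiter that
 * followed the word, '\n', or EOF. */
int read_word(FILE *fp, char *buf, size_t size)
{
    size_t len = 0;
    int c;

    while ((c = fgetc(fp)) != EOF && c != '\n') {
        if (isalpha(c)) {
            if (len < size - 1)
                buf[len++] = (char)tolower(c);
        } else if (len > 0) {
            break;              /* first non-letter after a word ends it */
        }
        /* otherwise: a delimiter before any letter, keep skipping */
    }
    buf[len] = '\0';
    return c;
}
```

A caller would loop while the return value is neither '\n' nor EOF, and run the same strcmp() lookup over the 2D array as above whenever buf[0] is not '\0'.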

#2

I don't know whether this is homework, so I did not do everything for you; I also cleaned up your code a little. Basically, if you don't know how many words the user may enter, you need a dynamic data structure such as a linked list.

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct linkedlist linkedlist;
struct linkedlist{
    char *word;
    int count;
    linkedlist *next;
};

//declared before main so the calls below compile
void add_word(char *word, linkedlist **ll);
void destroy_list(linkedlist **ll);

int main(void)
{
    //know your bounds: letters beyond the 63rd of a word are dropped
    char array[64];
    int i=0, input;
    linkedlist *head = NULL;

    printf("Please enter an input:");

    do
    {
        input=fgetc(stdin);
        if(isalpha(input))
        {
            if(i < 63) //leave room for the terminating '\0'
                array[i++]=input;
        }
        else if(i > 0) //a delimiter, '\n' or EOF ends the current word
        {
            array[i]='\0';
            char *word = malloc(strlen(array)+1);
            if(word == NULL)
                return 1;
            strcpy(word, array);
            add_word(word, &head);
            i=0; //need to restart i to keep reading words
        }
    } while(input != '\n' && input != EOF);

    //print out final results
    for(linkedlist *temp = head; temp != NULL; temp = temp->next){
        printf("%s %d ", temp->word, temp->count);
    }
    printf("\n");

    destroy_list(&head);
    return 0;
}

//adds word to end of list if does not exist
//increments word count if it exists
void add_word(char *word, linkedlist **ll){
    //implement this
}

//frees resources used by malloc (look up how to free/destroy a linked list)
//make sure to free each node and the word it holds
void destroy_list(linkedlist **ll){
    //implement this
}

For add_word you will need something along the lines of (pseudo-code):

list = *ll
if(list == NULL): //new list
    *ll = malloc(sizeof(linkedlist))
    (*ll)->word = word
    (*ll)->count = 1
    (*ll)->next = NULL
    return

while list->next != NULL:
    if strcmp(word, list->word) == 0: //compare contents, not pointers
        free(word)
        list->count++
        return
    list = list->next

if strcmp(word, list->word) == 0: //last word in list
    free(word)
    list->count++
else: //word did not exist, add new word to end of list
    temp = malloc(sizeof(linkedlist))
    temp->word = word
    temp->count = 1
    temp->next = NULL
    list->next = temp

Maybe not the most efficient way, but you can improve upon it. Hope I did not confuse you further, good luck.
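For readers who want to see the pseudo-code above in C, here is one possible sketch of the two functions the answer leaves unimplemented. It is not the answerer's code: it walks the list with a pointer to a pointer instead of treating the empty list and the last node as separate cases, and it assumes that `word` was allocated with malloc() and that add_word takes ownership of it. The struct is repeated only so that the block compiles on its own.

```c
#include <stdlib.h>
#include <string.h>

typedef struct linkedlist linkedlist;
struct linkedlist {
    char *word;
    int count;
    linkedlist *next;
};

/* Assumption: word was malloc'd by the caller. It is either stored in a
 * new node at the end of the list or freed if it is already present. */
void add_word(char *word, linkedlist **ll)
{
    /* ll always points at the link that would hold the next node, so an
     * empty list and the end of a list are handled by the same code. */
    while (*ll != NULL) {
        if (strcmp((*ll)->word, word) == 0) {
            (*ll)->count++;
            free(word);          /* duplicate: the list keeps its own copy */
            return;
        }
        ll = &(*ll)->next;
    }

    linkedlist *node = malloc(sizeof *node);
    if (node == NULL) {          /* out of memory: drop the word */
        free(word);
        return;
    }
    node->word = word;
    node->count = 1;
    node->next = NULL;
    *ll = node;                  /* append at the end */
}

/* Frees every node and the word it owns, then resets the head to NULL. */
void destroy_list(linkedlist **ll)
{
    linkedlist *node = *ll;
    while (node != NULL) {
        linkedlist *next = node->next;   /* save before freeing the node */
        free(node->word);
        free(node);
        node = next;
    }
    *ll = NULL;
}
```

Appending at the end keeps the words in order of first appearance, which matches the output order shown in the question.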

#3

OP still has a fair amount of work ahead.

The trick is to 1) read the input, 2) identify the delimiters, 3) compare each word against the entire buffer, and 4) print each word only once.

This approach is memory efficient, as it only uses the 64 char buffer suggested by OP. The search complexity is O(n*n), because every word triggers another scan of the whole buffer.

#include <ctype.h>
#include <stdio.h>
#include <string.h>

// Helper function to find word occurrences.
void Print_count(const char *word, const char *array, int i) {
  int count = 0;
  const char *found = NULL;  // last matching occurrence seen so far
  for (int j = 0; j < i; j++) {
    if (isalpha((unsigned char ) array[j])) {
      if (strcmp(&array[j], word) == 0) {
        found = &array[j];
        count++;
      }
      // skip rest of word
      do {
        j++;
      } while (isalpha((unsigned char ) array[j]));
    }
  }
  if (found == word) {
    printf("%s %d\n", word, count);
  }
}

int main(void) {
  char array[64];
  int i = 0;
  int j;
  int input;
  printf("Please enter an input:");

  // get the input
  while ((input = fgetc(stdin)) != '\n' && input != EOF) {
    array[i] = input;
    if (i + 1 >= sizeof array)
      break;
    i++;
  }
  array[i] = '\0';

  // change all delimiters to \0
  for (j = 0; j < i; j++) {
    if (!isalpha((unsigned char ) array[j])) {
      array[j] = '\0';
    }
  }


  for (j = 0; j < i; j++) {
    // Use the beginning of each word ... 
    if (isalpha((unsigned char ) array[j])) {
      Print_count(&array[j], array, i);
      // skip rest of word
      do {
        j++;
      } while (isalpha((unsigned char ) array[j]));
    }
  }
  return 0;
}

Input: Hello#@Hello# hs,,,he,,whywhyto

Output:

Hello 2
hs 1
he 1
whywhyto 1
