System of Information Feedback on Archive Using Term Frequency-Inverse Document Frequency and Vector Space Model Methods

Didit Suhartono, Khodirun Khodirun

Abstract


Archives are an important class of documents. They are stored systematically to simplify both their storage and their later retrieval. Information retrieval is the process of retrieving the documents that are relevant to a query while leaving out those that are not, and doing so requires a method for judging relevance. The Term Frequency-Inverse Document Frequency (TF-IDF) and Vector Space Model methods find relevant documents according to their degree of closeness, or similarity, to the query. In addition, applying the Nazief-Adriani stemming algorithm can improve retrieval performance by reducing the words in a document or text to their base forms. The system then indexes the documents to simplify and speed up the search process. Relevance is determined by calculating similarity values between the query and the stored documents, both of which are represented in a specific form. The system then sorts the retrieved documents by their level of relevance to the query.
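
The pipeline described in the abstract (weight terms with TF-IDF, represent the query and documents as vectors, rank by similarity) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the whitespace tokenizer, the base-10 logarithm in the IDF, and the use of cosine similarity are all assumptions made for illustration, and the Nazief-Adriani stemming step is omitted entirely.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase and split on whitespace. A real system would
    # also apply stemming (e.g. Nazief-Adriani) at this point.
    return text.lower().split()


def weigh(tokens, idf):
    # Raw term frequency times IDF; terms unseen at index time are ignored.
    tf = Counter(tokens)
    return {t: f * idf[t] for t, f in tf.items() if t in idf}


def build_index(docs):
    """Return (idf, doc_vectors) for a list of document strings."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log10(n_docs / count) for term, count in df.items()}
    vectors = [weigh(tokens, idf) for tokens in tokenized]
    return idf, vectors


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs):
    """Return (doc_index, score) pairs, most similar first."""
    idf, vectors = build_index(docs)
    q = weigh(tokenize(query), idf)
    scored = [(i, cosine(q, v)) for i, v in enumerate(vectors)]
    # Keep only documents with non-zero similarity, highest score first.
    hits = [(i, s) for i, s in scored if s > 0.0]
    return sorted(hits, key=lambda p: (-p[1], p[0]))
```

Calling `search("incoming letter", docs)` on a small list of archive descriptions returns the indices of the matching documents ordered by similarity, with documents that share no weighted term with the query left out.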


Keywords


Archive; Information retrieval; TF-IDF; Vector space model.


International Journal of Informatics and Information Systems
2579-7069 (online)
Published by Bright Publisher
Puri Mersi Baru, Jl.Martadireja II, Gang Sitihingil 3 Blok A No 2, Purwokerto Timur, Jawa Tengah
Website : http://ijiis.org
Email : info@ijiis.org

 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0