Out of the box, LDA models are not reproducible between runs - in this case because of fundamental LDA properties. You can make LDA reproducible by setting random seeds and PYTHONHASHSEED=0, and you can take other steps to improve your results.

It is less a question of trust in the library than of understanding the methods involved. The scikit-learn library also has an LDA implementation, and theirs will also give you different results on each run, because by its very nature LDA is a generative probabilistic method. Simplifying a little bit here: each time you use it, many Dirichlet distributions are generated, followed by inference steps. These steps and the distribution generation depend on random number generators, and random number generators, by their definition, generate random stuff - so each model is slightly different, and calculating the coherence of these models will give you different results every time.

But that doesn't mean the library is worthless. It is a very powerful library that is used by many companies (Amazon and Cisco, for example) and academics (NIH, countless researchers). To quote from gensim's About page: "By now, Gensim is - to my knowledge - the most robust, efficient and hassle-free piece of software to realize unsupervised semantic modelling from plain text." If that is what you want, gensim is the way to go - certainly not the only way to go (tmtoolkit or sklearn also have LDA), but a pretty good choice of paths.

That being said, there are ways to ensure reproducibility between model runs.

Set PYTHONHASHSEED. From the Python documentation: "On Python 3.3 and greater, hash randomization is turned on by default." Setting this environment variable to a fixed value before the interpreter starts turns that randomization off.

Use random_state in your model specification. Afaik, all of the gensim methods have a way of specifying the random seed to be used. Choose any number you like, but not the default value of zero ("off"), and use the same number for each rerun - this ensures that the same input into the random number generators always results in the same output (gensim ldamodel documentation).

Use ldamodel.save() and ldamodel.load() for model persistence. This is also a very useful, time-saving step that keeps you from having to re-run your models every time you start (very important for long-running models).

Increase iterations or passes. This doesn't technically make your models perfectly reproducible, but even without the random seed settings, you will see your model perform better (at the cost of computation time) if you increase iterations or passes.
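To make those steps concrete, here is a minimal sketch using gensim's LdaModel; the toy corpus, the seed of 42, and the file name my_lda.model are arbitrary placeholders, not values the library requires.

```python
# Hash randomization can only be switched off before the interpreter
# starts, so the first step happens outside Python:
#
#   PYTHONHASHSEED=0 python train_lda.py

from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaModel

# A toy corpus, just to make the sketch runnable.
texts = [
    ["human", "computer", "interaction"],
    ["graph", "trees", "computer"],
    ["graph", "minors", "survey"],
]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(text) for text in texts]

lda = LdaModel(
    corpus=corpus,
    id2word=dictionary,
    num_topics=2,
    random_state=42,  # fixed, nonzero seed: same input -> same model
    passes=10,        # more passes and iterations mean better convergence,
    iterations=400,   # at the cost of computation time
)

# Persist the trained model so a long-running fit never has to be repeated.
lda.save("my_lda.model")
same_lda = LdaModel.load("my_lda.model")

# With the random seed and hash seed pinned, the coherence score no
# longer changes from run to run.
score = CoherenceModel(
    model=lda, corpus=corpus, dictionary=dictionary, coherence="u_mass"
).get_coherence()
print(score)
```

Run twice under PYTHONHASHSEED=0, this script should train the same model and print the same coherence score both times.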