Python - Why use anything other than uuid4() for unique strings?


I see quite a few implementations of unique string generation for things like uploaded image names, session IDs, etc., and many of them use hashes such as SHA-1.

I'm not questioning the legitimacy of custom methods like these; I'm just curious about the reason for them. If I want a unique string, I just do this:

>>> import uuid
>>> uuid.uuid4()
UUID('07033084-5cfd-4812-90a4-e4d24ffb6e3d')

And I'm done. I wasn't very trusting before I read up on uuid, so I tested it:

>>> import uuid
>>> s = set()
>>> for i in range(5000000):  # That's 5 million!
...     s.add(str(uuid.uuid4()))
...
>>> len(s)
5000000

Not one repeat (I wouldn't expect one, since a version-4 UUID has 122 random bits, or about 5.3e+36 possible values, but it's comforting to see it in action). You could make a collision even less likely by concatenating two uuid4()s, which squares the number of possible values.
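
For a sense of scale, the birthday bound approximates the chance of any duplicate among n random draws from N equally likely values as 1 - exp(-n(n-1)/2N). The sketch below applies it to uuid4, assuming a uuid4 carries 122 random bits (128 minus 4 version bits and 2 variant bits); `collision_probability` is a hypothetical helper, not part of the uuid module.

```python
import math
import uuid

# A version-4 UUID is 128 bits, but 4 bits hold the version and 2 bits the
# variant, leaving 122 random bits.
RANDOM_BITS = 128 - 4 - 2
N = 2 ** RANDOM_BITS  # number of distinct uuid4 values


def collision_probability(n: int, space: int = N) -> float:
    """Birthday-bound approximation of P(at least one duplicate in n draws)."""
    # p ~ 1 - exp(-n(n-1) / 2N); expm1 keeps precision when p is tiny.
    return -math.expm1(-n * (n - 1) / (2 * space))


u = uuid.uuid4()
print(u.version)  # 4
print(f"{N:.3e} possible uuid4 values")
print(f"P(duplicate among 5,000,000) ~ {collision_probability(5_000_000):.2e}")
# Concatenating two independent uuid4s squares the space of possible values.
print(f"P(duplicate, two uuid4s joined) ~ {collision_probability(5_000_000, N * N):.2e}")
```

For the 5 million UUIDs in the test above, this comes out on the order of 1e-24.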

So, with that said, why do people spend time on random() and other approaches to generating unique strings? Is there an important security issue, or some other problem, with uuid?

6 answers

#1


20  

Using a hash to identify a resource lets you derive a 'unique' reference from the object itself. For instance, Git uses SHA-1 hashing to make a hash that represents the exact changeset of a single commit. Since hashing is deterministic, you'll get the same hash for the same file every time.
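
As a sketch of what "deterministic" means here, this reproduces how Git derives the ID of a file's contents (a blob object): a SHA-1 over a short header plus the raw bytes. It assumes Git's default SHA-1 object format; `git_blob_id` is a hypothetical name, and commit IDs additionally cover metadata such as author, parents, and message.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Object ID Git assigns to a file's contents (SHA-1 object format)."""
    # Git hashes the header "blob <size>\0" followed by the raw bytes.
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()


# The same bytes always produce the same ID, wherever they are hashed.
print(git_blob_id(b""))         # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (Git's empty blob)
print(git_blob_id(b"hello\n"))  # should match: echo hello | git hash-object --stdin
```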

使用哈希来唯一标识资源允许您从对象生成“唯一”引用。例如,Git使用SHA散列来创建一个唯一的散列,表示单个提交的确切变更集。由于散列是确定性的,因此每次都会为同一个文件获取相同的散列值。

Two people across the world could make the same change to the same repo independently, and Git would know they made the same change. UUID v1, v2, and v4 can't support that since they have no relation to the file or the file's contents.
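
The standard library does offer deterministic UUIDs: versions 3 and 5 are name-based, hashing a namespace UUID plus a name (MD5 for version 3, SHA-1 for version 5). A minimal sketch; the URL is a hypothetical example, and note that these hash the name you pass in, not a file's bytes.

```python
import uuid

# Name-based UUIDs: same namespace + same name -> same UUID, every time.
a = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/images/cat.png")
b = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/images/cat.png")

# Random UUIDs: no relation to any input.
c = uuid.uuid4()
d = uuid.uuid4()

print(a == b)  # True: deterministic
print(c == d)  # False (with overwhelming probability): random
print(a.version, uuid.uuid3(uuid.NAMESPACE_URL, "x").version)  # 5 3
```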

#2


12  

Well, sometimes you want collisions. If someone uploads the exact same image twice, you might prefer to tell them it's a duplicate rather than make another copy with a new name.
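
A minimal sketch of upload deduplication by content hash, assuming an in-memory dict stands in for real storage; `store_upload` and its SHA-256 naming scheme are hypothetical.

```python
import hashlib


def store_upload(data: bytes, store: dict) -> tuple:
    """Store data under a name derived from its content.

    Returns (name, is_duplicate). `store` stands in for real storage.
    """
    name = hashlib.sha256(data).hexdigest()  # hypothetical naming scheme
    if name in store:
        return name, True  # identical bytes were uploaded before
    store[name] = data
    return name, False


uploads = {}
first = store_upload(b"...image bytes...", uploads)
second = store_upload(b"...image bytes...", uploads)
print(first[0] == second[0], second[1])  # True True
print(len(uploads))  # 1
```

A random uuid4 name would have stored the second upload as a separate copy.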

#3


5  

One possible reason is that you want the unique string to be human-readable. UUIDs just aren't easy to read.

#4


3  

UUIDs are long and meaningless (for instance, if you sort by UUID, you get a meaningless order).

And because they're so long, I wouldn't want to put one in a URL or expose it to the user in any form.
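
If length is the main concern, a uuid4 can at least be shortened. This sketch, assuming URL-safe base64 is acceptable in your URLs, compares the canonical, hex, and base64url forms; the result is still random, so it does nothing for readability or ordering.

```python
import base64
import uuid

u = uuid.uuid4()

canonical = str(u)  # 36 chars: 32 hex digits plus 4 hyphens
compact = u.hex     # 32 hex chars, no hyphens
# 16 raw bytes -> 24 base64url chars ending in "=="; strip padding to get 22.
short = base64.urlsafe_b64encode(u.bytes).rstrip(b"=").decode("ascii")

print(len(canonical), len(compact), len(short))  # 36 32 22

# Round trip: re-pad and decode to recover the same UUID.
restored = uuid.UUID(bytes=base64.urlsafe_b64decode(short + "=="))
print(restored == u)  # True
```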

#5


1  

In addition to the other answers, hashes are really good for things that should be immutable. A content hash gives a unique name that can be used at any time to check the integrity of whatever it is attached to.
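
A sketch of that integrity check, assuming SHA-256 content names; `content_name` and `verify` are hypothetical helpers.

```python
import hashlib


def content_name(data: bytes) -> str:
    """Name an immutable blob by the SHA-256 of its bytes."""
    return hashlib.sha256(data).hexdigest()


def verify(name: str, data: bytes) -> bool:
    """True if `data` still matches the name it was stored under."""
    return content_name(data) == name


original = b"release-1.0.tar.gz contents"
name = content_name(original)
print(verify(name, original))         # True
print(verify(name, original + b"!"))  # False: any change alters the hash
```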

#6


1  

Also note that other kinds of UUID could be appropriate. For example, if you want your identifiers to be orderable, UUID1 is based in part on a timestamp. It really all depends on your application's requirements.
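
A sketch of using the timestamp in a UUID1, assuming CPython's `uuid` module: the `time` attribute holds a 60-bit count of 100-nanosecond intervals since 1582-10-15, which can be converted to a datetime or used as a sort key. `uuid1_datetime` is a hypothetical helper.

```python
import datetime
import uuid

# Offset between the UUID epoch (1582-10-15) and the Unix epoch (1970-01-01),
# in 100-nanosecond intervals.
UUID_EPOCH_OFFSET = 0x01B21DD213814000


def uuid1_datetime(u: uuid.UUID) -> datetime.datetime:
    """Recover the creation time embedded in a version-1 UUID."""
    seconds = (u.time - UUID_EPOCH_OFFSET) / 10_000_000
    return datetime.datetime.fromtimestamp(seconds, tz=datetime.timezone.utc)


ids = [uuid.uuid1() for _ in range(3)]
# Sort on the embedded timestamp, not on str(u): the string form puts the
# low-order time bits first, so lexical order is not chronological.
by_time = sorted(ids, key=lambda u: u.time)
for u in by_time:
    print(u, uuid1_datetime(u).isoformat())
```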


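Note!

The copyright of this translated article belongs to this site. Reproduction without permission is prohibited; if you reproduce it, please cite this article's address: http://www.silva-art.net/blog/2010/03/12/38ca56979676cf04d6c649ecb8950116.html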



 
© 2014-2018 ITdaan.com, Guangdong ICP filing No. 14056181