Extensible Random Name Generator

Narvius 13 years ago
A Ruby script I wrote quite some time back (that was before the RPN parser, even).
Probably the most complex potentially useful thing I wrote in Ruby. The only potentially useful thing.

class StringGenerator
# Parses an input file. If none given, it parses the DATA lines instead (see bottom of file).
def initialize(filename)
file, @patterns, @groups, @capitalize = (filename == nil ? DATA.readlines : IO::readlines(filename)), Array.new, Hash.new([""]), true
file.each do |line|
@groups[line[/ \w /].strip] = eval("%w" + line[line.index(/ \w /) + 2, line.length - 1].strip) if line[/^group /]
@patterns << line[7, line.length - 1].strip if line[/^pattern /]
end
end

# Generates one entry. Is also used to evaluate subexpressions in forks.
def generate(pattern, final = true)
pattern = @patterns[rand(@patterns.size)] if pattern == nil
result, iterator = "", 0
while iterator < pattern.length
case pattern[iterator]
when '-' # Does nothing.
when '~' then @capitalize = false
when '^' then @capitalize = true
when '/'
iterator += 1
(result += pattern[iterator]; iterator += 1; @capitalize = false) while pattern[iterator] != '/'
when ':' then result, @capitalize = result + " ", true
when '('
substring, iterator, depth = "", iterator + 1, (pattern[iterator + 1] == '(') ? (2) : (1)
while depth > 0
iterator, substring = iterator + 1, substring + pattern[iterator]
depth += 1 if pattern[iterator] == '('
depth -= 1 if pattern[iterator] == ')'
end
result += generate(random_expression(substring), false)
else
if @capitalize
result, @capitalize = result + "#{@groups[pattern[iterator]][rand(@groups[pattern[iterator]].length)]}".split(' ').collect { |s| s.capitalize + " " }.join.strip, false
else
result += "#{@groups[pattern[iterator]][rand(@groups[pattern[iterator]].length)]}".downcase
end
end
iterator += 1
end
(@capitalize = true; result.strip!) if final
return result
end

# Parses forks and picks one random subexpression from them.
def random_expression(string)
iterator, sub, depth, result = 0, "", 0, Array.new
until iterator == string.length
sub += string[iterator, 1]
depth += 1 if string[iterator, 1] == '('
depth -= 1 if string[iterator, 1] == ')'
iterator += 1
(result << sub; sub = ""; iterator += 1 unless string[iterator] == '(') if string[iterator] == '|' and depth == 0
end
result << sub
return result[rand(result.length)]
end
end

if ARGV.size == 0
puts "\nnamegen [-i path] [-o path] [-p pattern] [-t number]\n\n"
puts "-i [path] \t Defines the path of the input file. For more information, use the -h switch. If none provided, the default (example) is used."
puts "-o [path] \t Redirects the output from the console to the file with the given path."
puts "-p [pattern] \t Use a specific pattern for generating. For information on pattern syntax, use the -h switch."
puts "-t [number] \t Defines how many lines of output should be generated.\n\n"
puts "-h \t\t Prints a full explanation on how to use the generator."
elsif ARGV.include? "-h"
puts DATA.readlines
else
gen, i, p, result = nil, 1, nil, ""
gen = StringGenerator.new(ARGV.include?("-i") ? ARGV[ARGV.index("-i") + 1] : nil)
i = ARGV[ARGV.index("-t") + 1].to_i if ARGV.include? "-t"
p = ARGV[ARGV.index("-p") + 1] if ARGV.include? "-p"
i.times { |i| result += gen.generate(p) + "\n" }
if ARGV.include? "-o"
File.open(ARGV[ARGV.index("-o") + 1], "w") { |file| file << result }
else
puts result
end
end

__END__

The generator works using patterns defined in an external file. Patterns describe how predefined
groups of words / syllables / letters are used to form the final result.

In each file, there should be both group and pattern definitions.
Scheme of a group definition:

	group i ( word1 word2 word3 a b c kek kor word4 qqqqq)

[Note:
From there on, the term "word" will be used for any chain of characters that forms one entry in
one of the defined groups. So "word1" is a word, "a" is a word, and so is "kek" or "qqqqq".]

Every group definition starts with the group keyword - on its own line, and without any preceding
whitespaces, tabs or whatever. Then comes the group identifier, a single, case-sensitive
alphanumeric character, and after that a list of words enclosed by parentheses. Each group
definition must occupy exactly one line.

Usually each word separated by a space is a new word, but if you want to add expressions with more
than one word to the list (like the "very long" in the example group "a"), escape the space, i.e.
precede it with a backslash \.

Some pre-defined example groups:

group a ( short normal long very\ long )
group b ( sword spear )
group c ( destruction doom death )
group d ( fiery frozen )

group e ( ka ne wi do ru so mos )

[Note:
When you define multiple groups with the same ID, the one defined last will be used. This is
just a side-effect of how the script works, not actual design, though.]

Pattern definitions are somewhat more complex, but still not really hard.

	pattern a:b

Each alphanumeric character in the pattern (in this case "a" and "b") is substituted with a random
word from the corresponding group. This pattern may, for example, generate "Short Spear" using the
predefined groups. The colon ":" is substituted with a single space.

	pattern ab

[Note:
This may generate the highly nonsensical "Very Longspear". But fantasy item names tend to be
silly anyways, so who cares.]

This pattern would therefore generate things like "Shortspear", without the space.
In addition, you can also add constant words to patterns, string literals.
They look like this: /word/.

	pattern b:/of/:c

This could generate "Spear of Doom". Unlike after colons, words after string literals are NOT
capitalized. Therefore
	pattern b:/of /c
would generate "Spear of doom", not "Spear of Doom".

Forks are the most complex (but still not hard to comprehend) feature in patterns.

	pattern (a|d):b

(a|d) is a fork. It contains two expressions, "a" and "d". Pipes | separate the expressions. A
fork may contain any number of expressions, but at least one (in which case it is pointless).
When a fork is encountered, a random expression is chosen, with an equal chance for each to be
picked. The example may therefore evaluate to either

	pattern a:b

or

	pattern d:b

Expressions in forks can be of any complexity, they may even contain other forks (which in turn
may contain other forks, and so on).

There is another symbol, specifically to be used with forks: the dash -. It means exactly nothing.
Its purpose is to allow empty expressions in forks, as shown in the example.

	pattern b:(/of/:c|-)

[Note:
When multiple patterns are defined in a file, a random one is used.]

The generated strings are capitalized using very simple rules:
1. The first word and words after colons (:, i.e. spaces) are capitalized.
2. All others are not.

You can modify the capitalization with two special operators:
~ Don't capitalize the next word regardless of rules.
^ Capitalize the next word regardless of rules.
They have no influence on string literals, which are copied as-is.

Some pre-defined patterns:

[Note:
All the above patterns and the first example group do not count as defined, because they have
tabs in front of them. Also, there may be no line breaks within groups or patterns.]

This pattern generates a name using the syllables in group e, with a length of 2-4 syllables.
pattern ee(-|e|ee)

A simple item generating pattern with variable length.
pattern (a|d):b(:/of/:c|-)

A rather complex pattern generating named weapons with an optional "of xxx" extension. It shows
off pretty much all functionality of the script.
pattern /"/^(dc|ee(-|e|ee))/", the/:cb:(/of/:(~d:|-)c|-)

[Final Note:
The generated results will be very monotonous because of the limited vocabulary. It only serves
as an example. Try to extend it on your own in order to draw any actual use from it. Also, as
you probably noticed, it's safe to write comments in input files. They somewhat slow down the
script if there are a lot of lines, as in a few millions. You could also write them without the
tab, but it's safer to use tabs, because then, if you accidentally start a line with group or
pattern, nothing bad happens.]

Pattern Operator Cheat Sheet:
: space
/ start or end string literal
( start a fork
| start new expression in fork
) end fork
- do nothing (used in forks)
^ capitalize next word
~ don't capitalize next word

-- BY NARVIUS


(yes, all that text at the end is part of the script. The file was originally called namegen.rb, hence the "namegen" given as command in -h. 80 lines of code, including comments + 130 lines of tutorial text)

Using my extremely overloaded input file...

Myohyohiri, the Dusty Duchess of Electrocution
Red Trumpet of the Short Avatar
Piercing Plush Backpack
Hetahuyi, the Smart Douchebag of Ice
Retarded Coding Stone Pan of Evolution
Konuirya, the Turquoise Robot of Fatality
"Meltingstones", the Pen of Wind
Dancing Statue of Hats
"Swearingwater", the Healthrifle of the Blinking Housewive's Piercing Ice
Wahireryoeshu, the Short Bandit of Investigation
Misty Microphone of Cussing Rust
Stone Mittens of Revenge
Snoring Statue of Failing Lightning
Kyuhekyosi, the Powerful Fag of Destruction
Sowo, the Wiggly Sphinx of Earth


No, they make no sense.
#
E_net4 13 years ago
Yes, this thing is awesome!
I hope you don't wish us to recreate the effect in other languages (which is what we've been doing)... without explaining the algorithm to us.
#
Narvius 13 years ago
It's explained in-depth in the help file.
Nah, I actually just wanted to share it. But it'd probably be fun to see something more complicated translated. :>
I imagine the parsing is just as easy to do in Python, but I'm afraid it will be a pain in strictly-typed OO languages.
I might attempt a Racket (former PLT Scheme) implementation at some point in the future, I guess.

Also, there're a few things to iron out... for example, in multi-word datums (like "very long" from the example) the second and beyond words are copied as-is, not capitalized automatically, like the rest.
#
MageKing17 13 years ago
"Narvius" said:
It's explained in-depth in the help file.
Nah, I actually just wanted to share it. But it'd probably be fun to see something more complicated translated. :>
I imagine the parsing is just as easy to do in Python, but I'm afraid it will be a pain in strictly-typed OO languages.
I might attempt a Racket (former PLT Scheme) implementation at some point in the future, I guess.
Starting Python version... now.

"Narvius" said:
Also, there're a few things to iron out... for example, in multi-word datums (like "very long" from the example) the second and beyond words are copied as-is, not capitalized automatically, like the rest.
Do you want me to do that in the Python version? It wouldn't be hard (Python has a function "string.title()" that capitalizes every word).

EDIT: Nevermind, already done (also, either your original code capitalized each word by hand, or I somehow added that capability while porting it (and then replaced it with a .title() call)).
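
For reference, a small sketch of the capitalization options being discussed here. `capitalize_words` is a made-up helper name (it is not in either script); it mimics the split-and-capitalize approach. One thing to keep in mind is that Python's `str.title()` treats an apostrophe as a word boundary, so it can behave differently on entries that contain one.

```python
def capitalize_words(entry: str) -> str:
    """Capitalize each space-separated word (hypothetical helper, not from the scripts)."""
    return " ".join(word.capitalize() for word in entry.split(" "))

print("very long".capitalize())         # Very long   (only the first letter of the entry)
print("very long".title())              # Very Long   (every word)
print(capitalize_words("very long"))    # Very Long   (every word, split on spaces only)
print("housewive's".title())            # Housewive'S (title() restarts after the apostrophe)
print(capitalize_words("housewive's"))  # Housewive's
```

With the example groups from the help text both approaches give the same result; they only differ once a word contains an apostrophe or similar punctuation.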

import re
import random
import sys

def string_to_list(string):
    result = []
    current = ""
    escaped = False
    string = string[string.index("(")+1:string.index(")")] # If they stick stuff outside the parentheses... too bad!
    for char in string:
        if char == "\\": escaped = True
        elif char == " " and not escaped:
            if current:
                result.append(current)
                current = ""
        else:
            current += char
            escaped = False
    if current: result.append(current)
    return result

class StringGenerator(object):
    def __init__(self, filename):
        if filename:
            with open(filename, "r") as f:
                file = f.readlines()
        else:
            file = DATA.split("\n")
        self.patterns, self.groups, self.capitalize = [], {}, True
        for line in file:
            if line.startswith("group "): # No need to use regexps to see what a line starts with.
                match = re.search(r" \w ", line) # re.search() is the one that finds stuff in the middle of the line.
                # Also, the reason we don't use parentheses (to create a group) is because group 0 is always "the matched text" regardless of grouping.
                if not match: continue # Account for lines that just say "group " and nothing else!
                self.groups[match.group(0).strip()] = string_to_list(line[match.end():].strip()) # No equivalent of %w in Python (that I know of).
            elif line.startswith("pattern "): self.patterns.append(line[8:].strip())

    def generate(self, pattern=None, final=True):
        if pattern == None: pattern = random.choice(self.patterns)
        result, iterator = "", 0
        while iterator < len(pattern):
            if pattern[iterator] == "-": pass
            elif pattern[iterator] == "~": self.capitalize = False
            elif pattern[iterator] == "^": self.capitalize = True
            elif pattern[iterator] == "/":
                iterator += 1
                while pattern[iterator] != "/":
                    result += pattern[iterator]
                    iterator += 1
                    self.capitalize = False
            elif pattern[iterator] == ":": result, self.capitalize = result + " ", True
            elif pattern[iterator] == "(":
                substring, iterator, depth = "", iterator + 1, (2 if pattern[iterator + 1] == '(' else 1)
                while depth > 0:
                    iterator, substring = iterator + 1, substring + pattern[iterator]
                    if pattern[iterator] == '(': depth += 1
                    if pattern[iterator] == ')': depth -= 1
                result += self.generate(self.random_expression(substring), False)
            else:
                if self.capitalize:
                    result, self.capitalize = result + random.choice(self.groups[pattern[iterator]]).title().strip(), False
                else:
                    result += random.choice(self.groups[pattern[iterator]]).lower()
            iterator += 1
        if final:
            self.capitalize = True
            result = result.strip()
        return result

    def random_expression(self, string):
        iterator, sub, depth, result = 0, "", 0, []
        while not iterator == len(string):
            sub += string[iterator]
            if string[iterator] == '(': depth += 1
            if string[iterator] == ')': depth -= 1
            iterator += 1
            if iterator < len(string) and string[iterator] == "|" and depth == 0:
                result.append(sub)
                sub = ""
                if not string[iterator] == "(": iterator += 1
        result.append(sub)
        return random.choice(result)

def main():
    if len(sys.argv) == 1: # Always includes the name of the script.
        print("\nnamegen.py [-i path] [-o path] [-p pattern] [-t number]\n\n")
        print("-i [path] \t Defines the path of the input file. For more information, use the -h switch. If none provided, the default (example) is used.")
        print("-o [path] \t Redirects the output from the console to the file with the given path.")
        print("-p [pattern] \t Use a specific pattern for generating. For information on pattern syntax, use the -h switch.")
        print("-t [number] \t Defines how many lines of output should be generated.\n\n")
        print("-h \t\t Prints a full explanation on how to use the generator.")
    elif "-h" in sys.argv:
        print(DATA)
    else:
        gen, i, p, result = None, 1, None, ""
        gen = StringGenerator(sys.argv[sys.argv.index("-i") + 1] if "-i" in sys.argv else None)
        if "-t" in sys.argv: i = int(sys.argv[sys.argv.index("-t") + 1])
        if "-p" in sys.argv: p = sys.argv[sys.argv.index("-p") + 1]
        for _ in range(i):
            result += gen.generate(p) + "\n"
        if "-o" in sys.argv:
            with open(sys.argv[sys.argv.index("-o") + 1], "w") as file:
                file.write(result)
        else:
            print(result)

DATA = '''
The generator works using patterns defined in an external file. Patterns describe how predefined
groups of words / syllables / letters are used to form the final result.

In each file, there should be both group and pattern definitions.
Scheme of a group definition:

	group i ( word1 word2 word3 a b c kek kor word4 qqqqq)

[Note:
From there on, the term "word" will be used for any chain of characters that forms one entry in
one of the defined groups. So "word1" is a word, "a" is a word, and so is "kek" or "qqqqq".]

Every group definition starts with the group keyword - on its own line, and without any preceding
whitespaces, tabs or whatever. Then comes the group identifier, a single, case-sensitive
alphanumeric character, and after that a list of words enclosed by parentheses. Each group
definition must occupy exactly one line.

Usually each word separated by a space is a new word, but if you want to add expressions with more
than one word to the list (like the "very long" in the example group "a"), escape the space, i.e.
precede it with a backslash \.

Some pre-defined example groups:

group a ( short normal long very\ long )
group b ( sword spear )
group c ( destruction doom death )
group d ( fiery frozen )

group e ( ka ne wi do ru so mos )

[Note:
When you define multiple groups with the same ID, the one defined last will be used. This is
just a side-effect of how the script works, not actual design, though.]

Pattern definitions are somewhat more complex, but still not really hard.

	pattern a:b

Each alphanumeric character in the pattern (in this case "a" and "b") is substituted with a random
word from the corresponding group. This pattern may, for example, generate "Short Spear" using the
predefined groups. The colon ":" is substituted with a single space.

	pattern ab

[Note:
This may generate the highly nonsensical "Very Longspear". But fantasy item names tend to be
silly anyways, so who cares.]

This pattern would therefore generate things like "Shortspear", without the space.
In addition, you can also add constant words to patterns, string literals.
They look like this: /word/.

	pattern b:/of/:c

This could generate "Spear of Doom". Unlike after colons, words after string literals are NOT
capitalized. Therefore
	pattern b:/of /c
would generate "Spear of doom", not "Spear of Doom".

Forks are the most complex (but still not hard to comprehend) feature in patterns.

	pattern (a|d):b

(a|d) is a fork. It contains two expressions, "a" and "d". Pipes | separate the expressions. A
fork may contain any number of expressions, but at least one (in which case it is pointless).
When a fork is encountered, a random expression is chosen, with an equal chance for each to be
picked. The example may therefore evaluate to either

	pattern a:b

or

	pattern d:b

Expressions in forks can be of any complexity, they may even contain other forks (which in turn
may contain other forks, and so on).

There is another symbol, specifically to be used with forks: the dash -. It means exactly nothing.
Its purpose is to allow empty expressions in forks, as shown in the example.

	pattern b:(/of/:c|-)

[Note:
When multiple patterns are defined in a file, a random one is used.]

The generated strings are capitalized using very simple rules:
1. The first word and words after colons (:, i.e. spaces) are capitalized.
2. All others are not.

You can modify the capitalization with two special operators:
~ Don't capitalize the next word regardless of rules.
^ Capitalize the next word regardless of rules.
They have no influence on string literals, which are copied as-is.

Some pre-defined patterns:

[Note:
All the above patterns and the first example group do not count as defined, because they have
tabs in front of them. Also, there may be no line breaks within groups or patterns.]

This pattern generates a name using the syllables in group e, with a length of 2-4 syllables.
pattern ee(-|e|ee)

A simple item generating pattern with variable length.
pattern (a|d):b(:/of/:c|-)

A rather complex pattern generating named weapons with an optional "of xxx" extension. It shows
off pretty much all functionality of the script.
pattern /"/^(dc|ee(-|e|ee))/", the/:cb:(/of/:(~d:|-)c|-)

[Final Note:
The generated results will be very monotonous because of the limited vocabulary. It only serves
as an example. Try to extend it on your own in order to draw any actual use from it. Also, as
you probably noticed, it's safe to write comments in input files. They somewhat slow down the
script if there are a lot of lines, as in a few millions. You could also write them without the
tab, but it's safer to use tabs, because then, if you accidentally start a line with group or
pattern, nothing bad happens.]

Pattern Operator Cheat Sheet:
: space
/ start or end string literal
( start a fork
| start new expression in fork
) end fork
- do nothing (used in forks)
^ capitalize next word
~ don't capitalize next word

-- BY NARVIUS
-- PORTED TO PYTHON BY MAGEKING17
'''

if __name__ == "__main__":
    main()


I would just like to add that inconsistent or bad code was ported as-is from Narvius's code (except stuff I wrote from scratch, of course, like string_to_list()).
#
Narvius 13 years ago
xD

Nice. I only glanced through the code, you're right, I excessively used Regexpes. Well, that usually happens when I learn something new (that's also probably part of why I started NarvBot some time back...)

There's a String#capitalize function in Ruby, which I use.

There's no bad and / or inconsistent code / style to be ported over!
(Okay... I *do* have an actually pretty strong tendency to compress my code as much as possible, potentially rendering it unreadable... but still!)

Note that the space escaping in groups is just a side effect of the Ruby special operator %w( ) which builds a string array (and is used here as a cheap, quick parser).
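
A possible way to get roughly the same "cheap parser" effect in Python is the standard `shlex` module, which also honours backslash-escaped spaces. This is only a sketch: `parse_group` is a made-up name, and unlike `%w`, `shlex.split` treats quote characters specially, so a word containing a lone `'` or `"` would raise an error.

```python
import shlex

def parse_group(line: str):
    """Return (group_id, words) for a 'group x ( ... )' line, or None if it is not one.

    Hypothetical helper; assumes the one-line group syntax from the help text.
    """
    if not line.startswith("group "):
        return None
    head, open_paren, rest = line.partition("(")
    body, close_paren, _ = rest.rpartition(")")
    parts = head.split()
    if len(parts) != 2 or not open_paren or not close_paren:
        return None
    # shlex.split splits on whitespace and keeps "very\ long" together as "very long".
    return parts[1], shlex.split(body)

print(parse_group(r"group a ( short normal long very\ long )"))
# ('a', ['short', 'normal', 'long', 'very long'])
```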
#
Amarth 13 years ago
I haven't looked in detail, but it should be silly easy to do in an OO language. It might be verbose and a lot of code, but not too hard. I might try a Scala implementation if I can work up the courage to start it and read through all the code above.
#
E_net4 13 years ago
I'm having a physics test tomorrow and a computer architecture test on the 14th.... Then I have to get ready for an exam on the 19th... Yeah, I'll work on a Java implementation after that date.

If I ever get to do it.
#
Narvius 13 years ago
Oh, by the way. For actual playaroundage:

group a ( burning freezing sparkling shining blooming exploding living glowing growing amazing laughing whining crying dying prancing dancing mutating crawling flying talking eating drowning reading singing knowing bubbling throwing shocking failing winning spinning disturbing riding cloning hacking programming coding riding shaking falling standing waiting slashing piercing existing googling ogling staring glaring melting hating loving screaming praying swearing cussing swimming diving cycling running sleeping snoring waking drinking blinking killing aching stunning stalking leaking losing looking loosing looping radiating terminating evolving raining spitting )
group b ( dead hot cold red blue yellow diminished removed mighty misty magical cyan magenta turquoise average powerful awesome unholy holy outrageous erroneous feeble broken big small giant tiny kind evil good bad tasty rotten delicious malicious dark light heavy bright elemental funny weird normal spiky wiggly fluorescent blind almighty ugly beautiful nice pretty legendary glorious radiant bloody wintry chilly creepy dusty magnetic static dynamic invisible invincible fast slow retarded smart unremarkable remarkable violent curly straight evolved cellular long short )
group c ( sword axe spear bow staff whip boomerang shuriken knuckleduster morningstar knife dagger katana wakizashi waraxe tanto wand pistol rifle gun bazooka uzi cannon book horn trumpet ocarina horseshoe syringe katar monitor shield clock arm spoon leg breath stove rod club bat chain nunchuk plate cookie cake toothbrush brush sponge anvil lamp pan toilet\ brush claws needle mallet hammer chair saw tooth backpack bottle handbag mittens boxing\ gloves candleholder saber pen ruler CD truck scissors statue candle keyboard lyre saxophone guitar microphone cable rope ship doughnut radiater bacon\ strip )
group d ( wooden iron steel silver mythril rubber liquid marble woollen plastic golden copper tin stone coal glass obsidian ether nether flesh energy hair magnesium calcium sugar candy chocolate skin leather paper silk cardboard plywood plexiglass hemp aluminium papyrus sand wax unobtainium bamboo peppermint noodle dough porcelain polyester nylon denim ore cotton crystal diamond ruby sapphire aquamarin quartz emerald aventurine amethyst lapis\ lazuli waffle caramel ice\ cream laser plush flower bone adamantium fur icing shell oil feather vinyl fibre fibre\ glass nuclear latex meteorite stardust dust teflon saliva titanium ashen cork cheese ham bacon )
group e ( doom life death destruction revenge inertia time fire ice water thunder wind air earth explosions fatality joy happiness fate flowers acid lightning plasma poison mana health speed velocity force resistance recursion repetition confusion certainty money power energy madness inception heaven hell clouds hats music bottles philosophy eyes noses mouths ears riches pleasure competition investigation faces sheathe google stones work procrastination gravity wisdom faith logic objections teleportations psychology nature circuits electricity electrocution pain reality radiation termination evolution reflection songs rust vacuum )
group f ( Prince Duke Princess Duchess King Queen Scholar Student Goddess God Grandfather Grandmother Father Mother Dog Cat Horse Hawk Tiger Lion Gorilla Wizard Hedgehog Knight Samurai Ninja Demon Angel Ginosaji Devil Mastermind Avatar Clown Hero Actor Idiot Wolf Writer Rider Hitchhiker Liar Witch Douchebag Retard Fag Noob Pro Troll Goblin Geek Nerd Hobbit Dwarf Elf Human Sphinx Vampire Werewolf Mafioso Corpse Zombie Mutant Homeless Rich Butler Maid Hyena Kangaroo Whippersnapper Criminal Policeman Guardian Universe Galaxy Star Planet System Brother Sister Bro Sis Cousin Nephew Niece Aunt Uncle Thinker Cowboy Cowbow Cowgirl Sheriff Bandit Pirate Killer Psycho Housewive Public\ Masturbator Serial\ Killer Guy Gal Groom Bride Golem Stalker Robot Winner Looser Terminator Titan Scientist )

group s ( a e i o u ka ke ki ko ku sa se si so su ta te ti to tu na ne ni no nu n ha he hi ho hu ma me mi mo mu ya ye yi yo yu ra re ri ro ru wa we wi wo wu kya kyu kyo sha shu sho cha chu cho nya nyu nyo hya hyu hyo mya myu myo rya ryu ryo )

pattern (/"/^(ae|be|ss(-|s|ss|sss))/", the/:|-)(b:|-)(a:|-)(d:|-)(c|f):(/of/:(/the/:(b:|-)(d:|-)(f|f/'s/:(a:|-)(e:|-)(c|e|f))|(a:|-)(d:|-)(b|-)e(:/and/:(a:|-)e|-))|-)


The result of two brainstorming sessions with my sister. :> Designed for silliness.
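
For anyone picking a pattern like this apart by hand, here is a minimal sketch of the step that splits the inside of a fork into its alternatives, counting only pipes that are not nested in an inner fork. `split_fork` is a made-up name; it follows the same depth-counting idea as `random_expression` but is not a copy of it, and like the scripts it does not treat pipes inside `/string literals/` specially.

```python
def split_fork(body: str) -> list:
    """Split the text between a fork's parentheses on top-level '|' characters."""
    alternatives, current, depth = [], "", 0
    for char in body:
        if char == "|" and depth == 0:
            # A pipe outside any inner fork ends the current alternative.
            alternatives.append(current)
            current = ""
            continue
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        current += char
    alternatives.append(current)
    return alternatives

print(split_fork("a|d"))              # ['a', 'd']
print(split_fork("/of/:(~d:|-)c|-"))  # ['/of/:(~d:|-)c', '-']
```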
#
Amarth 13 years ago
I think I got the Scala version about right. Your non-elegantly defined grammar is quite confusing (especially the whole capitalization part) but it seems to be mostly working. I'm getting silly results anyway.

import scala.util.parsing.combinator.JavaTokenParsers
import scala.util.Random

abstract class NameGenParsed

case class Group(val id: String, val nouns: List[String]) extends NameGenParsed

abstract case class Pattern extends NameGenParsed{
def generate(groups: List[Group]): String
}

case class StringPattern(val s: String) extends Pattern {
def generate(groups: List[Group]): String = s
}

case class PatternList(val pats: List[Pattern]) extends Pattern {
def generate(groups: List[Group]): String = {
val results = for(pat <- pats) yield pat.generate(groups)
if(results.isEmpty)
""
else
results.reduceLeft(_+_)
}
}

case class GroupID(val id: String) extends Pattern {
override def generate(groups: List[Group]): String = {
for (group <- groups)
group match {
case Group(mid,nouns) if mid == id => {
val r = new Random()
return nouns((r.nextFloat * nouns.length).toInt)
}
case _ => ()
}
return "ERROR: group " + id + " not found!"
}
}

case class OneOf(val choices: List[Pattern]) extends Pattern {
override def generate(groups: List[Group]): String = {
val r = new Random()
choices((r.nextFloat * choices.length).toInt).generate(groups)
}
}

case class Capitalized(val other: Pattern) extends Pattern {
override def generate(groups: List[Group]): String = {
val s = other.generate(groups)
if(s.length > 0)
s(0).toUpperCase + s.substring(1, s.length)
else
""
}
}

case class LowerCase(val other: Pattern) extends Pattern {
override def generate(groups: List[Group]): String = {
val s = other.generate(groups)
s(0).toLowerCase + s.substring(1, s.length)
}
}


object NameGen extends JavaTokenParsers {
override def skipWhitespace = false
def generator = rep(group | pattern)
def group = groupname ~ grouplist <~ "\\n*".r ^^ {case id~nouns => Group(id,nouns)}
def groupname = "group " ~> groupID
def grouplist = " ( " ~> repsep(noun," ") <~ " )"
def groupID = "[a-z]".r
def noun = "[a-zA-Z0-9]+(\\\\ [a-zA-Z0-9]*)*".r
def pattern = "pattern " ~> patternexpr <~ "\\n*".r
def patternexpr : Parser[Pattern] = replacepat ~ rep(cap | noncap) ^^
{case first~rest => PatternList(List(Capitalized(first)) ++ rest)}
def replacepat = groupIDpatt | fork | stringlit | implicitcap
def groupIDpatt = groupID ^^ {id => GroupID(id)}
def stringlit = "/" ~> text <~ "/" ^^ {t => StringPattern(t)}
def text = "[^/]+".r
def fork = "(" ~> repsep(forkpattern,"|") <~ ")" ^^ {l => OneOf(l)}
def forkpattern = nopexpr | extendedpatternexpr
def extendedpatternexpr = implicitcap | patternexpr
def nopexpr = "-" ^^ {_ => StringPattern("")}
def cap = explicitcap | implicitcap
def explicitcap = "^" ~> replacepat ^^ {a => Capitalized(a)}
def implicitcap : Parser[Pattern] = ":" ~> rep(replacepat) ^^
{l => PatternList(List(StringPattern(" "), Capitalized(PatternList(l))))}
def noncap = explicitnoncap | replacepat
def explicitnoncap = "~" ~> replacepat

def main(args : Array[String]) {
val myin = "pattern (a|d):b(:/of/:c|-)"
parseAll(generator, input) match {
case Success(r,_) => println(doGenerate(r))
case x => println("Fail! " + x.toString)
}
}

def doGenerate(data: List[NameGenParsed]): String = {
val groups = data filter {case Group(_,_) => true; case _ => false}
val patterns = data -- groups
val r = new Random()
val pattern = patterns((r.nextFloat * patterns.length).toInt)
pattern.asInstanceOf[Pattern].generate(groups.asInstanceOf[List[Group]])
}

val input = """group a ( burning freezing sparkling shining blooming exploding living glowing growing amazing laughing whining crying dying prancing dancing mutating crawling flying talking eating drowning reading singing knowing bubbling throwing shocking failing winning spinning disturbing riding cloning hacking programming coding riding shaking falling standing waiting slashing piercing existing googling ogling staring glaring melting hating loving screaming praying swearing cussing swimming diving cycling running sleeping snoring waking drinking blinking killing aching stunning stalking leaking losing looking loosing looping radiating terminating evolving raining spitting )
group b ( dead hot cold red blue yellow diminished removed mighty misty magical cyan magenta turquoise average powerful awesome unholy holy outrageous erroneous feeble broken big small giant tiny kind evil good bad tasty rotten delicious malicious dark light heavy bright elemental funny weird normal spiky wiggly fluorescent blind almighty ugly beautiful nice pretty legendary glorious radiant bloody wintry chilly creepy dusty magnetic static dynamic invisible invincible fast slow retarded smart unremarkable remarkable violent curly straight evolved cellular long short )
group c ( sword axe spear bow staff whip boomerang shuriken knuckleduster morningstar knife dagger katana wakizashi waraxe tanto wand pistol rifle gun bazooka uzi cannon book horn trumpet ocarina horseshoe syringe katar monitor shield clock arm spoon leg breath stove rod club bat chain nunchuk plate cookie cake toothbrush brush sponge anvil lamp pan toilet\ brush claws needle mallet hammer chair saw tooth backpack bottle handbag mittens boxing\ gloves candleholder saber pen ruler CD truck scissors statue candle keyboard lyre saxophone guitar microphone cable rope ship doughnut radiater bacon\ strip )
group d ( wooden iron steel silver mythril rubber liquid marble woollen plastic golden copper tin stone coal glass obsidian ether nether flesh energy hair magnesium calcium sugar candy chocolate skin leather paper silk cardboard plywood plexiglass hemp aluminium papyrus sand wax unobtainium bamboo peppermint noodle dough porcelain polyester nylon denim ore cotton crystal diamond ruby sapphire aquamarin quartz emerald aventurine amethyst lapis\ lazuli waffle caramel ice\ cream laser plush flower bone adamantium fur icing shell oil feather vinyl fibre fibre\ glass nuclear latex meteorite stardust dust teflon saliva titanium ashen cork cheese ham bacon )
group e ( doom life death destruction revenge inertia time fire ice water thunder wind air earth explosions fatality joy happiness fate flowers acid lightning plasma poison mana health speed velocity force resistance recursion repetition confusion certainty money power energy madness inception heaven hell clouds hats music bottles philosophy eyes noses mouths ears riches pleasure competition investigation faces sheathe google stones work procrastination gravity wisdom faith logic objections teleportations psychology nature circuits electricity electrocution pain reality radiation termination evolution reflection songs rust vacuum )
group f ( Prince Duke Princess Duchess King Queen Scholar Student Goddess God Grandfather Grandmother Father Mother Dog Cat Horse Hawk Tiger Lion Gorilla Wizard Hedgehog Knight Samurai Ninja Demon Angel Ginosaji Devil Mastermind Avatar Clown Hero Actor Idiot Wolf Writer Rider Hitchhiker Liar Witch Douchebag Retard Fag Noob Pro Troll Goblin Geek Nerd Hobbit Dwarf Elf Human Sphinx Vampire Werewolf Mafioso Corpse Zombie Mutant Homeless Rich Butler Maid Hyena Kangaroo Whippersnapper Criminal Policeman Guardian Universe Galaxy Star Planet System Brother Sister Bro Sis Cousin Nephew Niece Aunt Uncle Thinker Cowboy Cowbow Cowgirl Sheriff Bandit Pirate Killer Psycho Housewive Public\ Masturbator Serial\ Killer Guy Gal Groom Bride Golem Stalker Robot Winner Looser Terminator Titan Scientist )
group s ( a e i o u ka ke ki ko ku sa se si so su ta te ti to tu na ne ni no nu n ha he hi ho hu ma me mi mo mu ya ye yi yo yu ra re ri ro ru wa we wi wo wu kya kyu kyo sha shu sho cha chu cho nya nyu nyo hya hyu hyo mya myu myo rya ryu ryo )
pattern (/"/^(ae|be|ss(-|s|ss|sss))/", the/:|-)(b:|-)(a:|-)(d:|-)(c|f):(/of/:(/the/:(b:|-)(d:|-)(f|f/'s/:(a:|-)(e:|-)(c|e|f))|(a:|-)(d:|-)(b|-)e(:/and/:(a:|-)e|-))|-)"""
}

This is quite a different approach to the whole idea. The main point here is that I defined a formal grammar to parse the data into an Interpreter pattern-style object hierarchy, then execute the 'generate' function on that hierarchy. This makes it theoretically easier to change the grammar, and to detect (and protect against) malformed input. Of course, a grammar more suited to this coding approach would yield a simpler implementation.
I want to draw attention to a few points. Everything preceded by "def" is a function, including all the patterns in the NameGen object. All the ~> || ^^ operators are higher-order functions.
EVERYTHING is typed and checked at compile-time. For example, the function explicitnoncap (the last of the patterns) is a function with no arguments, returning a NameGen.Parser[Pattern]. This is not explicitly coded; the compiler infers it by itself. Yes, this made it slightly harder to write, but yes, this saved me from some embarrassing mistakes. Mostly, it forces you to think about what you are writing, which is not a bad thing.
I did not implement the whole argument-reading stuff. It's trivial to add: a simple match expression in the main function.
I probably should also have factored the whole select-a-random-element-from-a-list thing out into a mix-in. I copy-pasted the code three times here. Okay, it's only two lines of code, but still.
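
For readers who don't know Scala, here is a rough Python sketch of the same idea: the pattern becomes a small Interpreter-style tree of objects, and 'generate' is called on the root. It is not a port of the Scala code above. The class names (Literal, GroupRef, Sequence, Choice) and the pick helper are made up for illustration, and pick is the single shared "random element from a list" function the last paragraph wishes for.

```python
import random


def pick(items, rng=random):
    """Shared helper (hypothetical name): return one random element of a sequence."""
    return items[rng.randrange(len(items))]


class Literal:
    """Fixed text, like /of/ in the pattern syntax."""
    def __init__(self, text):
        self.text = text

    def generate(self, groups, rng=random):
        return self.text


class GroupRef:
    """Reference to a named word group, like 'b' or 'c'."""
    def __init__(self, name):
        self.name = name

    def generate(self, groups, rng=random):
        return pick(groups[self.name], rng)


class Sequence:
    """Several sub-patterns generated one after another."""
    def __init__(self, *parts):
        self.parts = parts

    def generate(self, groups, rng=random):
        return "".join(p.generate(groups, rng) for p in self.parts)


class Choice:
    """A fork like (x|y): exactly one alternative is generated."""
    def __init__(self, *options):
        self.options = options

    def generate(self, groups, rng=random):
        return pick(self.options, rng).generate(groups, rng)


# Tiny example data, assumed for the sketch only.
groups = {"b": ["cold"], "c": ["sword"]}
tree = Sequence(GroupRef("b"), Literal(" "), GroupRef("c"),
                Choice(Literal(""), Literal(" of doom")))
print(tree.generate(groups))
```

A parser would normally build the tree from the pattern string. Here it is built by hand to keep the sketch short.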

I'm starting to like Scala. It's not quite my dream language, but it's got power I previously only attributed to Python (in the sense of adding succinct functional-programming style abilities), and combines it with the awesomeness of static type checking. A shame the Eclipse plugin is so goddamn slow or even wrong at times. And that the whole thing runs on the JVM. I'm not really a fan of the JVM. It's a nice idea, maybe, but it's outdated technology in the year 2011.
#
Narvius 13 years ago
At last a substantially different implementation.

But, just a note. Many languages already have powerful functional-programming style abilities.
#
MageKing17 13 years ago
"Narvius" said:
There's no bad and / or inconsistent code / style to be ported over.
(Okay... I *do* have an actually pretty strong tendency to compress my code as much as possible, potentially rendering it unreadable... but still!)
Actually, I found the whole thing quite readable. The only thing that screwed me up was the "%w" reference since Python has no equivalent (as far as I am aware), hence the string_to_list() function. The inconsistent thing I was thinking of in particular was the fact that in one case you say "while iterator < len(string)" and in another you say "while not iterator == len(string)" (except in ruby-speak... "until iterator == string.length" or some such). Both are equivalent, but the former accounts for data corruption where the iterator somehow magically moves too far (which can actually happen without corruption given that certain if-branches move the iterator forward farther).
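
To illustrate both points, here is a minimal sketch. The body of string_to_list below is a guess at what such a helper could look like, not the function from the actual Python port. It assumes the input is the "( word word\ with\ space ... )" part of a group line, with backslash-escaped spaces as in Ruby's %w. count_steps is a hypothetical toy function that only demonstrates the loop bound.

```python
import re


def string_to_list(text):
    """Split a Ruby-style %w( ... ) word list into a Python list (sketch)."""
    body = text.strip()
    # Drop the surrounding parentheses if they are present.
    if body.startswith("(") and body.endswith(")"):
        body = body[1:-1]
    # Split on whitespace that is not preceded by a backslash.
    words = re.split(r"(?<!\\)\s+", body.strip())
    # Turn escaped spaces back into plain spaces and skip empty entries.
    return [w.replace("\\ ", " ") for w in words if w]


def count_steps(text, step):
    """Walk an index over text in jumps of `step`, using the safer `<` bound."""
    i = steps = 0
    # With `while i != len(text)` this loop would never end whenever a jump
    # skips past len(text); `<` stops in both cases.
    while i < len(text):
        i += step
        steps += 1
    return steps


print(string_to_list("( pan toilet\\ brush claws )"))
print(count_steps("abcde", 2))
```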

Personally, I wouldn't use this program myself simply because it's a less-capable BNF string generator (or EBNF or something, more usefully). I'd rather just implement a proper grammar-based generator.

In fact, I'm going to write one right now. Excuse me.
#
Amarth 13 years ago
"Narvius" said:
But, just a note. Many languages already have powerful functional-programming style abilities.
When I said 'power I only attributed to', I meant 'from the languages I know (except Haskell)'. I know Ruby also has this, but there's no big advantage in using Ruby over Python for me, they're both dynamically typed. Duck typing versus monkey-patching is not that big a difference. And Ruby is Japanese, ick.

I should probably take a look at D, though.
#
Narvius 13 years ago
In fact, I didn't mean Ruby (as this is obvious). Even C# has functions as first-class objects and higher-order functions already! The most awesome one is still Scheme, though.

Anyways, I see this as a challenge to write something that might resemble a proper grammar-based generator! It might take a few decades, though. And I will probably write it in Scheme.
#
MageKing17 13 years ago
"Narvius" said:
Anyways, I see this as a challenge to write something that might resemble a proper grammar-based generator! It might take a few decades, though. And I will probably write it in Scheme.
Why on Earth would it take a few decades? I wrote a BNF generator in Java as a weekly assignment. It took me about a half hour. As soon as I start writing a generator in Python, I don't expect it to take more than an hour, and only because I plan to make it more complicated (not sure if I'm going to do EBNF or ABNF; EBNF looks simpler to implement, so probably EBNF. On the other hand, ABNF looks more flexible, and I'm a huge fan of flexibility).
#
Narvius 13 years ago
Because I have no idea how to tackle it. If that's not a problem I might poke you on IRC one day for some pointers, ke?
#
MageKing17 13 years ago
"Narvius" said:
Because I have no idea how to tackle it. If that's not a problem I might poke you on IRC one day for some pointers, ke?
You just do what your random name generator does, except you remove the difference between groups and patterns, and instead of picking a random pattern to generate, you ask the user which group to generate from (or pick the first one, probably as a default). That's a BNF parser right there, just with slightly different syntax.
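
A minimal Python sketch of that description, under some assumptions: every rule is a list of alternatives, each alternative is a list of symbols, and any symbol without a rule is emitted as literal text. The rule names, the GRAMMAR data, and the functions generate and generate_name are all invented for illustration. This is not the Java assignment or the planned Python generator.

```python
import random

# Groups and patterns are the same thing here: a rule name mapped to a list
# of alternatives, each alternative being a list of symbols.
GRAMMAR = {
    "name": [["adjective", "item"], ["adjective", "item", "of", "concept"]],
    "adjective": [["cold"], ["mighty"]],
    "item": [["sword"], ["toilet brush"]],
    "concept": [["doom"], ["recursion"]],
}


def generate(grammar, symbol, rng=random):
    """Expand `symbol`; symbols with no rule are returned as literal text."""
    if symbol not in grammar:
        return symbol
    production = rng.choice(grammar[symbol])
    parts = (generate(grammar, s, rng) for s in production)
    # Skip empty expansions so optional parts do not leave double spaces.
    return " ".join(p for p in parts if p)


def generate_name(grammar, start=None, rng=random):
    """Generate from `start`, defaulting to the first rule in the grammar."""
    if start is None:
        start = next(iter(grammar))
    return generate(grammar, start, rng)


print(generate_name(GRAMMAR))
print(generate_name(GRAMMAR, start="item"))
```

Reading the rules from a BNF or EBNF text file instead of a dict would be the remaining work.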
#
Vacuus 13 years ago
"Amarth" said:
"Narvius" said:
But, just a note. Many languages already have powerful functional-programming style abilities.
When I said 'power I only attributed to', I meant 'from the languages I know (except Haskell)'. I know Ruby also has this, but there's no big advantage in using Ruby over Python for me, they're both dynamically typed. Duck typing versus monkey-patching is not that big a difference. And Ruby is Japanese, ick.

I should probably take a look at D, though.
Personally D resembled an immature, restricted C++ too much for me to put any sort of effort into it.

Don't get me wrong, it's a nice language and if the API was a little more expansive I'd be there in an instant but as it stands now, the average compilers, poor DLL support, lackluster API and generally below average third-party APIs are enough to turn me off, at least for a few years.

I'll work on getting a C++ implementation up n' going at some point (new job taking up all my time yaay), though it'll probably be different as I can't really be arsed reading the ruby/python code.
#