• Breaking News

    Thursday, June 14, 2018

    GamePad: "the application of machine learning methods to theorem proving in the Coq proof assistant." [abstract + link to PDF] Computer Science

    GamePad: "the application of machine learning methods to theorem proving in the Coq proof assistant." [abstract + link to PDF] Computer Science


    GamePad: "the application of machine learning methods to theorem proving in the Coq proof assistant." [abstract + link to PDF]

    Posted: 13 Jun 2018 09:03 PM PDT

    Attention Machines - Is this a branch of complexity theory?

    Posted: 13 Jun 2018 10:36 AM PDT

    This started with me telling my wife, "I need my full attention on the road here," and my internal contrarian asking how true that really was.

    I started asking myself how you'd go about measuring the "fraction of attention" needed to model an object in real time, and came up with a little theoretical space that feels quite rich. It seems to me like this space must be some part of complexity theory that I've never heard of, because it seems both interesting and applicable.

    The idea here is that we could have a little branch of computational complexity theory that deals with models that reflect a changing reality. I feel like this has to exist, but I haven't been in academia for years.

    Typical complexity theory considers machines that inspect strings and make determinations: is this string in that language or not? Suppose we model "reality" using a string over some alphabet, with each consecutive character being the state of "reality" at a point in time. Then we can consider state machines as "things that observe a reality that changes its state over time."

    The key difference between this idea and typical complexity theory is that, unlike a state machine or a Turing machine, these models can't see all of reality, only part of it. I call these models "attention machines".

    Finite state automata are special cases of attention machines that can see all of reality. Here are the definitions:

    An "n-dimensional reality over alphabet sigma" is a function from the natural numbers to length-n strings over sigma.

    The idea here is that we model reality as "some state that changes over time, driven by an arbitrary function". The natural numbers represent time; at t(0) there's some string, a different string at t(1), a different one at t(2).

    We can then talk about "attention machines", which react to a reality function by modifying their internal state. A regular expression becomes a simple case here: a system that reacts to a 1-dimensional reality function.

    Instead of laying the string out in "space" and then running the machine over it "in space", the attention machine approach is to think of the string as being laid out in time. Each string over an alphabet corresponds to a single 1-dimensional reality function in that alphabet.

    For example, we can convert the string "cats" into the reality function

    0->c

    1->a

    2->t

    3->s
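
    Here's a rough Python sketch of that conversion (the function name is mine, and I'm assuming the reality function repeats the last character forever once the string runs out, since the definition above leaves that open):

        def reality_from_string(s):
            """Turn a string into a 1-dimensional reality function."""
            def reality(t):
                # clamp t so the function is defined for all natural numbers
                return s[min(t, len(s) - 1)]
            return reality

        cats = reality_from_string("cats")
        print([cats(t) for t in range(4)])  # ['c', 'a', 't', 's']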

    We could talk about machines that react to changes in 'reality', over time, and thus compute some aspect of 'reality.' The word "reality" here might be problematic, but I don't know what else to call "The thing which is being observed by the attention machine".

    This case isn't super interesting. It's just regular expressions. We don't have a notion of 'time' when dealing with regular expressions, except to maybe say that they are worst case O(n). The attention machine idea comes out of this because:

    • the state machine can "see" all of reality, but most computer systems cannot see all of the system they are observing

    • the state machine can update itself just as fast as reality changes, whereas most computer systems have limits on how rapidly they can modify their internal states.

    What happens if we have some system with a finite number of internal states that's trying to model some portion of a larger system which changes

    • faster than the smaller system can keep up with, and

    • in ways that the smaller system can't see?

    Can we talk about limits or bounds on how accurate the Attention Machine's model can be?

    Let's say we have a two-dimensional reality function over [0-9], which means at each time t, the state of reality is given by a single two-digit number.

    Let's say as well that we have a state machine, but the state machine reads, as input, only one of the characters of the state of reality. This is where we diverge from everything I remember about complexity theory.

    Reality is given as a changing two-character string, and the attention machine (our 'computer') can observe only one character per time frame. The clock ticks, reality changes according to the reality function, and the state machine updates again. We'll ignore the idea that the machine might update more slowly than reality, because the first restriction alone (not seeing all of reality) introduces a bunch of neat stuff.

    Let's say the states of the state machine encode *which* bit of reality the machine will read next. We specify the finite state machine as usual, with one extra piece of data per state: the dimension of the reality function that is used as the next input to the state machine. In the case of traditional FSMs, this last piece of data is always '0', as a traditional FSM looks only at 1-dimensional reality functions (i.e., it considers exactly one character of a string at a time).
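
    Here's a rough Python sketch of this definition (all the names are mine, and this is just one way to encode it):

        class AttentionMachine:
            """A finite state machine whose states carry one extra piece of
            data: which dimension of the reality function to read next."""

            def __init__(self, transitions, attend, start):
                self.transitions = transitions  # (state, observed char) -> next state
                self.attend = attend            # state -> dimension to read next
                self.state = start

            def step(self, reality_state):
                """Observe one character of the current reality state, then transition."""
                observed = reality_state[self.attend[self.state]]
                self.state = self.transitions[(self.state, observed)]
                return self.state

    A traditional FSM is then the special case where attend[s] == 0 for every state s, run against a 1-dimensional reality function.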

    Can we prove anything about the limits of this kind of machine?

    "Recognizing a language" with a finite state machine means you get to see the whole shebang before making a decision. In attention machine terms, we can call recognizing a language "recognizing a reality function" . What if you get to see exactly <em>half</em> of the input before making a decision? You get to choose which bits you see, and you know that other bits are passing by, which you'll never see. What kind of limits does that impose on you?

    In our case, if the alphabet is 0-9 and we are dealing with two dimensions, I do not think we could build a state machine in this environment capable of recognizing sequences of incrementing integers. For example, the attention machine might be looking at the ones digit when the tens digit decrements. The attention machine can't distinguish between that situation and one where it's looking at the ones digit and the tens digit increments.
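
    To make that concrete, here's a two-tick example (the numbers are mine, chosen to trigger the ambiguity):

        # In one reality, 29 increments to 30; in the other, 29 drops to 10.
        # A machine whose attention is fixed on the ones digit sees the
        # identical observation stream in both cases.
        reality_up   = ["29", "30"]  # an increasing sequence
        reality_down = ["29", "10"]  # a decreasing sequence

        obs_up   = [state[1] for state in reality_up]   # ones digit only
        obs_down = [state[1] for state in reality_down]

        assert obs_up == obs_down  # both are ['9', '0']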

    So a two-dimensional attention machine can't possibly determine the 2-dimensional reality function "sequences of increasing integers".

    What CAN it determine?

    • Sequences that contain no odd numbers
    • Sequences that contain no multiples of 10
    • Sequences where all values are less than 30 (but it cannot determine "sequences where all values are less than 42", since "less than 30" depends only on the tens digit, while checking a number with tens digit 4 against 42 requires seeing both digits at once)
    • Sequences where even values greater than 10 are always followed by an odd number

    These seem contrived; I'm not sure there's any utility in saying these things specifically. There does seem to be some structure in here worth exploring, though.

    How much lookback do they have? Adding more states does seem to buy you more lookback history here. For example, if I want to recognize a reality function "All multiples of 5 are followed by at least 30 numbers ending in the digit 3", I could do this with ~30 states.
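
    Here's a sketch of that construction (simplified to operate on the stream of ones digits, since both "is a multiple of 5" and "ends in 3" are properties of the ones digit alone, so the machine can keep its attention fixed there):

        def recognize(ones_digits):
            owed = 0  # countdown: how many numbers ending in 3 are still owed
            for d in ones_digits:
                if owed > 0:
                    if d != "3":
                        return False  # a multiple of 5 wasn't followed by enough 3s
                    owed -= 1
                if d in ("0", "5"):   # current number is a multiple of 5
                    owed = 30
            return True

    The internal state is just the countdown, so roughly 30 states suffice.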

    What about "all multiples of 5 are followed by at least 30 numbers divisible by 3?"

    I don't think a 2-dimensional attention machine can solve this problem, because tracking the numbers that are divisible by both five and three would require much more internal state than the attention machine has access to. The word 'context' feels natural to use here: the attention machine can't store enough context internally to differentiate between "I saw a five, 29 3s, and then 40", "I saw a five, 29 3s, and then 30", and "I saw a five, 30 3s, and then 30".

    The key idea is that regular old FSMs fall into this naturally as "attention machines operating on 1-dimensional reality functions".

    We could get even crazier with this and ask "what if there are 5 dimensions to the reality function and the attention machine can read two dimensions at a time?"

    In addition to asking "which reality functions can be recognized?", we could also ask questions about modelling a reality function. Say we have a 5-dimensional reality function over the alphabet of 3-digit integers. We might be representing 5 stocks whose values are changing over time. Suppose we want our computer to compute the average value of these five stocks - if the stocks change faster than the computer can update its state, can we place guarantees on the accuracy of the computed average?
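
    Here's one naive strategy, sketched in Python (the setup and names are mine): round-robin the attention across the five stocks, cache the last value seen for each, and average the cache. Each cached value is then at most 4 ticks stale, so if no stock moves more than some delta per tick, the computed average is within 4 * delta of the true one.

        def track_average(reality, ticks, n_dims=5):
            """reality(t) returns a tuple of n_dims prices; we may read one per tick."""
            cache = [None] * n_dims
            estimates = []
            for t in range(ticks):
                dim = t % n_dims               # round-robin attention policy
                cache[dim] = reality(t)[dim]   # the one observation this tick allows
                seen = [v for v in cache if v is not None]
                estimates.append(sum(seen) / len(seen))
            return estimates

        # five stocks, one drifting up, one drifting down:
        prices = lambda t: (100 + t, 200, 300 - t, 150, 175)
        print(track_average(prices, 10)[-1])  # 184.6; the true average at t=9 is 185.0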

    We can add restrictions on how fast the machine can update its internal state by having the reality function change faster than the machine can update.

    We could also talk about restricting the reality function itself: if it's a random function, then the only guarantee an attention machine could provide is that the most recently measured character is accurately reflected. What happens when you add guarantees about the reality function - say, that lower dimensions change very slowly, and higher dimensions change more rapidly? These restrictions reflect reality (plate tectonics move slowly, but the local electromagnetic field changes very fast), and they introduce interesting tradeoffs in choices and techniques. Perhaps you ignore the rapidly changing higher dimensions (because any accuracy is short-lived), spend a little attention on the slowest-changing dimensions (you don't need to visit often, because these things rarely change), and keep most of your focus on the middle dimensions, which change rapidly but are still trackable.
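
    One way to make that tradeoff concrete is a weighted attention schedule; the weights below are entirely made up:

        import itertools

        def make_schedule(weights):
            """Cycle through dimension indices, each appearing `weight` times per cycle."""
            pool = [dim for dim, w in enumerate(weights) for _ in range(w)]
            return itertools.cycle(pool)

        # dims 0-1 change slowly, dims 2-3 at a trackable rate, dim 4 too fast to bother
        schedule = make_schedule([1, 1, 4, 4, 0])
        print([next(schedule) for _ in range(10)])  # [0, 1, 2, 2, 2, 2, 3, 3, 3, 3]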

    You could also chain attention machines together, modeling a computer as one fast, small attention machine (the registers) being modeled by a separate, slower attention machine (the RAM), which is then further modeled by an even larger, slower attention machine (the hard drive). You might then ask about accuracy limits and throughput.

    With a little extra work, from here we can also talk about Turing machines as attention machines operating on infinite-dimensional reality functions. We'd need to impose a restriction on the structure of the reality function: it can only change in the cell under the tape head of the Turing machine.

    Does this thing already exist? Or am I just totally out to lunch here?

    submitted by /u/gomboloid

    Dank Learning: Generating Memes Using Deep Neural Networks

    Posted: 13 Jun 2018 07:56 AM PDT

    How to find an idea for Senior project?

    Posted: 13 Jun 2018 02:07 PM PDT

    We're a team of 5 rising-senior computer science students.

    We're struggling to find an idea for our senior project which should last for two semesters of work.

    Issues we're having:

    • Most of the ideas are either too simple for two semesters of work, or so complex that we're afraid we won't be able to complete them.
    • Some ideas are just duplicates that have been done zillions of times on GitHub; we're mainly trying to solve a problem, or provide a better solution to an already solved problem.

    I'd appreciate any suggestions on how to find ideas, or ideas themselves.

    submitted by /u/NeumannCracker

    Git Without Fear for Human Beings

    Posted: 13 Jun 2018 05:22 PM PDT

    Ripple Introduces the University Blockchain Research Initiative

    Posted: 13 Jun 2018 09:15 AM PDT
