Confusion regarding side-effects section of Google's MapReduce Research Paper | Computer Science: Theory and Application
- Confusion regarding side-effects section of Google's MapReduce Research Paper
- What online courses provide 100% up-to-date materials and are worth paying for?
- Back of the envelope estimation hacks
- Made another tutorial that teaches you how to build a real-time International Space Station tracker using JavaScript! Very well explained!
- Seeking Professional Advice - Is CS degree needed for design/development of websites and apps?
- Best current methods for clustering high-dimensional data on the fly, with additional data being added to the set?
- Non-Tech Talking in Tech Talk. What do you say?
- Nearest-Neighbor on Massive Datasets
- What are the hardware requirements for building and understanding simple AI?
- Latest from Microsoft Mixed Reality & AI Lab researchers: great applications for mixed reality. State of the art in 3D model fitting!
- Uber’s take on JVM tuning
- An Illustrated Data Structures Cheat Sheet with Working Code
- SAS vs R vs SPSS
- Agile Management & Methodology
- MS Excel Shortcuts
Confusion regarding side-effects section of Google's MapReduce Research Paper Posted: 02 Aug 2020 01:32 AM PDT I'm reading Google's MapReduce paper, released several years ago, and I have a question about section 4.5 (Side-effects). What I understand is that if a worker fails while writing the output of a map task, that task is re-run on another worker. In general, that would cause problems for a non-deterministic program: one reduce worker may already have read the original output, while another reduce worker reads the different output produced by the re-run. So why is there a separate mention of the case where multiple output files are produced? Following is a snippet of the section:
I'm trying to wrap my head around this but I'm getting really confused. If anyone could help me out or give me a push in the right direction, I'd be really grateful. Thanks [link] [comments] |
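Not an answer to the question itself, but it may help readers to see what the paper means when it says application-level side-effect files should be made atomic and idempotent. Below is a minimal sketch of the write-to-a-temp-file-then-rename pattern that section 4.5 points to; the function name and record format are made up for illustration, not taken from the paper or from any MapReduce implementation.

```python
import os
import tempfile

def write_side_effect_file(records, final_path):
    """Write an auxiliary output file atomically and idempotently.

    The data is first written to a private temporary file in the same
    directory, then renamed to its final name only once it is complete.
    If the worker dies mid-write, no partially written file becomes
    visible, and a re-executed task simply produces the same final name
    again, so readers never observe a half-finished file.
    """
    dir_name = os.path.dirname(final_path) or "."
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "w") as f:
            for rec in records:
                f.write(rec + "\n")
        os.replace(tmp_path, final_path)  # atomic rename on POSIX filesystems
    except BaseException:
        os.unlink(tmp_path)  # clean up the partial temp file on failure
        raise
```

With a single output file, the framework's atomic rename of the reduce output already gives this guarantee; the separate discussion in the paper concerns tasks that emit multiple auxiliary files, where there is no cross-file two-phase commit.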
What online courses provide 100% up-to-date materials and are worth paying for? Posted: 01 Aug 2020 03:35 PM PDT I need your honest recommendation/advice on the following brands. Thank you. (1) Lynda/LinkedIn (2) Linux Academy (3) Pluralsight (4) O'Reilly, and (5) egghead. Note: which of them provide a certificate after completing the course? [link] [comments] |
Back of the envelope estimation hacks Posted: 02 Aug 2020 03:56 AM PDT |
Made another tutorial that teaches you how to build a real-time International Space Station tracker using JavaScript! Very well explained! Posted: 02 Aug 2020 02:49 AM PDT You can read the tutorial here on my blog: https://thecodingpie.com/post/build-a-real-time-iss-tracker-using-javascript/ Live real-time ISS tracker made with JavaScript. I tried my best to break this tutorial into small steps so that any beginner can understand it. Hope you like it :) As always, any feedback is appreciated... [link] [comments] |
Seeking Professional Advice - Is CS degree needed for design/development of websites and apps? Posted: 02 Aug 2020 02:12 AM PDT |
Best current methods for clustering high-dimensional data on the fly, with additional data being added to the set? Posted: 01 Aug 2020 01:23 PM PDT Years ago I did early research work in various forms of unsupervised learning, but I've been away from this area for a long time. I now have an application for some old work I did in this area, but I'm trying to find what the state of the art is now. So: I have M instances of N-dimensional vectors (most likely between 10 and 20+ dimensions). I have no a priori idea how many data points there are, but I know the set will grow over time. I'm looking to find the clusters in this data, though I can't predict how many clusters there are or how they might overlap, so I can't pre-set a number of categories. I want the algorithm to be able to figure this out on the fly, and continue re-figuring as new data points are added to the set. I also want to be able to identify a new data point's categorical cluster, and quickly find other instances near it in N dimensions, whether in its cluster or not, with a minimum of checking individual instances. My go-to for this (being ancient) is an evolutionary variant of Kohonen's LVQ3 algorithm, but I've toyed with K-means as well. Is this a known/solved problem? Are there different/better algorithms used for this now? And, if this isn't the place for a question like this, what's a good subreddit for discussing it? (I asked this in a weekly discussion thread in /r/dataisbeautiful as well; no responses thus far.) Thanks. [link] [comments] |
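Not a definitive recommendation, but as one starting point for "clusters on the fly without a preset k": streaming methods such as BIRCH (which supports incremental fitting in scikit-learn) or online variants of density-based clustering address exactly this, and approximate nearest-neighbor indexes (e.g., FAISS or Annoy) handle the "quickly find nearby instances" part. The sketch below is a toy leader-style clusterer in NumPy that illustrates the basic shape of an incremental, threshold-based approach; the class name and the distance threshold are illustrative assumptions, not a recommended production design.

```python
import numpy as np

class LeaderClusterer:
    """Toy incremental clustering: a point joins the nearest existing
    cluster if it lies within `threshold` of that cluster's centroid,
    otherwise it starts a new cluster. No cluster count is fixed up front,
    and points can be added one at a time as the dataset grows."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.centroids = []   # running mean of each cluster
        self.counts = []      # number of points assigned to each cluster

    def add(self, x):
        x = np.asarray(x, dtype=float)
        if self.centroids:
            dists = np.linalg.norm(np.array(self.centroids) - x, axis=1)
            best = int(np.argmin(dists))
            if dists[best] <= self.threshold:
                # update the running mean of the winning cluster
                self.counts[best] += 1
                self.centroids[best] += (x - self.centroids[best]) / self.counts[best]
                return best
        # no existing cluster is close enough: start a new one
        self.centroids.append(x.copy())
        self.counts.append(1)
        return len(self.centroids) - 1

    def nearest_cluster(self, x):
        """Index of the cluster whose centroid is closest to x."""
        dists = np.linalg.norm(np.array(self.centroids) - np.asarray(x, dtype=float), axis=1)
        return int(np.argmin(dists))

# Example usage on synthetic 12-dimensional data.
clusterer = LeaderClusterer(threshold=4.0)
for vec in np.random.randn(1000, 12):
    clusterer.add(vec)
print(len(clusterer.centroids), "clusters formed")
```

The obvious trade-off is that the result depends on the threshold and on arrival order; BIRCH-style clustering-feature trees are the more principled refinement of the same idea.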
Non-Tech Talking in Tech Talk. What do you say? Posted: 01 Aug 2020 04:50 PM PDT What are some words that are not definitively technical in the way constructor or parameter are (words basically found only in programming), but that come up while talking about programming anyway? Words that a layman might not hear otherwise? For example, I find I use "syntax" often, but I could probably teach someone to code without it. I can't say the same for a word like "variable". I also remember, from about a decade ago, that the first word my CS teacher ever defined for us was "jargon", which also fits. Can you think of any? [link] [comments] |
Nearest-Neighbor on Massive Datasets Posted: 01 Aug 2020 12:32 PM PDT In a previous article, I introduced an algorithm that can cluster a few hundred thousand N-dimensional vectors in about a minute or two, depending upon the dataset, by first compressing the data down to a single dimension. The impetus for that algorithm was thermodynamics, specifically clustering data expanding about a point, e.g., a gas expanding in a volume. That algorithm doesn't work for all datasets, but it is useful in thermodynamics, and probably object tracking as well, since it lets you easily identify the perimeter of a set of points. Below is a full-blown clustering algorithm that can nonetheless handle enormous datasets efficiently. Specifically, attached is a simple classification example consisting of two classes of 10,000 ten-dimensional vectors each, for a total of 20,000 vectors. The classification task takes about 14 seconds, running on an iMac, with 100% accuracy. In addition to clustering the data, the classification algorithm generates a compressed representation of the dataset, which in turn allows the use of the nearest-neighbor method, an extremely efficient prediction method that is, in many real-world cases, mathematically impossible to beat in terms of accuracy. That said, even though nearest-neighbor is extremely efficient, it could easily start to get slow with a dataset of this size, since you are still comparing an input vector to the entire dataset. This method of clustering therefore allows you to use nearest-neighbor on enormous datasets, simply because the classification process generates a compressed representation of the entire dataset. In the specific case attached below, the dataset consists of 20,000 vectors, and the compressed dataset fed to the nearest-neighbor algorithm consists of just 4 vectors. Classification predictions occurred at a rate of about 8,000 predictions per second, with no errors at all over all 20,000 vectors. https://derivativedribble.wordpress.com/2020/08/01/nearest-neighbor-on-massive-datasets/ [link] [comments] |
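The post links to the full write-up; the sketch below is not that algorithm, just a toy illustration of the core idea it describes: compress the training set to a handful of representative vectors, then run nearest-neighbor against those representatives instead of against all 20,000 points. The synthetic data, the per-class-centroid compression, and the accuracy printed here are illustrative assumptions, not the author's method or results.

```python
import numpy as np

def compress(X, y):
    """Collapse each class to its centroid, giving a tiny 'compressed'
    training set with one representative vector per class."""
    classes = np.unique(y)
    reps = np.array([X[y == c].mean(axis=0) for c in classes])
    return reps, classes

def predict(reps, classes, queries):
    """Nearest-neighbor classification against the representatives only."""
    # pairwise distances, shape (n_queries, n_representatives)
    d = np.linalg.norm(queries[:, None, :] - reps[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]

# Two well-separated classes of 10,000 ten-dimensional vectors each,
# loosely mirroring the setup described in the post.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (10_000, 10)), rng.normal(5, 1, (10_000, 10))])
y = np.array([0] * 10_000 + [1] * 10_000)

reps, classes = compress(X, y)
accuracy = (predict(reps, classes, X) == y).mean()
print(accuracy)  # close to 1.0 on this easy synthetic data
```

Each prediction here compares against 2 representatives instead of 20,000 training points, which is where the speedup in this style of approach comes from; the cost is that accuracy now depends entirely on how well the representatives summarize the data.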
What are the hardware requirements for building and understanding simple AI? Posted: 01 Aug 2020 01:25 PM PDT I don't know if this is the right place to post, but I was hoping you could answer my question. (To clarify: I mean hardware for a PC.) [link] [comments] |
Latest from Microsoft Mixed Reality & AI Lab researchers: great applications for mixed reality. State of the art in 3D model fitting! Posted: 01 Aug 2020 10:57 AM PDT |
Uber’s take on JVM tuning Posted: 01 Aug 2020 10:23 AM PDT |
An Illustrated Data Structures Cheat Sheet with Working Code Posted: 01 Aug 2020 05:24 AM PDT |
SAS vs R vs SPSS Posted: 01 Aug 2020 06:09 AM PDT |
Agile Management & Methodology Posted: 01 Aug 2020 06:08 AM PDT |
MS Excel Shortcuts Posted: 01 Aug 2020 06:09 AM PDT |